Reorganize Propane Grammar File user guide section

This commit is contained in:
Josh Holtrop 2026-02-21 22:06:09 -05:00
parent de3fb0d120
commit f584389a29


@ -221,7 +221,7 @@ Parser rule code blocks are not available in tree generation mode.
In tree generation mode, a full parse tree is automatically constructed in
memory for user code to traverse after parsing is complete.
### Context code blocks: the `context_user_fields` statement
##> `context_user_fields` statement - adding custom fields to the context
Propane uses a context structure for lexer and parser operations.
Custom fields may be added to the context structure by using the grammar
@ -256,7 +256,273 @@ If a pointer to any allocated memory is stored in a user-defined context field,
it is up to the user to free any memory when the program is finished using the
context structure.
### Custom token fields code blocks: the `token_user_fields` statement
##> `drop` statement - ignoring input patterns
A `drop` statement can be used to specify a lexer pattern that, when matched,
causes the matched input to be dropped and lexing to continue after the
matched input.
A common use for a `drop` statement would be to ignore whitespace sequences in
the user input.
Example:
```
drop /\s+/;
```
See also ${#Regular expression syntax}.
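`drop` is not limited to whitespace; any pattern to be skipped can be dropped. For example, a grammar could discard shell-style line comments (the same pattern appears later in this guide with a user code block attached):

```
drop /#(.*)\n/;
```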
##> `free_token_node` statement - freeing user-allocated memory in token node fields
If a user lexer code block allocates memory to store in a token node's `pvalue`,
or any custom token user fields store pointers to allocated memory, the
`free_token_node` grammar statement can be used to provide a code block that
frees that memory properly.
Example freeing `pvalue` (C):
```
tree;
free_token_node <<
free(${token.pvalue});
>>
ptype int *;
token a <<
$$ = (int *)malloc(sizeof(int));
*$$ = 1;
>>
token b <<
$$ = (int *)malloc(sizeof(int));
*$$ = 2;
>>
Start -> a:a b:b;
```
Example freeing custom token user fields (C):
```
token_user_fields <<
char * comments;
>>
on_token_node <<
${token.comments} = (char *)malloc(some_len);
>>
free_token_node <<
free(${token.comments});
>>
```
The `free_token_node` statement user code block is not emitted for the D
language, since D has a garbage collector.
##> `module` statement - specifying the generated parser module name
The `module` statement can be used to specify the module name for a generated
D module.
```
module proj.parser;
```
If a `module` statement is not present, then the generated D module will not
contain a module declaration and the default module name will be used.
##> `on_token_node` statement - custom initialization of a token tree node
The `on_token_node` statement can be used to provide code that initializes
any token user fields when a token tree node instance is created.
For example (C++):
```
context_user_fields <<
std::string comments;
>>
token_user_fields <<
std::string comments;
>>
on_token_node <<
${token.comments} = ${context.comments};
${context.comments} = "";
>>
drop /#(.*)\n/ <<
/* Accumulate comments before the next parser tree node. */
${context.comments} += std::string((const char *)match, match_length);
>>
```
##> `prefix` statement - specifying the generated API prefix
By default, the public API (types, constants, and functions) of the generated
lexer and parser uses a prefix of `p_`.
This prefix can be changed with the `prefix` statement.
Example:
```
prefix myparser_;
```
For a parser generated with this `prefix` statement, you would call
`myparser_context_new()` instead of `p_context_new()`.
The `prefix` statement is optional; use it when you want the generated lexer
and parser to use a prefix other than the default.
It is also useful when generating multiple lexers/parsers for the same
program, to avoid symbol collisions.
##> `ptype` statement - specifying parser value types
The `ptype` statement is used to define parser value type(s).
Example:
```
ptype void *;
```
This defines the default parser value type to be `void *` (this is, in fact,
the default parser value type if the grammar file does not specify otherwise).
Each defined lexer token type and parser rule has an associated parser value
type.
When the lexer runs, each lexed token has a parser value associated with it.
When the parser runs, each instance of a reduced rule has a parser value
associated with it.
Propane supports using different parser value types for different rules and
token types.
The example `ptype` statement above defines the default parser value type.
A parser value type name can optionally be specified following the `ptype`
keyword.
For example:
```
ptype Value;
ptype array = Value[];
ptype dict = Value[string];
Object -> lbrace rbrace << $$ = new Value(); >>
Values (array) -> Value << $$ = [$1]; >>
Values -> Values comma Value << $$ = $1 ~ [$3]; >>
KeyValue (dict) -> string colon Value << $$ = [$1: $3]; >>
```
In this example, the default parser value type is `Value`.
A parser value type named `array` is defined to mean `Value[]`.
A parser value type named `dict` is defined to mean `Value[string]`.
Any defined tokens or rules that do not specify a parser value type will have
the default parser value type associated with them.
To associate a different parser value type with a token or rule, write the
parser value type name in parentheses following the name of the token or rule.
In this example:
* a reduced `Object`'s parser value has a type of `Value`.
* a reduced `Values`'s parser value has a type of `Value[]`.
* a reduced `KeyValue`'s parser value has a type of `Value[string]`.
When tree generation mode is active, the `ptype` functionality works differently.
In this mode, only one `ptype` is used by the parser.
A lexer user code block may assign a parse value to the generated `Token` node
by assigning to `$$`.
The type of the parse value `$$` is given by the global `ptype` type.
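A minimal sketch of tree-mode parse values, assuming (as in the `drop` example earlier) that `match_length` is available inside lexer code blocks:

```
tree;
ptype size_t;
token word /[a-z]+/ <<
  /* Store the matched text length as this token node's parse value. */
  $$ = match_length;
>>
Start -> word;
```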
##> `start` statement - specifying the parser start rule name
The start rule can be changed from the default of `Start` by using the `start`
statement.
Example:
```
start MyStartRule;
```
Multiple start rules can be specified, either with multiple `start` statements
or one `start` statement listing multiple start rules.
Example:
```
start Module ModuleItem Statement Expression;
```
When multiple start rules are specified, multiple `p_parse_*()`,
`p_result_*()`, and (in tree mode) `p_tree_delete_*()` functions are generated.
Default `p_parse()`, `p_result()`, and `p_tree_delete()` functions are
generated corresponding to the first start rule.
Additionally, each start rule causes the generation of another version of each
of these functions, for example `p_parse_Statement()`, `p_result_Statement()`,
and `p_tree_delete_Statement()`.
##> `token` statement - specifying tokens
The `token` statement allows defining a lexer token and a pattern to match that
token.
The name of the token must be specified immediately following the `token`
keyword.
A regular expression pattern may optionally follow the token name.
If a regular expression pattern is not specified, the name of the token is
taken to be the pattern.
See also: ${#Regular expression syntax}.
Example:
```
token for;
```
In this example, the token name is `for` and the pattern to match it is
`/for/`.
Example:
```
token lbrace /\{/;
```
In this example, the token name is `lbrace` and a single left curly brace will
match it.
The `token` statement can also include a user code block.
The user code block will be executed whenever the token is matched by the
lexer.
Example:
```
token if << writeln("'if' keyword lexed"); >>
```
The `token` statement is actually a shortcut statement for a combination of a
`tokenid` statement and a pattern statement.
To define a lexer token without an associated pattern to match it, use a
`tokenid` statement.
To define a lexer pattern that may or may not result in a matched token, use
a pattern statement.
##> `tokenid` statement - defining tokens without a matching pattern
The `tokenid` statement can be used to define a token without associating it
with a lexer pattern that matches it.
Example:
```
tokenid string;
```
The `tokenid` statement can be useful when defining a token that may optionally
be returned by user code associated with a pattern.
It is also useful when lexer modes and multiple lexer patterns are required to
build up a full token.
A common example is parsing a string.
See the ${#Lexer modes} chapter for more information.
##> `token_user_fields` statement - adding custom token fields
When tree generation mode is active, Propane generates a tree node structure
and a token node structure for each matching rule and token instance in the
@ -302,31 +568,7 @@ will be executed immediately before the token node is freed.
For C++, the `delete` statement is used to free the token tree node, so the
destructor for any custom token user fields will be called.
### Custom initialization of a token tree node - the `on_tree_node` statement
The `on_token_node` statement can be used to provide code that initializes
any token user fields when a token tree node instance is created.
For example (C++):
```
context_user_fields <<
std::string comments;
>>
token_user_fields <<
std::string comments;
>>
on_token_node <<
${token.comments} = ${context.comments};
${context.comments} = "";
>>
drop /#(.*)\n/ <<
/* Accumulate comments before the next parser tree node. */
${context.comments} += std::string((const char *)match, match_length);
>>
```
##> Tree generation mode - the `tree` statement
##> `tree` statement - tree generation mode
To activate tree generation mode, place the `tree` statement in your grammar file:
@ -457,115 +699,7 @@ assert(itemsmore.pItem.pItem.pItem !is null);
assert(itemsmore.pItem.pItem.pItem.pToken1 !is null);
```
## Freeing user-allocated memory in token node `pvalue`: the `free_token_node` statement
If user lexer code block allocates memory to store in a token node's `pvalue`
or any custom token user fields store pointers to allocated memory, the
`free_token_node` grammar statement can be used to provide a code block which
can be used to free memory properly.
Example freeing `pvalue` (C):
```
tree;
free_token_node <<
free(${token.pvalue});
>>
ptype int *;
token a <<
$$ = (int *)malloc(sizeof(int));
*$$ = 1;
>>
token b <<
$$ = (int *)malloc(sizeof(int));
*$$ = 2;
>>
Start -> a:a b:b;
```
Example freeing custom token user fields (C):
```
token_user_fields <<
char * comments;
>>
on_token_node <<
${token.comments} = (char *)malloc(some_len);
>>
free_token_node <<
free(${token.comments});
>>
```
The `free_token_node` statement user code block is not emitted for D language
since D has a garbage collector.
##> Specifying tokens - the `token` statement
The `token` statement allows defining a lexer token and a pattern to match that
token.
The name of the token must be specified immediately following the `token`
keyword.
A regular expression pattern may optionally follow the token name.
If a regular expression pattern is not specified, the name of the token is
taken to be the pattern.
See also: ${#Regular expression syntax}.
Example:
```
token for;
```
In this example, the token name is `for` and the pattern to match it is
`/for/`.
Example:
```
token lbrace /\{/;
```
In this example, the token name is `lbrace` and a single left curly brace will
match it.
The `token` statement can also include a user code block.
The user code block will be executed whenever the token is matched by the
lexer.
Example:
```
token if << writeln("'if' keyword lexed"); >>
```
The `token` statement is actually a shortcut statement for a combination of a
`tokenid` statement and a pattern statement.
To define a lexer token without an associated pattern to match it, use a
`tokenid` statement.
To define a lexer pattern that may or may not result in a matched token, use
a pattern statement.
##> Defining tokens without a matching pattern - the `tokenid` statement
The `tokenid` statement can be used to define a token without associating it
with a lexer pattern that matches it.
Example:
```
tokenid string;
```
The `tokenid` statement can be useful when defining a token that may optionally
be returned by user code associated with a pattern.
It is also useful when lexer modes and multiple lexer patterns are required to
build up a full token.
A common example is parsing a string.
See the ${#Lexer modes} chapter for more information.
##> Specifying a lexer pattern - the pattern statement
##> Specifying a lexer pattern
A pattern statement is used to define a lexer pattern that can execute user
code but may not result in a matched token.
@ -580,23 +714,6 @@ This can be especially useful with ${#Lexer modes}.
See also ${#Regular expression syntax}.
##> Ignoring input sections - the `drop` statement
A `drop` statement can be used to specify a lexer pattern that when matched
should result in the matched input being dropped and lexing continuing after
the matched input.
A common use for a `drop` statement would be to ignore whitespace sequences in
the user input.
Example:
```
drop /\s+/;
```
See also ${#Regular expression syntax}.
##> Regular expression syntax
A regular expression ("regex") is used to define lexer patterns in `token`,
@ -735,63 +852,7 @@ token dot /\./ <<
default, identonly: drop /\s+/;
```
##> Specifying parser value types - the `ptype` statement
The `ptype` statement is used to define parser value type(s).
Example:
```
ptype void *;
```
This defines the default parser value type to be `void *` (this is, in fact,
the default parser value type if the grammar file does not specify otherwise).
Each defined lexer token type and parser rule has an associated parser value
type.
When the lexer runs, each lexed token has a parser value associated with it.
When the parser runs, each instance of a reduced rule has a parser value
associated with it.
Propane supports using different parser value types for different rules and
token types.
The example `ptype` statement above defines the default parser value type.
A parser value type name can optionally be specified following the `ptype`
keyword.
For example:
```
ptype Value;
ptype array = Value[];
ptype dict = Value[string];
Object -> lbrace rbrace << $$ = new Value(); >>
Values (array) -> Value << $$ = [$1]; >>
Values -> Values comma Value << $$ = $1 ~ [$3]; >>
KeyValue (dict) -> string colon Value << $$ = [$1: $3]; >>
```
In this example, the default parser value type is `Value`.
A parser value type named `array` is defined to mean `Value[]`.
A parser value type named `dict` is defined to mean `Value[string]`.
Any defined tokens or rules that do not specify a parser value type will have
the default parser value type associated with them.
To associate a different parser value type with a token or rule, write the
parser value type name in parentheses following the name of the token or rule.
In this example:
* a reduced `Object`'s parser value has a type of `Value`.
* a reduced `Values`'s parser value has a type of `Value[]`.
* a reduced `KeyValue`'s parser value has a type of `Value[string]`.
When tree generation mode is active, the `ptype` functionality works differently.
In this mode, only one `ptype` is used by the parser.
Lexer user code blocks may assign a parse value to the generated `Token` node
by assigning to `$$` within a lexer code block.
The type of the parse value `$$` is given by the global `ptype` type.
##> Specifying a parser rule - the rule statement
##> Specifying parser rules
Rule statements create parser rules which define the grammar that will be
parsed by the generated parser.
@ -872,67 +933,6 @@ can be used to produce the parser value for the accepted rule.
Parser rule code blocks are not allowed and not used when tree generation mode
is active.
##> Specifying the parser start rule name - the `start` statement
The start rule can be changed from the default of `Start` by using the `start`
statement.
Example:
```
start MyStartRule;
```
Multiple start rules can be specified, either with multiple `start` statements
or one `start` statement listing multiple start rules.
Example:
```
start Module ModuleItem Statement Expression;
```
When multiple start rules are specified, multiple `p_parse_*()` functions,
`p_result_*()`, and `p_tree_delete_*()` functions (in tree mode) are generated.
A default `p_parse()`, `p_result()`, `p_tree_delete()` are generated corresponding
to the first start rule.
Additionally, each start rule causes the generation of another version of each
of these functions, for example `p_parse_Statement()`, `p_result_Statement()`,
and `p_tree_delete_Statement()`.
##> Specifying the parser module name - the `module` statement
The `module` statement can be used to specify the module name for a generated
D module.
```
module proj.parser;
```
If a module statement is not present, then the generated D module will not
contain a module statement and the default module name will be used.
##> Specifying the generated API prefix - the `prefix` statement
By default the public API (types, constants, and functions) of the generated
lexer and parser uses a prefix of `p_`.
This prefix can be changed with the `prefix` statement.
Example:
```
prefix myparser_;
```
With a parser generated with this `prefix` statement, instead of calling
`p_context_new()` you would call `myparser_context_new()`.
The `prefix` statement can be optionally used if you would like to change the
prefix used by your generated lexer and parser to something other than the
default.
It can also be used when generating multiple lexers/parsers to be used in the
same program to avoid symbol collisions.
##> User termination of the lexer or parser
Propane supports allowing lexer or parser user code blocks to terminate