Reorganize Propane Grammar File user guide section

parent de3fb0d120
commit f584389a29
@@ -221,7 +221,7 @@ Parser rule code blocks are not available in tree generation mode.
 In tree generation mode, a full parse tree is automatically constructed in
 memory for user code to traverse after parsing is complete.
 
-### Context code blocks: the `context_user_fields` statement
+##> `context_user_fields` statement - adding custom fields to the context
 
 Propane uses a context structure for lexer and parser operations.
 Custom fields may be added to the context structure by using the grammar
@@ -256,7 +256,273 @@ If a pointer to any allocated memory is stored in a user-defined context field,
 it is up to the user to free any memory when the program is finished using the
 context structure.
 
-### Custom token fields code blocks: the `token_user_fields` statement
+##> `drop` statement - ignoring input patterns
+
+A `drop` statement can be used to specify a lexer pattern that, when matched,
+causes the matched input to be dropped and lexing to continue after the
+matched input.
+
+A common use for a `drop` statement is to ignore whitespace sequences in the
+user input.
+
+Example:
+
+```
+drop /\s+/;
+```
+
+See also ${#Regular expression syntax}.
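A `drop` pattern is not limited to whitespace; line comments can be discarded the same way. A sketch borrowing the comment pattern from the `on_token_node` example later in this diff (the comment syntax and the assumption that a grammar may contain several `drop` statements are illustrative):

```
drop /\s+/;
drop /#.*\n/;
```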
+
+##> `free_token_node` statement - freeing user-allocated memory in token node fields
+
+If a user lexer code block allocates memory to store in a token node's `pvalue`,
+or any custom token user fields store pointers to allocated memory, the
+`free_token_node` grammar statement can be used to provide a code block that
+frees the memory properly.
+
+Example freeing `pvalue` (C):
+
+```
+tree;
+free_token_node <<
+free(${token.pvalue});
+>>
+ptype int *;
+token a <<
+$$ = (int *)malloc(sizeof(int));
+*$$ = 1;
+>>
+token b <<
+$$ = (int *)malloc(sizeof(int));
+*$$ = 2;
+>>
+Start -> a:a b:b;
+```
+
+Example freeing custom token user fields (C):
+
+```
+token_user_fields <<
+char * comments;
+>>
+on_token_node <<
+${token.comments} = (char *)malloc(some_len);
+>>
+free_token_node <<
+free(${token.comments});
+>>
+```
+
+The `free_token_node` statement user code block is not emitted for the D
+language, since D has a garbage collector.
+
+##> `module` statement - specifying the generated parser module name
+
+The `module` statement can be used to specify the module name for a generated
+D module.
+
+```
+module proj.parser;
+```
+
+If a `module` statement is not present, the generated D module will not
+contain a module declaration, and the default module name will be used.
+
+##> `on_token_node` statement - custom initialization of a token tree node
+
+The `on_token_node` statement can be used to provide code that initializes
+any token user fields when a token tree node instance is created.
+
+For example (C++):
+
+```
+context_user_fields <<
+std::string comments;
+>>
+token_user_fields <<
+std::string comments;
+>>
+on_token_node <<
+${token.comments} = ${context.comments};
+${context.comments} = "";
+>>
+drop /#(.*)\n/ <<
+/* Accumulate comments before the next parser tree node. */
+${context.comments} += std::string((const char *)match, match_length);
+>>
+```
+
+##> `prefix` statement - specifying the generated API prefix
+
+By default, the public API (types, constants, and functions) of the generated
+lexer and parser uses a prefix of `p_`.
+
+This prefix can be changed with the `prefix` statement.
+
+Example:
+
+```
+prefix myparser_;
+```
+
+With a parser generated using this `prefix` statement, instead of calling
+`p_context_new()` you would call `myparser_context_new()`.
+
+The `prefix` statement is optional; use it if you would like to change the
+prefix used by your generated lexer and parser to something other than the
+default.
+It is also useful when generating multiple lexers/parsers to be used in the
+same program, to avoid symbol collisions.
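For the multiple-parser case, each grammar file would carry its own `prefix` statement; a minimal sketch (the `json_` and `ini_` names are hypothetical). In one grammar file:

```
prefix json_;
```

and in a second grammar file:

```
prefix ini_;
```

The program can then call `json_context_new()` and `ini_context_new()` without the generated symbols colliding.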
+
+##> `ptype` statement - specifying parser value types
+
+The `ptype` statement is used to define parser value type(s).
+Example:
+
+```
+ptype void *;
+```
+
+This defines the default parser value type to be `void *` (this is, in fact,
+the default parser value type if the grammar file does not specify otherwise).
+
+Each defined lexer token type and parser rule has an associated parser value
+type.
+When the lexer runs, each lexed token has a parser value associated with it.
+When the parser runs, each instance of a reduced rule has a parser value
+associated with it.
+Propane supports using different parser value types for different rules and
+token types.
+The example `ptype` statement above defines the default parser value type.
+A parser value type name can optionally be specified following the `ptype`
+keyword.
+For example:
+
+```
+ptype Value;
+ptype array = Value[];
+ptype dict = Value[string];
+
+Object -> lbrace rbrace << $$ = new Value(); >>
+
+Values (array) -> Value << $$ = [$1]; >>
+Values -> Values comma Value << $$ = $1 ~ [$3]; >>
+
+KeyValue (dict) -> string colon Value << $$ = [$1: $3]; >>
+```
+
+In this example, the default parser value type is `Value`.
+A parser value type named `array` is defined to mean `Value[]`.
+A parser value type named `dict` is defined to mean `Value[string]`.
+Any defined tokens or rules that do not specify a parser value type will have
+the default parser value type associated with them.
+To associate a different parser value type with a token or rule, write the
+parser value type name in parentheses following the name of the token or rule.
+In this example:
+
+* a reduced `Object`'s parser value has a type of `Value`.
+* a reduced `Values`'s parser value has a type of `Value[]`.
+* a reduced `KeyValue`'s parser value has a type of `Value[string]`.
+
+When tree generation mode is active, the `ptype` functionality works differently.
+In this mode, only one `ptype` is used by the parser.
+Lexer user code blocks may assign a parse value to the generated `Token` node
+by assigning to `$$` within a lexer code block.
+The type of the parse value `$$` is given by the global `ptype` type.
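A minimal tree-mode sketch in C, modeled on the `free_token_node` example earlier in this diff (the `num` token and the stored value are illustrative):

```
tree;
ptype int *;
token num /[0-9]+/ <<
$$ = (int *)malloc(sizeof(int));
*$$ = 0; /* store the parsed value here */
>>
```

Here every token node's `pvalue` has the single global type `int *`.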
+
+##> `start` statement - specifying the parser start rule name
+
+The start rule can be changed from the default of `Start` by using the `start`
+statement.
+Example:
+
+```
+start MyStartRule;
+```
+
+Multiple start rules can be specified, either with multiple `start` statements
+or with one `start` statement listing multiple start rules.
+Example:
+
+```
+start Module ModuleItem Statement Expression;
+```
+
+When multiple start rules are specified, multiple `p_parse_*()`, `p_result_*()`,
+and (in tree mode) `p_tree_delete_*()` functions are generated.
+Default `p_parse()`, `p_result()`, and `p_tree_delete()` functions are
+generated corresponding to the first start rule.
+Additionally, each start rule causes the generation of another version of each
+of these functions, for example `p_parse_Statement()`, `p_result_Statement()`,
+and `p_tree_delete_Statement()`.
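Given the `start` statement above, the generated C API would therefore include entry points along these lines (a sketch derived from the naming rule just described; exact signatures are not shown in this section):

```
p_parse()              /* default; corresponds to the first start rule, Module */
p_parse_Module()
p_parse_ModuleItem()
p_parse_Statement()
p_parse_Expression()
```

with matching `p_result_*()` and, in tree mode, `p_tree_delete_*()` variants for each.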
+
+##> `token` statement - specifying tokens
+
+The `token` statement allows defining a lexer token and a pattern to match that
+token.
+The name of the token must be specified immediately following the `token`
+keyword.
+A regular expression pattern may optionally follow the token name.
+If a regular expression pattern is not specified, the name of the token is
+taken to be the pattern.
+See also: ${#Regular expression syntax}.
+
+Example:
+
+```
+token for;
+```
+
+In this example, the token name is `for` and the pattern to match it is
+`/for/`.
+
+Example:
+
+```
+token lbrace /\{/;
+```
+
+In this example, the token name is `lbrace` and a single left curly brace will
+match it.
+
+The `token` statement can also include a user code block.
+The user code block will be executed whenever the token is matched by the
+lexer.
+
+Example:
+
+```
+token if << writeln("'if' keyword lexed"); >>
+```
+
+The `token` statement is actually a shortcut for a combination of a
+`tokenid` statement and a pattern statement.
+To define a lexer token without an associated pattern to match it, use a
+`tokenid` statement.
+To define a lexer pattern that may or may not result in a matched token, use
+a pattern statement.
+
+##> `tokenid` statement - defining tokens without a matching pattern
+
+The `tokenid` statement can be used to define a token without associating it
+with a lexer pattern that matches it.
+
+Example:
+
+```
+tokenid string;
+```
+
+The `tokenid` statement can be useful when defining a token that may optionally
+be returned by user code associated with a pattern.
+
+It is also useful when lexer modes and multiple lexer patterns are required to
+build up a full token.
+A common example is parsing a string.
+See the ${#Lexer modes} chapter for more information.
+
+##> `token_user_fields` statement - adding custom token fields
+
 When tree generation mode is active, Propane generates a tree node structure
 and a token node structure for each matching rule and token instance in the
@@ -302,31 +568,7 @@ will be executed immediately before the token node is freed.
 For C++, the `delete` statement is used to free the token tree node, so the
 destructor for any custom token user fields will be called.
 
-### Custom initialization of a token tree node - the `on_tree_node` statement
-
-The `on_token_node` statement can be used to provide code that initializes
-any token user fields when a token tree node instance is created.
-
-For example (C++):
-
-```
-context_user_fields <<
-std::string comments;
->>
-token_user_fields <<
-std::string comments;
->>
-on_token_node <<
-${token.comments} = ${context.comments};
-${context.comments} = "";
->>
-drop /#(.*)\n/ <<
-/* Accumulate comments before the next parser tree node. */
-${context.comments} += std::string((const char *)match, match_length);
->>
-```
-
-##> Tree generation mode - the `tree` statement
+##> `tree` statement - tree generation mode
 
 To activate tree generation mode, place the `tree` statement in your grammar file:
@@ -457,115 +699,7 @@ assert(itemsmore.pItem.pItem.pItem !is null);
 assert(itemsmore.pItem.pItem.pItem.pToken1 !is null);
 ```
 
-## Freeing user-allocated memory in token node `pvalue`: the `free_token_node` statement
-[…section body removed; identical to the `free_token_node` section added above…]
-
-##> Specifying tokens - the `token` statement
-[…section body removed; identical to the `token` section added above…]
-
-##> Defining tokens without a matching pattern - the `tokenid` statement
-[…section body removed; identical to the `tokenid` section added above…]
-
-##> Specifying a lexer pattern - the pattern statement
+##> Specifying a lexer pattern
 
 A pattern statement is used to define a lexer pattern that can execute user
 code but may not result in a matched token.
@@ -580,23 +714,6 @@ This can be especially useful with ${#Lexer modes}.
 
 See also ${#Regular expression syntax}.
 
-##> Ignoring input sections - the `drop` statement
-[…section body removed; identical to the `drop` section added above…]
-
 ##> Regular expression syntax
 
 A regular expression ("regex") is used to define lexer patterns in `token`,
@@ -735,63 +852,7 @@ token dot /\./ <<
 default, identonly: drop /\s+/;
 ```
 
-##> Specifying parser value types - the `ptype` statement
-[…section body removed; identical to the `ptype` section added above…]
-
-##> Specifying a parser rule - the rule statement
+##> Specifying parser rules
 
 Rule statements create parser rules which define the grammar that will be
 parsed by the generated parser.
@@ -872,67 +933,6 @@ can be used to produce the parser value for the accepted rule.
 Parser rule code blocks are not allowed and not used when tree generation mode
 is active.
 
-##> Specifying the parser start rule name - the `start` statement
-[…section body removed; identical to the `start` section added above…]
-
-##> Specifying the parser module name - the `module` statement
-[…section body removed; identical to the `module` section added above…]
-
-##> Specifying the generated API prefix - the `prefix` statement
-[…section body removed; identical to the `prefix` section added above…]
-
 ##> User termination of the lexer or parser
 
 Propane supports allowing lexer or parser user code blocks to terminate