Reorganize Propane Grammar File user guide section
This commit is contained in:
parent de3fb0d120
commit f584389a29
@ -221,7 +221,7 @@ Parser rule code blocks are not available in tree generation mode.
In tree generation mode, a full parse tree is automatically constructed in
memory for user code to traverse after parsing is complete.

##> `context_user_fields` statement - adding custom fields to the context

Propane uses a context structure for lexer and parser operations.
Custom fields may be added to the context structure by using the grammar
@ -256,7 +256,273 @@ If a pointer to any allocated memory is stored in a user-defined context field,
it is up to the user to free any memory when the program is finished using the
context structure.

##> `drop` statement - ignoring input patterns

A `drop` statement specifies a lexer pattern that, when matched, causes the
matched input to be dropped and lexing to continue after the matched input.

A common use for a `drop` statement is to ignore whitespace sequences in
the user input.

Example:

```
drop /\s+/;
```

See also ${#Regular expression syntax}.

##> `free_token_node` statement - freeing user-allocated memory in token node fields

If a user lexer code block allocates memory that is stored in a token node's
`pvalue`, or if any custom token user fields store pointers to allocated
memory, the `free_token_node` grammar statement can be used to provide a code
block that frees the memory properly.

Example freeing `pvalue` (C):

```
tree;
free_token_node <<
  free(${token.pvalue});
>>
ptype int *;
token a <<
  $$ = (int *)malloc(sizeof(int));
  *$$ = 1;
>>
token b <<
  $$ = (int *)malloc(sizeof(int));
  *$$ = 2;
>>
Start -> a:a b:b;
```

Example freeing custom token user fields (C):

```
token_user_fields <<
  char * comments;
>>
on_token_node <<
  ${token.comments} = (char *)malloc(some_len);
>>
free_token_node <<
  free(${token.comments});
>>
```

The `free_token_node` statement user code block is not emitted for the D
language, since D has a garbage collector.

##> `module` statement - specifying the generated parser module name

The `module` statement can be used to specify the module name for a generated
D module.

```
module proj.parser;
```

If a module statement is not present, then the generated D module will not
contain a module statement and the default module name will be used.

##> `on_token_node` statement - custom initialization of a token tree node

The `on_token_node` statement can be used to provide code that initializes
any token user fields when a token tree node instance is created.

For example (C++):

```
context_user_fields <<
  std::string comments;
>>
token_user_fields <<
  std::string comments;
>>
on_token_node <<
  ${token.comments} = ${context.comments};
  ${context.comments} = "";
>>
drop /#(.*)\n/ <<
  /* Accumulate comments before the next parser tree node. */
  ${context.comments} += std::string((const char *)match, match_length);
>>
```

##> `prefix` statement - specifying the generated API prefix

By default the public API (types, constants, and functions) of the generated
lexer and parser uses a prefix of `p_`.

This prefix can be changed with the `prefix` statement.

Example:

```
prefix myparser_;
```

With a parser generated with this `prefix` statement, instead of calling
`p_context_new()` you would call `myparser_context_new()`.

Use the `prefix` statement when you want the generated lexer and parser to use
a prefix other than the default.

It is also useful when generating multiple lexers/parsers to be used in the
same program to avoid symbol collisions.
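
For example, when two generated parsers are linked into the same program, each
grammar file can pick its own prefix. A minimal sketch (the `json_` and `ini_`
prefix names are illustrative, not from the Propane distribution):

```
prefix json_;
```

With the second grammar using `prefix ini_;`, the program can then call
`json_context_new()` and `ini_context_new()` side by side without symbol
collisions.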
##> `ptype` statement - specifying parser value types

The `ptype` statement is used to define parser value type(s).
Example:

```
ptype void *;
```

This defines the default parser value type to be `void *` (this is, in fact,
the default parser value type if the grammar file does not specify otherwise).

Each defined lexer token type and parser rule has an associated parser value
type.
When the lexer runs, each lexed token has a parser value associated with it.
When the parser runs, each instance of a reduced rule has a parser value
associated with it.
Propane supports using different parser value types for different rules and
token types.
The example `ptype` statement above defines the default parser value type.
A parser value type name can optionally be specified following the `ptype`
keyword.
For example:

```
ptype Value;
ptype array = Value[];
ptype dict = Value[string];

Object -> lbrace rbrace << $$ = new Value(); >>

Values (array) -> Value << $$ = [$1]; >>
Values -> Values comma Value << $$ = $1 ~ [$3]; >>

KeyValue (dict) -> string colon Value << $$ = [$1: $3]; >>
```

In this example, the default parser value type is `Value`.
A parser value type named `array` is defined to mean `Value[]`.
A parser value type named `dict` is defined to mean `Value[string]`.
Any defined tokens or rules that do not specify a parser value type will have
the default parser value type associated with them.
To associate a different parser value type with a token or rule, write the
parser value type name in parentheses following the name of the token or rule.
In this example:

* a reduced `Object`'s parser value has a type of `Value`.
* a reduced `Values`'s parser value has a type of `Value[]`.
* a reduced `KeyValue`'s parser value has a type of `Value[string]`.
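
The parenthesized form applies to tokens as well as rules. A minimal sketch,
assuming a grammar targeting D (the `num` ptype name, token name, and pattern
are illustrative):

```
ptype num = int;
token integer (num) /[0-9]+/ << $$ = 0; >>
```

A lexed `integer` token's parser value then has type `int` instead of the
default parser value type.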
When tree generation mode is active, the `ptype` functionality works differently.
In this mode, only one `ptype` is used by the parser.
Lexer user code blocks may assign a parse value to the generated `Token` node
by assigning to `$$`.
The type of the parse value `$$` is given by the global `ptype` type.
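
A minimal sketch of this behavior (the token name, pattern, and `int` ptype
are illustrative):

```
tree;
ptype int;

token word /[a-z]+/ <<
  $$ = 1;
>>
Start -> word;
```

Here every `Token` node in the generated parse tree carries an `int` parse
value, set by the lexer code block's assignment to `$$`.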
##> `start` statement - specifying the parser start rule name

The start rule can be changed from the default of `Start` by using the `start`
statement.
Example:

```
start MyStartRule;
```

Multiple start rules can be specified, either with multiple `start` statements
or one `start` statement listing multiple start rules.
Example:

```
start Module ModuleItem Statement Expression;
```

When multiple start rules are specified, multiple `p_parse_*()`,
`p_result_*()`, and (in tree mode) `p_tree_delete_*()` functions are generated.
Default `p_parse()`, `p_result()`, and `p_tree_delete()` functions are
generated corresponding to the first start rule.
Additionally, each start rule causes the generation of another version of each
of these functions, for example `p_parse_Statement()`, `p_result_Statement()`,
and `p_tree_delete_Statement()`.

##> `token` statement - specifying tokens

The `token` statement allows defining a lexer token and a pattern to match that
token.
The name of the token must be specified immediately following the `token`
keyword.
A regular expression pattern may optionally follow the token name.
If a regular expression pattern is not specified, the name of the token is
taken to be the pattern.
See also: ${#Regular expression syntax}.

Example:

```
token for;
```

In this example, the token name is `for` and the pattern to match it is
`/for/`.

Example:

```
token lbrace /\{/;
```

In this example, the token name is `lbrace` and a single left curly brace will
match it.

The `token` statement can also include a user code block.
The user code block will be executed whenever the token is matched by the
lexer.

Example:

```
token if << writeln("'if' keyword lexed"); >>
```

The `token` statement is a shortcut for the combination of a `tokenid`
statement and a pattern statement.
To define a lexer token without an associated pattern to match it, use a
`tokenid` statement.
To define a lexer pattern that may or may not result in a matched token, use
a pattern statement.

##> `tokenid` statement - defining tokens without a matching pattern

The `tokenid` statement can be used to define a token without associating it
with a lexer pattern that matches it.

Example:

```
tokenid string;
```

The `tokenid` statement can be useful when defining a token that may optionally
be returned by user code associated with a pattern.

It is also useful when lexer modes and multiple lexer patterns are required to
build up a full token.
A common example is parsing a string.
See the ${#Lexer modes} chapter for more information.

##> `token_user_fields` statement - adding custom token fields

When tree generation mode is active, Propane generates a tree node structure
and a token node structure for each matching rule and token instance in the
@ -302,31 +568,7 @@ will be executed immediately before the token node is freed.
For C++, the `delete` statement is used to free the token tree node, so the
destructor for any custom token user fields will be called.

##> `tree` statement - tree generation mode

To activate tree generation mode, place the `tree` statement in your grammar file:

@ -457,115 +699,7 @@ assert(itemsmore.pItem.pItem.pItem !is null);
assert(itemsmore.pItem.pItem.pItem.pToken1 !is null);
```
##> Specifying a lexer pattern

A pattern statement is used to define a lexer pattern that can execute user
code but may not result in a matched token.
@ -580,23 +714,6 @@ This can be especially useful with ${#Lexer modes}.

See also ${#Regular expression syntax}.
##> Regular expression syntax

A regular expression ("regex") is used to define lexer patterns in `token`,
@ -735,63 +852,7 @@ token dot /\./ <<
default, identonly: drop /\s+/;
```
##> Specifying parser rules

Rule statements create parser rules which define the grammar that will be
parsed by the generated parser.
@ -872,67 +933,6 @@ can be used to produce the parser value for the accepted rule.
Parser rule code blocks are not allowed and not used when tree generation mode
is active.
##> User termination of the lexer or parser

Propane supports allowing lexer or parser user code blocks to terminate