Compare commits


39 Commits

Author SHA1 Message Date
3aced70356 Show line numbers of rules upon conflict - close #23 2024-07-14 20:52:52 -04:00
2dd89445fc Add command line switch to output warnings to stderr - close #26 2024-07-14 15:36:07 -04:00
4ae5ab79b3 Warn on shift/reduce conflicts 2024-07-13 21:35:53 -04:00
69cc8fa67d Always compute lookahead tokens for reduce rules
Even if they won't be needed for the generated parser, they'll be useful
to detect shift/reduce conflicts.
2024-07-13 21:01:44 -04:00
7f3eb8f315 Calculate follow token set for an ItemSet 2024-07-13 20:48:28 -04:00
d76e12fea1 Rename "following" to "next" - #25
The term "following" could potentially imply an association with the
"follow set", however it was used in a non-closed manner.
2024-07-08 10:14:09 -04:00
911e9505b7 Track token position in AST Token node 2024-05-27 22:10:05 -04:00
aaeb0c4db1 Remove leftover TODO from earlier restructuring 2024-05-27 20:44:42 -04:00
fd89c5c6b3 Add Vim syntax highlighting files for Propane 2024-05-26 14:49:30 -04:00
1468946735 v1.4.0 2024-05-11 11:46:28 -04:00
2bccf3303e Update CHANGELOG 2024-05-09 17:38:18 -04:00
0d1ee74ca6 Give a better error message when a referenced ptype has not been declared 2024-05-09 17:35:27 -04:00
985b180f62 Update CHANGELOG 2024-05-09 11:56:44 -04:00
f3e4941ad8 Allow rule terms to be marked as optional 2024-05-09 11:56:13 -04:00
494afb7307 Allow specifying the start rule name 2024-05-05 12:39:00 -04:00
508dabe760 Update CHANGELOG for v1.4.0 2024-05-04 21:49:13 -04:00
153f9d28f8 Allow user to specify AST node prefix or suffix
Add ast_prefix and ast_suffix grammar statements.
2024-05-04 21:49:13 -04:00
d0f542cbd7 v1.3.0 2024-04-23 00:31:56 -04:00
786c78b635 Update CHANGELOG for v1.3.0 2024-04-23 00:21:28 -04:00
f0bd8d8663 Add documentation for AST generation mode - close #22 2024-04-23 00:15:19 -04:00
c7a18ef821 Add AST node field name with no suffix when unique - #22 2024-04-22 21:50:26 -04:00
cb06a56f81 Add AST generation - #22 2024-04-22 20:51:27 -04:00
2b28ef622d Add specs to fully cover cli.rb 2024-04-06 14:37:15 -04:00
19c32b58dc Fix README example grammar 2024-04-06 14:16:27 -04:00
3a8dcac55f v1.2.0 2024-04-02 21:42:33 -04:00
632ab2fe6f Update CHANGELOG for v1.2.0 2024-04-02 21:42:18 -04:00
3eaf0d3d49 allow one line user code blocks - close #21 2024-04-02 17:44:15 -04:00
918dc7b2bb fix generator hang when state transition cycle is present - close #20 2024-04-02 14:27:08 -04:00
5b2cbe53e6 Add backslash escape codes - close #19 2024-03-29 16:45:54 -04:00
1d1590dfda Add API to access unexpected token found - close #18 2024-03-29 15:58:56 -04:00
1c91dcd298 Add token_names API - close #17 2024-03-29 15:02:01 -04:00
5dfd62b756 Add D example to user guide for p_context_init() - close #16 2024-03-29 13:52:16 -04:00
fad7f4fb36 Allow user termination from lexer code blocks - close #15 2024-03-29 13:45:08 -04:00
d55c5e0080 Update CHANGELOG for v1.1.0 2024-01-07 17:48:47 -05:00
6c847c05b1 v1.1.0 2024-01-07 17:43:06 -05:00
a5800575c8 Document generated API in user guide - close #14 2024-01-05 20:47:22 -05:00
24af3590d1 Allow user to terminate the parser - close #13 2024-01-03 22:32:10 -05:00
92c76b74c8 Update license year 2024-01-03 20:05:46 -05:00
a032ac027c Compilation warning for unreachable statement - close #12 2023-10-21 16:04:15 -04:00
43 changed files with 2586 additions and 430 deletions

CHANGELOG

@@ -1,3 +1,49 @@
+## v1.5.0
+### New Features
+- Track token position in AST Token node
+## v1.4.0
+### New Features
+- Allow user to specify AST node name prefix or suffix
+- Allow specifying the start rule name
+- Allow rule terms to be marked as optional
+### Improvements
+- Give a better error message when a referenced ptype has not been declared
+## v1.3.0
+### New Features
+- Add AST generation (#22)
+## v1.2.0
+### New Features
+- Allow one line user code blocks (#21)
+- Add backslash escape codes (#19)
+- Add API to access unexpected token found (#18)
+- Add token_names API (#17)
+- Add D example to user guide for p_context_init() (#16)
+- Allow user termination from lexer code blocks (#15)
+### Fixes
+- Fix generator hang when state transition cycle is present (#20)
+## v1.1.0
+### New Features
+- Add user parser terminations (#13)
+- Document generated parser API in user guide (#14)
 ## v1.0.0
 - Initial release

LICENSE

@@ -1,6 +1,6 @@
 The MIT License (MIT)
-Copyright (c) 2010-2023 Josh Holtrop
+Copyright (c) 2010-2024 Josh Holtrop
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

README

@@ -6,7 +6,8 @@ Propane is a LALR Parser Generator (LPG) which:
 * generates a built-in lexer to tokenize input
 * supports UTF-8 lexer inputs
 * generates a table-driven shift/reduce parser to parse input in linear time
-* target C or D language outputs
+* targets C or D language outputs
+* optionally supports automatic full AST generation
 * is MIT-licensed
 * is distributable as a standalone Ruby script
@@ -17,6 +18,7 @@ can be copied into and versioned in a project's source tree.
 The only requirement to run Propane is that the system has a Ruby interpreter
 installed.
 The latest release can be downloaded from [https://github.com/holtrop/propane/releases](https://github.com/holtrop/propane/releases).
 Simply copy the `propane` executable script into the desired location within
 the project to be built (typically the root of the repository) and mark it
 executable.
@@ -55,10 +57,10 @@ import std.math;
 ptype ulong;
 # A few basic arithmetic operators.
-token plus /\\+/;
-token times /\\*/;
-token power /\\*\\*/;
-token integer /\\d+/ <<
+token plus /\+/;
+token times /\*/;
+token power /\*\*/;
+token integer /\d+/ <<
 ulong v;
 foreach (c; match)
 {
@@ -67,38 +69,22 @@ token integer /\\d+/ <<
 }
 $$ = v;
 >>
-token lparen /\\(/;
-token rparen /\\)/;
+token lparen /\(/;
+token rparen /\)/;
 # Drop whitespace.
-drop /\\s+/;
+drop /\s+/;
-Start -> E1 <<
-$$ = $1;
->>
-E1 -> E2 <<
-$$ = $1;
->>
-E1 -> E1 plus E2 <<
-$$ = $1 + $3;
->>
-E2 -> E3 <<
-$$ = $1;
->>
-E2 -> E2 times E3 <<
-$$ = $1 * $3;
->>
-E3 -> E4 <<
-$$ = $1;
->>
+Start -> E1 << $$ = $1; >>
+E1 -> E2 << $$ = $1; >>
+E1 -> E1 plus E2 << $$ = $1 + $3; >>
+E2 -> E3 << $$ = $1; >>
+E2 -> E2 times E3 << $$ = $1 * $3; >>
+E3 -> E4 << $$ = $1; >>
 E3 -> E3 power E4 <<
 $$ = pow($1, $3);
 >>
-E4 -> integer <<
-$$ = $1;
->>
-E4 -> lparen E1 rparen <<
-$$ = $2;
->>
+E4 -> integer << $$ = $1; >>
+E4 -> lparen E1 rparen << $$ = $2; >>
 ```
 Grammar files can contain comment lines beginning with `#` which are ignored.

Generated C parser template

@@ -3,6 +3,17 @@
 #include <stdlib.h>
 #include <string.h>
+/**************************************************************************
+ * Public data
+ *************************************************************************/
+/** Token names. */
+const char * <%= @grammar.prefix %>token_names[] = {
+<% @grammar.tokens.each_with_index do |token, index| %>
+"<%= token.name %>",
+<% end %>
+};
 /**************************************************************************
  * User code blocks
  *************************************************************************/
@@ -21,6 +32,7 @@
 #define P_UNEXPECTED_TOKEN 3u
 #define P_DROP 4u
 #define P_EOF 5u
+#define P_USER_TERMINATED 6u
 <% end %>
 /* An invalid ID value. */
@@ -308,9 +320,12 @@ static lexer_state_id_t check_lexer_transition(uint32_t current_state, uint32_t
 *
 * @param context
 * Lexer/parser context structure.
-* @param[out] out_token_info
-* The lexed token information is stored here if the return value is
-* P_SUCCESS.
+* @param[out] out_match_info
+* The longest match information is stored here if the return value is
+* P_SUCCESS or P_DECODE_ERROR.
+* @param[out] out_unexpected_input_length
+* The unexpected input length is stored here if the return value is
+* P_UNEXPECTED_INPUT.
 *
 * @reval P_SUCCESS
 * A token was successfully lexed.
@@ -390,7 +405,6 @@ static size_t find_longest_match(<%= @grammar.prefix %>context_t * context,
 /* Valid EOF return. */
 return P_EOF;
 }
-break;
 case P_DECODE_ERROR:
 /* If we see a decode error, we may be partially in the middle of
@@ -422,13 +436,14 @@
 * Input text does not match any lexer pattern.
 * @retval P_DROP
 * A drop pattern was matched so the lexer should continue.
+* @retval P_USER_TERMINATED
+* User code has requested to terminate the lexer.
 */
 static size_t attempt_lex_token(<%= @grammar.prefix %>context_t * context, <%= @grammar.prefix %>token_info_t * out_token_info)
 {
 <%= @grammar.prefix %>token_info_t token_info = {0};
 token_info.position = context->text_position;
 token_info.token = INVALID_TOKEN_ID;
-*out_token_info = token_info; // TODO: remove
 lexer_match_info_t match_info;
 size_t unexpected_input_length;
 size_t result = find_longest_match(context, &match_info, &unexpected_input_length);
@@ -441,6 +456,12 @@ static size_t attempt_lex_token(<%= @grammar.prefix %>context_t * context, <%= @
 uint8_t const * match = &context->input[context->input_index];
 <%= @grammar.prefix %>token_t user_code_token = lexer_user_code(context,
 match_info.accepting_state->code_id, match, match_info.length, &token_info);
+/* A TERMINATE_TOKEN_ID return code from lexer_user_code() means
+ * that the user code is requesting to terminate the lexer. */
+if (user_code_token == TERMINATE_TOKEN_ID)
+{
+return P_USER_TERMINATED;
+}
 /* An invalid token returned from lexer_user_code() means that the
 * user code did not explicitly return a token. So only override
 * the token to return if the user code does explicitly return a
@@ -511,6 +532,8 @@ static size_t attempt_lex_token(<%= @grammar.prefix %>context_t * context, <%= @
 * The decoder encountered invalid text encoding.
 * @reval P_UNEXPECTED_INPUT
 * Input text does not match any lexer pattern.
+* @retval P_USER_TERMINATED
+* User code has requested to terminate the lexer.
 */
 size_t <%= @grammar.prefix %>lex(<%= @grammar.prefix %>context_t * context, <%= @grammar.prefix %>token_info_t * out_token_info)
 {
@@ -587,6 +610,25 @@ typedef struct
 * reduce action.
 */
 parser_state_id_t n_states;
+<% if @grammar.ast %>
+/**
+ * Map of rule components to rule set child fields.
+ */
+uint16_t const * rule_set_node_field_index_map;
+/**
+ * Number of rule set AST node fields.
+ */
+uint16_t rule_set_node_field_array_size;
+/**
+ * Whether this rule was a generated optional rule that matched the
+ * optional target. In this case, propagate the matched target node up
+ * instead of making a new node for this rule.
+ */
+bool propagate_optional_target;
+<% end %>
 } reduce_t;
 /** Parser state entry. */
@@ -617,19 +659,42 @@ typedef struct
 /** Parser value from this state. */
 <%= @grammar.prefix %>value_t pvalue;
+<% if @grammar.ast %>
+/** AST node. */
+void * ast_node;
+<% end %>
 } state_value_t;
 /** Parser shift table. */
 static const shift_t parser_shift_table[] = {
 <% @parser.shift_table.each do |shift| %>
-{<%= shift[:symbol_id] %>u, <%= shift[:state_id] %>u},
+{<%= shift[:symbol].id %>u, <%= shift[:state_id] %>u},
 <% end %>
 };
+<% if @grammar.ast %>
+<% @grammar.rules.each do |rule| %>
+<% unless rule.flat_rule_set_node_field_index_map? %>
+const uint16_t r_<%= rule.name.gsub("$", "_") %><%= rule.id %>_node_field_index_map[<%= rule.rule_set_node_field_index_map.size %>] = {<%= rule.rule_set_node_field_index_map.map {|v| v.to_s}.join(", ") %>};
+<% end %>
+<% end %>
+<% end %>
 /** Parser reduce table. */
 static const reduce_t parser_reduce_table[] = {
 <% @parser.reduce_table.each do |reduce| %>
-{<%= reduce[:token_id] %>u, <%= reduce[:rule_id] %>u, <%= reduce[:rule_set_id] %>u, <%= reduce[:n_states] %>u},
+{<%= reduce[:token_id] %>u, <%= reduce[:rule_id] %>u, <%= reduce[:rule_set_id] %>u, <%= reduce[:n_states] %>u
+<% if @grammar.ast %>
+<% if reduce[:rule].flat_rule_set_node_field_index_map? %>
+, NULL
+<% else %>
+, &r_<%= reduce[:rule].name.gsub("$", "_") %><%= reduce[:rule].id %>_node_field_index_map[0]
+<% end %>
+, <%= reduce[:rule].rule_set.ast_fields.size %>
+, <%= reduce[:propagate_optional_target] %>
+<% end %>
+},
 <% end %>
 };
@@ -733,17 +798,19 @@ static void state_values_stack_free(state_values_stack_t * stack)
 free(stack->entries);
 }
+<% unless @grammar.ast %>
 /**
 * Execute user code associated with a parser rule.
 *
 * @param rule The ID of the rule.
 *
-* @return Parse value.
+* @retval P_SUCCESS
+* Continue parsing.
+* @retval P_USER_TERMINATED
+* User requested to terminate parsing.
 */
-static <%= @grammar.prefix %>value_t parser_user_code(uint32_t rule, state_values_stack_t * statevalues, uint32_t n_states)
+static size_t parser_user_code(<%= @grammar.prefix %>value_t * _pvalue, uint32_t rule, state_values_stack_t * statevalues, uint32_t n_states, <%= @grammar.prefix %>context_t * context)
 {
-<%= @grammar.prefix %>value_t _pvalue = {0};
 switch (rule)
 {
 <% @grammar.rules.each do |rule| %>
@@ -756,8 +823,9 @@ static <%= @grammar.prefix %>value_t parser_user_code(uint32_t rule, state_value
 default: break;
 }
-return _pvalue;
+return P_SUCCESS;
 }
+<% end %>
 /**
 * Check if the parser should shift to a new state.
@@ -819,7 +887,7 @@ static size_t check_reduce(size_t state_id, <%= @grammar.prefix %>token_t token)
 * can be accessed with <%= @grammar.prefix %>result().
 * @retval P_UNEXPECTED_TOKEN
 * An unexpected token was encountered that does not match any grammar rule.
-* The value context->token holds the unexpected token.
+* The function p_token(&context) can be used to get the unexpected token.
 * @reval P_DECODE_ERROR
 * The decoder encountered invalid text encoding.
 * @reval P_UNEXPECTED_INPUT
@@ -831,7 +899,11 @@ size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * context)
 <%= @grammar.prefix %>token_t token = INVALID_TOKEN_ID;
 state_values_stack_t statevalues;
 size_t reduced_rule_set = INVALID_ID;
+<% if @grammar.ast %>
+void * reduced_parser_node;
+<% else %>
 <%= @grammar.prefix %>value_t reduced_parser_value;
+<% end %>
 state_values_stack_init(&statevalues);
 state_values_stack_push(&statevalues);
 size_t result;
@@ -858,7 +930,11 @@ size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * context)
 if ((shift_state != INVALID_ID) && (token == TOKEN___EOF))
 {
 /* Successful parse. */
+<% if @grammar.ast %>
+context->parse_result = (<%= @grammar.ast_prefix %><%= @grammar.start_rule %><%= @grammar.ast_suffix %> *)state_values_stack_index(&statevalues, -1)->ast_node;
+<% else %>
 context->parse_result = state_values_stack_index(&statevalues, -1)->pvalue;
+<% end %>
 result = P_SUCCESS;
 break;
 }
@@ -871,15 +947,27 @@ size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * context)
 if (reduced_rule_set == INVALID_ID)
 {
 /* We shifted a token, mark it consumed. */
-token = INVALID_TOKEN_ID;
+<% if @grammar.ast %>
+<%= @grammar.ast_prefix %>Token<%= @grammar.ast_suffix %> * token_ast_node = malloc(sizeof(<%= @grammar.ast_prefix %>Token<%= @grammar.ast_suffix %>));
+token_ast_node->token = token;
+token_ast_node->pvalue = token_info.pvalue;
+token_ast_node->position = token_info.position;
+state_values_stack_index(&statevalues, -1)->ast_node = token_ast_node;
+<% else %>
 state_values_stack_index(&statevalues, -1)->pvalue = token_info.pvalue;
+<% end %>
+token = INVALID_TOKEN_ID;
 }
 else
 {
 /* We shifted a RuleSet. */
+<% if @grammar.ast %>
+state_values_stack_index(&statevalues, -1)->ast_node = reduced_parser_node;
+<% else %>
 state_values_stack_index(&statevalues, -1)->pvalue = reduced_parser_value;
 <%= @grammar.prefix %>value_t new_parse_result = {0};
 reduced_parser_value = new_parse_result;
+<% end %>
 reduced_rule_set = INVALID_ID;
 }
 continue;
@@ -889,7 +977,42 @@ size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * context)
 if (reduce_index != INVALID_ID)
 {
 /* We have something to reduce. */
-reduced_parser_value = parser_user_code(parser_reduce_table[reduce_index].rule, &statevalues, parser_reduce_table[reduce_index].n_states);
+<% if @grammar.ast %>
+if (parser_reduce_table[reduce_index].propagate_optional_target)
+{
+reduced_parser_node = state_values_stack_index(&statevalues, -1)->ast_node;
+}
+else if (parser_reduce_table[reduce_index].n_states > 0)
+{
+void ** node_fields = calloc(parser_reduce_table[reduce_index].rule_set_node_field_array_size, sizeof(void *));
+if (parser_reduce_table[reduce_index].rule_set_node_field_index_map == NULL)
+{
+for (size_t i = 0; i < parser_reduce_table[reduce_index].n_states; i++)
+{
+node_fields[i] = state_values_stack_index(&statevalues, -(int)parser_reduce_table[reduce_index].n_states + (int)i)->ast_node;
+}
+}
+else
+{
+for (size_t i = 0; i < parser_reduce_table[reduce_index].n_states; i++)
+{
+node_fields[parser_reduce_table[reduce_index].rule_set_node_field_index_map[i]] = state_values_stack_index(&statevalues, -(int)parser_reduce_table[reduce_index].n_states + (int)i)->ast_node;
+}
+}
+reduced_parser_node = node_fields;
+}
+else
+{
+reduced_parser_node = NULL;
+}
+<% else %>
+<%= @grammar.prefix %>value_t reduced_parser_value2 = {0};
+if (parser_user_code(&reduced_parser_value2, parser_reduce_table[reduce_index].rule, &statevalues, parser_reduce_table[reduce_index].n_states, context) == P_USER_TERMINATED)
+{
+return P_USER_TERMINATED;
+}
+reduced_parser_value = reduced_parser_value2;
+<% end %>
 reduced_rule_set = parser_reduce_table[reduce_index].rule_set;
 state_values_stack_pop(&statevalues, parser_reduce_table[reduce_index].n_states);
 continue;
@@ -917,9 +1040,17 @@ size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * context)
 *
 * @return Parse result value.
 */
+<% if @grammar.ast %>
+<%= @grammar.ast_prefix %><%= @grammar.start_rule %><%= @grammar.ast_suffix %> * <%= @grammar.prefix %>result(<%= @grammar.prefix %>context_t * context)
+<% else %>
 <%= start_rule_type[1] %> <%= @grammar.prefix %>result(<%= @grammar.prefix %>context_t * context)
+<% end %>
 {
+<% if @grammar.ast %>
+return context->parse_result;
+<% else %>
 return context->parse_result.v_<%= start_rule_type[0] %>;
+<% end %>
 }
 /**
@@ -934,3 +1065,26 @@ size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * context)
 {
 return context->text_position;
 }
+/**
+ * Get the user terminate code.
+ *
+ * @param context
+ * Lexer/parser context structure.
+ *
+ * @return User terminate code.
+ */
+size_t <%= @grammar.prefix %>user_terminate_code(<%= @grammar.prefix %>context_t * context)
+{
+return context->user_terminate_code;
+}
+/**
+ * Get the parse token.
+ *
+ * @return Parse token.
+ */
+<%= @grammar.prefix %>token_t <%= @grammar.prefix %>token(<%= @grammar.prefix %>context_t * context)
+{
+return context->token;
+}

Generated D parser template

@ -27,10 +27,11 @@ public enum : size_t
<%= @grammar.prefix.upcase %>UNEXPECTED_TOKEN, <%= @grammar.prefix.upcase %>UNEXPECTED_TOKEN,
<%= @grammar.prefix.upcase %>DROP, <%= @grammar.prefix.upcase %>DROP,
<%= @grammar.prefix.upcase %>EOF, <%= @grammar.prefix.upcase %>EOF,
<%= @grammar.prefix.upcase %>USER_TERMINATED,
} }
/** Token type. */ /** Token type. */
public alias <%= @grammar.prefix %>token_t = <%= get_type_for(@grammar.invalid_token_id) %>; public alias <%= @grammar.prefix %>token_t = <%= get_type_for(@grammar.terminate_token_id) %>;
/** Token IDs. */ /** Token IDs. */
public enum : <%= @grammar.prefix %>token_t public enum : <%= @grammar.prefix %>token_t
@ -42,21 +43,14 @@ public enum : <%= @grammar.prefix %>token_t
<% end %> <% end %>
<% end %> <% end %>
INVALID_TOKEN_ID = <%= @grammar.invalid_token_id %>, INVALID_TOKEN_ID = <%= @grammar.invalid_token_id %>,
TERMINATE_TOKEN_ID = <%= @grammar.terminate_token_id %>,
} }
/** Code point type. */ /** Code point type. */
public alias <%= @grammar.prefix %>code_point_t = uint; public alias <%= @grammar.prefix %>code_point_t = uint;
/** Parser values type(s). */
public union <%= @grammar.prefix %>value_t
{
<% @grammar.ptypes.each do |name, typestring| %>
<%= typestring %> v_<%= name %>;
<% end %>
}
/** /**
* A structure to keep track of parser position. * A structure to keep track of input position.
* *
* This is useful for reporting errors, etc... * This is useful for reporting errors, etc...
*/ */
@ -69,6 +63,47 @@ public struct <%= @grammar.prefix %>position_t
uint col; uint col;
} }
<% if @grammar.ast %>
/** Parser values type. */
public alias <%= @grammar.prefix %>value_t = <%= @grammar.ptype %>;
<% else %>
/** Parser values type(s). */
public union <%= @grammar.prefix %>value_t
{
<% @grammar.ptypes.each do |name, typestring| %>
<%= typestring %> v_<%= name %>;
<% end %>
}
<% end %>
<% if @grammar.ast %>
/** AST node types. @{ */
public struct <%= @grammar.ast_prefix %>Token<%= @grammar.ast_suffix %>
{
<%= @grammar.prefix %>token_t token;
<%= @grammar.prefix %>value_t pvalue;
<%= @grammar.prefix %>position_t position;
}
<% @parser.rule_sets.each do |name, rule_set| %>
<% next if name.start_with?("$") %>
<% next if rule_set.optional? %>
public struct <%= @grammar.ast_prefix %><%= name %><%= @grammar.ast_suffix %>
{
<% rule_set.ast_fields.each do |fields| %>
union
{
<% fields.each do |field_name, type| %>
<%= type %> * <%= field_name %>;
<% end %>
}
<% end %>
}
<% end %>
/** @} */
<% end %>
/** Lexed token information. */ /** Lexed token information. */
public struct <%= @grammar.prefix %>token_info_t public struct <%= @grammar.prefix %>token_info_t
{ {
@ -110,10 +145,17 @@ public struct <%= @grammar.prefix %>context_t
/* Parser context data. */ /* Parser context data. */
/** Parse result value. */ /** Parse result value. */
<% if @grammar.ast %>
<%= @grammar.ast_prefix %><%= @grammar.start_rule %><%= @grammar.ast_suffix %> * parse_result;
<% else %>
<%= @grammar.prefix %>value_t parse_result; <%= @grammar.prefix %>value_t parse_result;
<% end %>
/** Unexpected token received. */ /** Unexpected token received. */
<%= @grammar.prefix %>token_t token; <%= @grammar.prefix %>token_t token;
/** User terminate code. */
size_t user_terminate_code;
} }
/************************************************************************** /**************************************************************************
@ -141,6 +183,7 @@ private enum : size_t
P_UNEXPECTED_TOKEN, P_UNEXPECTED_TOKEN,
P_DROP, P_DROP,
P_EOF, P_EOF,
P_USER_TERMINATED,
} }
<% end %> <% end %>
@ -422,9 +465,12 @@ private lexer_state_id_t check_lexer_transition(uint current_state, uint code_po
* *
* @param context * @param context
* Lexer/parser context structure. * Lexer/parser context structure.
* @param[out] out_token_info * @param[out] out_match_info
* The lexed token information is stored here if the return value is * The longest match information is stored here if the return value is
* P_SUCCESS. * P_SUCCESS or P_DECODE_ERROR.
* @param[out] out_unexpected_input_length
* The unexpected input length is stored here if the return value is
* P_UNEXPECTED_INPUT.
* *
* @reval P_SUCCESS * @reval P_SUCCESS
* A token was successfully lexed. * A token was successfully lexed.
@ -502,7 +548,6 @@ private size_t find_longest_match(<%= @grammar.prefix %>context_t * context,
/* Valid EOF return. */ /* Valid EOF return. */
return P_EOF; return P_EOF;
} }
break;
case P_DECODE_ERROR: case P_DECODE_ERROR:
/* If we see a decode error, we may be partially in the middle of /* If we see a decode error, we may be partially in the middle of
@ -534,13 +579,14 @@ private size_t find_longest_match(<%= @grammar.prefix %>context_t * context,
* Input text does not match any lexer pattern. * Input text does not match any lexer pattern.
* @retval P_DROP * @retval P_DROP
* A drop pattern was matched so the lexer should continue. * A drop pattern was matched so the lexer should continue.
* @retval P_USER_TERMINATED
* User code has requested to terminate the lexer.
*/ */
private size_t attempt_lex_token(<%= @grammar.prefix %>context_t * context, <%= @grammar.prefix %>token_info_t * out_token_info) private size_t attempt_lex_token(<%= @grammar.prefix %>context_t * context, <%= @grammar.prefix %>token_info_t * out_token_info)
{ {
<%= @grammar.prefix %>token_info_t token_info; <%= @grammar.prefix %>token_info_t token_info;
token_info.position = context.text_position; token_info.position = context.text_position;
token_info.token = INVALID_TOKEN_ID; token_info.token = INVALID_TOKEN_ID;
*out_token_info = token_info; // TODO: remove
lexer_match_info_t match_info; lexer_match_info_t match_info;
size_t unexpected_input_length; size_t unexpected_input_length;
size_t result = find_longest_match(context, &match_info, &unexpected_input_length); size_t result = find_longest_match(context, &match_info, &unexpected_input_length);
@ -553,6 +599,12 @@ private size_t attempt_lex_token(<%= @grammar.prefix %>context_t * context, <%=
string match = context.input[context.input_index..(context.input_index + match_info.length)]; string match = context.input[context.input_index..(context.input_index + match_info.length)];
<%= @grammar.prefix %>token_t user_code_token = lexer_user_code(context, <%= @grammar.prefix %>token_t user_code_token = lexer_user_code(context,
match_info.accepting_state.code_id, match, &token_info); match_info.accepting_state.code_id, match, &token_info);
/* A TERMINATE_TOKEN_ID return code from lexer_user_code() means
* that the user code is requesting to terminate the lexer. */
if (user_code_token == TERMINATE_TOKEN_ID)
{
return P_USER_TERMINATED;
}
/* An invalid token returned from lexer_user_code() means that the /* An invalid token returned from lexer_user_code() means that the
* user code did not explicitly return a token. So only override * user code did not explicitly return a token. So only override
* the token to return if the user code does explicitly return a * the token to return if the user code does explicitly return a
@ -623,6 +675,8 @@ private size_t attempt_lex_token(<%= @grammar.prefix %>context_t * context, <%=
* The decoder encountered invalid text encoding. * The decoder encountered invalid text encoding.
* @reval P_UNEXPECTED_INPUT * @reval P_UNEXPECTED_INPUT
* Input text does not match any lexer pattern. * Input text does not match any lexer pattern.
* @retval P_USER_TERMINATED
* User code has requested to terminate the lexer.
*/ */
public size_t <%= @grammar.prefix %>lex(<%= @grammar.prefix %>context_t * context, <%= @grammar.prefix %>token_info_t * out_token_info) public size_t <%= @grammar.prefix %>lex(<%= @grammar.prefix %>context_t * context, <%= @grammar.prefix %>token_info_t * out_token_info)
{ {
@ -699,6 +753,25 @@ private struct reduce_t
* reduce action. * reduce action.
*/ */
parser_state_id_t n_states; parser_state_id_t n_states;
<% if @grammar.ast %>
/**
* Map of rule components to rule set child fields.
*/
immutable(ushort) * rule_set_node_field_index_map;
/**
* Number of rule set AST node fields.
*/
ushort rule_set_node_field_array_size;
/**
* Whether this rule was a generated optional rule that matched the
* optional target. In this case, propagate the matched target node up
* instead of making a new node for this rule.
*/
bool propagate_optional_target;
<% end %>
} }
/** Parser state entry. */ /** Parser state entry. */
@ -730,6 +803,11 @@ private struct state_value_t
/** Parser value from this state. */ /** Parser value from this state. */
<%= @grammar.prefix %>value_t pvalue; <%= @grammar.prefix %>value_t pvalue;
<% if @grammar.ast %>
/** AST node. */
void * ast_node;
<% end %>
this(size_t state_id)
{
this.state_id = state_id;
@@ -739,14 +817,32 @@ private struct state_value_t
/** Parser shift table. */
private immutable shift_t[] parser_shift_table = [
<% @parser.shift_table.each do |shift| %>
shift_t(<%= shift[:symbol].id %>u, <%= shift[:state_id] %>u),
<% end %>
];
<% if @grammar.ast %>
<% @grammar.rules.each do |rule| %>
<% unless rule.flat_rule_set_node_field_index_map? %>
immutable ushort[<%= rule.rule_set_node_field_index_map.size %>] r_<%= rule.name.gsub("$", "_") %><%= rule.id %>_node_field_index_map = [<%= rule.rule_set_node_field_index_map.map {|v| v.to_s}.join(", ") %>];
<% end %>
<% end %>
<% end %>
/** Parser reduce table. */
private immutable reduce_t[] parser_reduce_table = [
<% @parser.reduce_table.each do |reduce| %>
reduce_t(<%= reduce[:token_id] %>u, <%= reduce[:rule_id] %>u, <%= reduce[:rule_set_id] %>u, <%= reduce[:n_states] %>u
<% if @grammar.ast %>
<% if reduce[:rule].flat_rule_set_node_field_index_map? %>
, null
<% else %>
, &r_<%= reduce[:rule].name.gsub("$", "_") %><%= reduce[:rule].id %>_node_field_index_map[0]
<% end %>
, <%= reduce[:rule].rule_set.ast_fields.size %>
, <%= reduce[:propagate_optional_target] %>
<% end %>
),
<% end %>
];
@@ -757,17 +853,19 @@ private immutable parser_state_t[] parser_state_table = [
<% end %>
];
<% unless @grammar.ast %>
/**
 * Execute user code associated with a parser rule.
 *
 * @param rule The ID of the rule.
 *
 * @retval P_SUCCESS
 * Continue parsing.
 * @retval P_USER_TERMINATED
 * User requested to terminate parsing.
 */
private size_t parser_user_code(<%= @grammar.prefix %>value_t * _pvalue, uint rule, state_value_t[] statevalues, uint n_states, <%= @grammar.prefix %>context_t * context)
{
switch (rule)
{
<% @grammar.rules.each do |rule| %>
@@ -780,8 +878,9 @@ private <%= @grammar.prefix %>value_t parser_user_code(uint rule, state_value_t[
default: break;
}
return P_SUCCESS;
}
<% end %>
/**
 * Check if the parser should shift to a new state.
@@ -843,7 +942,7 @@ private size_t check_reduce(size_t state_id, <%= @grammar.prefix %>token_t token
 * can be accessed with <%= @grammar.prefix %>result().
 * @retval P_UNEXPECTED_TOKEN
 * An unexpected token was encountered that does not match any grammar rule.
 * The function p_token(&context) can be used to get the unexpected token.
 * @retval P_DECODE_ERROR
 * The decoder encountered invalid text encoding.
 * @retval P_UNEXPECTED_INPUT
@@ -855,7 +954,11 @@ public size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * cont
<%= @grammar.prefix %>token_t token = INVALID_TOKEN_ID;
state_value_t[] statevalues = new state_value_t[](1);
size_t reduced_rule_set = INVALID_ID;
<% if @grammar.ast %>
void * reduced_parser_node;
<% else %>
<%= @grammar.prefix %>value_t reduced_parser_value;
<% end %>
for (;;)
{
if (token == INVALID_TOKEN_ID)
@@ -878,7 +981,11 @@ public size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * cont
if ((shift_state != INVALID_ID) && (token == TOKEN___EOF))
{
/* Successful parse. */
<% if @grammar.ast %>
context.parse_result = cast(<%= @grammar.ast_prefix %><%= @grammar.start_rule %><%= @grammar.ast_suffix %> *)statevalues[$-1].ast_node;
<% else %>
context.parse_result = statevalues[$-1].pvalue;
<% end %>
return P_SUCCESS;
}
}
@@ -889,15 +996,24 @@ public size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * cont
if (reduced_rule_set == INVALID_ID)
{
/* We shifted a token, mark it consumed. */
<% if @grammar.ast %>
<%= @grammar.ast_prefix %>Token<%= @grammar.ast_suffix %> * token_ast_node = new <%= @grammar.ast_prefix %>Token<%= @grammar.ast_suffix %>(token, token_info.pvalue, token_info.position);
statevalues[$-1].ast_node = token_ast_node;
<% else %>
statevalues[$-1].pvalue = token_info.pvalue;
<% end %>
token = INVALID_TOKEN_ID;
}
else
{
/* We shifted a RuleSet. */
<% if @grammar.ast %>
statevalues[$-1].ast_node = reduced_parser_node;
<% else %>
statevalues[$-1].pvalue = reduced_parser_value;
<%= @grammar.prefix %>value_t new_parse_result;
reduced_parser_value = new_parse_result;
<% end %>
reduced_rule_set = INVALID_ID;
}
continue;
@@ -907,7 +1023,46 @@ public size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * cont
if (reduce_index != INVALID_ID)
{
/* We have something to reduce. */
<% if @grammar.ast %>
if (parser_reduce_table[reduce_index].propagate_optional_target)
{
reduced_parser_node = statevalues[$ - 1].ast_node;
}
else if (parser_reduce_table[reduce_index].n_states > 0)
{
void *[] node_fields = new void *[parser_reduce_table[reduce_index].rule_set_node_field_array_size];
foreach (i; 0..parser_reduce_table[reduce_index].rule_set_node_field_array_size)
{
node_fields[i] = null;
}
if (parser_reduce_table[reduce_index].rule_set_node_field_index_map is null)
{
foreach (i; 0..parser_reduce_table[reduce_index].n_states)
{
node_fields[i] = statevalues[$ - parser_reduce_table[reduce_index].n_states + i].ast_node;
}
}
else
{
foreach (i; 0..parser_reduce_table[reduce_index].n_states)
{
node_fields[parser_reduce_table[reduce_index].rule_set_node_field_index_map[i]] = statevalues[$ - parser_reduce_table[reduce_index].n_states + i].ast_node;
}
}
reduced_parser_node = node_fields.ptr;
}
else
{
reduced_parser_node = null;
}
<% else %>
<%= @grammar.prefix %>value_t reduced_parser_value2;
if (parser_user_code(&reduced_parser_value2, parser_reduce_table[reduce_index].rule, statevalues, parser_reduce_table[reduce_index].n_states, context) == P_USER_TERMINATED)
{
return P_USER_TERMINATED;
}
reduced_parser_value = reduced_parser_value2;
<% end %>
reduced_rule_set = parser_reduce_table[reduce_index].rule_set;
statevalues.length -= parser_reduce_table[reduce_index].n_states;
continue;
@@ -932,9 +1087,17 @@ public size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * cont
 *
 * @return Parse result value.
 */
<% if @grammar.ast %>
public <%= @grammar.ast_prefix %><%= @grammar.start_rule %><%= @grammar.ast_suffix %> * <%= @grammar.prefix %>result(<%= @grammar.prefix %>context_t * context)
<% else %>
public <%= start_rule_type[1] %> <%= @grammar.prefix %>result(<%= @grammar.prefix %>context_t * context)
<% end %>
{ {
<% if @grammar.ast %>
return context.parse_result;
<% else %>
return context.parse_result.v_<%= start_rule_type[0] %>;
<% end %>
}
/**
@@ -949,3 +1112,26 @@ public <%= @grammar.prefix %>position_t <%= @grammar.prefix %>position(<%= @gram
{
return context.text_position;
}
/**
* Get the user terminate code.
*
* @param context
* Lexer/parser context structure.
*
* @return User terminate code.
*/
public size_t <%= @grammar.prefix %>user_terminate_code(<%= @grammar.prefix %>context_t * context)
{
return context.user_terminate_code;
}
/**
* Get the parse token.
*
* @return Parse token.
*/
public <%= @grammar.prefix %>token_t <%= @grammar.prefix %>token(<%= @grammar.prefix %>context_t * context)
{
return context.token;
}
@@ -20,9 +20,10 @@
#define <%= @grammar.prefix.upcase %>UNEXPECTED_TOKEN 3u
#define <%= @grammar.prefix.upcase %>DROP 4u
#define <%= @grammar.prefix.upcase %>EOF 5u
#define <%= @grammar.prefix.upcase %>USER_TERMINATED 6u
/** Token type. */
typedef <%= get_type_for(@grammar.terminate_token_id) %> <%= @grammar.prefix %>token_t;
/** Token IDs. */
<% @grammar.tokens.each_with_index do |token, index| %>
@@ -32,23 +33,13 @@ typedef <%= get_type_for(@grammar.invalid_token_id) %> <%= @grammar.prefix %>tok
<% end %>
<% end %>
#define INVALID_TOKEN_ID <%= @grammar.invalid_token_id %>u
#define TERMINATE_TOKEN_ID <%= @grammar.terminate_token_id %>u
/** Code point type. */
typedef uint32_t <%= @grammar.prefix %>code_point_t;
/**
 * A structure to keep track of input position.
 *
 * This is useful for reporting errors, etc...
 */
@@ -61,6 +52,56 @@ typedef struct
uint32_t col;
} <%= @grammar.prefix %>position_t;
/** User header code blocks. */
<%= @grammar.code_blocks.fetch("header", "") %>
<% if @grammar.ast %>
/** Parser values type. */
typedef <%= @grammar.ptype %> <%= @grammar.prefix %>value_t;
<% else %>
/** Parser values type(s). */
typedef union
{
<% @grammar.ptypes.each do |name, typestring| %>
<%= typestring %> v_<%= name %>;
<% end %>
} <%= @grammar.prefix %>value_t;
<% end %>
<% if @grammar.ast %>
/** AST node types. @{ */
typedef struct <%= @grammar.ast_prefix %>Token<%= @grammar.ast_suffix %>
{
<%= @grammar.prefix %>token_t token;
<%= @grammar.prefix %>value_t pvalue;
<%= @grammar.prefix %>position_t position;
} <%= @grammar.ast_prefix %>Token<%= @grammar.ast_suffix %>;
<% @parser.rule_sets.each do |name, rule_set| %>
<% next if name.start_with?("$") %>
<% next if rule_set.optional? %>
struct <%= name %>;
<% end %>
<% @parser.rule_sets.each do |name, rule_set| %>
<% next if name.start_with?("$") %>
<% next if rule_set.optional? %>
typedef struct <%= @grammar.ast_prefix %><%= name %><%= @grammar.ast_suffix %>
{
<% rule_set.ast_fields.each do |fields| %>
union
{
<% fields.each do |field_name, type| %>
struct <%= type %> * <%= field_name %>;
<% end %>
};
<% end %>
} <%= @grammar.ast_prefix %><%= name %><%= @grammar.ast_suffix %>;
<% end %>
/** @} */
<% end %>
/** Lexed token information. */
typedef struct
{
@@ -105,12 +146,26 @@ typedef struct
/* Parser context data. */
/** Parse result value. */
<% if @grammar.ast %>
<%= @grammar.ast_prefix %><%= @grammar.start_rule %><%= @grammar.ast_suffix %> * parse_result;
<% else %>
<%= @grammar.prefix %>value_t parse_result; <%= @grammar.prefix %>value_t parse_result;
<% end %>
/** Unexpected token received. */
<%= @grammar.prefix %>token_t token;
/** User terminate code. */
size_t user_terminate_code;
} <%= @grammar.prefix %>context_t;
/**************************************************************************
* Public data
*************************************************************************/
/** Token names. */
extern const char * <%= @grammar.prefix %>token_names[];
void <%= @grammar.prefix %>context_init(<%= @grammar.prefix %>context_t * context, uint8_t const * input, size_t input_length);
size_t <%= @grammar.prefix %>decode_code_point(uint8_t const * input, size_t input_length,
@@ -120,6 +175,14 @@ size_t <%= @grammar.prefix %>lex(<%= @grammar.prefix %>context_t * context, <%=
size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * context);
<% if @grammar.ast %>
<%= @grammar.ast_prefix %><%= @grammar.start_rule %><%= @grammar.ast_suffix %> * <%= @grammar.prefix %>result(<%= @grammar.prefix %>context_t * context);
<% else %>
<%= start_rule_type[1] %> <%= @grammar.prefix %>result(<%= @grammar.prefix %>context_t * context);
<% end %>
<%= @grammar.prefix %>position_t <%= @grammar.prefix %>position(<%= @grammar.prefix %>context_t * context);
size_t <%= @grammar.prefix %>user_terminate_code(<%= @grammar.prefix %>context_t * context);
<%= @grammar.prefix %>token_t <%= @grammar.prefix %>token(<%= @grammar.prefix %>context_t * context);
@@ -13,7 +13,8 @@ Propane is a LALR Parser Generator (LPG) which:
* generates a built-in lexer to tokenize input
* supports UTF-8 lexer inputs
* generates a table-driven shift/reduce parser to parse input in linear time
* targets C or D language outputs
* optionally supports automatic full AST generation
* is MIT-licensed
* is distributable as a standalone Ruby script
@@ -77,33 +78,15 @@ token rparen /\\)/;
# Drop whitespace.
drop /\\s+/;
Start -> E1 << $$ = $1; >>
E1 -> E2 << $$ = $1; >>
E1 -> E1 plus E2 << $$ = $1 + $3; >>
E2 -> E3 << $$ = $1; >>
E2 -> E2 times E3 << $$ = $1 * $3; >>
E3 -> E4 << $$ = $1; >>
E3 -> E3 power E4 << $$ = pow($1, $3); >>
E4 -> integer << $$ = $1; >>
E4 -> lparen E1 rparen << $$ = $2; >>
```

Grammar files can contain comment lines beginning with `#` which are ignored.
@@ -117,8 +100,8 @@ lowercase character and beginning a rule name with an uppercase character.

##> User Code Blocks

User code blocks begin following a "<<" token and end with a ">>" token found
at the end of a line.
All text lines in the code block are copied verbatim into the output file.

### Standalone Code Blocks
@@ -189,9 +172,7 @@ This parser value can then be used later in a parser rule.

Example:

```
E1 -> E1 plus E2 << $$ = $1 + $3; >>
```

Parser rule code blocks appear following a rule expression.
@@ -202,6 +183,143 @@ rule.

Parser values for the rules or tokens in the rule pattern can be accessed
positionally with tokens `$1`, `$2`, `$3`, etc...
Parser rule code blocks are not available in AST generation mode.
In AST generation mode, a full parse tree is automatically constructed in
memory for user code to traverse after parsing is complete.
##> AST generation mode - the `ast` statement
To activate AST generation mode, place the `ast` statement in your grammar file:
```
ast;
```
It is recommended to place this statement early in the grammar.
In AST generation mode, various aspects of Propane's behavior change:
* Only one `ptype` is allowed.
* Parser user code blocks are not supported.
* Structure types are generated to represent the parsed tokens and rules as
defined in the grammar.
* The parse result from `p_result()` points to a `Start` struct containing
the entire parse tree for the input. If the user has changed the start rule
with the `start` grammar statement, the name of the start struct will be
given by the user-specified start rule instead of `Start`.
Example AST generation grammar:
```
ast;
ptype int;
token a << $$ = 11; >>
token b << $$ = 22; >>
token one /1/;
token two /2/;
token comma /,/ <<
$$ = 42;
>>
token lparen /\\(/;
token rparen /\\)/;
drop /\\s+/;
Start -> Items;
Items -> Item ItemsMore;
Items -> ;
ItemsMore -> comma Item ItemsMore;
ItemsMore -> ;
Item -> a;
Item -> b;
Item -> lparen Item rparen;
Item -> Dual;
Dual -> One Two;
Dual -> Two One;
One -> one;
Two -> two;
```
The following unit test describes the fields that will be present for an
example parse:
```
string input = "a, ((b)), b";
p_context_t context;
p_context_init(&context, input);
assert_eq(P_SUCCESS, p_parse(&context));
Start * start = p_result(&context);
assert(start.pItems1 !is null);
assert(start.pItems !is null);
Items * items = start.pItems;
assert(items.pItem !is null);
assert(items.pItem.pToken1 !is null);
assert_eq(TOKEN_a, items.pItem.pToken1.token);
assert_eq(11, items.pItem.pToken1.pvalue);
assert(items.pItemsMore !is null);
ItemsMore * itemsmore = items.pItemsMore;
assert(itemsmore.pItem !is null);
assert(itemsmore.pItem.pItem !is null);
assert(itemsmore.pItem.pItem.pItem !is null);
assert(itemsmore.pItem.pItem.pItem.pToken1 !is null);
assert_eq(TOKEN_b, itemsmore.pItem.pItem.pItem.pToken1.token);
assert_eq(22, itemsmore.pItem.pItem.pItem.pToken1.pvalue);
assert(itemsmore.pItemsMore !is null);
itemsmore = itemsmore.pItemsMore;
assert(itemsmore.pItem !is null);
assert(itemsmore.pItem.pToken1 !is null);
assert_eq(TOKEN_b, itemsmore.pItem.pToken1.token);
assert_eq(22, itemsmore.pItem.pToken1.pvalue);
assert(itemsmore.pItemsMore is null);
```
## `ast_prefix` and `ast_suffix` statements
In AST generation mode, structure types are defined and named based on the
rules in the grammar.
Additionally, a structure type called `Token` is generated to hold parsed
token information.
These structure names can be modified by using the `ast_prefix` or `ast_suffix`
statements in the grammar file.
The field names that point to instances of the structures are not affected by
the `ast_prefix` or `ast_suffix` values.
For example, if the following two lines were added to the example above:
```
ast_prefix ABC;
ast_suffix XYZ;
```
Then the types would be used as such instead:
```
string input = "a, ((b)), b";
p_context_t context;
p_context_init(&context, input);
assert_eq(P_SUCCESS, p_parse(&context));
ABCStartXYZ * start = p_result(&context);
assert(start.pItems1 !is null);
assert(start.pItems !is null);
ABCItemsXYZ * items = start.pItems;
assert(items.pItem !is null);
assert(items.pItem.pToken1 !is null);
assert_eq(TOKEN_a, items.pItem.pToken1.token);
assert_eq(11, items.pItem.pToken1.pvalue);
assert(items.pItemsMore !is null);
ABCItemsMoreXYZ * itemsmore = items.pItemsMore;
assert(itemsmore.pItem !is null);
assert(itemsmore.pItem.pItem !is null);
assert(itemsmore.pItem.pItem.pItem !is null);
assert(itemsmore.pItem.pItem.pItem.pToken1 !is null);
```
##> Specifying tokens - the `token` statement

The `token` statement allows defining a lexer token and a pattern to match that
@@ -238,9 +356,7 @@ lexer.

Example:

```
token if << writeln("'if' keyword lexed"); >>
```

The `token` statement is actually a shortcut statement for a combination of a
@@ -277,9 +393,7 @@ code but may not result in a matched token.

Example:

```
/foo+/ << writeln("saw a foo pattern"); >>
```

This can be especially useful with ${#Lexer modes}.
@@ -325,9 +439,16 @@ Regular expressions can include many special characters:

* The `(` character begins a matching group.
* The `{` character begins a count qualifier.
* The `\` character escapes the following character and changes its meaning:
* The `\a` sequence matches an ASCII bell character (0x07).
* The `\b` sequence matches an ASCII backspace character (0x08).
  * The `\d` sequence matches any character `0` through `9`.
* The `\f` sequence matches an ASCII form feed character (0x0C).
* The `\n` sequence matches an ASCII new line character (0x0A).
* The `\r` sequence matches an ASCII carriage return character (0x0D).
  * The `\s` sequence matches a space, horizontal tab `\t`, carriage return
    `\r`, a form feed `\f`, or a vertical tab `\v` character.
* The `\t` sequence matches an ASCII tab character (0x09).
* The `\v` sequence matches an ASCII vertical tab character (0x0B).
  * Any other character matches itself.
* The `|` character creates an alternate match.
@@ -381,9 +502,7 @@ tokenid str;
mystringvalue = "";
$mode(string);
>>
string: /[^"]+/ << mystringvalue += match; >>
string: /"/ <<
$mode(default);
return $token(str);
@@ -440,20 +559,12 @@ ptype Value;
ptype array = Value[];
ptype dict = Value[string];

Object -> lbrace rbrace << $$ = new Value(); >>

Values (array) -> Value << $$ = [$1]; >>
Values -> Values comma Value << $$ = $1 ~ [$3]; >>

KeyValue (dict) -> string colon Value << $$ = [$1: $3]; >>
```

In this example, the default parser value type is `Value`.
@@ -469,6 +580,12 @@ In this example:
* a reduced `Values`'s parser value has a type of `Value[]`.
* a reduced `KeyValue`'s parser value has a type of `Value[string]`.
When AST generation mode is active, the `ptype` functionality works differently.
In this mode, only one `ptype` is used by the parser.
Lexer user code blocks may assign a parse value to the generated `Token` node
by assigning to `$$` within a lexer code block.
The type of the parse value `$$` is given by the global `ptype` type.
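As a sketch (a hypothetical fragment, not taken from the examples above), a
grammar in AST generation mode might record a value for each lexed token:

```
ast;
ptype int;

token one /1/ << $$ = 1; >>
token two /2/ << $$ = 2; >>
```

After parsing, each assigned value is then available in the `pvalue` field of
the corresponding generated `Token` AST node.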
##> Specifying a parser rule - the rule statement

Rule statements create parser rules which define the grammar that will be
@@ -479,58 +596,55 @@ Rules with the same name define a rule set for that name and act as
alternatives that the parser can accept when attempting to match a reference to
that rule.
The default start rule name is `Start`.
This can be changed with the `start` statement.
The grammar file must define a rule with the start rule name, which will be
used as the top-level starting rule that the parser attempts to reduce.

Example:

```
ptype ulong;

start Top;

token word /[a-z]+/ << $$ = match.length; >>

Top -> word << $$ = $1; >>
```

In the above example the `Top` rule is defined to match a single `word`
token.

Another example:

```
Start -> E1 << $$ = $1; >>
E1 -> E2 << $$ = $1; >>
E1 -> E1 plus E2 << $$ = $1 + $3; >>
E2 -> E3 << $$ = $1; >>
E2 -> E2 times E3 << $$ = $1 * $3; >>
E3 -> E4 << $$ = $1; >>
E3 -> E3 power E4 << $$ = pow($1, $3); >>
E4 -> integer << $$ = $1; >>
E4 -> lparen E1 rparen << $$ = $2; >>
```

This example uses the default start rule name of `Start`.

A parser rule has zero or more terms on the right side of its definition.
Each of these terms is either a token name or a rule name.
A term can be immediately followed by a `?` character to signify that it is
optional.
Another example:
```
token public;
token private;
token int;
token ident /[a-zA-Z_][a-zA-Z_0-9]*/;
token semicolon /;/;
IntegerDeclaration -> Visibility? int ident semicolon;
Visibility -> public;
Visibility -> private;
```
In a parser rule code block, parser values for the right side terms are
accessible as `$1` for the first term's parser value, `$2` for the second
@@ -539,6 +653,19 @@ The `$$` symbol accesses the output parser value for this rule.

The above examples demonstrate how the parser values for the rule components
can be used to produce the parser value for the accepted rule.
Parser rule code blocks are not allowed and not used when AST generation mode
is active.
##> Specifying the parser start rule name - the `start` statement
The start rule can be changed from the default of `Start` by using the `start`
statement.
Example:
```
start MyStartRule;
```
##> Specifying the parser module name - the `module` statement

The `module` statement can be used to specify the module name for a generated
@@ -574,6 +701,258 @@ default.

It can also be used when generating multiple lexers/parsers to be used in the
same program to avoid symbol collisions.
##> User termination of the lexer or parser
Propane supports allowing lexer or parser user code blocks to terminate
execution of the parser.
Some example uses of this functionality could be to:
* Detect integer overflow when lexing an integer literal constant.
* Detect and report an error as soon as possible during parsing before continuing to parse any more of the input.
* Determine whether parsing should stop and instead be performed using a different parser version.
To terminate parsing from a lexer or parser user code block, use the
`$terminate(code)` function, passing an integer expression argument.
For example:
```
NewExpression -> new Expression << $terminate(42); >>
```
The value passed to the `$terminate()` function is known as the "user terminate
code".
If the parser returns a `P_USER_TERMINATED` result code, then the user
terminate code can be accessed using the `p_user_terminate_code()` API
function.
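On the calling side, this could look like the following sketch, written in the
same D unit-test style used elsewhere in this document and assuming the default
`p_` prefix (the input string here is hypothetical):

```
string input = "new foo";
p_context_t context;
p_context_init(&context, input);
if (p_parse(&context) == P_USER_TERMINATED)
{
    /* Returns the value passed to $terminate() in the user code block. */
    size_t code = p_user_terminate_code(&context);
}
```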
#> Propane generated API
By default, Propane uses a prefix of `p_` when generating a lexer/parser.
This prefix is used for all publicly declared types and functions.
The uppercase version of the prefix is used for all constant values.
This section documents the generated API using the default `p_` or `P_` names.
##> Constants
Propane generates the following result code constants:
* `P_SUCCESS`: A successful decode/lex/parse operation has taken place.
* `P_DECODE_ERROR`: An error occurred when decoding UTF-8 input.
* `P_UNEXPECTED_INPUT`: Input was received by the lexer that does not match any lexer pattern.
* `P_UNEXPECTED_TOKEN`: A token was seen in a location that does not match any parser rule.
* `P_DROP`: The lexer matched a drop pattern.
* `P_EOF`: The lexer reached the end of the input string.
* `P_USER_TERMINATED`: A parser user code block has requested to terminate the parser.
Result codes are returned by the functions `p_decode_input()`, `p_lex()`, and `p_parse()`.
##> Types
### `p_context_t`
Propane defines a `p_context_t` structure type.
The structure is intended to be used opaquely and stores information related to
the state of the lexer and parser.
Integrating code must define an instance of the `p_context_t` structure.
A pointer to this instance is passed to the generated functions.
### `p_position_t`
The `p_position_t` structure contains two fields `row` and `col`.
These fields contain the 0-based row and column describing a parser position.
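For example (a sketch assuming the default prefix, a D target, and an imported
`std.stdio`), this position information can be used to report where an
unexpected token occurred:

```
if (p_parse(&context) == P_UNEXPECTED_TOKEN)
{
    p_position_t pos = p_position(&context);
    /* row and col are 0-based, so add 1 for a user-facing message. */
    stderr.writefln("%d:%d: unexpected token", pos.row + 1, pos.col + 1);
}
```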
### AST Node Types
If AST generation mode is enabled, a structure type for each rule will be
generated.
The name of the structure type is given by the name of the rule.
Additionally, a structure type called `Token` is generated to represent an
AST node that refers to a raw parser token rather than a composite rule.
#### AST Node Fields
A `Token` node has two fields:
* `token` which specifies which token was parsed (one of `TOKEN_*`)
* `pvalue` which specifies the parser value for the token. If a lexer user
code block assigned to `$$`, the assigned value will be stored here.
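A minimal sketch of what the generated `Token` node might look like. The layout is illustrative only: the actual token enum, value union, and any additional fields (such as the token position) are produced by Propane from the grammar.

```c
#include <stddef.h>

/* Hypothetical token IDs and parser value union -- the real definitions
 * are generated from the grammar's token and ptype declarations. */
typedef enum { TOKEN_INT, TOKEN_PLUS } p_token_t;
typedef union { size_t v_default; } p_value_t;

typedef struct {
    p_token_t token;   /* which token was parsed (one of TOKEN_*) */
    p_value_t pvalue;  /* value assigned to $$ in lexer user code */
} Token;

/* Example instance: an integer token whose lexer code stored 42. */
static const Token example_token = { TOKEN_INT, { 42 } };
```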
The other generated AST node structures have fields generated based on the
right-hand side components specified for all rules of a given name.
In this example:
```
Start -> Items;
Items -> Item ItemsMore;
Items -> ;
```
The `Start` structure will have a field called `pItems` and another field
with a positional suffix (`pItems1`), both of which point to the parsed
`Items` node.
These fields will be null if the parsed `Items` rule was empty.
The `Items` structure will have fields:
* `pItem` and `pItem1` which point to the parsed `Item` structure.
* `pItemsMore` and `pItemsMore2` which point to the parsed `ItemsMore` structure.
If a rule can be empty (for example in the second `Items` rule above), then
an instance of a pointer to that rule's generated AST node will be null if the
parser matches the empty rule definition.
The non-positional AST node field pointer will not be generated if there are
multiple positions in which an instance of the node it points to could be
present.
For example, in the below rules:
```
Dual -> One Two;
Dual -> Two One;
```
The generated `Dual` structure will contain `pOne1`, `pTwo2`, `pTwo1`, and
`pOne2` fields.
However, a `pOne` field and `pTwo` field will not be generated since it would
be ambiguous which one was matched.
If the first rule is matched, then `pOne1` and `pTwo2` will be non-null while
`pTwo1` and `pOne2` will be null.
If the second rule is matched instead, then the opposite would be the case.
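Under that layout, integrating code can distinguish the two alternatives with null checks. A sketch follows; the struct definitions are hypothetical stand-ins for the generated node types.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the generated AST node types. */
typedef struct { int unused; } One;
typedef struct { int unused; } Two;
typedef struct {
    One *pOne1;  /* non-null when `Dual -> One Two` matched */
    Two *pTwo2;
    Two *pTwo1;  /* non-null when `Dual -> Two One` matched */
    One *pOne2;
} Dual;

/* Return 1 for the first alternative, 2 for the second, based on which
 * positional fields the parser left non-null. */
static int dual_alternative(const Dual *dual)
{
    return dual->pOne1 != NULL ? 1 : 2;
}

/* Example instances for each alternative. */
static One example_one;
static Two example_two;
static const Dual example_first  = { &example_one, &example_two, NULL, NULL };
static const Dual example_second = { NULL, NULL, &example_two, &example_one };
```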
##> Functions
### `p_context_init`
The `p_context_init()` function must be called to initialize the context
structure.
The input to be used for lexing/parsing is passed in when initializing the
context structure.
C example:
```
p_context_t context;
p_context_init(&context, input, input_length);
```
D example:
```
p_context_t context;
p_context_init(&context, input);
```
### `p_parse`
The `p_parse()` function is the main entry point to the parser.
It must be passed a pointer to an initialized context structure.
Example:
```
p_context_t context;
p_context_init(&context, input, input_length);
size_t result = p_parse(&context);
```
### `p_result`
The `p_result()` function can be used to retrieve the final parse value after
`p_parse()` returns a `P_SUCCESS` value.
Example:
```
p_context_t context;
p_context_init(&context, input, input_length);
if (p_parse(&context) == P_SUCCESS)
{
    size_t result = p_result(&context);
}
```
If AST generation mode is active, then the `p_result()` function returns a
`Start *` pointing to the `Start` AST structure.
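For example, a traversal of the earlier `Start`/`Items` grammar might look like the following. The node layouts are hypothetical stand-ins (including an assumed `ItemsMore -> Item ItemsMore;` / `ItemsMore -> ;` pair of rules); the real structures come from the generated source.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the generated AST node types. */
typedef struct { int unused; } Item;
typedef struct ItemsMore {
    Item *pItem;
    struct ItemsMore *pItemsMore;
} ItemsMore;
typedef struct {
    Item *pItem;
    ItemsMore *pItemsMore;
} Items;
typedef struct { Items *pItems; } Start;

/* Count the Item nodes reachable from the Start node returned by
 * p_result(); a null pItems means the empty Items rule matched. */
static size_t count_items(const Start *start)
{
    size_t count = 0;
    if (start->pItems != NULL)
    {
        count = 1;
        for (const ItemsMore *more = start->pItems->pItemsMore;
             more != NULL && more->pItem != NULL; more = more->pItemsMore)
        {
            count++;
        }
    }
    return count;
}

/* Example trees: one with two items, one from the empty Items rule. */
static Item item_a, item_b;
static ItemsMore more_b = { &item_b, NULL };
static Items items_two = { &item_a, &more_b };
static const Start example_two_items = { &items_two };
static const Start example_empty = { NULL };
```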
### `p_position`
The `p_position()` function can be used to retrieve the parser position where
an error occurred.
Example:
```
p_context_t context;
p_context_init(&context, input, input_length);
if (p_parse(&context) == P_UNEXPECTED_TOKEN)
{
p_position_t error_position = p_position(&context);
fprintf(stderr, "Error: unexpected token at row %u column %u\n",
error_position.row + 1, error_position.col + 1);
}
```
### `p_user_terminate_code`
The `p_user_terminate_code()` function can be used to retrieve the user
terminate code after `p_parse()` returns a `P_USER_TERMINATED` value.
User terminate codes are arbitrary values that can be defined by the user to
be returned when the user requests to terminate parsing.
They have no particular meaning to Propane.
Example:
```
if (p_parse(&context) == P_USER_TERMINATED)
{
size_t user_terminate_code = p_user_terminate_code(&context);
}
```
### `p_token`
The `p_token()` function can be used to retrieve the current parse token.
This is useful after `p_parse()` returns a `P_UNEXPECTED_TOKEN` result code
to determine which token the parser was not expecting.
Example:
```
if (p_parse(&context) == P_UNEXPECTED_TOKEN)
{
p_token_t unexpected_token = p_token(&context);
}
```
##> Data
### `p_token_names`
The `p_token_names` array contains the grammar-specified token names.
It is indexed by the token ID.
C example:
```
p_context_t context;
p_context_init(&context, input, input_length);
if (p_parse(&context) == P_UNEXPECTED_TOKEN)
{
    p_position_t error_position = p_position(&context);
    fprintf(stderr, "Error: unexpected token `%s' at row %u column %u\n",
        p_token_names[p_token(&context)],
        error_position.row + 1, error_position.col + 1);
}
```
#> License
Propane is licensed under the terms of the MIT License:

View File

@@ -0,0 +1 @@
au BufNewFile,BufRead *.propane set filetype=propane

View File

@@ -0,0 +1,28 @@
" Vim syntax file for Propane
" Language: propane
" Maintainer: Josh Holtrop
" URL: https://github.com/holtrop/propane
if exists("b:current_syntax")
finish
endif
if !exists("b:propane_subtype")
let b:propane_subtype = "d"
endif
exe "syn include @propaneTarget syntax/".b:propane_subtype.".vim"
syn region propaneTarget matchgroup=propaneDelimiter start="<<" end=">>$" contains=@propaneTarget keepend
syn match propaneComment "#.*"
syn match propaneOperator "->"
syn keyword propaneKeyword ast ast_prefix ast_suffix drop module prefix ptype start token tokenid
syn region propaneRegex start="/" end="/" skip="\\/"
hi def link propaneComment Comment
hi def link propaneKeyword Keyword
hi def link propaneRegex String
hi def link propaneOperator Operator
hi def link propaneDelimiter Delimiter

View File

@@ -31,10 +31,10 @@ class Propane
class << self
- def run(input_file, output_file, log_file)
+ def run(input_file, output_file, log_file, options)
begin
grammar = Grammar.new(File.read(input_file))
- generator = Generator.new(grammar, output_file, log_file)
+ generator = Generator.new(grammar, output_file, log_file, options)
generator.generate
rescue Error => e
$stderr.puts e.message

View File

@@ -4,15 +4,21 @@ class Propane
USAGE = <<EOF
Usage: #{$0} [options] <input-file> <output-file>
Options:
- --log LOG Write log file
- --version Show program version and exit
- -h, --help Show this usage and exit
+ --log LOG Write log file. This will show all parser states and their
+ associated shifts and reduces. It can be helpful when
+ debugging a grammar.
+ --version Show program version and exit.
+ -h, --help Show this usage and exit.
+ -w Treat warnings as errors. This option will treat shift/reduce
+ conflicts as fatal errors and will print them to stderr in
+ addition to the log file.
EOF
class << self
def run(args)
params = []
+ options = {}
log_file = nil
i = 0
while i < args.size
@@ -24,11 +30,13 @@ EOF
log_file = args[i]
end
when "--version"
- puts "propane v#{VERSION}"
+ puts "propane version #{VERSION}"
return 0
when "-h", "--help"
puts USAGE
return 0
+ when "-w"
+ options[:warnings_as_errors] = true
when /^-/
$stderr.puts "Error: unknown option #{arg}"
return 1
@@ -45,7 +53,7 @@ EOF
$stderr.puts "Error: cannot read #{params[0]}"
return 2
end
- Propane.run(*params, log_file)
+ Propane.run(*params, log_file, options)
end
end

View File

@@ -2,7 +2,7 @@ class Propane
class Generator
- def initialize(grammar, output_file, log_file)
+ def initialize(grammar, output_file, log_file, options)
@grammar = grammar
@output_file = output_file
if log_file
@@ -16,6 +16,7 @@ class Propane
else
"d"
end
+ @options = options
process_grammar!
end
@@ -51,6 +52,7 @@ class Propane
unless found_default
raise Error.new("No patterns found for default mode")
end
+ check_ptypes!
# Add EOF token.
@grammar.tokens << Token.new("$EOF", nil, nil)
tokens_by_name = {}
@@ -66,11 +68,14 @@ class Propane
tokens_by_name[token.name] = token
end
# Check for user start rule.
- unless @grammar.rules.find {|rule| rule.name == "Start"}
+ unless @grammar.rules.find {|rule| rule.name == @grammar.start_rule}
- raise Error.new("Start rule not found")
+ raise Error.new("Start rule `#{@grammar.start_rule}` not found")
end
# Add "real" start rule.
- @grammar.rules.unshift(Rule.new("$Start", ["Start", "$EOF"], nil, nil, nil))
+ @grammar.rules.unshift(Rule.new("$Start", [@grammar.start_rule, "$EOF"], nil, nil, nil))
+ # Generate and add rules for optional components.
+ generate_optional_component_rules!(tokens_by_name)
+ # Build rule sets.
rule_sets = {}
rule_set_id = @grammar.tokens.size
@grammar.rules.each_with_index do |rule, rule_id|
@@ -119,10 +124,55 @@ class Propane
end
end
determine_possibly_empty_rulesets!(rule_sets)
+ rule_sets.each do |name, rule_set|
+ rule_set.finalize(@grammar)
+ end
# Generate the lexer.
@lexer = Lexer.new(@grammar)
# Generate the parser.
- @parser = Parser.new(@grammar, rule_sets, @log)
+ @parser = Parser.new(@grammar, rule_sets, @log, @options)
end
+ # Check that any referenced ptypes have been defined.
+ def check_ptypes!
+ (@grammar.patterns + @grammar.tokens + @grammar.rules).each do |potor|
+ if potor.ptypename
+ unless @grammar.ptypes.include?(potor.ptypename)
+ raise Error.new("Error: Line #{potor.line_number}: ptype #{potor.ptypename} not declared. Declare with `ptype` statement.")
+ end
+ end
+ end
+ end
+ # Generate and add rules for any optional components.
+ def generate_optional_component_rules!(tokens_by_name)
+ optional_rules_added = Set.new
+ @grammar.rules.each do |rule|
+ rule.components.each do |component|
+ if component =~ /^(.*)\?$/
+ c = $1
+ unless optional_rules_added.include?(component)
+ # Create two rules for the optional component: one empty and
+ # one just matching the component.
+ # We need to find the ptypename for the optional component in
+ # order to copy it to the generated rules.
+ if tokens_by_name[c]
+ # The optional component is a token.
+ ptypename = tokens_by_name[c].ptypename
+ else
+ # The optional component must be a rule, so find any instance
+ # of that rule that specifies a ptypename.
+ ptypename = @grammar.rules.reduce(nil) do |result, rule|
+ rule.name == c && rule.ptypename ? rule.ptypename : result
+ end
+ end
+ @grammar.rules << Rule.new(component, [], nil, ptypename, rule.line_number)
+ @grammar.rules << Rule.new(component, [c], "$$ = $1;\n", ptypename, rule.line_number)
+ optional_rules_added << component
+ end
+ end
+ end
+ end
+ end
# Determine which grammar rules could expand to empty sequences.
@@ -198,10 +248,25 @@ class Propane
code = code.gsub(/\$token\(([$\w]+)\)/) do |match|
"TOKEN_#{Token.code_name($1)}"
end
+ code = code.gsub(/\$terminate\((.*)\);/) do |match|
+ user_terminate_code = $1
+ retval = rule ? "P_USER_TERMINATED" : "TERMINATE_TOKEN_ID"
+ case @language
+ when "c"
+ "context->user_terminate_code = (#{user_terminate_code}); return #{retval};"
+ when "d"
+ "context.user_terminate_code = (#{user_terminate_code}); return #{retval};"
+ end
+ end
if parser
code = code.gsub(/\$\$/) do |match|
+ case @language
+ when "c"
+ "_pvalue->v_#{rule.ptypename}"
+ when "d"
"_pvalue.v_#{rule.ptypename}"
end
+ end
code = code.gsub(/\$(\d+)/) do |match|
index = $1.to_i
case @language
@@ -213,6 +278,14 @@ class Propane
end
else
code = code.gsub(/\$\$/) do |match|
+ if @grammar.ast
+ case @language
+ when "c"
+ "out_token_info->pvalue"
+ when "d"
+ "out_token_info.pvalue"
+ end
+ else
case @language
when "c"
"out_token_info->pvalue.v_#{pattern.ptypename}"
@@ -220,6 +293,7 @@ class Propane
"out_token_info.pvalue.v_#{pattern.ptypename}"
end
end
+ end
code = code.gsub(/\$mode\(([a-zA-Z_][a-zA-Z_0-9]*)\)/) do |match|
mode_name = $1
mode_id = @lexer.mode_id(mode_name)
@@ -243,7 +317,7 @@ class Propane
# Start rule parser value type name and type string.
def start_rule_type
start_rule = @grammar.rules.find do |rule|
- rule.name == "Start"
+ rule.name == @grammar.start_rule
end
[start_rule.ptypename, @grammar.ptypes[start_rule.ptypename]]
end

View File

@@ -5,9 +5,13 @@ class Propane
# Reserve identifiers beginning with a double-underscore for internal use.
IDENTIFIER_REGEX = /(?:[a-zA-Z]|_[a-zA-Z0-9])[a-zA-Z_0-9]*/
+ attr_reader :ast
+ attr_reader :ast_prefix
+ attr_reader :ast_suffix
attr_reader :modulename
attr_reader :patterns
attr_reader :rules
+ attr_reader :start_rule
attr_reader :tokens
attr_reader :code_blocks
attr_reader :ptypes
@@ -15,6 +19,7 @@ class Propane
def initialize(input)
@patterns = []
+ @start_rule = "Start"
@tokens = []
@rules = []
@code_blocks = {}
@@ -24,6 +29,9 @@ class Propane
@input = input.gsub("\r\n", "\n")
@ptypes = {"default" => "void *"}
@prefix = "p_"
+ @ast = false
+ @ast_prefix = ""
+ @ast_suffix = ""
parse_grammar!
end
@@ -35,6 +43,10 @@ class Propane
@tokens.size
end
+ def terminate_token_id
+ @tokens.size + 1
+ end
private
def parse_grammar!
@@ -47,9 +59,13 @@ class Propane
if parse_white_space!
elsif parse_comment_line!
elsif @mode.nil? && parse_mode_label!
+ elsif parse_ast_statement!
+ elsif parse_ast_prefix_statement!
+ elsif parse_ast_suffix_statement!
elsif parse_module_statement!
elsif parse_ptype_statement!
elsif parse_pattern_statement!
+ elsif parse_start_statement!
elsif parse_token_statement!
elsif parse_tokenid_statement!
elsif parse_drop_statement!
@@ -78,6 +94,24 @@ class Propane
consume!(/#.*\n/)
end
+ def parse_ast_statement!
+ if consume!(/ast\s*;/)
+ @ast = true
+ end
+ end
+ def parse_ast_prefix_statement!
+ if md = consume!(/ast_prefix\s+(\w+)\s*;/)
+ @ast_prefix = md[1]
+ end
+ end
+ def parse_ast_suffix_statement!
+ if md = consume!(/ast_suffix\s+(\w+)\s*;/)
+ @ast_suffix = md[1]
+ end
+ end
def parse_module_statement!
if consume!(/module\s+/)
md = consume!(/([\w.]+)\s*/, "expected module name")
@@ -92,6 +126,9 @@ class Propane
if consume!(/ptype\s+/)
name = "default"
if md = consume!(/(#{IDENTIFIER_REGEX})\s*=\s*/)
+ if @ast
+ raise Error.new("Multiple ptypes are unsupported in AST mode")
+ end
name = md[1]
end
md = consume!(/([^;]+);/, "expected parser result type expression")
@@ -104,12 +141,15 @@ class Propane
md = consume!(/(#{IDENTIFIER_REGEX})\s*/, "expected token name")
name = md[1]
if md = consume!(/\((#{IDENTIFIER_REGEX})\)\s*/)
+ if @ast
+ raise Error.new("Multiple ptypes are unsupported in AST mode")
+ end
ptypename = md[1]
end
pattern = parse_pattern! || name
consume!(/\s+/)
unless code = parse_code_block!
- consume!(/;/, "expected pattern or `;' or code block")
+ consume!(/;/, "expected `;' or code block")
end
token = Token.new(name, ptypename, @line_number)
@tokens << token
@@ -125,6 +165,9 @@ class Propane
md = consume!(/(#{IDENTIFIER_REGEX})\s*/, "expected token name")
name = md[1]
if md = consume!(/\((#{IDENTIFIER_REGEX})\)\s*/)
+ if @ast
+ raise Error.new("Multiple ptypes are unsupported in AST mode")
+ end
ptypename = md[1]
end
consume!(/;/, "expected `;'");
@@ -152,10 +195,17 @@ class Propane
def parse_rule_statement!
if md = consume!(/(#{IDENTIFIER_REGEX})\s*(?:\((#{IDENTIFIER_REGEX})\))?\s*->\s*/)
rule_name, ptypename = *md[1, 2]
+ if @ast && ptypename
+ raise Error.new("Multiple ptypes are unsupported in AST mode")
+ end
- md = consume!(/((?:#{IDENTIFIER_REGEX}\s*)*)\s*/, "expected rule component list")
+ md = consume!(/((?:#{IDENTIFIER_REGEX}\??\s*)*)\s*/, "expected rule component list")
components = md[1].strip.split(/\s+/)
+ if @ast
+ consume!(/;/, "expected `;'")
+ else
unless code = parse_code_block!
- consume!(/;/, "expected pattern or `;' or code block")
+ consume!(/;/, "expected `;' or code block")
end
+ end
@rules << Rule.new(rule_name, components, code, ptypename, @line_number)
@mode = nil
@@ -167,6 +217,9 @@ class Propane
if pattern = parse_pattern!
consume!(/\s+/)
if md = consume!(/\((#{IDENTIFIER_REGEX})\)\s*/)
+ if @ast
+ raise Error.new("Multiple ptypes are unsupported in AST mode")
+ end
ptypename = md[1]
end
unless code = parse_code_block!
@@ -178,9 +231,17 @@ class Propane
end
end
+ def parse_start_statement!
+ if md = consume!(/start\s+(\w+)\s*;/)
+ @start_rule = md[1]
+ end
+ end
def parse_code_block_statement!
- if md = consume!(/<<([a-z]*)\n(.*?)^>>\n/m)
+ if md = consume!(/<<([a-z]*)(.*?)>>\n/m)
name, code = md[1..2]
+ code.sub!(/\A\n/, "")
+ code += "\n" unless code.end_with?("\n")
if @code_blocks[name]
@code_blocks[name] += code
else
@@ -218,8 +279,11 @@ class Propane
end
def parse_code_block!
- if md = consume!(/<<\n(.*?)^>>\n/m)
+ if md = consume!(/<<(.*?)>>\n/m)
- md[1]
+ code = md[1]
+ code.sub!(/\A\n/, "")
+ code += "\n" unless code.end_with?("\n")
+ code
end
end

View File

@@ -7,12 +7,14 @@ class Propane
attr_reader :reduce_table
attr_reader :rule_sets
- def initialize(grammar, rule_sets, log)
+ def initialize(grammar, rule_sets, log, options)
@grammar = grammar
@rule_sets = rule_sets
@log = log
@item_sets = []
@item_sets_set = {}
+ @warnings = Set.new
+ @options = options
start_item = Item.new(grammar.rules.first, 0)
eval_item_sets = Set[ItemSet.new([start_item])]
@@ -23,10 +25,10 @@ class Propane
item_set.id = @item_sets.size
@item_sets << item_set
@item_sets_set[item_set] = item_set
- item_set.following_symbols.each do |following_symbol|
+ item_set.next_symbols.each do |next_symbol|
- unless following_symbol.name == "$EOF"
+ unless next_symbol.name == "$EOF"
- following_set = item_set.build_following_item_set(following_symbol)
+ next_item_set = item_set.build_next_item_set(next_symbol)
- eval_item_sets << following_set
+ eval_item_sets << next_item_set
end
end
end
@@ -37,8 +39,12 @@ class Propane
end
build_reduce_actions!
- write_log!
+ build_follow_sets!
build_tables!
+ write_log!
+ if @warnings.size > 0 && @options[:warnings_as_errors]
+ raise Error.new("Fatal errors (-w):\n" + @warnings.join("\n"))
+ end
end
private
@@ -48,27 +54,37 @@ class Propane
@shift_table = []
@reduce_table = []
@item_sets.each do |item_set|
- shift_entries = item_set.following_symbols.map do |following_symbol|
+ shift_entries = item_set.next_symbols.map do |next_symbol|
state_id =
- if following_symbol.name == "$EOF"
+ if next_symbol.name == "$EOF"
0
else
- item_set.following_item_set[following_symbol].id
+ item_set.next_item_set[next_symbol].id
end
{
- symbol_id: following_symbol.id,
+ symbol: next_symbol,
state_id: state_id,
}
end
+ if item_set.reduce_actions
+ shift_entries.each do |shift_entry|
+ token = shift_entry[:symbol]
+ if item_set.reduce_actions.include?(token)
+ rule = item_set.reduce_actions[token]
+ @warnings << "Shift/Reduce conflict (state #{item_set.id}) between token #{token.name} and rule #{rule.name} (defined on line #{rule.line_number})"
+ end
+ end
+ end
reduce_entries =
- case ra = item_set.reduce_actions
+ if rule = item_set.reduce_rule
- when Rule
+ [{token_id: @grammar.invalid_token_id, rule_id: rule.id, rule: rule,
- [{token_id: @grammar.invalid_token_id, rule_id: ra.id,
+ rule_set_id: rule.rule_set.id, n_states: rule.components.size,
- rule_set_id: ra.rule_set.id, n_states: ra.components.size}]
+ propagate_optional_target: rule.optional? && rule.components.size == 1}]
- when Hash
+ elsif reduce_actions = item_set.reduce_actions
- ra.map do |token, rule|
+ reduce_actions.map do |token, rule|
- {token_id: token.id, rule_id: rule.id,
+ {token_id: token.id, rule_id: rule.id, rule: rule,
- rule_set_id: rule.rule_set.id, n_states: rule.components.size}
+ rule_set_id: rule.rule_set.id, n_states: rule.components.size,
+ propagate_optional_target: rule.optional? && rule.components.size == 1}
end
else
[]
@@ -85,11 +101,11 @@ class Propane
end
def process_item_set(item_set)
- item_set.following_symbols.each do |following_symbol|
+ item_set.next_symbols.each do |next_symbol|
- unless following_symbol.name == "$EOF"
+ unless next_symbol.name == "$EOF"
- following_set = @item_sets_set[item_set.build_following_item_set(following_symbol)]
+ next_item_set = @item_sets_set[item_set.build_next_item_set(next_symbol)]
- item_set.following_item_set[following_symbol] = following_set
+ item_set.next_item_set[next_symbol] = next_item_set
- following_set.in_sets << item_set
+ next_item_set.in_sets << item_set
end
end
end
@@ -108,10 +124,8 @@ class Propane
# @param item_set [ItemSet]
#   ItemSet (parser state)
#
- # @return [nil, Rule, Hash]
+ # @return [nil, Hash]
#   If no reduce actions are possible for the given item set, nil.
- #   If only one reduce action is possible for the given item set, the Rule
- #   to reduce.
#   Otherwise, a mapping of lookahead Tokens to the Rules to reduce.
def build_reduce_actions_for_item_set(item_set)
# To build the reduce actions, we start by looking at any
@@ -120,26 +134,38 @@ class Propane
# reduction in the current ItemSet.
reduce_rules = Set.new(item_set.items.select(&:complete?).map(&:rule))
- # If there are no rules to reduce for this ItemSet, we're done here.
- return nil if reduce_rules.size == 0
+ if reduce_rules.size == 1
+ item_set.reduce_rule = reduce_rules.first
+ end
- # If there is exactly one rule to reduce for this ItemSet, then do not
- # figure out the lookaheads; just reduce it.
- return reduce_rules.first if reduce_rules.size == 1
+ if reduce_rules.size == 0
+ nil
+ else
+ build_lookahead_reduce_actions_for_item_set(item_set)
+ end
+ end
- # Otherwise, we have more than one possible rule to reduce.
+ # Build the reduce actions for a single item set (parser state).
+ #
+ # @param item_set [ItemSet]
+ #   ItemSet (parser state)
+ #
+ # @return [Hash]
+ #   Mapping of lookahead Tokens to the Rules to reduce.
+ def build_lookahead_reduce_actions_for_item_set(item_set)
+ reduce_rules = Set.new(item_set.items.select(&:complete?).map(&:rule))
# We will be looking for all possible tokens that can follow instances of
# these rules. Rather than looking through the entire grammar for the
# possible following tokens, we will only look in the item sets leading
# up to this one. This restriction gives us a more precise lookahead set,
# and allows us to parse LALR grammars.
- item_sets = item_set.leading_item_sets
+ item_sets = Set[item_set] + item_set.leading_item_sets
reduce_rules.reduce({}) do |reduce_actions, reduce_rule|
lookahead_tokens_for_rule = build_lookahead_tokens_to_reduce(reduce_rule, item_sets)
lookahead_tokens_for_rule.each do |lookahead_token|
if existing_reduce_rule = reduce_actions[lookahead_token]
- raise Error.new("Error: reduce/reduce conflict between rule #{existing_reduce_rule.id} (#{existing_reduce_rule.name}) and rule #{reduce_rule.id} (#{reduce_rule.name})")
+ raise Error.new("Error: reduce/reduce conflict (state #{item_set.id}) between rule #{existing_reduce_rule.name}##{existing_reduce_rule.id} (defined on line #{existing_reduce_rule.line_number}) and rule #{reduce_rule.name}##{reduce_rule.id} (defined on line #{reduce_rule.line_number})")
end
reduce_actions[lookahead_token] = reduce_rule
end
@@ -181,9 +207,9 @@ class Propane
# tokens to form the lookahead token set.
item_sets.each do |item_set|
item_set.items.each do |item|
- if item.following_symbol == rule_set
+ if item.next_symbol == rule_set
(1..).each do |offset|
- case symbol = item.following_symbol(offset)
+ case symbol = item.next_symbol(offset)
when nil
rule_set = item.rule.rule_set
unless checked_rule_sets.include?(rule_set)
@@ -207,6 +233,51 @@ class Propane
lookahead_tokens
end
+ # Build the follow sets for each ItemSet.
+ #
+ # @return [void]
+ def build_follow_sets!
+ @item_sets.each do |item_set|
+ item_set.follow_set = build_follow_set_for_item_set(item_set)
+ end
+ end
+ # Build the follow set for the given ItemSet.
+ #
+ # @param item_set [ItemSet]
+ #   The ItemSet to build the follow set for.
+ #
+ # @return [Set]
+ #   Follow set for the given ItemSet.
+ def build_follow_set_for_item_set(item_set)
+ follow_set = Set.new
+ rule_sets_to_check_after = Set.new
+ item_set.items.each do |item|
+ (1..).each do |offset|
+ case symbol = item.next_symbol(offset)
+ when nil
+ rule_sets_to_check_after << item.rule.rule_set
+ break
+ when Token
+ follow_set << symbol
+ break
+ when RuleSet
+ follow_set += symbol.start_token_set
+ unless symbol.could_be_empty?
+ break
+ end
+ end
+ end
+ end
+ reduce_lookaheads = build_lookahead_reduce_actions_for_item_set(item_set)
+ reduce_lookaheads.each do |token, rule_set|
+ if rule_sets_to_check_after.include?(rule_set)
+ follow_set << token
+ end
+ end
+ follow_set
+ end
def write_log!
@log.puts Util.banner("Parser Rules")
@grammar.rules.each do |rule|
@@ -240,20 +311,26 @@ class Propane
@log.puts
@log.puts "  Incoming states: #{incoming_ids.join(", ")}"
@log.puts "  Outgoing states:"
- item_set.following_item_set.each do |following_symbol, following_item_set|
+ item_set.next_item_set.each do |next_symbol, next_item_set|
- @log.puts "    #{following_symbol.name} => #{following_item_set.id}"
+ @log.puts "    #{next_symbol.name} => #{next_item_set.id}"
end
@log.puts
@log.puts "  Reduce actions:"
- case item_set.reduce_actions
+ if item_set.reduce_rule
- when Rule
+ @log.puts "    * => rule #{item_set.reduce_rule.id}, rule set #{@rule_sets[item_set.reduce_rule.name].id} (#{item_set.reduce_rule.name})"
- @log.puts "    * => rule #{item_set.reduce_actions.id}, rule set #{@rule_sets[item_set.reduce_actions.name].id} (#{item_set.reduce_actions.name})"
+ elsif item_set.reduce_actions
- when Hash
item_set.reduce_actions.each do |token, rule|
@log.puts "    lookahead #{token.name} => #{rule.name} (#{rule.id}), rule set ##{rule.rule_set.id}"
end
end
end
+ if @warnings.size > 0
+ @log.puts
+ @log.puts "Warnings:"
+ @warnings.each do |warning|
+ @log.puts "  #{warning}"
+ end
+ end
end
end

View File

@ -56,7 +56,7 @@ class Propane
# Return the set of Items obtained by "closing" the current item. # Return the set of Items obtained by "closing" the current item.
# #
# If the following symbol for the current item is another Rule name, then # If the next symbol for the current item is another Rule name, then
# this method will return all Items for that Rule with a position of 0. # this method will return all Items for that Rule with a position of 0.
# Otherwise, an empty Array is returned. # Otherwise, an empty Array is returned.
# #
@@ -81,17 +81,17 @@ class Propane
  @position == @rule.components.size
end
# Get the next symbol for the Item.
#
# That is, the symbol which is after the parse position marker in the
# current Item.
#
# @param offset [Integer]
#   Offset from current parse position to examine.
#
# @return [Token, RuleSet, nil]
#   Next symbol for the Item.
def next_symbol(offset = 0)
  @rule.components[@position + offset]
end
@@ -108,25 +108,25 @@ class Propane
  end
end
# Get whether this Item's next symbol is the given symbol.
#
# @param symbol [Token, RuleSet]
#   Symbol to query.
#
# @return [Boolean]
#   Whether this Item's next symbol is the given symbol.
def next_symbol?(symbol)
  next_symbol == symbol
end
# Get the next item for this Item.
#
# That is, the Item formed by moving the parse position marker one place
# forward from its position in this Item.
#
# @return [Item]
#   The next item for this Item.
def next_item
  Item.new(@rule, @position + 1)
end
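The renamed accessors are easiest to see on a toy version of the class. The sketch below is not Propane's actual `Item` (which wraps a `Rule` object); `ToyItem`, holding a bare component list, is a hypothetical stand-in used only to illustrate what `next_symbol`, `next_symbol?`, and `next_item` do.

```ruby
# Hypothetical stand-in for Propane's Item: components is a flat list of
# grammar symbols and position is the index of the parse "dot".
ToyItem = Struct.new(:components, :position) do
  # Symbol immediately after the dot (nil once the dot is at the end).
  def next_symbol(offset = 0)
    components[position + offset]
  end

  # True when the dot sits directly before the given symbol.
  def next_symbol?(symbol)
    next_symbol == symbol
  end

  # A new item with the dot advanced one place.
  def next_item
    ToyItem.new(components, position + 1)
  end
end

item = ToyItem.new(%w[E plus T], 0) # represents ". E plus T"
item.next_symbol            # => "E"
item.next_symbol?("E")      # => true
item.next_item.next_symbol  # => "plus"
```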


@@ -14,45 +14,54 @@ class Propane
attr_accessor :id
# @return [Hash]
#   Maps a next symbol to its ItemSet.
attr_reader :next_item_set
# @return [Set<ItemSet>]
#   ItemSets leading to this item set.
attr_reader :in_sets
# @return [nil, Rule]
#   Rule to reduce if there is only one possibility.
attr_accessor :reduce_rule
# @return [nil, Hash]
#   Reduce actions, mapping lookahead tokens to rules, if there is
#   more than one rule that could be reduced.
attr_accessor :reduce_actions
# @return [Set<Token>]
#   Follow set for the ItemSet.
attr_accessor :follow_set
# Build an ItemSet.
#
# @param items [Array<Item>]
#   Items in this ItemSet.
def initialize(items)
  @items = Set.new(items)
  @next_item_set = {}
  @in_sets = Set.new
  close!
end
# Get the set of next symbols for all Items in this ItemSet.
#
# @return [Set<Token, RuleSet>]
#   Set of next symbols for all Items in this ItemSet.
def next_symbols
  Set.new(@items.map(&:next_symbol).compact)
end
# Build a next ItemSet for the given next symbol.
#
# @param symbol [Token, RuleSet]
#   Next symbol to build the next ItemSet for.
#
# @return [ItemSet]
#   Next ItemSet for the given next symbol.
def build_next_item_set(symbol)
  ItemSet.new(items_with_next(symbol).map(&:next_item))
end
# Hash function.
@@ -87,15 +96,25 @@ class Propane
#   Set of ItemSets that lead to this ItemSet.
#
# This set includes this ItemSet.
#
# @return [Set<ItemSet>]
#   Set of all ItemSets that lead up to this ItemSet.
def leading_item_sets
  result = Set.new
  eval_sets = Set[self]
  evaled = Set.new
  while eval_sets.size > 0
    eval_set = eval_sets.first
    eval_sets.delete(eval_set)
    evaled << eval_set
    eval_set.in_sets.each do |in_set|
      result << in_set
      unless evaled.include?(in_set)
        eval_sets << in_set
      end
    end
  end
  result
end
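The switch from recursion to a worklist matters when item sets form cycles: an item set that leads, directly or indirectly, back to itself would make the old recursive version recurse forever. A minimal sketch of the same traversal on a hypothetical two-node graph — `Node` and `leading_nodes` are illustration-only names, not Propane's API:

```ruby
require "set"

Node = Struct.new(:name, :in_sets)

# Same worklist shape as leading_item_sets above: pop an unevaluated set,
# record its predecessors, and only enqueue ones we have not visited yet.
def leading_nodes(start)
  result = Set.new
  eval_sets = Set[start]
  evaled = Set.new
  while eval_sets.size > 0
    node = eval_sets.first
    eval_sets.delete(node)
    evaled << node
    node.in_sets.each do |in_set|
      result << in_set
      eval_sets << in_set unless evaled.include?(in_set)
    end
  end
  result
end

a = Node.new("a", Set.new)
b = Node.new("b", Set.new)
a.in_sets << b
b.in_sets << a # cycle: naive recursion would never terminate here
leading_nodes(a).map(&:name).sort # => ["a", "b"]
```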
# Represent the ItemSet as a String.
#
@@ -127,16 +146,16 @@
end
end
# Get the Items with the given next symbol.
#
# @param symbol [Token, RuleSet]
#   Next symbol.
#
# @return [Array<Item>]
#   Items with the given next symbol.
def items_with_next(symbol)
  @items.select do |item|
    item.next_symbol?(symbol)
  end
end


@@ -134,8 +134,18 @@ class Propane
else
  c = @pattern.slice!(0)
  case c
  when "a"
    CharacterRangeUnit.new("\a", "\a")
  when "b"
    CharacterRangeUnit.new("\b", "\b")
  when "d"
    CharacterRangeUnit.new("0", "9")
  when "f"
    CharacterRangeUnit.new("\f", "\f")
  when "n"
    CharacterRangeUnit.new("\n", "\n")
  when "r"
    CharacterRangeUnit.new("\r", "\r")
  when "s"
    ccu = CharacterClassUnit.new
    ccu << CharacterRangeUnit.new(" ")
@@ -145,6 +155,10 @@ class Propane
    ccu << CharacterRangeUnit.new("\f")
    ccu << CharacterRangeUnit.new("\v")
    ccu
  when "t"
    CharacterRangeUnit.new("\t", "\t")
  when "v"
    CharacterRangeUnit.new("\v", "\v")
  else
    CharacterRangeUnit.new(c)
  end
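With these additions the pattern parser understands the usual single-character escapes (`\a \b \f \n \r \t \v`) plus the existing `\d` and `\s` classes, and any other escaped character matches itself. The lookup can be summarized as a small table — `ESCAPES` and `escape_char` are illustrative names, not part of Propane:

```ruby
# Escape letters recognized by the pattern parser, mapped to the single
# characters they match.
ESCAPES = {
  "a" => "\a", "b" => "\b", "f" => "\f", "n" => "\n",
  "r" => "\r", "t" => "\t", "v" => "\v",
}.freeze

# Unknown escapes fall through to the literal character, mirroring the
# `else` branch above.
def escape_char(letter)
  ESCAPES.fetch(letter, letter)
end

escape_char("t") # => "\t"
escape_char("q") # => "q"
```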


@@ -30,6 +30,11 @@ class Propane
#   The RuleSet that this Rule is a part of.
attr_accessor :rule_set
# @return [Array<Integer>]
#   Map this rule's components to their positions in the parent RuleSet's
#   node field pointer array. This is used for AST construction.
attr_accessor :rule_set_node_field_index_map
# Construct a Rule.
#
# @param name [String]
@@ -45,6 +50,7 @@ class Propane
def initialize(name, components, code, ptypename, line_number)
  @name = name
  @components = components
  @rule_set_node_field_index_map = components.map {0}
  @code = code
  @ptypename = ptypename
  @line_number = line_number
@@ -60,6 +66,14 @@ class Propane
  @components.empty?
end
# Return whether this is an optional Rule.
#
# @return [Boolean]
#   Whether this is an optional Rule.
def optional?
  @name.end_with?("?")
end
# Represent the Rule as a String.
#
# @return [String]
@@ -68,6 +82,17 @@ class Propane
  "#{@name} -> #{@components.map(&:name).join(" ")}"
end
# Check whether the rule set node field index map is just a 1:1 mapping.
#
# @return [Boolean]
#   Boolean indicating whether the rule set node field index map is just a
#   1:1 mapping.
def flat_rule_set_node_field_index_map?
  @rule_set_node_field_index_map.each_with_index.all? do |v, i|
    v == i
  end
end
end
end


@@ -1,7 +1,12 @@
class Propane
  # A RuleSet collects all grammar rules of the same name.
  class RuleSet
    # @return [Array<Hash>]
    #   AST fields.
    attr_reader :ast_fields
    # @return [Integer]
    #   ID of the RuleSet.
    attr_reader :id
@@ -51,6 +56,24 @@
  @could_be_empty
end
# Return whether this is an optional RuleSet.
#
# @return [Boolean]
#   Whether this is an optional RuleSet.
def optional?
  @name.end_with?("?")
end
# For optional rule sets, return the underlying component that is optional.
def option_target
  @rules.each do |rule|
    if rule.components.size > 0
      return rule.components[0]
    end
  end
  raise "Optional rule target not found"
end
# Build the start token set for the RuleSet.
#
# @return [Set<Token>]
@@ -75,6 +98,58 @@
  @_start_token_set
end
# Finalize a RuleSet after adding all Rules to it.
def finalize(grammar)
build_ast_fields(grammar)
end
private
# Build the set of AST fields for this RuleSet.
#
# This is an Array of Hashes. Each entry in the Array corresponds to a
# field location in the AST node. The entry is a Hash. It could have one or
# two keys. It will always have the field name with a positional suffix as
# a key. It may also have the field name without the positional suffix if
# that field only exists in one position across all Rules in the RuleSet.
#
# @return [void]
def build_ast_fields(grammar)
field_ast_node_indexes = {}
field_indexes_across_all_rules = {}
@ast_fields = []
@rules.each do |rule|
rule.components.each_with_index do |component, i|
if component.is_a?(RuleSet) && component.optional?
component = component.option_target
end
if component.is_a?(Token)
node_name = "Token"
else
node_name = component.name
end
struct_name = "#{grammar.ast_prefix}#{node_name}#{grammar.ast_suffix}"
field_name = "p#{node_name}#{i + 1}"
unless field_ast_node_indexes[field_name]
field_ast_node_indexes[field_name] = @ast_fields.size
@ast_fields << {field_name => struct_name}
end
field_indexes_across_all_rules[node_name] ||= Set.new
field_indexes_across_all_rules[node_name] << field_ast_node_indexes[field_name]
rule.rule_set_node_field_index_map[i] = field_ast_node_indexes[field_name]
end
end
field_indexes_across_all_rules.each do |node_name, indexes_across_all_rules|
if indexes_across_all_rules.size == 1
# If this field was only seen in one position across all rules,
# then add an alias to the positional field name that does not
# include the position.
@ast_fields[indexes_across_all_rules.first]["p#{node_name}"] =
"#{grammar.ast_prefix}#{node_name}#{grammar.ast_suffix}"
end
end
end
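A simplified model of the naming scheme implemented by `build_ast_fields`, operating on plain arrays of component names instead of Rule/RuleSet objects. The `ast_fields` helper below is illustration-only; it ignores tokens, optional components, and the AST prefix/suffix handled by the real code.

```ruby
require "set"

# rules: arrays of component names for one RuleSet.
def ast_fields(rules)
  field_indexes = {}
  positions = Hash.new { |h, k| h[k] = Set.new }
  fields = []
  rules.each do |components|
    components.each_with_index do |name, i|
      field = "p#{name}#{i + 1}" # positional field, e.g. "pOne1"
      unless field_indexes[field]
        field_indexes[field] = fields.size
        fields << { field => name }
      end
      positions[name] << field_indexes[field]
    end
  end
  # Alias without the position suffix when a name has a unique position.
  positions.each do |name, idxs|
    fields[idxs.first]["p#{name}"] = name if idxs.size == 1
  end
  fields
end

# Dual -> One Two; Dual -> Two One: both names occupy two positions, so
# only the positional fields are generated, with no unsuffixed alias.
ast_fields([%w[One Two], %w[Two One]]).map(&:keys).flatten
# => ["pOne1", "pTwo2", "pTwo1", "pOne2"]
```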
end
end


@@ -1,3 +1,3 @@
class Propane
  VERSION = "1.4.0"
end


@@ -13,7 +13,7 @@ describe Propane do
  File.write("spec/run/testparser#{options[:name]}.propane", grammar)
end
def run_propane(options = {})
  @statics[:build_test_id] ||= 0
  @statics[:build_test_id] += 1
  if ENV["dist_specs"]
@@ -49,7 +49,12 @@ ENV["TERM"] = nil
EOF
  end
end
if options[:args]
  command += options[:args]
else
  command += %W[spec/run/testparser#{options[:name]}.propane spec/run/testparser#{options[:name]}.#{options[:language]} --log spec/run/testparser#{options[:name]}.log]
end
command += (options[:extra_args] || [])
if (options[:capture])
  stdout, stderr, status = Open3.capture3(*command)
  Results.new(stdout, stderr, status)
@@ -74,7 +79,7 @@ EOF
  expect(result).to be_truthy
end
def run_test
  stdout, stderr, status = Open3.capture3("spec/run/testparser")
  File.binwrite("spec/run/.stderr", stderr)
  File.binwrite("spec/run/.stdout", stdout)
@@ -112,6 +117,102 @@ EOF
FileUtils.mkdir_p("spec/run")
end
it "reports its version" do
results = run_propane(args: %w[--version], capture: true)
expect(results.stdout).to match /propane version \d+\.\d+/
expect(results.stderr).to eq ""
expect(results.status).to eq 0
end
it "shows help usage" do
results = run_propane(args: %w[-h], capture: true)
expect(results.stdout).to match /Usage/i
expect(results.stderr).to eq ""
expect(results.status).to eq 0
end
it "errors with unknown option" do
results = run_propane(args: %w[-i], capture: true)
expect(results.stderr).to match /Error: unknown option -i/
expect(results.status).to_not eq 0
end
it "errors when input and output files are not specified" do
results = run_propane(args: [], capture: true)
expect(results.stderr).to match /Error: specify input and output files/
expect(results.status).to_not eq 0
end
it "errors when input file is not readable" do
results = run_propane(args: %w[nope.txt out.d], capture: true)
expect(results.stderr).to match /Error: cannot read nope.txt/
expect(results.status).to_not eq 0
end
it "raises an error when a pattern referenced ptype has not been defined" do
write_grammar <<EOF
ptype yes = int;
/foo/ (yes) <<
>>
/bar/ (no) <<
>>
EOF
results = run_propane(capture: true)
expect(results.stderr).to match /Error: Line 4: ptype no not declared\. Declare with `ptype` statement\./
expect(results.status).to_not eq 0
end
it "raises an error when a token referenced ptype has not been defined" do
write_grammar <<EOF
ptype yes = int;
token foo (yes);
token bar (no);
EOF
results = run_propane(capture: true)
expect(results.stderr).to match /Error: Line 3: ptype no not declared\. Declare with `ptype` statement\./
expect(results.status).to_not eq 0
end
it "raises an error when a rule referenced ptype has not been defined" do
write_grammar <<EOF
ptype yes = int;
token xyz;
foo (yes) -> bar;
bar (no) -> xyz;
EOF
results = run_propane(capture: true)
expect(results.stderr).to match /Error: Line 4: ptype no not declared\. Declare with `ptype` statement\./
expect(results.status).to_not eq 0
end
it "warns on shift/reduce conflicts" do
write_grammar <<EOF
token a;
token b;
Start -> As? b?;
As -> a As2?;
As2 -> b a As2?;
EOF
results = run_propane(capture: true)
expect(results.stderr).to eq ""
expect(results.status).to eq 0
expect(File.binread("spec/run/testparser.log")).to match %r{Shift/Reduce conflict \(state \d+\) between token b and rule As2\? \(defined on line 4\)}
end
it "errors on shift/reduce conflicts with -w" do
write_grammar <<EOF
token a;
token b;
Start -> As? b?;
As -> a As2?;
As2 -> b a As2?;
EOF
results = run_propane(extra_args: %w[-w], capture: true)
expect(results.stderr).to match %r{Shift/Reduce conflict \(state \d+\) between token b and rule As2\? \(defined on line 4\)}m
expect(results.status).to_not eq 0
expect(File.binread("spec/run/testparser.log")).to match %r{Shift/Reduce conflict \(state \d+\) between token b and rule As2\? \(defined on line 4\)}
end
%w[d c].each do |language|
  context "#{language.upcase} language" do
@@ -123,14 +224,12 @@ token plus /\\+/;
token times /\\*/;
drop /\\s+/;
Start -> Foo;
Foo -> int <<>>
Foo -> plus <<>>
EOF
run_propane(language: language)
compile("spec/test_lexer.#{language}", language: language)
results = run_test
expect(results.stderr).to eq ""
expect(results.status).to eq 0
end
@@ -149,9 +248,7 @@ token int /\\d+/ <<
}
$$ = v;
>>
Start -> int << $$ = $1; >>
EOF
when "d"
write_grammar <<EOF
@@ -165,14 +262,12 @@ token int /\\d+/ <<
}
$$ = v;
>>
Start -> int << $$ = $1; >>
EOF
end
run_propane(language: language)
compile("spec/test_lexer_unknown_character.#{language}", language: language)
results = run_test
expect(results.stderr).to eq ""
expect(results.status).to eq 0
end
@@ -190,7 +285,7 @@ E -> B;
B -> zero;
B -> one;
EOF
run_propane(language: language)
end
it "generates a parser that does basic math - user guide example" do
@@ -219,33 +314,15 @@ token lparen /\\(/;
token rparen /\\)/;
drop /\\s+/;
Start -> E1 << $$ = $1; >>
E1 -> E2 << $$ = $1; >>
E1 -> E1 plus E2 << $$ = $1 + $3; >>
E2 -> E3 << $$ = $1; >>
E2 -> E2 times E3 << $$ = $1 * $3; >>
E3 -> E4 << $$ = $1; >>
E3 -> E3 power E4 << $$ = (size_t)pow($1, $3); >>
E4 -> integer << $$ = $1; >>
E4 -> lparen E1 rparen << $$ = $2; >>
EOF
when "d"
write_grammar <<EOF
@@ -271,38 +348,20 @@ token lparen /\\(/;
token rparen /\\)/;
drop /\\s+/;
Start -> E1 << $$ = $1; >>
E1 -> E2 << $$ = $1; >>
E1 -> E1 plus E2 << $$ = $1 + $3; >>
E2 -> E3 << $$ = $1; >>
E2 -> E2 times E3 << $$ = $1 * $3; >>
E3 -> E4 << $$ = $1; >>
E3 -> E3 power E4 << $$ = pow($1, $3); >>
E4 -> integer << $$ = $1; >>
E4 -> lparen E1 rparen << $$ = $2; >>
EOF
end
run_propane(language: language)
compile("spec/test_basic_math_grammar.#{language}", language: language)
results = run_test
expect(results.stderr).to eq ""
expect(results.status).to eq 0
end
@@ -314,7 +373,7 @@ Start -> E;
E -> one E;
E -> one;
EOF
run_propane(language: language)
end
it "distinguishes between multiple identical rules with lookahead symbol" do
@@ -326,9 +385,9 @@ Start -> R2 b;
R1 -> a b;
R2 -> a b;
EOF
run_propane(language: language)
compile("spec/test_parser_identical_rules_lookahead.#{language}", language: language)
results = run_test
expect(results.status).to eq 0
end
@@ -341,9 +400,9 @@ Start -> a R1;
Start -> b R1;
R1 -> b;
EOF
run_propane(language: language)
compile("spec/test_parser_rule_from_multiple_states.#{language}", language: language)
results = run_test
expect(results.status).to eq 0
end
@@ -376,9 +435,9 @@ Abcs -> ;
Abcs -> abc Abcs;
EOF
end
run_propane(language: language)
compile("spec/test_user_code.#{language}", language: language)
results = run_test
expect(results.status).to eq 0
verify_lines(results.stdout, [
"abc!",
@@ -408,15 +467,13 @@ EOF
import std.stdio;
>>
token abc;
/def/ << writeln("def!"); >>
Start -> abc;
EOF
end
run_propane(language: language)
compile("spec/test_pattern.#{language}", language: language)
results = run_test
expect(results.status).to eq 0
verify_lines(results.stdout, [
"def!",
@@ -435,9 +492,7 @@ EOF
#include <stdio.h>
>>
token abc;
/def/ << printf("def!\\n"); >>
/ghi/ <<
printf("ghi!\\n");
return $token(abc);
@@ -450,9 +505,7 @@ EOF
import std.stdio;
>>
token abc;
/def/ << writeln("def!"); >>
/ghi/ <<
writeln("ghi!");
return $token(abc);
@@ -460,9 +513,9 @@ token abc;
Start -> abc;
EOF
end
run_propane(language: language)
compile("spec/test_return_token_from_pattern.#{language}", language: language)
results = run_test
expect(results.status).to eq 0
verify_lines(results.stdout, [
"def!",
@@ -518,9 +571,9 @@ string: /"/ <<
Start -> abc string def;
EOF
end
run_propane(language: language)
compile("spec/test_lexer_modes.#{language}", language: language)
results = run_test
expect(results.status).to eq 0
verify_lines(results.stdout, [
"begin string mode",
@@ -541,15 +594,9 @@ EOF
>>
token a;
token b;
Start -> A B << printf("Start!\\n"); >>
A -> a << printf("A!\\n"); >>
B -> b << printf("B!\\n"); >>
EOF
when "d"
write_grammar <<EOF
@@ -558,20 +605,14 @@ import std.stdio;
>>
token a;
token b;
Start -> A B << writeln("Start!"); >>
A -> a << writeln("A!"); >>
B -> b << writeln("B!"); >>
EOF
end
run_propane(language: language)
compile("spec/test_parser_rule_user_code.#{language}", language: language)
results = run_test
expect(results.status).to eq 0
verify_lines(results.stdout, [
"A!",
@@ -584,19 +625,13 @@
write_grammar <<EOF
ptype #{language == "c" ? "uint32_t" : "uint"};
token a;
Start -> As << $$ = $1; >>
As -> << $$ = 0u; >>
As -> As a << $$ = $1 + 1u; >>
EOF
run_propane(language: language)
compile("spec/test_parsing_lists.#{language}", language: language)
results = run_test
expect(results.status).to eq 0
expect(results.stderr).to eq ""
end
@@ -615,9 +650,9 @@ Start -> b E d;
E -> e;
F -> e;
EOF
results = run_propane(capture: true, language: language)
expect(results.status).to_not eq 0
expect(results.stderr).to match %r{Error: reduce/reduce conflict \(state \d+\) between rule E#\d+ \(defined on line 10\) and rule F#\d+ \(defined on line 11\)}
end
it "provides matched text to user code blocks" do
@@ -647,9 +682,9 @@ token id /[a-zA-Z_][a-zA-Z0-9_]*/ <<
Start -> id;
EOF
end
run_propane(language: language)
compile("spec/test_lexer_match_text.#{language}", language: language)
results = run_test
expect(results.status).to eq 0
verify_lines(results.stdout, [
"Matched token is identifier_123",
@@ -680,9 +715,9 @@ Start -> word <<
>>
EOF
end
run_propane(language: language)
compile("spec/test_lexer_result_value.#{language}", language: language)
results = run_test
expect(results.stderr).to eq ""
expect(results.status).to eq 0
end
@@ -695,16 +730,16 @@ drop /\\s+/;
Start -> a num Start;
Start -> a num;
EOF
run_propane(language: language)
compile("spec/test_error_positions.#{language}", language: language)
results = run_test
expect(results.stderr).to eq ""
expect(results.status).to eq 0
end
it "allows creating a JSON parser" do
write_grammar(File.read("spec/json_parser.#{language}.propane"))
run_propane(language: language)
compile(["spec/test_parsing_json.#{language}", "spec/json_types.#{language}"], language: language)
end
@@ -716,16 +751,352 @@ token num /\\d+/;
drop /\\s+/;
Start -> a num;
EOF
run_propane(name: "myp1", language: language)
write_grammar(<<EOF, name: "myp2")
prefix myp2_;
token b;
token c;
Start -> b c b;
EOF
run_propane(name: "myp2", language: language)
compile("spec/test_multiple_parsers.#{language}", parsers: %w[myp1 myp2], language: language)
results = run_test
expect(results.stderr).to eq ""
expect(results.status).to eq 0
end
it "allows the user to terminate the lexer" do
write_grammar <<EOF
token a;
token b <<
$terminate(8675309);
>>
token c;
Start -> Any;
Any -> a;
Any -> b;
Any -> c;
EOF
run_propane(language: language)
compile("spec/test_user_terminate_lexer.#{language}", language: language)
results = run_test
expect(results.stderr).to eq ""
expect(results.status).to eq 0
end
it "allows the user to terminate the parser" do
write_grammar <<EOF
token a;
token b;
token c;
Start -> Any;
Any -> a Any;
Any -> b Any << $terminate(4200); >>
Any -> c Any;
Any -> ;
EOF
run_propane(language: language)
compile("spec/test_user_terminate.#{language}", language: language)
results = run_test
expect(results.stderr).to eq ""
expect(results.status).to eq 0
end
it "matches backslash escape sequences" do
case language
when "c"
write_grammar <<EOF
<<
#include <stdio.h>
>>
tokenid t;
/\\a/ << printf("A\\n"); >>
/\\b/ << printf("B\\n"); >>
/\\t/ << printf("T\\n"); >>
/\\n/ << printf("N\\n"); >>
/\\v/ << printf("V\\n"); >>
/\\f/ << printf("F\\n"); >>
/\\r/ << printf("R\\n"); >>
/t/ << return $token(t); >>
Start -> t;
EOF
when "d"
write_grammar <<EOF
<<
import std.stdio;
>>
tokenid t;
/\\a/ << writeln("A"); >>
/\\b/ << writeln("B"); >>
/\\t/ << writeln("T"); >>
/\\n/ << writeln("N"); >>
/\\v/ << writeln("V"); >>
/\\f/ << writeln("F"); >>
/\\r/ << writeln("R"); >>
/t/ <<
return $token(t);
>>
Start -> t;
EOF
end
run_propane(language: language)
compile("spec/test_match_backslashes.#{language}", language: language)
results = run_test
expect(results.stderr).to eq ""
expect(results.status).to eq 0
verify_lines(results.stdout, [
"A",
"B",
"T",
"N",
"V",
"F",
"R",
])
end
it "handles when an item set leads to itself" do
write_grammar <<EOF
token one;
token two;
Start -> Opt one Start;
Start -> ;
Opt -> two;
Opt -> ;
EOF
run_propane(language: language)
end
it "generates an AST" do
write_grammar <<EOF
ast;
ptype int;
token a << $$ = 11; >>
token b << $$ = 22; >>
token one /1/;
token two /2/;
token comma /,/ <<
$$ = 42;
>>
token lparen /\\(/;
token rparen /\\)/;
drop /\\s+/;
Start -> Items;
Items -> Item ItemsMore;
Items -> ;
ItemsMore -> comma Item ItemsMore;
ItemsMore -> ;
Item -> a;
Item -> b;
Item -> lparen Item rparen;
Item -> Dual;
Dual -> One Two;
Dual -> Two One;
One -> one;
Two -> two;
EOF
run_propane(language: language)
compile("spec/test_ast.#{language}", language: language)
results = run_test
expect(results.stderr).to eq ""
expect(results.status).to eq 0
end
it "supports AST node prefix and suffix" do
write_grammar <<EOF
ast;
ast_prefix P ;
ast_suffix S;
ptype int;
token a << $$ = 11; >>
token b << $$ = 22; >>
token one /1/;
token two /2/;
token comma /,/ <<
$$ = 42;
>>
token lparen /\\(/;
token rparen /\\)/;
drop /\\s+/;
Start -> Items;
Items -> Item ItemsMore;
Items -> ;
ItemsMore -> comma Item ItemsMore;
ItemsMore -> ;
Item -> a;
Item -> b;
Item -> lparen Item rparen;
Item -> Dual;
Dual -> One Two;
Dual -> Two One;
One -> one;
Two -> two;
EOF
run_propane(language: language)
compile("spec/test_ast_ps.#{language}", language: language)
results = run_test
expect(results.stderr).to eq ""
expect(results.status).to eq 0
end
it "allows specifying a different start rule" do
write_grammar <<EOF
token hi;
start Top;
Top -> hi;
EOF
run_propane(language: language)
compile("spec/test_start_rule.#{language}", language: language)
end
it "allows specifying a different start rule with AST generation" do
write_grammar <<EOF
ast;
token hi;
start Top;
Top -> hi;
EOF
run_propane(language: language)
compile("spec/test_start_rule_ast.#{language}", language: language)
end
it "allows marking a rule component as optional" do
if language == "d"
write_grammar <<EOF
<<
import std.stdio;
>>
ptype int;
ptype float = float;
ptype string = string;
token a (float) << $$ = 1.5; >>
token b << $$ = 2; >>
token c << $$ = 3; >>
token d << $$ = 4; >>
Start -> a? b R? <<
writeln("a: ", $1);
writeln("b: ", $2);
writeln("R: ", $3);
>>
R -> c d << $$ = "cd"; >>
R (string) -> d c << $$ = "dc"; >>
EOF
else
write_grammar <<EOF
<<
#include <stdio.h>
>>
ptype int;
ptype float = float;
ptype string = char *;
token a (float) << $$ = 1.5; >>
token b << $$ = 2; >>
token c << $$ = 3; >>
token d << $$ = 4; >>
Start -> a? b R? <<
printf("a: %.1f\\n", $1);
printf("b: %d\\n", $2);
printf("R: %s\\n", $3 == NULL ? "" : $3);
>>
R -> c d << $$ = "cd"; >>
R (string) -> d c << $$ = "dc"; >>
EOF
end
run_propane(language: language)
compile("spec/test_optional_rule_component.#{language}", language: language)
results = run_test
expect(results.stderr).to eq ""
expect(results.status).to eq 0
verify_lines(results.stdout, [
"a: 0#{language == "d" ? "" : ".0"}",
"b: 2",
"R: ",
"a: 1.5",
"b: 2",
"R: cd",
"a: 1.5",
"b: 2",
"R: dc",
])
end
it "allows marking a rule component as optional in AST generation mode" do
if language == "d"
write_grammar <<EOF
ast;
<<
import std.stdio;
>>
token a;
token b;
token c;
token d;
Start -> a? b R?;
R -> c d;
R -> d c;
EOF
else
write_grammar <<EOF
ast;
<<
#include <stdio.h>
>>
token a;
token b;
token c;
token d;
Start -> a? b R?;
R -> c d;
R -> d c;
EOF
end
run_propane(language: language)
compile("spec/test_optional_rule_component_ast.#{language}", language: language)
results = run_test
expect(results.stderr).to eq ""
expect(results.status).to eq 0
end
it "stores the token position in the AST Token node" do
write_grammar <<EOF
ast;
token a;
token b;
token c;
drop /\\s+/;
Start -> T T T;
T -> a;
T -> b;
T -> c;
EOF
run_propane(language: language)
compile("spec/test_ast_token_positions.#{language}", language: language)
results = run_test
expect(results.stderr).to eq ""
expect(results.status).to eq 0
end

spec/test_ast.c (new file)
@@ -0,0 +1,55 @@
#include "testparser.h"
#include <assert.h>
#include <string.h>
#include "testutils.h"
int main()
{
char const * input = "a, ((b)), b";
p_context_t context;
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert_eq(P_SUCCESS, p_parse(&context));
Start * start = p_result(&context);
assert(start->pItems1 != NULL);
assert(start->pItems != NULL);
Items * items = start->pItems;
assert(items->pItem != NULL);
assert(items->pItem->pToken1 != NULL);
assert_eq(TOKEN_a, items->pItem->pToken1->token);
assert_eq(11, items->pItem->pToken1->pvalue);
assert(items->pItemsMore != NULL);
ItemsMore * itemsmore = items->pItemsMore;
assert(itemsmore->pItem != NULL);
assert(itemsmore->pItem->pItem != NULL);
assert(itemsmore->pItem->pItem->pItem != NULL);
assert(itemsmore->pItem->pItem->pItem->pToken1 != NULL);
assert_eq(TOKEN_b, itemsmore->pItem->pItem->pItem->pToken1->token);
assert_eq(22, itemsmore->pItem->pItem->pItem->pToken1->pvalue);
assert(itemsmore->pItemsMore != NULL);
itemsmore = itemsmore->pItemsMore;
assert(itemsmore->pItem != NULL);
assert(itemsmore->pItem->pToken1 != NULL);
assert_eq(TOKEN_b, itemsmore->pItem->pToken1->token);
assert_eq(22, itemsmore->pItem->pToken1->pvalue);
assert(itemsmore->pItemsMore == NULL);
input = "";
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert_eq(P_SUCCESS, p_parse(&context));
start = p_result(&context);
assert(start->pItems == NULL);
input = "2 1";
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert_eq(P_SUCCESS, p_parse(&context));
start = p_result(&context);
assert(start->pItems != NULL);
assert(start->pItems->pItem != NULL);
assert(start->pItems->pItem->pDual != NULL);
assert(start->pItems->pItem->pDual->pTwo1 != NULL);
assert(start->pItems->pItem->pDual->pOne2 != NULL);
assert(start->pItems->pItem->pDual->pTwo2 == NULL);
assert(start->pItems->pItem->pDual->pOne1 == NULL);
return 0;
}

spec/test_ast.d (new file)
@@ -0,0 +1,57 @@
import testparser;
import std.stdio;
import testutils;
int main()
{
return 0;
}
unittest
{
string input = "a, ((b)), b";
p_context_t context;
p_context_init(&context, input);
assert_eq(P_SUCCESS, p_parse(&context));
Start * start = p_result(&context);
assert(start.pItems1 !is null);
assert(start.pItems !is null);
Items * items = start.pItems;
assert(items.pItem !is null);
assert(items.pItem.pToken1 !is null);
assert_eq(TOKEN_a, items.pItem.pToken1.token);
assert_eq(11, items.pItem.pToken1.pvalue);
assert(items.pItemsMore !is null);
ItemsMore * itemsmore = items.pItemsMore;
assert(itemsmore.pItem !is null);
assert(itemsmore.pItem.pItem !is null);
assert(itemsmore.pItem.pItem.pItem !is null);
assert(itemsmore.pItem.pItem.pItem.pToken1 !is null);
assert_eq(TOKEN_b, itemsmore.pItem.pItem.pItem.pToken1.token);
assert_eq(22, itemsmore.pItem.pItem.pItem.pToken1.pvalue);
assert(itemsmore.pItemsMore !is null);
itemsmore = itemsmore.pItemsMore;
assert(itemsmore.pItem !is null);
assert(itemsmore.pItem.pToken1 !is null);
assert_eq(TOKEN_b, itemsmore.pItem.pToken1.token);
assert_eq(22, itemsmore.pItem.pToken1.pvalue);
assert(itemsmore.pItemsMore is null);
input = "";
p_context_init(&context, input);
assert_eq(P_SUCCESS, p_parse(&context));
start = p_result(&context);
assert(start.pItems is null);
input = "2 1";
p_context_init(&context, input);
assert_eq(P_SUCCESS, p_parse(&context));
start = p_result(&context);
assert(start.pItems !is null);
assert(start.pItems.pItem !is null);
assert(start.pItems.pItem.pDual !is null);
assert(start.pItems.pItem.pDual.pTwo1 !is null);
assert(start.pItems.pItem.pDual.pOne2 !is null);
assert(start.pItems.pItem.pDual.pTwo2 is null);
assert(start.pItems.pItem.pDual.pOne1 is null);
}

spec/test_ast_ps.c (new file)
@@ -0,0 +1,55 @@
#include "testparser.h"
#include <assert.h>
#include <string.h>
#include "testutils.h"
int main()
{
char const * input = "a, ((b)), b";
p_context_t context;
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert_eq(P_SUCCESS, p_parse(&context));
PStartS * start = p_result(&context);
assert(start->pItems1 != NULL);
assert(start->pItems != NULL);
PItemsS * items = start->pItems;
assert(items->pItem != NULL);
assert(items->pItem->pToken1 != NULL);
assert_eq(TOKEN_a, items->pItem->pToken1->token);
assert_eq(11, items->pItem->pToken1->pvalue);
assert(items->pItemsMore != NULL);
PItemsMoreS * itemsmore = items->pItemsMore;
assert(itemsmore->pItem != NULL);
assert(itemsmore->pItem->pItem != NULL);
assert(itemsmore->pItem->pItem->pItem != NULL);
assert(itemsmore->pItem->pItem->pItem->pToken1 != NULL);
assert_eq(TOKEN_b, itemsmore->pItem->pItem->pItem->pToken1->token);
assert_eq(22, itemsmore->pItem->pItem->pItem->pToken1->pvalue);
assert(itemsmore->pItemsMore != NULL);
itemsmore = itemsmore->pItemsMore;
assert(itemsmore->pItem != NULL);
assert(itemsmore->pItem->pToken1 != NULL);
assert_eq(TOKEN_b, itemsmore->pItem->pToken1->token);
assert_eq(22, itemsmore->pItem->pToken1->pvalue);
assert(itemsmore->pItemsMore == NULL);
input = "";
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert_eq(P_SUCCESS, p_parse(&context));
start = p_result(&context);
assert(start->pItems == NULL);
input = "2 1";
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert_eq(P_SUCCESS, p_parse(&context));
start = p_result(&context);
assert(start->pItems != NULL);
assert(start->pItems->pItem != NULL);
assert(start->pItems->pItem->pDual != NULL);
assert(start->pItems->pItem->pDual->pTwo1 != NULL);
assert(start->pItems->pItem->pDual->pOne2 != NULL);
assert(start->pItems->pItem->pDual->pTwo2 == NULL);
assert(start->pItems->pItem->pDual->pOne1 == NULL);
return 0;
}

spec/test_ast_ps.d (new file)
@@ -0,0 +1,57 @@
import testparser;
import std.stdio;
import testutils;
int main()
{
return 0;
}
unittest
{
string input = "a, ((b)), b";
p_context_t context;
p_context_init(&context, input);
assert_eq(P_SUCCESS, p_parse(&context));
PStartS * start = p_result(&context);
assert(start.pItems1 !is null);
assert(start.pItems !is null);
PItemsS * items = start.pItems;
assert(items.pItem !is null);
assert(items.pItem.pToken1 !is null);
assert_eq(TOKEN_a, items.pItem.pToken1.token);
assert_eq(11, items.pItem.pToken1.pvalue);
assert(items.pItemsMore !is null);
PItemsMoreS * itemsmore = items.pItemsMore;
assert(itemsmore.pItem !is null);
assert(itemsmore.pItem.pItem !is null);
assert(itemsmore.pItem.pItem.pItem !is null);
assert(itemsmore.pItem.pItem.pItem.pToken1 !is null);
assert_eq(TOKEN_b, itemsmore.pItem.pItem.pItem.pToken1.token);
assert_eq(22, itemsmore.pItem.pItem.pItem.pToken1.pvalue);
assert(itemsmore.pItemsMore !is null);
itemsmore = itemsmore.pItemsMore;
assert(itemsmore.pItem !is null);
assert(itemsmore.pItem.pToken1 !is null);
assert_eq(TOKEN_b, itemsmore.pItem.pToken1.token);
assert_eq(22, itemsmore.pItem.pToken1.pvalue);
assert(itemsmore.pItemsMore is null);
input = "";
p_context_init(&context, input);
assert_eq(P_SUCCESS, p_parse(&context));
start = p_result(&context);
assert(start.pItems is null);
input = "2 1";
p_context_init(&context, input);
assert_eq(P_SUCCESS, p_parse(&context));
start = p_result(&context);
assert(start.pItems !is null);
assert(start.pItems.pItem !is null);
assert(start.pItems.pItem.pDual !is null);
assert(start.pItems.pItem.pDual.pTwo1 !is null);
assert(start.pItems.pItem.pDual.pOne2 !is null);
assert(start.pItems.pItem.pDual.pTwo2 is null);
assert(start.pItems.pItem.pDual.pOne1 is null);
}

@@ -0,0 +1,33 @@
#include "testparser.h"
#include <assert.h>
#include <string.h>
#include "testutils.h"
int main()
{
char const * input = "abc";
p_context_t context;
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert(p_parse(&context) == P_SUCCESS);
Start * start = p_result(&context);
assert_eq(0, start->pT1->pToken->position.row);
assert_eq(0, start->pT1->pToken->position.col);
assert_eq(0, start->pT2->pToken->position.row);
assert_eq(1, start->pT2->pToken->position.col);
assert_eq(0, start->pT3->pToken->position.row);
assert_eq(2, start->pT3->pToken->position.col);
input = "\n\n a\nc\n\n a";
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert(p_parse(&context) == P_SUCCESS);
start = p_result(&context);
assert_eq(2, start->pT1->pToken->position.row);
assert_eq(2, start->pT1->pToken->position.col);
assert_eq(3, start->pT2->pToken->position.row);
assert_eq(0, start->pT2->pToken->position.col);
assert_eq(5, start->pT3->pToken->position.row);
assert_eq(5, start->pT3->pToken->position.col);
return 0;
}

@@ -0,0 +1,34 @@
import testparser;
import std.stdio;
import testutils;
int main()
{
return 0;
}
unittest
{
string input = "abc";
p_context_t context;
p_context_init(&context, input);
assert(p_parse(&context) == P_SUCCESS);
Start * start = p_result(&context);
assert_eq(0, start.pT1.pToken.position.row);
assert_eq(0, start.pT1.pToken.position.col);
assert_eq(0, start.pT2.pToken.position.row);
assert_eq(1, start.pT2.pToken.position.col);
assert_eq(0, start.pT3.pToken.position.row);
assert_eq(2, start.pT3.pToken.position.col);
input = "\n\n a\nc\n\n a";
p_context_init(&context, input);
assert(p_parse(&context) == P_SUCCESS);
start = p_result(&context);
assert_eq(2, start.pT1.pToken.position.row);
assert_eq(2, start.pT1.pToken.position.col);
assert_eq(3, start.pT2.pToken.position.row);
assert_eq(0, start.pT2.pToken.position.col);
assert_eq(5, start.pT3.pToken.position.row);
assert_eq(5, start.pT3.pToken.position.col);
}

@@ -14,14 +14,14 @@ int main()
     assert(p_parse(&context) == P_UNEXPECTED_TOKEN);
     assert(p_position(&context).row == 2);
     assert(p_position(&context).col == 3);
-    assert(context.token == TOKEN_a);
+    assert(p_token(&context) == TOKEN_a);
     input = "12";
     p_context_init(&context, (uint8_t const *)input, strlen(input));
     assert(p_parse(&context) == P_UNEXPECTED_TOKEN);
     assert(p_position(&context).row == 0);
     assert(p_position(&context).col == 0);
-    assert(context.token == TOKEN_num);
+    assert(p_token(&context) == TOKEN_num);
     input = "a 12\n\nab";
     p_context_init(&context, (uint8_t const *)input, strlen(input));
@@ -35,5 +35,8 @@ int main()
     assert(p_position(&context).row == 5);
     assert(p_position(&context).col == 4);
+    assert(strcmp(p_token_names[TOKEN_a], "a") == 0);
+    assert(strcmp(p_token_names[TOKEN_num], "num") == 0);
     return 0;
 }

@@ -17,13 +17,13 @@ unittest
     p_context_init(&context, input);
     assert(p_parse(&context) == P_UNEXPECTED_TOKEN);
     assert(p_position(&context) == p_position_t(2, 3));
-    assert(context.token == TOKEN_a);
+    assert(p_token(&context) == TOKEN_a);
     input = "12";
     p_context_init(&context, input);
     assert(p_parse(&context) == P_UNEXPECTED_TOKEN);
     assert(p_position(&context) == p_position_t(0, 0));
-    assert(context.token == TOKEN_num);
+    assert(p_token(&context) == TOKEN_num);
     input = "a 12\n\nab";
     p_context_init(&context, input);
@@ -34,4 +34,7 @@ unittest
     p_context_init(&context, input);
     assert(p_parse(&context) == P_DECODE_ERROR);
     assert(p_position(&context) == p_position_t(5, 4));
+    assert(p_token_names[TOKEN_a] == "a");
+    assert(p_token_names[TOKEN_num] == "num");
 }

@@ -0,0 +1,13 @@
#include "testparser.h"
#include <assert.h>
#include <string.h>
int main()
{
char const * input = "\a\b\t\n\v\f\rt";
p_context_t context;
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert(p_parse(&context) == P_SUCCESS);
return 0;
}

@@ -0,0 +1,15 @@
import testparser;
import std.stdio;
int main()
{
return 0;
}
unittest
{
string input = "\a\b\t\n\v\f\rt";
p_context_t context;
p_context_init(&context, input);
assert(p_parse(&context) == P_SUCCESS);
}

@@ -0,0 +1,22 @@
#include "testparser.h"
#include <assert.h>
#include <string.h>
int main()
{
char const * input = "b";
p_context_t context;
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert(p_parse(&context) == P_SUCCESS);
input = "abcd";
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert(p_parse(&context) == P_SUCCESS);
input = "abdc";
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert(p_parse(&context) == P_SUCCESS);
return 0;
}

@@ -0,0 +1,23 @@
import testparser;
import std.stdio;
int main()
{
return 0;
}
unittest
{
string input = "b";
p_context_t context;
p_context_init(&context, input);
assert(p_parse(&context) == P_SUCCESS);
input = "abcd";
p_context_init(&context, input);
assert(p_parse(&context) == P_SUCCESS);
input = "abdc";
p_context_init(&context, input);
assert(p_parse(&context) == P_SUCCESS);
}

@@ -0,0 +1,42 @@
#include "testparser.h"
#include <assert.h>
#include <string.h>
#include "testutils.h"
int main()
{
char const * input = "b";
p_context_t context;
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert(p_parse(&context) == P_SUCCESS);
Start * start = p_result(&context);
assert(start->pToken1 == NULL);
assert(start->pToken2 != NULL);
assert_eq(TOKEN_b, start->pToken2->token);
assert(start->pR3 == NULL);
assert(start->pR == NULL);
input = "abcd";
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert(p_parse(&context) == P_SUCCESS);
start = p_result(&context);
assert(start->pToken1 != NULL);
assert_eq(TOKEN_a, start->pToken1->token);
assert(start->pToken2 != NULL);
assert(start->pR3 != NULL);
assert(start->pR != NULL);
assert(start->pR == start->pR3);
assert_eq(TOKEN_c, start->pR->pToken1->token);
input = "bdc";
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert(p_parse(&context) == P_SUCCESS);
start = p_result(&context);
assert(start->pToken1 == NULL);
assert(start->pToken2 != NULL);
assert(start->pR != NULL);
assert_eq(TOKEN_d, start->pR->pToken1->token);
return 0;
}

@@ -0,0 +1,43 @@
import testparser;
import std.stdio;
import testutils;
int main()
{
return 0;
}
unittest
{
string input = "b";
p_context_t context;
p_context_init(&context, input);
assert(p_parse(&context) == P_SUCCESS);
Start * start = p_result(&context);
assert(start.pToken1 is null);
assert(start.pToken2 !is null);
assert_eq(TOKEN_b, start.pToken2.token);
assert(start.pR3 is null);
assert(start.pR is null);
input = "abcd";
p_context_init(&context, input);
assert(p_parse(&context) == P_SUCCESS);
start = p_result(&context);
assert(start.pToken1 !is null);
assert_eq(TOKEN_a, start.pToken1.token);
assert(start.pToken2 !is null);
assert(start.pR3 !is null);
assert(start.pR !is null);
assert(start.pR == start.pR3);
assert_eq(TOKEN_c, start.pR.pToken1.token);
input = "bdc";
p_context_init(&context, input);
assert(p_parse(&context) == P_SUCCESS);
start = p_result(&context);
assert(start.pToken1 is null);
assert(start.pToken2 !is null);
assert(start.pR !is null);
assert_eq(TOKEN_d, start.pR.pToken1.token);
}

spec/test_start_rule.c (new file)
@@ -0,0 +1,9 @@
#include "testparser.h"
#include <assert.h>
#include <string.h>
#include "testutils.h"
int main()
{
return 0;
}

spec/test_start_rule.d (new file)
@@ -0,0 +1,8 @@
import testparser;
import std.stdio;
import testutils;
int main()
{
return 0;
}

@@ -0,0 +1,17 @@
#include "testparser.h"
#include <assert.h>
#include <string.h>
#include "testutils.h"
int main()
{
char const * input = "hi";
p_context_t context;
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert_eq(P_SUCCESS, p_parse(&context));
Top * top = p_result(&context);
assert(top->pToken != NULL);
assert_eq(TOKEN_hi, top->pToken->token);
return 0;
}

@@ -0,0 +1,19 @@
import testparser;
import std.stdio;
import testutils;
int main()
{
return 0;
}
unittest
{
string input = "hi";
p_context_t context;
p_context_init(&context, input);
assert_eq(P_SUCCESS, p_parse(&context));
Top * top = p_result(&context);
assert(top.pToken !is null);
assert_eq(TOKEN_hi, top.pToken.token);
}

@@ -0,0 +1,19 @@
#include "testparser.h"
#include <assert.h>
#include <stdio.h>
#include <string.h>
int main()
{
char const * input = "aacc";
p_context_t context;
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert(p_parse(&context) == P_SUCCESS);
input = "abc";
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert(p_parse(&context) == P_USER_TERMINATED);
assert(p_user_terminate_code(&context) == 4200);
return 0;
}

@@ -0,0 +1,20 @@
import testparser;
import std.stdio;
int main()
{
return 0;
}
unittest
{
string input = "aacc";
p_context_t context;
p_context_init(&context, input);
assert(p_parse(&context) == P_SUCCESS);
input = "abc";
p_context_init(&context, input);
assert(p_parse(&context) == P_USER_TERMINATED);
assert(p_user_terminate_code(&context) == 4200);
}

@@ -0,0 +1,19 @@
#include "testparser.h"
#include <assert.h>
#include <stdio.h>
#include <string.h>
int main()
{
char const * input = "a";
p_context_t context;
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert(p_parse(&context) == P_SUCCESS);
input = "b";
p_context_init(&context, (uint8_t const *)input, strlen(input));
assert(p_parse(&context) == P_USER_TERMINATED);
assert(p_user_terminate_code(&context) == 8675309);
return 0;
}

@@ -0,0 +1,20 @@
import testparser;
import std.stdio;
int main()
{
return 0;
}
unittest
{
string input = "a";
p_context_t context;
p_context_init(&context, input);
assert(p_parse(&context) == P_SUCCESS);
input = "b";
p_context_init(&context, input);
assert(p_parse(&context) == P_USER_TERMINATED);
assert(p_user_terminate_code(&context) == 8675309);
}