v1.5.1

Only calculate lookahead tokens when needed - #28
Lookahead tokens are only needed if either: (1) there is more than one rule that could be reduced in a given parser state, or (2) there are shift actions for a state and at least one rule that could be reduced in the same state (to warn about shift/reduce conflicts).
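
As a rough illustration of the two conditions above, the per-state check could be sketched as follows. This is not the code from this commit: the method name `lookahead_needed?` and its two count parameters are hypothetical, chosen only to mirror the wording of the description.

```ruby
# Hypothetical helper; an illustrative sketch, not Propane's actual code.
#
# reduce_rule_count:  number of rules that could be reduced in the state
# shift_action_count: number of shift actions available in the state
def lookahead_needed?(reduce_rule_count, shift_action_count)
  # (1) More than one rule could be reduced: lookahead tokens are needed
  #     to decide which rule to reduce.
  return true if reduce_rule_count > 1

  # (2) Shift actions and at least one reducible rule share the state:
  #     lookahead tokens are needed to warn about shift/reduce conflicts.
  return true if shift_action_count > 0 && reduce_rule_count >= 1

  # Otherwise (at most one reducible rule and no competing shift actions),
  # the lookahead computation can be skipped for this state.
  false
end
```

Under these assumptions, a state with a single reducible rule and no shift actions skips the lookahead calculation entirely, which is presumably where the performance improvement comes from.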
2024-07-26 22:30:48 -04:00 · 2024-07-26 22:08:25 -04:00 · 2024-07-26 21:36:41 -04:00 · 2024-07-25 20:33:15 -04:00 · 2024-07-25 20:02:00 -04:00 · 2024-07-25 10:42:43 -04:00
51 changed files with 3346 additions and 462 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -1,3 +1,62 @@
+## v1.5.1
+
+### Improvements
+
+- Improve performance (#28)
+
+## v1.5.0
+
+### New Features
+
+- Track start and end text positions for tokens and rules in AST node structures (#27)
+- Add warnings for shift/reduce conflicts to log file (#25)
+- Add -w command line switch to treat warnings as errors and output to stderr (#26)
+- Add rule field aliases (#24)
+
+### Improvements
+
+- Show line numbers of rules on conflict (#23)
+
+## v1.4.0
+
+### New Features
+
+- Allow user to specify AST node name prefix or suffix
+- Allow specifying the start rule name
+- Allow rule terms to be marked as optional
+
+### Improvements
+
+- Give a better error message when a referenced ptype has not been declared
+
+## v1.3.0
+
+### New Features
+
+- Add AST generation (#22)
+
+## v1.2.0
+
+### New Features
+
+- Allow one line user code blocks (#21)
+- Add backslash escape codes (#19)
+- Add API to access unexpected token found (#18)
+- Add token_names API (#17)
+- Add D example to user guide for p_context_init() (#16)
+- Allow user termination from lexer code blocks (#15)
+
+### Fixes
+
+- Fix generator hang when state transition cycle is present (#20)
+
+## v1.1.0
+
+### New Features
+
+- Add user parser terminations (#13)
+- Document generated parser API in user guide (#14)
+
 ## v1.0.0

 - Initial release
--- a/LICENSE.txt
+++ b/LICENSE.txt
@ -1,6 +1,6 @@
 The MIT License (MIT)

-Copyright (c) 2010-2023 Josh Holtrop
+Copyright (c) 2010-2024 Josh Holtrop

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
--- a/README.md
+++ b/README.md
@ -6,7 +6,8 @@ Propane is a LALR Parser Generator (LPG) which:
  * generates a built-in lexer to tokenize input
  * supports UTF-8 lexer inputs
  * generates a table-driven shift/reduce parser to parse input in linear time
-  * target C or D language outputs
+  * targets C or D language outputs
+  * optionally supports automatic full AST generation
  * is MIT-licensed
  * is distributable as a standalone Ruby script

@ -17,6 +18,7 @@ can be copied into and versioned in a project's source tree.
 The only requirement to run Propane is that the system has a Ruby interpreter
 installed.
 The latest release can be downloaded from [https://github.com/holtrop/propane/releases](https://github.com/holtrop/propane/releases).
+
 Simply copy the `propane` executable script into the desired location within
 the project to be built (typically the root of the repository) and mark it
 executable.
@ -29,9 +31,14 @@ Propane is typically invoked from the command-line as `./propane`.

    Usage: ./propane [options] <input-file> <output-file>
    Options:
-      --log LOG   Write log file
-      --version   Show program version and exit
-      -h, --help  Show this usage and exit
+      -h, --help  Show this usage and exit.
+      --log LOG   Write log file. This will show all parser states and their
+                  associated shifts and reduces. It can be helpful when
+                  debugging a grammar.
+      --version   Show program version and exit.
+      -w          Treat warnings as errors. This option will treat shift/reduce
+                  conflicts as fatal errors and will print them to stderr in
+                  addition to the log file.

 The user must specify the path to a Propane input grammar file and a path to an
 output file.
@ -55,10 +62,10 @@ import std.math;
 ptype ulong;

 # A few basic arithmetic operators.
-token plus /\\+/;
-token times /\\*/;
-token power /\\*\\*/;
-token integer /\\d+/ <<
+token plus /\+/;
+token times /\*/;
+token power /\*\*/;
+token integer /\d+/ <<
  ulong v;
  foreach (c; match)
  {
@ -67,38 +74,22 @@ token integer /\\d+/ <<
  }
  $$ = v;
 >>
-token lparen /\\(/;
-token rparen /\\)/;
+token lparen /\(/;
+token rparen /\)/;
 # Drop whitespace.
-drop /\\s+/;
+drop /\s+/;

-Start -> E1 <<
-  $$ = $1;
->>
-E1 -> E2 <<
-  $$ = $1;
->>
-E1 -> E1 plus E2 <<
-  $$ = $1 + $3;
->>
-E2 -> E3 <<
-  $$ = $1;
->>
-E2 -> E2 times E3 <<
-  $$ = $1 * $3;
->>
-E3 -> E4 <<
-  $$ = $1;
->>
+Start -> E1 << $$ = $1; >>
+E1 -> E2 << $$ = $1; >>
+E1 -> E1 plus E2 << $$ = $1 + $3; >>
+E2 -> E3 << $$ = $1; >>
+E2 -> E2 times E3 << $$ = $1 * $3; >>
+E3 -> E4 << $$ = $1; >>
 E3 -> E3 power E4 <<
  $$ = pow($1, $3);
 >>
-E4 -> integer <<
-  $$ = $1;
->>
-E4 -> lparen E1 rparen <<
-  $$ = $2;
->>
+E4 -> integer << $$ = $1; >>
+E4 -> lparen E1 rparen << $$ = $2; >>
 ```

 Grammar files can contain comment lines beginning with `#` which are ignored.
--- a/assets/parser.c.erb
+++ b/assets/parser.c.erb
@ -3,6 +3,17 @@
 #include <stdlib.h>
 #include <string.h>

+/**************************************************************************
+ * Public data
+ *************************************************************************/
+
+/** Token names. */
+const char * <%= @grammar.prefix %>token_names[] = {
+<% @grammar.tokens.each_with_index do |token, index| %>
+    "<%= token.name %>",
+<% end %>
+};
+
 /**************************************************************************
 * User code blocks
 *************************************************************************/
@ -21,6 +32,7 @@
 #define P_UNEXPECTED_TOKEN 3u
 #define P_DROP 4u
 #define P_EOF 5u
+#define P_USER_TERMINATED 6u
 <% end %>

 /* An invalid ID value. */
@ -214,7 +226,10 @@ typedef struct
    /** Number of bytes of input text used to match. */
    size_t length;

-    /** Input text position delta. */
+    /** Input text position delta to end of token. */
+    <%= @grammar.prefix %>position_t end_delta_position;
+
+    /** Input text position delta to next code point after token end. */
    <%= @grammar.prefix %>position_t delta_position;

    /** Accepting lexer state from the match. */
@ -308,9 +323,12 @@ static lexer_state_id_t check_lexer_transition(uint32_t current_state, uint32_t
 *
 * @param context
 *   Lexer/parser context structure.
- * @param[out] out_token_info
- *   The lexed token information is stored here if the return value is
- *   P_SUCCESS.
+ * @param[out] out_match_info
+ *   The longest match information is stored here if the return value is
+ *   P_SUCCESS or P_DECODE_ERROR.
+ * @param[out] out_unexpected_input_length
+ *   The unexpected input length is stored here if the return value is
+ *   P_UNEXPECTED_INPUT.
 *
 * @reval P_SUCCESS
 *   A token was successfully lexed.
@ -343,6 +361,7 @@ static size_t find_longest_match(<%= @grammar.prefix %>context_t * context,
            if (transition_state != INVALID_LEXER_STATE_ID)
            {
                attempt_match.length += code_point_length;
+                attempt_match.end_delta_position = attempt_match.delta_position;
                if (code_point == '\n')
                {
                    attempt_match.delta_position.row++;
@ -390,7 +409,6 @@ static size_t find_longest_match(<%= @grammar.prefix %>context_t * context,
                /* Valid EOF return. */
                return P_EOF;
            }
-            break;

        case P_DECODE_ERROR:
            /* If we see a decode error, we may be partially in the middle of
@ -422,13 +440,14 @@ static size_t find_longest_match(<%= @grammar.prefix %>context_t * context,
 *   Input text does not match any lexer pattern.
 * @retval P_DROP
 *   A drop pattern was matched so the lexer should continue.
+ * @retval P_USER_TERMINATED
+ *   User code has requested to terminate the lexer.
 */
 static size_t attempt_lex_token(<%= @grammar.prefix %>context_t * context, <%= @grammar.prefix %>token_info_t * out_token_info)
 {
    <%= @grammar.prefix %>token_info_t token_info = {0};
    token_info.position = context->text_position;
    token_info.token = INVALID_TOKEN_ID;
-    *out_token_info = token_info; // TODO: remove
    lexer_match_info_t match_info;
    size_t unexpected_input_length;
    size_t result = find_longest_match(context, &match_info, &unexpected_input_length);
@ -441,6 +460,12 @@ static size_t attempt_lex_token(<%= @grammar.prefix %>context_t * context, <%= @
            uint8_t const * match = &context->input[context->input_index];
            <%= @grammar.prefix %>token_t user_code_token = lexer_user_code(context,
                match_info.accepting_state->code_id, match, match_info.length, &token_info);
+            /* A TERMINATE_TOKEN_ID return code from lexer_user_code() means
+             * that the user code is requesting to terminate the lexer. */
+            if (user_code_token == TERMINATE_TOKEN_ID)
+            {
+                return P_USER_TERMINATED;
+            }
            /* An invalid token returned from lexer_user_code() means that the
             * user code did not explicitly return a token. So only override
             * the token to return if the user code does explicitly return a
@ -469,11 +494,22 @@ static size_t attempt_lex_token(<%= @grammar.prefix %>context_t * context, <%= @
        }
        token_info.token = token_to_accept;
        token_info.length = match_info.length;
+        if (match_info.end_delta_position.row != 0u)
+        {
+            token_info.end_position.row = token_info.position.row + match_info.end_delta_position.row;
+            token_info.end_position.col = match_info.end_delta_position.col;
+        }
+        else
+        {
+            token_info.end_position.row = token_info.position.row;
+            token_info.end_position.col = token_info.position.col + match_info.end_delta_position.col;
+        }
        *out_token_info = token_info;
        return P_SUCCESS;

    case P_EOF:
        token_info.token = TOKEN___EOF;
+        token_info.end_position = token_info.position;
        *out_token_info = token_info;
        return P_SUCCESS;

@ -511,6 +547,8 @@ static size_t attempt_lex_token(<%= @grammar.prefix %>context_t * context, <%= @
 *   The decoder encountered invalid text encoding.
 * @reval P_UNEXPECTED_INPUT
 *   Input text does not match any lexer pattern.
+ * @retval P_USER_TERMINATED
+ *   User code has requested to terminate the lexer.
 */
 size_t <%= @grammar.prefix %>lex(<%= @grammar.prefix %>context_t * context, <%= @grammar.prefix %>token_info_t * out_token_info)
 {
@ -528,6 +566,9 @@ size_t <%= @grammar.prefix %>lex(<%= @grammar.prefix %>context_t * context, <%=
 * Parser
 *************************************************************************/

+/** Invalid position value. */
+#define INVALID_POSITION (<%= @grammar.prefix %>position_t){0xFFFFFFFFu, 0xFFFFFFFFu}
+
 /** Reduce ID type. */
 typedef <%= get_type_for(@parser.reduce_table.size) %> reduce_id_t;

@ -587,6 +628,25 @@ typedef struct
     * reduce action.
     */
    parser_state_id_t n_states;
+<% if @grammar.ast %>
+
+    /**
+     * Map of rule components to rule set child fields.
+     */
+    uint16_t const * rule_set_node_field_index_map;
+
+    /**
+     * Number of rule set AST node fields.
+     */
+    uint16_t rule_set_node_field_array_size;
+
+    /**
+     * Whether this rule was a generated optional rule that matched the
+     * optional target. In this case, propagate the matched target node up
+     * instead of making a new node for this rule.
+     */
+    bool propagate_optional_target;
+<% end %>
 } reduce_t;

 /** Parser state entry. */
@ -617,19 +677,50 @@ typedef struct

    /** Parser value from this state. */
    <%= @grammar.prefix %>value_t pvalue;
+
+<% if @grammar.ast %>
+    /** AST node. */
+    void * ast_node;
+<% end %>
 } state_value_t;

+/** Common AST node structure. */
+typedef struct
+{
+    <%= @grammar.prefix %>position_t position;
+    <%= @grammar.prefix %>position_t end_position;
+    void * fields[];
+} ASTNode;
+
 /** Parser shift table. */
 static const shift_t parser_shift_table[] = {
 <%   @parser.shift_table.each do |shift| %>
-    {<%= shift[:symbol_id] %>u, <%= shift[:state_id] %>u},
+    {<%= shift[:symbol].id %>u, <%= shift[:state_id] %>u},
 <%   end %>
 };

+<% if @grammar.ast %>
+<%   @grammar.rules.each do |rule| %>
+<%     unless rule.flat_rule_set_node_field_index_map? %>
+const uint16_t r_<%= rule.name.gsub("$", "_") %><%= rule.id %>_node_field_index_map[<%= rule.rule_set_node_field_index_map.size %>] = {<%= rule.rule_set_node_field_index_map.map {|v| v.to_s}.join(", ") %>};
+<%     end %>
+<%   end %>
+<% end %>
+
 /** Parser reduce table. */
 static const reduce_t parser_reduce_table[] = {
 <%   @parser.reduce_table.each do |reduce| %>
-    {<%= reduce[:token_id] %>u, <%= reduce[:rule_id] %>u, <%= reduce[:rule_set_id] %>u, <%= reduce[:n_states] %>u},
+    {<%= reduce[:token_id] %>u, <%= reduce[:rule_id] %>u, <%= reduce[:rule_set_id] %>u, <%= reduce[:n_states] %>u
+<%     if @grammar.ast %>
+<%       if reduce[:rule].flat_rule_set_node_field_index_map? %>
+             , NULL
+<%       else %>
+             , &r_<%= reduce[:rule].name.gsub("$", "_") %><%= reduce[:rule].id %>_node_field_index_map[0]
+<%       end %>
+             , <%= reduce[:rule].rule_set.ast_fields.size %>
+             , <%= reduce[:propagate_optional_target] %>
+<%     end %>
+    },
 <%   end %>
 };

@ -733,17 +824,19 @@ static void state_values_stack_free(state_values_stack_t * stack)
     free(stack->entries);
 }

+<% unless @grammar.ast %>
 /**
 * Execute user code associated with a parser rule.
 *
 * @param rule The ID of the rule.
 *
- * @return Parse value.
+ * @retval P_SUCCESS
+ *   Continue parsing.
+ * @retval P_USER_TERMINATED
+ *   User requested to terminate parsing.
 */
-static <%= @grammar.prefix %>value_t parser_user_code(uint32_t rule, state_values_stack_t * statevalues, uint32_t n_states)
+static size_t parser_user_code(<%= @grammar.prefix %>value_t * _pvalue, uint32_t rule, state_values_stack_t * statevalues, uint32_t n_states, <%= @grammar.prefix %>context_t * context)
 {
-    <%= @grammar.prefix %>value_t _pvalue = {0};
-
    switch (rule)
    {
 <%   @grammar.rules.each do |rule| %>
@ -756,8 +849,9 @@ static <%= @grammar.prefix %>value_t parser_user_code(uint32_t rule, state_value
    default: break;
    }

-    return _pvalue;
+    return P_SUCCESS;
 }
+<% end %>

 /**
 * Check if the parser should shift to a new state.
@ -819,7 +913,7 @@ static size_t check_reduce(size_t state_id, <%= @grammar.prefix %>token_t token)
 *   can be accessed with <%= @grammar.prefix %>result().
 * @retval P_UNEXPECTED_TOKEN
 *   An unexpected token was encountered that does not match any grammar rule.
- *   The value context->token holds the unexpected token.
+ *   The function p_token(&context) can be used to get the unexpected token.
 * @reval P_DECODE_ERROR
 *   The decoder encountered invalid text encoding.
 * @reval P_UNEXPECTED_INPUT
@ -831,7 +925,11 @@ size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * context)
    <%= @grammar.prefix %>token_t token = INVALID_TOKEN_ID;
    state_values_stack_t statevalues;
    size_t reduced_rule_set = INVALID_ID;
+<% if @grammar.ast %>
+    void * reduced_parser_node;
+<% else %>
    <%= @grammar.prefix %>value_t reduced_parser_value;
+<% end %>
    state_values_stack_init(&statevalues);
    state_values_stack_push(&statevalues);
    size_t result;
@ -858,7 +956,11 @@ size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * context)
            if ((shift_state != INVALID_ID) && (token == TOKEN___EOF))
            {
                /* Successful parse. */
+<% if @grammar.ast %>
+                context->parse_result = (<%= @grammar.ast_prefix %><%= @grammar.start_rule %><%= @grammar.ast_suffix %> *)state_values_stack_index(&statevalues, -1)->ast_node;
+<% else %>
                context->parse_result = state_values_stack_index(&statevalues, -1)->pvalue;
+<% end %>
                result = P_SUCCESS;
                break;
            }
@ -871,15 +973,28 @@ size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * context)
            if (reduced_rule_set == INVALID_ID)
            {
                /* We shifted a token, mark it consumed. */
-                token = INVALID_TOKEN_ID;
+<% if @grammar.ast %>
+                <%= @grammar.ast_prefix %>Token<%= @grammar.ast_suffix %> * token_ast_node = malloc(sizeof(<%= @grammar.ast_prefix %>Token<%= @grammar.ast_suffix %>));
+                token_ast_node->position = token_info.position;
+                token_ast_node->end_position = token_info.end_position;
+                token_ast_node->token = token;
+                token_ast_node->pvalue = token_info.pvalue;
+                state_values_stack_index(&statevalues, -1)->ast_node = token_ast_node;
+<% else %>
                state_values_stack_index(&statevalues, -1)->pvalue = token_info.pvalue;
+<% end %>
+                token = INVALID_TOKEN_ID;
            }
            else
            {
                /* We shifted a RuleSet. */
+<% if @grammar.ast %>
+                state_values_stack_index(&statevalues, -1)->ast_node = reduced_parser_node;
+<% else %>
                state_values_stack_index(&statevalues, -1)->pvalue = reduced_parser_value;
                <%= @grammar.prefix %>value_t new_parse_result = {0};
                reduced_parser_value = new_parse_result;
+<% end %>
                reduced_rule_set = INVALID_ID;
            }
            continue;
@ -889,7 +1004,63 @@ size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * context)
        if (reduce_index != INVALID_ID)
        {
            /* We have something to reduce. */
-            reduced_parser_value = parser_user_code(parser_reduce_table[reduce_index].rule, &statevalues, parser_reduce_table[reduce_index].n_states);
+<% if @grammar.ast %>
+            if (parser_reduce_table[reduce_index].propagate_optional_target)
+            {
+                reduced_parser_node = state_values_stack_index(&statevalues, -1)->ast_node;
+            }
+            else if (parser_reduce_table[reduce_index].n_states > 0)
+            {
+                size_t n_fields = parser_reduce_table[reduce_index].rule_set_node_field_array_size;
+                ASTNode * node = (ASTNode *)malloc(sizeof(ASTNode) + n_fields * sizeof(void *));
+                node->position = INVALID_POSITION;
+                node->end_position = INVALID_POSITION;
+                for (size_t i = 0; i < n_fields; i++)
+                {
+                    node->fields[i] = NULL;
+                }
+                if (parser_reduce_table[reduce_index].rule_set_node_field_index_map == NULL)
+                {
+                    for (size_t i = 0; i < parser_reduce_table[reduce_index].n_states; i++)
+                    {
+                        node->fields[i] = state_values_stack_index(&statevalues, -(int)parser_reduce_table[reduce_index].n_states + (int)i)->ast_node;
+                    }
+                }
+                else
+                {
+                    for (size_t i = 0; i < parser_reduce_table[reduce_index].n_states; i++)
+                    {
+                        node->fields[parser_reduce_table[reduce_index].rule_set_node_field_index_map[i]] = state_values_stack_index(&statevalues, -(int)parser_reduce_table[reduce_index].n_states + (int)i)->ast_node;
+                    }
+                }
+                bool position_found = false;
+                for (size_t i = 0; i < n_fields; i++)
+                {
+                    ASTNode * child = (ASTNode *)node->fields[i];
+                    if ((child != NULL) && <%= @grammar.prefix %>position_valid(child->position))
+                    {
+                        if (!position_found)
+                        {
+                            node->position = child->position;
+                            position_found = true;
+                        }
+                        node->end_position = child->end_position;
+                    }
+                }
+                reduced_parser_node = node;
+            }
+            else
+            {
+                reduced_parser_node = NULL;
+            }
+<% else %>
+            <%= @grammar.prefix %>value_t reduced_parser_value2 = {0};
+            if (parser_user_code(&reduced_parser_value2, parser_reduce_table[reduce_index].rule, &statevalues, parser_reduce_table[reduce_index].n_states, context) == P_USER_TERMINATED)
+            {
+                return P_USER_TERMINATED;
+            }
+            reduced_parser_value = reduced_parser_value2;
+<% end %>
            reduced_rule_set = parser_reduce_table[reduce_index].rule_set;
            state_values_stack_pop(&statevalues, parser_reduce_table[reduce_index].n_states);
            continue;
@ -917,9 +1088,17 @@ size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * context)
 *
 * @return Parse result value.
 */
+<% if @grammar.ast %>
+<%= @grammar.ast_prefix %><%= @grammar.start_rule %><%= @grammar.ast_suffix %> * <%= @grammar.prefix %>result(<%= @grammar.prefix %>context_t * context)
+<% else %>
 <%= start_rule_type[1] %> <%= @grammar.prefix %>result(<%= @grammar.prefix %>context_t * context)
+<% end %>
 {
+<% if @grammar.ast %>
+    return context->parse_result;
+<% else %>
    return context->parse_result.v_<%= start_rule_type[0] %>;
+<% end %>
 }

 /**
@ -934,3 +1113,26 @@ size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * context)
 {
    return context->text_position;
 }
+
+/**
+ * Get the user terminate code.
+ *
+ * @param context
+ *   Lexer/parser context structure.
+ *
+ * @return User terminate code.
+ */
+size_t <%= @grammar.prefix %>user_terminate_code(<%= @grammar.prefix %>context_t * context)
+{
+    return context->user_terminate_code;
+}
+
+/**
+ * Get the parse token.
+ *
+ * @return Parse token.
+ */
+<%= @grammar.prefix %>token_t <%= @grammar.prefix %>token(<%= @grammar.prefix %>context_t * context)
+{
+    return context->token;
+}
--- a/assets/parser.d.erb
+++ b/assets/parser.d.erb
@ -8,6 +8,8 @@
 module <%= @grammar.modulename %>;
 <% end %>

+import core.stdc.stdlib : malloc;
+
 /**************************************************************************
 * User code blocks
 *************************************************************************/
@ -27,10 +29,11 @@ public enum : size_t
    <%= @grammar.prefix.upcase %>UNEXPECTED_TOKEN,
    <%= @grammar.prefix.upcase %>DROP,
    <%= @grammar.prefix.upcase %>EOF,
+    <%= @grammar.prefix.upcase %>USER_TERMINATED,
 }

 /** Token type. */
-public alias <%= @grammar.prefix %>token_t = <%= get_type_for(@grammar.invalid_token_id) %>;
+public alias <%= @grammar.prefix %>token_t = <%= get_type_for(@grammar.terminate_token_id) %>;

 /** Token IDs. */
 public enum : <%= @grammar.prefix %>token_t
@ -42,21 +45,14 @@ public enum : <%= @grammar.prefix %>token_t
 <%   end %>
 <% end %>
    INVALID_TOKEN_ID = <%= @grammar.invalid_token_id %>,
+    TERMINATE_TOKEN_ID = <%= @grammar.terminate_token_id %>,
 }

 /** Code point type. */
 public alias <%= @grammar.prefix %>code_point_t = uint;

-/** Parser values type(s). */
-public union <%= @grammar.prefix %>value_t
-{
-<% @grammar.ptypes.each do |name, typestring| %>
-    <%= typestring %> v_<%= name %>;
-<% end %>
-}
-
 /**
- * A structure to keep track of parser position.
+ * A structure to keep track of input position.
 *
 * This is useful for reporting errors, etc...
 */
@ -67,14 +63,79 @@ public struct <%= @grammar.prefix %>position_t

    /** Input text column (0-based). */
    uint col;
+
+    /** Invalid position value. */
+    enum INVALID = <%= @grammar.prefix %>position_t(0xFFFF_FFFF, 0xFFFF_FFFF);
+
+    /** Return whether the position is valid. */
+    public @property bool valid()
+    {
+        return row != 0xFFFF_FFFFu;
+    }
 }

+<% if @grammar.ast %>
+/** Parser values type. */
+public alias <%= @grammar.prefix %>value_t = <%= @grammar.ptype %>;
+<% else %>
+/** Parser values type(s). */
+public union <%= @grammar.prefix %>value_t
+{
+<%   @grammar.ptypes.each do |name, typestring| %>
+    <%= typestring %> v_<%= name %>;
+<%   end %>
+}
+<% end %>
+
+<% if @grammar.ast %>
+/** Common AST node structure. */
+private struct ASTNode
+{
+    <%= @grammar.prefix %>position_t position;
+    <%= @grammar.prefix %>position_t end_position;
+    void *[0] fields;
+}
+
+/** AST node types. @{ */
+public struct <%= @grammar.ast_prefix %>Token<%= @grammar.ast_suffix %>
+{
+    /* ASTNode fields must be present in the same order here. */
+    <%= @grammar.prefix %>position_t position;
+    <%= @grammar.prefix %>position_t end_position;
+    <%= @grammar.prefix %>token_t token;
+    <%= @grammar.prefix %>value_t pvalue;
+}
+
+<%   @parser.rule_sets.each do |name, rule_set| %>
+<%     next if name.start_with?("$") %>
+<%     next if rule_set.optional? %>
+public struct <%= @grammar.ast_prefix %><%= name %><%= @grammar.ast_suffix %>
+{
+    <%= @grammar.prefix %>position_t position;
+    <%= @grammar.prefix %>position_t end_position;
+<%     rule_set.ast_fields.each do |fields| %>
+    union
+    {
+<%       fields.each do |field_name, type| %>
+        <%= type %> * <%= field_name %>;
+<%       end %>
+    }
+<%     end %>
+}
+
+<%   end %>
+/** @} */
+<% end %>
+
 /** Lexed token information. */
 public struct <%= @grammar.prefix %>token_info_t
 {
-    /** Text position where the token was found. */
+    /** Text position of first code point in token. */
    <%= @grammar.prefix %>position_t position;

+    /** Text position of last code point in token. */
+    <%= @grammar.prefix %>position_t end_position;
+
    /** Number of input bytes used by the token. */
    size_t length;

@ -110,10 +171,17 @@ public struct <%= @grammar.prefix %>context_t
    /* Parser context data. */

    /** Parse result value. */
+<% if @grammar.ast %>
+    <%= @grammar.ast_prefix %><%= @grammar.start_rule %><%= @grammar.ast_suffix %> * parse_result;
+<% else %>
    <%= @grammar.prefix %>value_t parse_result;
+<% end %>

    /** Unexpected token received. */
    <%= @grammar.prefix %>token_t token;
+
+    /** User terminate code. */
+    size_t user_terminate_code;
 }

 /**************************************************************************
@ -141,6 +209,7 @@ private enum : size_t
    P_UNEXPECTED_TOKEN,
    P_DROP,
    P_EOF,
+    P_USER_TERMINATED,
 }
 <% end %>

@ -330,7 +399,10 @@ private struct lexer_match_info_t
    /** Number of bytes of input text used to match. */
    size_t length;

-    /** Input text position delta. */
+    /** Input text position delta to end of token. */
+    <%= @grammar.prefix %>position_t end_delta_position;
+
+    /** Input text position delta to next code point after token end. */
    <%= @grammar.prefix %>position_t delta_position;

    /** Accepting lexer state from the match. */
@ -422,9 +494,12 @@ private lexer_state_id_t check_lexer_transition(uint current_state, uint code_po
 *
 * @param context
 *   Lexer/parser context structure.
- * @param[out] out_token_info
- *   The lexed token information is stored here if the return value is
- *   P_SUCCESS.
+ * @param[out] out_match_info
+ *   The longest match information is stored here if the return value is
+ *   P_SUCCESS or P_DECODE_ERROR.
+ * @param[out] out_unexpected_input_length
+ *   The unexpected input length is stored here if the return value is
+ *   P_UNEXPECTED_INPUT.
 *
 * @reval P_SUCCESS
 *   A token was successfully lexed.
@ -455,6 +530,7 @@ private size_t find_longest_match(<%= @grammar.prefix %>context_t * context,
            if (transition_state != INVALID_LEXER_STATE_ID)
            {
                attempt_match.length += code_point_length;
+                attempt_match.end_delta_position = attempt_match.delta_position;
                if (code_point == '\n')
                {
                    attempt_match.delta_position.row++;
@ -502,7 +578,6 @@ private size_t find_longest_match(<%= @grammar.prefix %>context_t * context,
                /* Valid EOF return. */
                return P_EOF;
            }
-            break;

        case P_DECODE_ERROR:
            /* If we see a decode error, we may be partially in the middle of
@ -534,13 +609,14 @@ private size_t find_longest_match(<%= @grammar.prefix %>context_t * context,
 *   Input text does not match any lexer pattern.
 * @retval P_DROP
 *   A drop pattern was matched so the lexer should continue.
+ * @retval P_USER_TERMINATED
+ *   User code has requested to terminate the lexer.
 */
 private size_t attempt_lex_token(<%= @grammar.prefix %>context_t * context, <%= @grammar.prefix %>token_info_t * out_token_info)
 {
    <%= @grammar.prefix %>token_info_t token_info;
    token_info.position = context.text_position;
    token_info.token = INVALID_TOKEN_ID;
-    *out_token_info = token_info; // TODO: remove
    lexer_match_info_t match_info;
    size_t unexpected_input_length;
    size_t result = find_longest_match(context, &match_info, &unexpected_input_length);
@ -553,6 +629,12 @@ private size_t attempt_lex_token(<%= @grammar.prefix %>context_t * context, <%=
            string match = context.input[context.input_index..(context.input_index + match_info.length)];
            <%= @grammar.prefix %>token_t user_code_token = lexer_user_code(context,
                match_info.accepting_state.code_id, match, &token_info);
+            /* A TERMINATE_TOKEN_ID return code from lexer_user_code() means
+             * that the user code is requesting to terminate the lexer. */
+            if (user_code_token == TERMINATE_TOKEN_ID)
+            {
+                return P_USER_TERMINATED;
+            }
            /* An invalid token returned from lexer_user_code() means that the
             * user code did not explicitly return a token. So only override
             * the token to return if the user code does explicitly return a
@ -581,11 +663,22 @@ private size_t attempt_lex_token(<%= @grammar.prefix %>context_t * context, <%=
        }
        token_info.token = token_to_accept;
        token_info.length = match_info.length;
+        if (match_info.end_delta_position.row != 0u)
+        {
+            token_info.end_position.row = token_info.position.row + match_info.end_delta_position.row;
+            token_info.end_position.col = match_info.end_delta_position.col;
+        }
+        else
+        {
+            token_info.end_position.row = token_info.position.row;
+            token_info.end_position.col = token_info.position.col + match_info.end_delta_position.col;
+        }
        *out_token_info = token_info;
        return P_SUCCESS;

    case P_EOF:
        token_info.token = TOKEN___EOF;
+        token_info.end_position = token_info.position;
        *out_token_info = token_info;
        return P_SUCCESS;

@ -623,6 +716,8 @@ private size_t attempt_lex_token(<%= @grammar.prefix %>context_t * context, <%=
 *   The decoder encountered invalid text encoding.
 * @reval P_UNEXPECTED_INPUT
 *   Input text does not match any lexer pattern.
+ * @retval P_USER_TERMINATED
+ *   User code has requested to terminate the lexer.
 */
 public size_t <%= @grammar.prefix %>lex(<%= @grammar.prefix %>context_t * context, <%= @grammar.prefix %>token_info_t * out_token_info)
 {
@ -699,6 +794,25 @@ private struct reduce_t
     * reduce action.
     */
    parser_state_id_t n_states;
+<% if @grammar.ast %>
+
+    /**
+     * Map of rule components to rule set child fields.
+     */
+    immutable(ushort) * rule_set_node_field_index_map;
+
+    /**
+     * Number of rule set AST node fields.
+     */
+    ushort rule_set_node_field_array_size;
+
+    /**
+     * Whether this rule was a generated optional rule that matched the
+     * optional target. In this case, propagate the matched target node up
+     * instead of making a new node for this rule.
+     */
+    bool propagate_optional_target;
+<% end %>
 }

 /** Parser state entry. */
@ -730,6 +844,11 @@ private struct state_value_t
    /** Parser value from this state. */
    <%= @grammar.prefix %>value_t pvalue;

+<% if @grammar.ast %>
+    /** AST node. */
+    void * ast_node;
+<% end %>
+
    this(size_t state_id)
    {
        this.state_id = state_id;
@ -739,14 +858,32 @@ private struct state_value_t
 /** Parser shift table. */
 private immutable shift_t[] parser_shift_table = [
 <%   @parser.shift_table.each do |shift| %>
-    shift_t(<%= shift[:symbol_id] %>u, <%= shift[:state_id] %>u),
+    shift_t(<%= shift[:symbol].id %>u, <%= shift[:state_id] %>u),
 <%   end %>
 ];

+<% if @grammar.ast %>
+<%   @grammar.rules.each do |rule| %>
+<%     unless rule.flat_rule_set_node_field_index_map? %>
+immutable ushort[<%= rule.rule_set_node_field_index_map.size %>] r_<%= rule.name.gsub("$", "_") %><%= rule.id %>_node_field_index_map = [<%= rule.rule_set_node_field_index_map.map {|v| v.to_s}.join(", ") %>];
+<%     end %>
+<%   end %>
+<% end %>
+
 /** Parser reduce table. */
 private immutable reduce_t[] parser_reduce_table = [
 <%   @parser.reduce_table.each do |reduce| %>
-    reduce_t(<%= reduce[:token_id] %>u, <%= reduce[:rule_id] %>u, <%= reduce[:rule_set_id] %>u, <%= reduce[:n_states] %>u),
+    reduce_t(<%= reduce[:token_id] %>u, <%= reduce[:rule_id] %>u, <%= reduce[:rule_set_id] %>u, <%= reduce[:n_states] %>u
+<%     if @grammar.ast %>
+<%       if reduce[:rule].flat_rule_set_node_field_index_map? %>
+             , null
+<%       else %>
+             , &r_<%= reduce[:rule].name.gsub("$", "_") %><%= reduce[:rule].id %>_node_field_index_map[0]
+<%       end %>
+             , <%= reduce[:rule].rule_set.ast_fields.size %>
+             , <%= reduce[:propagate_optional_target] %>
+<%     end %>
+            ),
 <%   end %>
 ];

@ -757,17 +894,19 @@ private immutable parser_state_t[] parser_state_table = [
 <%   end %>
 ];

+<% unless @grammar.ast %>
 /**
 * Execute user code associated with a parser rule.
 *
 * @param rule The ID of the rule.
 *
- * @return Parse value.
+ * @retval P_SUCCESS
+ *   Continue parsing.
+ * @retval P_USER_TERMINATED
+ *   User requested to terminate parsing.
 */
-private <%= @grammar.prefix %>value_t parser_user_code(uint rule, state_value_t[] statevalues, uint n_states)
+private size_t parser_user_code(<%= @grammar.prefix %>value_t * _pvalue, uint rule, state_value_t[] statevalues, uint n_states, <%= @grammar.prefix %>context_t * context)
 {
-    <%= @grammar.prefix %>value_t _pvalue;
-
    switch (rule)
    {
 <%   @grammar.rules.each do |rule| %>
@ -780,8 +919,9 @@ private <%= @grammar.prefix %>value_t parser_user_code(uint rule, state_value_t[
    default: break;
    }

-    return _pvalue;
+    return P_SUCCESS;
 }
+<% end %>

 /**
 * Check if the parser should shift to a new state.
@ -843,7 +983,7 @@ private size_t check_reduce(size_t state_id, <%= @grammar.prefix %>token_t token
 *   can be accessed with <%= @grammar.prefix %>result().
 * @retval P_UNEXPECTED_TOKEN
 *   An unexpected token was encountered that does not match any grammar rule.
- *   The value context.token holds the unexpected token.
+ *   The function p_token(&context) can be used to get the unexpected token.
 * @reval P_DECODE_ERROR
 *   The decoder encountered invalid text encoding.
 * @reval P_UNEXPECTED_INPUT
@ -855,7 +995,11 @@ public size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * cont
    <%= @grammar.prefix %>token_t token = INVALID_TOKEN_ID;
    state_value_t[] statevalues = new state_value_t[](1);
    size_t reduced_rule_set = INVALID_ID;
+<% if @grammar.ast %>
+    void * reduced_parser_node;
+<% else %>
    <%= @grammar.prefix %>value_t reduced_parser_value;
+<% end %>
    for (;;)
    {
        if (token == INVALID_TOKEN_ID)
@ -878,7 +1022,11 @@ public size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * cont
            if ((shift_state != INVALID_ID) && (token == TOKEN___EOF))
            {
                /* Successful parse. */
+<% if @grammar.ast %>
+                context.parse_result = cast(<%= @grammar.ast_prefix %><%= @grammar.start_rule %><%= @grammar.ast_suffix %> *)statevalues[$-1].ast_node;
+<% else %>
                context.parse_result = statevalues[$-1].pvalue;
+<% end %>
                return P_SUCCESS;
            }
        }
@ -889,15 +1037,24 @@ public size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * cont
            if (reduced_rule_set == INVALID_ID)
            {
                /* We shifted a token, mark it consumed. */
-                token = INVALID_TOKEN_ID;
+<% if @grammar.ast %>
+                <%= @grammar.ast_prefix %>Token<%= @grammar.ast_suffix %> * token_ast_node = new <%= @grammar.ast_prefix %>Token<%= @grammar.ast_suffix %>(token_info.position, token_info.end_position, token, token_info.pvalue);
+                statevalues[$-1].ast_node = token_ast_node;
+<% else %>
                statevalues[$-1].pvalue = token_info.pvalue;
+<% end %>
+                token = INVALID_TOKEN_ID;
            }
            else
            {
                /* We shifted a RuleSet. */
+<% if @grammar.ast %>
+                statevalues[$-1].ast_node = reduced_parser_node;
+<% else %>
                statevalues[$-1].pvalue = reduced_parser_value;
                <%= @grammar.prefix %>value_t new_parse_result;
                reduced_parser_value = new_parse_result;
+<% end %>
                reduced_rule_set = INVALID_ID;
            }
            continue;
@ -907,7 +1064,63 @@ public size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * cont
        if (reduce_index != INVALID_ID)
        {
            /* We have something to reduce. */
-            reduced_parser_value = parser_user_code(parser_reduce_table[reduce_index].rule, statevalues, parser_reduce_table[reduce_index].n_states);
+<% if @grammar.ast %>
+            if (parser_reduce_table[reduce_index].propagate_optional_target)
+            {
+                reduced_parser_node = statevalues[$ - 1].ast_node;
+            }
+            else if (parser_reduce_table[reduce_index].n_states > 0)
+            {
+                size_t n_fields = parser_reduce_table[reduce_index].rule_set_node_field_array_size;
+                ASTNode * node = cast(ASTNode *)malloc(ASTNode.sizeof + n_fields * (void *).sizeof);
+                node.position = <%= @grammar.prefix %>position_t.INVALID;
+                node.end_position = <%= @grammar.prefix %>position_t.INVALID;
+                foreach (i; 0..n_fields)
+                {
+                    node.fields[i] = null;
+                }
+                if (parser_reduce_table[reduce_index].rule_set_node_field_index_map is null)
+                {
+                    foreach (i; 0..parser_reduce_table[reduce_index].n_states)
+                    {
+                        node.fields[i] = statevalues[$ - parser_reduce_table[reduce_index].n_states + i].ast_node;
+                    }
+                }
+                else
+                {
+                    foreach (i; 0..parser_reduce_table[reduce_index].n_states)
+                    {
+                        node.fields[parser_reduce_table[reduce_index].rule_set_node_field_index_map[i]] = statevalues[$ - parser_reduce_table[reduce_index].n_states + i].ast_node;
+                    }
+                }
+                bool position_found = false;
+                foreach (i; 0..n_fields)
+                {
+                    ASTNode * child = cast(ASTNode *)node.fields[i];
+                    if (child && child.position.valid)
+                    {
+                        if (!position_found)
+                        {
+                            node.position = child.position;
+                            position_found = true;
+                        }
+                        node.end_position = child.end_position;
+                    }
+                }
+                reduced_parser_node = node;
+            }
+            else
+            {
+                reduced_parser_node = null;
+            }
+<% else %>
+            <%= @grammar.prefix %>value_t reduced_parser_value2;
+            if (parser_user_code(&reduced_parser_value2, parser_reduce_table[reduce_index].rule, statevalues, parser_reduce_table[reduce_index].n_states, context) == P_USER_TERMINATED)
+            {
+                return P_USER_TERMINATED;
+            }
+            reduced_parser_value = reduced_parser_value2;
+<% end %>
            reduced_rule_set = parser_reduce_table[reduce_index].rule_set;
            statevalues.length -= parser_reduce_table[reduce_index].n_states;
            continue;
@ -932,9 +1145,17 @@ public size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * cont
 *
 * @return Parse result value.
 */
+<% if @grammar.ast %>
+public <%= @grammar.ast_prefix %><%= @grammar.start_rule %><%= @grammar.ast_suffix %> * <%= @grammar.prefix %>result(<%= @grammar.prefix %>context_t * context)
+<% else %>
 public <%= start_rule_type[1] %> <%= @grammar.prefix %>result(<%= @grammar.prefix %>context_t * context)
+<% end %>
 {
+<% if @grammar.ast %>
+    return context.parse_result;
+<% else %>
    return context.parse_result.v_<%= start_rule_type[0] %>;
+<% end %>
 }

 /**
@ -949,3 +1170,26 @@ public <%= @grammar.prefix %>position_t <%= @grammar.prefix %>position(<%= @gram
 {
    return context.text_position;
 }
+
+/**
+ * Get the user terminate code.
+ *
+ * @param context
+ *   Lexer/parser context structure.
+ *
+ * @return User terminate code.
+ */
+public size_t <%= @grammar.prefix %>user_terminate_code(<%= @grammar.prefix %>context_t * context)
+{
+    return context.user_terminate_code;
+}
+
+/**
+ * Get the parse token.
+ *
+ * @return Parse token.
+ */
+public <%= @grammar.prefix %>token_t <%= @grammar.prefix %>token(<%= @grammar.prefix %>context_t * context)
+{
+    return context.token;
+}
--- a/assets/parser.h.erb
+++ b/assets/parser.h.erb
@ -20,9 +20,10 @@
 #define <%= @grammar.prefix.upcase %>UNEXPECTED_TOKEN 3u
 #define <%= @grammar.prefix.upcase %>DROP 4u
 #define <%= @grammar.prefix.upcase %>EOF 5u
+#define <%= @grammar.prefix.upcase %>USER_TERMINATED 6u

 /** Token type. */
-typedef <%= get_type_for(@grammar.invalid_token_id) %> <%= @grammar.prefix %>token_t;
+typedef <%= get_type_for(@grammar.terminate_token_id) %> <%= @grammar.prefix %>token_t;

 /** Token IDs. */
 <% @grammar.tokens.each_with_index do |token, index| %>
@ -32,23 +33,13 @@ typedef <%= get_type_for(@grammar.invalid_token_id) %> <%= @grammar.prefix %>tok
 <%   end %>
 <% end %>
 #define INVALID_TOKEN_ID <%= @grammar.invalid_token_id %>u
+#define TERMINATE_TOKEN_ID <%= @grammar.terminate_token_id %>u

 /** Code point type. */
 typedef uint32_t <%= @grammar.prefix %>code_point_t;

-/** User header code blocks. */
-<%= @grammar.code_blocks.fetch("header", "") %>
-
-/** Parser values type(s). */
-typedef union
-{
-<% @grammar.ptypes.each do |name, typestring| %>
-    <%= typestring %> v_<%= name %>;
-<% end %>
-} <%= @grammar.prefix %>value_t;
-
 /**
- * A structure to keep track of parser position.
+ * A structure to keep track of input position.
 *
 * This is useful for reporting errors, etc...
 */
@ -61,12 +52,72 @@ typedef struct
    uint32_t col;
 } <%= @grammar.prefix %>position_t;

+/** Return whether the position is valid. */
+#define <%= @grammar.prefix %>position_valid(p) ((p).row != 0xFFFFFFFFu)
+
+/** User header code blocks. */
+<%= @grammar.code_blocks.fetch("header", "") %>
+
+<% if @grammar.ast %>
+/** Parser values type. */
+typedef <%= @grammar.ptype %> <%= @grammar.prefix %>value_t;
+<% else %>
+/** Parser values type(s). */
+typedef union
+{
+<%   @grammar.ptypes.each do |name, typestring| %>
+    <%= typestring %> v_<%= name %>;
+<%   end %>
+} <%= @grammar.prefix %>value_t;
+<% end %>
+
+<% if @grammar.ast %>
+/** AST node types. @{ */
+typedef struct <%= @grammar.ast_prefix %>Token<%= @grammar.ast_suffix %>
+{
+    /* ASTNode fields must be present in the same order here. */
+    <%= @grammar.prefix %>position_t position;
+    <%= @grammar.prefix %>position_t end_position;
+    <%= @grammar.prefix %>token_t token;
+    <%= @grammar.prefix %>value_t pvalue;
+} <%= @grammar.ast_prefix %>Token<%= @grammar.ast_suffix %>;
+
+<%   @parser.rule_sets.each do |name, rule_set| %>
+<%     next if name.start_with?("$") %>
+<%     next if rule_set.optional? %>
+struct <%= name %>;
+<%   end %>
+
+<%   @parser.rule_sets.each do |name, rule_set| %>
+<%     next if name.start_with?("$") %>
+<%     next if rule_set.optional? %>
+typedef struct <%= @grammar.ast_prefix %><%= name %><%= @grammar.ast_suffix %>
+{
+    <%= @grammar.prefix %>position_t position;
+    <%= @grammar.prefix %>position_t end_position;
+<%     rule_set.ast_fields.each do |fields| %>
+    union
+    {
+<%       fields.each do |field_name, type| %>
+        struct <%= type %> * <%= field_name %>;
+<%       end %>
+    };
+<%     end %>
+} <%= @grammar.ast_prefix %><%= name %><%= @grammar.ast_suffix %>;
+
+<%   end %>
+/** @} */
+<% end %>
+
 /** Lexed token information. */
 typedef struct
 {
-    /** Text position where the token was found. */
+    /** Text position of first code point in token. */
    <%= @grammar.prefix %>position_t position;

+    /** Text position of last code point in token. */
+    <%= @grammar.prefix %>position_t end_position;
+
    /** Number of input bytes used by the token. */
    size_t length;

@ -105,12 +156,26 @@ typedef struct
    /* Parser context data. */

    /** Parse result value. */
+<% if @grammar.ast %>
+    <%= @grammar.ast_prefix %><%= @grammar.start_rule %><%= @grammar.ast_suffix %> * parse_result;
+<% else %>
    <%= @grammar.prefix %>value_t parse_result;
+<% end %>

    /** Unexpected token received. */
    <%= @grammar.prefix %>token_t token;
+
+    /** User terminate code. */
+    size_t user_terminate_code;
 } <%= @grammar.prefix %>context_t;

+/**************************************************************************
+ * Public data
+ *************************************************************************/
+
+/** Token names. */
+extern const char * <%= @grammar.prefix %>token_names[];
+
 void <%= @grammar.prefix %>context_init(<%= @grammar.prefix %>context_t * context, uint8_t const * input, size_t input_length);

 size_t <%= @grammar.prefix %>decode_code_point(uint8_t const * input, size_t input_length,
@ -120,6 +185,14 @@ size_t <%= @grammar.prefix %>lex(<%= @grammar.prefix %>context_t * context, <%=

 size_t <%= @grammar.prefix %>parse(<%= @grammar.prefix %>context_t * context);

+<% if @grammar.ast %>
+<%= @grammar.ast_prefix %><%= @grammar.start_rule %><%= @grammar.ast_suffix %> * <%= @grammar.prefix %>result(<%= @grammar.prefix %>context_t * context);
+<% else %>
 <%= start_rule_type[1] %> <%= @grammar.prefix %>result(<%= @grammar.prefix %>context_t * context);
+<% end %>

 <%= @grammar.prefix %>position_t <%= @grammar.prefix %>position(<%= @grammar.prefix %>context_t * context);
+
+size_t <%= @grammar.prefix %>user_terminate_code(<%= @grammar.prefix %>context_t * context);
+
+<%= @grammar.prefix %>token_t <%= @grammar.prefix %>token(<%= @grammar.prefix %>context_t * context);
--- a/doc/user_guide.md
+++ b/doc/user_guide.md
@ -13,7 +13,9 @@ Propane is a LALR Parser Generator (LPG) which:
  * generates a built-in lexer to tokenize input
  * supports UTF-8 lexer inputs
  * generates a table-driven shift/reduce parser to parse input in linear time
-  * target C or D language outputs
+  * targets C or D language outputs
+  * optionally supports automatic full AST generation
+  * tracks input text start and end positions for all matched tokens/rules
  * is MIT-licensed
  * is distributable as a standalone Ruby script

@ -34,9 +36,14 @@ Propane is typically invoked from the command-line as `./propane`.

    Usage: ./propane [options] <input-file> <output-file>
    Options:
-      --log LOG   Write log file
-      --version   Show program version and exit
-      -h, --help  Show this usage and exit
+      -h, --help  Show this usage and exit.
+      --log LOG   Write log file. This will show all parser states and their
+                  associated shifts and reduces. It can be helpful when
+                  debugging a grammar.
+      --version   Show program version and exit.
+      -w          Treat warnings as errors. This option will treat shift/reduce
+                  conflicts as fatal errors and will print them to stderr in
+                  addition to the log file.

 The user must specify the path to a Propane input grammar file and a path to an
 output file.
@ -77,33 +84,15 @@ token rparen /\\)/;
 # Drop whitespace.
 drop /\\s+/;

-Start -> E1 <<
-  $$ = $1;
->>
-E1 -> E2 <<
-  $$ = $1;
->>
-E1 -> E1 plus E2 <<
-  $$ = $1 + $3;
->>
-E2 -> E3 <<
-  $$ = $1;
->>
-E2 -> E2 times E3 <<
-  $$ = $1 * $3;
->>
-E3 -> E4 <<
-  $$ = $1;
->>
-E3 -> E3 power E4 <<
-  $$ = pow($1, $3);
->>
-E4 -> integer <<
-  $$ = $1;
->>
-E4 -> lparen E1 rparen <<
-  $$ = $2;
->>
+Start -> E1 << $$ = $1; >>
+E1 -> E2 << $$ = $1; >>
+E1 -> E1 plus E2 << $$ = $1 + $3; >>
+E2 -> E3 << $$ = $1; >>
+E2 -> E2 times E3 << $$ = $1 * $3; >>
+E3 -> E4 << $$ = $1; >>
+E3 -> E3 power E4 << $$ = pow($1, $3); >>
+E4 -> integer << $$ = $1; >>
+E4 -> lparen E1 rparen << $$ = $2; >>
 ```

 Grammar files can contain comment lines beginning with `#` which are ignored.
@ -117,8 +106,8 @@ lowercase character and beginning a rule name with an uppercase character.

 ##> User Code Blocks

-User code blocks begin with the line following a "<<" token and end with the
-line preceding a grammar line consisting of solely the ">>" token.
+User code blocks begin following a "<<" token and end with a ">>" token found
+at the end of a line.
 All text lines in the code block are copied verbatim into the output file.

 ### Standalone Code Blocks
@ -189,9 +178,7 @@ This parser value can then be used later in a parser rule.
 Example:

 ```
-E1 -> E1 plus E2 <<
-  $$ = $1 + $3;
->>
+E1 -> E1 plus E2 << $$ = $1 + $3; >>
 ```

 Parser rule code blocks appear following a rule expression.
@ -202,6 +189,143 @@ rule.
 Parser values for the rules or tokens in the rule pattern can be accessed
 positionally with tokens `$1`, `$2`, `$3`, etc...

+Parser rule code blocks are not available in AST generation mode.
+In AST generation mode, a full parse tree is automatically constructed in
+memory for user code to traverse after parsing is complete.
+
+##> AST generation mode - the `ast` statement
+
+To activate AST generation mode, place the `ast` statement in your grammar file:
+
+```
+ast;
+```
+
+It is recommended to place this statement early in the grammar.
+
+In AST generation mode, various aspects of Propane's behavior change:
+
+  * Only one `ptype` is allowed.
+  * Parser user code blocks are not supported.
+  * Structure types are generated to represent the parsed tokens and rules as
+  defined in the grammar.
+  * The parse result from `p_result()` points to a `Start` struct containing
+  the entire parse tree for the input. If the user has changed the start rule
+  with the `start` grammar statement, the name of the start struct will be
+  given by the user-specified start rule instead of `Start`.
+
+Example AST generation grammar:
+
+```
+ast;
+
+ptype int;
+
+token a << $$ = 11; >>
+token b << $$ = 22; >>
+token one /1/;
+token two /2/;
+token comma /,/ <<
+  $$ = 42;
+>>
+token lparen /\\(/;
+token rparen /\\)/;
+drop /\\s+/;
+
+Start -> Items;
+
+Items -> Item:item ItemsMore;
+Items -> ;
+
+ItemsMore -> comma Item:item ItemsMore;
+ItemsMore -> ;
+
+Item -> a;
+Item -> b;
+Item -> lparen Item:item rparen;
+Item -> Dual;
+
+Dual -> One Two;
+Dual -> Two One;
+One -> one;
+Two -> two;
+```
+
+The following unit test describes the fields that will be present for an
+example parse:
+
+```
+string input = "a, ((b)), b";
+p_context_t context;
+p_context_init(&context, input);
+assert_eq(P_SUCCESS, p_parse(&context));
+Start * start = p_result(&context);
+assert(start.pItems1 !is null);
+assert(start.pItems !is null);
+Items * items = start.pItems;
+assert(items.item !is null);
+assert(items.item.pToken1 !is null);
+assert_eq(TOKEN_a, items.item.pToken1.token);
+assert_eq(11, items.item.pToken1.pvalue);
+assert(items.pItemsMore !is null);
+ItemsMore * itemsmore = items.pItemsMore;
+assert(itemsmore.item !is null);
+assert(itemsmore.item.item !is null);
+assert(itemsmore.item.item.item !is null);
+assert(itemsmore.item.item.item.pToken1 !is null);
+assert_eq(TOKEN_b, itemsmore.item.item.item.pToken1.token);
+assert_eq(22, itemsmore.item.item.item.pToken1.pvalue);
+assert(itemsmore.pItemsMore !is null);
+itemsmore = itemsmore.pItemsMore;
+assert(itemsmore.item !is null);
+assert(itemsmore.item.pToken1 !is null);
+assert_eq(TOKEN_b, itemsmore.item.pToken1.token);
+assert_eq(22, itemsmore.item.pToken1.pvalue);
+assert(itemsmore.pItemsMore is null);
+```
+
+## `ast_prefix` and `ast_suffix` statements
+
+In AST generation mode, structure types are defined and named based on the
+rules in the grammar.
+Additionally, a structure type called `Token` is generated to hold parsed
+token information.
+
+These structure names can be modified by using the `ast_prefix` or `ast_suffix`
+statements in the grammar file.
+The field names that point to instances of the structures are not affected by
+the `ast_prefix` or `ast_suffix` values.
+
+For example, if the following two lines were added to the example above:
+
+```
+ast_prefix ABC;
+ast_suffix XYZ;
+```
+
+The types would then be used as follows:
+
+```
+string input = "a, ((b)), b";
+p_context_t context;
+p_context_init(&context, input);
+assert_eq(P_SUCCESS, p_parse(&context));
+ABCStartXYZ * start = p_result(&context);
+assert(start.pItems1 !is null);
+assert(start.pItems !is null);
+ABCItemsXYZ * items = start.pItems;
+assert(items.pItem !is null);
+assert(items.pItem.pToken1 !is null);
+assert_eq(TOKEN_a, items.pItem.pToken1.token);
+assert_eq(11, items.pItem.pToken1.pvalue);
+assert(items.pItemsMore !is null);
+ABCItemsMoreXYZ * itemsmore = items.pItemsMore;
+assert(itemsmore.pItem !is null);
+assert(itemsmore.pItem.pItem !is null);
+assert(itemsmore.pItem.pItem.pItem !is null);
+assert(itemsmore.pItem.pItem.pItem.pToken1 !is null);
+```
+
 ##> Specifying tokens - the `token` statement

 The `token` statement allows defining a lexer token and a pattern to match that
@ -238,9 +362,7 @@ lexer.
 Example:

 ```
-token if <<
-  writeln("'if' keyword lexed");
->>
+token if << writeln("'if' keyword lexed"); >>
 ```

 The `token` statement is actually a shortcut statement for a combination of a
@ -277,9 +399,7 @@ code but may not result in a matched token.
 Example:

 ```
-/foo+/ <<
-  writeln("saw a foo pattern");
->>
+/foo+/ << writeln("saw a foo pattern"); >>
 ```

 This can be especially useful with ${#Lexer modes}.
@ -325,9 +445,16 @@ Regular expressions can include many special characters:
  * The `(` character begins a matching group.
  * The `{` character begins a count qualifier.
  * The `\` character escapes the following character and changes its meaning:
+    * The `\a` sequence matches an ASCII bell character (0x07).
+    * The `\b` sequence matches an ASCII backspace character (0x08).
    * The `\d` sequence matches any character `0` through `9`.
+    * The `\f` sequence matches an ASCII form feed character (0x0C).
+    * The `\n` sequence matches an ASCII new line character (0x0A).
+    * The `\r` sequence matches an ASCII carriage return character (0x0D).
    * The `\s` sequence matches a space, horizontal tab `\t`, carriage return
    `\r`, a form feed `\f`, or a vertical tab `\v` character.
+    * The `\t` sequence matches an ASCII tab character (0x09).
+    * The `\v` sequence matches an ASCII vertical tab character (0x0B).
    * Any other character matches itself.
  * The `|` character creates an alternate match.

@ -381,9 +508,7 @@ tokenid str;
  mystringvalue = "";
  $mode(string);
 >>
-string: /[^"]+/ <<
-  mystringvalue += match;
->>
+string: /[^"]+/ << mystringvalue ~= match; >>
 string: /"/ <<
  $mode(default);
  return $token(str);
@ -440,20 +565,12 @@ ptype Value;
 ptype array = Value[];
 ptype dict = Value[string];

-Object -> lbrace rbrace <<
-  $$ = new Value();
->>
+Object -> lbrace rbrace << $$ = new Value(); >>

-Values (array) -> Value <<
-  $$ = [$1];
->>
-Values -> Values comma Value <<
-  $$ = $1 ~ [$3];
->>
+Values (array) -> Value << $$ = [$1]; >>
+Values -> Values comma Value << $$ = $1 ~ [$3]; >>

-KeyValue (dict) -> string colon Value <<
-  $$ = [$1: $3];
->>
+KeyValue (dict) -> string colon Value << $$ = [$1: $3]; >>
 ```

 In this example, the default parser value type is `Value`.
@ -469,6 +586,12 @@ In this example:
  * a reduced `Values`'s parser value has a type of `Value[]`.
  * a reduced `KeyValue`'s parser value has a type of `Value[string]`.

+When AST generation mode is active, the `ptype` functionality works differently.
+In this mode, only one `ptype` is used by the parser.
+Lexer user code blocks may assign a parse value to the generated `Token` node
+by assigning to `$$` within a lexer code block.
+The type of the parse value `$$` is given by the global `ptype` type.
+
 ##> Specifying a parser rule - the rule statement

 Rule statements create parser rules which define the grammar that will be
@ -479,66 +602,86 @@ Rules with the same name define a rule set for that name and act as
 alternatives that the parser can accept when attempting to match a reference to
 that rule.

-The grammar file must define a rule with the name `Start` which will be used as
-the top-level starting rule that the parser attempts to reduce.
+The default start rule name is `Start`.
+This can be changed with the `start` statement.
+The grammar file must define a rule with the start rule name, which is used as
+the top-level rule that the parser attempts to reduce.
+
+Rule statements are composed of the name of the rule, a `->` token, the fields
+defining the rule pattern that must be matched, and a terminating semicolon or
+user code block.

 Example:

 ```
 ptype ulong;
-token word /[a-z]+/ <<
-  $$ = match.length;
->>
-Start -> word <<
-  $$ = $1;
->>
+start Top;
+token word /[a-z]+/ << $$ = match.length; >>
+Top -> word << $$ = $1; >>
 ```

-In the above example the `Start` rule is defined to match a single `word`
+In the above example the `Top` rule is defined to match a single `word`
 token.

-Example:
+Another example:

 ```
-Start -> E1 <<
-  $$ = $1;
->>
-E1 -> E2 <<
-  $$ = $1;
->>
-E1 -> E1 plus E2 <<
-  $$ = $1 + $3;
->>
-E2 -> E3 <<
-  $$ = $1;
->>
-E2 -> E2 times E3 <<
-  $$ = $1 * $3;
->>
-E3 -> E4 <<
-  $$ = $1;
->>
-E3 -> E3 power E4 <<
-  $$ = pow($1, $3);
->>
-E4 -> integer <<
-  $$ = $1;
->>
-E4 -> lparen E1 rparen <<
-  $$ = $2;
->>
+Start -> E1 << $$ = $1; >>
+E1 -> E2 << $$ = $1; >>
+E1 -> E1 plus E2 << $$ = $1 + $3; >>
+E2 -> E3 << $$ = $1; >>
+E2 -> E2 times E3 << $$ = $1 * $3; >>
+E3 -> E4 << $$ = $1; >>
+E3 -> E3 power E4 << $$ = pow($1, $3); >>
+E4 -> integer << $$ = $1; >>
+E4 -> lparen E1 rparen << $$ = $2; >>
 ```

-A parser rule has zero or more terms on the right side of its definition.
-Each of these terms is either a token name or a rule name.
+This example uses the default start rule name of `Start`.

-In a parser rule code block, parser values for the right side terms are
-accessible as `$1` for the first term's parser value, `$2` for the second
-term's parser value, etc...
+A parser rule has zero or more fields on the right side of its definition.
+Each of these fields is either a token name or a rule name.
+A field can optionally be followed by a `:` and then a field alias name.
+If present, the field alias name is used to refer to the field value in user
+code blocks, or if AST mode is active, the field alias name is used as the
+field name in the generated AST node structure.
+A field can be immediately followed by a `?` character to signify that it is
+optional.
+Another example:
+
+```
+token public;
+token private;
+token int;
+token ident /[a-zA-Z_][a-zA-Z_0-9]*/;
+token semicolon /;/;
+IntegerDeclaration -> Visibility? int ident:name semicolon;
+Visibility -> public;
+Visibility -> private;
+```
+
+In a parser rule code block, parser values for the right side fields are
+accessible as `$1` for the first field's parser value, `$2` for the second
+field's parser value, etc...
+For the `IntegerDeclaration` rule, the third field value can also be referred
+to as `${name}`.
 The `$$` symbol accesses the output parser value for this rule.
 The above examples demonstrate how the parser values for the rule components
 can be used to produce the parser value for the accepted rule.

+Parser rule code blocks are not allowed and not used when AST generation mode
+is active.
+
+##> Specifying the parser start rule name - the `start` statement
+
+The start rule can be changed from the default of `Start` by using the `start`
+statement.
+Example:
+
+```
+start MyStartRule;
+```
+
 ##> Specifying the parser module name - the `module` statement

 The `module` statement can be used to specify the module name for a generated
@ -574,6 +717,309 @@ default.
 It can also be used when generating multiple lexers/parsers to be used in the
 same program to avoid symbol collisions.

+##> User termination of the lexer or parser
+
+Propane allows lexer or parser user code blocks to terminate execution of the
+parser.
+Some example uses of this functionality could be to:
+
+  * Detect integer overflow when lexing an integer literal constant.
+  * Detect and report an error as soon as possible during parsing before continuing to parse any more of the input.
+  * Determine whether parsing should stop and instead be performed using a different parser version.
+
+To terminate parsing from a lexer or parser user code block, use the
+`$terminate(code)` function, passing an integer expression argument.
+For example:
+
+```
+NewExpression -> new Expression << $terminate(42); >>
+```
+
+The value passed to the `$terminate()` function is known as the "user terminate
+code".
+If the parser returns a `P_USER_TERMINATED` result code, then the user
+terminate code can be accessed using the `p_user_terminate_code()` API
+function.
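
As a rough sketch of how calling code for a C target might react to user termination: the stand-in definitions below only mimic the generated API so that the example compiles on its own. In real use they come from the generated parser header. The value of `P_SUCCESS`, the simplified `p_context_t`, the stub body of `p_parse()`, and the helper `run_parser()` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for the generated API (assumptions for this sketch only).
 * P_USER_TERMINATED is 6u in the generated C header; the value of
 * P_SUCCESS is assumed here. */
#define P_SUCCESS 0u
#define P_USER_TERMINATED 6u

typedef struct
{
    size_t user_terminate_code;
} p_context_t;

/* Stub: pretends that a user code block called $terminate(42). */
static size_t p_parse(p_context_t * context)
{
    context->user_terminate_code = 42u;
    return P_USER_TERMINATED;
}

static size_t p_user_terminate_code(p_context_t * context)
{
    return context->user_terminate_code;
}

/* Hypothetical helper: returns 0 on success, 1 on user termination
 * (storing the user terminate code in *code_out), 2 on any other result. */
static int run_parser(p_context_t * context, size_t * code_out)
{
    assert(context != NULL && code_out != NULL);
    size_t result = p_parse(context);
    if (result == P_USER_TERMINATED)
    {
        *code_out = p_user_terminate_code(context);
        fprintf(stderr, "parser terminated by user code: %zu\n", *code_out);
        return 1;
    }
    return (result == P_SUCCESS) ? 0 : 2;
}
```

The user terminate code is read only after `p_parse()` reports `P_USER_TERMINATED`, since that is the only case the text above defines it for.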
+
+#> Propane generated API
+
+By default, Propane uses a prefix of `p_` when generating a lexer/parser.
+This prefix is used for all publicly declared types and functions.
+The uppercase version of the prefix is used for all constant values.
+
+This section documents the generated API using the default `p_` or `P_` names.
+
+##> Constants
+
+Propane generates the following result code constants:
+
+* `P_SUCCESS`: A successful decode/lex/parse operation has taken place.
+* `P_DECODE_ERROR`: An error occurred when decoding UTF-8 input.
+* `P_UNEXPECTED_INPUT`: Input was received by the lexer that does not match any lexer pattern.
+* `P_UNEXPECTED_TOKEN`: A token was seen in a location that does not match any parser rule.
+* `P_DROP`: The lexer matched a drop pattern.
+* `P_EOF`: The lexer reached the end of the input string.
+* `P_USER_TERMINATED`: A lexer or parser user code block has requested to terminate the parser.
+
+Result codes are returned by the functions `p_decode_code_point()`, `p_lex()`, and `p_parse()`.
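
For logging, calling code may want to turn a result code into a readable name. The following is a minimal sketch for a C target, not part of the generated API: `result_code_name()` is a hypothetical helper, the values `3u` through `6u` match the generated C header in this change, and the values `0u` through `2u` are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Values 3u through 6u match the generated C header in this change;
 * 0u through 2u are assumed for this sketch. */
#define P_SUCCESS 0u
#define P_DECODE_ERROR 1u
#define P_UNEXPECTED_INPUT 2u
#define P_UNEXPECTED_TOKEN 3u
#define P_DROP 4u
#define P_EOF 5u
#define P_USER_TERMINATED 6u

/* Hypothetical helper: map a result code to the name of its constant. */
static const char * result_code_name(size_t result)
{
    switch (result)
    {
    case P_SUCCESS:          return "P_SUCCESS";
    case P_DECODE_ERROR:     return "P_DECODE_ERROR";
    case P_UNEXPECTED_INPUT: return "P_UNEXPECTED_INPUT";
    case P_UNEXPECTED_TOKEN: return "P_UNEXPECTED_TOKEN";
    case P_DROP:             return "P_DROP";
    case P_EOF:              return "P_EOF";
    case P_USER_TERMINATED:  return "P_USER_TERMINATED";
    default:                 return "unknown";
    }
}
```

In a real integration the constants would come from the generated header rather than being redefined.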
+
+##> Types
+
+### `p_context_t`
+
+Propane defines a `p_context_t` structure type.
+The structure is intended to be used opaquely and stores information related to
+the state of the lexer and parser.
+Integrating code must define an instance of the `p_context_t` structure.
+A pointer to this instance is passed to the generated functions.
+
+### `p_position_t`
+
+The `p_position_t` structure contains two fields `row` and `col`.
+These fields contain the 0-based row and column of a position in the input text.
+
+For D targets, the `p_position_t` structure can be checked for validity by
+querying the `valid` property.
+
+For C targets, the `p_position_t` structure can be checked for validity by
+calling `p_position_valid(pos)` where `pos` is a `p_position_t` structure
+instance.
+
+### AST Node Types
+
+If AST generation mode is enabled, a structure type for each rule will be
+generated.
+The name of the structure type is given by the name of the rule.
+Additionally, a structure type called `Token` is generated to represent an
+AST node which refers to a raw parser token rather than a composite rule.
+
+#### AST Node Fields
+
+All AST nodes have a `position` field specifying the text position of the
+beginning of the matched token or rule, and an `end_position` field specifying
+the text position of the end of the matched token or rule.
+Each of these fields is an instance of the `p_position_t` structure.
+
+A `Token` node will always have a valid `position` and `end_position`.
+A rule node may not have valid positions if the rule allows for an empty match.
+In this case the `position` structure should be checked for validity before
+using it.
+For C targets this can be accomplished with
+`if (p_position_valid(node->position))` and for D targets this can be
+accomplished with `if (node.position.valid)`.
+
+A `Token` node has the following additional fields:
+
+  * `token` which specifies which token was parsed (one of `TOKEN_*`)
+  * `pvalue` which specifies the parser value for the token. If a lexer user
+  code block assigned to `$$`, the assigned value will be stored here.
+
+AST node structures for rules contain generated fields based on the
+right hand side components specified for all rules of a given name.
+
+In this example:
+
+```
+Start -> Items;
+
+Items -> Item ItemsMore;
+Items -> ;
+```
+
+The `Start` structure will have a field called `pItems` and another field of
+the same name but with a positional suffix (`pItems1`) which both point to the
+parsed `Items` node.
+Their value will be null if the parsed `Items` rule was empty.
+
+The `Items` structure will have fields:
+
+  * `pItem` and `pItem1` which point to the parsed `Item` structure.
+  * `pItemsMore` and `pItemsMore2` which point to the parsed `ItemsMore` structure.
+
+If a rule can be empty (for example in the second `Items` rule above), then
+the pointer to that rule's generated AST node will be null if the parser
+matches the empty rule pattern.
+
+The non-positional AST node field pointer will not be generated if there are
+multiple positions in which an instance of the node it points to could be
+present.
+For example, in the rules below:
+
+```
+Dual -> One Two;
+Dual -> Two One;
+```
+
+The generated `Dual` structure will contain `pOne1`, `pTwo2`, `pTwo1`, and
+`pOne2` fields.
+However, a `pOne` field and `pTwo` field will not be generated since it would
+be ambiguous which one was matched.
+
+If the first rule is matched, then `pOne1` and `pTwo2` will be non-null while
+`pTwo1` and `pOne2` will be null.
+If the second rule is matched instead, then the opposite would be the case.
+
+If a field alias is present in a rule definition, an additional field will be
+generated in the AST node with the field alias name.
+For example:
+
+```
+Exp -> Exp:left plus ExpB:right;
+```
+
+In the generated `Exp` structure, the fields `pExp`, `pExp1`, and `left` will
+all point to the same child node (an instance of the `Exp` structure), and the
+fields `pExpB`, `pExpB3`, and `right` will all point to the same child node
+(an instance of the `ExpB` structure).
+
+##> Functions
+
+### `p_context_init`
+
+The `p_context_init()` function must be called to initialize the context
+structure.
+The input to be used for lexing/parsing is passed in when initializing the
+context structure.
+
+C example:
+
+```
+p_context_t context;
+p_context_init(&context, input, input_length);
+```
+
+D example:
+
+```
+p_context_t context;
+p_context_init(&context, input);
+```
+
+### `p_parse`
+
+The `p_parse()` function is the main entry point to the parser.
+It must be passed a pointer to an initialized context structure.
+
+Example:
+
+```
+p_context_t context;
+p_context_init(&context, input, input_length);
+size_t result = p_parse(&context);
+```
+
+### `p_position_valid`
+
+The `p_position_valid()` function is only generated for C targets.
+It is used to determine whether or not a `p_position_t` structure is valid.
+
+Example:
+
+```
+if (p_position_valid(node->position))
+{
+    ....
+}
+```
+
+For D targets, rather than using `p_position_valid()`, the `valid` property
+function of the `p_position_t` structure can be queried
+(e.g. `if (node.position.valid)`).
+
+### `p_result`
+
+The `p_result()` function can be used to retrieve the final parse value after
+`p_parse()` returns a `P_SUCCESS` value.
+
+Example:
+
+```
+p_context_t context;
+p_context_init(&context, input, input_length);
+size_t result = p_parse(&context);
+if (result == P_SUCCESS)
+{
+    result = p_result(&context);
+}
+```
+
+If AST generation mode is active, then the `p_result()` function returns a
+`Start *` pointing to the `Start` AST structure.
+
+### `p_position`
+
+The `p_position()` function can be used to retrieve the parser position where
+an error occurred.
+
+Example:
+
+```
+p_context_t context;
+p_context_init(&context, input, input_length);
+size_t result = p_parse(&context);
+if (result == P_UNEXPECTED_TOKEN)
+{
+    p_position_t error_position = p_position(&context);
+    fprintf(stderr, "Error: unexpected token at row %u column %u\n",
+        error_position.row + 1, error_position.col + 1);
+}
+```
+
+### `p_user_terminate_code`
+
+The `p_user_terminate_code()` function can be used to retrieve the user
+terminate code after `p_parse()` returns a `P_USER_TERMINATED` value.
+User terminate codes are arbitrary values that can be defined by the user to
+be returned when the user requests to terminate parsing.
+They have no particular meaning to Propane.
+
+Example:
+
+```
+if (p_parse(&context) == P_USER_TERMINATED)
+{
+    size_t user_terminate_code = p_user_terminate_code(&context);
+}
+```
+
+### `p_token`
+
+The `p_token()` function can be used to retrieve the current parse token.
+This is useful after `p_parse()` returns a `P_UNEXPECTED_TOKEN` value:
+the returned token indicates which token the parser was not
+expecting.
+
+Example:
+
+```
+if (p_parse(&context) == P_UNEXPECTED_TOKEN)
+{
+    p_token_t unexpected_token = p_token(&context);
+}
+```
+
+##> Data
+
+### `p_token_names`
+
+The `p_token_names` array contains the grammar-specified token names.
+It is indexed by the token ID.
+
+C example:
+
+```
+p_context_t context;
+p_context_init(&context, input, input_length);
+size_t result = p_parse(&context);
+if (result == P_UNEXPECTED_TOKEN)
+{
+    p_position_t error_position = p_position(&context);
+    fprintf(stderr, "Error: unexpected token `%s' at row %u column %u\n",
+        p_token_names[p_token(&context)],
+        error_position.row + 1, error_position.col + 1);
+}
+```
+
 #> License

 Propane is licensed under the terms of the MIT License:
--- a/extra/vim/ftdetect/propane.vim
+++ b/extra/vim/ftdetect/propane.vim
@ -0,0 +1 @@
+au BufNewFile,BufRead *.propane set filetype=propane
--- a/extra/vim/syntax/propane.vim
+++ b/extra/vim/syntax/propane.vim
@ -0,0 +1,33 @@
+" Vim syntax file for Propane
+" Language: propane
+" Maintainer: Josh Holtrop
+" URL: https://github.com/holtrop/propane
+
+if exists("b:current_syntax")
+  finish
+endif
+
+if !exists("b:propane_subtype")
+  let b:propane_subtype = "d"
+endif
+
+exe "syn include @propaneTarget syntax/".b:propane_subtype.".vim"
+
+syn region propaneTarget matchgroup=propaneDelimiter start="<<" end=">>$" contains=@propaneTarget keepend
+
+syn match propaneComment "#.*"
+syn match propaneOperator "->"
+syn match propaneFieldAlias ":[a-zA-Z0-9_]\+" contains=propaneFieldOperator
+syn match propaneFieldOperator ":" contained
+syn match propaneOperator "?"
+syn keyword propaneKeyword ast ast_prefix ast_suffix drop module prefix ptype start token tokenid
+
+syn region propaneRegex start="/" end="/" skip="\\/"
+
+hi def link propaneComment Comment
+hi def link propaneKeyword Keyword
+hi def link propaneRegex String
+hi def link propaneOperator Operator
+hi def link propaneFieldOperator Operator
+hi def link propaneDelimiter Delimiter
+hi def link propaneFieldAlias Identifier
--- a/lib/propane.rb
+++ b/lib/propane.rb
@ -31,10 +31,10 @@ class Propane

  class << self

-    def run(input_file, output_file, log_file)
+    def run(input_file, output_file, log_file, options)
      begin
        grammar = Grammar.new(File.read(input_file))
-        generator = Generator.new(grammar, output_file, log_file)
+        generator = Generator.new(grammar, output_file, log_file, options)
        generator.generate
      rescue Error => e
        $stderr.puts e.message
--- a/lib/propane/cli.rb
+++ b/lib/propane/cli.rb
@ -4,15 +4,21 @@ class Propane
    USAGE = <<EOF
 Usage: #{$0} [options] <input-file> <output-file>
 Options:
-  --log LOG   Write log file
-  --version   Show program version and exit
-  -h, --help  Show this usage and exit
+  -h, --help  Show this usage and exit.
+  --log LOG   Write log file. This will show all parser states and their
+              associated shifts and reduces. It can be helpful when
+              debugging a grammar.
+  --version   Show program version and exit.
+  -w          Treat warnings as errors. This option will treat shift/reduce
+              conflicts as fatal errors and will print them to stderr in
+              addition to the log file.
 EOF

    class << self

      def run(args)
        params = []
+        options = {}
        log_file = nil
        i = 0
        while i < args.size
@ -24,11 +30,13 @@ EOF
              log_file = args[i]
            end
          when "--version"
-            puts "propane v#{VERSION}"
+            puts "propane version #{VERSION}"
            return 0
          when "-h", "--help"
            puts USAGE
            return 0
+          when "-w"
+            options[:warnings_as_errors] = true
          when /^-/
            $stderr.puts "Error: unknown option #{arg}"
            return 1
@ -45,7 +53,7 @@ EOF
          $stderr.puts "Error: cannot read #{params[0]}"
          return 2
        end
-        Propane.run(*params, log_file)
+        Propane.run(*params, log_file, options)
      end

    end
--- a/lib/propane/generator.rb
+++ b/lib/propane/generator.rb
@ -2,7 +2,7 @@ class Propane

  class Generator

-    def initialize(grammar, output_file, log_file)
+    def initialize(grammar, output_file, log_file, options)
      @grammar = grammar
      @output_file = output_file
      if log_file
@ -16,6 +16,7 @@ class Propane
        else
          "d"
        end
+      @options = options
      process_grammar!
    end

@ -51,6 +52,7 @@ class Propane
      unless found_default
        raise Error.new("No patterns found for default mode")
      end
+      check_ptypes!
      # Add EOF token.
      @grammar.tokens << Token.new("$EOF", nil, nil)
      tokens_by_name = {}
@ -66,11 +68,14 @@ class Propane
        tokens_by_name[token.name] = token
      end
      # Check for user start rule.
-      unless @grammar.rules.find {|rule| rule.name == "Start"}
-        raise Error.new("Start rule not found")
+      unless @grammar.rules.find {|rule| rule.name == @grammar.start_rule}
+        raise Error.new("Start rule `#{@grammar.start_rule}` not found")
      end
      # Add "real" start rule.
-      @grammar.rules.unshift(Rule.new("$Start", ["Start", "$EOF"], nil, nil, nil))
+      @grammar.rules.unshift(Rule.new("$Start", [@grammar.start_rule, "$EOF"], nil, nil, nil))
+      # Generate and add rules for optional components.
+      generate_optional_component_rules!(tokens_by_name)
+      # Build rule sets.
      rule_sets = {}
      rule_set_id = @grammar.tokens.size
      @grammar.rules.each_with_index do |rule, rule_id|
@ -119,10 +124,55 @@ class Propane
        end
      end
      determine_possibly_empty_rulesets!(rule_sets)
+      rule_sets.each do |name, rule_set|
+        rule_set.finalize(@grammar)
+      end
      # Generate the lexer.
      @lexer = Lexer.new(@grammar)
      # Generate the parser.
-      @parser = Parser.new(@grammar, rule_sets, @log)
+      @parser = Parser.new(@grammar, rule_sets, @log, @options)
+    end
+
+    # Check that any referenced ptypes have been defined.
+    def check_ptypes!
+      (@grammar.patterns + @grammar.tokens + @grammar.rules).each do |potor|
+        if potor.ptypename
+          unless @grammar.ptypes.include?(potor.ptypename)
+            raise Error.new("Error: Line #{potor.line_number}: ptype #{potor.ptypename} not declared. Declare with `ptype` statement.")
+          end
+        end
+      end
+    end
+
+    # Generate and add rules for any optional components.
+    def generate_optional_component_rules!(tokens_by_name)
+      optional_rules_added = Set.new
+      @grammar.rules.each do |rule|
+        rule.components.each do |component|
+          if component =~ /^(.*)\?$/
+            c = $1
+            unless optional_rules_added.include?(component)
+              # Create two rules for the optional component: one empty and
+              # one just matching the component.
+              # We need to find the ptypename for the optional component in
+              # order to copy it to the generated rules.
+              if tokens_by_name[c]
+                # The optional component is a token.
+                ptypename = tokens_by_name[c].ptypename
+              else
+                # The optional component must be a rule, so find any instance
+                # of that rule that specifies a ptypename.
+                ptypename = @grammar.rules.reduce(nil) do |result, rule|
+                  rule.name == c && rule.ptypename ? rule.ptypename : result
+                end
+              end
+              @grammar.rules << Rule.new(component, [], nil, ptypename, rule.line_number)
+              @grammar.rules << Rule.new(component, [c], "$$ = $1;\n", ptypename, rule.line_number)
+              optional_rules_added << component
+            end
+          end
+        end
+      end
    end

    # Determine which grammar rules could expand to empty sequences.
@ -198,10 +248,25 @@ class Propane
      code = code.gsub(/\$token\(([$\w]+)\)/) do |match|
        "TOKEN_#{Token.code_name($1)}"
      end
+      code = code.gsub(/\$terminate\((.*)\);/) do |match|
+        user_terminate_code = $1
+        retval = rule ? "P_USER_TERMINATED" : "TERMINATE_TOKEN_ID"
+        case @language
+        when "c"
+          "context->user_terminate_code = (#{user_terminate_code}); return #{retval};"
+        when "d"
+          "context.user_terminate_code = (#{user_terminate_code}); return #{retval};"
+        end
+      end
      if parser
        code = code.gsub(/\$\$/) do |match|
+          case @language
+          when "c"
+            "_pvalue->v_#{rule.ptypename}"
+          when "d"
            "_pvalue.v_#{rule.ptypename}"
          end
+        end
        code = code.gsub(/\$(\d+)/) do |match|
          index = $1.to_i
          case @language
@ -211,8 +276,29 @@ class Propane
            "statevalues[$-1-n_states+#{index}].pvalue.v_#{rule.components[index - 1].ptypename}"
          end
        end
+        code = code.gsub(/\$\{(\w+)\}/) do |match|
+          aliasname = $1
+          if index = rule.aliases[aliasname]
+            case @language
+            when "c"
+              "state_values_stack_index(statevalues, -(int)n_states + #{index})->pvalue.v_#{rule.components[index].ptypename}"
+            when "d"
+              "statevalues[$-n_states+#{index}].pvalue.v_#{rule.components[index].ptypename}"
+            end
+          else
+            raise Error.new("Field alias '#{aliasname}' not found")
+          end
+        end
      else
        code = code.gsub(/\$\$/) do |match|
+          if @grammar.ast
+            case @language
+            when "c"
+              "out_token_info->pvalue"
+            when "d"
+              "out_token_info.pvalue"
+            end
+          else
            case @language
            when "c"
              "out_token_info->pvalue.v_#{pattern.ptypename}"
@ -220,6 +306,7 @@ class Propane
              "out_token_info.pvalue.v_#{pattern.ptypename}"
            end
          end
+        end
        code = code.gsub(/\$mode\(([a-zA-Z_][a-zA-Z_0-9]*)\)/) do |match|
          mode_name = $1
          mode_id = @lexer.mode_id(mode_name)
@ -243,7 +330,7 @@ class Propane
    #   Start rule parser value type name and type string.
    def start_rule_type
      start_rule = @grammar.rules.find do |rule|
-        rule.name == "Start"
+        rule.name == @grammar.start_rule
      end
      [start_rule.ptypename, @grammar.ptypes[start_rule.ptypename]]
    end
--- a/lib/propane/grammar.rb
+++ b/lib/propane/grammar.rb
@ -5,9 +5,13 @@ class Propane
    # Reserve identifiers beginning with a double-underscore for internal use.
    IDENTIFIER_REGEX = /(?:[a-zA-Z]|_[a-zA-Z0-9])[a-zA-Z_0-9]*/

+    attr_reader :ast
+    attr_reader :ast_prefix
+    attr_reader :ast_suffix
    attr_reader :modulename
    attr_reader :patterns
    attr_reader :rules
+    attr_reader :start_rule
    attr_reader :tokens
    attr_reader :code_blocks
    attr_reader :ptypes
@ -15,6 +19,7 @@ class Propane

    def initialize(input)
      @patterns = []
+      @start_rule = "Start"
      @tokens = []
      @rules = []
      @code_blocks = {}
@ -24,6 +29,9 @@ class Propane
      @input = input.gsub("\r\n", "\n")
      @ptypes = {"default" => "void *"}
      @prefix = "p_"
+      @ast = false
+      @ast_prefix = ""
+      @ast_suffix = ""
      parse_grammar!
    end

@ -35,6 +43,10 @@ class Propane
      @tokens.size
    end

+    def terminate_token_id
+      @tokens.size + 1
+    end
+
    private

    def parse_grammar!
@ -47,9 +59,13 @@ class Propane
      if parse_white_space!
      elsif parse_comment_line!
      elsif @mode.nil? && parse_mode_label!
+      elsif parse_ast_statement!
+      elsif parse_ast_prefix_statement!
+      elsif parse_ast_suffix_statement!
      elsif parse_module_statement!
      elsif parse_ptype_statement!
      elsif parse_pattern_statement!
+      elsif parse_start_statement!
      elsif parse_token_statement!
      elsif parse_tokenid_statement!
      elsif parse_drop_statement!
@ -78,6 +94,24 @@ class Propane
      consume!(/#.*\n/)
    end

+    def parse_ast_statement!
+      if consume!(/ast\s*;/)
+        @ast = true
+      end
+    end
+
+    def parse_ast_prefix_statement!
+      if md = consume!(/ast_prefix\s+(\w+)\s*;/)
+        @ast_prefix = md[1]
+      end
+    end
+
+    def parse_ast_suffix_statement!
+      if md = consume!(/ast_suffix\s+(\w+)\s*;/)
+        @ast_suffix = md[1]
+      end
+    end
+
    def parse_module_statement!
      if consume!(/module\s+/)
        md = consume!(/([\w.]+)\s*/, "expected module name")
@ -92,6 +126,9 @@ class Propane
      if consume!(/ptype\s+/)
        name = "default"
        if md = consume!(/(#{IDENTIFIER_REGEX})\s*=\s*/)
+          if @ast
+            raise Error.new("Multiple ptypes are unsupported in AST mode")
+          end
          name = md[1]
        end
        md = consume!(/([^;]+);/, "expected parser result type expression")
@ -104,12 +141,15 @@ class Propane
        md = consume!(/(#{IDENTIFIER_REGEX})\s*/, "expected token name")
        name = md[1]
        if md = consume!(/\((#{IDENTIFIER_REGEX})\)\s*/)
+          if @ast
+            raise Error.new("Multiple ptypes are unsupported in AST mode")
+          end
          ptypename = md[1]
        end
        pattern = parse_pattern! || name
        consume!(/\s+/)
        unless code = parse_code_block!
-          consume!(/;/, "expected pattern or `;' or code block")
+          consume!(/;/, "expected `;' or code block")
        end
        token = Token.new(name, ptypename, @line_number)
        @tokens << token
@ -125,6 +165,9 @@ class Propane
        md = consume!(/(#{IDENTIFIER_REGEX})\s*/, "expected token name")
        name = md[1]
        if md = consume!(/\((#{IDENTIFIER_REGEX})\)\s*/)
+          if @ast
+            raise Error.new("Multiple ptypes are unsupported in AST mode")
+          end
          ptypename = md[1]
        end
        consume!(/;/, "expected `;'");
@ -152,10 +195,17 @@ class Propane
    def parse_rule_statement!
      if md = consume!(/(#{IDENTIFIER_REGEX})\s*(?:\((#{IDENTIFIER_REGEX})\))?\s*->\s*/)
        rule_name, ptypename = *md[1, 2]
-        md = consume!(/((?:#{IDENTIFIER_REGEX}\s*)*)\s*/, "expected rule component list")
+        if @ast && ptypename
+          raise Error.new("Multiple ptypes are unsupported in AST mode")
+        end
+        md = consume!(/((?:#{IDENTIFIER_REGEX}(?::#{IDENTIFIER_REGEX})?\??\s*)*)\s*/, "expected rule component list")
        components = md[1].strip.split(/\s+/)
+        if @ast
+          consume!(/;/, "expected `;'")
+        else
          unless code = parse_code_block!
-          consume!(/;/, "expected pattern or `;' or code block")
+            consume!(/;/, "expected `;' or code block")
+          end
        end
        @rules << Rule.new(rule_name, components, code, ptypename, @line_number)
        @mode = nil
@ -167,6 +217,9 @@ class Propane
      if pattern = parse_pattern!
        consume!(/\s+/)
        if md = consume!(/\((#{IDENTIFIER_REGEX})\)\s*/)
+          if @ast
+            raise Error.new("Multiple ptypes are unsupported in AST mode")
+          end
          ptypename = md[1]
        end
        unless code = parse_code_block!
@ -178,9 +231,17 @@ class Propane
      end
    end

+    def parse_start_statement!
+      if md = consume!(/start\s+(\w+)\s*;/)
+        @start_rule = md[1]
+      end
+    end
+
    def parse_code_block_statement!
-      if md = consume!(/<<([a-z]*)\n(.*?)^>>\n/m)
+      if md = consume!(/<<([a-z]*)(.*?)>>\n/m)
        name, code = md[1..2]
+        code.sub!(/\A\n/, "")
+        code += "\n" unless code.end_with?("\n")
        if @code_blocks[name]
          @code_blocks[name] += code
        else
@ -218,8 +279,11 @@ class Propane
    end

    def parse_code_block!
-      if md = consume!(/<<\n(.*?)^>>\n/m)
-        md[1]
+      if md = consume!(/<<(.*?)>>\n/m)
+        code = md[1]
+        code.sub!(/\A\n/, "")
+        code += "\n" unless code.end_with?("\n")
+        code
      end
    end

--- a/lib/propane/parser.rb
+++ b/lib/propane/parser.rb
@ -7,12 +7,14 @@ class Propane
    attr_reader :reduce_table
    attr_reader :rule_sets

-    def initialize(grammar, rule_sets, log)
+    def initialize(grammar, rule_sets, log, options)
      @grammar = grammar
      @rule_sets = rule_sets
      @log = log
      @item_sets = []
      @item_sets_set = {}
+      @warnings = Set.new
+      @options = options
      start_item = Item.new(grammar.rules.first, 0)
      eval_item_sets = Set[ItemSet.new([start_item])]

@ -23,10 +25,10 @@ class Propane
          item_set.id = @item_sets.size
          @item_sets << item_set
          @item_sets_set[item_set] = item_set
-          item_set.following_symbols.each do |following_symbol|
-            unless following_symbol.name == "$EOF"
-              following_set = item_set.build_following_item_set(following_symbol)
-              eval_item_sets << following_set
+          item_set.next_symbols.each do |next_symbol|
+            unless next_symbol.name == "$EOF"
+              next_item_set = item_set.build_next_item_set(next_symbol)
+              eval_item_sets << next_item_set
            end
          end
        end
@ -37,8 +39,11 @@ class Propane
      end

      build_reduce_actions!
-      write_log!
      build_tables!
+      write_log!
+      if @warnings.size > 0 && @options[:warnings_as_errors]
+        raise Error.new("Fatal errors (-w):\n" + @warnings.join("\n"))
+      end
    end

    private
@ -48,27 +53,37 @@ class Propane
      @shift_table = []
      @reduce_table = []
      @item_sets.each do |item_set|
-        shift_entries = item_set.following_symbols.map do |following_symbol|
+        shift_entries = item_set.next_symbols.map do |next_symbol|
          state_id =
-            if following_symbol.name == "$EOF"
+            if next_symbol.name == "$EOF"
              0
            else
-              item_set.following_item_set[following_symbol].id
+              item_set.next_item_set[next_symbol].id
            end
          {
-            symbol_id: following_symbol.id,
+            symbol: next_symbol,
            state_id: state_id,
          }
        end
+        unless item_set.reduce_rules.empty?
+          shift_entries.each do |shift_entry|
+            token = shift_entry[:symbol]
+            if get_lookahead_reduce_actions_for_item_set(item_set).include?(token)
+              rule = item_set.reduce_actions[token]
+              @warnings << "Shift/Reduce conflict (state #{item_set.id}) between token #{token.name} and rule #{rule.name} (defined on line #{rule.line_number})"
+            end
+          end
+        end
        reduce_entries =
-          case ra = item_set.reduce_actions
-          when Rule
-            [{token_id: @grammar.invalid_token_id, rule_id: ra.id,
-              rule_set_id: ra.rule_set.id, n_states: ra.components.size}]
-          when Hash
-            ra.map do |token, rule|
-              {token_id: token.id, rule_id: rule.id,
-               rule_set_id: rule.rule_set.id, n_states: rule.components.size}
+          if rule = item_set.reduce_rule
+            [{token_id: @grammar.invalid_token_id, rule_id: rule.id, rule: rule,
+              rule_set_id: rule.rule_set.id, n_states: rule.components.size,
+              propagate_optional_target: rule.optional? && rule.components.size == 1}]
+          elsif reduce_actions = item_set.reduce_actions
+            reduce_actions.map do |token, rule|
+              {token_id: token.id, rule_id: rule.id, rule: rule,
+               rule_set_id: rule.rule_set.id, n_states: rule.components.size,
+               propagate_optional_target: rule.optional? && rule.components.size == 1}
            end
          else
            []
@ -85,11 +100,11 @@ class Propane
    end

    def process_item_set(item_set)
-      item_set.following_symbols.each do |following_symbol|
-        unless following_symbol.name == "$EOF"
-          following_set = @item_sets_set[item_set.build_following_item_set(following_symbol)]
-          item_set.following_item_set[following_symbol] = following_set
-          following_set.in_sets << item_set
+      item_set.next_symbols.each do |next_symbol|
+        unless next_symbol.name == "$EOF"
+          next_item_set = @item_sets_set[item_set.build_next_item_set(next_symbol)]
+          item_set.next_item_set[next_symbol] = next_item_set
+          next_item_set.in_sets << item_set
        end
      end
    end
@ -99,7 +114,7 @@ class Propane
    # @return [void]
    def build_reduce_actions!
      @item_sets.each do |item_set|
-        item_set.reduce_actions = build_reduce_actions_for_item_set(item_set)
+        build_reduce_actions_for_item_set(item_set)
      end
    end

@ -108,38 +123,55 @@ class Propane
    # @param item_set [ItemSet]
    #   ItemSet (parser state)
    #
-    # @return [nil, Rule, Hash]
-    #   If no reduce actions are possible for the given item set, nil.
-    #   If only one reduce action is possible for the given item set, the Rule
-    #   to reduce.
-    #   Otherwise, a mapping of lookahead Tokens to the Rules to reduce.
+    # @return [void]
    def build_reduce_actions_for_item_set(item_set)
      # To build the reduce actions, we start by looking at any
      # "complete" items, i.e., items where the parse position is at the
      # end of a rule. These are the only rules that are candidates for
      # reduction in the current ItemSet.
-      reduce_rules = Set.new(item_set.items.select(&:complete?).map(&:rule))
+      item_set.reduce_rules = Set.new(item_set.items.select(&:complete?).map(&:rule))

-      # If there are no rules to reduce for this ItemSet, we're done here.
-      return nil if reduce_rules.size == 0
+      if item_set.reduce_rules.size == 1
+        item_set.reduce_rule = item_set.reduce_rules.first
+      end

-      # If there is exactly one rule to reduce for this ItemSet, then do not
-      # figure out the lookaheads; just reduce it.
-      return reduce_rules.first if reduce_rules.size == 1
+      if item_set.reduce_rules.size > 1
+        # Force item_set.reduce_actions to be built to store the lookahead
+        # tokens for the possible reduce rules if there is more than one.
+        get_lookahead_reduce_actions_for_item_set(item_set)
+      end
+    end

-      # Otherwise, we have more than one possible rule to reduce.
+    # Get the reduce actions for a single item set (parser state).
+    #
+    # @param item_set [ItemSet]
+    #   ItemSet (parser state)
+    #
+    # @return [Hash]
+    #   Mapping of lookahead Tokens to the Rules to reduce.
+    def get_lookahead_reduce_actions_for_item_set(item_set)
+      item_set.reduce_actions ||= build_lookahead_reduce_actions_for_item_set(item_set)
+    end

+    # Build the reduce actions for a single item set (parser state).
+    #
+    # @param item_set [ItemSet]
+    #   ItemSet (parser state)
+    #
+    # @return [Hash]
+    #   Mapping of lookahead Tokens to the Rules to reduce.
+    def build_lookahead_reduce_actions_for_item_set(item_set)
      # We will be looking for all possible tokens that can follow instances of
      # these rules. Rather than looking through the entire grammar for the
      # possible following tokens, we will only look in the item sets leading
      # up to this one. This restriction gives us a more precise lookahead set,
      # and allows us to parse LALR grammars.
-      item_sets = item_set.leading_item_sets
-      reduce_rules.reduce({}) do |reduce_actions, reduce_rule|
+      item_sets = Set[item_set] + item_set.leading_item_sets
+      item_set.reduce_rules.reduce({}) do |reduce_actions, reduce_rule|
        lookahead_tokens_for_rule = build_lookahead_tokens_to_reduce(reduce_rule, item_sets)
        lookahead_tokens_for_rule.each do |lookahead_token|
          if existing_reduce_rule = reduce_actions[lookahead_token]
-            raise Error.new("Error: reduce/reduce conflict between rule #{existing_reduce_rule.id} (#{existing_reduce_rule.name}) and rule #{reduce_rule.id} (#{reduce_rule.name})")
+            raise Error.new("Error: reduce/reduce conflict (state #{item_set.id}) between rule #{existing_reduce_rule.name}##{existing_reduce_rule.id} (defined on line #{existing_reduce_rule.line_number}) and rule #{reduce_rule.name}##{reduce_rule.id} (defined on line #{reduce_rule.line_number})")
          end
          reduce_actions[lookahead_token] = reduce_rule
        end
@ -181,9 +213,9 @@ class Propane
        # tokens to form the lookahead token set.
        item_sets.each do |item_set|
          item_set.items.each do |item|
-            if item.following_symbol == rule_set
+            if item.next_symbol == rule_set
              (1..).each do |offset|
-                case symbol = item.following_symbol(offset)
+                case symbol = item.next_symbol(offset)
                when nil
                  rule_set = item.rule.rule_set
                  unless checked_rule_sets.include?(rule_set)
@ -240,20 +272,26 @@ class Propane
        @log.puts
        @log.puts "  Incoming states: #{incoming_ids.join(", ")}"
        @log.puts "  Outgoing states:"
-        item_set.following_item_set.each do |following_symbol, following_item_set|
-          @log.puts "    #{following_symbol.name} => #{following_item_set.id}"
+        item_set.next_item_set.each do |next_symbol, next_item_set|
+          @log.puts "    #{next_symbol.name} => #{next_item_set.id}"
        end
        @log.puts
        @log.puts "  Reduce actions:"
-        case item_set.reduce_actions
-        when Rule
-          @log.puts "    * => rule #{item_set.reduce_actions.id}, rule set #{@rule_sets[item_set.reduce_actions.name].id} (#{item_set.reduce_actions.name})"
-        when Hash
+        if item_set.reduce_rule
+          @log.puts "    * => rule #{item_set.reduce_rule.id}, rule set #{@rule_sets[item_set.reduce_rule.name].id} (#{item_set.reduce_rule.name})"
+        elsif item_set.reduce_actions
          item_set.reduce_actions.each do |token, rule|
            @log.puts "    lookahead #{token.name} => #{rule.name} (#{rule.id}), rule set ##{rule.rule_set.id}"
          end
        end
      end
+      if @warnings.size > 0
+        @log.puts
+        @log.puts "Warnings:"
+        @warnings.each do |warning|
+          @log.puts "  #{warning}"
+        end
+      end
    end

  end
--- a/lib/propane/parser/item.rb
+++ b/lib/propane/parser/item.rb
@ -56,7 +56,7 @@ class Propane

      # Return the set of Items obtained by "closing" the current item.
      #
-      # If the following symbol for the current item is another Rule name, then
+      # If the next symbol for the current item is another Rule name, then
      # this method will return all Items for that Rule with a position of 0.
      # Otherwise, an empty Array is returned.
      #
@ -81,17 +81,17 @@ class Propane
        @position == @rule.components.size
      end

-      # Get the following symbol for the Item.
+      # Get the next symbol for the Item.
      #
-      # That is, the symbol which follows the parse position marker in the
+      # That is, the symbol which is after the parse position marker in the
      # current Item.
      #
      # @param offset [Integer]
      #   Offset from current parse position to examine.
      #
      # @return [Token, RuleSet, nil]
-      #   Following symbol for the Item.
-      def following_symbol(offset = 0)
+      #   Next symbol for the Item.
+      def next_symbol(offset = 0)
        @rule.components[@position + offset]
      end

@ -108,25 +108,25 @@ class Propane
        end
      end

-      # Get whether this Item is followed by the provided symbol.
+      # Get whether this Item's next symbol is the given symbol.
      #
      # @param symbol [Token, RuleSet]
      #   Symbol to query.
      #
      # @return [Boolean]
-      #   Whether this Item is followed by the provided symbol.
-      def followed_by?(symbol)
-        following_symbol == symbol
+      #   Whether this Item's next symbol is the given symbol.
+      def next_symbol?(symbol)
+        next_symbol == symbol
      end

-      # Get the following item for this Item.
+      # Get the next item for this Item.
      #
      # That is, the Item formed by moving the parse position marker one place
      # forward from its position in this Item.
      #
      # @return [Item]
-      #   The following item for this Item.
-      def following_item
+      #   The next item for this Item.
+      def next_item
        Item.new(@rule, @position + 1)
      end

--- a/lib/propane/parser/item_set.rb
+++ b/lib/propane/parser/item_set.rb
@ -2,7 +2,7 @@ class Propane
  class Parser

    # Represent a parser "item set", which is a set of possible items that the
-    # parser could currently be parsing.
+    # parser could currently be parsing. This is equivalent to a parser state.
    class ItemSet

      # @return [Set<Item>]
@ -14,15 +14,24 @@ class Propane
      attr_accessor :id

      # @return [Hash]
-      #   Maps a following symbol to its ItemSet.
-      attr_reader :following_item_set
+      #   Maps a next symbol to its ItemSet.
+      attr_reader :next_item_set

      # @return [Set<ItemSet>]
      #   ItemSets leading to this item set.
      attr_reader :in_sets

-      # @return [nil, Rule, Hash]
-      #   Reduce actions, mapping lookahead tokens to rules.
+      # @return [nil, Rule]
+      #   Rule to reduce if there is only one possibility.
+      attr_accessor :reduce_rule
+
+      # @return [Set<Rule>]
+      #   Set of rules that could be reduced in this parser state.
+      attr_accessor :reduce_rules
+
+      # @return [nil, Hash]
+      #   Reduce actions, mapping lookahead tokens to rules, if there is
+      #   more than one rule that could be reduced.
      attr_accessor :reduce_actions

      # Build an ItemSet.
@ -31,28 +40,28 @@ class Propane
      #   Items in this ItemSet.
      def initialize(items)
        @items = Set.new(items)
-        @following_item_set = {}
+        @next_item_set = {}
        @in_sets = Set.new
        close!
      end

-      # Get the set of following symbols for all Items in this ItemSet.
+      # Get the set of next symbols for all Items in this ItemSet.
      #
      # @return [Set<Token, RuleSet>]
-      #   Set of following symbols for all Items in this ItemSet.
-      def following_symbols
-        Set.new(@items.map(&:following_symbol).compact)
+      #   Set of next symbols for all Items in this ItemSet.
+      def next_symbols
+        @_next_symbols ||= Set.new(@items.map(&:next_symbol).compact)
      end

-      # Build a following ItemSet for the given following symbol.
+      # Build a next ItemSet for the given next symbol.
      #
      # @param symbol [Token, RuleSet]
-      #   Following symbol to build the following ItemSet for.
+      #   Next symbol to build the next ItemSet for.
      #
      # @return [ItemSet]
-      #   Following ItemSet for the given following symbol.
-      def build_following_item_set(symbol)
-        ItemSet.new(items_followed_by(symbol).map(&:following_item))
+      #   Next ItemSet for the given next symbol.
+      def build_next_item_set(symbol)
+        ItemSet.new(items_with_next(symbol).map(&:next_item))
      end

      # Hash function.
@ -87,13 +96,26 @@ class Propane

      # Set of ItemSets that lead to this ItemSet.
      #
-      # This set includes this ItemSet.
-      #
      # @return [Set<ItemSet>]
      #   Set of all ItemSets that lead up to this ItemSet.
      def leading_item_sets
-        @in_sets.reduce(Set[self]) do |result, item_set|
-          result + item_set.leading_item_sets
+        @_leading_item_sets ||=
+          begin
+            result = Set.new
+            eval_sets = Set[self]
+            evaled = Set.new
+            while eval_sets.size > 0
+              eval_set = eval_sets.first
+              eval_sets.delete(eval_set)
+              evaled << eval_set
+              eval_set.in_sets.each do |in_set|
+                result << in_set
+                unless evaled.include?(in_set)
+                  eval_sets << in_set
+                end
+              end
+            end
+            result
          end
      end

@ -127,16 +149,16 @@ class Propane
        end
      end

-      # Get the Items followed by the given following symbol.
+      # Get the Items with the given next symbol.
      #
      # @param symbol [Token, RuleSet]
-      #   Following symbol.
+      #   Next symbol.
      #
      # @return [Array<Item>]
-      #   Items followed by the given following symbol.
-      def items_followed_by(symbol)
+      #   Items with the given next symbol.
+      def items_with_next(symbol)
        @items.select do |item|
-          item.followed_by?(symbol)
+          item.next_symbol?(symbol)
        end
      end

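Review note: the rewritten `leading_item_sets` above replaces recursion with a worklist and memoizes the result, so a state that leads back to itself no longer recurses forever. The sketch below illustrates that idea in isolation. `State` and `leading_states` are hypothetical stand-ins for `ItemSet` and `leading_item_sets`, and the worklist here is an Array rather than the Set used in the diff.

```ruby
require "set"

# Hypothetical stand-in for ItemSet: it only tracks incoming states.
class State
  attr_reader :id, :in_sets

  def initialize(id)
    @id = id
    @in_sets = Set.new
  end

  # Collect every state that can lead to this one. A worklist is used
  # instead of recursion so that cycles in the state graph terminate.
  def leading_states
    @_leading_states ||=
      begin
        result = Set.new
        pending = [self]
        visited = Set.new
        until pending.empty?
          state = pending.pop
          # Set#add? returns nil when the state was already visited.
          next unless visited.add?(state)
          state.in_sets.each do |in_set|
            result << in_set
            pending << in_set unless visited.include?(in_set)
          end
        end
        result
      end
  end
end

# s0 -> s1 -> s2, with s2 -> s1 forming a cycle and s1 -> s1 a self loop.
s0, s1, s2 = State.new(0), State.new(1), State.new(2)
s1.in_sets << s0 << s2 << s1
s2.in_sets << s1

leading_ids = s2.leading_states.map(&:id).sort
```

As in the diff, the starting state is not added to the result unless a cycle makes it one of its own predecessors, which is why the caller now builds `Set[item_set] + item_set.leading_item_sets` explicitly.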
--- a/lib/propane/regex.rb
+++ b/lib/propane/regex.rb
@ -134,8 +134,18 @@ class Propane
      else
        c = @pattern.slice!(0)
        case c
+        when "a"
+          CharacterRangeUnit.new("\a", "\a")
+        when "b"
+          CharacterRangeUnit.new("\b", "\b")
        when "d"
          CharacterRangeUnit.new("0", "9")
+        when "f"
+          CharacterRangeUnit.new("\f", "\f")
+        when "n"
+          CharacterRangeUnit.new("\n", "\n")
+        when "r"
+          CharacterRangeUnit.new("\r", "\r")
        when "s"
          ccu = CharacterClassUnit.new
          ccu << CharacterRangeUnit.new(" ")
@ -145,6 +155,10 @@ class Propane
          ccu << CharacterRangeUnit.new("\f")
          ccu << CharacterRangeUnit.new("\v")
          ccu
+        when "t"
+          CharacterRangeUnit.new("\t", "\t")
+        when "v"
+          CharacterRangeUnit.new("\v", "\v")
        else
          CharacterRangeUnit.new(c)
        end
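Review note: the new `when` branches above map a backslash escape letter to a single control character. A table-driven sketch of the same mapping follows. `ESCAPES` and `escape_char` are hypothetical names, and the character classes `d` and `s` are left out on purpose because they expand to ranges rather than one character.

```ruby
# Hypothetical lookup table; the diff uses a `case` statement instead.
ESCAPES = {
  "a" => "\a", # bell
  "b" => "\b", # backspace
  "f" => "\f", # form feed
  "n" => "\n", # newline
  "r" => "\r", # carriage return
  "t" => "\t", # horizontal tab
  "v" => "\v", # vertical tab
}.freeze

# Return the character matched by `\c` in a pattern. Any letter without
# an entry matches itself, mirroring the `else` branch in the diff.
def escape_char(c)
  ESCAPES.fetch(c, c)
end
```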
--- a/lib/propane/rule.rb
+++ b/lib/propane/rule.rb
@ -6,6 +6,10 @@ class Propane
    #   Rule components.
    attr_reader :components

+    # @return [Hash]
+    #   Field aliases.
+    attr_reader :aliases
+
    # @return [String]
    #   User code associated with the rule.
    attr_reader :code
@ -30,6 +34,11 @@ class Propane
    #   The RuleSet that this Rule is a part of.
    attr_accessor :rule_set

+    # @return [Array<Integer>]
+    #   Map this rule's components to their positions in the parent RuleSet's
+    #   node field pointer array. This is used for AST construction.
+    attr_accessor :rule_set_node_field_index_map
+
    # Construct a Rule.
    #
    # @param name [String]
@ -44,7 +53,20 @@ class Propane
    #   Line number where the rule was defined in the input grammar.
    def initialize(name, components, code, ptypename, line_number)
      @name = name
-      @components = components
+      @aliases = {}
+      @components = components.each_with_index.map do |component, i|
+        if component =~ /(\S+):(\S+)/
+          c, aliasname = $1, $2
+          if @aliases[aliasname]
+            raise Error.new("Error: duplicate field alias `#{aliasname}` for rule #{name} defined on line #{line_number}")
+          end
+          @aliases[aliasname] = i
+          c
+        else
+          component
+        end
+      end
+      @rule_set_node_field_index_map = components.map {0}
      @code = code
      @ptypename = ptypename
      @line_number = line_number
@ -60,6 +82,14 @@ class Propane
      @components.empty?
    end

+    # Return whether this is an optional Rule.
+    #
+    # @return [Boolean]
+    #   Whether this is an optional Rule.
+    def optional?
+      @name.end_with?("?")
+    end
+
    # Represent the Rule as a String.
    #
    # @return [String]
@ -68,6 +98,17 @@ class Propane
      "#{@name} -> #{@components.map(&:name).join(" ")}"
    end

+    # Check whether the rule set node field index map is just a 1:1 mapping.
+    #
+    # @return [Boolean]
+    #   Boolean indicating whether the rule set node field index map is just a
+    #   1:1 mapping.
+    def flat_rule_set_node_field_index_map?
+      @rule_set_node_field_index_map.each_with_index.all? do |v, i|
+        v == i
+      end
+    end
+
  end

 end
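Review note: the constructor change above splits each `component:alias` term into a component name and an alias that records the component's position within the rule. The standalone sketch below shows that step on plain strings. `split_aliases` is a hypothetical helper, it raises `ArgumentError` rather than the project's `Error` class, and its pattern is anchored whereas the one in the diff is not.

```ruby
# Split "name:alias" terms into component names and an alias => index map.
# Terms without a colon are passed through unchanged.
def split_aliases(components)
  aliases = {}
  names = components.each_with_index.map do |component, i|
    if component =~ /\A(\S+):(\S+)\z/
      name, alias_name = $1, $2
      if aliases.key?(alias_name)
        raise ArgumentError, "duplicate field alias `#{alias_name}`"
      end
      aliases[alias_name] = i
      name
    else
      component
    end
  end
  [names, aliases]
end

names, aliases = split_aliases(%w[a:foo b])
```

The duplicate check corresponds to the "errors on duplicate field aliases in a rule" spec later in this diff.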
--- a/lib/propane/rule_set.rb
+++ b/lib/propane/rule_set.rb
@ -1,7 +1,12 @@
 class Propane

+  # A RuleSet collects all grammar rules of the same name.
  class RuleSet

+    # @return [Array<Hash>]
+    #   AST fields.
+    attr_reader :ast_fields
+
    # @return [Integer]
    #   ID of the RuleSet.
    attr_reader :id
@ -51,6 +56,24 @@ class Propane
      @could_be_empty
    end

+    # Return whether this is an optional RuleSet.
+    #
+    # @return [Boolean]
+    #   Whether this is an optional RuleSet.
+    def optional?
+      @name.end_with?("?")
+    end
+
+    # For optional rule sets, return the underlying component that is optional.
+    def option_target
+      @rules.each do |rule|
+        if rule.components.size > 0
+          return rule.components[0]
+        end
+      end
+      raise "Optional rule target not found"
+    end
+
    # Build the start token set for the RuleSet.
    #
    # @return [Set<Token>]
@ -75,6 +98,72 @@ class Propane
      @_start_token_set
    end

+    # Finalize a RuleSet after adding all Rules to it.
+    def finalize(grammar)
+      if grammar.ast
+        build_ast_fields(grammar)
+      end
+    end
+
+    private
+
+    # Build the set of AST fields for this RuleSet.
+    #
+    # This is an Array of Hashes. Each entry in the Array corresponds to a
+    # field location in the AST node. The entry is a Hash that always has the
+    # field name with a positional suffix as a key. It also has the field name
+    # without the positional suffix if that field only exists in one position
+    # across all Rules in the RuleSet, plus any user-specified field aliases.
+    #
+    # @return [void]
+    def build_ast_fields(grammar)
+      field_ast_node_indexes = {}
+      field_indexes_across_all_rules = {}
+      @ast_fields = []
+      @rules.each do |rule|
+        rule.components.each_with_index do |component, i|
+          if component.is_a?(RuleSet) && component.optional?
+            component = component.option_target
+          end
+          if component.is_a?(Token)
+            node_name = "Token"
+          else
+            node_name = component.name
+          end
+          struct_name = "#{grammar.ast_prefix}#{node_name}#{grammar.ast_suffix}"
+          field_name = "p#{node_name}#{i + 1}"
+          unless field_ast_node_indexes[field_name]
+            field_ast_node_indexes[field_name] = @ast_fields.size
+            @ast_fields << {field_name => struct_name}
+          end
+          field_indexes_across_all_rules[node_name] ||= Set.new
+          field_indexes_across_all_rules[node_name] << field_ast_node_indexes[field_name]
+          rule.rule_set_node_field_index_map[i] = field_ast_node_indexes[field_name]
+        end
+      end
+      field_indexes_across_all_rules.each do |node_name, indexes_across_all_rules|
+        if indexes_across_all_rules.size == 1
+          # If this field was only seen in one position across all rules,
+          # then add an alias to the positional field name that does not
+          # include the position.
+          @ast_fields[indexes_across_all_rules.first]["p#{node_name}"] =
+            "#{grammar.ast_prefix}#{node_name}#{grammar.ast_suffix}"
+        end
+      end
+      # Now merge in the field aliases as given by the user in the
+      # grammar.
+      field_aliases = {}
+      @rules.each do |rule|
+        rule.aliases.each do |alias_name, index|
+          if field_aliases[alias_name] && field_aliases[alias_name] != index
+            raise Error.new("Error: conflicting AST node field positions for alias `#{alias_name}`")
+          end
+          field_aliases[alias_name] = index
+          @ast_fields[index][alias_name] = @ast_fields[index].first[1]
+        end
+      end
+    end
+
  end

 end
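Review note: `build_ast_fields` above assigns one AST node slot per positional field name and adds an unsuffixed name only when a component occupies a single slot across all rules. The simplified sketch below works on plain component names. `build_fields` is a hypothetical helper, and it omits the Token naming, optional rule sets, the AST prefix and suffix, and user aliases.

```ruby
require "set"

# rules: Array of Arrays of component names (a simplification; the real
# code works with Token and RuleSet objects).
def build_fields(rules)
  fields = []     # one Hash per AST node field slot
  slot_for = {}   # positional field name => slot index
  slots_by_name = Hash.new { |hash, key| hash[key] = Set.new }
  rules.each do |components|
    components.each_with_index do |name, i|
      field_name = "p#{name}#{i + 1}"
      # Allocate a new slot the first time a positional name is seen.
      slot_for[field_name] ||= begin
        fields << { field_name => name }
        fields.size - 1
      end
      slots_by_name[name] << slot_for[field_name]
    end
  end
  # A component seen in only one slot also gets an unsuffixed field name.
  slots_by_name.each do |name, slots|
    fields[slots.first]["p#{name}"] = name if slots.size == 1
  end
  fields
end

items_fields = build_fields([%w[Item ItemsMore], []])
dual_fields = build_fields([%w[One Two], %w[Two One]])
```

With the `Dual` rules from the AST spec grammar, `One` and `Two` each appear in two positions, so only the positional names are generated for them.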
--- a/lib/propane/version.rb
+++ b/lib/propane/version.rb
@ -1,3 +1,3 @@
 class Propane
-  VERSION = "1.0.0"
+  VERSION = "1.5.1"
 end
--- a/spec/propane_spec.rb
+++ b/spec/propane_spec.rb
@ -13,7 +13,7 @@ describe Propane do
    File.write("spec/run/testparser#{options[:name]}.propane", grammar)
  end

-  def build_parser(options = {})
+  def run_propane(options = {})
    @statics[:build_test_id] ||= 0
    @statics[:build_test_id] += 1
    if ENV["dist_specs"]
@ -49,7 +49,12 @@ ENV["TERM"] = nil
 EOF
      end
    end
+    if options[:args]
+      command += options[:args]
+    else
      command += %W[spec/run/testparser#{options[:name]}.propane spec/run/testparser#{options[:name]}.#{options[:language]} --log spec/run/testparser#{options[:name]}.log]
+    end
+    command += (options[:extra_args] || [])
    if (options[:capture])
      stdout, stderr, status = Open3.capture3(*command)
      Results.new(stdout, stderr, status)
@ -74,7 +79,7 @@ EOF
    expect(result).to be_truthy
  end

-  def run
+  def run_test
    stdout, stderr, status = Open3.capture3("spec/run/testparser")
    File.binwrite("spec/run/.stderr", stderr)
    File.binwrite("spec/run/.stdout", stdout)
@ -112,6 +117,138 @@ EOF
    FileUtils.mkdir_p("spec/run")
  end

+  it "reports its version" do
+    results = run_propane(args: %w[--version], capture: true)
+    expect(results.stdout).to match /propane version \d+\.\d+/
+    expect(results.stderr).to eq ""
+    expect(results.status).to eq 0
+  end
+
+  it "shows help usage" do
+    results = run_propane(args: %w[-h], capture: true)
+    expect(results.stdout).to match /Usage/i
+    expect(results.stderr).to eq ""
+    expect(results.status).to eq 0
+  end
+
+  it "errors with unknown option" do
+    results = run_propane(args: %w[-i], capture: true)
+    expect(results.stderr).to match /Error: unknown option -i/
+    expect(results.status).to_not eq 0
+  end
+
+  it "errors when input and output files are not specified" do
+    results = run_propane(args: [], capture: true)
+    expect(results.stderr).to match /Error: specify input and output files/
+    expect(results.status).to_not eq 0
+  end
+
+  it "errors when input file is not readable" do
+    results = run_propane(args: %w[nope.txt out.d], capture: true)
+    expect(results.stderr).to match /Error: cannot read nope.txt/
+    expect(results.status).to_not eq 0
+  end
+
+  it "raises an error when a pattern referenced ptype has not been defined" do
+    write_grammar <<EOF
+ptype yes = int;
+/foo/ (yes) <<
+>>
+/bar/ (no) <<
+>>
+EOF
+    results = run_propane(capture: true)
+    expect(results.stderr).to match /Error: Line 4: ptype no not declared\. Declare with `ptype` statement\./
+    expect(results.status).to_not eq 0
+  end
+
+  it "raises an error when a token referenced ptype has not been defined" do
+    write_grammar <<EOF
+ptype yes = int;
+token foo (yes);
+token bar (no);
+EOF
+    results = run_propane(capture: true)
+    expect(results.stderr).to match /Error: Line 3: ptype no not declared\. Declare with `ptype` statement\./
+    expect(results.status).to_not eq 0
+  end
+
+  it "raises an error when a rule referenced ptype has not been defined" do
+    write_grammar <<EOF
+ptype yes = int;
+token xyz;
+foo (yes) -> bar;
+bar (no) -> xyz;
+EOF
+    results = run_propane(capture: true)
+    expect(results.stderr).to match /Error: Line 4: ptype no not declared\. Declare with `ptype` statement\./
+    expect(results.status).to_not eq 0
+  end
+
+  it "warns on shift/reduce conflicts" do
+    write_grammar <<EOF
+token a;
+token b;
+Start -> As? b?;
+As -> a As2?;
+As2 -> b a As2?;
+EOF
+    results = run_propane(capture: true)
+    expect(results.stderr).to eq ""
+    expect(results.status).to eq 0
+    expect(File.binread("spec/run/testparser.log")).to match %r{Shift/Reduce conflict \(state \d+\) between token b and rule As2\? \(defined on line 4\)}
+  end
+
+  it "errors on shift/reduce conflicts with -w" do
+    write_grammar <<EOF
+token a;
+token b;
+Start -> As? b?;
+As -> a As2?;
+As2 -> b a As2?;
+EOF
+    results = run_propane(extra_args: %w[-w], capture: true)
+    expect(results.stderr).to match %r{Shift/Reduce conflict \(state \d+\) between token b and rule As2\? \(defined on line 4\)}m
+    expect(results.status).to_not eq 0
+    expect(File.binread("spec/run/testparser.log")).to match %r{Shift/Reduce conflict \(state \d+\) between token b and rule As2\? \(defined on line 4\)}
+  end
+
+  it "errors on duplicate field aliases in a rule" do
+    write_grammar <<EOF
+token a;
+token b;
+Start -> a:foo b:foo;
+EOF
+    results = run_propane(extra_args: %w[-w], capture: true)
+    expect(results.stderr).to match %r{Error: duplicate field alias `foo` for rule Start defined on line 3}
+    expect(results.status).to_not eq 0
+  end
+
+  it "errors when an alias is in different positions for different rules in a rule set when AST mode is enabled" do
+    write_grammar <<EOF
+ast;
+token a;
+token b;
+Start -> a:foo b;
+Start -> b b:foo;
+EOF
+    results = run_propane(extra_args: %w[-w], capture: true)
+    expect(results.stderr).to match %r{Error: conflicting AST node field positions for alias `foo`}
+    expect(results.status).to_not eq 0
+  end
+
+  it "does not error when an alias is in different positions for different rules in a rule set when AST mode is not enabled" do
+    write_grammar <<EOF
+token a;
+token b;
+Start -> a:foo b;
+Start -> b b:foo;
+EOF
+    results = run_propane(extra_args: %w[-w], capture: true)
+    expect(results.stderr).to eq ""
+    expect(results.status).to eq 0
+  end
+
  %w[d c].each do |language|

    context "#{language.upcase} language" do
@ -123,14 +260,12 @@ token plus /\\+/;
 token times /\\*/;
 drop /\\s+/;
 Start -> Foo;
-Foo -> int <<
->>
-Foo -> plus <<
->>
+Foo -> int <<>>
+Foo -> plus <<>>
 EOF
-        build_parser(language: language)
+        run_propane(language: language)
        compile("spec/test_lexer.#{language}", language: language)
-        results = run
+        results = run_test
        expect(results.stderr).to eq ""
        expect(results.status).to eq 0
      end
@ -149,9 +284,7 @@ token int /\\d+/ <<
  }
  $$ = v;
 >>
-Start -> int <<
-  $$ = $1;
->>
+Start -> int << $$ = $1; >>
 EOF
        when "d"
          write_grammar <<EOF
@ -165,14 +298,12 @@ token int /\\d+/ <<
  }
  $$ = v;
 >>
-Start -> int <<
-  $$ = $1;
->>
+Start -> int << $$ = $1; >>
 EOF
        end
-        build_parser(language: language)
+        run_propane(language: language)
        compile("spec/test_lexer_unknown_character.#{language}", language: language)
-        results = run
+        results = run_test
        expect(results.stderr).to eq ""
        expect(results.status).to eq 0
      end
@ -190,7 +321,7 @@ E -> B;
 B -> zero;
 B -> one;
 EOF
-        build_parser(language: language)
+        run_propane(language: language)
      end

      it "generates a parser that does basic math - user guide example" do
@ -219,33 +350,15 @@ token lparen /\\(/;
 token rparen /\\)/;
 drop /\\s+/;

-Start -> E1 <<
-  $$ = $1;
->>
-E1 -> E2 <<
-  $$ = $1;
->>
-E1 -> E1 plus E2 <<
-  $$ = $1 + $3;
->>
-E2 -> E3 <<
-  $$ = $1;
->>
-E2 -> E2 times E3 <<
-  $$ = $1 * $3;
->>
-E3 -> E4 <<
-  $$ = $1;
->>
-E3 -> E3 power E4 <<
-  $$ = (size_t)pow($1, $3);
->>
-E4 -> integer <<
-  $$ = $1;
->>
-E4 -> lparen E1 rparen <<
-  $$ = $2;
->>
+Start -> E1 << $$ = $1; >>
+E1 -> E2 << $$ = $1; >>
+E1 -> E1 plus E2 << $$ = $1 + $3; >>
+E2 -> E3 << $$ = $1; >>
+E2 -> E2 times E3 << $$ = $1 * $3; >>
+E3 -> E4 << $$ = $1; >>
+E3 -> E3 power E4 << $$ = (size_t)pow($1, $3); >>
+E4 -> integer << $$ = $1; >>
+E4 -> lparen E1 rparen << $$ = $2; >>
 EOF
        when "d"
          write_grammar <<EOF
@ -271,38 +384,20 @@ token lparen /\\(/;
 token rparen /\\)/;
 drop /\\s+/;

-Start -> E1 <<
-  $$ = $1;
->>
-E1 -> E2 <<
-  $$ = $1;
->>
-E1 -> E1 plus E2 <<
-  $$ = $1 + $3;
->>
-E2 -> E3 <<
-  $$ = $1;
->>
-E2 -> E2 times E3 <<
-  $$ = $1 * $3;
->>
-E3 -> E4 <<
-  $$ = $1;
->>
-E3 -> E3 power E4 <<
-  $$ = pow($1, $3);
->>
-E4 -> integer <<
-  $$ = $1;
->>
-E4 -> lparen E1 rparen <<
-  $$ = $2;
->>
+Start -> E1 << $$ = $1; >>
+E1 -> E2 << $$ = $1; >>
+E1 -> E1 plus E2 << $$ = $1 + $3; >>
+E2 -> E3 << $$ = $1; >>
+E2 -> E2 times E3 << $$ = $1 * $3; >>
+E3 -> E4 << $$ = $1; >>
+E3 -> E3 power E4 << $$ = pow($1, $3); >>
+E4 -> integer << $$ = $1; >>
+E4 -> lparen E1 rparen << $$ = $2; >>
 EOF
        end
-        build_parser(language: language)
+        run_propane(language: language)
        compile("spec/test_basic_math_grammar.#{language}", language: language)
-        results = run
+        results = run_test
        expect(results.stderr).to eq ""
        expect(results.status).to eq 0
      end
@ -314,7 +409,7 @@ Start -> E;
 E -> one E;
 E -> one;
 EOF
-        build_parser(language: language)
+        run_propane(language: language)
      end

      it "distinguishes between multiple identical rules with lookahead symbol" do
@ -326,9 +421,9 @@ Start -> R2 b;
 R1 -> a b;
 R2 -> a b;
 EOF
-        build_parser(language: language)
+        run_propane(language: language)
        compile("spec/test_parser_identical_rules_lookahead.#{language}", language: language)
-        results = run
+        results = run_test
        expect(results.status).to eq 0
      end

@ -341,9 +436,9 @@ Start -> a R1;
 Start -> b R1;
 R1 -> b;
 EOF
-        build_parser(language: language)
+        run_propane(language: language)
        compile("spec/test_parser_rule_from_multiple_states.#{language}", language: language)
-        results = run
+        results = run_test
        expect(results.status).to eq 0
      end

@ -376,9 +471,9 @@ Abcs -> ;
 Abcs -> abc Abcs;
 EOF
        end
-        build_parser(language: language)
+        run_propane(language: language)
        compile("spec/test_user_code.#{language}", language: language)
-        results = run
+        results = run_test
        expect(results.status).to eq 0
        verify_lines(results.stdout, [
          "abc!",
@ -408,15 +503,13 @@ EOF
 import std.stdio;
 >>
 token abc;
-/def/ <<
-  writeln("def!");
->>
+/def/ << writeln("def!"); >>
 Start -> abc;
 EOF
        end
-        build_parser(language: language)
+        run_propane(language: language)
        compile("spec/test_pattern.#{language}", language: language)
-        results = run
+        results = run_test
        expect(results.status).to eq 0
        verify_lines(results.stdout, [
          "def!",
@ -435,9 +528,7 @@ EOF
 #include <stdio.h>
 >>
 token abc;
-/def/ <<
-  printf("def!\\n");
->>
+/def/ << printf("def!\\n"); >>
 /ghi/ <<
  printf("ghi!\\n");
  return $token(abc);
@ -450,9 +541,7 @@ EOF
 import std.stdio;
 >>
 token abc;
-/def/ <<
-  writeln("def!");
->>
+/def/ << writeln("def!"); >>
 /ghi/ <<
  writeln("ghi!");
  return $token(abc);
@ -460,9 +549,9 @@ token abc;
 Start -> abc;
 EOF
        end
-        build_parser(language: language)
+        run_propane(language: language)
        compile("spec/test_return_token_from_pattern.#{language}", language: language)
-        results = run
+        results = run_test
        expect(results.status).to eq 0
        verify_lines(results.stdout, [
          "def!",
@ -518,9 +607,9 @@ string: /"/ <<
 Start -> abc string def;
 EOF
        end
-        build_parser(language: language)
+        run_propane(language: language)
        compile("spec/test_lexer_modes.#{language}", language: language)
-        results = run
+        results = run_test
        expect(results.status).to eq 0
        verify_lines(results.stdout, [
          "begin string mode",
@ -541,15 +630,9 @@ EOF
 >>
 token a;
 token b;
-Start -> A B <<
-  printf("Start!\\n");
->>
-A -> a <<
-  printf("A!\\n");
->>
-B -> b <<
-  printf("B!\\n");
->>
+Start -> A B << printf("Start!\\n"); >>
+A -> a << printf("A!\\n"); >>
+B -> b << printf("B!\\n"); >>
 EOF
        when "d"
          write_grammar <<EOF
@ -558,20 +641,14 @@ import std.stdio;
 >>
 token a;
 token b;
-Start -> A B <<
-  writeln("Start!");
->>
-A -> a <<
-  writeln("A!");
->>
-B -> b <<
-  writeln("B!");
->>
+Start -> A B << writeln("Start!"); >>
+A -> a << writeln("A!"); >>
+B -> b << writeln("B!"); >>
 EOF
        end
-        build_parser(language: language)
+        run_propane(language: language)
        compile("spec/test_parser_rule_user_code.#{language}", language: language)
-        results = run
+        results = run_test
        expect(results.status).to eq 0
        verify_lines(results.stdout, [
          "A!",
@ -584,19 +661,13 @@ EOF
        write_grammar <<EOF
 ptype #{language == "c" ? "uint32_t" : "uint"};
 token a;
-Start -> As <<
-  $$ = $1;
->>
-As -> <<
-  $$ = 0u;
->>
-As -> As a <<
-  $$ = $1 + 1u;
->>
+Start -> As << $$ = $1; >>
+As -> << $$ = 0u; >>
+As -> As a << $$ = $1 + 1u; >>
 EOF
-        build_parser(language: language)
+        run_propane(language: language)
        compile("spec/test_parsing_lists.#{language}", language: language)
-        results = run
+        results = run_test
        expect(results.status).to eq 0
        expect(results.stderr).to eq ""
      end
@ -615,9 +686,9 @@ Start -> b E d;
 E -> e;
 F -> e;
 EOF
-        results = build_parser(capture: true, language: language)
+        results = run_propane(capture: true, language: language)
        expect(results.status).to_not eq 0
-        expect(results.stderr).to match %r{reduce/reduce conflict.*\(E\).*\(F\)}
+        expect(results.stderr).to match %r{Error: reduce/reduce conflict \(state \d+\) between rule E#\d+ \(defined on line 10\) and rule F#\d+ \(defined on line 11\)}
      end

      it "provides matched text to user code blocks" do
@ -647,9 +718,9 @@ token id /[a-zA-Z_][a-zA-Z0-9_]*/ <<
 Start -> id;
 EOF
        end
-        build_parser(language: language)
+        run_propane(language: language)
        compile("spec/test_lexer_match_text.#{language}", language: language)
-        results = run
+        results = run_test
        expect(results.status).to eq 0
        verify_lines(results.stdout, [
          "Matched token is identifier_123",
@ -680,9 +751,9 @@ Start -> word <<
 >>
 EOF
        end
-        build_parser(language: language)
+        run_propane(language: language)
        compile("spec/test_lexer_result_value.#{language}", language: language)
-        results = run
+        results = run_test
        expect(results.stderr).to eq ""
        expect(results.status).to eq 0
      end
@ -695,16 +766,16 @@ drop /\\s+/;
 Start -> a num Start;
 Start -> a num;
 EOF
-        build_parser(language: language)
+        run_propane(language: language)
        compile("spec/test_error_positions.#{language}", language: language)
-        results = run
+        results = run_test
        expect(results.stderr).to eq ""
        expect(results.status).to eq 0
      end

      it "allows creating a JSON parser" do
        write_grammar(File.read("spec/json_parser.#{language}.propane"))
-        build_parser(language: language)
+        run_propane(language: language)
        compile(["spec/test_parsing_json.#{language}", "spec/json_types.#{language}"], language: language)
      end

@ -716,19 +787,439 @@ token num /\\d+/;
 drop /\\s+/;
 Start -> a num;
 EOF
-        build_parser(name: "myp1", language: language)
+        run_propane(name: "myp1", language: language)
        write_grammar(<<EOF, name: "myp2")
 prefix myp2_;
 token b;
 token c;
 Start -> b c b;
 EOF
-        build_parser(name: "myp2", language: language)
+        run_propane(name: "myp2", language: language)
        compile("spec/test_multiple_parsers.#{language}", parsers: %w[myp1 myp2], language: language)
-        results = run
+        results = run_test
        expect(results.stderr).to eq ""
        expect(results.status).to eq 0
      end
+
+      it "allows the user to terminate the lexer" do
+        write_grammar <<EOF
+token a;
+token b <<
+  $terminate(8675309);
+>>
+token c;
+Start -> Any;
+Any -> a;
+Any -> b;
+Any -> c;
+EOF
+        run_propane(language: language)
+        compile("spec/test_user_terminate_lexer.#{language}", language: language)
+        results = run_test
+        expect(results.stderr).to eq ""
+        expect(results.status).to eq 0
+      end
+
+      it "allows the user to terminate the parser" do
+        write_grammar <<EOF
+token a;
+token b;
+token c;
+Start -> Any;
+Any -> a Any;
+Any -> b Any << $terminate(4200); >>
+Any -> c Any;
+Any -> ;
+EOF
+        run_propane(language: language)
+        compile("spec/test_user_terminate.#{language}", language: language)
+        results = run_test
+        expect(results.stderr).to eq ""
+        expect(results.status).to eq 0
+      end
+
+      it "matches backslash escape sequences" do
+        case language
+        when "c"
+          write_grammar <<EOF
+<<
+  #include <stdio.h>
+>>
+tokenid t;
+/\\a/ << printf("A\\n"); >>
+/\\b/ << printf("B\\n"); >>
+/\\t/ << printf("T\\n"); >>
+/\\n/ << printf("N\\n"); >>
+/\\v/ << printf("V\\n"); >>
+/\\f/ << printf("F\\n"); >>
+/\\r/ << printf("R\\n"); >>
+/t/ << return $token(t); >>
+Start -> t;
+EOF
+        when "d"
+          write_grammar <<EOF
+<<
+  import std.stdio;
+>>
+tokenid t;
+/\\a/ << writeln("A"); >>
+/\\b/ << writeln("B"); >>
+/\\t/ << writeln("T"); >>
+/\\n/ << writeln("N"); >>
+/\\v/ << writeln("V"); >>
+/\\f/ << writeln("F"); >>
+/\\r/ << writeln("R"); >>
+/t/ <<
+  return $token(t);
+>>
+Start -> t;
+EOF
+        end
+        run_propane(language: language)
+        compile("spec/test_match_backslashes.#{language}", language: language)
+        results = run_test
+        expect(results.stderr).to eq ""
+        expect(results.status).to eq 0
+        verify_lines(results.stdout, [
+          "A",
+          "B",
+          "T",
+          "N",
+          "V",
+          "F",
+          "R",
+        ])
+      end
+
+      it "handles when an item set leads to itself" do
+        write_grammar <<EOF
+token one;
+token two;
+
+Start -> Opt one Start;
+Start -> ;
+
+Opt -> two;
+Opt -> ;
+EOF
+        run_propane(language: language)
+      end
+
+      it "generates an AST" do
+        write_grammar <<EOF
+ast;
+
+ptype int;
+
+token a << $$ = 11; >>
+token b << $$ = 22; >>
+token one /1/;
+token two /2/;
+token comma /,/ <<
+  $$ = 42;
+>>
+token lparen /\\(/;
+token rparen /\\)/;
+drop /\\s+/;
+
+Start -> Items;
+
+Items -> Item ItemsMore;
+Items -> ;
+
+ItemsMore -> comma Item ItemsMore;
+ItemsMore -> ;
+
+Item -> a;
+Item -> b;
+Item -> lparen Item rparen;
+Item -> Dual;
+
+Dual -> One Two;
+Dual -> Two One;
+One -> one;
+Two -> two;
+EOF
+        run_propane(language: language)
+        compile("spec/test_ast.#{language}", language: language)
+        results = run_test
+        expect(results.stderr).to eq ""
+        expect(results.status).to eq 0
+      end
+
+      it "supports AST node prefix and suffix" do
+        write_grammar <<EOF
+ast;
+ast_prefix P ;
+ast_suffix  S;
+
+ptype int;
+
+token a << $$ = 11; >>
+token b << $$ = 22; >>
+token one /1/;
+token two /2/;
+token comma /,/ <<
+  $$ = 42;
+>>
+token lparen /\\(/;
+token rparen /\\)/;
+drop /\\s+/;
+
+Start -> Items;
+
+Items -> Item ItemsMore;
+Items -> ;
+
+ItemsMore -> comma Item ItemsMore;
+ItemsMore -> ;
+
+Item -> a;
+Item -> b;
+Item -> lparen Item rparen;
+Item -> Dual;
+
+Dual -> One Two;
+Dual -> Two One;
+One -> one;
+Two -> two;
+EOF
+        run_propane(language: language)
+        compile("spec/test_ast_ps.#{language}", language: language)
+        results = run_test
+        expect(results.stderr).to eq ""
+        expect(results.status).to eq 0
+      end
+
+      it "allows specifying a different start rule" do
+        write_grammar <<EOF
+token hi;
+start Top;
+Top -> hi;
+EOF
+        run_propane(language: language)
+        compile("spec/test_start_rule.#{language}", language: language)
+      end
+
+      it "allows specifying a different start rule with AST generation" do
+        write_grammar <<EOF
+ast;
+token hi;
+start Top;
+Top -> hi;
+EOF
+        run_propane(language: language)
+        compile("spec/test_start_rule_ast.#{language}", language: language)
+      end
+
+      it "allows marking a rule component as optional" do
+        if language == "d"
+          write_grammar <<EOF
+<<
+import std.stdio;
+>>
+
+ptype int;
+ptype float = float;
+ptype string = string;
+
+token a (float) << $$ = 1.5; >>
+token b << $$ = 2; >>
+token c << $$ = 3; >>
+token d << $$ = 4; >>
+Start -> a? b R? <<
+  writeln("a: ", $1);
+  writeln("b: ", $2);
+  writeln("R: ", $3);
+>>
+R -> c d << $$ = "cd"; >>
+R (string) -> d c << $$ = "dc"; >>
+EOF
+        else
+          write_grammar <<EOF
+<<
+#include <stdio.h>
+>>
+
+ptype int;
+ptype float = float;
+ptype string = char *;
+
+token a (float) << $$ = 1.5; >>
+token b << $$ = 2; >>
+token c << $$ = 3; >>
+token d << $$ = 4; >>
+Start -> a? b R? <<
+  printf("a: %.1f\\n", $1);
+  printf("b: %d\\n", $2);
+  printf("R: %s\\n", $3 == NULL ? "" : $3);
+>>
+R -> c d << $$ = "cd"; >>
+R (string) -> d c << $$ = "dc"; >>
+EOF
+        end
+        run_propane(language: language)
+        compile("spec/test_optional_rule_component.#{language}", language: language)
+        results = run_test
+        expect(results.stderr).to eq ""
+        expect(results.status).to eq 0
+        verify_lines(results.stdout, [
+          "a: 0#{language == "d" ? "" : ".0"}",
+          "b: 2",
+          "R: ",
+          "a: 1.5",
+          "b: 2",
+          "R: cd",
+          "a: 1.5",
+          "b: 2",
+          "R: dc",
+        ])
+      end
+
+      it "allows marking a rule component as optional in AST generation mode" do
+        if language == "d"
+          write_grammar <<EOF
+ast;
+
+<<
+import std.stdio;
+>>
+
+token a;
+token b;
+token c;
+token d;
+Start -> a? b R?;
+R -> c d;
+R -> d c;
+EOF
+        else
+          write_grammar <<EOF
+ast;
+
+<<
+#include <stdio.h>
+>>
+
+token a;
+token b;
+token c;
+token d;
+Start -> a? b R?;
+R -> c d;
+R -> d c;
+EOF
+        end
+        run_propane(language: language)
+        compile("spec/test_optional_rule_component_ast.#{language}", language: language)
+        results = run_test
+        expect(results.stderr).to eq ""
+        expect(results.status).to eq 0
+      end
+
+      it "stores token and rule positions in AST nodes" do
+        write_grammar <<EOF
+ast;
+
+token a;
+token bb;
+token c /c(.|\\n)*c/;
+drop /\\s+/;
+Start -> T T T;
+T -> a;
+T -> bb;
+T -> c;
+EOF
+        run_propane(language: language)
+        compile("spec/test_ast_token_positions.#{language}", language: language)
+        results = run_test
+        expect(results.stderr).to eq ""
+        expect(results.status).to eq 0
+      end
+
+      it "stores invalid positions for empty rule matches" do
+        write_grammar <<EOF
+ast;
+
+token a;
+token bb;
+token c /c(.|\\n)*c/;
+drop /\\s+/;
+Start -> T Start;
+Start -> ;
+T -> a A;
+A -> bb? c?;
+EOF
+        run_propane(language: language)
+        compile("spec/test_ast_invalid_positions.#{language}", language: language)
+        results = run_test
+        expect(results.stderr).to eq ""
+        expect(results.status).to eq 0
+      end
+
+      it "allows specifying field aliases in AST mode" do
+        write_grammar <<EOF
+ast;
+
+token a;
+token b;
+token c;
+drop /\\s+/;
+Start -> T:first T:second T:third;
+T -> a;
+T -> b;
+T -> c;
+EOF
+        run_propane(language: language)
+        compile("spec/test_ast_field_aliases.#{language}", language: language)
+        results = run_test
+        expect(results.stderr).to eq ""
+        expect(results.status).to eq 0
+      end
+
+      it "allows specifying field aliases when AST mode is not enabled" do
+        if language == "d"
+          write_grammar <<EOF
+<<
+import std.stdio;
+>>
+ptype string;
+token id /[a-zA-Z_][a-zA-Z0-9_]*/ <<
+  $$ = match;
+>>
+drop /\\s+/;
+Start -> id:first id:second <<
+  writeln("first is ", ${first});
+  writeln("second is ", ${second});
+>>
+EOF
+        else
+          write_grammar <<EOF
+<<
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+>>
+ptype char const *;
+token id /[a-zA-Z_][a-zA-Z0-9_]*/ <<
+  char * s = malloc(match_length + 1);
+  strncpy(s, (char const *)match, match_length);
+  s[match_length] = 0;
+  $$ = s;
+>>
+drop /\\s+/;
+Start -> id:first id:second <<
+  printf("first is %s\\n", ${first});
+  printf("second is %s\\n", ${second});
+>>
+EOF
+        end
+        run_propane(language: language)
+        compile("spec/test_field_aliases.#{language}", language: language)
+        results = run_test
+        expect(results.stderr).to eq ""
+        expect(results.status).to eq 0
+        expect(results.stdout).to match /first is foo1.*second is bar2/m
+      end
    end
  end
 end
--- a/spec/test_ast.c
+++ b/spec/test_ast.c
@ -0,0 +1,55 @@
+#include "testparser.h"
+#include <assert.h>
+#include <string.h>
+#include "testutils.h"
+
+int main()
+{
+    char const * input = "a, ((b)), b";
+    p_context_t context;
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert_eq(P_SUCCESS, p_parse(&context));
+    Start * start = p_result(&context);
+    assert(start->pItems1 != NULL);
+    assert(start->pItems != NULL);
+    Items * items = start->pItems;
+    assert(items->pItem != NULL);
+    assert(items->pItem->pToken1 != NULL);
+    assert_eq(TOKEN_a, items->pItem->pToken1->token);
+    assert_eq(11, items->pItem->pToken1->pvalue);
+    assert(items->pItemsMore != NULL);
+    ItemsMore * itemsmore = items->pItemsMore;
+    assert(itemsmore->pItem != NULL);
+    assert(itemsmore->pItem->pItem != NULL);
+    assert(itemsmore->pItem->pItem->pItem != NULL);
+    assert(itemsmore->pItem->pItem->pItem->pToken1 != NULL);
+    assert_eq(TOKEN_b, itemsmore->pItem->pItem->pItem->pToken1->token);
+    assert_eq(22, itemsmore->pItem->pItem->pItem->pToken1->pvalue);
+    assert(itemsmore->pItemsMore != NULL);
+    itemsmore = itemsmore->pItemsMore;
+    assert(itemsmore->pItem != NULL);
+    assert(itemsmore->pItem->pToken1 != NULL);
+    assert_eq(TOKEN_b, itemsmore->pItem->pToken1->token);
+    assert_eq(22, itemsmore->pItem->pToken1->pvalue);
+    assert(itemsmore->pItemsMore == NULL);
+
+    input = "";
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert_eq(P_SUCCESS, p_parse(&context));
+    start = p_result(&context);
+    assert(start->pItems == NULL);
+
+    input = "2 1";
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert_eq(P_SUCCESS, p_parse(&context));
+    start = p_result(&context);
+    assert(start->pItems != NULL);
+    assert(start->pItems->pItem != NULL);
+    assert(start->pItems->pItem->pDual != NULL);
+    assert(start->pItems->pItem->pDual->pTwo1 != NULL);
+    assert(start->pItems->pItem->pDual->pOne2 != NULL);
+    assert(start->pItems->pItem->pDual->pTwo2 == NULL);
+    assert(start->pItems->pItem->pDual->pOne1 == NULL);
+
+    return 0;
+}
--- a/spec/test_ast.d
+++ b/spec/test_ast.d
@ -0,0 +1,57 @@
+import testparser;
+import std.stdio;
+import testutils;
+
+int main()
+{
+    return 0;
+}
+
+unittest
+{
+    string input = "a, ((b)), b";
+    p_context_t context;
+    p_context_init(&context, input);
+    assert_eq(P_SUCCESS, p_parse(&context));
+    Start * start = p_result(&context);
+    assert(start.pItems1 !is null);
+    assert(start.pItems !is null);
+    Items * items = start.pItems;
+    assert(items.pItem !is null);
+    assert(items.pItem.pToken1 !is null);
+    assert_eq(TOKEN_a, items.pItem.pToken1.token);
+    assert_eq(11, items.pItem.pToken1.pvalue);
+    assert(items.pItemsMore !is null);
+    ItemsMore * itemsmore = items.pItemsMore;
+    assert(itemsmore.pItem !is null);
+    assert(itemsmore.pItem.pItem !is null);
+    assert(itemsmore.pItem.pItem.pItem !is null);
+    assert(itemsmore.pItem.pItem.pItem.pToken1 !is null);
+    assert_eq(TOKEN_b, itemsmore.pItem.pItem.pItem.pToken1.token);
+    assert_eq(22, itemsmore.pItem.pItem.pItem.pToken1.pvalue);
+    assert(itemsmore.pItemsMore !is null);
+    itemsmore = itemsmore.pItemsMore;
+    assert(itemsmore.pItem !is null);
+    assert(itemsmore.pItem.pToken1 !is null);
+    assert_eq(TOKEN_b, itemsmore.pItem.pToken1.token);
+    assert_eq(22, itemsmore.pItem.pToken1.pvalue);
+    assert(itemsmore.pItemsMore is null);
+
+    input = "";
+    p_context_init(&context, input);
+    assert_eq(P_SUCCESS, p_parse(&context));
+    start = p_result(&context);
+    assert(start.pItems is null);
+
+    input = "2 1";
+    p_context_init(&context, input);
+    assert_eq(P_SUCCESS, p_parse(&context));
+    start = p_result(&context);
+    assert(start.pItems !is null);
+    assert(start.pItems.pItem !is null);
+    assert(start.pItems.pItem.pDual !is null);
+    assert(start.pItems.pItem.pDual.pTwo1 !is null);
+    assert(start.pItems.pItem.pDual.pOne2 !is null);
+    assert(start.pItems.pItem.pDual.pTwo2 is null);
+    assert(start.pItems.pItem.pDual.pOne1 is null);
+}
--- a/spec/test_ast_field_aliases.c
+++ b/spec/test_ast_field_aliases.c
@ -0,0 +1,19 @@
+#include "testparser.h"
+#include <assert.h>
+#include <string.h>
+#include "testutils.h"
+
+int main()
+{
+    char const * input = "\na\nb\nc";
+    p_context_t context;
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert(p_parse(&context) == P_SUCCESS);
+    Start * start = p_result(&context);
+
+    assert_eq(TOKEN_a, start->first->pToken->token);
+    assert_eq(TOKEN_b, start->second->pToken->token);
+    assert_eq(TOKEN_c, start->third->pToken->token);
+
+    return 0;
+}
--- a/spec/test_ast_field_aliases.d
+++ b/spec/test_ast_field_aliases.d
@ -0,0 +1,21 @@
+import testparser;
+import std.stdio;
+import testutils;
+
+int main()
+{
+    return 0;
+}
+
+unittest
+{
+    string input = "\na\nb\nc";
+    p_context_t context;
+    p_context_init(&context, input);
+    assert(p_parse(&context) == P_SUCCESS);
+    Start * start = p_result(&context);
+
+    assert_eq(TOKEN_a, start.first.pToken.token);
+    assert_eq(TOKEN_b, start.second.pToken.token);
+    assert_eq(TOKEN_c, start.third.pToken.token);
+}
--- a/spec/test_ast_invalid_positions.c
+++ b/spec/test_ast_invalid_positions.c
@ -0,0 +1,102 @@
+#include "testparser.h"
+#include <assert.h>
+#include <string.h>
+#include "testutils.h"
+
+int main()
+{
+    char const * input = "\na\n  bb ccc";
+    p_context_t context;
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert(p_parse(&context) == P_SUCCESS);
+    Start * start = p_result(&context);
+
+    assert_eq(1, start->pT1->pToken->position.row);
+    assert_eq(0, start->pT1->pToken->position.col);
+    assert_eq(1, start->pT1->pToken->end_position.row);
+    assert_eq(0, start->pT1->pToken->end_position.col);
+    assert(p_position_valid(start->pT1->pA->position));
+    assert_eq(2, start->pT1->pA->position.row);
+    assert_eq(2, start->pT1->pA->position.col);
+    assert_eq(2, start->pT1->pA->end_position.row);
+    assert_eq(7, start->pT1->pA->end_position.col);
+    assert_eq(1, start->pT1->position.row);
+    assert_eq(0, start->pT1->position.col);
+    assert_eq(2, start->pT1->end_position.row);
+    assert_eq(7, start->pT1->end_position.col);
+
+    assert_eq(1, start->position.row);
+    assert_eq(0, start->position.col);
+    assert_eq(2, start->end_position.row);
+    assert_eq(7, start->end_position.col);
+
+    input = "a\nbb";
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert(p_parse(&context) == P_SUCCESS);
+    start = p_result(&context);
+
+    assert_eq(0, start->pT1->pToken->position.row);
+    assert_eq(0, start->pT1->pToken->position.col);
+    assert_eq(0, start->pT1->pToken->end_position.row);
+    assert_eq(0, start->pT1->pToken->end_position.col);
+    assert(p_position_valid(start->pT1->pA->position));
+    assert_eq(1, start->pT1->pA->position.row);
+    assert_eq(0, start->pT1->pA->position.col);
+    assert_eq(1, start->pT1->pA->end_position.row);
+    assert_eq(1, start->pT1->pA->end_position.col);
+    assert_eq(0, start->pT1->position.row);
+    assert_eq(0, start->pT1->position.col);
+    assert_eq(1, start->pT1->end_position.row);
+    assert_eq(1, start->pT1->end_position.col);
+
+    assert_eq(0, start->position.row);
+    assert_eq(0, start->position.col);
+    assert_eq(1, start->end_position.row);
+    assert_eq(1, start->end_position.col);
+
+    input = "a\nc\nc";
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert(p_parse(&context) == P_SUCCESS);
+    start = p_result(&context);
+
+    assert_eq(0, start->pT1->pToken->position.row);
+    assert_eq(0, start->pT1->pToken->position.col);
+    assert_eq(0, start->pT1->pToken->end_position.row);
+    assert_eq(0, start->pT1->pToken->end_position.col);
+    assert(p_position_valid(start->pT1->pA->position));
+    assert_eq(1, start->pT1->pA->position.row);
+    assert_eq(0, start->pT1->pA->position.col);
+    assert_eq(2, start->pT1->pA->end_position.row);
+    assert_eq(0, start->pT1->pA->end_position.col);
+    assert_eq(0, start->pT1->position.row);
+    assert_eq(0, start->pT1->position.col);
+    assert_eq(2, start->pT1->end_position.row);
+    assert_eq(0, start->pT1->end_position.col);
+
+    assert_eq(0, start->position.row);
+    assert_eq(0, start->position.col);
+    assert_eq(2, start->end_position.row);
+    assert_eq(0, start->end_position.col);
+
+    input = "a";
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert(p_parse(&context) == P_SUCCESS);
+    start = p_result(&context);
+
+    assert_eq(0, start->pT1->pToken->position.row);
+    assert_eq(0, start->pT1->pToken->position.col);
+    assert_eq(0, start->pT1->pToken->end_position.row);
+    assert_eq(0, start->pT1->pToken->end_position.col);
+    assert(!p_position_valid(start->pT1->pA->position)); /* A matched nothing, so it has no valid position */
+    assert_eq(0, start->pT1->position.row);
+    assert_eq(0, start->pT1->position.col);
+    assert_eq(0, start->pT1->end_position.row);
+    assert_eq(0, start->pT1->end_position.col);
+
+    assert_eq(0, start->position.row);
+    assert_eq(0, start->position.col);
+    assert_eq(0, start->end_position.row);
+    assert_eq(0, start->end_position.col);
+
+    return 0;
+}
--- a/spec/test_ast_invalid_positions.d
+++ b/spec/test_ast_invalid_positions.d
@ -0,0 +1,104 @@
+import testparser;
+import std.stdio;
+import testutils;
+
+int main()
+{
+    return 0;
+}
+
+unittest
+{
+    string input = "\na\n  bb ccc";
+    p_context_t context;
+    p_context_init(&context, input);
+    assert(p_parse(&context) == P_SUCCESS);
+    Start * start = p_result(&context);
+
+    assert_eq(1, start.pT1.pToken.position.row);
+    assert_eq(0, start.pT1.pToken.position.col);
+    assert_eq(1, start.pT1.pToken.end_position.row);
+    assert_eq(0, start.pT1.pToken.end_position.col);
+    assert(start.pT1.pA.position.valid);
+    assert_eq(2, start.pT1.pA.position.row);
+    assert_eq(2, start.pT1.pA.position.col);
+    assert_eq(2, start.pT1.pA.end_position.row);
+    assert_eq(7, start.pT1.pA.end_position.col);
+    assert_eq(1, start.pT1.position.row);
+    assert_eq(0, start.pT1.position.col);
+    assert_eq(2, start.pT1.end_position.row);
+    assert_eq(7, start.pT1.end_position.col);
+
+    assert_eq(1, start.position.row);
+    assert_eq(0, start.position.col);
+    assert_eq(2, start.end_position.row);
+    assert_eq(7, start.end_position.col);
+
+    input = "a\nbb";
+    p_context_init(&context, input);
+    assert(p_parse(&context) == P_SUCCESS);
+    start = p_result(&context);
+
+    assert_eq(0, start.pT1.pToken.position.row);
+    assert_eq(0, start.pT1.pToken.position.col);
+    assert_eq(0, start.pT1.pToken.end_position.row);
+    assert_eq(0, start.pT1.pToken.end_position.col);
+    assert(start.pT1.pA.position.valid);
+    assert_eq(1, start.pT1.pA.position.row);
+    assert_eq(0, start.pT1.pA.position.col);
+    assert_eq(1, start.pT1.pA.end_position.row);
+    assert_eq(1, start.pT1.pA.end_position.col);
+    assert_eq(0, start.pT1.position.row);
+    assert_eq(0, start.pT1.position.col);
+    assert_eq(1, start.pT1.end_position.row);
+    assert_eq(1, start.pT1.end_position.col);
+
+    assert_eq(0, start.position.row);
+    assert_eq(0, start.position.col);
+    assert_eq(1, start.end_position.row);
+    assert_eq(1, start.end_position.col);
+
+    input = "a\nc\nc";
+    p_context_init(&context, input);
+    assert(p_parse(&context) == P_SUCCESS);
+    start = p_result(&context);
+
+    assert_eq(0, start.pT1.pToken.position.row);
+    assert_eq(0, start.pT1.pToken.position.col);
+    assert_eq(0, start.pT1.pToken.end_position.row);
+    assert_eq(0, start.pT1.pToken.end_position.col);
+    assert(start.pT1.pA.position.valid);
+    assert_eq(1, start.pT1.pA.position.row);
+    assert_eq(0, start.pT1.pA.position.col);
+    assert_eq(2, start.pT1.pA.end_position.row);
+    assert_eq(0, start.pT1.pA.end_position.col);
+    assert_eq(0, start.pT1.position.row);
+    assert_eq(0, start.pT1.position.col);
+    assert_eq(2, start.pT1.end_position.row);
+    assert_eq(0, start.pT1.end_position.col);
+
+    assert_eq(0, start.position.row);
+    assert_eq(0, start.position.col);
+    assert_eq(2, start.end_position.row);
+    assert_eq(0, start.end_position.col);
+
+    input = "a";
+    p_context_init(&context, input);
+    assert(p_parse(&context) == P_SUCCESS);
+    start = p_result(&context);
+
+    assert_eq(0, start.pT1.pToken.position.row);
+    assert_eq(0, start.pT1.pToken.position.col);
+    assert_eq(0, start.pT1.pToken.end_position.row);
+    assert_eq(0, start.pT1.pToken.end_position.col);
+    assert(!start.pT1.pA.position.valid); // A matched nothing, so it has no valid position
+    assert_eq(0, start.pT1.position.row);
+    assert_eq(0, start.pT1.position.col);
+    assert_eq(0, start.pT1.end_position.row);
+    assert_eq(0, start.pT1.end_position.col);
+
+    assert_eq(0, start.position.row);
+    assert_eq(0, start.position.col);
+    assert_eq(0, start.end_position.row);
+    assert_eq(0, start.end_position.col);
+}
--- a/spec/test_ast_ps.c
+++ b/spec/test_ast_ps.c
@ -0,0 +1,55 @@
+#include "testparser.h"
+#include <assert.h>
+#include <string.h>
+#include "testutils.h"
+
+int main()
+{
+    char const * input = "a, ((b)), b";
+    p_context_t context;
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert_eq(P_SUCCESS, p_parse(&context));
+    PStartS * start = p_result(&context);
+    assert(start->pItems1 != NULL);
+    assert(start->pItems != NULL);
+    PItemsS * items = start->pItems;
+    assert(items->pItem != NULL);
+    assert(items->pItem->pToken1 != NULL);
+    assert_eq(TOKEN_a, items->pItem->pToken1->token);
+    assert_eq(11, items->pItem->pToken1->pvalue);
+    assert(items->pItemsMore != NULL);
+    PItemsMoreS * itemsmore = items->pItemsMore;
+    assert(itemsmore->pItem != NULL);
+    assert(itemsmore->pItem->pItem != NULL);
+    assert(itemsmore->pItem->pItem->pItem != NULL);
+    assert(itemsmore->pItem->pItem->pItem->pToken1 != NULL);
+    assert_eq(TOKEN_b, itemsmore->pItem->pItem->pItem->pToken1->token);
+    assert_eq(22, itemsmore->pItem->pItem->pItem->pToken1->pvalue);
+    assert(itemsmore->pItemsMore != NULL);
+    itemsmore = itemsmore->pItemsMore;
+    assert(itemsmore->pItem != NULL);
+    assert(itemsmore->pItem->pToken1 != NULL);
+    assert_eq(TOKEN_b, itemsmore->pItem->pToken1->token);
+    assert_eq(22, itemsmore->pItem->pToken1->pvalue);
+    assert(itemsmore->pItemsMore == NULL);
+
+    input = "";
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert_eq(P_SUCCESS, p_parse(&context));
+    start = p_result(&context);
+    assert(start->pItems == NULL);
+
+    input = "2 1";
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert_eq(P_SUCCESS, p_parse(&context));
+    start = p_result(&context);
+    assert(start->pItems != NULL);
+    assert(start->pItems->pItem != NULL);
+    assert(start->pItems->pItem->pDual != NULL);
+    assert(start->pItems->pItem->pDual->pTwo1 != NULL);
+    assert(start->pItems->pItem->pDual->pOne2 != NULL);
+    assert(start->pItems->pItem->pDual->pTwo2 == NULL);
+    assert(start->pItems->pItem->pDual->pOne1 == NULL);
+
+    return 0;
+}
--- a/spec/test_ast_ps.d
+++ b/spec/test_ast_ps.d
@ -0,0 +1,57 @@
+import testparser;
+import std.stdio;
+import testutils;
+
+int main()
+{
+    return 0;
+}
+
+unittest
+{
+    string input = "a, ((b)), b";
+    p_context_t context;
+    p_context_init(&context, input);
+    assert_eq(P_SUCCESS, p_parse(&context));
+    PStartS * start = p_result(&context);
+    assert(start.pItems1 !is null);
+    assert(start.pItems !is null);
+    PItemsS * items = start.pItems;
+    assert(items.pItem !is null);
+    assert(items.pItem.pToken1 !is null);
+    assert_eq(TOKEN_a, items.pItem.pToken1.token);
+    assert_eq(11, items.pItem.pToken1.pvalue);
+    assert(items.pItemsMore !is null);
+    PItemsMoreS * itemsmore = items.pItemsMore;
+    assert(itemsmore.pItem !is null);
+    assert(itemsmore.pItem.pItem !is null);
+    assert(itemsmore.pItem.pItem.pItem !is null);
+    assert(itemsmore.pItem.pItem.pItem.pToken1 !is null);
+    assert_eq(TOKEN_b, itemsmore.pItem.pItem.pItem.pToken1.token);
+    assert_eq(22, itemsmore.pItem.pItem.pItem.pToken1.pvalue);
+    assert(itemsmore.pItemsMore !is null);
+    itemsmore = itemsmore.pItemsMore;
+    assert(itemsmore.pItem !is null);
+    assert(itemsmore.pItem.pToken1 !is null);
+    assert_eq(TOKEN_b, itemsmore.pItem.pToken1.token);
+    assert_eq(22, itemsmore.pItem.pToken1.pvalue);
+    assert(itemsmore.pItemsMore is null);
+
+    input = "";
+    p_context_init(&context, input);
+    assert_eq(P_SUCCESS, p_parse(&context));
+    start = p_result(&context);
+    assert(start.pItems is null);
+
+    input = "2 1";
+    p_context_init(&context, input);
+    assert_eq(P_SUCCESS, p_parse(&context));
+    start = p_result(&context);
+    assert(start.pItems !is null);
+    assert(start.pItems.pItem !is null);
+    assert(start.pItems.pItem.pDual !is null);
+    assert(start.pItems.pItem.pDual.pTwo1 !is null);
+    assert(start.pItems.pItem.pDual.pOne2 !is null);
+    assert(start.pItems.pItem.pDual.pTwo2 is null);
+    assert(start.pItems.pItem.pDual.pOne1 is null);
+}
--- a/spec/test_ast_token_positions.c
+++ b/spec/test_ast_token_positions.c
@ -0,0 +1,84 @@
+#include "testparser.h"
+#include <assert.h>
+#include <string.h>
+#include "testutils.h"
+
+int main()
+{
+    char const * input = "abbccc";
+    p_context_t context;
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert(p_parse(&context) == P_SUCCESS);
+    Start * start = p_result(&context);
+
+    assert_eq(0, start->pT1->pToken->position.row);
+    assert_eq(0, start->pT1->pToken->position.col);
+    assert_eq(0, start->pT1->pToken->end_position.row);
+    assert_eq(0, start->pT1->pToken->end_position.col);
+    assert_eq(0, start->pT1->position.row);
+    assert_eq(0, start->pT1->position.col);
+    assert_eq(0, start->pT1->end_position.row);
+    assert_eq(0, start->pT1->end_position.col);
+
+    assert_eq(0, start->pT2->pToken->position.row);
+    assert_eq(1, start->pT2->pToken->position.col);
+    assert_eq(0, start->pT2->pToken->end_position.row);
+    assert_eq(2, start->pT2->pToken->end_position.col);
+    assert_eq(0, start->pT2->position.row);
+    assert_eq(1, start->pT2->position.col);
+    assert_eq(0, start->pT2->end_position.row);
+    assert_eq(2, start->pT2->end_position.col);
+
+    assert_eq(0, start->pT3->pToken->position.row);
+    assert_eq(3, start->pT3->pToken->position.col);
+    assert_eq(0, start->pT3->pToken->end_position.row);
+    assert_eq(5, start->pT3->pToken->end_position.col);
+    assert_eq(0, start->pT3->position.row);
+    assert_eq(3, start->pT3->position.col);
+    assert_eq(0, start->pT3->end_position.row);
+    assert_eq(5, start->pT3->end_position.col);
+
+    assert_eq(0, start->position.row);
+    assert_eq(0, start->position.col);
+    assert_eq(0, start->end_position.row);
+    assert_eq(5, start->end_position.col);
+
+    input = "\n\n  bb\nc\ncc\n\n     a"; /* the c token spans rows 3-4 */
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert(p_parse(&context) == P_SUCCESS);
+    start = p_result(&context);
+
+    assert_eq(2, start->pT1->pToken->position.row);
+    assert_eq(2, start->pT1->pToken->position.col);
+    assert_eq(2, start->pT1->pToken->end_position.row);
+    assert_eq(3, start->pT1->pToken->end_position.col);
+    assert_eq(2, start->pT1->position.row);
+    assert_eq(2, start->pT1->position.col);
+    assert_eq(2, start->pT1->end_position.row);
+    assert_eq(3, start->pT1->end_position.col);
+
+    assert_eq(3, start->pT2->pToken->position.row);
+    assert_eq(0, start->pT2->pToken->position.col);
+    assert_eq(4, start->pT2->pToken->end_position.row);
+    assert_eq(1, start->pT2->pToken->end_position.col);
+    assert_eq(3, start->pT2->position.row);
+    assert_eq(0, start->pT2->position.col);
+    assert_eq(4, start->pT2->end_position.row);
+    assert_eq(1, start->pT2->end_position.col);
+
+    assert_eq(6, start->pT3->pToken->position.row);
+    assert_eq(5, start->pT3->pToken->position.col);
+    assert_eq(6, start->pT3->pToken->end_position.row);
+    assert_eq(5, start->pT3->pToken->end_position.col);
+    assert_eq(6, start->pT3->position.row);
+    assert_eq(5, start->pT3->position.col);
+    assert_eq(6, start->pT3->end_position.row);
+    assert_eq(5, start->pT3->end_position.col);
+
+    assert_eq(2, start->position.row);
+    assert_eq(2, start->position.col);
+    assert_eq(6, start->end_position.row);
+    assert_eq(5, start->end_position.col);
+
+    return 0;
+}
--- a/spec/test_ast_token_positions.d
+++ b/spec/test_ast_token_positions.d
@ -0,0 +1,86 @@
+import testparser;
+import std.stdio;
+import testutils;
+
+int main()
+{
+    return 0;
+}
+
+unittest
+{
+    string input = "abbccc";
+    p_context_t context;
+    p_context_init(&context, input);
+    assert(p_parse(&context) == P_SUCCESS);
+    Start * start = p_result(&context);
+
+    assert_eq(0, start.pT1.pToken.position.row);
+    assert_eq(0, start.pT1.pToken.position.col);
+    assert_eq(0, start.pT1.pToken.end_position.row);
+    assert_eq(0, start.pT1.pToken.end_position.col);
+    assert_eq(0, start.pT1.position.row);
+    assert_eq(0, start.pT1.position.col);
+    assert_eq(0, start.pT1.end_position.row);
+    assert_eq(0, start.pT1.end_position.col);
+
+    assert_eq(0, start.pT2.pToken.position.row);
+    assert_eq(1, start.pT2.pToken.position.col);
+    assert_eq(0, start.pT2.pToken.end_position.row);
+    assert_eq(2, start.pT2.pToken.end_position.col);
+    assert_eq(0, start.pT2.position.row);
+    assert_eq(1, start.pT2.position.col);
+    assert_eq(0, start.pT2.end_position.row);
+    assert_eq(2, start.pT2.end_position.col);
+
+    assert_eq(0, start.pT3.pToken.position.row);
+    assert_eq(3, start.pT3.pToken.position.col);
+    assert_eq(0, start.pT3.pToken.end_position.row);
+    assert_eq(5, start.pT3.pToken.end_position.col);
+    assert_eq(0, start.pT3.position.row);
+    assert_eq(3, start.pT3.position.col);
+    assert_eq(0, start.pT3.end_position.row);
+    assert_eq(5, start.pT3.end_position.col);
+
+    assert_eq(0, start.position.row);
+    assert_eq(0, start.position.col);
+    assert_eq(0, start.end_position.row);
+    assert_eq(5, start.end_position.col);
+
+    input = "\n\n  bb\nc\ncc\n\n     a"; // the c token spans rows 3-4
+    p_context_init(&context, input);
+    assert(p_parse(&context) == P_SUCCESS);
+    start = p_result(&context);
+
+    assert_eq(2, start.pT1.pToken.position.row);
+    assert_eq(2, start.pT1.pToken.position.col);
+    assert_eq(2, start.pT1.pToken.end_position.row);
+    assert_eq(3, start.pT1.pToken.end_position.col);
+    assert_eq(2, start.pT1.position.row);
+    assert_eq(2, start.pT1.position.col);
+    assert_eq(2, start.pT1.end_position.row);
+    assert_eq(3, start.pT1.end_position.col);
+
+    assert_eq(3, start.pT2.pToken.position.row);
+    assert_eq(0, start.pT2.pToken.position.col);
+    assert_eq(4, start.pT2.pToken.end_position.row);
+    assert_eq(1, start.pT2.pToken.end_position.col);
+    assert_eq(3, start.pT2.position.row);
+    assert_eq(0, start.pT2.position.col);
+    assert_eq(4, start.pT2.end_position.row);
+    assert_eq(1, start.pT2.end_position.col);
+
+    assert_eq(6, start.pT3.pToken.position.row);
+    assert_eq(5, start.pT3.pToken.position.col);
+    assert_eq(6, start.pT3.pToken.end_position.row);
+    assert_eq(5, start.pT3.pToken.end_position.col);
+    assert_eq(6, start.pT3.position.row);
+    assert_eq(5, start.pT3.position.col);
+    assert_eq(6, start.pT3.end_position.row);
+    assert_eq(5, start.pT3.end_position.col);
+
+    assert_eq(2, start.position.row);
+    assert_eq(2, start.position.col);
+    assert_eq(6, start.end_position.row);
+    assert_eq(5, start.end_position.col);
+}
--- a/spec/test_error_positions.c
+++ b/spec/test_error_positions.c
@ -14,14 +14,14 @@ int main()
    assert(p_parse(&context) == P_UNEXPECTED_TOKEN);
    assert(p_position(&context).row == 2);
    assert(p_position(&context).col == 3);
-    assert(context.token == TOKEN_a);
+    assert(p_token(&context) == TOKEN_a);

    input = "12";
    p_context_init(&context, (uint8_t const *)input, strlen(input));
    assert(p_parse(&context) == P_UNEXPECTED_TOKEN);
    assert(p_position(&context).row == 0);
    assert(p_position(&context).col == 0);
-    assert(context.token == TOKEN_num);
+    assert(p_token(&context) == TOKEN_num);

    input = "a 12\n\nab";
    p_context_init(&context, (uint8_t const *)input, strlen(input));
@ -35,5 +35,8 @@ int main()
    assert(p_position(&context).row == 5);
    assert(p_position(&context).col == 4);

+    assert(strcmp(p_token_names[TOKEN_a], "a") == 0);
+    assert(strcmp(p_token_names[TOKEN_num], "num") == 0);
+
    return 0;
 }
--- a/spec/test_error_positions.d
+++ b/spec/test_error_positions.d
@ -17,13 +17,13 @@ unittest
    p_context_init(&context, input);
    assert(p_parse(&context) == P_UNEXPECTED_TOKEN);
    assert(p_position(&context) == p_position_t(2, 3));
-    assert(context.token == TOKEN_a);
+    assert(p_token(&context) == TOKEN_a);

    input = "12";
    p_context_init(&context, input);
    assert(p_parse(&context) == P_UNEXPECTED_TOKEN);
    assert(p_position(&context) == p_position_t(0, 0));
-    assert(context.token == TOKEN_num);
+    assert(p_token(&context) == TOKEN_num);

    input = "a 12\n\nab";
    p_context_init(&context, input);
@ -34,4 +34,7 @@ unittest
    p_context_init(&context, input);
    assert(p_parse(&context) == P_DECODE_ERROR);
    assert(p_position(&context) == p_position_t(5, 4));
+
+    assert(p_token_names[TOKEN_a] == "a");
+    assert(p_token_names[TOKEN_num] == "num");
 }
--- a/spec/test_field_aliases.c
+++ b/spec/test_field_aliases.c
@ -0,0 +1,13 @@
+#include "testparser.h"
+#include <assert.h>
+#include <string.h>
+#include "testutils.h"
+
+int main()
+{
+    char const * input = "foo1\nbar2";
+    p_context_t context;
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert(p_parse(&context) == P_SUCCESS);
+    return 0;
+}
--- a/spec/test_field_aliases.d
+++ b/spec/test_field_aliases.d
@ -0,0 +1,15 @@
+import testparser;
+import std.stdio;
+
+int main()
+{
+    return 0;
+}
+
+unittest
+{
+    string input = "foo1\nbar2";
+    p_context_t context;
+    p_context_init(&context, input);
+    assert(p_parse(&context) == P_SUCCESS);
+}
--- a/spec/test_lexer.c
+++ b/spec/test_lexer.c
@ -43,41 +43,57 @@ int main()
    assert(p_lex(&context, &token_info) == P_SUCCESS);
    assert(token_info.position.row == 0u);
    assert(token_info.position.col == 0u);
+    assert(token_info.end_position.row == 0u);
+    assert(token_info.end_position.col == 0u);
    assert(token_info.length == 1u);
    assert(token_info.token == TOKEN_int);
    assert(p_lex(&context, &token_info) == P_SUCCESS);
    assert(token_info.position.row == 0u);
    assert(token_info.position.col == 2u);
+    assert(token_info.end_position.row == 0u);
+    assert(token_info.end_position.col == 2u);
    assert(token_info.length == 1u);
    assert(token_info.token == TOKEN_plus);
    assert(p_lex(&context, &token_info) == P_SUCCESS);
    assert(token_info.position.row == 0u);
    assert(token_info.position.col == 4u);
+    assert(token_info.end_position.row == 0u);
+    assert(token_info.end_position.col == 4u);
    assert(token_info.length == 1u);
    assert(token_info.token == TOKEN_int);
    assert(p_lex(&context, &token_info) == P_SUCCESS);
    assert(token_info.position.row == 0u);
    assert(token_info.position.col == 6u);
+    assert(token_info.end_position.row == 0u);
+    assert(token_info.end_position.col == 6u);
    assert(token_info.length == 1u);
    assert(token_info.token == TOKEN_times);
    assert(p_lex(&context, &token_info) == P_SUCCESS);
    assert(token_info.position.row == 1u);
    assert(token_info.position.col == 0u);
+    assert(token_info.end_position.row == 1u);
+    assert(token_info.end_position.col == 2u);
    assert(token_info.length == 3u);
    assert(token_info.token == TOKEN_int);
    assert(p_lex(&context, &token_info) == P_SUCCESS);
    assert(token_info.position.row == 1u);
    assert(token_info.position.col == 4u);
+    assert(token_info.end_position.row == 1u);
+    assert(token_info.end_position.col == 4u);
    assert(token_info.length == 1u);
    assert(token_info.token == TOKEN_plus);
    assert(p_lex(&context, &token_info) == P_SUCCESS);
    assert(token_info.position.row == 1u);
    assert(token_info.position.col == 6u);
+    assert(token_info.end_position.row == 1u);
+    assert(token_info.end_position.col == 8u);
    assert(token_info.length == 3u);
    assert(token_info.token == TOKEN_int);
    assert(p_lex(&context, &token_info) == P_SUCCESS);
    assert(token_info.position.row == 1u);
    assert(token_info.position.col == 9u);
+    assert(token_info.end_position.row == 1u);
+    assert(token_info.end_position.col == 9u);
    assert(token_info.length == 0u);
    assert(token_info.token == TOKEN___EOF);

@ -85,6 +101,8 @@ int main()
    assert(p_lex(&context, &token_info) == P_SUCCESS);
    assert(token_info.position.row == 0u);
    assert(token_info.position.col == 0u);
+    assert(token_info.end_position.row == 0u);
+    assert(token_info.end_position.col == 0u);
    assert(token_info.length == 0u);
    assert(token_info.token == TOKEN___EOF);

--- a/spec/test_lexer.d
+++ b/spec/test_lexer.d
@ -47,23 +47,23 @@ unittest
    p_context_t context;
    p_context_init(&context, input);
    assert(p_lex(&context, &token_info) == P_SUCCESS);
-    assert(token_info == p_token_info_t(p_position_t(0, 0), 1, TOKEN_int));
+    assert(token_info == p_token_info_t(p_position_t(0, 0), p_position_t(0, 0), 1, TOKEN_int));
    assert(p_lex(&context, &token_info) == P_SUCCESS);
-    assert(token_info == p_token_info_t(p_position_t(0, 2), 1, TOKEN_plus));
+    assert(token_info == p_token_info_t(p_position_t(0, 2), p_position_t(0, 2), 1, TOKEN_plus));
    assert(p_lex(&context, &token_info) == P_SUCCESS);
-    assert(token_info == p_token_info_t(p_position_t(0, 4), 1, TOKEN_int));
+    assert(token_info == p_token_info_t(p_position_t(0, 4), p_position_t(0, 4), 1, TOKEN_int));
    assert(p_lex(&context, &token_info) == P_SUCCESS);
-    assert(token_info == p_token_info_t(p_position_t(0, 6), 1, TOKEN_times));
+    assert(token_info == p_token_info_t(p_position_t(0, 6), p_position_t(0, 6), 1, TOKEN_times));
    assert(p_lex(&context, &token_info) == P_SUCCESS);
-    assert(token_info == p_token_info_t(p_position_t(1, 0), 3, TOKEN_int));
+    assert(token_info == p_token_info_t(p_position_t(1, 0), p_position_t(1, 2), 3, TOKEN_int));
    assert(p_lex(&context, &token_info) == P_SUCCESS);
-    assert(token_info == p_token_info_t(p_position_t(1, 4), 1, TOKEN_plus));
+    assert(token_info == p_token_info_t(p_position_t(1, 4), p_position_t(1, 4), 1, TOKEN_plus));
    assert(p_lex(&context, &token_info) == P_SUCCESS);
-    assert(token_info == p_token_info_t(p_position_t(1, 6), 3, TOKEN_int));
+    assert(token_info == p_token_info_t(p_position_t(1, 6), p_position_t(1, 8), 3, TOKEN_int));
    assert(p_lex(&context, &token_info) == P_SUCCESS);
-    assert(token_info == p_token_info_t(p_position_t(1, 9), 0, TOKEN___EOF));
+    assert(token_info == p_token_info_t(p_position_t(1, 9), p_position_t(1, 9), 0, TOKEN___EOF));

    p_context_init(&context, "");
    assert(p_lex(&context, &token_info) == P_SUCCESS);
-    assert(token_info == p_token_info_t(p_position_t(0, 0), 0, TOKEN___EOF));
+    assert(token_info == p_token_info_t(p_position_t(0, 0), p_position_t(0, 0), 0, TOKEN___EOF));
 }
--- a/spec/test_match_backslashes.c
+++ b/spec/test_match_backslashes.c
@ -0,0 +1,13 @@
+#include "testparser.h"
+#include <assert.h>
+#include <string.h>
+
+int main()
+{
+    char const * input = "\a\b\t\n\v\f\rt";
+    p_context_t context;
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert(p_parse(&context) == P_SUCCESS);
+
+    return 0;
+}
--- a/spec/test_match_backslashes.d
+++ b/spec/test_match_backslashes.d
@ -0,0 +1,15 @@
+import testparser;
+import std.stdio;
+
+int main()
+{
+    return 0;
+}
+
+unittest
+{
+    string input = "\a\b\t\n\v\f\rt";
+    p_context_t context;
+    p_context_init(&context, input);
+    assert(p_parse(&context) == P_SUCCESS);
+}
--- a/spec/test_optional_rule_component.c
+++ b/spec/test_optional_rule_component.c
@ -0,0 +1,22 @@
+#include "testparser.h"
+#include <assert.h>
+#include <string.h>
+
+int main()
+{
+    char const * input = "b";
+    p_context_t context;
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert(p_parse(&context) == P_SUCCESS);
+
+    input = "abcd";
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert(p_parse(&context) == P_SUCCESS);
+
+    input = "abdc";
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert(p_parse(&context) == P_SUCCESS);
+
+    return 0;
+}
+
--- a/spec/test_optional_rule_component.d
+++ b/spec/test_optional_rule_component.d
@ -0,0 +1,23 @@
+import testparser;
+import std.stdio;
+
+int main()
+{
+    return 0;
+}
+
+unittest
+{
+    string input = "b";
+    p_context_t context;
+    p_context_init(&context, input);
+    assert(p_parse(&context) == P_SUCCESS);
+
+    input = "abcd";
+    p_context_init(&context, input);
+    assert(p_parse(&context) == P_SUCCESS);
+
+    input = "abdc";
+    p_context_init(&context, input);
+    assert(p_parse(&context) == P_SUCCESS);
+}
--- a/spec/test_optional_rule_component_ast.c
+++ b/spec/test_optional_rule_component_ast.c
@ -0,0 +1,42 @@
+#include "testparser.h"
+#include <assert.h>
+#include <string.h>
+#include "testutils.h"
+
+int main()
+{
+    char const * input = "b";
+    p_context_t context;
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert(p_parse(&context) == P_SUCCESS);
+    Start * start = p_result(&context);
+    assert(start->pToken1 == NULL);
+    assert(start->pToken2 != NULL);
+    assert_eq(TOKEN_b, start->pToken2->token);
+    assert(start->pR3 == NULL);
+    assert(start->pR == NULL);
+
+    input = "abcd";
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert(p_parse(&context) == P_SUCCESS);
+    start = p_result(&context);
+    assert(start->pToken1 != NULL);
+    assert_eq(TOKEN_a, start->pToken1->token);
+    assert(start->pToken2 != NULL);
+    assert(start->pR3 != NULL);
+    assert(start->pR != NULL);
+    assert(start->pR == start->pR3);
+    assert_eq(TOKEN_c, start->pR->pToken1->token);
+
+    input = "bdc";
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert(p_parse(&context) == P_SUCCESS);
+    start = p_result(&context);
+    assert(start->pToken1 == NULL);
+    assert(start->pToken2 != NULL);
+    assert(start->pR != NULL);
+    assert_eq(TOKEN_d, start->pR->pToken1->token);
+
+    return 0;
+}
+
--- a/spec/test_optional_rule_component_ast.d
+++ b/spec/test_optional_rule_component_ast.d
@ -0,0 +1,43 @@
+import testparser;
+import std.stdio;
+import testutils;
+
+int main()
+{
+    return 0;
+}
+
+unittest
+{
+    string input = "b";
+    p_context_t context;
+    p_context_init(&context, input);
+    assert(p_parse(&context) == P_SUCCESS);
+    Start * start = p_result(&context);
+    assert(start.pToken1 is null);
+    assert(start.pToken2 !is null);
+    assert_eq(TOKEN_b, start.pToken2.token);
+    assert(start.pR3 is null);
+    assert(start.pR is null);
+
+    input = "abcd";
+    p_context_init(&context, input);
+    assert(p_parse(&context) == P_SUCCESS);
+    start = p_result(&context);
+    assert(start.pToken1 != null);
+    assert_eq(TOKEN_a, start.pToken1.token);
+    assert(start.pToken2 != null);
+    assert(start.pR3 != null);
+    assert(start.pR != null);
+    assert(start.pR == start.pR3);
+    assert_eq(TOKEN_c, start.pR.pToken1.token);
+
+    input = "bdc";
+    p_context_init(&context, input);
+    assert(p_parse(&context) == P_SUCCESS);
+    start = p_result(&context);
+    assert(start.pToken1 is null);
+    assert(start.pToken2 !is null);
+    assert(start.pR !is null);
+    assert_eq(TOKEN_d, start.pR.pToken1.token);
+}
--- a/spec/test_start_rule.c
+++ b/spec/test_start_rule.c
@ -0,0 +1,9 @@
+#include "testparser.h"
+#include <assert.h>
+#include <string.h>
+#include "testutils.h"
+
+int main()
+{
+    return 0;
+}
--- a/spec/test_start_rule.d
+++ b/spec/test_start_rule.d
@ -0,0 +1,8 @@
+import testparser;
+import std.stdio;
+import testutils;
+
+int main()
+{
+    return 0;
+}
--- a/spec/test_start_rule_ast.c
+++ b/spec/test_start_rule_ast.c
@ -0,0 +1,17 @@
+#include "testparser.h"
+#include <assert.h>
+#include <string.h>
+#include "testutils.h"
+
+int main()
+{
+    char const * input = "hi";
+    p_context_t context;
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert_eq(P_SUCCESS, p_parse(&context));
+    Top * top = p_result(&context);
+    assert(top->pToken != NULL);
+    assert_eq(TOKEN_hi, top->pToken->token);
+
+    return 0;
+}
--- a/spec/test_start_rule_ast.d
+++ b/spec/test_start_rule_ast.d
@ -0,0 +1,19 @@
+import testparser;
+import std.stdio;
+import testutils;
+
+int main()
+{
+    return 0;
+}
+
+unittest
+{
+    string input = "hi";
+    p_context_t context;
+    p_context_init(&context, input);
+    assert_eq(P_SUCCESS, p_parse(&context));
+    Top * top = p_result(&context);
+    assert(top.pToken !is null);
+    assert_eq(TOKEN_hi, top.pToken.token);
+}
--- a/spec/test_user_terminate.c
+++ b/spec/test_user_terminate.c
@ -0,0 +1,19 @@
+#include "testparser.h"
+#include <assert.h>
+#include <stdio.h>
+#include <string.h>
+
+int main()
+{
+    char const * input = "aacc";
+    p_context_t context;
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert(p_parse(&context) == P_SUCCESS);
+
+    input = "abc";
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert(p_parse(&context) == P_USER_TERMINATED);
+    assert(p_user_terminate_code(&context) == 4200);
+
+    return 0;
+}
--- a/spec/test_user_terminate.d
+++ b/spec/test_user_terminate.d
@ -0,0 +1,20 @@
+import testparser;
+import std.stdio;
+
+int main()
+{
+    return 0;
+}
+
+unittest
+{
+    string input = "aacc";
+    p_context_t context;
+    p_context_init(&context, input);
+    assert(p_parse(&context) == P_SUCCESS);
+
+    input = "abc";
+    p_context_init(&context, input);
+    assert(p_parse(&context) == P_USER_TERMINATED);
+    assert(p_user_terminate_code(&context) == 4200);
+}
--- a/spec/test_user_terminate_lexer.c
+++ b/spec/test_user_terminate_lexer.c
@ -0,0 +1,19 @@
+#include "testparser.h"
+#include <assert.h>
+#include <stdio.h>
+#include <string.h>
+
+int main()
+{
+    char const * input = "a";
+    p_context_t context;
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert(p_parse(&context) == P_SUCCESS);
+
+    input = "b";
+    p_context_init(&context, (uint8_t const *)input, strlen(input));
+    assert(p_parse(&context) == P_USER_TERMINATED);
+    assert(p_user_terminate_code(&context) == 8675309);
+
+    return 0;
+}
--- a/spec/test_user_terminate_lexer.d
+++ b/spec/test_user_terminate_lexer.d
@ -0,0 +1,20 @@
+import testparser;
+import std.stdio;
+
+int main()
+{
+    return 0;
+}
+
+unittest
+{
+    string input = "a";
+    p_context_t context;
+    p_context_init(&context, input);
+    assert(p_parse(&context) == P_SUCCESS);
+
+    input = "b";
+    p_context_init(&context, input);
+    assert(p_parse(&context) == P_USER_TERMINATED);
+    assert(p_user_terminate_code(&context) == 8675309);
+}
Author	SHA1	Message	Date
Josh Holtrop	c24f323ff0	v1.5.1	2024-07-26 22:30:48 -04:00
Josh Holtrop	fec2c28693	Only calculate lookahead tokens when needed - #28 Lookahead tokens are only needed if either: (1) There is more than one rule that could be reduced in a given parser state, or (2) There are shift actions for a state and at least one rule that could be reduced in the same state (to warn about shift/reduce conflicts).	2024-07-26 22:08:25 -04:00
Josh Holtrop	61339aeae9	Avoid recalculating reduce_rules - #28	2024-07-26 21:36:41 -04:00
Josh Holtrop	95b3dc6550	Cache ItemSet#next_symbols - #28	2024-07-25 20:33:15 -04:00
Josh Holtrop	74d94fef72	Do not build ItemSet follow sets - #28	2024-07-25 20:02:00 -04:00
Josh Holtrop	588c5e21c7	Cache ItemSet#leading_item_sets return values - #28	2024-07-25 10:42:43 -04:00
Josh Holtrop	5f1c306273	Update CLI usage in README	2024-07-22 21:35:32 -04:00
Josh Holtrop	343e8a7f9e	v1.5.0	2024-07-22 21:23:38 -04:00
Josh Holtrop	b3a134bf8d	Update vim syntax to highlight "?" and field alias names	2024-07-22 20:39:59 -04:00
Josh Holtrop	4a71dc74fb	Update CHANGELOG for v1.5.0	2024-07-22 20:26:04 -04:00
Josh Holtrop	a7348be95d	Add rule field aliases - #24	2024-07-22 20:16:52 -04:00
Josh Holtrop	9746b3f2bf	Document position tracking fields in user guide - #27	2024-07-21 14:04:51 -04:00
Josh Holtrop	c5b8fc28bd	Move INVALID_POSITION from header to C source - #27	2024-07-21 13:39:34 -04:00
Josh Holtrop	092fce61eb	Test position validity for empty matching rules - #27	2024-07-21 13:39:30 -04:00
Josh Holtrop	e647248e34	Track start and end position of rules in AST nodes - #27	2024-07-19 15:37:37 -04:00
Josh Holtrop	f4ae1b8601	Add position fields to AST nodes (not populated yet) - #27	2024-07-19 14:34:50 -04:00
Josh Holtrop	eae2e17f41	Test tracking token end positions when the token spans a newline - #27	2024-07-18 12:09:26 -04:00
Josh Holtrop	87d6d29d60	Store token end position - #27	2024-07-18 12:03:44 -04:00
Josh Holtrop	3aced70356	Show line numbers of rules upon conflict - close #23	2024-07-14 20:52:52 -04:00
Josh Holtrop	2dd89445fc	Add command line switch to output warnings to stderr - close #26	2024-07-14 15:36:07 -04:00
Josh Holtrop	4ae5ab79b3	Warn on shift/reduce conflicts	2024-07-13 21:35:53 -04:00
Josh Holtrop	69cc8fa67d	Always compute lookahead tokens for reduce rules Even if they won't be needed for the generated parser, they'll be useful to detect shift/reduce conflicts.	2024-07-13 21:01:44 -04:00
Josh Holtrop	7f3eb8f315	Calculate follow token set for an ItemSet	2024-07-13 20:48:28 -04:00
Josh Holtrop	d76e12fea1	Rename "following" to "next" - #25 The term "following" could potentially imply an association with the "follow set", however it was used in a non-closed manner.	2024-07-08 10:14:09 -04:00
Josh Holtrop	911e9505b7	Track token position in AST Token node	2024-05-27 22:10:05 -04:00
Josh Holtrop	aaeb0c4db1	Remove leftover TODO from earlier restructuring	2024-05-27 20:44:42 -04:00
Josh Holtrop	fd89c5c6b3	Add Vim syntax highlighting files for Propane	2024-05-26 14:49:30 -04:00
Josh Holtrop	1468946735	v1.4.0	2024-05-11 11:46:28 -04:00
Josh Holtrop	2bccf3303e	Update CHANGELOG	2024-05-09 17:38:18 -04:00
Josh Holtrop	0d1ee74ca6	Give a better error message when a referenced ptype has not been declared	2024-05-09 17:35:27 -04:00
Josh Holtrop	985b180f62	Update CHANGELOG	2024-05-09 11:56:44 -04:00
Josh Holtrop	f3e4941ad8	Allow rule terms to be marked as optional	2024-05-09 11:56:13 -04:00
Josh Holtrop	494afb7307	Allow specifying the start rule name	2024-05-05 12:39:00 -04:00
Josh Holtrop	508dabe760	Update CHANGELOG for v1.4.0	2024-05-04 21:49:13 -04:00
Josh Holtrop	153f9d28f8	Allow user to specify AST node prefix or suffix Add ast_prefix and ast_suffix grammar statements.	2024-05-04 21:49:13 -04:00
Josh Holtrop	d0f542cbd7	v1.3.0	2024-04-23 00:31:56 -04:00
Josh Holtrop	786c78b635	Update CHANGELOG for v1.3.0	2024-04-23 00:21:28 -04:00
Josh Holtrop	f0bd8d8663	Add documentation for AST generation mode - close #22	2024-04-23 00:15:19 -04:00
Josh Holtrop	c7a18ef821	Add AST node field name with no suffix when unique - #22	2024-04-22 21:50:26 -04:00
Josh Holtrop	cb06a56f81	Add AST generation - #22	2024-04-22 20:51:27 -04:00
Josh Holtrop	2b28ef622d	Add specs to fully cover cli.rb	2024-04-06 14:37:15 -04:00
Josh Holtrop	19c32b58dc	Fix README example grammar	2024-04-06 14:16:27 -04:00
Josh Holtrop	3a8dcac55f	v1.2.0	2024-04-02 21:42:33 -04:00
Josh Holtrop	632ab2fe6f	Update CHANGELOG for v1.2.0	2024-04-02 21:42:18 -04:00
Josh Holtrop	3eaf0d3d49	allow one line user code blocks - close #21	2024-04-02 17:44:15 -04:00
Josh Holtrop	918dc7b2bb	fix generator hang when state transition cycle is present - close #20	2024-04-02 14:27:08 -04:00
Josh Holtrop	5b2cbe53e6	Add backslash escape codes - close #19	2024-03-29 16:45:54 -04:00
Josh Holtrop	1d1590dfda	Add API to access unexpected token found - close #18	2024-03-29 15:58:56 -04:00
Josh Holtrop	1c91dcd298	Add token_names API - close #17	2024-03-29 15:02:01 -04:00
Josh Holtrop	5dfd62b756	Add D example to user guide for p_context_init() - close #16	2024-03-29 13:52:16 -04:00
Josh Holtrop	fad7f4fb36	Allow user termination from lexer code blocks - close #15	2024-03-29 13:45:08 -04:00
Josh Holtrop	d55c5e0080	Update CHANGELOG for v1.1.0	2024-01-07 17:48:47 -05:00
Josh Holtrop	6c847c05b1	v1.1.0	2024-01-07 17:43:06 -05:00
Josh Holtrop	a5800575c8	Document generated API in user guide - close #14	2024-01-05 20:47:22 -05:00
Josh Holtrop	24af3590d1	Allow user to terminate the parser - close #13	2024-01-03 22:32:10 -05:00
Josh Holtrop	92c76b74c8	Update license year	2024-01-03 20:05:46 -05:00
Josh Holtrop	a032ac027c	Compilation warning for unreachable statement - close #12	2023-10-21 16:04:15 -04:00
				`@ -0,0 +1 @@`
				`au BufNewFile,BufRead *.propane set filetype=propane`