SQLite Forum

Timeline
Login

16 forum posts by user terpstra

2021-08-26
19:05 Reply: lemon cannot shift-reduce errors (artifact: e680f42f53 user: terpstra)

Ok. Here's my tested and proposed change:

diff --git a/tool/lemon.c b/tool/lemon.c
index 75fc7aa2f..1bcb3778a 100644
--- a/tool/lemon.c
+++ b/tool/lemon.c
@@ -3571,7 +3571,7 @@ PRIVATE int compute_action(struct lemon *lemp, struct action *ap)
       /* Since a SHIFT is inherient after a prior REDUCE, convert any
       ** SHIFTREDUCE action with a nonterminal on the LHS into a simple
       ** REDUCE action: */
-      if( ap->sp->index>=lemp->nterminal ){
+      if( ap->sp->index>=lemp->nterminal && ( !lemp->errsym || ap->sp->index!=lemp->errsym->index )){
         act = lemp->minReduce + ap->x.rp->iRule;
       }else{
         act = lemp->minShiftReduce + ap->x.rp->iRule;
diff --git a/tool/lempar.c b/tool/lempar.c
index bbb0cc367..d5ebe6942 100644
--- a/tool/lempar.c
+++ b/tool/lempar.c
@@ -985,10 +985,6 @@ void Parse(
           yyact = yy_find_reduce_action(yypParser->yytos->stateno,
                                         YYERRORSYMBOL);
           if( yyact<=YY_MAX_SHIFTREDUCE ) break;
-          if( yyact>=YY_MIN_REDUCE && yyRuleInfoNRhs[yyact-YY_MIN_REDUCE] ){
-            yyact -= YY_MIN_REDUCE - YY_MIN_SHIFTREDUCE;
-            break;
-          }
           yy_pop_parser_stack(yypParser);
         }
         if( yypParser->yytos <= yypParser->yystack || yymajor==0 ){

This fixes the test1.y example (ie: the parser accepts 'three'). It also does not mess up when presented with a true reduction in my larger grammar.

18:45 Reply: lemon cannot shift-reduce errors (artifact: 20487aec24 user: terpstra)

I should stop posting the moment I discover something. That fix alone is not enough.

The web interface makes me act like I'm writing some sort of journal. I'll stop posting until I have a complete fix.

Sorry for the spam.

18:40 Reply: lemon cannot shift-reduce errors (artifact: 41cf387e59 user: terpstra)

I've found it!

diff --git a/lemon/lemon.c b/lemon/lemon.c
index 75fc7aa..a35ef0a 100644
--- a/lemon/lemon.c
+++ b/lemon/lemon.c
@@ -3571,7 +3571,7 @@ PRIVATE int compute_action(struct lemon *lemp, struct action *ap)
       /* Since a SHIFT is inherient after a prior REDUCE, convert any
       ** SHIFTREDUCE action with a nonterminal on the LHS into a simple
       ** REDUCE action: */
-      if( ap->sp->index>=lemp->nterminal ){
+      if( ap->sp->index>lemp->nterminal){
         act = lemp->minReduce + ap->x.rp->iRule;
       }else{
         act = lemp->minShiftReduce + ap->x.rp->iRule;
18:20 Reply: lemon cannot shift-reduce errors (artifact: a3264e97f4 user: terpstra)

Also, I'll note that parser.out shows that lemon.c is aware of the distinction between cases (2) and (3), so the problem is probably strictly in the table generation:

                         KW_IF shift        26     
                         error shift-reduce 68     data_elts ::= error
                          type shift-reduce 69     data_elts ::= type
...
                         KW_IF shift        26     
                         error shift        162    
                       pattern shift        102    
18:11 Edit reply: lemon cannot shift-reduce errors (artifact: 77bb75f85c user: terpstra)

Unfortunately, this fix does not work completely.

The bug appears to actually be inside lemon.c, not lempar.c.

When trying to shift an error, you can have three cases:

  (1) it is legal to shift an error
  (2) it is legal to shift-reduce an error
  (3) it is legal to reduce

In my grammar, (3) only ever appears as the default action for a state. I don't know if that's true in general.

The root of the problem IMO is that lemon.c writes yy_action entries for (2) as though they were (3). lemon.c writes error shift actions (1) correctly.

The previously applied fix just detects a reduction (3) in the table and replaces it with a shift-reduction (2). Unfortunately, if you actually have case (3), you now incorrectly shift an error and then reduce. This can lead to a mismatch between the current state and the stack, which in the case of my default reduction actions, leads to an infinite loop.

So, the correct fix is probably to revert this last fix and find where lemon.c writes the erroneous entry.

Also, there's this interesting comment in the code:

  /* There are no SHIFTREDUCE actions on nonterminals because the table
  ** generator has simplified them to pure REDUCE actions. */
  assert( !(yyact>YY_MAX_SHIFT && yyact<=YY_MAX_SHIFTREDUCE) );

... but it is exactly this claim which is false for the case of the 'error' non-terminal.

18:10 Reply: lemon cannot shift-reduce errors (artifact: 36a58aa7d7 user: terpstra)

Unfortunately, this fix does not work completely.

The bug appears to actually be inside lemon.c, not lempar.c.

When trying to shift an error, you can have three cases: (1) it is legal to shift an error (2) it is legal to shift-reduce an error (3) it is legal to reduce

In my grammar, (3) only ever appears as the default action for a state. I don't know if that's true in general.

The root of the problem IMO is that lemon.c writes yy_action entries for (2) as though they were (3). lemon.c writes error shift actions (1) correctly.

The previously applied fix just detects a reduction (3) in the table and replaces it with a shift-reduction (2). Unfortunately, if you actually have case (3), you now incorrectly shift an error and then reduce. This can lead to a mismatch between the current state and the stack, which in the case of my default reduction actions, leads to an infinite loop.

So, the correct fix is probably to revert this last fix and find where lemon.c writes the erroneous entry.

Also, there's this interesting comment in the code:

  /* There are no SHIFTREDUCE actions on nonterminals because the table
  ** generator has simplified them to pure REDUCE actions. */
  assert( !(yyact>YY_MAX_SHIFT && yyact<=YY_MAX_SHIFTREDUCE) );

... but it is exactly this claim which is false for the case of the 'error' non-terminal.

2021-08-21
07:42 Post: lemon.out is great (artifact: b51e88a012 user: terpstra)

I've been playing around with lemon now for a while and I wanted to give a shout out to the best two features of lemon:

  • lemon.out has a very great description of the state machine
  • ParseTrace is really excellent for debugging

Thank you so much. These tools really made it trivial to diagnose my parser and focus on getting the actual work done.

2021-08-20
10:45 Reply: lemon cannot shift-reduce errors (artifact: 5bc109093f user: terpstra)

My fix also appears to work on your test case. Running with just the assertion-fix has a 'Fail!' in it:

parser: Input 'THREE' in state 0
parser: Reduce 1 [top ::=] without external action.
parser: ... then shift 'top', go to state 1
parser: Syntax Error!
parser: Popping top
parser: Fail!
parser: Return. Stack=]
parser: Input '$' in state 0
parser: Reduce 1 [top ::=] without external action.
parser: ... then shift 'top', go to state 1
parser: Reduce 0 [start ::= top] without external action, pop back to state 0.
parser: ... then shift 'start', pending reduce -2
parser: Accept!

With both fixes applied:

parser: Input 'THREE' in state 0
parser: Reduce 1 [top ::=] without external action.
parser: ... then shift 'top', go to state 1
parser: Syntax Error!
parser: Shift 'error', pending reduce 2
parser: Reduce 2 [top ::= top error] without external action, pop back to state 0.
parser: ... then shift 'top', go to state 1
parser: Syntax Error!
parser: Discard input token THREE
parser: Return. Stack=[top]
parser: Input '$' in state 1
parser: Reduce 0 [start ::= top] without external action, pop back to state 0.
parser: ... then shift 'start', pending reduce -2
parser: Accept!
2021-08-18
19:59 Reply: lemon cannot shift-reduce errors (artifact: 2f468f43cb user: terpstra)

@drh

This seems to fix it for me:

--- ../sqlite/tool/lempar.c	2021-08-17 13:36:38.275642753 -0700
+++ lempar.c	2021-08-18 12:58:41.840112123 -0700
@@ -986,6 +986,10 @@
                         yypParser->yytos->stateno,
                         YYERRORSYMBOL)) > YY_MAX_SHIFTREDUCE
         ){
+          if (yyact >= YY_MIN_REDUCE && yyRuleInfoNRhs[yyact-YY_MIN_REDUCE] < 0) {
+            yyact -= YY_MIN_REDUCE - YY_MIN_SHIFTREDUCE;
+            break;
+          }
           yy_pop_parser_stack(yypParser);
         }
         if( yypParser->yytos <= yypParser->yystack || yymajor==0 ){

05:07 Reply: lemon cannot shift-reduce errors (artifact: 7aea393261 user: terpstra)

With that fix applied, AFAICT lemon still does not run the error reduction rule..?

2021-08-17
18:55 Post: lemon: optimize empty reduction chains (artifact: a642109043 user: terpstra)

When one needs precise control over the precedence rules, you can end up with a bunch of rules like this:

    top ::= expr_inequal.

    expr_inequal ::= expr_ltgt.
    expr_inequal ::= expr_inequal EQ expr_ltgt.
    expr_inequal ::= expr_inequal NE expr_ltgt.

    expr_ltgt ::= expr_sum.
    expr_ltgt ::= expr_ltgt LT expr_sum.
    expr_ltgt ::= expr_ltgt GT expr_sum.

    expr_sum ::= expr_mul.
    expr_sum ::= expr_sum PLUS expr_mul.
    expr_sum ::= expr_sum MINUS expr_mul.

    expr_mul ::= expr_exp.
    expr_mul ::= expr_mul MUL expr_exp.

    expr_exp ::= expr_term.
    expr_exp ::= expr_term EXP expr_exp. // right-associative

    expr_term ::= ID.
    expr_term ::= OPEN expr_inequal CLOSE.

This works fine, but if you have an expression like "a != b", to get from ID(a) to an expr_inequal you have to run 6 no-op reductions before you can shift the NE token. In a language with more precedence levels, this cost starts to add up. As long as the reduction rules have a matching type and no code inside {}s, this multi-level reduction is a complete waste of time. The parser could have gone from a stack of [ID] with a token NE and reduced directly to expr_inequal.

For the parser I'm working on, here is a trace to demonstrate the potential savings:

Return. Stack=[top global export KW_DEF pattern P_EQUALS INDENT blockdefs KW_DEF pattern P_EQUALS KW_MATCH expression_term INDENT pattern KW_IF expression_op_inequal expression_binary_app ID]
Input 'P_EQUALS' in state 135
Reduce 118 [expression_nodot ::= ID], pop back to state 71.
... then shift 'expression_nodot', go to state 134
Reduce 220 [expression_term ::= expression_nodot] without external action, pop back to state 71.
... then shift 'expression_term', go to state 142
Reduce 143 [expression_binary_app ::= expression_binary_app expression_term], pop back to state 52.
... then shift 'expression_binary_app', go to state 71
Reduce 195 [expression_unary_quant ::= expression_binary_app] without external action, pop back to state 52.
... then shift 'expression_unary_quant', go to state 166
Reduce 194 [expression_binary_quant ::= expression_unary_quant] without external action, pop back to state 52.
... then shift 'expression_binary_quant', go to state 89
Reduce 197 [expression_unary_exp ::= expression_binary_quant] without external action, pop back to state 52.
... then shift 'expression_unary_exp', go to state 90
Reduce 196 [expression_binary_exp ::= expression_unary_exp] without external action, pop back to state 52.
... then shift 'expression_binary_exp', go to state 173
Reduce 199 [expression_unary_muldiv ::= expression_binary_exp] without external action, pop back to state 52.
... then shift 'expression_unary_muldiv', go to state 172
Reduce 198 [expression_binary_muldiv ::= expression_unary_muldiv] without external action, pop back to state 52.
... then shift 'expression_binary_muldiv', go to state 91
Reduce 201 [expression_unary_addsub ::= expression_binary_muldiv] without external action, pop back to state 52.
... then shift 'expression_unary_addsub', go to state 178
Reduce 200 [expression_binary_addsub ::= expression_unary_addsub] without external action, pop back to state 52.
... then shift 'expression_binary_addsub', go to state 92
Reduce 203 [expression_unary_compare ::= expression_binary_addsub] without external action, pop back to state 52.
... then shift 'expression_unary_compare', go to state 182
Reduce 202 [expression_binary_compare ::= expression_unary_compare] without external action, pop back to state 52.
... then shift 'expression_binary_compare', go to state 94
Reduce 205 [expression_unary_inequal ::= expression_binary_compare] without external action, pop back to state 52.
... then shift 'expression_unary_inequal', go to state 186
Reduce 95 [expression_unary_inequal ::= expression_op_inequal expression_unary_inequal], pop back to state 39.
... then shift 'expression_unary_inequal', go to state 185
Reduce 204 [expression_binary_inequal ::= expression_unary_inequal] without external action, pop back to state 39.
... then shift 'expression_binary_inequal', go to state 96
Reduce 207 [expression_unary_and ::= expression_binary_inequal] without external action, pop back to state 39.
... then shift 'expression_unary_and', go to state 189
Reduce 206 [expression_binary_and ::= expression_unary_and] without external action, pop back to state 39.
... then shift 'expression_binary_and', go to state 98
Reduce 209 [expression_unary_or ::= expression_binary_and] without external action, pop back to state 39.
... then shift 'expression_unary_or', go to state 193
Reduce 208 [expression_binary_or ::= expression_unary_or] without external action, pop back to state 39.
... then shift 'expression_binary_or', go to state 99
Reduce 211 [expression_unary_dollar ::= expression_binary_or] without external action, pop back to state 39.
... then shift 'expression_unary_dollar', go to state 101
Reduce 210 [expression_binary_dollar ::= expression_unary_dollar] without external action, pop back to state 39.
... then shift 'expression_binary_dollar', go to state 201
Reduce 213 [expression_unary_colon ::= expression_binary_dollar] without external action, pop back to state 39.
... then shift 'expression_unary_colon', go to state 104
Reduce 212 [expression_binary_colon ::= expression_unary_colon] without external action, pop back to state 39.
... then shift 'expression_binary_colon', go to state 221
Reduce 215 [expression_unary_lrarrow ::= expression_binary_colon] without external action, pop back to state 39.
... then shift 'expression_unary_lrarrow', go to state 220
Reduce 214 [expression_binary_lrarrow ::= expression_unary_lrarrow] without external action, pop back to state 39.
... then shift 'expression_binary_lrarrow', go to state 105
Reduce 217 [expression_unary_eqarrow ::= expression_binary_lrarrow] without external action, pop back to state 39.
... then shift 'expression_unary_eqarrow', go to state 231
Reduce 216 [expression_binary_eqarrow ::= expression_unary_eqarrow] without external action, pop back to state 39.
... then shift 'expression_binary_eqarrow', go to state 106
Reduce 219 [expression_unary_comma ::= expression_binary_eqarrow] without external action, pop back to state 39.
... then shift 'expression_unary_comma', go to state 108
Reduce 218 [expression_binary_comma ::= expression_unary_comma] without external action, pop back to state 39.
... then shift 'expression_binary_comma', go to state 246
Reduce 222 [expression ::= expression_binary_comma] without external action, pop back to state 39.
... then shift 'expression', go to state 328
Reduce 162 [guard ::= KW_IF expression], pop back to state 116.
... then shift 'guard', go to state 342
Shift 'P_EQUALS', go to state 34
Return. Stack=[top global export KW_DEF pattern P_EQUALS INDENT blockdefs KW_DEF pattern P_EQUALS KW_MATCH expression_term INDENT pattern guard P_EQUALS]
18:40 Post: lemon cannot shift-reduce errors (artifact: bd91fd965c user: terpstra)

I have been trying to use lemon with the 'error' keyword to enable error recovery, but I've run into something that might be a bug.

TLDR: Are reduction rules allowed to END with 'error'? If not, disregard the rest of this post.

Given a grammar like this:

    start ::= top.
    top ::= .
    top ::= top error.
    top ::= top topdef.
    ... a bunch of legal topdefs

When the generated parser encounters an error, the loop:

        while( yypParser->yytos >= yypParser->yystack
            && (yyact = yy_find_reduce_action(
                        yypParser->yytos->stateno,
                        YYERRORSYMBOL)) > YY_MAX_SHIFTREDUCE
        ){
          /* added for debug by me: */ if (yyact >= YY_MIN_REDUCE) printf("CONSIDERING %d: %s\n", yyact - YY_MIN_REDUCE, yyRuleName[yyact-YY_MIN_REDUCE]);
unwinds the stack until it gets an assertion failure due to popping the end of the stack:
    Return. Stack=[top global export KW_DEF]
    Input 'KW_DATA' in state 21
    Syntax Error!
    foo.wake:133:[5-8]: syntax error; found 'data', but was expecting one of:
        if    \    (    subscribe    prim    match
    Popping KW_DEF
    Popping export
    CONSIDERING 41: export ::=
    Popping global
    CONSIDERING 2: top ::= top error
    Popping top
    CONSIDERING 1: top ::=
    parser: parser.c:2096: void yy_pop_parser_stack(yyParser*): Assertion `pParser->yytos > pParser->yystack' failed.
    Aborted

As you can see in the trace of 'yyact', it DID consider the rule top ::= top error., but the number it had assigned to this rule was in the range of [YY_MIN_REDUCE, YY_MAX_REDUCE) instead of [YY_MIN_SHIFTREDUCE, YY_MAX_SHIFTREDUCE). Thus, the loop popped past the point it could have shift-reduced. This led me to the work-around of not having 'error' as the last term in the error-handling rule, and that seems to have worked, except that now I cannot tolerate an error at the end of the input, which is quite unfortunate, as people often edit files at the bottom.

I've also noticed that when I try to create a minimal grammar reproducing this behavior, the bug(?) goes away and the reduction works. When I use the '-c' option, the bug(?) also goes away and the error reduction works.

I'm not sure where to post the full grammar that causes this issue. I did include it attached to the original email I sent.

2020-03-16
17:38 Reply: lemon stack limit (artifact: 8a4b3cae8c user: terpstra)

Awesome!

The lemon version installed on my system was 1.39. I just compiled the snapshot version and it indeed produced a realloc capable parser.

Many thanks.

15:49 Reply: lemon stack limit (artifact: fcc811f720 user: terpstra)

I just searched lemon.c from the latest sqlite snapshot tarball, and it has realloc nowhere inside a string. Also, the generated code does not have any calls to realloc. Are you sure this option exists?

I totally understand that having your own parser generator helped you. Having exactly the right tool for the job often does. Building your own tool essentially guarantees this property. :)

My current parser is a hand rolled recursive descent parser using precedence climbing. It slowly got larger over time and is now 1700 lines. The parser also mixes desugaring with parsing, which makes the code needlessly complex. I was hoping to refactor this into: proper grammar rules, a front-end AST, a verification stage, and a rewrite stage to desugar the AST. So I went looking for parser generators.

I believe my language is both LL(1) and LALR(1), but since the parser is hand rolled I couldn't verify this property easily. I wrote an initial description in lemon, and found it was indeed conflict free, so I am at least right about LALR(1).

I need right recursion amusingly exactly for the construct you quoted: lists. My language has right- and left- associative operators. The comma operator is right-associative, in order to act like a prepend 'cons' function. If I parsed it left-associative, the produced AST would be postpending symbols in the list. Yes, I could play games where I invert the associativity afterwards, but I hope you agree this should be done by the parser. The rest of the compiler has been structured to deal with very deep AST constructs, so I would lose arbitrarily deep list literals if I switched to a parser generator with a hard stack limit.

The other feature I am missing is the ability to stamp out copies of a set of very similar rules. My operator parsing rules take 30 lines in lemon. That's fine. However, there are 4 variants of those rules, with only minor differences based on whether I am parsing a type/expr/pattern/body. In the hand-rolled parser, the parser can consult a variable to decide whether or not to enable the special rules based on category. I could probably get all I need by using the C preprocessor to generate the lemon input, but that seems a bit crude.

06:35 Reply: lemon stack limit (artifact: 1817727d0e user: terpstra)

Yes, I know it can be increased, but in my situation, the only limit that would be appropriate is something proportional to the input file size. I will probably seek a different generator, given a few other limitations lemon has, but since I started experimenting with lemon first, I had hoped I could stick with it.

06:11 Post: lemon stack limit (artifact: bf99002e69 user: terpstra)

The lemon parser generator documentation says that it is maintained as part of the SQLlite project. I could not find an issue tracker for lemon, so I am posting here. If there is a more appropriate forum, please redirect me there.

My request: lemon has a fixed YYSTACKDEPTH depth. I have a grammar that uses a fairly deep right recursion. In my application, I'd prefer yystack to be a std::vector<> or equivalent. Is there already an option to remove the stack limit and if not, could it be added?

I could probably add this feature myself, but it's also not clear to me where I would submit a pull request for this project.