SQLite Forum

lemon cannot shift-reduce errors
Login

lemon cannot shift-reduce errors

(1) By terpstra on 2021-08-17 18:40:30 [link] [source]

I have been trying to use lemon with the 'error' keyword to enable error recovery, but I've run into something that might be a bug.

TLDR: Are reduction rules allowed to END with 'error'? If not, disregard the rest of this post.

Given a grammar like this:

    start ::= top.
    top ::= .
    top ::= top error.
    top ::= top topdef.
    ... a bunch of legal topdefs

When the generated parser encounters an error, the loop:

        while( yypParser->yytos >= yypParser->yystack
            && (yyact = yy_find_reduce_action(
                        yypParser->yytos->stateno,
                        YYERRORSYMBOL)) > YY_MAX_SHIFTREDUCE
        ){
          /* added for debug by me: */ if (yyact >= YY_MIN_REDUCE) printf("CONSIDERING %d: %s\n", yyact - YY_MIN_REDUCE, yyRuleName[yyact-YY_MIN_REDUCE]);
unwinds the stack until it gets an assertion failure due to popping the end of the stack:
    Return. Stack=[top global export KW_DEF]
    Input 'KW_DATA' in state 21
    Syntax Error!
    foo.wake:133:[5-8]: syntax error; found 'data', but was expecting one of:
        if    \    (    subscribe    prim    match
    Popping KW_DEF
    Popping export
    CONSIDERING 41: export ::=
    Popping global
    CONSIDERING 2: top ::= top error
    Popping top
    CONSIDERING 1: top ::=
    parser: parser.c:2096: void yy_pop_parser_stack(yyParser*): Assertion `pParser->yytos > pParser->yystack' failed.
    Aborted

As you can see in the trace of 'yyact', it DID consider the rule top ::= top error., but the number it had assigned to this rule was in the range of [YY_MIN_REDUCE, YY_MAX_REDUCE) instead of [YY_MIN_SHIFTREDUCE, YY_MAX_SHIFTREDUCE). Thus, the loop popped past the point it could have shift-reduced. This led me to the work-around of not having 'error' as the last term in the error-handling rule, and that seems to have worked, except that now I cannot tolerate an error at the end of the input, which is quite unfortunate, as people often edit files at the bottom.

I've also noticed that when I try to create a minimal grammar reproducing this behavior, the bug(?) goes away and the reduction works. When I use the '-c' option, the bug(?) also goes away and the error reduction works.

I'm not sure where to post the full grammar that causes this issue. I did include it attached to the original email I sent.

(2) By Richard Hipp (drh) on 2021-08-17 19:11:39 in reply to 1 updated by 2.1 [source]

Assertion fault repro case:

> ~~~~
%token_prefix TOKEN_
start ::= top.
top ::= .
top ::= top error.
top ::= top topdef.
topdef ::= ONE.
topdef ::= TWO THREE.
%code {
  #include <stdlib.h>
  #include <string.h>
  int main(int argc, char **argv){
    int i;
    yyParser x;
    ParseInit(&x);
    ParseTrace(stdout, "parser: ");
    for(i=1; i<argc; i++){
      if( strcmp(argv[i],"one")==0 )        Parse(&x, TOKEN_ONE, 0);
      else if( strcmp(argv[i],"two")==0 )   Parse(&x, TOKEN_TWO, 0);
      else if( strcmp(argv[i],"three")==0 ) Parse(&x, TOKEN_THREE, 0);
      else {
        fprintf(stderr, "unknown token: \"%s\"\n", argv[i]);
        exit(1);
      }
    }
    Parse(&x, 0, 0);
    ParseFinalize(&x);
    return 0;
  }
}
~~~~

If the file above is named "test1.y" then compile and run as follows:

>    ./lemon test1.y && gcc test1.c && ./a.out three

(2.1) By Richard Hipp (drh) on 2021-08-17 20:00:21 edited from 2.0 in reply to 1 [link] [source]

Assertion fault repro case:

%token_prefix TOKEN_
start ::= top.
top ::= .
top ::= top error.
top ::= top topdef.
topdef ::= ONE.
topdef ::= TWO THREE.
%code {
  #include <stdlib.h>
  #include <string.h>
  int main(int argc, char **argv){
    int i;
    yyParser x;
    ParseInit(&x);
    ParseTrace(stdout, "parser: ");
    for(i=1; i<argc; i++){
      if( strcmp(argv[i],"one")==0 )        Parse(&x, TOKEN_ONE, 0);
      else if( strcmp(argv[i],"two")==0 )   Parse(&x, TOKEN_TWO, 0);
      else if( strcmp(argv[i],"three")==0 ) Parse(&x, TOKEN_THREE, 0);
      else {
        fprintf(stderr, "unknown token: \"%s\"\n", argv[i]);
        exit(1);
      }
    }
    Parse(&x, 0, 0);
    ParseFinalize(&x);
    return 0;
  }
}

If the file above is named "test1.y" then compile and run as follows:

./lemon test1.y && gcc test1.c && ./a.out three

Update: Fix for the assertion fault is at check-in 7cca80808cef192f

(3) By terpstra on 2021-08-18 05:07:01 in reply to 2.1 [link] [source]

With that fix applied, AFAICT lemon still does not run the error reduction rule..?

(4) By terpstra on 2021-08-18 19:59:33 in reply to 3 [link] [source]

@drh

This seems to fix it for me:

--- ../sqlite/tool/lempar.c	2021-08-17 13:36:38.275642753 -0700
+++ lempar.c	2021-08-18 12:58:41.840112123 -0700
@@ -986,6 +986,10 @@
                         yypParser->yytos->stateno,
                         YYERRORSYMBOL)) > YY_MAX_SHIFTREDUCE
         ){
+          if (yyact >= YY_MIN_REDUCE && yyRuleInfoNRhs[yyact-YY_MIN_REDUCE] < 0) {
+            yyact -= YY_MIN_REDUCE - YY_MIN_SHIFTREDUCE;
+            break;
+          }
           yy_pop_parser_stack(yypParser);
         }
         if( yypParser->yytos <= yypParser->yystack || yymajor==0 ){

(5) By terpstra on 2021-08-20 10:45:13 in reply to 4 [link] [source]

My fix also appears to work on your test case. Running with just the assertion-fix has a 'Fail!' in it:

parser: Input 'THREE' in state 0
parser: Reduce 1 [top ::=] without external action.
parser: ... then shift 'top', go to state 1
parser: Syntax Error!
parser: Popping top
parser: Fail!
parser: Return. Stack=]
parser: Input '$' in state 0
parser: Reduce 1 [top ::=] without external action.
parser: ... then shift 'top', go to state 1
parser: Reduce 0 [start ::= top] without external action, pop back to state 0.
parser: ... then shift 'start', pending reduce -2
parser: Accept!

With both fixes applied:

parser: Input 'THREE' in state 0
parser: Reduce 1 [top ::=] without external action.
parser: ... then shift 'top', go to state 1
parser: Syntax Error!
parser: Shift 'error', pending reduce 2
parser: Reduce 2 [top ::= top error] without external action, pop back to state 0.
parser: ... then shift 'top', go to state 1
parser: Syntax Error!
parser: Discard input token THREE
parser: Return. Stack=[top]
parser: Input '$' in state 1
parser: Reduce 0 [start ::= top] without external action, pop back to state 0.
parser: ... then shift 'start', pending reduce -2
parser: Accept!

(6) By Richard Hipp (drh) on 2021-08-20 19:51:48 in reply to 4 [link] [source]

Similar checked into trunk. Thanks for the suggestion.

(7) By terpstra on 2021-08-26 18:10:43 in reply to 6 updated by 7.1 [link] [source]

Unfortunately, this fix does not work completely.

The bug appears to actually be inside lemon.c, not lempar.c.

When trying to shift an error, you can have three cases:
  (1) it is legal to shift an error
  (2) it is legal to shift-reduce an error
  (3) it is legal to reduce

In my grammar, (3) only ever appears as the default action for a state. I don't know if that's true in general.

The root of the problem IMO is that lemon.c writes yy_action entries for (2) as though they were (3).
lemon.c writes error shift actions (1) correctly. 

The previously applied fix just detects a reduction (3) in the table and replaces it with a shift-reduction (2). Unfortunately, if you actually have case (3), you now incorrectly shift an error and then reduce. This can lead to a mismatch between the current state and the stack, which in the case of my default reduction actions, leads to an infinite loop.

So, the correct fix is probably to revert this last fix and find where lemon.c writes the erroneous entry.

Also, there's this interesting comment in the code:
```
  /* There are no SHIFTREDUCE actions on nonterminals because the table
  ** generator has simplified them to pure REDUCE actions. */
  assert( !(yyact>YY_MAX_SHIFT && yyact<=YY_MAX_SHIFTREDUCE) );
```

... but it is exactly this claim which is false for the case of the 'error' non-terminal.

(7.1) By terpstra on 2021-08-26 18:11:08 edited from 7.0 in reply to 6 [link] [source]

Unfortunately, this fix does not work completely.

The bug appears to actually be inside lemon.c, not lempar.c.

When trying to shift an error, you can have three cases:

  (1) it is legal to shift an error
  (2) it is legal to shift-reduce an error
  (3) it is legal to reduce

In my grammar, (3) only ever appears as the default action for a state. I don't know if that's true in general.

The root of the problem IMO is that lemon.c writes yy_action entries for (2) as though they were (3). lemon.c writes error shift actions (1) correctly.

The previously applied fix just detects a reduction (3) in the table and replaces it with a shift-reduction (2). Unfortunately, if you actually have case (3), you now incorrectly shift an error and then reduce. This can lead to a mismatch between the current state and the stack, which in the case of my default reduction actions, leads to an infinite loop.

So, the correct fix is probably to revert this last fix and find where lemon.c writes the erroneous entry.

Also, there's this interesting comment in the code:

  /* There are no SHIFTREDUCE actions on nonterminals because the table
  ** generator has simplified them to pure REDUCE actions. */
  assert( !(yyact>YY_MAX_SHIFT && yyact<=YY_MAX_SHIFTREDUCE) );

... but it is exactly this claim which is false for the case of the 'error' non-terminal.

(8) By terpstra on 2021-08-26 18:20:23 in reply to 7.1 [link] [source]

Also, I'll note that parser.out shows that lemon.c is aware of the distinction between cases (2) and (3), so the problem is probably strictly in the table generation:

                         KW_IF shift        26     
                         error shift-reduce 68     data_elts ::= error
                          type shift-reduce 69     data_elts ::= type
...
                         KW_IF shift        26     
                         error shift        162    
                       pattern shift        102    

(9) By terpstra on 2021-08-26 18:40:45 in reply to 8 [link] [source]

I've found it!

diff --git a/lemon/lemon.c b/lemon/lemon.c
index 75fc7aa..a35ef0a 100644
--- a/lemon/lemon.c
+++ b/lemon/lemon.c
@@ -3571,7 +3571,7 @@ PRIVATE int compute_action(struct lemon *lemp, struct action *ap)
       /* Since a SHIFT is inherient after a prior REDUCE, convert any
       ** SHIFTREDUCE action with a nonterminal on the LHS into a simple
       ** REDUCE action: */
-      if( ap->sp->index>=lemp->nterminal ){
+      if( ap->sp->index>lemp->nterminal){
         act = lemp->minReduce + ap->x.rp->iRule;
       }else{
         act = lemp->minShiftReduce + ap->x.rp->iRule;

(10) By terpstra on 2021-08-26 18:45:34 in reply to 9 [link] [source]

I should stop posting the moment I discover something. That fix alone is not enough.

The web interface makes me act like I'm writing some sort of journal. I'll stop posting until I have a complete fix.

Sorry for the spam.

(11) By terpstra on 2021-08-26 19:05:01 in reply to 10 [link] [source]

Ok. Here's my tested and proposed change:

diff --git a/tool/lemon.c b/tool/lemon.c
index 75fc7aa2f..1bcb3778a 100644
--- a/tool/lemon.c
+++ b/tool/lemon.c
@@ -3571,7 +3571,7 @@ PRIVATE int compute_action(struct lemon *lemp, struct action *ap)
       /* Since a SHIFT is inherient after a prior REDUCE, convert any
       ** SHIFTREDUCE action with a nonterminal on the LHS into a simple
       ** REDUCE action: */
-      if( ap->sp->index>=lemp->nterminal ){
+      if( ap->sp->index>=lemp->nterminal && ( !lemp->errsym || ap->sp->index!=lemp->errsym->index )){
         act = lemp->minReduce + ap->x.rp->iRule;
       }else{
         act = lemp->minShiftReduce + ap->x.rp->iRule;
diff --git a/tool/lempar.c b/tool/lempar.c
index bbb0cc367..d5ebe6942 100644
--- a/tool/lempar.c
+++ b/tool/lempar.c
@@ -985,10 +985,6 @@ void Parse(
           yyact = yy_find_reduce_action(yypParser->yytos->stateno,
                                         YYERRORSYMBOL);
           if( yyact<=YY_MAX_SHIFTREDUCE ) break;
-          if( yyact>=YY_MIN_REDUCE && yyRuleInfoNRhs[yyact-YY_MIN_REDUCE] ){
-            yyact -= YY_MIN_REDUCE - YY_MIN_SHIFTREDUCE;
-            break;
-          }
           yy_pop_parser_stack(yypParser);
         }
         if( yypParser->yytos <= yypParser->yystack || yymajor==0 ){

This fixes the test1.y example (ie: the parser accepts 'three'). It also does not mess up when presented with a true reduction in my larger grammar.