Why LEMON uses precedence to hide/omit reduce/reduce conflicts ?
(1) By Domingo (mingodad) on 2023-11-17 13:48:25 [source]
While converting this grammar https://github.com/diku-dk/futhark/blob/master/src/Language/Futhark/Parser/Parser.y to be used in https://mingodad.github.io/parsertl-playground/playground/ I found that bison/byacc/kmyacc reports 35 reduce/reduce conflicts but lemon reports none. The same happens with the sqlite3 grammar where lemon reports 0 conflicts but bison/byacc/kmyacc reports 52 reduce/reduce conflicts. Looking through the code I can see that LEMON uses precedence like it does with shift/reduce conflicts but bison/byacc/kmyacc doesn't. ======== }else if( apx->type==REDUCE && apy->type==REDUCE ){ spx = apx->x.rp->precsym; spy = apy->x.rp->precsym; if( spx==0 || spy==0 || spx->prec<0 || spy->prec<0 || spx->prec==spy->prec ){ apy->type = RRCONFLICT; errcnt++; }else if( spx->prec>spy->prec ){ apy->type = RD_RESOLVED; }else if( spx->prec<spy->prec ){ apx->type = RD_RESOLVED; } }else{ ======== What's the rationality behind this decision ? People using LEMON should be aware of this behavior because it masks serious grammar conflicts. Cheers !
(2) By Richard Hipp (drh) on 2023-11-17 13:59:34 in reply to 1 [link] [source]
Why shouldn't Lemon use precedence to resolve reduce/reduce conflicts? Seems like a feature to me. Maybe a better question is why Bison/YACC doesn't do this.
(3) By Domingo (mingodad) on 2023-11-17 14:04:26 in reply to 2 [link] [source]
Because we can be using precedence/associativity to resolve shift/reduce conflicts and by accident also omitting/hiding reduce/reduce conflicts in other parts of the grammar without knowing it.
(4) By Larry Brasfield (larrybr) on 2023-11-17 14:06:29 in reply to 1 [link] [source]
The last paragraph in section "Precedence Rules" here mentions this behavior without rationalizing it.
Saying that this implicit but documented precedence "masks serious grammar conflicts" is a bit of a stretch. I agree that it is good for ambiguity in the grammar to be made visible, but with this resolution known and relied upon, it is not an ambiguity. So "serious grammar conflict" is a misdiagnosis.
(5) By Richard Hipp (drh) on 2023-11-17 14:08:02 in reply to 3 [link] [source]
I'm sorry you don't like the feature. I like it. The code is open source. You are welcomed to fork it and make it more restrictive.
(6) By Domingo (mingodad) on 2023-11-17 14:31:34 in reply to 5 [link] [source]
Sorry if I was not clear ! I do understand that LEMON is open source and I can make with it anything I want, thank you for that ! I'm just trying to understand the rationality behind it. For example here is what bison reports: ======== bison -v -Wcounterexamples sqlite3.g.y sqlite3.g.y: warning: 52 reduce/reduce conflicts [-Wconflicts-rr] sqlite3.g.y: warning: reduce/reduce conflict on tokens SEMI, BEGIN, END, RP, AS, COMMA, ASC, DESC, ID, DO, KEY, OFFSET, ROWS, NULLS, FOLLOWING, PRECEDING, RANGE, GROUPS, STRING, JOIN_KW, AUTOINCR, UNION, EXCEPT, INTERSECT, FROM, JOIN, ORDER, GROUP, HAVING, LIMIT, WHERE, RETURNING, WHEN, THEN, ELSE, WINDOW, OR, AND, NOT, IS, MATCH, LIKE_KW, BETWEEN, IN, ISNULL, NOTNULL, NE, EQ, ESCAPE, ON [-Wcounterexamples] Example: expr IS NOT expr • First reduce derivation expr ↳ 295: expr IS NOT expr • Second reduce derivation expr ↳ 294: expr IS expr ↳ 298: NOT expr • sqlite3.g.y: warning: reduce/reduce conflict on tokens OR, AND [-Wcounterexamples] Example: expr between_op expr AND expr • AND expr First reduce derivation expr ↳ 303: expr between_op expr AND expr ↳ 271: expr AND expr • Second reduce derivation expr ↳ 271: expr AND expr ↳ 303: expr between_op expr AND expr • ====== And with a modified LEMON with an option to omit precedence resolution for reduce/reduce conflicts I'm getting this with sqlite3 grammar: ====== State 131: expr ::= expr * COLLATE ID|STRING ... expr ::= expr * in_op nm dbnm paren_exprlist OR reduce 195 expr ::= expr AND expr OR reduce 218 (expr) ** Parsing conflict ** AND reduce 195 expr ::= expr AND expr AND reduce 218 (expr) ** Parsing conflict ** ... State 135: expr ::= expr * COLLATE ID|STRING ... expr ::= expr * in_op nm dbnm paren_exprlist SEMI reduce 212 (expr) ** Parsing conflict ** BEGIN reduce 212 (expr) ** Parsing conflict ** END reduce 212 (expr) ** Parsing conflict ** NOT reduce 212 (expr) ** Parsing conflict ** RP reduce 212 (expr) ** Parsing conflict ** AS reduce 212 (expr) ** Parsing conflict ** COMMA reduce 212 (expr) ** Parsing conflict ** ASC reduce 212 (expr) ** Parsing conflict ** DESC reduce 212 (expr) ** Parsing conflict ** OR reduce 212 (expr) ** Parsing conflict ** AND reduce 212 (expr) ** Parsing conflict ** IS reduce 212 (expr) ** Parsing conflict ** MATCH reduce 212 (expr) ** Parsing conflict ** LIKE_KW reduce 212 (expr) ** Parsing conflict ** BETWEEN reduce 212 (expr) ** Parsing conflict ** IN reduce 212 (expr) ** Parsing conflict ** ISNULL reduce 212 (expr) ** Parsing conflict ** NOTNULL reduce 212 (expr) ** Parsing conflict ** NE reduce 212 (expr) ** Parsing conflict ** EQ reduce 212 (expr) ** Parsing conflict ** ... ESCAPE reduce 212 (expr) ** Parsing conflict ** ID reduce 212 (expr) ** Parsing conflict ** DO reduce 212 (expr) ** Parsing conflict ** KEY reduce 212 (expr) ** Parsing conflict ** OFFSET reduce 212 (expr) ** Parsing conflict ** ROWS reduce 212 (expr) ** Parsing conflict ** NULLS reduce 212 (expr) ** Parsing conflict ** FOLLOWING reduce 212 (expr) ** Parsing conflict ** PRECEDING reduce 212 (expr) ** Parsing conflict ** RANGE reduce 212 (expr) ** Parsing conflict ** GROUPS reduce 212 (expr) ** Parsing conflict ** ... ON reduce 212 (expr) ** Parsing conflict ** STRING reduce 212 (expr) ** Parsing conflict ** JOIN_KW reduce 212 (expr) ** Parsing conflict ** AUTOINCR reduce 212 (expr) ** Parsing conflict ** UNION reduce 212 (expr) ** Parsing conflict ** EXCEPT reduce 212 (expr) ** Parsing conflict ** INTERSECT reduce 212 (expr) ** Parsing conflict ** FROM reduce 212 (expr) ** Parsing conflict ** JOIN reduce 212 (expr) ** Parsing conflict ** ORDER reduce 212 (expr) ** Parsing conflict ** GROUP reduce 212 (expr) ** Parsing conflict ** HAVING reduce 212 (expr) ** Parsing conflict ** LIMIT reduce 212 (expr) ** Parsing conflict ** WHERE reduce 212 (expr) ** Parsing conflict ** RETURNING reduce 212 (expr) ** Parsing conflict ** WHEN reduce 212 (expr) ** Parsing conflict ** THEN reduce 212 (expr) ** Parsing conflict ** ELSE reduce 212 (expr) ** Parsing conflict ** WINDOW reduce 212 (expr) ** Parsing conflict ** ... ======= Again thank you for your great work and dedication !
(7) By Richard Hipp (drh) on 2023-11-17 14:45:09 in reply to 6 [link] [source]
Yes, the SQLite grammar requires the ability to resolve reduce/reduce conflicts using precedence.
Yes, the developers of Bison believe that resolving reduce/reduce conflicts using precedence is either a bad idea or else is not worth their effort to implement or perhaps they just never thought of doing that. Lemon does a lot of things that Bison cannot do, which is an important reason why SQLite uses Lemon and not Bison.
Yes, the SQLite grammar as currently written will not compile without the ability to resolve reduce/reduce conflicts using precedence rules.
Yes, that means that you cannot easily adapt the SQLite grammar for use with Bison. You'll have to rewrite the grammar so that it does not contain reduce/reduce conflicts. That will not be simple.
The rationale for allowing Lemon to resolve reduce/reduce conflicts using precedence is because it allows the SQLite grammar to generate a very fast LALR(1) parser that is easier to understand and maintain than it would be if we were forced to manually resolve those conflicts.
(8.3) By Rico Mariani (rmariani) on 2023-11-19 23:25:26 edited from 8.2 in reply to 7 [link] [source]
Domingo asked me to comment but I'm not sure he's gonna like what I have to say. :)
Reduce/reduce conflicts have to be resolved because you don't want have a tool that creates a parser with an ambiguous parse. Everyone agrees on that. However, it's entirely normal to use precedence to economically and unambiguously resolve conflicts. Even the earliest versions of yacc had that these features. [it's become clear to me that it's only shift/reduce that yacc targetted and this is basically arbitrary]
I suppose it's possible to write impenetrable precedence rules but that isn't usually how this works out.
When I did the SQL grammar for my CQL I ran into the same "Between x and y" ambiguity as everyone else. I chose to resolve that one with a subset of expressions that may appear in a between expression. That is I resolved it with grammar instead of precedence. There is no law that says that's the once choice I just thought it was the most economical in my context.
For those who aren't familiar the ambiguity is this:
A between B and C and D
could mean A between (B and C) and D
or A between B and (C and D)
or, and this one is interesting, (A between B and C) and D
This isn't very different from
A + B * C
could mean (A + B) * C
or A + (B * C)
. Yacc doesn't know or care about normal conventions, someone has to tell it.
Precedence and desired associativity clear these up very conveniently in many cases, these are not tools to be shunned.
While it's always possible to choose more productions (e.g. creating non-terminals for factors, terms, logical ops, etc.) this is rarely yields the best economy or clarity.
Additionally an ambiguous reduction might become unambiguous with more lookahead. That doesn't apply here, but different engine choices lead to different issues. A more powerful parser (e.g. LRK(n)) might allow more expressiveness. Lemon offers many useful lexical and grammatical features that Bison lacks -- these serve to keep the grammar simpler and hence clearer for users and easier to maintain for developers. CQL's bison grammar for the language does parse very quickly, because it's only LALR(1), but it suffers from unfortunate rule duplication to win that speed. To say nothing of the compromises I had to make in the lexer dealing with multi-word tokens.
Anyway all of this to say, nobody in the languages biz shuns precedence as a tool. Every interesting grammar tool I've used these 40 years has it and every language developer I know uses it. While it is provably true that you can do without such features, you don't get better code or better clarity for the effort. You are certainly not more likely to have reductions happening in an order other than what you intended because of precedence.
What Domingo is trying to do here, namely port a number of popular grammars to a common infra, is very cool, but it's super hard. Grammars almost always take strong dependence on their toolchain. It could hardly be otherwise. Moving away from the toolchain leaves you having to recreate its benefits in some other kind of way. No easy task.
- I think almost all of my examples were shift/reduce rather than reduce/reduce which isn't helping sorry about that.