SQLite Forum

lemon stack limit
Login

lemon stack limit

(1) By terpstra on 2020-03-16 06:11:35 [link] [source]

The lemon parser generator documentation says that it is maintained as part of the SQLlite project. I could not find an issue tracker for lemon, so I am posting here. If there is a more appropriate forum, please redirect me there.

My request: lemon has a fixed YYSTACKDEPTH depth. I have a grammar that uses a fairly deep right recursion. In my application, I'd prefer yystack to be a std::vector<> or equivalent. Is there already an option to remove the stack limit and if not, could it be added?

I could probably add this feature myself, but it's also not clear to me where I would submit a pull request for this project.

(2) By Stephan Beal (stephan) on 2020-03-16 06:22:52 in reply to 1 [link] [source]

Lemon's stack size can be modified:

https://www.sqlite.org/src/doc/trunk/doc/lemon.html#stack_size

but it is still fixed at compile-time. std::vector is a C++ thing, and lemon is plain C. Making that value runtime-extendible would require also extending lemon to be able to use a client-specified allocator for that purpose. Its current interface is apparently limited to using a client-specified allocator for the parser state and for non-terminals. That's assuming that the stack structure is such that it could be extended at runtime (and i have no idea whether or not that's the case).

(3) By terpstra on 2020-03-16 06:35:46 in reply to 2 [link] [source]

Yes, I know it can be increased, but in my situation, the only limit that would be appropriate is something proportional to the input file size. I will probably seek a different generator, given a few other limitations lemon has, but since I started experimenting with lemon first, I had hoped I could stick with it.

(4) By Richard Hipp (drh) on 2020-03-16 10:57:37 in reply to 1 [link] [source]

If you set -DYYSTACKDEPTH=0 then the size of the stack will grow as needed, using realloc().

Lemon background

I wrote Lemon back around 1990, a full decade before I started writing SQLite. Using Lemon instead of Yacc/Bison has turned out to be a very good choice for SQLite because I have control of the tool, and can thus add new features and optimizations as needed by SQLite. Over the two decades of SQLite development, Lemon has been optimized for use with SQLite. But I have not kept up a rigorous testing program to ensure that features of Lemon that are not used by SQLite actually still work. So, for example, I have not recently tested to ensure that YYSTACKDEPTH=0 is still functional. The code is still there, but it might have been broken at some point along the way.

The parts of Lemon that SQLite uses are very well tested. The other parts, not so much.

Are you sure you really need this?

The size of the stack needs to be proportional to the size of your input, that is often an indication that your grammar needs improvement. LALR(1) parsers want to use right-recursion, not left-recursion. So recursive rules should work like this:

     x ::= a COMMA x.

Not like this:

     x ::= x COMMA a.

If you use right-recursion instead of left recursion, then the stack depth is proportional to the nesting depth, which in most human-generated inputs is quite shallow.

(5) By anonymous on 2020-03-16 11:45:17 in reply to 4 [link] [source]

I believe for LALR(1) parsers you actually want left recursion instead of right recursion. Right recursion will continue to grow the stack(shift) until the recursion ends, then it will perform all the reduction actions(pop the stack). Left recursion will reduce immediately and keep the stack lower. This is consistent to the Lemon documentation.

(6.1) By Richard Hipp (drh) on 2020-03-16 12:11:49 edited from 6.0 in reply to 5 [link] [source]

You are quite correct. I got it backwards. The perils of answering emails before breakfast. :-)

(7) By terpstra on 2020-03-16 15:49:43 in reply to 4 [link] [source]

I just searched lemon.c from the latest sqlite snapshot tarball, and it has realloc nowhere inside a string. Also, the generated code does not have any calls to realloc. Are you sure this option exists?

I totally understand that having your own parser generator helped you. Having exactly the right tool for the job often does. Building your own tool essentially guarantees this property. :)

My current parser is a hand rolled recursive descent parser using precedence climbing. It slowly got larger over time and is now 1700 lines. The parser also mixes desugaring with parsing, which makes the code needlessly complex. I was hoping to refactor this into: proper grammar rules, a front-end AST, a verification stage, and a rewrite stage to desugar the AST. So I went looking for parser generators.

I believe my language is both LL(1) and LALR(1), but since the parser is hand rolled I couldn't verify this property easily. I wrote an initial description in lemon, and found it was indeed conflict free, so I am at least right about LALR(1).

I need right recursion amusingly exactly for the construct you quoted: lists. My language has right- and left- associative operators. The comma operator is right-associative, in order to act like a prepend 'cons' function. If I parsed it left-associative, the produced AST would be postpending symbols in the list. Yes, I could play games where I invert the associativity afterwards, but I hope you agree this should be done by the parser. The rest of the compiler has been structured to deal with very deep AST constructs, so I would lose arbitrarily deep list literals if I switched to a parser generator with a hard stack limit.

The other feature I am missing is the ability to stamp out copies of a set of very similar rules. My operator parsing rules take 30 lines in lemon. That's fine. However, there are 4 variants of those rules, with only minor differences based on whether I am parsing a type/expr/pattern/body. In the hand-rolled parser, the parser can consult a variable to decide whether or not to enable the special rules based on category. I could probably get all I need by using the C preprocessor to generate the lemon input, but that seems a bit crude.

(8) By Richard Hipp (drh) on 2020-03-16 16:03:56 in reply to 7 [link] [source]

I just searched lemon.c from the latest sqlite snapshot tarball, and it has realloc nowhere inside a string.

The parser template is the file lempar.c. The stack reallocation code is here: https://sqlite.org/src/info/e8899b28488f060d?ln=295

(9) By terpstra on 2020-03-16 17:38:54 in reply to 8 [source]

Awesome!

The lemon version installed on my system was 1.39. I just compiled the snapshot version and it indeed produced a realloc capable parser.

Many thanks.

(10) By anonymous on 2020-03-16 19:06:41 in reply to 7 [link] [source]

Your need for right recursion is because of this construct? "3, 2, 1" -> [1,2,3]

list ::= item(A). { single item list AST...}

list ::= item(A) cons(B). { 1. build item AST(A) 2. make A child of B}

cons ::= COMMA item. { build item AST}

cons ::= cons(A) COMMA item. { 1. build item AST(B) 2. make A child of B }

The above pseudo grammar should do what you want it to do while being left recursive. Not obtuse magic and the parser is still doing the work.

(11) By anonymous on 2020-03-20 21:38:58 in reply to 10 [link] [source]

Thank you for the taking the time to reply.

Unfortunately, this technique is not applicable in my case. I need an actual right-deep AST. Your proposal would require me to repeatedly append to this list, which would make my algorithm O(n^2) in this case. Post-processing the AST to flip it from left-deep to right-deep could work, but that would needlessly complicate the situation.

The right solution in my situation is a dynamically sized parser stack. Fortunately, that feature exists.