"parser stack overflow" limitation in not super complex query

(1) By vorimi on 2023-11-14 21:51:29 [link] [source]

Hello devs,

I have discovered the following SQLite limitation:

Currently, SQLite fails with "parser stack overflow" error.

I tested SQLite v3.40 and the latest v3.44.

I have figured out it is because of YYSTACKDEPTH limit set (by default) to 100.

The query in the repro is obviously generated but not super complex and no other DB vendors impose such limitation. I have tested PostgreSQL, MySQL, MSSQL and Oracle.

The deepest expression depth in the repro query is 32 which is much lower than SQLITE_MAX_EXPR_DEPTH set (by default) to 1000 and even less than the YYSTACKDEPTH/100.

I have no in-depth SQLite/Lemon knowledge but I belive the SQLite grammar/parser could be improved to parse at least real YYSTACKDEPTH (100) expression depth. Probably also some query parts, like subqueries, can be reasoned to the only grammar candidate, and parse such parts iteratively instead of recursively.

I would be grateful if the SQLite limitation can be lifted.

Michael

(2) By TP (tp9237842738) on 2024-01-25 17:13:38 in reply to 1 [link] [source]

Hi Michael,

Did you find a workaround or is your query working now? I have a quite similar problem with this query: https://pastebin.com/6Tz5vHNd

Regards,

(4) By Richard Hipp (drh) on 2024-01-25 18:01:40 in reply to 2 [link] [source]

And here is the second pastebin query:

SELECT *
FROM   (SELECT *
        FROM   (SELECT *
                FROM   (SELECT *
                        FROM   (SELECT *
                                FROM   (SELECT *
                                        FROM   (SELECT *
                                                FROM   (SELECT *
                                                        FROM   (SELECT *
                                                                FROM   (SELECT *
                                                                        FROM   (
                                                               SELECT
                                                                       *
                                                               FROM
                                                                       (SELECT
                                                                               *
                                                                       FROM
                                                                               (
                                                               SELECT
                                                                       *
                                                               FROM
                                                                       (SELECT
                                                                               *
                                                                       FROM
                                                                               (
                                                               SELECT
                                                                       *
                                                               FROM
                                                                       (SELECT
                                                                               *
                                                                       FROM
                                                                               (
                                                               SELECT
                                                                       1 AS x))
                                                                               )
                                                               )))))))
                                               ))))))

(3) By Richard Hipp (drh) on 2024-01-25 18:00:36 in reply to 1 [link] [source]

Here is a reproduction of the query from Pastebin:

select
  count(*)
from
  `user`
where
  (
    (
      select
        exists (
          select
            *
          from
            `ticket` `_T_c1c694bd849d`
          where
            (
              `user` = `user`.`id`
              and (
                select
                  exists (
                    select
                      *
                    from
                      `user` `_T_u_d09419503c1c`
                    where
                      (
                        `id` = `_T_c1c694bd849d`.`user`
                        and (
                          select
                            exists (
                              select
                                *
                              from
                                `country` `_T_u_c_a91b1c284cd4`
                              where
                                (
                                  `id` = `_T_u_d09419503c1c`.`country_id`
                                  and (
                                    select
                                      exists (
                                        select
                                          *
                                        from
                                          `user` `_T_u_c_U_06e2ba85c546`
                                        where
                                          (
                                            `country_id` = `_T_u_c_a91b1c284cd4`.`id`
                                            and (
                                              select
                                                exists (
                                                  select
                                                    *
                                                  from
                                                    `country` `_T_u_c_U_c_f53146a9f663`
                                                  where
                                                    (
                                                      `id` = `_T_u_c_U_06e2ba85c546`.`country_id`
                                                      and (
                                                        select
                                                          exists (
                                                            select
                                                              *
                                                            from
                                                              `user` `_T_u_c_U_c_U_0cfa13a09292`
                                                            where
                                                              (
                                                                `country_id` = `_T_u_c_U_c_f53146a9f663`.`id`
                                                                and `name` is not null
                                                              )
                                                          )
                                                      ) = 1
                                                    )
                                                )
                                            ) = 1
                                          )
                                      )
                                  ) = 1
                                )
                            )
                        ) = 1
                      )
                  )
              ) = 1
            )
        )
    ) = 1
  );

(5.1) By Richard Hipp (drh) on 2024-01-26 11:33:09 edited from 5.0 in reply to 1 [link] [source]

Answer to both reports: You can recompile using -DYYSTACKDEPTH=10000 or whatever other number you think is appropriate.

(6) By anonymous on 2024-01-25 19:16:14 in reply to 5.0 [link] [source]

Hm... I count a depth of 13 in the first example and 17 in the second and thrid example. Now the recommendation is to increase the max_Stack_depth from 1.000 to 10.000. Can anyone help with my confusion?

(7.1) By Richard Hipp (drh) on 2024-01-26 11:33:24 edited from 7.0 in reply to 6 [link] [source]

The YYSTACKDEPTH specifies the maximum depth of the stack on the push down automaton that parses the input text, token by token. That is not the same thing as the nesting depth of the SELECT statements. Each SELECT statement requires multiple stack levels in the push down automaton.

(10) By vorimi on 2024-01-26 11:52:31 in reply to 7.1 [link] [source]

Hi Richard,

I am a researcher, lead developer of atk4/data ORM and main contributor of the most popular PHP ORM - Doctrine.

The purpose of this post is not to (try to) reduce the SQL complexity in the example. I agree almost any SQL can be rewritten, but not always easily (as generated with **encapsulation design pattern** by code) and IMO it is actually the job of the SQL engine do handle (and possibly optimize) the (valid) SQL input.

The purpose of this post is to discuss the current limitation of Sqlite.

In past, I have opened this topic in github.com/php/php-src/issues/10135 .

There and here I was given the answer to raise YYSTACKDEPTH. This is easily said, but harder to do, as the default Sqlite package on Debian/Ubuntu/in general has the default YYSTACKDEPTH value set to 100. So currently, even in x64 environments where stack depth of many thousands is supported, this limitation is impossible to overcome without having to recompile Sqlite with custom YYSTACKDEPTH.

So as outlined in the initial post I would be grateful if this could be addressed in the parser either by parsing the subqueries iteratively (preferrably) or allowing the limit to be raised at runtime (at least from C like other Sqlite configuration options).

(11) By Richard Hipp (drh) on 2024-01-26 12:40:48 in reply to 10 [link] [source]

There are space and performance tradeoffs involved.

If you compile with -DYYSTACKDEPTH=0, then space for the parser stack is dynamically allocated. This gives you essentially unlimited depth, but comes with a 1.5% performance loss. 1.5% does not sound like much, but when you multiply that by multiple billions of devices on which SQLite runs, it becomes a bigger deal.

The parser stack is (by default) an automatic variable in an internal subroutine of SQLite (specifically the sqlite3RunParser() subroutine). In other words, the parser stack is allocated on the process stack. Each parser-stack level uses 24 bytes (on 64-bit machines). So 2400 bytes are needed for a 100-entry parser stack. If we increase the parser stack depth to 1000, that increases the process stack requirement by 21600 bytes. Many embedded devices running SQLite have very limited memory, and I think increasing the default stack requirements by 22KB will provoke loud complaints, and likely also some breakage.

If I undefine the sqlite3Parser_ENGINEALWAYSONSTACK macro such that the yyParse object in sqlite3RunParser() routine still has a fixed size but so that the whole thing is dynamically allocated, allowing the parser stack to be a run-time parameter, then performance drops by 3.4% overall.

So, yes, we can easily change the SQLite parser stack design to accommodate oversize SQL statements generated ORMs. But doing so has negative performance and power impact on literally billions of smartphones and other embedded devices. Is that really a good tradeoff, do you think?

Perhaps it would be better to lobby Debian/Ubuntu maintainers to include the -DYYSTACKDEPTH=1000 compile-time option on their builds of the system SQLite library? It seems to me that the extra memory associated with a large fixed parser stack is much less of a concern on Linux systems than it is on the embedded devices where SQLite is commonly used. Or, perhaps you could configure your ORMs so that they statically link against their own private version of SQLite, that is compiled more to your liking?

(12) By ddevienne on 2024-01-26 13:34:14 in reply to 11 [link] [source]

What about an hybrid approach, where the (lower) part of the (parser) stack is (process) stack-allocated,
but the other (upper) part is dynamically allocated, for the subset of queries that need it?

Should remain faster for most queries, with less overhead than fully heap-allocated, no?
While remaining more robust against parser-stack-overflow?

Of course stack use now needs a conditional, to use the upper or lower part / memory.

Just thinking out loud. --DD

PS: Then the upper-part max can be runtime configurable, while the heap-threshold remains compile-time

(13) By Richard Hipp (drh) on 2024-01-26 13:45:39 in reply to 12 [link] [source]

I was working on that before you posted :-)

There will be performance impact. The SQL parser is highly optimized. It is unreasonable to expect that any dynamic stack-sizing changes will not make the overall performance slower. The only question is how much slower.

(14) By Richard Hipp (drh) on 2024-01-26 14:22:17 in reply to 13 [link] [source]

Experiments are showing that by doing the initial parser stack allocation from the process-stack, as is done now, but then expanding the parser stack using malloc() if and only if the initial process-stack allocation overflow, we get about a 0.13% slowdown (due apparently to the extra level of indirection needed to locate the parser stack array). And the size of the library machine code grows by 301 bytes (x64 with -Os). This is getting closer to being feasible, I suppose.

(8) By TP (tp9237842738) on 2024-01-26 10:44:21 in reply to 5.0 [link] [source]

Do you mean YYSTACKDEPTH, @Richard?

We were able to solve our problem setting YYSTACKDEPTH.

(9) By Richard Hipp (drh) on 2024-01-26 11:33:47 in reply to 8 [link] [source]

Yes, you are correct. Prior posts have been revised.

(15) By Richard Hipp (drh) on 2024-01-27 11:47:37 in reply to 1 [link] [source]

As of check-in d87a2054774aa6ce (2024-01-27) the parser never runs out of stack space. If the parser stack fills up, more space is allocated from the heap. These changes should appear in the 3.46.0 release.

These changes were executed with no loss in performance and with less than 300 bytes of extra machine code (on x64 compiled with -Os).

(16) By vorimi on 2024-01-27 18:50:19 in reply to 15 [source]

Thank you, Richard, you are a hero!

Looking at the changes, sqlite.org/src/file?ci=trunk&name=test/misc5.test&ln=574 test comment should be probably updated or deleted completely.

Also, it seems there is no other test, it would be good to modify it to test larger i sqlite.org/src/file?ci=trunk&name=test/misc5.test&ln=580 to test multiple reallocations/grows of the stack.

Also, it would be good to assert the result with simple query like SELECT * FROM t1.