A segmentation fault in SQLite latest release build

(1) By Wang Ke (krking) on 2021-11-08 08:48:32 [link] [source]

Hello developers,

We found a test case causing a segmentation fault in SQLite latest release build (3.36.0).

The test case is as follows:

CREATE VIRTUAL TABLE rt0 USING rtree(c0,c1,c2 INT);
INSERT INTO rt0(c2) VALUES(0);
SELECT * FROM (SELECT rt0.c2,rt0.c1 FROM rt0 UNION ALL SELECT rt0.c1,rt0.c1 FROM rt0) LEFT JOIN (SELECT * FROM (SELECT (SELECT max(sum(rt0.c1),rank() OVER (ORDER BY ra0.c2)) FROM rt0 AS ra0) AS ca1 FROM rt0) WHERE ca1=0);

which causes:

sqlite3: sqlite3.c: RtreeNode *rtreeNodeOfFirstSearchPoint(RtreeCursor *, int *): Assertion `pCur->bPoint pCur->nPoint' failed.

when compiling with option --enable-debug, and segmentation fault when the option is disabled.

Here is the gdb debugging information:

#0  0x00005648d420ed44 in rtreeNodeOfFirstSearchPoint (
    pCur=pCur@entry=0x5648d5dc6490, pRC=pRC@entry=0x7ffe9ddf6894)
    at sqlite3.c:193706
#1  0x00005648d420efdd in rtreeColumn (cur=0x5648d5dc6490, ctx=0x7ffe9ddf6a60, 
    i=1) at sqlite3.c:193981
#2  0x00005648d41e2fbb in sqlite3VdbeExec (p=0x5648d5dbf550) at sqlite3.c:94169
#3  0x00005648d41e7410 in sqlite3Step (p=0x5648d5dbf550) at sqlite3.c:84861
#4  sqlite3_step (pStmt=0x5648d5dbf550) at sqlite3.c:19382
#5  0x00005648d414797c in exec_prepared_stmt (pArg=0x7ffe9ddf6f10, 
    pStmt=0x5648d5dbf550) at shell.c:14164
#6  0x00005648d414c1c1 in shell_exec (pArg=<optimized out>, 
    zSql=<optimized out>, pzErrMsg=0x7ffe9ddf6d28) at shell.c:14473
#7  0x00005648d414d946 in runOneSqlLine (p=0x7ffe9ddf6f10, 
    zSql=0x5648d5dc8130 "SELECT * FROM (SELECT rt0.c2,rt0.c1 FROM rt0 UNION ALL SELECT rt0.c1,rt0.c1 FROM rt0) LEFT JOIN (SELECT * FROM (SELECT (SELECT max(sum(rt0.c1),rank() OVER (ORDER BY ra0.c2)) FROM rt0 AS ra0) AS ca1 FR"..., in=0x0, 
    startline=3) at shell.c:21449
#8  0x00005648d4158cf4 in process_input (p=0x7ffe9ddf6f10) at shell.c:21549
#9  0x00005648d4133d59 in main (argc=<optimized out>, argv=0x7ffe9ddf8238)
    at shell.c:22350

Bisecting shows the problem may be related to check-in c510377b0b.

Hope the problem will be handled properly :)

(2) By Richard Hipp (drh) on 2021-11-08 14:52:02 in reply to 1 [source]

Thanks for the bug report.

The crash is due to a NULL pointer dereference in the byte-code engine caused by incorrect byte-code. The incorrect byte-code results for a fault in the code generator.

Each table or subquery in a complex SELECT statement is assigned a cursor number. The name resolution logic for aggregate functions depends on the fact that cursor numbers for subqueries are always greater than cursor numbers in outer queries. But that assumption was violated by a new UNION ALL optimization that as added on 2020-12-19. The query in question invokes that optimization, causing cursor numbers to be misordered, resulting in incorrect byte-code, and ultimately the NULL pointer dereference.

A simplified query is this:

SELECT * FROM (
  SELECT 1 FROM rt0 AS q3
  UNION ALL
  SELECT 2 FROM rt0 AS q4
) LEFT JOIN (
  SELECT * 
    FROM (
           SELECT (SELECT sum(q2.c1) + rank() OVER () FROM rt0 AS q1) AS ca1
             FROM rt0 AS q2
         ) AS q5
   WHERE q5.ca1=0
);

The UNION ALL optimization transforms this query into the following:

SELECT 1, q7.* 
  FROM rt0 AS q3
  LEFT JOIN (
    SELECT * 
      FROM (
             SELECT (SELECT sum(q2.c1) + rank() OVER () FROM rt0 AS q1a) AS ca1
               FROM rt0 AS q2a
           ) AS q5a
     WHERE q5a.ca1=0
  ) AS q7
UNION ALL
SELECT 2, q8.*
  FROM rt0 AS q4
  LEFT JOIN (
    SELECT * 
      FROM (
             SELECT (SELECT sum(q2.c1) + rank() OVER () FROM rt0 AS q1b) AS ca1
               FROM rt0 AS q2b
           ) AS q5b
     WHERE q5b.ca1=0
  ) q8;

If you enter the second optimized SQL directly, it works (because all of the cursor numbers are well ordered). But when the query optimizer makes that translation, it ends up with the cursor number for q2b being less than the cursor number for q5b.

Multiple things need to be fixed here:

The UNION ALL optimization needs to be fixed so that it yields a parse tree where the cursor numbers of all subquerys greater than than the cursor number of any outer query.
New assert() statements need to be added to check every parse tree transformation and assure that the cursor numbers are always well-ordered.

I'm working on these changes now.

Everything in this post is the result of a preliminary analysis and is subject to change or correction in follow-up posts.

(3) By Richard Hipp (drh) on 2021-11-08 23:27:00 in reply to 2 [link] [source]

Plans did change.

The end solution (seen at check-in 74aec5dd1df95b56) was to refactor the logic that figures out which query in a cascade of nested and correlated queries a particular aggregate function belongs too. The new logic does not depend on subquery cursor numbers being greater than outer query cursor numbers. That constraint is no longer on the parse tree, and so no new assert() logic to check is was required.

Should be working now.

(4) By Wang Ke (krking) on 2021-11-09 02:04:55 in reply to 3 [link] [source]

Great job done!

Compared to guaranteeing the order of query cursor numbers and subquery cursor numbers, this is indeed a bit more targeted and have potentially less code modification, although this is already quite a lot of work.

I 'm trying to fuzz with testcases similar to this one and no further problems occurred yet.

Thank you for your explanation and hard work.