Bug report: Complex CTE generates segmentation fault depending on the order of joined tables in its body

(1) By mzm2021 on 2021-03-18 10:50:03 [link] [source]

The complex CTE below generates a segmentation fault depending on the order of the joined tables in its body.

The problem was spotted on 3.35.2 and 3.35.1. It does not affect 3.34.1.

I used the SQLite amalgamation compiled with:

gcc shell.c sqlite3.c -lpthread -ldl -lm -g

On 3.35.2, here is the GDB session log with the backtrace:

(gdb) r < bug-cte-materialize.sql 
Starting program: /home/mm/sqlite-test/sqlite-amalgamation-3350200/a.out < bug-cte-materialize.sql
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x000000000044eeff in sqlite3BtreeCursor (p=0x0, iTable=1, wrFlag=4, 
    pKeyInfo=0x0, pCur=0x767a28) at sqlite3.c:69440
69440	  if( p->sharable ){
(gdb) bt
#0  0x000000000044eeff in sqlite3BtreeCursor (p=0x0, iTable=1, wrFlag=4, 
    pKeyInfo=0x0, pCur=0x767a28) at sqlite3.c:69440
#1  0x000000000046c3d2 in sqlite3VdbeExec (p=0x742778) at sqlite3.c:89843
#2  0x0000000000464b5b in sqlite3Step (p=0x742778) at sqlite3.c:84321
#3  0x0000000000464dad in sqlite3_step (pStmt=0x742778) at sqlite3.c:84378
#4  0x0000000000416ec3 in exec_prepared_stmt (pArg=0x7fffffffcb40, 
    pStmt=0x742778) at shell.c:13374
#5  0x0000000000417b73 in shell_exec (pArg=0x7fffffffcb40, 
    zSql=0x7264c0 "WITH\n\tcst(rsx, rsy) AS  (\n\t\tSELECT 100, 100\n\t),\n\n\tcst2(minx, maxx, stepx, miny, maxy, stepy, qualitativex, qualitativey) AS (\t\t\n\t\tSELECT NULL, NULL, NULL, NULL, NULL, NULL, 0, 0\n\t),\n\n\tds0(m, n, x, y, "..., pzErrMsg=0x7fffffffc9c8) at shell.c:13683
#6  0x000000000042647e in runOneSqlLine (p=0x7fffffffcb40, 
    zSql=0x7264c0 "WITH\n\tcst(rsx, rsy) AS  (\n\t\tSELECT 100, 100\n\t),\n\n\tcst2(minx, maxx, stepx, miny, maxy, stepy, qualitativex, qualitativey) AS (\t\t\n\t\tSELECT NULL, NULL, NULL, NULL, NULL, NULL, 0, 0\n\t),\n\n\tds0(m, n, x, y, "..., in=0x7ffff76ac640 <_IO_2_1_stdin_>, startline=1) at shell.c:20613
#7  0x0000000000426a5f in process_input (p=0x7fffffffcb40) at shell.c:20727
#8  0x0000000000428560 in main (argc=1, argv=0x7fffffffde38) at shell.c:21522

Obviously, the problem is caused by sqlite3BtreeCursor() being called with a NULL pointer as its first parameter.

The CTE query below is already a simplified version of an original (and larger one) that started misbehaving after upgrading to 3.35.

This bug and a previous one (reported a few minutes ago) were discovered when trying to debug issues with the original query.

WITH
	cst(rsx, rsy) AS  (
		SELECT 100, 100
	),

	cst2(minx, maxx, stepx, miny, maxy, stepy, qualitativex, qualitativey) AS (		
		SELECT NULL, NULL, NULL, NULL, NULL, NULL, 0, 0
	),

	ds0(m, n, x, y, x2, y2, title, size, mark, label, markmode) AS (
		SELECT 1, 2, 3, 4, 5, 6, 7 , 8, 9, 10, 11
	)
	,
	
	ds(m, n, x, y, x2, y2, title, size, mark, label, markmode) AS (
		SELECT m, n, x,
			y, x2,
			y2,
			title, size, mark, label, markmode
		FROM ds0
		WINDOW w AS (PARTITION BY m, x ORDER BY n)
	),

	d(m, n, x, y, x2, y2, labelx, labely, title, size, mark, label, markmode) AS (
		SELECT m, n, x, y,	x2, y2, x, y, title, size, mark, label, markmode
		FROM ds, cst2
	),

	ylabels(y, label) AS (
		SELECT y, MIN(labely) FROM d GROUP BY y
	),

	yaxis(maxy, miny, stepy , minstepy) AS (
		WITH
			xt0(minx, maxx) AS (
				SELECT  coalesce(miny, min(min(y2), min(y))),  coalesce(maxy, max(max(y2), max(y))) + qualitativey  FROM d, cst2
			),
			xt1(mx, mn) AS (SELECT maxx, minx FROM xt0),
			xt2(mx, mn, step) AS (SELECT mx, mn, (mx-mn)  FROM xt1),
			
			xt3(mx, mn, ms) AS (
				SELECT mx, mn, first_value(rs) OVER (order by x desc) AS ms FROM (SELECT mx, mn, step, f,(mx-mn) as rng, 1.0*step/f as rs, 1.0*(mx-mn)/(step/f) AS x FROM xt2, (SELECT 1 AS f UNION ALL SELECT 2 UNION ALL SELECT 4 UNION ALL SELECT 5)) AS src WHERE x < 10 limit 1),
			xt4(minstepy) AS (
				SELECT MIN(abs(y2-y)) FROM d WHERE y2 != y
			)
		SELECT (mx/ms)*ms, (mn/ms)*ms, coalesce(stepy, ms), coalesce(minstepy, ms, stepy)  FROM xt3, cst2,xt4
	),

	distinct_mark_n_m(mark, ze, zem, title) AS (
		SELECT DISTINCT mark, n AS ze, m AS zem, title FROM ds0
	),

	facet0(m, mi, title, radial) AS (
		SELECT md, row_number() OVER () - 1, title, 'radial' IN (SELECT mark FROM distinct_mark_n_m WHERE zem = md)
		FROM (SELECT DISTINCT zem AS md, title AS title FROM distinct_mark_n_m ORDER BY 2, 1)
	),

	facet(m, mi, xorigin, yorigin, title, radial) AS (
		SELECT m, mi,
			rsx * 1.2 * IFNULL(CASE WHEN (
				0
			) > 0 THEN mi / (
				0
			) ELSE mi % (
				2
			)  END, mi),
			rsy  * 1.2 * IFNULL(CASE WHEN (
				2
			) > 0 THEN mi / (
				2
			) ELSE mi / (
				0
			)  END, 0),
			title, radial FROM facet0, cst
	),

	radygrid(m, mi, tty, wty, ttx, ttx2, xorigin, yorigin) AS (
		SELECT m, mi,  rsy / 2 / ((maxy-miny)/stepy) * (value-1) AS tty, coalesce(NULL, miny + stepy * (value-1)) AS wty, xorigin, xorigin+rsx, xorigin + rsx / 2, yorigin + rsy / 2  FROM generate_series(1), yaxis, cst, facet LEFT JOIN ylabels ON ylabels.y = (miny + (value-1) * stepy) WHERE radial AND stop = 1+1.0*(maxy-miny)/stepy
	),
	
	ypos(m, mi, pcx, pcy, radial) AS (
		SELECT m, mi, xorigin, yorigin + CASE
			WHEN 0 BETWEEN miny AND maxy THEN
				rsy - (0 - miny) * rsy / (maxy-miny)
			WHEN 0 >= maxy THEN 0
			ELSE  rsy
		END, radial FROM yaxis, cst, facet WHERE NOT radial
		UNION ALL
		SELECT m, mi, xorigin + rsx / 2, yorigin + (CASE
			WHEN 0 BETWEEN miny AND maxy THEN
				rsy - (0 - miny) * rsy / 2 / (maxy-miny)
			WHEN 0 >= maxy THEN 0
			ELSE  rsy
		END ) / 2, radial FROM yaxis, cst, facet WHERE radial
	)

-- Segmentation fault
SELECT * FROM radygrid , ypos

-- Inverting the tables order generates no fault
-- SELECT * FROM  ypos, radygrid

(2) By Richard Hipp (drh) on 2021-03-18 16:58:03 in reply to 1 [link] [source]

Thanks for the report.

Ticket: <https://www.sqlite.org/src/info/bb8a9fd4a9b7fce5>

Fixed at: <https://www.sqlite.org/src/info/bcbe5308f3a3b94f> (context <https://www.sqlite.org/src/timeline?c=bcbe5308f3a3b94f&y=a>)

There is a new amalgamation tarball at <https://www.sqlite.org/tmp/sqlite-snapshot-202103181652.tar.gz>. Please download and try this new build and report back whether or not it works with your system. Thanks.

Patch release 3.35.3 will be forthcoming.

(3) By mzm2021 on 2021-03-19 08:20:13 in reply to 2 [link] [source]

Thank you for your reply. I confirm that this particular bug is gone with the new tarball.

I even tested it with the original large CTE request and it worked correctly.

Still, there is a regression in a test case involving the same CTE but with the wholenumber virtual table instead of generate_series.

In that test case, the query enters into an infinite loop as the limits on wholenumber generated values inside some parts of the CTE seem to not be respected. It is as if the WHERE clause is completely ignored in some chunks of the CTE which look like this:

SELECT value FROM integers WHERE value < 10

I am still investigating this particular issue as it looks to be related to shared cursors in materialized views (according to my quick understanding of the new SQLite code) and to query flattening which is not working as before.

I am also checking the source code of both virtual tables to see if there are any differences in cursor handling that may explain this behavior.

I'll keep you updated about my findings.

(4) By mzm2021 on 2021-03-19 09:03:22 in reply to 2 [source]

I should probably file another bug in another thread. But since the test case is almost the same, I prefer continuing here.

As explained in my previous reply, I confirm that replacing generate_series by a wholenumber derived virtual table in the above CTE makes the query enter in an infinite loop. As the first bug, the query used to work on 3.34.1 but not anymore on any 3.35.x release including yesterday's test tarball where you have fixed the segmentation fault.

Below is a simple variation on the above CTE with an integers table derived from wholenumber.

The wholenumber extension is compiled using:

gcc -g -fPIC -shared wholenumber.c -o wholenumber.so

Then SQLite shell is invoked with the following input (Note that only the radygrid sub-table defintion has been updated to use integers instead of generate_series):

.load ./wholenumber

CREATE VIRTUAL TABLE integers USING wholenumber;

WITH
	cst(rsx, rsy) AS  (
		SELECT 100, 100
	),

	cst2(minx, maxx, stepx, miny, maxy, stepy, qualitativex, qualitativey) AS (		
		SELECT NULL, NULL, NULL, NULL, NULL, NULL, 0, 0
	),radygrid

	ds0(m, n, x, y, x2, y2, title, size, mark, label, markmode) AS (
		SELECT 1, 2, 3, 4, 5, 6, 7 , 8, 9, 10, 11
	)
	,
	
	ds(m, n, x, y, x2, y2, title, size, mark, label, markmode) AS (
		SELECT m, n, x,
			y, x2,
			y2,
			title, size, mark, label, markmode
		FROM ds0
		WINDOW w AS (PARTITION BY m, x ORDER BY n)
	),

	d(m, n, x, y, x2, y2, labelx, labely, title, size, mark, label, markmode) AS (
		SELECT m, n, x, y,	x2, y2, x, y, title, size, mark, label, markmode
		FROM ds, cst2
	),

	ylabels(y, label) AS (
		SELECT y, MIN(labely) FROM d GROUP BY y
	),

	yaxis(maxy, miny, stepy , minstepy) AS (
		WITH
			xt0(minx, maxx) AS (
				SELECT  coalesce(miny, min(min(y2), min(y))),  coalesce(maxy, max(max(y2), max(y))) + qualitativey  FROM d, cst2
			),
			xt1(mx, mn) AS (SELECT maxx, minx FROM xt0),
			xt2(mx, mn, step) AS (SELECT mx, mn, (mx-mn)  FROM xt1),
			
			xt3(mx, mn, ms) AS (
				SELECT mx, mn, first_value(rs) OVER (order by x desc) AS ms FROM (SELECT mx, mn, step, f,(mx-mn) as rng, 1.0*step/f as rs, 1.0*(mx-mn)/(step/f) AS x FROM xt2, (SELECT 1 AS f UNION ALL SELECT 2 UNION ALL SELECT 4 UNION ALL SELECT 5)) AS src WHERE x < 10 limit 1),
			xt4(minstepy) AS (
				SELECT MIN(abs(y2-y)) FROM d WHERE y2 != y
			)
		SELECT (mx/ms)*ms, (mn/ms)*ms, coalesce(stepy, ms), coalesce(minstepy, ms, stepy)  FROM xt3, cst2,xt4
	),

	distinct_mark_n_m(mark, ze, zem, title) AS (
		SELECT DISTINCT mark, n AS ze, m AS zem, title FROM ds0
	),

	facet0(m, mi, title, radial) AS (
		SELECT md, row_number() OVER () - 1, title, 'radial' IN (SELECT mark FROM distinct_mark_n_m WHERE zem = md)
		FROM (SELECT DISTINCT zem AS md, title AS title FROM distinct_mark_n_m ORDER BY 2, 1)
	),

	facet(m, mi, xorigin, yorigin, title, radial) AS (
		SELECT m, mi,
			rsx * 1.2 * IFNULL(CASE WHEN (
				0
			) > 0 THEN mi / (
				0
			) ELSE mi % (
				2
			)  END, mi),
			rsy  * 1.2 * IFNULL(CASE WHEN (
				2
			) > 0 THEN mi / (
				2
			) ELSE mi / (
				0
			)  END, 0),
			title, radial FROM facet0, cst
	),

	radygrid(m, mi, tty, wty, ttx, ttx2, xorigin, yorigin) AS (
		SELECT m, mi,  rsy / 2 / ((maxy-miny)/stepy) * (value-1) AS tty, coalesce(NULL, miny + stepy * (value-1)) AS wty, xorigin, xorigin+rsx, xorigin + rsx / 2, yorigin + rsy / 2  FROM integers, yaxis, cst, facet LEFT JOIN ylabels ON ylabels.y = (miny + (value-1) * stepy) WHERE radial AND value <= 1+1.0*(maxy-miny)/stepy
	),
	
	ypos(m, mi, pcx, pcy, radial) AS (
		SELECT m, mi, xorigin, yorigin + CASE
			WHEN 0 BETWEEN miny AND maxy THEN
				rsy - (0 - miny) * rsy / (maxy-miny)
			WHEN 0 >= maxy THEN 0
			ELSE  rsy
		END, radial FROM yaxis, cst, facet WHERE NOT radial
		UNION ALL
		SELECT m, mi, xorigin + rsx / 2, yorigin + (CASE
			WHEN 0 BETWEEN miny AND maxy THEN
				rsy - (0 - miny) * rsy / 2 / (maxy-miny)
			WHEN 0 >= maxy THEN 0
			ELSE  rsy
		END ) / 2, radial FROM yaxis, cst, facet WHERE radial
	)

SELECT * FROM radygrid , ypos;

(5) By Richard Hipp (drh) on 2021-03-19 17:59:12 in reply to 4 [link] [source]

This new problem appears to be fixed by the change to the wholenumber virtual table in check-in f12b54042e27b2fe.

(6) By anonymous on 2021-03-20 09:23:38 in reply to 5 [link] [source]

Wow, that’s scary!

I’d never have thought a vtable ‘estimatedCost’ influenced correctness,
with one value yielding an infinite loop, and another being fine.

(7) By mzm2021 on 2021-03-22 09:45:51 in reply to 5 [link] [source]

Thank you.

I confirm that the cost change to the wholenumber virtual table fixes the reported issue.

By the way, is there a chance that other virtual table implementations would be affected by such a cost problem?

(8) By Richard Hipp (drh) on 2021-03-22 12:09:34 in reply to 7 [link] [source]

For virtual tables that might run forever in an infinite loop (like wholenumber can), then the cost value of the xBestIndex method of the virtual table implementation is important for helping the query planner choose a plan that does not try to materialize the virtual table. What other virtual tables are you using that might return an infinite number of results, depending on how it is used?

(9) By mzm2021 on 2021-03-22 12:28:00 in reply to 8 [link] [source]

I use another vtable which was directly inspired by wholenumber.c. So it was fixed using the same method.

BTW, I looked at the explanation of estimatedCost in https://www.sqlite.org/vtab.html#outputs where I read:

The SQLite core initializes estimatedCost to a very large value prior to invoking xBestIndex, so if xBestIndex determines that the current combination of parameters is undesirable, it can leave the estimatedCost field unchanged to discourage its use.

So I tested commenting out estimatedCost's assignment in wholenumber.c as in:

if( (idxNum & 12)==0 ){
    /*pIdxInfo->estimatedCost = (double)1e99;*/
  }else if( (idxNum & 3)==0 ){
    pIdxInfo->estimatedCost = (double)5;
  }else{
    pIdxInfo->estimatedCost = (double)1;
  }

And it worked fine too.

I noticed also that wholenumber.c makes no use of estimatedRows.

How effective would it be to provide the query planner with estimatedRows hints?

(10) By Richard Hipp (drh) on 2021-03-22 12:40:06 in reply to 9 [link] [source]

How effective would it be to provide the query planner with estimatedRows hints?

I don't know. Why don't you try it and report back?

(11) By mzm2021 on 2021-03-22 12:58:14 in reply to 10 [link] [source]

Sure I will do.

But my question was general about how the query planner considers theses hints: The documentation does not state clearly which of estimatedCost or estimatedRows has higher precedence.

I quickly checked the source code of SQLite and it seems that the values of estimatedCost are always compared before those of estimatedRows.

(12) By Dan Kennedy (dan) on 2021-03-22 20:17:24 in reply to 11 [link] [source]

estimatedRows is used as the basis for the cost of an external sort, if the virtual table cannot supply rows in the order requested by SQLite. Also, if SQLite is considering the virtual table as the outer loop of a join, then estimatedRows is used to estimate how many times the inner loops are expected to run, which is used to determine their overall cost.

Dan.

(13) By anonymous on 2021-04-20 19:31:08 in reply to 8 [link] [source]

Maybe it might be a good idea to add a SQLITE_INDEX_SCAN_NO_MATERIALIZE flag to force it to not materialize the table (in case the data is infinite or otherwise way too big, or possibly other reasons that I don't know)? (In this case, if the query specifies a ORDER BY clause that cannot be consumed by the virtual table, then it is an error.)

Trying to force it to do certain things (rather than only suggesting them) by setting the estimated cost seems like klugy to me. (Also is klugy to force it to reject the query plan by setting the cost, but this has already been fixed; now you can return SQLITE_CONSTRAINT to force it to reject the query plan, which is better.)