SQLite User Forum

Regression with sqlite 3.38.0 on query involving view of UNION ALL tables and JOINs
Login

Regression with sqlite 3.38.0 on query involving view of UNION ALL tables and JOINs

(1) By Even Rouault (rouault) on 2022-03-04 18:12:13 [source]

Hi,

I've spotted a regression with sqlite 3.38.0. This involves a quite complex database and request. I've done my best to simplify down to the minimum scenario I could identified.

Dump of a (simplified version) of the database (the values of the sqlite1_stat table are the ones of the full database):

PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;

CREATE TABLE extent(
    auth_name TEXT NOT NULL,
    code INTEGER NOT NULL,
    south_lat FLOAT,
    CONSTRAINT pk_extent PRIMARY KEY (auth_name, code)
);
INSERT INTO extent VALUES('PROJ',1,1);

CREATE TABLE usage(
    object_table_name TEXT NOT NULL,
    object_auth_name TEXT NOT NULL,
    object_code INTEGER NOT NULL,
    extent_auth_name TEXT NOT NULL,
    extent_code INTEGER NOT NULL
);
INSERT INTO usage VALUES('other_transformation','PROJ',1,'PROJ',1);
INSERT INTO usage VALUES('other_transformation','PROJ',2,'PROJ',1);

CREATE TABLE grid_transformation(
    auth_name TEXT NOT NULL,
    code INTEGER NOT NULL,
    name TEXT NOT NULL,

    source_crs_auth_name TEXT NOT NULL,
    source_crs_code INTEGER NOT NULL,
    target_crs_auth_name TEXT NOT NULL,
    target_crs_code INTEGER NOT NULL,

    accuracy FLOAT,

    CONSTRAINT pk_grid_transformation PRIMARY KEY (auth_name, code)
);

CREATE TABLE other_transformation(
    auth_name TEXT NOT NULL,
    code INTEGER NOT NULL,
    name TEXT NOT NULL ,

    source_crs_auth_name TEXT NOT NULL,
    source_crs_code INTEGER NOT NULL,
    target_crs_auth_name TEXT NOT NULL,
    target_crs_code INTEGER NOT NULL,

    accuracy FLOAT,

    CONSTRAINT pk_other_transformation PRIMARY KEY (auth_name, code)
);

INSERT INTO other_transformation VALUES('PROJ',1,'N43 height to N60 height','EPSG',8675,'EPSG',5717,0.01);
INSERT INTO other_transformation VALUES('PROJ',2,'N60 height to N2000 height','EPSG',5717,'EPSG',3900,0.01);

CREATE TABLE supersession(
    superseded_table_name TEXT NOT NULL,
    superseded_auth_name TEXT NOT NULL,
    superseded_code INTEGER NOT NULL,
    replacement_table_name TEXT NOT NULL,
    replacement_auth_name TEXT NOT NULL,
    replacement_code INTEGER NOT NULL,
    same_source_target_crs BOOLEAN NOT NULL
);

ANALYZE sqlite_master;
INSERT INTO sqlite_stat1 VALUES('grid_transformation','grid_transformation_idx','799 267 6 6 2');
INSERT INTO sqlite_stat1 VALUES('other_transformation','other_transformation_idx','373 75 5 5 3');
INSERT INTO sqlite_stat1 VALUES('supersession','supersession_idx','1206 201 173 2');
INSERT INTO sqlite_stat1 VALUES('usage','idx_usage_object','22233 2022 530 2');
INSERT INTO sqlite_stat1 VALUES('extent','extent','4160 832 1');
CREATE INDEX idx_usage_object ON usage(object_table_name, object_auth_name, object_code);
CREATE INDEX supersession_idx ON supersession(superseded_table_name, superseded_auth_name, superseded_code);
CREATE INDEX grid_transformation_idx ON grid_transformation(source_crs_auth_name, source_crs_code, target_crs_auth_name, target_crs_code);
CREATE INDEX other_transformation_idx ON other_transformation(source_crs_auth_name, source_crs_code, target_crs_auth_name, target_crs_code);

CREATE VIEW coordinate_operation_view AS
    SELECT 'grid_transformation' AS table_name, auth_name, code, name,
           source_crs_auth_name,
           source_crs_code, target_crs_auth_name, target_crs_code,
           accuracy FROM grid_transformation
    UNION ALL
    SELECT 'other_transformation' AS table_name, auth_name, code, name,
           source_crs_auth_name,
           source_crs_code, target_crs_auth_name, target_crs_code,
           accuracy FROM other_transformation;
COMMIT;

Request:

echo "SELECT v1.accuracy AS accuracy1, v2.accuracy as accuracy2, a1.south_lat AS south_lat1, a2.south_lat AS south_lat2 FROM coordinate_operation_view v1 JOIN coordinate_operation_view v2 ON v1.target_crs_auth_name = v2.source_crs_auth_name AND v1.target_crs_code = v2.source_crs_code LEFT JOIN supersession ss1 ON ss1.superseded_table_name = v1.table_name AND ss1.superseded_auth_name = v1.auth_name AND ss1.superseded_code = v1.code AND ss1.superseded_table_name = ss1.replacement_table_name AND ss1.same_source_target_crs = 1 JOIN usage u1 ON u1.object_table_name = v1.table_name AND u1.object_auth_name = v1.auth_name AND u1.object_code = v1.code JOIN extent a1 ON a1.auth_name = u1.extent_auth_name AND a1.code = u1.extent_code JOIN usage u2 ON u2.object_table_name = v2.table_name AND u2.object_auth_name = v2.auth_name AND u2.object_code = v2.code JOIN extent a2 ON a2.auth_name = u2.extent_auth_name AND a2.code = u2.extent_code WHERE v1.source_crs_auth_name = 'EPSG' AND v1.source_crs_code = '8675' AND v2.target_crs_auth_name = 'EPSG' AND v2.target_crs_code = '3900' AND south_lat1 != 0 AND south_lat2 != 0 ORDER BY accuracy1 + accuracy2" | sqlite3 test.db

With sqlite < 3.38.0, the following correct result is returned:

0.01 0.01 1.0 1.0

With sqlite 3.38.0, no row is returned.

Doing a git bisect session on the https://github.com/sqlite/sqlite mirror spots the following commit as the offending one:

commit a11c5e22b5eb32cdf2061a6c19fd7bf4d2a36e05
Author: drh <>
Date:   Thu Dec 9 18:44:03 2021 +0000

    Enable bloom filters for the right table of a LEFT JOIN.  Fix unreachable
    branches.
    
    FossilOrigin-Name: d342ab722de1fc3f34219c9755253db9d88eb50a6fa5cc257207008b00e7fc03

I've tested also with the latest commit of the master branch at time of writing (https://github.com/sqlite/sqlite/commit/6f9d68817cddd4491a225403e8aa9125ee199fe5), and the issue is still present on it.

Best regards, Even

(2) By Richard Hipp (drh) on 2022-03-04 19:42:13 in reply to 1 [link] [source]

This might be the correct patch against trunk:

Index: src/where.c
==================================================================
--- src/where.c
+++ src/where.c
@@ -1076,14 +1076,17 @@
     VdbeCoverage(v);
     sqlite3VdbeJumpHere(v, addrTop);
     pLoop->wsFlags &= ~WHERE_BLOOMFILTER;
     if( OptimizationDisabled(pParse->db, SQLITE_BloomPulldown) ) break;
     while( ++iLevel < pWInfo->nLevel ){
+      const SrcItem *pTabItem;
       pLevel = &pWInfo->a[iLevel];
       pLoop = pLevel->pWLoop;
       if( NEVER(pLoop==0) ) continue;
       if( pLoop->prereq & notReady ) continue;
+      pTabItem = &pWInfo->pTabList->a[pLevel->iFrom];
+      if( pTabItem->fg.jointype & JT_LEFT ) continue;
       if( (pLoop->wsFlags & (WHERE_BLOOMFILTER|WHERE_COLUMN_IN))
                  ==WHERE_BLOOMFILTER
       ){
         /* This is a candidate for bloom-filter pull-down (early evaluation).
         ** The test that WHERE_COLUMN_IN is omitted is important, as we are

(3) By Even Rouault (rouault) on 2022-03-04 20:34:56 in reply to 2 [link] [source]

Richard,

Thanks for the prompt answer. I've tested your patch, and it does fix the issue I hit, including on the full scenario.

Would you be aware of a reliable workaround I could use (like the addition of some terms in my initial query) to make things work even on stock 3.38.0 ? I noticed that removing the ORDER BY clause (which I could do as post processing) seemed to make things work, but maybe that's just luck on the test cases of my regression test suite that wouldn't work on other similar scenarios.

(4) By Richard Hipp (drh) on 2022-03-04 20:50:10 in reply to 3 [link] [source]

Possible work-arounds:

  • Invoke sqlite3_test_control(SQLITE_TESTCTRL_OPTIMIZATIONS,db,0x100000) to disable the Bloom-filter pull-down optimization, which is where the error is happening.

  • Move the LEFT JOIN to the end. To wit:

SELECT v1.accuracy AS accuracy1,
       v2.accuracy as accuracy2,
       a1.south_lat AS south_lat1,
       a2.south_lat AS south_lat2
  FROM coordinate_operation_view v1
  JOIN coordinate_operation_view v2
        ON v1.target_crs_auth_name = v2.source_crs_auth_name 
       AND v1.target_crs_code = v2.source_crs_code
  JOIN usage u1
        ON u1.object_table_name = v1.table_name
       AND u1.object_auth_name = v1.auth_name
       AND u1.object_code = v1.code
  JOIN extent a1
        ON a1.auth_name = u1.extent_auth_name
       AND a1.code = u1.extent_code
  JOIN usage u2
        ON u2.object_table_name = v2.table_name
       AND u2.object_auth_name = v2.auth_name
       AND u2.object_code = v2.code
  JOIN extent a2
        ON a2.auth_name = u2.extent_auth_name
       AND a2.code = u2.extent_code
  LEFT JOIN supersession ss1
        ON ss1.superseded_table_name = v1.table_name
       AND ss1.superseded_auth_name = v1.auth_name
       AND ss1.superseded_code = v1.code
       AND ss1.superseded_table_name = ss1.replacement_table_name
       AND ss1.same_source_target_crs = 1
  WHERE v1.source_crs_auth_name = 'EPSG'
    AND v1.source_crs_code = '8675'
    AND v2.target_crs_auth_name = 'EPSG'
    AND v2.target_crs_code = '3900'
    AND south_lat1 != 0
    AND south_lat2 != 0
  ORDER BY accuracy1 + accuracy2;