v3.40.0 very slow with UNIONs (or so I think)

(1) By jose isaias cabrera (jicman) on 2022-11-22 21:51:04 [source]

Greetings!

I have query that Mr. Medcalf helped me create which returns all changes for a field for a given project. This query,

        SELECT ProjID,
               Updated_By,
               InsertDate,
               var,
               oldv,
               newv
        FROM
        (
        
            SELECT ProjID,
                   Updated_By,
                   InsertDate,
                   'Finish_Date' as var,
                   (
                      SELECT
                        coalesce(Finish_Date,'') FROM Project_List
                      WHERE ProjID = o.ProjID
                      AND InsertDate < o.InsertDate
                      ORDER BY InsertDate DESC
                      LIMIT 1
                    ) AS oldv,
                   coalesce(Finish_Date,'') as newv
                   FROM Project_List as o
            UNION
        
              SELECT ProjID,
                     Updated_By,
                     InsertDate,
                     'Ann_CapexP' as var,
                     (
                        SELECT
                          replace(round(Ann_CapexP),'.0','') FROM Project_List WHERE
                        ProjID = o.ProjID
                        AND InsertDate < o.InsertDate
                        ORDER BY InsertDate DESC
                        LIMIT 1
                      ) AS oldv,
                      replace(round(Ann_CapexP),'.0','') as newv
                     FROM Project_List as o
              
      )
      WHERE oldv <> newv 
      AND ProjID = 'PR0000020614' 
ORDER BY InsertDate ASC;

Grabs 8 records and takes 0.01 secs to execute. On v3.40.0, the same query grabs 8 records, but it takes 4.44 secs. This query is composed of just two fields. When I add all the desired fields for a record, v3.39.4 grabs 125 records and takes 0.02 secs to execute, while v3.40.0 grabs 125 records and takes 379.41 secs. I ran analyze, thinking that it would fix things, but, it does not. If you give me a spot I will try to get you the db, but everything is so locked up here that sometimes I have to do 'unorthodox' maneuvering to get things out. I had to revert back to 3.39.4. SQL versions being used are vanilla SQLite Windows DLL from the download site.

SQLiteVer: 3.39.4 2022-09-29 15:55:41
SQLiteVer: 3.40.0 2022-11-16 12:10:08

Thanks for the support.

josé

(2) By anonymous on 2022-11-22 22:30:34 in reply to 1 [link] [source]

Can you check the query plan for each, and then compare them? Seems like the query planner is choosing something less than optimal (obviously).

EXPLAIN ....

and

EXPLAIN QUERY PLAN ...

on the two versions.

(also, making sure the 3.40 version compile options are similar to your earlier configuration is worthwhile).

(3) By jose isaias cabrera (jicman) on 2022-11-23 02:53:29 in reply to 2 [link] [source]

Explain with SQLite version 3.39.4 2022-09-29 15:55:41

sqlite> explain
   ...>         SELECT ProjID,
   ...>                Updated_By,
   ...>                InsertDate,
   ...>                var,
   ...>                oldv,
   ...>                newv
   ...>         FROM
   ...>         (
   ...>
   ...>             SELECT ProjID,
   ...>                    Updated_By,
   ...>                    InsertDate,
   ...>                    'Finish_Date' as var,
   ...>                    (
   ...>                       SELECT
   ...>                         coalesce(Finish_Date,'') FROM Project_List
   ...>                       WHERE ProjID = o.ProjID
   ...>                       AND InsertDate < o.InsertDate
   ...>                       ORDER BY InsertDate DESC
   ...>                       LIMIT 1
   ...>                     ) AS oldv,
   ...>                    coalesce(Finish_Date,'') as newv
   ...>                    FROM Project_List as o
   ...>             UNION
   ...>
   ...>               SELECT ProjID,
   ...>                      Updated_By,
   ...>                      InsertDate,
   ...>                      'Ann_CapexP' as var,
   ...>                      (
   ...>                         SELECT
   ...>                           replace(round(Ann_CapexP),'.0','') FROM Project_List WHERE
   ...>                         ProjID = o.ProjID
   ...>                         AND InsertDate < o.InsertDate
   ...>                         ORDER BY InsertDate DESC
   ...>                         LIMIT 1
   ...>                       ) AS oldv,
   ...>                       replace(round(Ann_CapexP),'.0','') as newv
   ...>                      FROM Project_List as o
   ...>
   ...>       )
   ...>       WHERE oldv <> newv
   ...>       AND ProjID = 'PR0000020614'
   ...> ORDER BY InsertDate ASC;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     181   0                    0   Start at 181
1     InitCoroutine  1     152   2                    0   (subquery-4)
2     OpenEphemeral  5     6     0     k(6,B,B,B,B,B,B)  0   nColumn=6
3     OpenRead       3     164394  0     56             0   root=164394 iDb=0; Project_List
4     OpenRead       6     110918  0     k(3,,,)        2   root=110918 iDb=0; PL_ProjID_BL_Start
5     String8        0     2     0     PR0000020614   0   r[2]='PR0000020614'
6     SeekGE         6     72    2     1              0   key=r[2]
7       IdxGT          6     72    2     1              0   key=r[2]
8       DeferredSeek   6     0     3                    0   Move 3 to 6.rowid if needed
9       BeginSubrtn    0     4     0                    0   r[4]=NULL
10        Null           0     5     5                    0   r[5..5]=NULL; Init subquery result
11        Noop           7     3     0                    0
12        Integer        1     6     0                    0   r[6]=1
13        Ne             8     15    7                    67  if r[7]!=r[8] goto 15
14        ZeroOrNull     7     6     8                    0   r[6] = 0 OR NULL
15        MustBeInt      6     0     0                    0   LIMIT counter
16        IfNot          6     32    0                    0
17        OpenRead       4     164394  0     56             0   root=164394 iDb=0; Project_List
18        OpenRead       8     207716  0     k(3,,,)        0   root=207716 iDb=0; PL_ProjID_InsertDate_New
19        String8        0     9     0     PR0000020614   0   r[9]='PR0000020614'
20        IsNull         9     32    0                    0   if r[9]==NULL goto 32
21        Column         3     55    10                   0   r[10]=Project_List.InsertDate
22        IsNull         10    32    0                    0   if r[10]==NULL goto 32
23        SeekLT         8     32    9     2              0   key=r[9..10]
24        Null           0     10    0                    0   r[10]=NULL
25          IdxLE          8     32    9     2              0   key=r[9..10]
26          DeferredSeek   8     0     4                    0   Move 4 to 8.rowid if needed
27          Column         4     7     5                    0   r[5]=Project_List.Finish_Date
28          NotNull        5     30    0                    0   if r[5]!=NULL goto 30
29          String8        0     5     0                    0   r[5]=''
30          DecrJumpZero   6     32    0                    0   if (--r[6])==0 goto 32
31        Prev           8     25    0                    0
32      Return         4     10    1                    0
33      Column         3     7     3                    0   r[3]=Project_List.Finish_Date
34      NotNull        3     36    0                    0   if r[3]!=NULL goto 36
35      String8        0     3     0                    0   r[3]=''
36      Eq             3     71    5     BINARY-8       80  if r[5]==r[3] goto 71
37      Column         6     0     11                   0   r[11]=Project_List.ProjID
38      Column         3     25    12                   0   r[12]=Project_List.Updated_By
39      Column         3     55    13                   0   r[13]=Project_List.InsertDate
40      String8        0     14    0     Finish_Date    0   r[14]='Finish_Date'
41      BeginSubrtn    0     17    0                    0   r[17]=NULL
42        Null           0     18    18                   0   r[18..18]=NULL; Init subquery result
43        Noop           9     3     0                    0
44        Integer        1     19    0                    0   r[19]=1
45        Ne             8     47    7                    67  if r[7]!=r[8] goto 47
46        ZeroOrNull     7     19    8                    0   r[19] = 0 OR NULL
47        MustBeInt      19    0     0                    0   LIMIT counter
48        IfNot          19    64    0                    0
49        OpenRead       4     164394  0     56             0   root=164394 iDb=0; Project_List
50        OpenRead       10    207716  0     k(3,,,)        0   root=207716 iDb=0; PL_ProjID_InsertDate_New
51        Column         6     0     20                   0   r[20]=Project_List.ProjID
52        IsNull         20    64    0                    0   if r[20]==NULL goto 64
53        Column         3     55    21                   0   r[21]=Project_List.InsertDate
54        IsNull         21    64    0                    0   if r[21]==NULL goto 64
55        SeekLT         10    64    20    2              0   key=r[20..21]
56        Null           0     21    0                    0   r[21]=NULL
57          IdxLE          10    64    20    2              0   key=r[20..21]
58          DeferredSeek   10    0     4                    0   Move 4 to 10.rowid if needed
59          Column         4     7     18                   0   r[18]=Project_List.Finish_Date
60          NotNull        18    62    0                    0   if r[18]!=NULL goto 62
61          String8        0     18    0                    0   r[18]=''
62          DecrJumpZero   19    64    0                    0   if (--r[19])==0 goto 64
63        Prev           10    57    0                    0
64      Return         17    42    1                    0
65      SCopy          18    15    0                    0   r[15]=r[18]
66      Column         3     7     16                   0   r[16]=Project_List.Finish_Date
67      NotNull        16    69    0                    0   if r[16]!=NULL goto 69
68      String8        0     16    0                    0   r[16]=''
69      MakeRecord     11    6     22                   0   r[22]=mkrec(r[11..16])
70      IdxInsert      5     22    11    6              0   key=r[22]
71    Next           6     7     0                    0
72    OpenRead       1     164394  0     56             0   root=164394 iDb=0; Project_List
73    OpenRead       11    110918  0     k(3,,,)        2   root=110918 iDb=0; PL_ProjID_BL_Start
74    String8        0     23    0     PR0000020614   0   r[23]='PR0000020614'
75    SeekGE         11    141   23    1              0   key=r[23]
76      IdxGT          11    141   23    1              0   key=r[23]
77      DeferredSeek   11    0     1                    0   Move 1 to 11.rowid if needed
78      BeginSubrtn    0     24    0                    0   r[24]=NULL
79        Null           0     25    25                   0   r[25..25]=NULL; Init subquery result
80        Noop           12    3     0                    0
81        Integer        1     26    0                    0   r[26]=1
82        Ne             8     84    7                    67  if r[7]!=r[8] goto 84
83        ZeroOrNull     7     26    8                    0   r[26] = 0 OR NULL
84        MustBeInt      26    0     0                    0   LIMIT counter
85        IfNot          26    101   0                    0
86        OpenRead       2     164394  0     56             0   root=164394 iDb=0; Project_List
87        OpenRead       13    207716  0     k(3,,,)        0   root=207716 iDb=0; PL_ProjID_InsertDate_New
88        String8        0     27    0     PR0000020614   0   r[27]='PR0000020614'
89        IsNull         27    101   0                    0   if r[27]==NULL goto 101
90        Column         1     55    28                   0   r[28]=Project_List.InsertDate
91        IsNull         28    101   0                    0   if r[28]==NULL goto 101
92        SeekLT         13    101   27    2              0   key=r[27..28]
93        Null           0     28    0                    0   r[28]=NULL
94          IdxLE          13    101   27    2              0   key=r[27..28]
95          DeferredSeek   13    0     2                    0   Move 2 to 13.rowid if needed
96          Column         2     35    32                   0   r[32]=Project_List.Ann_CapexP
97          Function       0     32    29    round(1)       0   r[29]=func(r[32])
98          Function       6     29    25    replace(3)     0   r[25]=func(r[29..31])
99          DecrJumpZero   26    101   0                    0   if (--r[26])==0 goto 101
100       Prev           13    94    0                    0
101     Return         24    79    1                    0
102     Column         1     35    36                   0   r[36]=Project_List.Ann_CapexP
103     Function       0     36    33    round(1)       0   r[33]=func(r[36])
104     Function       6     33    22    replace(3)     0   r[22]=func(r[33..35])
105     Eq             22    140   25    BINARY-8       80  if r[25]==r[22] goto 140
106     Column         11    0     11                   0   r[11]=Project_List.ProjID
107     Column         1     25    12                   0   r[12]=Project_List.Updated_By
108     Column         1     55    13                   0   r[13]=Project_List.InsertDate
109     String8        0     14    0     Ann_CapexP     0   r[14]='Ann_CapexP'
110     BeginSubrtn    0     37    0                    0   r[37]=NULL
111       Null           0     38    38                   0   r[38..38]=NULL; Init subquery result
112       Noop           14    3     0                    0
113       Integer        1     39    0                    0   r[39]=1
114       Ne             8     116   7                    67  if r[7]!=r[8] goto 116
115       ZeroOrNull     7     39    8                    0   r[39] = 0 OR NULL
116       MustBeInt      39    0     0                    0   LIMIT counter
117       IfNot          39    133   0                    0
118       OpenRead       2     164394  0     56             0   root=164394 iDb=0; Project_List
119       OpenRead       15    207716  0     k(3,,,)        0   root=207716 iDb=0; PL_ProjID_InsertDate_New
120       Column         11    0     40                   0   r[40]=Project_List.ProjID
121       IsNull         40    133   0                    0   if r[40]==NULL goto 133
122       Column         1     55    41                   0   r[41]=Project_List.InsertDate
123       IsNull         41    133   0                    0   if r[41]==NULL goto 133
124       SeekLT         15    133   40    2              0   key=r[40..41]
125       Null           0     41    0                    0   r[41]=NULL
126         IdxLE          15    133   40    2              0   key=r[40..41]
127         DeferredSeek   15    0     2                    0   Move 2 to 15.rowid if needed
128         Column         2     35    22                   0   r[22]=Project_List.Ann_CapexP
129         Function       0     22    42    round(1)       0   r[42]=func(r[22])
130         Function       6     42    38    replace(3)     0   r[38]=func(r[42..44])
131         DecrJumpZero   39    133   0                    0   if (--r[39])==0 goto 133
132       Prev           15    126   0                    0
133     Return         37    111   1                    0
134     SCopy          38    15    0                    0   r[15]=r[38]
135     Column         1     35    48                   0   r[48]=Project_List.Ann_CapexP
136     Function       0     48    45    round(1)       0   r[45]=func(r[48])
137     Function       6     45    16    replace(3)     0   r[16]=func(r[45..47])
138     MakeRecord     11    6     48                   0   r[48]=mkrec(r[11..16])
139     IdxInsert      5     48    11    6              0   key=r[48]
140   Next           11    76    0                    0
141   Rewind         5     150   0                    0
142     Column         5     0     49                   0   r[49]=ProjID
143     Column         5     1     50                   0   r[50]=Updated_By
144     Column         5     2     51                   0   r[51]=InsertDate
145     Column         5     3     52                   0   r[52]=var
146     Column         5     4     53                   0   r[53]=oldv
147     Column         5     5     54                   0   r[54]=newv
148     Yield          1     0     0                    0
149   Next           5     142   0                    0
150   Close          5     0     0                    0
151   EndCoroutine   1     0     0                    0
152   SorterOpen     16    8     0     k(1,B)         0
153   InitCoroutine  1     0     2                    0
154     Yield          1     169   0                    0   next row of (subquery-4)
155     Copy           53    55    0                    2   r[55]=r[53]; (subquery-4).oldv
156     Copy           54    56    0                    2   r[56]=r[54]; (subquery-4).newv
157     Eq             56    168   55    BINARY-8       80  if r[55]==r[56] goto 168
158     Copy           49    56    0                    2   r[56]=r[49]; (subquery-4).ProjID
159     Ne             57    168   56    BINARY-8       81  if r[56]!=r[57] goto 168
160     Copy           49    59    0                    2   r[59]=r[49]; (subquery-4).ProjID
161     Copy           50    60    0                    2   r[60]=r[50]; (subquery-4).Updated_By
162     Copy           52    61    0                    2   r[61]=r[52]; (subquery-4).var
163     Copy           53    62    0                    2   r[62]=r[53]; (subquery-4).oldv
164     Copy           54    63    0                    2   r[63]=r[54]; (subquery-4).newv
165     Copy           51    58    0                    2   r[58]=r[51]; (subquery-4).InsertDate
166     MakeRecord     58    6     65                   0   r[65]=mkrec(r[58..63])
167     SorterInsert   16    65    58    6              0   key=r[65]
168   Goto           0     154   0                    0
169   OpenPseudo     17    66    8                    0   8 columns in r[66]
170   SorterSort     16    180   0                    0
171     SorterData     16    66    17                   0   r[66]=data
172     Column         17    5     64                   0   r[64]=newv
173     Column         17    4     63                   0   r[63]=oldv
174     Column         17    3     62                   0   r[62]=var
175     Column         17    0     61                   0   r[61]=InsertDate
176     Column         17    2     60                   0   r[60]=Updated_By
177     Column         17    1     59                   0   r[59]=ProjID
178     ResultRow      59    6     0                    0   output=r[59..64]
179   SorterNext     16    171   0                    0
180   Halt           0     0     0                    0
181   Transaction    0     0     1155  0              1   usesStmtJournal=0
182   Integer        1     7     0                    0   r[7]=1
183   Integer        0     8     0                    0   r[8]=0
184   String8        0     30    0     .0             0   r[30]='.0'
185   String8        0     31    0                    0   r[31]=''
186   String8        0     34    0     .0             0   r[34]='.0'
187   String8        0     35    0                    0   r[35]=''
188   String8        0     43    0     .0             0   r[43]='.0'
189   String8        0     44    0                    0   r[44]=''
190   String8        0     46    0     .0             0   r[46]='.0'
191   String8        0     47    0                    0   r[47]=''
192   String8        0     57    0     PR0000020614   0   r[57]='PR0000020614'
193   Goto           0     1     0                    0
Run Time: real 0.316 user 0.031250 sys 0.078125

Explain with SQLite version 3.40.0 2022-11-16 12:10:08

sqlite> explain
   ...>         SELECT ProjID,
   ...>                Updated_By,
   ...>                InsertDate,
   ...>                var,
   ...>                oldv,
   ...>                newv
   ...>         FROM
   ...>         (
   ...>
   ...>             SELECT ProjID,
   ...>                    Updated_By,
   ...>                    InsertDate,
   ...>                    'Finish_Date' as var,
   ...>                    (
   ...>                       SELECT
   ...>                         coalesce(Finish_Date,'') FROM Project_List
   ...>                       WHERE ProjID = o.ProjID
   ...>                       AND InsertDate < o.InsertDate
   ...>                       ORDER BY InsertDate DESC
   ...>                       LIMIT 1
   ...>                     ) AS oldv,
   ...>                    coalesce(Finish_Date,'') as newv
   ...>                    FROM Project_List as o
   ...>             UNION
   ...>
   ...>               SELECT ProjID,
   ...>                      Updated_By,
   ...>                      InsertDate,
   ...>                      'Ann_CapexP' as var,
   ...>                      (
   ...>                         SELECT
   ...>                           replace(round(Ann_CapexP),'.0','') FROM Project_List WHERE
   ...>                         ProjID = o.ProjID
   ...>                         AND InsertDate < o.InsertDate
   ...>                         ORDER BY InsertDate DESC
   ...>                         LIMIT 1
   ...>                       ) AS oldv,
   ...>                       replace(round(Ann_CapexP),'.0','') as newv
   ...>                      FROM Project_List as o
   ...>
   ...>       )
   ...>       WHERE oldv <> newv
   ...>       AND ProjID = 'PR0000020614'
   ...> ORDER BY InsertDate ASC;
addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     117   0                    0   Start at 117
1     InitCoroutine  1     88    2                    0   (subquery-4)
2     OpenEphemeral  5     6     0     k(6,B,B,B,B,B,B)  0   nColumn=6
3     OpenRead       3     164394  0     56             0   root=164394 iDb=0; Project_List
4     Rewind         3     40    0                    0
5       Column         3     0     2                    0   r[2]= cursor 3 column 0
6       Column         3     25    3                    0   r[3]= cursor 3 column 25
7       Column         3     55    4                    0   r[4]= cursor 3 column 55
8       String8        0     5     0     Finish_Date    0   r[5]='Finish_Date'
9       BeginSubrtn    0     8     0                    0   r[8]=NULL
10        Null           0     9     9                    0   r[9..9]=NULL; Init subquery result
11        Noop           6     3     0                    0
12        Integer        1     10    0                    0   r[10]=1
13        Ne             12    15    11                   67  if r[11]!=r[12] goto 15
14        ZeroOrNull     11    10    12                   0   r[10] = 0 OR NULL
15        MustBeInt      10    0     0                    0   LIMIT counter
16        IfNot          10    32    0                    0
17        OpenRead       4     164394  0     56             0   root=164394 iDb=0; Project_List
18        OpenRead       7     207716  0     k(3,,,)        0   root=207716 iDb=0; PL_ProjID_InsertDate_New
19        Column         3     0     13                   0   r[13]= cursor 3 column 0
20        IsNull         13    32    0                    0   if r[13]==NULL goto 32
21        Column         3     55    14                   0   r[14]= cursor 3 column 55
22        IsNull         14    32    0                    0   if r[14]==NULL goto 32
23        SeekLT         7     32    13    2              0   key=r[13..14]
24        Null           0     14    0                    0   r[14]=NULL
25          IdxLE          7     32    13    2              0   key=r[13..14]
26          DeferredSeek   7     0     4                    0   Move 4 to 7.rowid if needed
27          Column         4     7     9                    0   r[9]= cursor 4 column 7
28          NotNull        9     30    0                    0   if r[9]!=NULL goto 30
29          String8        0     9     0                    0   r[9]=''
30          DecrJumpZero   10    32    0                    0   if (--r[10])==0 goto 32
31        Prev           7     25    0                    0
32      Return         8     10    1                    0
33      SCopy          9     6     0                    0   r[6]=r[9]
34      Column         3     7     7                    0   r[7]= cursor 3 column 7
35      NotNull        7     37    0                    0   if r[7]!=NULL goto 37
36      String8        0     7     0                    0   r[7]=''
37      MakeRecord     2     6     15                   0   r[15]=mkrec(r[2..7])
38      IdxInsert      5     15    2     6              0   key=r[15]
39    Next           3     5     0                    1
40    OpenRead       1     164394  0     56             0   root=164394 iDb=0; Project_List
41    Rewind         1     77    0                    0
42      Column         1     0     2                    0   r[2]= cursor 1 column 0
43      Column         1     25    3                    0   r[3]= cursor 1 column 25
44      Column         1     55    4                    0   r[4]= cursor 1 column 55
45      String8        0     5     0     Ann_CapexP     0   r[5]='Ann_CapexP'
46      BeginSubrtn    0     16    0                    0   r[16]=NULL
47        Null           0     17    17                   0   r[17..17]=NULL; Init subquery result
48        Noop           8     3     0                    0
49        Integer        1     18    0                    0   r[18]=1
50        Ne             12    52    11                   67  if r[11]!=r[12] goto 52
51        ZeroOrNull     11    18    12                   0   r[18] = 0 OR NULL
52        MustBeInt      18    0     0                    0   LIMIT counter
53        IfNot          18    69    0                    0
54        OpenRead       2     164394  0     56             0   root=164394 iDb=0; Project_List
55        OpenRead       9     207716  0     k(3,,,)        0   root=207716 iDb=0; PL_ProjID_InsertDate_New
56        Column         1     0     19                   0   r[19]= cursor 1 column 0
57        IsNull         19    69    0                    0   if r[19]==NULL goto 69
58        Column         1     55    20                   0   r[20]= cursor 1 column 55
59        IsNull         20    69    0                    0   if r[20]==NULL goto 69
60        SeekLT         9     69    19    2              0   key=r[19..20]
61        Null           0     20    0                    0   r[20]=NULL
62          IdxLE          9     69    19    2              0   key=r[19..20]
63          DeferredSeek   9     0     2                    0   Move 2 to 9.rowid if needed
64          Column         2     35    15                   0   r[15]= cursor 2 column 35
65          Function       0     15    21    round(1)       0   r[21]=func(r[15])
66          Function       6     21    17    replace(3)     0   r[17]=func(r[21..23])
67          DecrJumpZero   18    69    0                    0   if (--r[18])==0 goto 69
68        Prev           9     62    0                    0
69      Return         16    47    1                    0
70      SCopy          17    6     0                    0   r[6]=r[17]
71      Column         1     35    27                   0   r[27]= cursor 1 column 35
72      Function       0     27    24    round(1)       0   r[24]=func(r[27])
73      Function       6     24    7     replace(3)     0   r[7]=func(r[24..26])
74      MakeRecord     2     6     27                   0   r[27]=mkrec(r[2..7])
75      IdxInsert      5     27    2     6              0   key=r[27]
76    Next           1     42    0                    1
77    Rewind         5     86    0                    0
78      Column         5     0     28                   0   r[28]=ProjID
79      Column         5     1     29                   0   r[29]=Updated_By
80      Column         5     2     30                   0   r[30]=InsertDate
81      Column         5     3     31                   0   r[31]=var
82      Column         5     4     32                   0   r[32]=oldv
83      Column         5     5     33                   0   r[33]=newv
84      Yield          1     0     0                    0
85    Next           5     78    0                    0
86    Close          5     0     0                    0
87    EndCoroutine   1     0     0                    0
88    SorterOpen     10    8     0     k(1,B)         0
89    InitCoroutine  1     0     2                    0
90      Yield          1     105   0                    0   next row of (subquery-4)
91      Copy           32    34    0                    2   r[34]=r[32]
92      Copy           33    35    0                    2   r[35]=r[33]
93      Eq             35    104   34    BINARY-8       80  if r[34]==r[35] goto 104
94      Copy           28    35    0                    2   r[35]=r[28]
95      Ne             36    104   35    BINARY-8       81  if r[35]!=r[36] goto 104
96      Copy           28    38    0                    2   r[38]=r[28]
97      Copy           29    39    0                    2   r[39]=r[29]
98      Copy           31    40    0                    2   r[40]=r[31]
99      Copy           32    41    0                    2   r[41]=r[32]
100     Copy           33    42    0                    2   r[42]=r[33]
101     Copy           30    37    0                    2   r[37]=r[30]
102     MakeRecord     37    6     44                   0   r[44]=mkrec(r[37..42])
103     SorterInsert   10    44    37    6              0   key=r[44]
104   Goto           0     90    0                    0
105   OpenPseudo     11    45    8                    0   8 columns in r[45]
106   SorterSort     10    116   0                    0
107     SorterData     10    45    11                   0   r[45]=data
108     Column         11    5     43                   0   r[43]=newv
109     Column         11    4     42                   0   r[42]=oldv
110     Column         11    3     41                   0   r[41]=var
111     Column         11    0     40                   0   r[40]=InsertDate
112     Column         11    2     39                   0   r[39]=Updated_By
113     Column         11    1     38                   0   r[38]=ProjID
114     ResultRow      38    6     0                    0   output=r[38..43]
115   SorterNext     10    107   0                    0
116   Halt           0     0     0                    0
117   Transaction    0     0     1155  0              1   usesStmtJournal=0
118   Integer        1     11    0                    0   r[11]=1
119   Integer        0     12    0                    0   r[12]=0
120   String8        0     22    0     .0             0   r[22]='.0'
121   String8        0     23    0                    0   r[23]=''
122   String8        0     25    0     .0             0   r[25]='.0'
123   String8        0     26    0                    0   r[26]=''
124   String8        0     36    0     PR0000020614   0   r[36]='PR0000020614'
125   Goto           0     1     0                    0
Run Time: real 0.197 user 0.031250 sys 0.078125

Thanks for the support.

(4) By jose isaias cabrera (jicman) on 2022-11-23 03:02:18 in reply to 2 [link] [source]

Explain query plan with SQLite version 3.39.4 2022-09-29 15:55:41

sqlite> explain query plan
   ...>         SELECT ProjID,
   ...>                Updated_By,
   ...>                InsertDate,
   ...>                var,
   ...>                oldv,
   ...>                newv
   ...>         FROM
   ...>         (
   ...>
   ...>             SELECT ProjID,
   ...>                    Updated_By,
   ...>                    InsertDate,
   ...>                    'Finish_Date' as var,
   ...>                    (
   ...>                       SELECT
   ...>                         coalesce(Finish_Date,'') FROM Project_List
   ...>                       WHERE ProjID = o.ProjID
   ...>                       AND InsertDate < o.InsertDate
   ...>                       ORDER BY InsertDate DESC
   ...>                       LIMIT 1
   ...>                     ) AS oldv,
   ...>                    coalesce(Finish_Date,'') as newv
   ...>                    FROM Project_List as o
   ...>             UNION
   ...>
   ...>               SELECT ProjID,
   ...>                      Updated_By,
   ...>                      InsertDate,
   ...>                      'Ann_CapexP' as var,
   ...>                      (
   ...>                         SELECT
   ...>                           replace(round(Ann_CapexP),'.0','') FROM Project_List WHERE
   ...>                         ProjID = o.ProjID
   ...>                         AND InsertDate < o.InsertDate
   ...>                         ORDER BY InsertDate DESC
   ...>                         LIMIT 1
   ...>                       ) AS oldv,
   ...>                       replace(round(Ann_CapexP),'.0','') as newv
   ...>                      FROM Project_List as o
   ...>
   ...>       )
   ...>       WHERE oldv <> newv
   ...>       AND ProjID = 'PR0000020614'
   ...> ORDER BY InsertDate ASC;
QUERY PLAN
|--CO-ROUTINE (subquery-4)
|  `--COMPOUND QUERY
|     |--LEFT-MOST SUBQUERY
|     |  |--SEARCH o USING INDEX PL_ProjID_BL_Start (ProjID=?)
|     |  |--CORRELATED SCALAR SUBQUERY 1
|     |  |  `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|     |  `--CORRELATED SCALAR SUBQUERY 1
|     |     `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|     `--UNION USING TEMP B-TREE
|        |--SEARCH o USING INDEX PL_ProjID_BL_Start (ProjID=?)
|        |--CORRELATED SCALAR SUBQUERY 3
|        |  `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|        `--CORRELATED SCALAR SUBQUERY 3
|           `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|--SCAN (subquery-4)
`--USE TEMP B-TREE FOR ORDER BY
Run Time: real 0.012 user 0.000000 sys 0.000000
sqlite>

Explain query plan with SQLite version 3.40.0 2022-11-16 12:10:08

sqlite> explain query plan
   ...>         SELECT ProjID,
   ...>                Updated_By,
   ...>                InsertDate,
   ...>                var,
   ...>                oldv,
   ...>                newv
   ...>         FROM
   ...>         (
   ...>
   ...>             SELECT ProjID,
   ...>                    Updated_By,
   ...>                    InsertDate,
   ...>                    'Finish_Date' as var,
   ...>                    (
   ...>                       SELECT
   ...>                         coalesce(Finish_Date,'') FROM Project_List
   ...>                       WHERE ProjID = o.ProjID
   ...>                       AND InsertDate < o.InsertDate
   ...>                       ORDER BY InsertDate DESC
   ...>                       LIMIT 1
   ...>                     ) AS oldv,
   ...>                    coalesce(Finish_Date,'') as newv
   ...>                    FROM Project_List as o
   ...>             UNION
   ...>
   ...>               SELECT ProjID,
   ...>                      Updated_By,
   ...>                      InsertDate,
   ...>                      'Ann_CapexP' as var,
   ...>                      (
   ...>                         SELECT
   ...>                           replace(round(Ann_CapexP),'.0','') FROM Project_List WHERE
   ...>                         ProjID = o.ProjID
   ...>                         AND InsertDate < o.InsertDate
   ...>                         ORDER BY InsertDate DESC
   ...>                         LIMIT 1
   ...>                       ) AS oldv,
   ...>                       replace(round(Ann_CapexP),'.0','') as newv
   ...>                      FROM Project_List as o
   ...>
   ...>       )
   ...>       WHERE oldv <> newv
   ...>       AND ProjID = 'PR0000020614'
   ...> ORDER BY InsertDate ASC;
QUERY PLAN
|--CO-ROUTINE (subquery-4)
|  `--COMPOUND QUERY
|     |--LEFT-MOST SUBQUERY
|     |  |--SCAN o
|     |  `--CORRELATED SCALAR SUBQUERY 1
|     |     `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|     `--UNION USING TEMP B-TREE
|        |--SCAN o
|        `--CORRELATED SCALAR SUBQUERY 3
|           `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|--SCAN (subquery-4)
`--USE TEMP B-TREE FOR ORDER BY
Run Time: real 0.009 user 0.000000 sys 0.000000
sqlite>

Thanks for the support.

(5) By anonymous on 2022-11-23 03:54:24 in reply to 4 [link] [source]

Not an expert, yet obviously, 3.39.4 is using indexes (SEARCH o USING INDEX PL_ProjID_BL_Start (ProjID=?)) that 3.40 is not (scan o).

Assume you have read https://www.sqlite.org/queryplanner-ng.html#howtofix

IMO:

You might try inserting INDEXED BY clauses on table o sub-queries as in https://www.sqlite.org/lang_indexedby.html

As per the first doc, if that works, then figure out how to write the query without the INDEXED BY clauses. In otherwords, using it as a debug tool isn't bad if you use it to find out why the query planner is getting confused (and scanning instead).

(6) By anonymous on 2022-11-23 04:44:31 in reply to 4 [link] [source]

You might also experiment with OFFSET 0 (added to the LIMIT 1) to avoid having the query flattened and see if that helps.

also check out the doc https://www.sqlite.org/optoverview.html , particularly section 11. An OFFSET 0 (even when only in the sub-query) will avoid flattening, rather than LIMIT which must be in both inner an outer queries to avoid flattening).

If adding OFFSET 0 to those LIMIT 1 clauses helps then I suspect the slowdown has to do with query flattening and then the query planner choosing a scan rather than the previous behaviour.

It will be interesting to see what you come up with.

(7) By anonymous on 2022-11-23 05:03:43 in reply to 4 [link] [source]

One last item (as if you didn't have enough to check already!):

refer to https://www.sqlite.org/changes.html

Particularly at query planner changes from the change log item 3-d, and 3-e.

My hunch, presuming analyze was run properly, is that it's within those changes.

It's just a guess though.. so I'm interested what you find.

(9) By jose isaias cabrera (jicman) on 2022-11-23 13:40:35 in reply to 7 [link] [source]

Thanks for all the explanations, but, what I expect from a library or software is that if something is correctly written, syntax-wise, and it was working well in a previous versions, the next version should be very close to the same. I would even expect, as in this situation to lose a few hundreds of a second, because of a fix, but to go from 0.08 seconds to 379.41 seconds, that's a huge difference. I will keep using 3.39.4 until a new version gives me close to what I have now. Thanks for your input.

(10) By Richard Hipp (drh) on 2022-11-23 13:51:54 in reply to 9 [link] [source]

It is unlikely to be "fixed" unless you provide us with a test case. A suitable test case might be either of these:

A database file together with a query that runs significantly slower in 3.40.
An SQL script that first constructs a database from scratch, and then runs a query that is significantly slower in 3.40.

The second option is preferred.

(11) By jose isaias cabrera (jicman) on 2022-11-23 16:05:09 in reply to 10 [link] [source]

I tried creating a script, but, it's not working. Probably because the . Also, there are a lot of records, and I was only trying to use 100 records, and that may not be sufficient to cause the slow down for the query. Option one is the choice that I can provide. Where can I place the DB? It's 827MB. Thanks.

josé

(14) By anonymous on 2022-11-23 18:16:52 in reply to 11 [link] [source]

While I well understand your frustration, following the advice of Dr Hipp (and the other well known forum members) is recommended. That extra effort will not only increase the chance of others putting in effort, but it is one of ways open source can actually work.

For your own purposes, you will want the ability to generate a db from scratch, preferably with test data. If you do not already have that, now (or yesterday) is a great time to set it up.

keith noted the WHERE was not being pushed to the outer select (i presume that was what was previously using an index and a scan in 3.40). You might look at placing one there as a debugging effort.

in any event, Dr Hipp and Keith are the best of the best, so my advice is follow their advice.

(16) By jose isaias cabrera (jicman) on 2022-11-23 19:20:23 in reply to 14 [link] [source]

Thanks for the understanding. :-)

I have been using SQLite since 2006, and I had never had a regression. Actually I've seen queries response faster. But, when I ran this report and it just hang, I thought something was really wrong. :-) I was not frustrated. I was just trying to get to the bottom of it. I even downsized the DB (from 847MB to 243) to provide a sample DB that has the problem to Dr. Hipp and the developers. So, if the developers are still interested, I can provide the DB with the scripts that will show the problem right away.

Right now, I am happy with the UNION ALL change. I had to compile a new version of the front-end program to replace the UNION vs UNION ALL, but that is a piece of cake. :-) Thanks for hanging in there with your support. :-)

josé

(8.1) By Keith Medcalf (kmedcalf) on 2022-11-23 07:50:43 edited from 8.0 in reply to 4 [link] [source]

The later version is not pushing the outer WHERE clause into the branches of the UNION.

(12) By Keith Medcalf (kmedcalf) on 2022-11-23 18:10:15 in reply to 1 [link] [source]

Try this one which pushes the where clause down by manually ...

  SELECT ProjID,
         Updated_By,
         InsertDate,
         var,
         oldv,
         newv
    FROM (
            SELECT ProjID,
                   Updated_By,
                   InsertDate,
                   'Finish_Date' as var,
                   (
                      SELECT coalesce(Finish_Date,'')
                        FROM Project_List
                       WHERE ProjID = o.ProjID
                         AND InsertDate < o.InsertDate
                    ORDER BY InsertDate DESC
                       LIMIT 1
                   ) AS oldv,
                   coalesce(Finish_Date,'') as newv
              FROM Project_List as o
             WHERE ProjID = 'PR0000020614'
          UNION
            SELECT ProjID,
                   Updated_By,
                   InsertDate,
                   'Ann_CapexP' as var,
                   (
                       SELECT replace(round(Ann_CapexP),'.0','')
                         FROM Project_List
                        WHERE ProjID = o.ProjID
                          AND InsertDate < o.InsertDate
                     ORDER BY InsertDate DESC
                        LIMIT 1
                   ) AS oldv,
                   replace(round(Ann_CapexP),'.0','') as newv
              FROM Project_List as o
             WHERE ProjID = 'PR0000020614'
         )
   WHERE oldv <> newv
ORDER BY InsertDate ASC;

You could also push down the WHERE oldv <> newv into each branch as AND oldv <> newv

See what happens.

(13) By Keith Medcalf (kmedcalf) on 2022-11-23 18:15:15 in reply to 12 [link] [source]

Or this:

  SELECT ProjID,
         Updated_By,
         InsertDate,
         var,
         oldv,
         newv
    FROM (
          (SELECT * FROM (
            SELECT ProjID,
                   Updated_By,
                   InsertDate,
                   'Finish_Date' as var,
                   (
                      SELECT coalesce(Finish_Date,'')
                        FROM Project_List
                       WHERE ProjID = o.ProjID
                         AND InsertDate < o.InsertDate
                    ORDER BY InsertDate DESC
                       LIMIT 1
                   ) AS oldv,
                   coalesce(Finish_Date,'') as newv
              FROM Project_List as o
             WHERE ProjID = 'PR0000020614'
                         )
                   WHERE newv <> oldv
          )
          UNION ALL
          (SELECT * FROM (
            SELECT ProjID,
                   Updated_By,
                   InsertDate,
                   'Ann_CapexP' as var,
                   (
                       SELECT replace(round(Ann_CapexP),'.0','')
                         FROM Project_List
                        WHERE ProjID = o.ProjID
                          AND InsertDate < o.InsertDate
                     ORDER BY InsertDate DESC
                        LIMIT 1
                   ) AS oldv,
                   replace(round(Ann_CapexP),'.0','') as newv
              FROM Project_List as o
             WHERE ProjID = 'PR0000020614'
                         )
                   WHERE newv <> oldv
          )
         )
ORDER BY InsertDate ASC;

(17) By anonymous on 2022-11-24 00:40:34 in reply to 13 [link] [source]

Keith, is the reason that the union all (vs union) does the trick in this case due to transient indices not being used on UNION ALL? ( section 2.8 of https://www.sqlite.org/tempfiles.html#transient_indices ).

If it's not too much to explain, could you describe how your UNION ALL version accomplishes the pushing out of the WHERE, and if/why is that solution preferable?

Also, any guidance on when to reach for such a solution would be educational as well.

thanks!

(18) By Keith Medcalf (kmedcalf) on 2022-11-24 01:20:09 in reply to 17 [link] [source]

A UNION B merges A and B returning only DISTINCT rows.
A UNION ALL B merges A and B returning all rows.

UNION uses a B-Tree so that it can tell if the row is a duplicate.
UNION ALL does not.

I don't know why (A UNION B) acts as an optimization barrier while (A UNION ALL B) does not.

Richard may be able to answer that one.

(19) By jose isaias cabrera (jicman) on 2022-11-24 02:53:52 in reply to 18 [link] [source]

This is the explain for the UNION ALL on 3.40.0

addr  opcode         p1    p2    p3    p4             p5  comment
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     173   0                    0   Start at 173
1     InitCoroutine  1     73    2                    0   left SELECT
2     Noop           5     8     0                    0
3     OpenRead       3     164394  0     56             0   root=164394 iDb=0; Project_List
4     OpenRead       6     207716  0     k(3,,,)        2   root=207716 iDb=0; PL_ProjID_InsertDate_New
5     String8        0     5     0     PR0000020614   0   r[5]='PR0000020614'
6     SeekGE         6     72    5     1              0   key=r[5]
7       IdxGT          6     72    5     1              0   key=r[5]
8       DeferredSeek   6     0     3                    0   Move 3 to 6.rowid if needed
9       BeginSubrtn    0     7     0                    0   r[7]=NULL
10        Null           0     8     8                    0   r[8..8]=NULL; Init subquery result
11        Noop           7     3     0                    0
12        Integer        1     9     0                    0   r[9]=1
13        Ne             11    15    10                   67  if r[10]!=r[11] goto 15
14        ZeroOrNull     10    9     11                   0   r[9] = 0 OR NULL
15        MustBeInt      9     0     0                    0   LIMIT counter
16        IfNot          9     32    0                    0
17        OpenRead       4     164394  0     56             0   root=164394 iDb=0; Project_List
18        OpenRead       8     207716  0     k(3,,,)        0   root=207716 iDb=0; PL_ProjID_InsertDate_New
19        String8        0     12    0     PR0000020614   0   r[12]='PR0000020614'
20        IsNull         12    32    0                    0   if r[12]==NULL goto 32
21        Column         6     1     13                   0   r[13]= cursor 6 column 1
22        IsNull         13    32    0                    0   if r[13]==NULL goto 32
23        SeekLT         8     32    12    2              0   key=r[12..13]
24        Null           0     13    0                    0   r[13]=NULL
25          IdxLE          8     32    12    2              0   key=r[12..13]
26          DeferredSeek   8     0     4                    0   Move 4 to 8.rowid if needed
27          Column         4     7     8                    0   r[8]= cursor 4 column 7
28          NotNull        8     30    0                    0   if r[8]!=NULL goto 30
29          String8        0     8     0                    0   r[8]=''
30          DecrJumpZero   9     32    0                    0   if (--r[9])==0 goto 32
31        Prev           8     25    0                    0
32      Return         7     10    1                    0
33      Column         3     7     6                    0   r[6]= cursor 3 column 7
34      NotNull        6     36    0                    0   if r[6]!=NULL goto 36
35      String8        0     6     0                    0   r[6]=''
36      Eq             6     71    8     BINARY-8       80  if r[8]==r[6] goto 71
37      Column         6     0     14                   0   r[14]= cursor 6 column 0
38      Column         3     25    15                   0   r[15]= cursor 3 column 25
39      Column         6     1     16                   0   r[16]= cursor 6 column 1
40      String8        0     17    0     Finish_Date    0   r[17]='Finish_Date'
41      BeginSubrtn    0     20    0                    0   r[20]=NULL
42        Null           0     21    21                   0   r[21..21]=NULL; Init subquery result
43        Noop           9     3     0                    0
44        Integer        1     22    0                    0   r[22]=1
45        Ne             11    47    10                   67  if r[10]!=r[11] goto 47
46        ZeroOrNull     10    22    11                   0   r[22] = 0 OR NULL
47        MustBeInt      22    0     0                    0   LIMIT counter
48        IfNot          22    64    0                    0
49        OpenRead       4     164394  0     56             0   root=164394 iDb=0; Project_List
50        OpenRead       10    207716  0     k(3,,,)        0   root=207716 iDb=0; PL_ProjID_InsertDate_New
51        Column         6     0     23                   0   r[23]= cursor 6 column 0
52        IsNull         23    64    0                    0   if r[23]==NULL goto 64
53        Column         6     1     24                   0   r[24]= cursor 6 column 1
54        IsNull         24    64    0                    0   if r[24]==NULL goto 64
55        SeekLT         10    64    23    2              0   key=r[23..24]
56        Null           0     24    0                    0   r[24]=NULL
57          IdxLE          10    64    23    2              0   key=r[23..24]
58          DeferredSeek   10    0     4                    0   Move 4 to 10.rowid if needed
59          Column         4     7     21                   0   r[21]= cursor 4 column 7
60          NotNull        21    62    0                    0   if r[21]!=NULL goto 62
61          String8        0     21    0                    0   r[21]=''
62          DecrJumpZero   22    64    0                    0   if (--r[22])==0 goto 64
63        Prev           10    57    0                    0
64      Return         20    42    1                    0
65      Copy           21    18    0                    0   r[18]=r[21]
66      Column         3     7     19                   0   r[19]= cursor 3 column 7
67      NotNull        19    69    0                    0   if r[19]!=NULL goto 69
68      String8        0     19    0                    0   r[19]=''
69      ClrSubtype     19    0     0                    0   r[19].subtype = 0
70      Yield          1     0     0                    0
71    Next           6     7     0                    0
72    EndCoroutine   1     0     0                    0
73    InitCoroutine  2     167   74                   0   right SELECT
74    Noop           11    8     0                    0
75    OpenRead       1     164394  0     56             0   root=164394 iDb=0; Project_List
76    OpenRead       12    207716  0     k(3,,,)        2   root=207716 iDb=0; PL_ProjID_InsertDate_New
77    String8        0     25    0     PR0000020614   0   r[25]='PR0000020614'
78    SeekGE         12    144   25    1              0   key=r[25]
79      IdxGT          12    144   25    1              0   key=r[25]
80      DeferredSeek   12    0     1                    0   Move 1 to 12.rowid if needed
81      BeginSubrtn    0     27    0                    0   r[27]=NULL
82        Null           0     28    28                   0   r[28..28]=NULL; Init subquery result
83        Noop           13    3     0                    0
84        Integer        1     29    0                    0   r[29]=1
85        Ne             11    87    10                   67  if r[10]!=r[11] goto 87
86        ZeroOrNull     10    29    11                   0   r[29] = 0 OR NULL
87        MustBeInt      29    0     0                    0   LIMIT counter
88        IfNot          29    104   0                    0
89        OpenRead       2     164394  0     56             0   root=164394 iDb=0; Project_List
90        OpenRead       14    207716  0     k(3,,,)        0   root=207716 iDb=0; PL_ProjID_InsertDate_New
91        String8        0     30    0     PR0000020614   0   r[30]='PR0000020614'
92        IsNull         30    104   0                    0   if r[30]==NULL goto 104
93        Column         12    1     31                   0   r[31]= cursor 12 column 1
94        IsNull         31    104   0                    0   if r[31]==NULL goto 104
95        SeekLT         14    104   30    2              0   key=r[30..31]
96        Null           0     31    0                    0   r[31]=NULL
97          IdxLE          14    104   30    2              0   key=r[30..31]
98          DeferredSeek   14    0     2                    0   Move 2 to 14.rowid if needed
99          Column         2     35    35                   0   r[35]= cursor 2 column 35
100         Function       0     35    32    round(1)       0   r[32]=func(r[35])
101         Function       6     32    28    replace(3)     0   r[28]=func(r[32..34])
102         DecrJumpZero   29    104   0                    0   if (--r[29])==0 goto 104
103       Prev           14    97    0                    0
104     Return         27    82    1                    0
105     Column         1     35    39                   0   r[39]= cursor 1 column 35
106     Function       0     39    36    round(1)       0   r[36]=func(r[39])
107     Function       6     36    26    replace(3)     0   r[26]=func(r[36..38])
108     Eq             26    143   28    BINARY-8       80  if r[28]==r[26] goto 143
109     Column         12    0     40                   0   r[40]= cursor 12 column 0
110     Column         1     25    41                   0   r[41]= cursor 1 column 25
111     Column         12    1     42                   0   r[42]= cursor 12 column 1
112     String8        0     43    0     Ann_CapexP     0   r[43]='Ann_CapexP'
113     BeginSubrtn    0     46    0                    0   r[46]=NULL
114       Null           0     47    47                   0   r[47..47]=NULL; Init subquery result
115       Noop           15    3     0                    0
116       Integer        1     48    0                    0   r[48]=1
117       Ne             11    119   10                   67  if r[10]!=r[11] goto 119
118       ZeroOrNull     10    48    11                   0   r[48] = 0 OR NULL
119       MustBeInt      48    0     0                    0   LIMIT counter
120       IfNot          48    136   0                    0
121       OpenRead       2     164394  0     56             0   root=164394 iDb=0; Project_List
122       OpenRead       16    207716  0     k(3,,,)        0   root=207716 iDb=0; PL_ProjID_InsertDate_New
123       Column         12    0     49                   0   r[49]= cursor 12 column 0
124       IsNull         49    136   0                    0   if r[49]==NULL goto 136
125       Column         12    1     50                   0   r[50]= cursor 12 column 1
126       IsNull         50    136   0                    0   if r[50]==NULL goto 136
127       SeekLT         16    136   49    2              0   key=r[49..50]
128       Null           0     50    0                    0   r[50]=NULL
129         IdxLE          16    136   49    2              0   key=r[49..50]
130         DeferredSeek   16    0     2                    0   Move 2 to 16.rowid if needed
131         Column         2     35    26                   0   r[26]= cursor 2 column 35
132         Function       0     26    51    round(1)       0   r[51]=func(r[26])
133         Function       6     51    47    replace(3)     0   r[47]=func(r[51..53])
134         DecrJumpZero   48    136   0                    0   if (--r[48])==0 goto 136
135       Prev           16    129   0                    0
136     Return         46    114   1                    0
137     Copy           47    44    0                    0   r[44]=r[47]
138     Column         1     35    57                   0   r[57]= cursor 1 column 35
139     Function       0     57    54    round(1)       0   r[54]=func(r[57])
140     Function       6     54    45    replace(3)     0   r[45]=func(r[54..56])
141     ClrSubtype     45    0     0                    0   r[45].subtype = 0
142     Yield          2     0     0                    0
143   Next           12    79    0                    0
144   EndCoroutine   2     0     0                    0
145   Noop           0     0     0                    0   Output routine for A
146   ResultRow      14    6     0                    0   output=r[14..19]
147   Return         3     0     0                    0
148   Noop           0     0     0                    0   Output routine for B
149   ResultRow      40    6     0                    0   output=r[40..45]
150   Return         4     0     0                    0
151   Noop           0     0     0                    0   eof-A subroutine
152   Gosub          4     149   0                    0
153   Yield          2     172   0                    0
154   Goto           0     152   0                    0
155   Noop           0     0     0                    0   eof-B subroutine
156   Gosub          3     146   0                    0
157   Yield          1     172   0                    0
158   Goto           0     156   0                    0
159   Noop           0     0     0                    0   A-lt-B subroutine
160   Gosub          3     146   0                    0
161   Yield          1     152   0                    0
162   Goto           0     169   0                    0
163   Noop           0     0     0                    0   A-gt-B subroutine
164   Gosub          4     149   0                    0
165   Yield          2     156   0                    0
166   Goto           0     169   0                    0
167   Yield          1     153   0                    0
168   Yield          2     156   0                    0
169   Permutation    0     0     0     [2]            0
170   Compare        14    40    1     k(2,B,)        1   r[14] <-> r[40]
171   Jump           160   160   164                  0
172   Halt           0     0     0                    0
173   Transaction    0     0     1157  0              1   usesStmtJournal=0
174   Integer        1     10    0                    0   r[10]=1
175   Integer        0     11    0                    0   r[11]=0
176   String8        0     33    0     .0             0   r[33]='.0'
177   String8        0     34    0                    0   r[34]=''
178   String8        0     37    0     .0             0   r[37]='.0'
179   String8        0     38    0                    0   r[38]=''
180   String8        0     52    0     .0             0   r[52]='.0'
181   String8        0     53    0                    0   r[53]=''
182   String8        0     55    0     .0             0   r[55]='.0'
183   String8        0     56    0                    0   r[56]=''
184   Goto           0     1     0                    0

This is the explain query plan with UNION ALL on 3.40.0

QUERY PLAN
`--MERGE (UNION ALL)
   |--LEFT
   |  |--SEARCH o USING INDEX PL_ProjID_InsertDate_New (ProjID=?)
   |  |--CORRELATED SCALAR SUBQUERY 1
   |  |  `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
   |  `--CORRELATED SCALAR SUBQUERY 1
   |     `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
   `--RIGHT
      |--SEARCH o USING INDEX PL_ProjID_InsertDate_New (ProjID=?)
      |--CORRELATED SCALAR SUBQUERY 3
      |  `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
      `--CORRELATED SCALAR SUBQUERY 3
         `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)

Perhaps these will say something compared with the previous ones.

(20) By anonymous on 2022-11-24 04:24:17 in reply to 19 [link] [source]

That is interesting (by the way, explain query plan seems more understandable at the moment)!

I took all three of your explain query plan outputs to compare and MERGE (UNION ALL) seems to work like a cross join in disabling some of the flattening. (see 7.1  of https://www.sqlite.org/optoverview.html ).    I am wondering why the OFFSET 0 or WHERE clause on outer select didn't seem to effect it (or perhaps it does and I missed that 'memo'?).

I might tackle generating random entries (via series, and random) to make a test db as I would like to see what is happening here. Could you provide a schema sql so I don't have to create one from scratch (I may discard extra columns to reduce complexity)?  I understand the distributions won't be the same, however it might show me what's going on.

The summary of the three explain query plans:

3.40 union all (fast 3.40 query)
QUERY PLAN
`--MERGE (UNION ALL)
   |--LEFT
   |  |--SEARCH o USING INDEX PL_ProjID_InsertDate_New (ProjID=?)
   |  |--CORRELATED SCALAR SUBQUERY 1
   |  |  `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
   |  `--CORRELATED SCALAR SUBQUERY 1
   |     `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
   `--RIGHT
      |--SEARCH o USING INDEX PL_ProjID_InsertDate_New (ProjID=?)
      |--CORRELATED SCALAR SUBQUERY 3
      |  `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
      `--CORRELATED SCALAR SUBQUERY 3
         `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)

3.39.4 (fast query)
QUERY PLAN
|--CO-ROUTINE (subquery-4)
|  `--COMPOUND QUERY
|     |--LEFT-MOST SUBQUERY
|     |  |--SEARCH o USING INDEX PL_ProjID_BL_Start (ProjID=?)
|     |  |--CORRELATED SCALAR SUBQUERY 1
|     |  |  `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|     |  `--CORRELATED SCALAR SUBQUERY 1
|     |     `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|     `--UNION USING TEMP B-TREE
|        |--SEARCH o USING INDEX PL_ProjID_BL_Start (ProjID=?)
|        |--CORRELATED SCALAR SUBQUERY 3
|        |  `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|        `--CORRELATED SCALAR SUBQUERY 3
|           `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|--SCAN (subquery-4)
`--USE TEMP B-TREE FOR ORDER BY

3.40 slow query
QUERY PLAN
|--CO-ROUTINE (subquery-4)
|  `--COMPOUND QUERY
|     |--LEFT-MOST SUBQUERY
|     |  |--SCAN o
|     |  `--CORRELATED SCALAR SUBQUERY 1
|     |     `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|     `--UNION USING TEMP B-TREE
|        |--SCAN o
|        `--CORRELATED SCALAR SUBQUERY 3
|           `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|--SCAN (subquery-4)
`--USE TEMP B-TREE FOR ORDER BY

My general thinking is it might be worth getting to the bottom of this rather than stopping because it now 'works' as it seems a fragile position.  So if you want to help investigate further, a schema sql create script would save me some time.

thanks.

(30) By jose isaias cabrera (jicman) on 2022-11-25 17:03:31 in reply to 20 [link] [source]

Happy belated Thanksgiving to all. I thank God for your wonderful tool, Dr. Hipp, and I am only doing good things with it. :-)

... Could you provide a schema sql so I don't have to create one from scratch (I may discard extra columns to reduce complexity)? ...

This script below will show the EXPLAIN QUERY PLANs differences/problems. It does not show the slowness of the query because there are over 165K of records on the table where the query is ran against. However, if you replicate this data with a script by just adding the same data with a different ProjID (PR0000000001 to PR0000030000), it to create enough data for some of the slowness of the query response.

CREATE TABLE Project_List
(
    ProjID, Finish_Date, BL_Finish, Updated_By, InsertDate,
    PRIMARY KEY (ProjID, Finish_Date, BL_Finish)
);
CREATE INDEX PL_ProjID_InsertDate_New ON "Project_List" (ProjID, InsertDate);

INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-09-30','','Kathy','2021-08-10_07-26-53');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-09-30','','Pamela','2021-08-17_06-28-38');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-09-30','','Barbara','2021-08-17_11-29-18');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-09-30','','Barbara','2021-08-17_13-46-26');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-09-30','','Pamela','2021-08-20_04-44-04');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-09-30','','Pamela','2021-08-25_07-31-24');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-09-30','','Barbara','2021-09-24_03-55-27');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-09-30','','Miguel','2021-10-15_03-59-51');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-09-30','','Miguel','2022-01-04_05-35-28');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-09-30','','Miguel','2022-02-08_13-04-48');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-09-30','','Externo','2022-02-16_04-59-37');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-09-30','','Jose','2022-02-22_08-20-39');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-09-30','','Pamela','2022-03-14_11-37-33');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-09-30','','Esther','2022-04-08_04-27-49');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-09-30','','Barbara','2022-04-25_05-17-31');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-09-30','','Pamela','2022-04-25_12-01-17');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-09-30','','Pamela','2022-05-13_04-05-26');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','','Externo','2022-05-20_04-12-13');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','','Miguel','2022-05-26_04-53-14');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','','Esther','2022-06-13_12-14-45');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','','Jose','2022-06-27_11-56-58');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','','Jose','2022-06-29_04-33-34');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','2023-08-31','Pamela','2022-06-29_10-09-49');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','2023-08-31','Barbara','2022-07-05_08-56-08');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','2023-08-31','Daniel','2022-07-06_07-09-57');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','2023-08-31','Daniel','2022-07-25_05-32-23');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','2023-08-31','Jose','2022-08-05_04-50-11');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','2023-08-31','Barbara','2022-08-10_04-15-47');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','2023-08-31','Barbara','2022-08-22_10-59-48');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','2023-08-31','Pamela','2022-08-23_11-48-46');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','2023-08-31','Externo','2022-08-31_05-47-18');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','2023-08-31','Jose','2022-09-06_05-56-24');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','2023-08-31','Esther','2022-09-13_09-41-39');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','2023-08-31','Barbara','2022-09-16_07-25-06');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','2023-08-31','Pamela','2022-10-03_12-19-50');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','2023-08-31','Barbara','2022-10-17_07-02-07');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','2023-08-31','Barbara','2022-10-31_05-09-06');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-08-31','2023-08-31','Jose','2022-11-03_05-57-56');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-09-30','2023-08-31','Donna','2022-11-10_08-57-13');
INSERT INTO Project_List(ProjID,Finish_Date,BL_Finish,Updated_By,InsertDate) VALUES('PR0000020614','2023-09-30','2023-09-30','Barbara','2022-11-16_11-48-08');

And use this query,

        SELECT ProjID,
               Updated_By,
               InsertDate,
               var,
               oldv,
               newv
        FROM
        (
        
            SELECT ProjID,
                   Updated_By,
                   InsertDate,
                   'BL_Finish' as var,
                   (
                      SELECT
                        coalesce(BL_Finish,'') FROM Project_List
                      WHERE ProjID = o.ProjID
                      AND InsertDate < o.InsertDate
                      ORDER BY InsertDate DESC
                      LIMIT 1
                    ) AS oldv,
                   coalesce(BL_Finish,'') as newv
                   FROM Project_List as o
            UNION
        
            SELECT ProjID,
                   Updated_By,
                   InsertDate,
                   'Finish_Date' as var,
                   (
                      SELECT
                        coalesce(Finish_Date,'') FROM Project_List
                      WHERE ProjID = o.ProjID
                      AND InsertDate < o.InsertDate
                      ORDER BY InsertDate DESC
                      LIMIT 1
                    ) AS oldv,
                   coalesce(Finish_Date,'') as newv
                   FROM Project_List as o
            
      )
      WHERE oldv <> newv 
      AND ProjID = 'PR0000020614' 
ORDER BY InsertDate ASC;

Just change the UNION to UNION ALL, and as Keith also suggested, moving the WHEREs into each SELECT, that also works fast. I am using Windows 10, so, I have downloaded the SQLite tools from the download site, and I have unzipped them, and opened a few command prompts and just CD to the parent directory of each tool and run SQLite3 executable from there and can test them. Just an FYI. Thanks all for your support.

(33) By Keith Medcalf (kmedcalf) on 2022-11-25 17:54:09 in reply to 30 [link] [source]

You are missing something from the primary key because running the script as shown generates many primary key constraint violations. Making the whole caboodle the primary key seems to allow the data to be inserted.

(34) By jose isaias cabrera (jicman) on 2022-11-25 17:58:41 in reply to 33 [link] [source]

Yeah, that is correct. The problem is that there are a bunch of other fields that I took off. And the constraints are ok in the original, but with just these 3 fields, they are popping out. I should have taken those out. Sorry.

(43) By anonymous on 2022-11-25 22:13:18 in reply to 30 [link] [source]

The posts locked in moderation were my (lame) guess at your schema. Once I was able to get the query plans to match your reported behavior I tried to reduce them. I was able duplicate your resulting query plans on the versions you posted.

In the meantime Dr Hipp and Keith (who are always miles ahead) were able to pin point an existing test case (the INTERSECT example). I refactored the reduced query into something more closely matching that, but keeping the same query plan.

At this point the reduced query is moot (now that we have your test rigging).

Thus, not sure how useful any of my previous posts are at this point, but admins can feel free to delete any to all of those posts for clarity (or edit).

regards

(44) By jose isaias cabrera (jicman) on 2022-11-26 01:57:18 in reply to 43 [link] [source]

Dr. Hipp fixed the problem that I was having. Thanks.

josé

(46) By anonymous on 2022-11-26 03:45:09 in reply to 44 [link] [source]

Still seeing anomalies when second part of union is a scan constant row (see my reply to Dr Hipp for an example below).

The example I provided to Dr Hipp was further reduction of your query (beyond my initial crack at it), isolating the to one side of the union. This appears to mean that if you had a union that added a row with only constants, that it reverts to the slow query you experienced (in otherwords, works like 3.40).

I could be wrong (and often am), but I think the story isn't quite finished on this (would love to be wrong).

(21) By anonymous on 2022-11-24 05:52:54 in reply to 19 [link] [source]

I worked up a small demo (you can use it on the fiddle).

I can replicate the query plans you saw, both with UNION ALL, and with UNION.  The  INDEXED BY clause also get's good results.  However the OFFSET 0 does not.

This is the script (the creation script is commented out so I can change the query  on the same dataset).  Also, no analyse has been run.. it operates as you descibed out of the box:

---- script

 create table Project_List (
 ProjID int,
 Updated_By text,
 InsertDate TEXT,
 Finish_Date TEXT,
 Ann_CapexP real);

 create index PL_ProjID_InsertDate_New on Project_List (ProjID, InsertDate);

 insert into Project_List SELECT
 value as ProjID,
 hex(randomblob(10)) as Updated_By,
 abs(random()) % (10000000 - 1) + 1 as Ann_CapexP,
 cast (round(julianday('now')) +3 as integer) as InsertDate,
 cast (round(julianday('now')) +5 as integer) as Finish_Date
 FROM generate_series(1,100,1);

 select count(*) from Project_List;

The query with poor performance:

 explain query plan 
SELECT ProjID,
               Updated_By,
               InsertDate,
               var,
               oldv,
               newv
        FROM
        (
        
            SELECT ProjID,
                   Updated_By,
                   InsertDate,
                   'Finish_Date' as var,
                   (
                      SELECT
                        coalesce(Finish_Date,'') FROM Project_List
                      WHERE ProjID = o.ProjID
                      AND InsertDate < o.InsertDate
                      ORDER BY InsertDate DESC
                      LIMIT 1
                    ) AS oldv,
                   coalesce(Finish_Date,'') as newv
                   FROM Project_List  as o -- INDEXED BY PL_ProjID_InsertDate_New
            UNION  ALL
        
              SELECT ProjID,
                     Updated_By,
                     InsertDate,
                     'Ann_CapexP' as var,
                     (
                        SELECT
                          replace(round(Ann_CapexP),'.0','') FROM Project_List WHERE
                        ProjID = o.ProjID
                        AND InsertDate < o.InsertDate
                        ORDER BY InsertDate DESC
                        LIMIT 1
                      ) AS oldv,
                      replace(round(Ann_CapexP),'.0','') as newv
                     FROM Project_List as o -- INDEXED BY PL_ProjID_InsertDate_New
              
      )
      WHERE oldv <> newv 
      AND ProjID = 'PR0000020614' 
ORDER BY InsertDate ASC;


---- output of 3.40 online fiddle with original sql
--------output -----
QUERY PLAN
|--CO-ROUTINE (subquery-4)
|  `--COMPOUND QUERY
|     |--LEFT-MOST SUBQUERY
|     |  |--SCAN o
|     |  `--CORRELATED SCALAR SUBQUERY 1
|     |     `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|     `--UNION USING TEMP B-TREE
|        |--SCAN o
|        `--CORRELATED SCALAR SUBQUERY 3
|           `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|--SCAN (subquery-4)
`--USE TEMP B-TREE FOR ORDER BY


----- using union all instead of union output (uncomment -- ALL)
QUERY PLAN
`--MERGE (UNION ALL)
   |--LEFT
   |  |--SEARCH o USING INDEX PL_ProjID_InsertDate_New (ProjID=?)
   |  |--CORRELATED SCALAR SUBQUERY 1
   |  |  `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
   |  `--CORRELATED SCALAR SUBQUERY 1
   |     `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
   `--RIGHT
      |--SEARCH o USING INDEX PL_ProjID_InsertDate_New (ProjID=?)
      |--CORRELATED SCALAR SUBQUERY 3
      |  `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
      `--CORRELATED SCALAR SUBQUERY 3
         `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)


------ using INDEXED BY on two nested sub-queries (uncomment INDEXED BY)
QUERY PLAN
|--CO-ROUTINE (subquery-4)
|  `--COMPOUND QUERY
|     |--LEFT-MOST SUBQUERY
|     |  |--SCAN o USING INDEX PL_ProjID_InsertDate_New
|     |  `--CORRELATED SCALAR SUBQUERY 1
|     |     `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|     `--UNION USING TEMP B-TREE
|        |--SCAN o USING INDEX PL_ProjID_InsertDate_New
|        `--CORRELATED SCALAR SUBQUERY 3
|           `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|--SCAN (subquery-4)
`--USE TEMP B-TREE FOR ORDER BY

So there are few ways to get the query to use the index, perhaps the most heavy handed is INDEXED BY because it makes it clear what you trying to do.  There also are query plan performance differences.  My rough guess from random runs is that the UNION ALL is faster than the INDEXED BY but I only used a small number of records.

Also, INDEXED BY seems to more closely match the 3.39.4 query plan (exactly?).

I can distill the above down to a smaller test case if Dr Hipp thinks it's worthwhile to take a look at it.

(22) By anonymous on 2022-11-24 06:39:24 in reply to 19 [link] [source]

Please ignore the section in my post below about INDEXED BY performance as I noticed the query plan in 3.40 only changed to SCAN USING INDEX rather than SEARCH USING INDEX.

I clearly have more work to do on that query so I'll have another look tomorrow evening.

Once(if?) I have that sorted out I will try to take a crack at a minimised schema and query that shows the same behaviour. And then try some performance comparisons (it is too early for that IMO).

What I appear to be able to assert is that the 3.40 query plan of union all version is different than the 3.39.4 original problematic query plan by OP (The MERGE (UNION ALL) rather than co-routine). I will double check that when I pick this up again.

If someone wants to sort it out before I get to it, please feel welcome! :) I used the online fiddle and scripts I posted.

(23) By Richard Hipp (drh) on 2022-11-24 12:35:36 in reply to 18 [link] [source]

See https://sqlite.org/src/artifact/8a709a8e19?ln=5012-5017 and https://sqlite.org/src/info/346a3b12b861ce7b.

The following is a simplification of the script that the dbsqlfuzz fuzzer found that demonstrated the problem:

CREATE TABLE t1(a,b,c COLLATE NOCASE);
INSERT INTO t1 VALUES(1,'a','a');
INSERT INTO t1 VALUES(9.9000000000000003552,'b','B');
INSERT INTO t1 VALUES(NULL,'C','c');
INSERT INTO t1 VALUES('hello','d','D');
INSERT INTO t1 VALUES(X'616263','e','e');

.mode qbox
.echo on
SELECT a,b,CASE c WHEN 943 THEN 967 WHEN 897 THEN 533 ELSE b END FROM t1 INTERSECT SELECT a,b,c FROM t1 WHERE -3.7e+921*11<>b
ORDER BY a,b,c;

SELECT * FROM (SELECT a,b,CASE c WHEN 943 THEN 967 WHEN 897 THEN 533 ELSE b END FROM t1 INTERSECT SELECT a,b,c FROM t1 WHERE -3.7e+921*11<>b
ORDER BY a,b,c) WHERE "a" ISNULL AND "b"='C' AND "CASE c WHEN 943 THEN 967 WHEN 897 THEN 533 ELSE b END"='C'

The second query is a wrapper around the first query that attempts to pick off a single row of the first query. However, it returns no rows. Try it using SQLite version 3.39 or earlier and you will get the wrong answer. Do the same on 3.40 and the correct answer comes out. The fix we put in place for this problem was the 346a3b12b861ce7b patch.

(24.1) By Keith Medcalf (kmedcalf) on 2022-11-24 16:14:34 edited from 24.0 in reply to 23 [link] [source]

Very interesting, except both versions 3.39.0 and 3.38.0 return correct responses.

SQLite version 3.38.0 2022-02-22 18:58:40
Enter ".help" for usage hints.
sqlite> .version
SQLite 3.38.0 2022-02-22 18:58:40 40fa792d359f84c3b9e9d6623743e1a59826274e221df1bde8f47086968a1bab
zlib version 1.2.11
gcc-5.2.0
sqlite> CREATE TABLE t1(a,b,c COLLATE NOCASE);
sqlite> INSERT INTO t1 VALUES(1,'a','a');
sqlite> INSERT INTO t1 VALUES(9.9000000000000003552,'b','B');
sqlite> INSERT INTO t1 VALUES(NULL,'C','c');
sqlite> INSERT INTO t1 VALUES('hello','d','D');
sqlite> INSERT INTO t1 VALUES(X'616263','e','e');
sqlite>
sqlite> .mode qbox
sqlite> .eqp on
sqlite>    SELECT a,b,CASE c WHEN 943 THEN 967 WHEN 897 THEN 533 ELSE b END
   ...>      FROM t1
   ...> INTERSECT
   ...>    SELECT a,b,c
   ...>      FROM t1
   ...>     WHERE -3.7e+921*11<>b
   ...> ORDER BY a,b,c;
QUERY PLAN
`--MERGE (INTERSECT)
   |--LEFT
   |  |--SCAN t1
   |  `--USE TEMP B-TREE FOR ORDER BY
   `--RIGHT
      |--SCAN t1
      `--USE TEMP B-TREE FOR ORDER BY
┌───────────┬─────┬───────────────────────────────────────────────────────┐
│     a     │  b  │ CASE c WHEN 943 THEN 967 WHEN 897 THEN 533 ELSE b END │
├───────────┼─────┼───────────────────────────────────────────────────────┤
│ NULL      │ 'C' │ 'C'                                                   │
│ 1         │ 'a' │ 'a'                                                   │
│ 9.9       │ 'b' │ 'b'                                                   │
│ 'hello'   │ 'd' │ 'd'                                                   │
│ x'616263' │ 'e' │ 'e'                                                   │
└───────────┴─────┴───────────────────────────────────────────────────────┘
sqlite>
sqlite> SELECT *
   ...>   FROM (
   ...>            SELECT a,b,CASE c WHEN 943 THEN 967 WHEN 897 THEN 533 ELSE b END
   ...>              FROM t1
   ...>         INTERSECT
   ...>            SELECT a,b,c
   ...>              FROM t1
   ...>             WHERE -3.7e+921*11<>b
   ...>          ORDER BY a,b,c
   ...>        )
   ...>  WHERE "a" ISNULL
   ...>    AND "b"='C'
   ...>    AND "CASE c WHEN 943 THEN 967 WHEN 897 THEN 533 ELSE b END"='C'
   ...> ;
QUERY PLAN
|--CO-ROUTINE SUBQUERY 2
|  `--MERGE (INTERSECT)
|     |--LEFT
|     |  |--SCAN t1
|     |  `--USE TEMP B-TREE FOR RIGHT PART OF ORDER BY
|     `--RIGHT
|        `--SCAN t1
`--SCAN SUBQUERY 2
┌──────┬─────┬───────────────────────────────────────────────────────┐
│  a   │  b  │ CASE c WHEN 943 THEN 967 WHEN 897 THEN 533 ELSE b END │
├──────┼─────┼───────────────────────────────────────────────────────┤
│ NULL │ 'C' │ 'C'                                                   │
└──────┴─────┴───────────────────────────────────────────────────────┘
sqlite>

Unless, of course, the executables for these versions as located on your website are not actually what they purport to be.

(25.1) By Keith Medcalf (kmedcalf) on 2022-11-24 18:06:44 edited from 25.0 in reply to 23 [link] [source]

Your simplified example is bogus (ie, does not demonstrate any issue). However, the actual testcase does in fact demonstrate an issue in that the specific subquery has a difficulty when a de-collationified expression is pushed down into the subquery.

(26) By Richard Hipp (drh) on 2022-11-24 18:21:57 in reply to 25.1 [link] [source]

The problem was introduced by a different bug-fix that happened earlier in the 3.40 development cycle. So it never appeared in a release. The previous bug-fix was at check-in ed14863dd72e35fa. So the example above only failed for the 69 check-ins over 12 days in mid-October.

I'll investigate and see how long the previous bug was valid for....

(27) By Richard Hipp (drh) on 2022-11-24 18:42:48 in reply to 26 [link] [source]

An alternative demonstration of the problem is this script:

CREATE TABLE t1(a,b,c COLLATE NOCASE);
INSERT INTO t1 VALUES(1,'a','a');
INSERT INTO t1 VALUES(9.9000000000000003552,'b','B');
INSERT INTO t1 VALUES(NULL,'C','c');
INSERT INTO t1 VALUES('hello','d','D');
INSERT INTO t1 VALUES(X'616263','e','e');

.echo on
.mode qbox
SELECT a,b,c FROM t1 INTERSECT SELECT a,b, b FROM t1 WHERE 'eT"3qRkL+oJMJjQ9z0'>=b
ORDER BY a,b,c;

SELECT * FROM (SELECT a,b,c FROM t1 INTERSECT SELECT a,b, b FROM t1 WHERE 'eT"3qRkL+oJMJjQ9z0'>=b
ORDER BY a,b,c) WHERE "a" ISNULL AND "b"='C' AND "c"='c';

The second query should return one row. But it returns zero rows ever since the push-down optimization was added for version 3.8.11 in 2015. Recent enhancements to the dbsqlfuzz fuzzer found this problem and brought it to your attention.

(29) By anonymous on 2022-11-25 06:56:40 in reply to 27 [link] [source]

To show the query plan difference I added:

create index i on t1 (a,b,c);

With the index, from what I checked, the show query plan details the same behaviour of the UNION example across the various sqlite versions and matches closely with the reduced query I posted earlier this evening.

Thus, it appears to me that this is indeed the issue you identified with a push down optimisation being incorrectly applied (as per you post above).

summary: Previously, the push down query optimisation that was in place was incorrect. in 3.40 it fixes correctness, but the queries that depended on that optimization are going to be (much) slower until re-written?

(28) By anonymous on 2022-11-25 06:17:27 in reply to 25.1 [link] [source]

(Apologies ahead of time if this is noise.)

I reduced the original Project_List query to the following:

drop  table if exists t ;
 create table t (a text);
 create index i on t (a);
explain query plan
 SELECT a 
        FROM
        (
        
            SELECT a 
                   FROM t as o 
            UNION
        
              SELECT 1 as a 
       
      )
       WHERE 
        a = 1;

then tested on three versions as follows:


3.34.1:
CO-ROUTINE 2
COMPOUND QUERY
LEFT-MOST SUBQUERY
SEARCH TABLE t AS o USING COVERING INDEX i (a=?)
UNION USING TEMP B-TREE
SCAN CONSTANT ROW

3.39.1:
CO-ROUTINE (subquery-2)
COMPOUND QUERY
LEFT-MOST SUBQUERY
SEARCH o USING COVERING INDEX i (a=?)
UNION USING TEMP B-TREE
SCAN CONSTANT ROW
SCAN (subquery-2)

3.40.0 (online fiddle and a custom 3.40.0 build with QPSG enabled)
QUERY PLAN
|--CO-ROUTINE (subquery-2)
|  `--COMPOUND QUERY
|     |--LEFT-MOST SUBQUERY
|     |  `--SCAN o
|     `--UNION USING TEMP B-TREE
|        `--SCAN CONSTANT ROW
`--SCAN (subquery-2)

It's clearly NOT pushing the where into the subquery.  What isn't clear is whether it should be.

As Dr Hipp has suggested that it's related to the INTERSECT example , I will look at refactoring the query to INTERSECT and comparing both output, and query plans across the above sqlite versions.

(admins: feel free to delete my partially retracted 'in limbo' post yesterday, the script and query was overly complicated, leading to mistakes and dead end investigations).

(15) By jose isaias cabrera (jicman) on 2022-11-23 19:09:04 in reply to 12 [link] [source]

This change above grabs 125 records in 0.1 seconds using SQLiteVer: 3.40.0 2022-11-16 12:10:08. However, if I use the original one, and replace UNION with UNION ALL, the result is 125 records in 0.09 seconds using the same SQLiteVer: 3.40.0 2022-11-16 12:10:08. So, I will keep the UNION ALL change. Thanks.

(31) By Richard Hipp (drh) on 2022-11-25 17:07:24 in reply to 1 [link] [source]

Can you retry your original query using the latest trunk check-in of SQLite and let me know whether or not the performance issue has been resolved?

(32.1) By jose isaias cabrera (jicman) on 2022-11-25 18:02:02 edited from 32.0 in reply to 31 [link] [source]

Where is sqlite3.c on the trunk? I usually download the snapshots, and in those, the sqlite3.c is right in the top directory. I usually use this command,

i686-w64-mingw32-gcc -shared -static-libgcc sqlite3.c -o sqlite3.dll

But, I can't find the sqlite3.c source.

$ pwd
/home/e608313/b/sqlite/SQLite-adbca344

$ ls -l
total 2.6M
-rw-r--r-- 1 e608313 Domain Users  263 Nov 25 12:05 LICENSE.md
-rw-r--r-- 1 e608313 Domain Users  49K Nov 25 12:36 Makefile
-rw-r--r-- 1 e608313 Domain Users  49K Nov 25 12:05 Makefile.in
-rw-r--r-- 1 e608313 Domain Users 3.4K Nov 25 12:05 Makefile.linux-gcc
-rw-r--r-- 1 e608313 Domain Users  81K Nov 25 12:05 Makefile.msc
-rw-r--r-- 1 e608313 Domain Users  16K Nov 25 12:05 README.md
-rw-r--r-- 1 e608313 Domain Users    7 Nov 25 12:05 VERSION
-rw-r--r-- 1 e608313 Domain Users 276K Nov 25 12:05 aclocal.m4
drwxr-xr-x 1 e608313 Domain Users    0 Nov 25 12:05 art/
drwxr-xr-x 1 e608313 Domain Users    0 Nov 25 12:05 autoconf/
-rw-r--r-- 1 e608313 Domain Users  48K Nov 25 12:05 config.guess
-rw-r--r-- 1 e608313 Domain Users  79K Nov 25 12:36 config.log
-rwxr-xr-x 1 e608313 Domain Users  55K Nov 25 12:36 config.status*
-rw-r--r-- 1 e608313 Domain Users  31K Nov 25 12:05 config.sub
-rwxr-xr-x 1 e608313 Domain Users 402K Nov 25 12:05 configure*
-rw-r--r-- 1 e608313 Domain Users  24K Nov 25 12:05 configure.ac
drwxr-xr-x 1 e608313 Domain Users    0 Nov 25 12:05 contrib/
drwxr-xr-x 1 e608313 Domain Users    0 Nov 25 12:05 doc/
drwxr-xr-x 1 e608313 Domain Users    0 Nov 25 12:05 ext/
-rwxr-xr-x 1 e608313 Domain Users 5.5K Nov 25 12:05 install-sh*
-rwxr-xr-x 1 e608313 Domain Users 255K Nov 25 12:36 libtool*
-rw-r--r-- 1 e608313 Domain Users 240K Nov 25 12:05 ltmain.sh
-rw-r--r-- 1 e608313 Domain Users 1.6K Nov 25 12:05 magic.txt
-rw-r--r-- 1 e608313 Domain Users  37K Nov 25 12:05 main.mk
-rw-r--r-- 1 e608313 Domain Users 160K Nov 25 12:05 manifest
-rw-r--r-- 1 e608313 Domain Users   65 Nov 25 12:05 manifest.uuid
-rw-r--r-- 1 e608313 Domain Users  937 Nov 25 12:05 mkso.sh
drwxr-xr-x 1 e608313 Domain Users    0 Nov 25 12:05 mptest/
-rw-r--r-- 1 e608313 Domain Users 1.8K Nov 25 12:05 spec.template
-rw-r--r-- 1 e608313 Domain Users  258 Nov 25 12:05 sqlite.pc.in
-rw-r--r-- 1 e608313 Domain Users 8.8K Nov 25 12:05 sqlite3.1
-rw-r--r-- 1 e608313 Domain Users  263 Nov 25 12:36 sqlite3.pc
-rw-r--r-- 1 e608313 Domain Users  267 Nov 25 12:05 sqlite3.pc.in
-rw-r--r-- 1 e608313 Domain Users 4.0K Nov 25 12:29 sqlite_cfg.h
-rw-r--r-- 1 e608313 Domain Users 3.7K Nov 25 12:05 sqlite_cfg.h.in
drwxr-xr-x 1 e608313 Domain Users    0 Nov 25 12:05 src/
drwxr-xr-x 1 e608313 Domain Users    0 Nov 25 12:05 test/
drwxr-xr-x 1 e608313 Domain Users    0 Nov 25 12:05 tool/
drwxr-xr-x 1 e608313 Domain Users    0 Nov 25 12:05 vsixtest/

(35) By jose isaias cabrera (jicman) on 2022-11-25 18:24:55 in reply to 32.1 [link] [source]

Never mind... I had to run,

./configure
make

The problem is no longer there. I grabbed 125 records in 0.03 seconds. SQLiteVer: 3.41.0 2022-11-25 17:05:55. I am going to build the sqlite3 tool to send EXPLAIN QUERY PLAN.

(37) By jose isaias cabrera (jicman) on 2022-11-25 18:40:37 in reply to 35 [link] [source]

New EXPLAIN QUERY PLAN on the same query as previously stated using SQLite version 3.41.0 2022-11-25 17:05:55 is:

QUERY PLAN
|--CO-ROUTINE (subquery-4)
|  `--COMPOUND QUERY
|     |--LEFT-MOST SUBQUERY
|     |  |--SEARCH o USING INDEX PL_ProjID_BL_Start (ProjID=?)
|     |  |--CORRELATED SCALAR SUBQUERY 1
|     |  |  `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|     |  `--CORRELATED SCALAR SUBQUERY 1
|     |     `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|     `--UNION USING TEMP B-TREE
|        |--SEARCH o USING INDEX PL_ProjID_BL_Start (ProjID=?)
|        |--CORRELATED SCALAR SUBQUERY 3
|        |  `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|        `--CORRELATED SCALAR SUBQUERY 3
|           `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?)
|--SCAN (subquery-4)
`--USE TEMP B-TREE FOR ORDER BY

(36) By Keith Medcalf (kmedcalf) on 2022-11-25 18:25:41 in reply to 31 [link] [source]

I get the following:

Without latest checkin:

>sqlite < testit.sql
QUERY PLAN
|--CO-ROUTINE (subquery-4)
|  `--COMPOUND QUERY
|     |--LEFT-MOST SUBQUERY
|     |  |--SCAN o (~1048576 rows)
|     |  `--CORRELATED SCALAR SUBQUERY 1
|     |     `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?) (~2 rows)
|     `--UNION USING TEMP B-TREE
|        |--SCAN o (~1048576 rows)
|        `--CORRELATED SCALAR SUBQUERY 3
|           `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?) (~2 rows)
|--SCAN (subquery-4) (~524288 rows)
`--USE TEMP B-TREE FOR ORDER BY
┌────────────────┬────────────┬───────────────────────┬───────────────┬──────────────┬──────────────┐
│     ProjID     │ Updated_By │      InsertDate       │      var      │     oldv     │     newv     │
├────────────────┼────────────┼───────────────────────┼───────────────┼──────────────┼──────────────┤
│ 'PR0000020614' │ 'Externo'  │ '2022-05-20_04-12-13' │ 'Finish_Date' │ '2023-09-30' │ '2023-08-31' │
│ 'PR0000020614' │ 'Pamela'   │ '2022-06-29_10-09-49' │ 'BL_Finish'   │ ''           │ '2023-08-31' │
│ 'PR0000020614' │ 'Donna'    │ '2022-11-10_08-57-13' │ 'Finish_Date' │ '2023-08-31' │ '2023-09-30' │
│ 'PR0000020614' │ 'Barbara'  │ '2022-11-16_11-48-08' │ 'BL_Finish'   │ '2023-08-31' │ '2023-09-30' │
└────────────────┴────────────┴───────────────────────┴───────────────┴──────────────┴──────────────┘
VM-steps: 129600104
Run Time: real 6.681 user 6.343750 sys 0.328125
QUERY PLAN
`--MERGE (UNION ALL)
   |--LEFT
   |  |--SEARCH o USING INDEX PL_ProjID_InsertDate_New (ProjID=?) (~9 rows)
   |  |--CORRELATED SCALAR SUBQUERY 1
   |  |  `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?) (~2 rows)
   |  `--CORRELATED SCALAR SUBQUERY 1
   |     `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?) (~2 rows)
   `--RIGHT
      |--SEARCH o USING INDEX PL_ProjID_InsertDate_New (ProjID=?) (~9 rows)
      |--CORRELATED SCALAR SUBQUERY 3
      |  `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?) (~2 rows)
      `--CORRELATED SCALAR SUBQUERY 3
         `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?) (~2 rows)
┌────────────────┬────────────┬───────────────────────┬───────────────┬──────────────┬──────────────┐
│     ProjID     │ Updated_By │      InsertDate       │      var      │     oldv     │     newv     │
├────────────────┼────────────┼───────────────────────┼───────────────┼──────────────┼──────────────┤
│ 'PR0000020614' │ 'Externo'  │ '2022-05-20_04-12-13' │ 'Finish_Date' │ '2023-09-30' │ '2023-08-31' │
│ 'PR0000020614' │ 'Pamela'   │ '2022-06-29_10-09-49' │ 'BL_Finish'   │ ''           │ '2023-08-31' │
│ 'PR0000020614' │ 'Donna'    │ '2022-11-10_08-57-13' │ 'Finish_Date' │ '2023-08-31' │ '2023-09-30' │
│ 'PR0000020614' │ 'Barbara'  │ '2022-11-16_11-48-08' │ 'BL_Finish'   │ '2023-08-31' │ '2023-09-30' │
└────────────────┴────────────┴───────────────────────┴───────────────┴──────────────┴──────────────┘
VM-steps: 3013
Run Time: real 0.011 user 0.000000 sys 0.000000

With the latest checkin:

>sqlite3 < testit.sql
QUERY PLAN
|--CO-ROUTINE (subquery-4)
|  `--COMPOUND QUERY
|     |--LEFT-MOST SUBQUERY
|     |  |--SEARCH o USING COVERING INDEX sqlite_autoindex_Project_List_1 (ProjID=?) (~9 rows)
|     |  |--CORRELATED SCALAR SUBQUERY 1
|     |  |  `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?) (~2 rows)
|     |  `--CORRELATED SCALAR SUBQUERY 1
|     |     `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?) (~2 rows)
|     `--UNION USING TEMP B-TREE
|        |--SEARCH o USING COVERING INDEX sqlite_autoindex_Project_List_1 (ProjID=?) (~9 rows)
|        |--CORRELATED SCALAR SUBQUERY 3
|        |  `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?) (~2 rows)
|        `--CORRELATED SCALAR SUBQUERY 3
|           `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?) (~2 rows)
|--SCAN (subquery-4) (~4 rows)
`--USE TEMP B-TREE FOR ORDER BY
┌────────────────┬────────────┬───────────────────────┬───────────────┬──────────────┬──────────────┐
│     ProjID     │ Updated_By │      InsertDate       │      var      │     oldv     │     newv     │
├────────────────┼────────────┼───────────────────────┼───────────────┼──────────────┼──────────────┤
│ 'PR0000020614' │ 'Externo'  │ '2022-05-20_04-12-13' │ 'Finish_Date' │ '2023-09-30' │ '2023-08-31' │
│ 'PR0000020614' │ 'Pamela'   │ '2022-06-29_10-09-49' │ 'BL_Finish'   │ ''           │ '2023-08-31' │
│ 'PR0000020614' │ 'Donna'    │ '2022-11-10_08-57-13' │ 'Finish_Date' │ '2023-08-31' │ '2023-09-30' │
│ 'PR0000020614' │ 'Barbara'  │ '2022-11-16_11-48-08' │ 'BL_Finish'   │ '2023-08-31' │ '2023-09-30' │
└────────────────┴────────────┴───────────────────────┴───────────────┴──────────────┴──────────────┘
VM-steps: 3044
Run Time: real 0.008 user 0.000000 sys 0.000000
QUERY PLAN
`--MERGE (UNION ALL)
   |--LEFT
   |  |--SEARCH o USING INDEX PL_ProjID_InsertDate_New (ProjID=?) (~9 rows)
   |  |--CORRELATED SCALAR SUBQUERY 1
   |  |  `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?) (~2 rows)
   |  `--CORRELATED SCALAR SUBQUERY 1
   |     `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?) (~2 rows)
   `--RIGHT
      |--SEARCH o USING INDEX PL_ProjID_InsertDate_New (ProjID=?) (~9 rows)
      |--CORRELATED SCALAR SUBQUERY 3
      |  `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?) (~2 rows)
      `--CORRELATED SCALAR SUBQUERY 3
         `--SEARCH Project_List USING INDEX PL_ProjID_InsertDate_New (ProjID=? AND InsertDate<?) (~2 rows)
┌────────────────┬────────────┬───────────────────────┬───────────────┬──────────────┬──────────────┐
│     ProjID     │ Updated_By │      InsertDate       │      var      │     oldv     │     newv     │
├────────────────┼────────────┼───────────────────────┼───────────────┼──────────────┼──────────────┤
│ 'PR0000020614' │ 'Externo'  │ '2022-05-20_04-12-13' │ 'Finish_Date' │ '2023-09-30' │ '2023-08-31' │
│ 'PR0000020614' │ 'Pamela'   │ '2022-06-29_10-09-49' │ 'BL_Finish'   │ ''           │ '2023-08-31' │
│ 'PR0000020614' │ 'Donna'    │ '2022-11-10_08-57-13' │ 'Finish_Date' │ '2023-08-31' │ '2023-09-30' │
│ 'PR0000020614' │ 'Barbara'  │ '2022-11-16_11-48-08' │ 'BL_Finish'   │ '2023-08-31' │ '2023-09-30' │
└────────────────┴────────────┴───────────────────────┴───────────────┴──────────────┴──────────────┘
VM-steps: 3013
Run Time: real 0.007 user 0.000000 sys 0.000000

Using this schema and the data posted above duplicated 30000 times.

CREATE TABLE Project_List
(
    ProjID, Finish_Date, BL_Finish, Updated_By, InsertDate,
    PRIMARY KEY (ProjID, Finish_Date, BL_Finish, Updated_By, InsertDate)
);
CREATE INDEX PL_ProjID_InsertDate_New ON "Project_List" (ProjID, InsertDate);

(38) By jose isaias cabrera (jicman) on 2022-11-25 18:50:39 in reply to 36 [link] [source]

Thanks, Keith. How do you get VM-Steps to show with queries? I have .timer on, but VM-Step is not on mine.

I would have expected that UNION ALL would be slower, or would, at least, have more VM-Steps. Hmmmm. So, UNION ALL is the winner.

(39.1) By Larry Brasfield (larrybr) on 2022-11-25 19:06:24 edited from 39.0 in reply to 38 [link] [source]

Plain UNION has to do extra work to eliminate duplicates. With UNION ALL, that work is avoided. The amount of processing and main memory access is much reduced.

(41) By jose isaias cabrera (jicman) on 2022-11-25 19:53:00 in reply to 39.1 [link] [source]

There you go. I learned 17 new things today, including this one. Thanks Mr. Brasfield.

(40) By Richard Hipp (drh) on 2022-11-25 19:33:11 in reply to 38 [link] [source]

How do you get VM-Steps to show

Use: ".stats vmstep"

(42) By jose isaias cabrera (jicman) on 2022-11-25 19:54:57 in reply to 40 [link] [source]

Ok, 18 new things I have learned today. Adding it to my .sqlite3rc file. Thanks, Dr. Hipp.

(45) By anonymous on 2022-11-26 03:23:28 in reply to 31 [link] [source]

With 3.41.0 built from tarball today, the change resolves the OP test query (and the reduction I posted yesterday, if query typo of UNION ALL is corrected to UNION ).

However, with 3.41.0 (from tarball) we are still seeing the issue in the test query posted yesterday (now out of moderation today):

script:

drop  table if exists t ;
 create table t (a text);
 create index i on t (a);
 insert into t select value as a from  generate_series(1,100,1);

explain query plan
 SELECT a FROM (SELECT a FROM t as o UNION select 1 as o) WHERE a = 1;


results:
1

slow query plan:
QUERY PLAN
|--CO-ROUTINE (subquery-2)
|  `--COMPOUND QUERY
|     |--LEFT-MOST SUBQUERY
|     |  `--SCAN o
|     `--UNION USING TEMP B-TREE
|        `--SCAN CONSTANT ROW
`--SCAN (subquery-2)

a slight variation:

explain query plan
 SELECT a FROM (SELECT a FROM t as o UNION select 1 as a) WHERE a = 1;

results:
1
1

with identical explain query plan as previous query:

QUERY PLAN
|--CO-ROUTINE (subquery-2)
|  `--COMPOUND QUERY
|     |--LEFT-MOST SUBQUERY
|     |  `--SCAN o
|     `--UNION USING TEMP B-TREE
|        `--SCAN CONSTANT ROW
`--SCAN (subquery-2)

Was the fix/change intended to handle this case?  It would seem that the scan constant row could allow the same push down as the fix/change implements?

The above two queries were created by reducing the query from the OP, and then isolated to the first part of the union (making the second part of the union the scan constant row) for simplicity.

(47) By anonymous on 2022-11-26 04:56:13 in reply to 31 [link] [source]

update:

I see the column affinity requirement is new (check-in : https://sqlite.org/src/info/1ad41840c5e0fa70).

A good chance that is what's happening on the example with scan constant row.

explain query plan
 SELECT a FROM (SELECT a FROM t as o UNION select cast('1' as text) as o) WHERE a = 1;

results:
1

and has a query plan of:

|--CO-ROUTINE (subquery-2)
|  `--COMPOUND QUERY
|     |--LEFT-MOST SUBQUERY
|     |  `--SEARCH o USING COVERING INDEX i (a=?)
|     `--UNION USING TEMP B-TREE
|        `--SCAN CONSTANT ROW
`--SCAN (subquery-2)

SELECT a FROM (SELECT a FROM t as o UNION select cast('1' as text) as a) WHERE a = 1;

has identical results.

So this seems to work according to current check-in documentation.

Missed that there is a new requirement of the column affinity.

The only other detail that is a mystery is:

SELECT a FROM (SELECT a FROM t as o UNION select '1' as o) WHERE a = 1;

This seems to fail the new column affinity check.  Apparently '1' isn't considered the same as cast('1' as text).  

This seems to be true even with when the column affinity matches the data type inserted:

  drop  table if exists t ;
  create table t (a text);
  create index i on t (a);
  insert into t select cast (value as text) as a from  generate_series(1,100,1);

the query:

SELECT a FROM (SELECT a FROM t as o UNION select '1' as o) WHERE a = 1;

 still causes a scan, even with both column affinity and inserted as quoted literal being text.

(48.2) By Keith Medcalf (kmedcalf) on 2022-11-26 06:14:17 edited from 48.1 in reply to 47 [link] [source]

Apparently '1' isn't considered the same as cast('1' as text).

That would depend on your definiton of "the same".

'1' is an expression. It has value '1'. It is text. It has no affinity.

cast('1' as text) is an expression. It is text. Its value is '1'. It has text affinity.

You may determine what it means to have an affinity or not have an affinity and what this implies by reading the fine documentation on the concepts of datatypes and affinities here: https://sqlite.org/datatype3.html

1 (bare, by its lonesomne) is an expression. It is an integer. It has no affinity.

cast(1 as integer) is an expression. It is an integer. It has integer affinity.

'1' == 1 is False
cast('1' as text) == 1 is True
'1' == cast(1 as integer) is True
cast('1' as text) == cast(1 as integer) is True

That is to say, with clarity, that if one or the other side of a comparison has an affinity, then the side that has no affinity is converted to the same affinity as the other side. If both sides have affinity, then if one of them is numeric (integer, real, numeric) then it wins and the other side is converted to numeric affinity and a numeric comparison is performed. If neither side has affinity and and not the same type, they are neither IS nor ==.

Similar rules apply to the collation which is used only when the value is of type text (although any datatype may have a collation). That is:

1 (bare, by its lonesome) is an expression with the integer value 1 and has no affinity (and the default collation).

1 collate nocase is an expression with the integer value 1 and has no affinity however, it has the collation sequence nocase in the event that it gets text affinity applied.

(49) By anonymous on 2022-11-26 07:25:58 in reply to 48.2 [link] [source]

Sorry, yes, agree with you on the affinity.  I could have certainly written it up tighter in my description.

I was not suggesting that the items should have different affinity, but that the affinity of the items (or semantically more accurate, lack of affinity) was now affecting the query, where, prior to 3.41/3.40 they did not (in the same way).  By 'considered the same as...' I was referring to the handling by the query planner.  The affinity (and lack of) now effects that query where previously it handled 1, '1' or cast(1 as whatever) and the union worked.  That it now requires cast() is new behavior that will make queries much slower until everyone wraps the constants in cast()s.

It was a surprise that there is a now a restriction on optimization in this check-in (https://sqlite.org/src/info/1ad41840c5e0fa70) regarding column affinity.

In 3.39.4, the query works without the cast() to use the optimization, while  3.40 does not due to rule 8, and in the trunk 3.41 it does not due the push down optimization due to it not meeting the 'new' affinity rule 9 (in check-in cited).

If I understand you correctly, the 'new' requirements for that push down optimization require the 'new' affinity rule due to the collation rules (or perhaps to attempt not to get involved in this 'edge' case).

given table t (a text), index on t (a);

SELECT a FROM (SELECT a FROM t as o UNION select cast('1' as text) as a) WHERE a = 1;

works while

SELECT a FROM (SELECT a FROM t as o UNION select '1' as a) WHERE a = 1;

does not.

In 3.39.4 and prior, both worked (well, both used the push down optimization), so I am simply pointing out there is likely going to be some broken queries out there.  And the break will be a slow down like the OP, no error, just seems to hang.

Or folks can write casts around all unions with constants, matching each column's affinity (if they know where to look for the slow down).  I imagine there are going to be some frustrated folks out there on this.

Seems to me that in the above query, the a scan constant row could be cast to column affinity, but I can guess even that would have issues.  However as it is, a union constant without a cast won't ever meet rule 9 (as you point out, it has no affinity).  That seems like a less than desirable result of rule 9.

We seem to have one foot on Postel's law, and another is more STRICT :)

Anyway, thanks for the explanation and reference (though I've read/referred to that page so many times over the years I think I have it memorised by now).

(50) By anonymous on 2022-11-26 08:52:06 in reply to 48.2 [link] [source]

(2nd response)
Keith,

re: "That is to say, with clarity, that if one or the other side of a comparison has an affinity, then the side that has no affinity is converted to the same affinity as the other side."

That certainly has been the behaviour for a very very long time (since sqlite epoch 0).  And it's why the query (and many like it) worked in previous versions (before 3.40/3.41) and took advantage of the flexible typing of sqlite.

Requiring the select on the union to match the column affinity to use the push down optimization makes flexible typing a lot less useful.  A lot less.  In the OP query, without push down you had to write either a query with potentially different results (union all vs union is not the same set), or do the push down of the where manually.  And the slowdown came to them as a surprise.

Rule 9 also raises the question of how to match affinity on a STRICT table column of affinity/datatype "ANY".  I may be missing something, but can one write a union cast('hi' as ANY) as a? (typeof() says it's integer). And even if we could figure out how to write that to match the column affinity, is that something we want to force instead of 'hi' as a?

consider the following (done with 3.41 created from trunk this afternoon):

create table t (a ANY) strict;
create index i on t(a);
explain query plan select a from (select a from t as o union select cast('1' as ANY) as o) where a = 1;

query plan? you guessed it, left most sub-query is scan o (not search using index i).  In other words those queries are going to be slow but the sql writer will say 'I cast it to the column type!'.

In any event, having outlined what I see as the issue, seems to me a good place to put rule 9 in effect is on strict tables (with handling for ANY), and follow Postel (the pre 3.40 behavior) on the dynamic/flexible typed tables.  Perhaps take a another look at casting the constants (maybe wrap them in casts)on non strict tables.

I hope I am wrong, but I don't think we quite have the post 3.39.4 push down optimisation rules where they need to be to avoid a lot of pain for folks.

I think I've outlined about all I can, so I will leave things at that, as I have confidence in folks like yourself, Dr Hipp and everyone working on sqlite.

regards

(51) By anonymous on 2023-02-23 21:37:31 in reply to 50 [link] [source]

The check in 1ad41840 removes rule 9 (allows the where push down optimisation) and restores usage of an index.  With SQLite made from source tarball SQLite-a4aacdd3 :

create table t (a ANY) strict;
create index i on t(a);

for the two queries:

explain query plan select a from (select a from t as o union select cast('1' as ANY) as o) where a = 1;

and

explain query plan select a from (select a from t union select 1) where a = 1;

the query plans are identical, and now use the available index:

QUERY PLAN
|--CO-ROUTINE (subquery-2)
|  `--COMPOUND QUERY
|     |--LEFT-MOST SUBQUERY
|     |  `--SEARCH o USING COVERING INDEX i (a=?)
|     `--UNION USING TEMP B-TREE
|        `--SCAN CONSTANT ROW
`--SCAN (subquery-2)


This change should prevent/address unanticipated slowdowns post 3.39.4.

A big thanks to Dr Hipp and everyone for making it work AND making it faster.