SQLite Forum

COUNT statement returns inconsistent values when used with UNION ALL
Login

COUNT statement returns inconsistent values when used with UNION ALL

(1) By Yu Liang (LY1598773890) on 2021-04-05 04:55:14 [link]

For query: 


```SQL
CREATE VIEW v0 ( v1 ) AS SELECT CAST ( 0 AS VARCHAR(1) );
CREATE VIEW v2 ( v3 ) AS SELECT * FROM v0 UNION ALL SELECT ( v1 IN ( 10) ) from v0;
SELECT * FROM v2 NATURAL JOIN v2;
SELECT COUNT(*) FROM v2 NATURAL JOIN v2;
```

The expected answer should be: 00 2. 
However, the actual output is: 0000 2.

This problem seems to be introduced by de9ed6293de53e89b7c37e7de9a8697d86d7f619

(2) By Stephan Beal (stephan) on 2021-04-05 05:00:55 in reply to 1

> This problem seems to be introduced by de9ed6293de53e89b7c37e7de9a8697d86d7f619

That's not a hash from the sqlite project:

```
[stephan@nuc:~/fossil/sqlite3]$ fossil time -n 1
=== 2021-04-04 ===
23:56:15 [ad718388a1] Fix an assert() in sqlite3BtreeLast() that needs an "|| CORRUPT_DB" term. Dbsqlfuzz case b92b72e4de80b5140c30ab71372ca719b8feb618. (user: drh tags:
         trunk)
--- entry limit (1) reached ---

...

[stephan@nuc:~/fossil/sqlite3]$ fossil whatis de9ed6293de53e89b7c37e7de9a8697d86d7f619
unknown:    de9ed6293de53e89b7c37e7de9a8697d86d7f619
```

(3) By Yu Liang (LY1598773890) on 2021-04-05 05:08:38 in reply to 2 [link]

Sorry. "de9ed6293de53e89b7c37e7de9a8697d86d7f619" is the github commit ID. 

The Fossil hash should be:
FossilOrigin-Name: 00e4bf74d3dfb87666a2266905f7d1a2afc6eb088d22cfd4f38f048733d6b936

(4) By Keith Medcalf (kmedcalf) on 2021-04-05 06:16:41 in reply to 1 [link]

I would expect that the join result is 4 rows and the count should be 4.

v0.v1 has the affinity text
v2.v3 has two rows, one a '0' with affinity text and the other with the integer 0 with no affinity.

joining v2 to itself means that a.v3 == b.v3.  However, v3 (which comes from v2) has affinity text so the comparison, with affinity, results in the '0' == 0 with affinity text, which is true; so there are four rows.

The count is, however, wrong.  It should be 4.

That is, unless v2.v3 has no affinity, in which case the result should two rows.

(5) By John Dennis (jdennis) on 2021-04-05 06:54:08 in reply to 4 [link]

The final count(*) returns 2 in both 3.33 and 3.35
However SELECT * FROM v2 NATURAL JOIN v2 returns 2 rows of 0 in 3.33 and returns four rows of zero in 3.35 (on Windows 64-bit)

SQLite version 3.33.0 2020-08-14 13:23:32
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> CREATE VIEW v0 ( v1 ) AS SELECT CAST ( 0 AS VARCHAR(1) );
sqlite> CREATE VIEW v2 ( v3 ) AS SELECT * FROM v0 UNION ALL SELECT ( v1 IN ( 10) ) from v0;
sqlite> SELECT * FROM v2 NATURAL JOIN v2;
0
0
sqlite> SELECT COUNT(*) FROM v2 NATURAL JOIN v2;
2


SQLite version 3.35.4 2021-04-02 15:20:15
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> CREATE VIEW v0 ( v1 ) AS SELECT CAST ( 0 AS VARCHAR(1) );
sqlite> CREATE VIEW v2 ( v3 ) AS SELECT * FROM v0 UNION ALL SELECT ( v1 IN ( 10) ) from v0;
sqlite> SELECT * FROM v2 NATURAL JOIN v2;
0
0
0
0
sqlite> SELECT COUNT(*) FROM v2 NATURAL JOIN v2;
2
sqlite>

(6) By John Dennis (jdennis) on 2021-04-05 06:57:12 in reply to 5 [link]

So the count(*) is not inconsistent when comparing the two versions, although if a select * returns four rows, one would think a select count(*) should, for consistency, return 4.

(7.1) By Richard Hipp (drh) on 2021-04-05 12:34:15 edited from 7.0 in reply to 1 [link]

This is not a bug.  The relevant documentation is the third paragraph of
[the datatype documentation section 3.3][1], which I quote here:

> When the SELECT statement that implements a VIEW or FROM-clause subquery is a compound SELECT then the affinity of each supposed column of the VIEW or subquery will be the affinity of the corresponding result column for one of the individual SELECT statements that make up the compound. However, it is indeterminate which of the SELECT statements will be used to determine affinity. Different constituent SELECT statements might be used to determine affinity at different times during query evaluation. The choice might vary across different versions of SQLite. The choice might change between one query and the next in the same version of SQLite. The choice might be different at different times within the same query. Hence, you can never be sure what affinity will be used for columns of a compound SELECT that have different affinities in the constituent subqueries. Best practice is to avoid mixing affinities in a compound SELECT if you care about the datatype of the result.

In your V2 view, the affinity of the left side of the UNION ALL is TEXT,
whereas the affinity of the right side is NONE.  So the affinity of the
V2.V3 column is indeterminate.  The query planner is free to chose whatever
type affinity it wants for V2.V3.  Different answers might result depending
on what the query planner chooses.

The query planner is transforming your "SELECT *" query into something like
this:

> ~~~
SELECT aa.v1 FROM V0 AS aa JOIN V0 AS bb ON (aa.v1=bb.v1)
UNION ALL
SELECT aa.v1 FROM V0 AS aa JOIN V0 AS bb ON ((aa.v1 IN (10))=bb.v1)
UNION ALL
SELECT aa.v1 FROM V0 AS aa JOIN V0 AS bb ON (aa.v1=(bb.v1 IN (10)))
UNION ALL
SELECT aa.v1 FROM V0 AS aa JOIN V0 AS bb ON ((aa.v1 IN (10))=(bb.v1 IN (10)));
~~~

The above query generates four rows, one for each subquery because
the affinity rules always cast the result to TEXT before comparison,
except on the last subquery where the result of both sides of the
comparison is already an integer anyhow.

But the count(*) query is transformed into the rough equivalent of:

> ~~~
CREATE TEMP TABLE mx AS SELECT v3 FROM v2;
SELECT count(*) FROM mx AS aa JOIN mx AS bb ON (aa.v3=bb.v3);
~~~

In this case, the affinity of the mx.v3 column is BLOB,  So the 0='0' and
'0'=0 comparisons fail.  Only the 0=0 and '0'='0' cases work, and so the
result is two rows.

Yes, this is confusing.  That is why we say:
"Best practice is to avoid mixing affinities in a compound SELECT...."
But it does conform to the documented behavior, and hence is not a bug.

[1]: https://sqlite.org/datatype3.html#column_affinity_for_views_and_subqueries

(8) By Yu Liang (LY1598773890) on 2021-04-05 16:17:34 in reply to 7.1 [link]

Thank you for the explanation.