sum() bug using views and filtering
(1) By Matthew Carroll (matthewc) on 2023-11-25 23:32:09 [link] [source]
Posting on the forum as requested. I seem to have encountered a bug. There are some conditions where round(sum(x)) returns the correct result, but sum(x) alone - it seems - incorrectly returns zero. I've stripped this down as best I can. I'm sure it would be possible to construct a simpler example. Platform: macOS 11.7.10, sqlite installed with homebrew. Steps to reproduce: matthew@air ~ % /usr/local/opt/sqlite/bin/sqlite3 SQLite version 3.44.0 2023-11-01 11:23:50 Enter ".help" for usage hints. Connected to a transient in-memory database. Use ".open FILENAME" to reopen on a persistent database. sqlite> CREATE TABLE pollbypoll ( "index" INTEGER, "year" INTEGER, ed_num INTEGER, ed_name TEXT, pd_num TEXT, affiliation TEXT, poll_votes INTEGER ); sqlite> INSERT INTO pollbypoll ("index","year",ed_num,ed_name,pd_num,affiliation,poll_votes) VALUES (0,2021,10001,'Avalon','1','Conservative',28), (1,2021,10001,'Avalon','1','NDP',13), (2,2021,10001,'Avalon','1','Liberal',55), (3,2021,10001,'Avalon','1','People''s Party - PPC',4), (36,2021,10001,'Avalon','10','Conservative',35), (37,2021,10001,'Avalon','10','NDP',6), (38,2021,10001,'Avalon','10','Liberal',43), (39,2021,10001,'Avalon','10','People''s Party - PPC',2); sqlite> CREATE VIEW results_by_fed AS SELECT "year", ed_num, ed_name, affiliation, SUM(poll_votes) AS votes FROM pollbypoll GROUP BY "year", ed_num, affiliation; sqlite> CREATE VIEW results_by_fed_ranked AS SELECT *, DENSE_RANK() OVER ( PARTITION BY "year", ed_num ORDER BY votes DESC ) AS vote_order FROM results_by_fed; sqlite> CREATE VIEW results_by_fed_tabulated AS select "year", ed_num, ed_name, max(affiliation) filter ( where vote_order = 1 ) as winner, ( sum(votes) filter ( where vote_order = 1 ) - sum(votes) filter ( where vote_order = 2 ) ) as margin, round( ( ( sum(votes) filter ( where vote_order = 1 ) - sum(votes) filter ( where vote_order = 2 ) ) / sum(votes) ) * 100, 1 ) as margin_pct from results_by_fed_ranked group by "year", ed_num; sqlite> CREATE VIEW results_by_fed_tabulated_fix AS select "year", ed_num, ed_name, max(affiliation) filter ( where vote_order = 1 ) as winner, ( sum(votes) filter ( where vote_order = 1 ) - sum(votes) filter ( where vote_order = 2 ) ) as margin, round( ( ( sum(votes) filter ( where vote_order = 1 ) - sum(votes) filter ( where vote_order = 2 ) ) / round(sum(votes)) --round() should not make a difference here ) * 100, 1 ) as margin_pct from results_by_fed_ranked group by "year", ed_num; sqlite> select * from pollbypoll; 0|2021|10001|Avalon|1|Conservative|28 1|2021|10001|Avalon|1|NDP|13 2|2021|10001|Avalon|1|Liberal|55 3|2021|10001|Avalon|1|People's Party - PPC|4 36|2021|10001|Avalon|10|Conservative|35 37|2021|10001|Avalon|10|NDP|6 38|2021|10001|Avalon|10|Liberal|43 39|2021|10001|Avalon|10|People's Party - PPC|2 sqlite> select * from results_by_fed; 2021|10001|Avalon|Conservative|63 2021|10001|Avalon|Liberal|98 2021|10001|Avalon|NDP|19 2021|10001|Avalon|People's Party - PPC|6 sqlite> select * from results_by_fed_ranked; 2021|10001|Avalon|Liberal|98|1 2021|10001|Avalon|Conservative|63|2 2021|10001|Avalon|NDP|19|3 2021|10001|Avalon|People's Party - PPC|6|4 sqlite> select * from results_by_fed_tabulated; 2021|10001|Avalon|Liberal|35|0.0 sqlite> select * from results_by_fed_tabulated_fix; 2021|10001|Avalon|Liberal|35|18.8 Expected result: margin_pct should evaluate to 18.8 in both views. Actual result: margin_pct is zero unless round() is added. If for some reason this is a misunderstanding on my part about the expected functionality here let me know. I'm new to using dense_rank() and filter(). It does seem like a bug to me though, as I had a different view (that went through different steps to get from different data to the same penultimate ranked structure) and that worked fine, without adding round(). Thanks, Matthew
(2) By Spindrift (spindrift) on 2023-11-26 00:44:32 in reply to 1 [link] [source]
The result of round() is always a float.
The result of sum() may be an integer, or a float.
Try substituting total() for sum() and see if that fixes your issue.
Additional hint: integer division is often distinct from floating point division.
(3) By Spindrift (spindrift) on 2023-11-26 09:46:55 in reply to 1 [source]
A minimal example for you.
You have located the correct position of your "error" but it isn't because the value returned by sum() is wrong; it is because the type returned is not what you expect, or possibly because dividing two integers does not do what you expect. See the examples and some options for getting what you seem to want, working below.
My suggestion would be for you to change all of your uses of sum() with total() but there are many other ways to ensure float division rather than integer division.
Check yo types!
CREATE TABLE votes(party, proportion);
INSERT INTO votes VALUES ('a',27),('b',105),('a',33);
SELECT (select sum(proportion) from votes where party = 'a') / sum(proportion) from votes; -- returns zero per your original "error"
-- All further selects return what you appear to expect
SELECT (select sum(proportion) from votes where party = 'a') / round(sum(proportion)) from votes; -- returns your expectation per your original solution
SELECT (select sum(proportion) from votes where party = 'a') / total(proportion) from votes; -- total() always returns a float
SELECT (select sum(proportion) from votes where party = 'a') / CAST(sum(proportion) AS REAL) from votes; -- casting to real
INSERT INTO votes VALUES ('a',7.0); -- add a float to the column forces sum to return float
SELECT (select sum(proportion) from votes where party = 'a') / sum(proportion) from votes; -- original query now returns number you expect due to the new inserted float
(4) By anonymous on 2023-11-27 00:32:44 in reply to 3 [link] [source]
test on other databases
postgresql 15
postgres=# CREATE TABLE votes(party varchar(10), proportion int);
CREATE TABLE
postgres=# INSERT INTO votes VALUES ('a',27),('b',105),('a',33);
INSERT 0 3
postgres=# SELECT (select sum(proportion) from votes where party = 'a') / sum(proportion) from votes;
?column?
----------
0
duckdb 0.92
D CREATE TABLE votes(party varchar(10), proportion int);
D INSERT INTO votes VALUES ('a',27),('b',105),('a',33);
D SELECT (select sum(proportion) from votes where party = 'a') / sum(proportion) from votes;
┌─────────────────────────────────────────────────────────────────────────────┐
│ ((SELECT sum(proportion) FROM votes WHERE (party = 'a')) / sum(proportion)) │
│ double │
├─────────────────────────────────────────────────────────────────────────────┤
│ 0.36363636363636365 │
└─────────────────────────────────────────────────────────────────────────────┘
D SELECT (select sum(proportion) from votes where party = 'a') // sum(proportion) from votes;
┌──────────────────────────────────────────────────────────────────────────────┐
│ ((SELECT sum(proportion) FROM votes WHERE (party = 'a')) // sum(proportion)) │
│ int128 │
├──────────────────────────────────────────────────────────────────────────────┤
│ 0 │
└──────────────────────────────────────────────────────────────────────────────┘
(5.2) By Spindrift (spindrift) on 2023-11-27 07:21:53 edited from 5.1 in reply to 4 [link] [source]
Duckdb performing floating point division on two ints with the "/" operator is not really an issue with sqlite.
Some may prefer it but it's not standard behaviour for SQL.
(6) By Matthew Carroll (matthewc) on 2023-11-27 20:51:08 in reply to 1 [link] [source]
Thanks for the explanations everyone, I appreciate it. That makes sense, and I'm glad it's not a bug.