SQLite User Forum

sum() bug using views and filtering
Login

sum() bug using views and filtering

(1) By Matthew Carroll (matthewc) on 2023-11-25 23:32:09 [link] [source]

Posting on the forum as requested. I seem to have encountered a bug. There are some conditions where round(sum(x)) returns the correct result, but sum(x) alone - it seems - incorrectly returns zero.

I've stripped this down as best I can. I'm sure it would be possible to construct a simpler example.

Platform: macOS 11.7.10, sqlite installed with homebrew.

Steps to reproduce:


matthew@air ~ % /usr/local/opt/sqlite/bin/sqlite3

SQLite version 3.44.0 2023-11-01 11:23:50
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.

sqlite> CREATE TABLE pollbypoll (
    "index" INTEGER,
    "year" INTEGER,
    ed_num INTEGER,
    ed_name TEXT,
    pd_num TEXT,
    affiliation TEXT,
    poll_votes INTEGER
);

sqlite> INSERT INTO pollbypoll ("index","year",ed_num,ed_name,pd_num,affiliation,poll_votes) VALUES
         (0,2021,10001,'Avalon','1','Conservative',28),
         (1,2021,10001,'Avalon','1','NDP',13),
         (2,2021,10001,'Avalon','1','Liberal',55),
         (3,2021,10001,'Avalon','1','People''s Party - PPC',4),
         (36,2021,10001,'Avalon','10','Conservative',35),
         (37,2021,10001,'Avalon','10','NDP',6),
         (38,2021,10001,'Avalon','10','Liberal',43),
         (39,2021,10001,'Avalon','10','People''s Party - PPC',2);

sqlite> CREATE VIEW results_by_fed AS
SELECT
    "year",
    ed_num,
    ed_name,
    affiliation,
    SUM(poll_votes) AS votes
FROM
    pollbypoll
GROUP BY
    "year",
    ed_num,
    affiliation;

sqlite> CREATE VIEW results_by_fed_ranked AS
SELECT
    *,
    DENSE_RANK() OVER (
        PARTITION BY
            "year",
            ed_num
        ORDER BY
            votes DESC
    ) AS vote_order
FROM
    results_by_fed;

sqlite> CREATE VIEW results_by_fed_tabulated AS
select
    "year",
    ed_num,
    ed_name,
    max(affiliation) filter (
        where
            vote_order = 1
    ) as winner,
    (
        sum(votes) filter (
            where
                vote_order = 1
        ) - sum(votes) filter (
            where
                vote_order = 2
        )
    ) as margin,
    round(
        (
            (
                sum(votes) filter (
                    where
                        vote_order = 1
                ) - sum(votes) filter (
                    where
                        vote_order = 2
                )
            ) / sum(votes)
        ) * 100,
        1
    ) as margin_pct
from
    results_by_fed_ranked
group by
    "year",
    ed_num;
sqlite> CREATE VIEW results_by_fed_tabulated_fix AS
select
    "year",
    ed_num,
    ed_name,
    max(affiliation) filter (
        where
            vote_order = 1
    ) as winner,
    (
        sum(votes) filter (
            where
                vote_order = 1
        ) - sum(votes) filter (
            where
                vote_order = 2
        )
    ) as margin,
    round(
        (
            (
                sum(votes) filter (
                    where
                        vote_order = 1
                ) - sum(votes) filter (
                    where
                        vote_order = 2
                )
            ) / round(sum(votes)) --round() should not make a difference here
        ) * 100,
        1
    ) as margin_pct
from
    results_by_fed_ranked
group by
    "year",
    ed_num;

sqlite> select * from pollbypoll;
0|2021|10001|Avalon|1|Conservative|28
1|2021|10001|Avalon|1|NDP|13
2|2021|10001|Avalon|1|Liberal|55
3|2021|10001|Avalon|1|People's Party - PPC|4
36|2021|10001|Avalon|10|Conservative|35
37|2021|10001|Avalon|10|NDP|6
38|2021|10001|Avalon|10|Liberal|43
39|2021|10001|Avalon|10|People's Party - PPC|2

sqlite> select * from results_by_fed;
2021|10001|Avalon|Conservative|63
2021|10001|Avalon|Liberal|98
2021|10001|Avalon|NDP|19
2021|10001|Avalon|People's Party - PPC|6

sqlite> select * from results_by_fed_ranked;
2021|10001|Avalon|Liberal|98|1
2021|10001|Avalon|Conservative|63|2
2021|10001|Avalon|NDP|19|3
2021|10001|Avalon|People's Party - PPC|6|4

sqlite> select * from results_by_fed_tabulated;
2021|10001|Avalon|Liberal|35|0.0

sqlite> select * from results_by_fed_tabulated_fix;
2021|10001|Avalon|Liberal|35|18.8


Expected result: margin_pct should evaluate to 18.8 in both views.

Actual result: margin_pct is zero unless round() is added.

If for some reason this is a misunderstanding on my part about the expected functionality here let me know. I'm new to using dense_rank() and filter(). It does seem like a bug to me though, as I had a different view (that went through different steps to get from different data to the same penultimate ranked structure) and that worked fine, without adding round().

Thanks,
Matthew

(2) By Spindrift (spindrift) on 2023-11-26 00:44:32 in reply to 1 [link] [source]

The result of round() is always a float.

The result of sum() may be an integer, or a float.

Try substituting total() for sum() and see if that fixes your issue.

Additional hint: integer division is often distinct from floating point division.

(3) By Spindrift (spindrift) on 2023-11-26 09:46:55 in reply to 1 [source]

A minimal example for you.

You have located the correct position of your "error" but it isn't because the value returned by sum() is wrong; it is because the type returned is not what you expect, or possibly because dividing two integers does not do what you expect. See the examples and some options for getting what you seem to want, working below.

My suggestion would be for you to change all of your uses of sum() with total() but there are many other ways to ensure float division rather than integer division.

Check yo types!

CREATE TABLE votes(party, proportion);
INSERT INTO votes VALUES ('a',27),('b',105),('a',33);

SELECT (select sum(proportion) from votes where party = 'a') / sum(proportion) from votes; -- returns zero per your original "error"

-- All further selects return what you appear to expect

SELECT (select sum(proportion) from votes where party = 'a') / round(sum(proportion)) from votes; -- returns your expectation per your original solution

SELECT (select sum(proportion) from votes where party = 'a') / total(proportion) from votes; -- total() always returns a float

SELECT (select sum(proportion) from votes where party = 'a') / CAST(sum(proportion) AS REAL) from votes; -- casting to real

INSERT INTO votes VALUES ('a',7.0); -- add a float to the column forces sum to return float

SELECT (select sum(proportion) from votes where party = 'a') / sum(proportion) from votes; -- original query now returns number you expect due to the new inserted float

(4) By anonymous on 2023-11-27 00:32:44 in reply to 3 [link] [source]

test on other databases

postgresql 15

postgres=# CREATE TABLE votes(party varchar(10), proportion int);
CREATE TABLE
postgres=# INSERT INTO votes VALUES ('a',27),('b',105),('a',33);
INSERT 0 3
postgres=# SELECT (select sum(proportion) from votes where party = 'a') / sum(proportion) from votes;
 ?column?
----------
        0

duckdb 0.92

D CREATE TABLE votes(party varchar(10), proportion int);
D INSERT INTO votes VALUES ('a',27),('b',105),('a',33);
D SELECT (select sum(proportion) from votes where party = 'a') / sum(proportion) from votes;
┌─────────────────────────────────────────────────────────────────────────────┐
│ ((SELECT sum(proportion) FROM votes WHERE (party = 'a')) / sum(proportion)) │
│                                   double                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                         0.36363636363636365 │
└─────────────────────────────────────────────────────────────────────────────┘
D SELECT (select sum(proportion) from votes where party = 'a') // sum(proportion) from votes;
┌──────────────────────────────────────────────────────────────────────────────┐

│ ((SELECT sum(proportion) FROM votes WHERE (party = 'a')) // sum(proportion)) │
│                                    int128                                    │
├──────────────────────────────────────────────────────────────────────────────┤

│                                                                            0 │
└──────────────────────────────────────────────────────────────────────────────┘

(5.2) By Spindrift (spindrift) on 2023-11-27 07:21:53 edited from 5.1 in reply to 4 [link] [source]

Duckdb performing floating point division on two ints with the "/" operator is not really an issue with sqlite.

Some may prefer it but it's not standard behaviour for SQL.

(6) By Matthew Carroll (matthewc) on 2023-11-27 20:51:08 in reply to 1 [link] [source]

Thanks for the explanations everyone, I appreciate it. That makes sense, and I'm glad it's not a bug.