SQLite Forum

Conflict using group by and partition by at the same time
Login

Conflict using group by and partition by at the same time

(1) By anonymous on 2020-04-04 08:29:08 [source]

Using "group by" at the end of a query which contains window functions using "partition by" may lead to funny results, e.g.(sqlite3 version 3.29.0 2019-07-10 17:32:03):

create table person (
country text,
town text,
name text,
weight integer,
height integer
);

insert into person(country, town, name, weight, height) values
('russia', 'moskow', 'iwan', 65, 160),
('russia', 'moskow', 'peter', 78, 170),
('russia', 'petersburg', 'igor', 70, 180),
('france', 'paris', 'pierre', 60, 190),
('france', 'paris', 'robert', 70, 180),
('france', 'nantes', 'michel', 80, 170);

Of course the following queries make no sense and should possibly be forbidden (as this is done with the "partition by" clause in window chaining (https://www.sqlite.org/windowfunctions.html#window_chaining).

A) select country, max(weight), max(height) over (partition by country, town)
from person
group by country;
/*
france|80|170
russia|78|170
*/

B)
select country, max(weight), max(weight) over (partition by country, town)
from person
group by country;
/*
france|80|80
russia|78|78
*/

C) select country, max(weight), min(weight) over (partition by country, town)
from person
group by country;
/*
france|80|80
russia|78|78
*/

FrankL

(2) By Dan Kennedy (dan) on 2020-04-04 13:27:09 in reply to 1 [link] [source]

The trick here is that computation of window functions takes place after aggregation. So your first query is actually equivalent to:

    SELECT country, max_weight, max(height) OVER (PARTITION BY country, town) FROM (
        SELECT country, max(weight) AS max_weight, height, town 
        FROM person 
        GROUP BY country
    );

In SQLite, the inner query returns:

    ('russia', 78, 170, 'moskow'),
    ('france', 80, 170, 'nantes');

For this query in SQLite, the values of "height" and "town" are always read from the same value that supplied the maximum weight value.

Given the results above for the inner query, the outer query returns:

    ('russia', 78, 170),
    ('france', 80, 170);

So I suppose the results are a bit funny, but I think they're technically correct.

(3) By anonymous on 2020-04-04 13:45:11 in reply to 2 [link] [source]

The trick here is that computation of window functions takes place after aggregation.

This is an important piece of information that I was recently looking for myself, but couldn't find. Can I suggest that this gets highlighted or added to both the Window Function and Group By documentation sections?

(4) By anonymous on 2020-04-05 07:29:47 in reply to 2 [link] [source]

The thing, I missed was the connection from the "max(weight) as max_weight" in the inner query to the max_weight in the outer query.
This helps me a lot!
Thank you very much.

FrankL