Sublte difference between aggregation and unique index?
(1) By ddevienne on 2022-08-05 13:39:28 [link] [source]
Hi. I'm replacing
CREATE UNIQUE INDEX tmp_unique_index ON $table ($columns)
and seeing whether this fails (then dropping the index), with
select exists (
select null
from $table
group by $columns
having count(*) > 1
)
But I'm surprised to discover that the latter can return 1
, yet the UNIQUE
index is still created OK. What am I missing?
NULL handling differences?
Can my select exists
SQL be made to behave identically to the create index
?
I.e. return 1 when the index creation fails, and 0 otherwise?
Thanks, --DD
PS: In the case I see a difference, there are 2 columns, i.e. composite index.
PPS: This is obviously an optimization. I'm trying to avoid disk IO (since using a memory temp_store).
I'm also assuming SQLite has a short-circuit optimization, i.e. stops at the first count(*) > 1
(2) By David Raymond (dvdraymond) on 2022-08-05 14:22:20 in reply to 1 [link] [source]
Here's the NULL Handling page of the documentation.
NULLs in a column of a unique index basically count as distinct values, however GROUP BY treats NULLs as equal.
(3) By ddevienne on 2022-08-05 14:26:03 in reply to 2 [link] [source]
OK. But is there a way to make the group by
behave similarly?
(4) By Ryan Smith (cuz) on 2022-08-05 14:38:34 in reply to 3 [link] [source]
Something like:
SELECT valueThatMightBeNull, othervalue, sum(x)
FROM t
GROUP BY IFNULL(valueThatMightBeNull, random()), othervalue
Probably better to add a time or rowcount to the random to ensure uniqueness, because there is no guarantee that even a sea of values as vast as 64bit wouldn't see the same value twice in one query.
(7) By ddevienne on 2022-08-08 13:42:56 in reply to 4 [source]
Thanks Ryan. Using ifnull(col, random())
is clever.
Although I think I prefer David's idea of using rowid
instead.
(5) By David Raymond (dvdraymond) on 2022-08-05 17:39:16 in reply to 3 [link] [source]
If you're looking to see if a create unique index on t (a, b, c) will succeed, then for a rowid table anyway what I'm thinking is compare select count(*) from t; against select count(*) from (select distinct a, b, c from t); (Since it doesn't seem to allow row values in distinct aggregates, ie no select count(distinct (a, b, c)) from t; or select count(distinct a, b, c) from t;) BUT: For each field in the unique index that is nullable, add a coalesce with the rowid to the select list. That makes all the null values that would be "equal" now distinct since no two can have the same rowid. So if b is the only nullable field, then compare select count(*) from t; vs select count(*) from (select distinct a, b, coalesce(b, rowid), c from t); and if the counts are equal, then creating a unique index will succeed. Whether it's worth all that to avoid a failure, I leave to you to decide.
(6.2) By David Raymond (dvdraymond) on 2022-08-05 17:47:22 edited from 6.1 in reply to 5 [link] [source]
Of course, after all that I realize you can shorten it as you did above. select exists(select 1 from t group by a, b, coalesce(b, rowid), c having count(*) > 1); If it returns 0, it will succeed, if it returns 1 it won't.
(8) By ddevienne on 2022-08-08 13:45:05 in reply to 5 [link] [source]
Thanks David (or Raymond, not sure, sorry).
I like your clever use of rowid
to make sure all NULL
s are distinct.
I'll give it a spin when I'm back in the office.