Sublte difference between aggregation and unique index?

(1) By ddevienne on 2022-08-05 13:39:28 [link] [source]

Hi. I'm replacing

CREATE UNIQUE INDEX tmp_unique_index ON $table ($columns)

and seeing whether this fails (then dropping the index), with

select exists (
  select null
    from $table
   group by $columns
  having count(*) > 1
)

But I'm surprised to discover that the latter can return 1, yet the UNIQUE index is still created OK. What am I missing?

NULL handling differences?

Can my select exists SQL be made to behave identically to the create index?
I.e. return 1 when the index creation fails, and 0 otherwise?

Thanks, --DD

PS: In the case I see a difference, there are 2 columns, i.e. composite index.

PPS: This is obviously an optimization. I'm trying to avoid disk IO (since using a memory temp_store).
I'm also assuming SQLite has a short-circuit optimization, i.e. stops at the first count(*) > 1

(2) By David Raymond (dvdraymond) on 2022-08-05 14:22:20 in reply to 1 [link] [source]

Here's the NULL Handling page of the documentation.

NULLs in a column of a unique index basically count as distinct values, however GROUP BY treats NULLs as equal.

(3) By ddevienne on 2022-08-05 14:26:03 in reply to 2 [link] [source]

OK. But is there a way to make the group by behave similarly?

(4) By Ryan Smith (cuz) on 2022-08-05 14:38:34 in reply to 3 [link] [source]

Something like:

SELECT valueThatMightBeNull, othervalue, sum(x)
  FROM t
 GROUP BY IFNULL(valueThatMightBeNull, random()), othervalue

Probably better to add a time or rowcount to the random to ensure uniqueness, because there is no guarantee that even a sea of values as vast as 64bit wouldn't see the same value twice in one query.

(7) By ddevienne on 2022-08-08 13:42:56 in reply to 4 [source]

Thanks Ryan. Using ifnull(col, random()) is clever.
Although I think I prefer David's idea of using rowid instead.

(5) By David Raymond (dvdraymond) on 2022-08-05 17:39:16 in reply to 3 [link] [source]

If you're looking to see if a create unique index on t (a, b, c) will succeed, then for a rowid table anyway what I'm thinking is compare

select count(*) from t;
against
select count(*) from (select distinct a, b, c from t);

(Since it doesn't seem to allow row values in distinct aggregates, ie no
select count(distinct (a, b, c)) from t;
or
select count(distinct a, b, c) from t;)

BUT: For each field in the unique index that is nullable, add a coalesce with the rowid to the select list. That makes all the null values that would be "equal" now distinct since no two can have the same rowid.

So if b is the only nullable field, then compare

select count(*) from t;
vs
select count(*) from (select distinct a, b, coalesce(b, rowid), c from t);

and if the counts are equal, then creating a unique index will succeed.

Whether it's worth all that to avoid a failure, I leave to you to decide.

(6.2) By David Raymond (dvdraymond) on 2022-08-05 17:47:22 edited from 6.1 in reply to 5 [link] [source]

Of course, after all that I realize you can shorten it as you did above.

select exists(select 1 from t group by a, b, coalesce(b, rowid), c having count(*) > 1);

If it returns 0, it will succeed, if it returns 1 it won't.

(8) By ddevienne on 2022-08-08 13:45:05 in reply to 5 [link] [source]

Thanks David (or Raymond, not sure, sorry).

I like your clever use of rowid to make sure all NULLs are distinct.
I'll give it a spin when I'm back in the office.