STRICT tables and UNIQUE indices

(1) By Marco Bubke (marcob) on 2021-09-01 09:48:17 [link]

First thank you for this new table mode! It will be very helpful, especially the primary key mode which cannot not null anymore. 

What about unique indices and null. I know that Sqlite like to follow Postgresql but I already run into some bugs because of it. Maybe there could be some STRICT unique indices too which would only allow one null value in an index. Very often I work around it by defining some special value like -1 but sometimes this is not easy and using null as unique "non value" would be very helpful.

Thank you again for the STRICT tables. :-)

Marco

(2) By Ryan Smith (cuz) on 2021-09-01 10:02:14 in reply to 1 [link]

I think you are mistaken - NULL values are distinct in UNIQUE INDEXes in most RDBMSes, [even in PostGres](https://www.postgresqltutorial.com/postgresql-indexes/postgresql-unique-index/). The only kind of UNIQUE INDEX that should not allow NULLS is a Primary Key (for obvious reasons) unless the field is also declared as NOT NULL, but then no nulls can exist, ditto for SQLite. 

If you run into bugs because of that notion then I'm afraid to say the bugs are your own doing.

One immediate remedy for your situation I can think of is to declare another calculated column with the expressions "COALESCE(my_key,-1)" which will always have the same value as the example "my_key" field, unless that field is NULL in which case it would have the value -1. Declaring a Unique Index on the calculated field should solve your problem (I think, but I haven't yet tried it).

(3) By Marco Bubke (marcob) on 2021-09-01 12:19:21 in reply to 2 [link]

I think we misunderstand each other. I was referencing https://sqlite.org/faq.html#q26.

So null is distinct for many but not in all databases. I am asking for a non distinct null value in unique indices as 'non value'.

I know how to handle the -1 case but this is not helping if -1 is a valid value. I try to give an example. You can have an value which is an int64 and you use it as an key. Lets assume that all integers can be a valid value and there can be the case of a non value. This could be mapped to null but can't because 'non value' is not distinct in most cases. It is only a special case. So you cannot use null for that case which comes up very often in many pattern. I know ways to work around it. I thought that would be clear.

Why I was coming up with it in the strict table case? There you can set a column to integer and it would be not anymore possible to workaround it with an other type like an empty bytearray. You have to add an extra column.

(4) By Ryan Smith (cuz) on 2021-09-01 12:57:53 in reply to 3 [link]

I'm not sure there is a misunderstanding, but in the interest of honest discourse, let me state it like this: The problem has two facets -

##1 - Is it sensible to have NULLs be NON-DISTINCT values in UNIQUE INDEXES?
Certainly a couple fringe DB Engines feel this way, but most of them luckily do not. I'm not being flippant when I say "Luckily". There are Two considerations:
- If you declare a column to be UNIQUE, you either

1 -- absolutely never want to add two of the same things in there, or
2 -- sometimes want to add records which do not yet have their Unique Key assigned, so being able to insert a NULL helps.

Now both 1 and 2 are reasonable and sensible ways of doing things, and works in the wider used interpretation of the standard.
However, there is one little addition in that some people, while wanting to absolutely be able to add records without Keys to UNIQUE columns, now also want the added restriction of absolutely never being allowed to do that more than once. Why? This is less sensible, and I am not going to argue the case against it, you may have very valid reasons for doing so, but my point is that it should be clear this is not the general need, this is a very fringe benefit (and can already be achieved via the workarounds we agreed on).

##2 - Should it change?
SQLite uses the interpretation of the SQL Standard that is most sensible and used by most other DB Engines (except a fringe few). It is not wrong, not buggy, not silly, not a design flaw, not a problem that needs fixing, not a stupid choice from long ago that makes developing difficult and should be eradicated... none of those. What you are asking is to now, above and beyond its already best interpretation of the standard and accordance to other major engines, also expand its tool-set to allow a mode that is used by the lesser DB engines and lesser used application - in fact to widen its regime of interpretation.

You can surely see the irony of how this is the very opposite of what STRICT mode tries to do.

To be clear - I'm not against the idea of allowing perhaps a Column parameter that will make NULLS non-distinct, perhaps a collation can achieve this? I just do not think that should be an effect of STRICT mode, ever.

(7) By Marco Bubke (marcob) on 2021-09-01 15:15:03 in reply to 4 [link]

I think a non distinct null parameter would be fine.

My general argumentation is not about a closed argumentation system(standard) like I think you propose but by problems who are embedded into their context. I think good tools can be adapted to their context. It is not always easy and sometimes not even worth it. But you can argument about it.
 
So my argumentation is that it is not important if it is sensible to the standard(curch ;-) but to the context it is used. So in my context a customization would be very useful. You can argument that my context is very special. Actually I cannot see so many cases there you want a unique index which is not constrained on null. And for that you can always use a partial index which makes it much more clearer. ;-)

My point was not that the STRICT mode would change the behavior but would add some customization for null values.

Anyway it was only an idea and I think I need to convince my company to pay for it which is not so probable. ;-)

(8) By J-L Hainaut (JLHainaut) on 2021-09-04 10:54:29 in reply to 4 [link]

As soon as one allows nullable columns in a DB schema, one has to accept their positive and negative consequences. The semantics of "null" is manyfold, its most common interpretations being "unknown", "missing", "not applicable", and the like.
In a strict application of the historical relational model, "null"s are not allowed in base tables. Indeed, they can cause many logic problems that translate into subtile errors in SQL queries that may be tricky to detect and fix. Actually, "null" is not a value but a tag or a marker denoting the absence of value.
One way to get rid of "null"'s in base tables is to decompose them so that each nullable column is isolated in its own table.
The problem is that SQL has an ambiguous attitude towards "null"'s. Sometimes "null"'s are treated as tags (the evaluation of ["null" = "null"] is not "True" but "Unknown") while elsewhere they are considered as ordinary values, as in these two cases:

1. "create table T(A,B);"
"select B,count(*) from T group by B;"
the resultset of which may include a row such as "(null,37)", suggesting that "null"'s
are comparable.

2. in some RDBMS's (as far as I remember, SQL Server, MS Access for example) a unique
index based on nullable column "A" allows only ONE base row "where A is null", also
suggesting that "null"'s are comparable.
This interpretation makes this index practically useless and is generally considered
as a design bug.
The fix recommended is to create a partial unique index on the rows "where A is not null".

Fortunately, the interpretation of "unique indexes on nullable columns" as implemented in SQLite complies with the standard. IMHO, the idea of a STRICT mode affecting the semantics of unique index will only add confusion for little added value.

(9) By Anthony on 2021-09-05 07:04:30 in reply to 8 [link]

Hi All,

Just wanna add some of my personal experience to this thread, since it’s related to exactly this

Ive also run into a situation that requires uniqueness of nulls when in a unique index, and as far as I know, there is no good work around with sqlite.

I’m building a database product myself, where the tables/objects are created by the users themselves at a different level and sqlite tables get generated dynamically.

This means I can’t put in specific work arounds to the individual cases or use data type defaults (-1 or “”) Because these values might be meaningful themselves and won’t allow me to distinguish them from the “unset” concept.
Using magic values such as a very small negative number or some magic string will make looking at the raw data in the db misleading  so I don’t like that option either.  The other thing I don’t like about default values is it forces the developer to account for every different data type when looking for the unset,

algorithmic languages like c, c#, Java all allow the use of null to represent the unset concept in a way that is very useful.  You can check if something is null, not needing to worry about the type in any way.  In sql, u can check for nulls with “is null” logic in sql selects, but for nulls in the unique index there is no way to do what I need

One solution I’ve seen people suggest is to use partial indexes, leaving out the nullable
The problem with this is it won’t work if I have many nullable columns.  I would have to have an index for every combination of nulled column, N choose M combinations

Another solution is to use triggers or conflict clauses, which as far as I know will stop the automatic “insert or replace” statements from working on unique matches.  (I’ll be happy to be wrong on this one)

Sql is an incredibly old language and null represents a bunch of different concepts all muddled up together.  I think it would not be done the same way today

The algo languages have made null to mean more of unset rather than sql s “unknown” meaning.

This means bridging the algo and db worlds difficult because the sql world doesn’t do it cleanly and in particular sqlite can’t represent unique unset values in the index.
Not allowing the unique unset means interoperability, and data loss issues if pulling data from an sqlserver db or other null unique capable software

There is absolutely a valid use for both cases, and I think sqlite would benefit from having an extra keyword somewhere that allows null to be unique in indexes perhaps for each column or for the entire unique index.

(10) By Ryan Smith (cuz) on 2021-09-05 13:06:55 in reply to 9 [link]

I'm sorry to say, but this entire thesis seems biased towards imperfect expectation born from limited experience. Not being nasty - I have had this happen to me before (about other concepts) so I recognize this reasoning from my own imperfect past.

Let me deal with the points raised one by one:

> ... and as far as I know, there is no good work around with sqlite.

Several workarounds (good ones) have been demonstrated in this thread, even some by the very poster who originally posed the question.

> algorithmic languages like c, c#, Java all allow the use of null to represent the unset concept in a way that is very useful.

So does SQL. (More below)

> You can check if something is null, not needing to worry about the type in any way.

Yeah, same as in SQL. You even go right ahead from that statement to note exactly the method of checking null in SQL (not bound to any typed check at all):   

> In sql, u can check for nulls with “is null” logic in sql selects,   

Exactly.

> but for nulls in the unique index there is no way to do what I need

It's not clear what is it you need, but I imagine you need unique nulls (since that's the main gist of the post), though you never explain why.   

> One solution I’ve seen people suggest is to use partial indexes, leaving out the nullable The problem with this is it won’t work if I have many nullable columns. I would have to have an index for every combination of nulled column, N choose M combinations

That sentence is either unparsable/misunderstood by me, or false. Could you demonstrate the problem with SQL that doesn't work, and for which there is no suitable way to make it work? We'd be better-able to respond to that.


> Another solution is to use triggers or conflict clauses, which as far as I know will stop the automatic “insert or replace” statements from working on unique matches. (I’ll be happy to be wrong on this one)

Again not understood - You seem to claim that the thing you want is possible (for triggers to stop inserts for unique matches), and then say you will be happy to be wrong.   
Did you mean to say that there is no way for a trigger to stop the insert where a column of your choice may contain more than one null value? If so then you are wrong, it's a rather easy thing to do.

> Sql is an incredibly old language and null represents a bunch of different concepts all muddled up together. I think it would not be done the same way today

SQL is incredibly new next to C, for instance. I bet it will be done exactly the same way again, and this time the Standard would probably be more clear on the silly notion that NULLs can ever be "duplicate". I'd say MSSQL would probably do it different if they had the opportunity to start over.

A claim of "duplicate" means that you can assert that one value (call it A) is exactly the same as another value (B). Only if A == B can be demonstrated, then it can be called "Duplicate" or "Not Unique". When both A and B are unknown, unset, unspecified, indeterminate (pick your favourite word for it) then by definition A cannot possibly be claimed to be exactly equal to B and therefore cannot be duplicate, non-unique, etc.

You can say that you want to insert items that are unknown because of an expectation, say "new" entries the user makes that were not accepted yet. Then the question becomes, Ok, but why do you only want to allow 1 of that and never more? Perhaps the user want to add 3 things so long that are not yet accepted? I'm willing to bet it's purely because MSSQL did not allow any more with their broken system, and so you designed your system that way and now your mindset is hooked on that notion. (That kind of thing certainly happened to me before)


> The algo languages have made null to mean more of unset rather than sql s “unknown” meaning.   
and
>This means bridging the algo and db worlds difficult because the sql world doesn’t do it cleanly and in particular sqlite can’t represent unique unset values in the index.

That can't be true, right?  Please demonstrate the claim with code examples in an algo language (C, say) and SQL where these differences in meanings of NULL (i.e. "Unset" vs. "Unknown") would produce different mathematically sound outcomes.


> Not allowing the unique unset means interoperability, and data loss issues if pulling data from an sqlserver db or other null unique capable software

Ah, here we get to the real crux of the matter. You are bemoaning the fact that SQLite (and Postgres and MySQL and many more) do not also allow the wrong and mathematically incompetent way that another RDBMS is doing it and so compatibility is somewhat hampered.
This is like calling Tesla and complaining they they do not also support fossil fuel in their cars because the gas station nearest your house doesn't have an EV charger.

To answer the question though, I do very many project cooperations between specifically MSSQL and SQLite, it's never a problem, it's very easy to do, unless the aim is (which I suspect in your case) to make the SQLite table behave exactly like the MSSQL table - in which case, even though I think it's folly, near any of the workarounds already discussed earlier in this thread can accomplish it.


> There is absolutely a valid use for both cases...

Not really, and I posit (from my own mentioned past misconceptions) that the only reason you think that is because you came from MSSQL and found that it works like that and then built previous projects around that expectation, adjusting your thinking to comply, and then being thoroughly surprised it wasn't so when you encountered another DB engine, like SQLite.  You are simply used to doing it the wrong way, when you really open up your mind to what is technically correct (which I'm hoping to assist with here), you will see that the correct way is better and adjusting your projects to work with the correct way will in the long run help you.


> ...and I think sqlite would benefit from having an extra keyword somewhere that allows null to be unique in indexes perhaps for each column or for the entire unique index.

I do not disagree here, perhaps I would add that the benefit is small but I would welcome it if it placed no additional resource-requirement/strain on the SQLite Query engine.  If it's a choice of small added benefit vs. faster Queries, I'm going to always choose the latter, and so should you.

Whether or not this comes to pass, you would do yourself an enormous favour to start designing DB interactions/projects from the expectation that more than one record can be unset/unknown (NULL) in a UNIQUE field in the your tables. Designing things so that it CAN work like that, means that the project/code will be completely compatible between MSSQL and SQLite (and any other DB you like).


Oh and PS: If you will petition MSSQL to improve their system and add a Keyword to their TSQL that would allow multiple NULLS in UNIQUE constraints - I can find you many people here, including myself, who will 100% join and sign that.  :)

(11) By anonymous on 2021-09-05 15:01:56 in reply to 10

<code>NULL</code> values from different rows _are_ considered equal in <code>SELECT DISTINCT</code>.

This suggests yet another language extension: allow index and table constraint declarations to use the keyword <code>DISTINCT</code> instead of <code>UNIQUE</code> to specify that <code>NULL</code> values should be considered equal to each other.

(12) By Marco Bubke (marcob) on 2021-09-05 19:44:58 in reply to 11 [link]

Actually I think a distict index is what I really want. Thank you for your clarification. 

To some other responders. I really appreciate every response but I don't think metaphysical discurses are not really helpful. But if you wish we can have a deeper discussion about metaphysical terms like 'truth', 'logic' etc. in a historical context. ;-)

(14) By Ryan Smith (cuz) on 2021-09-05 22:29:14 in reply to 12 [link]

> ...we can have a deeper discussion about metaphysical terms like 'truth', 'logic' etc.

No we can't.

(13) By Ryan Smith (cuz) on 2021-09-05 22:21:03 in reply to 11 [link]

> NULL values from different rows ARE considered equal in SELECT DISTINCT.

This is true, simply because with DISTINCT Queries you face the same logic problem as UNIQUE indexes, but this time inverted - think of it like this: for a UNIQUE constraint, you have to prove two values ARE EQUAL to constrain them, which cannot be done with NULLS as described elsewhere, and for a DISTINCT requirement on a query you have to assert that two values are definitely NOT EQUAL (distinct) to add them both, and that is again something you cannot do with NULL/undefined/unset/unknown values. Ok, so both equality AND inequality is not provable for NULLs, what to do then?

Well, what DISTINCT queries (and sorters) use is the most sensible alternate: to lump all the unknowns together. Perfectly sensible and fine, you will find nobody crying about this and I think near all engines agree on it. In order to facilitate this "lumping together" of the unknowns, the engine and its documentation "considers them equal" - but that is merely a consideration, a means to an end and not an admission of equality. To think otherwise would be a logic mistake, and doesn't transfer/apply to the rest of the discussion regarding UNIQUE constraints.

As to your proposed language extension - sure, if you can make a case for needing distinct queries to separate all the null values into different rows in stead of lumping them together, perhaps it might be considered.

(5) By David Raymond (dvdraymond) on 2021-09-01 13:19:30 in reply to 3 [link]

How about adding in another column used as a boolean that says whether you have an actual value for that?

Where column "b" is your column needing the "non value" value (for this example -1), then how about something like:

create table foo (
    a not null,
    b not null,
    b_is_unknown not null
        check (b_is_unknown in (0, 1)),
    check (case when b_is_unknown then b = -1 end)
);
create unique index foo_idx on foo (a, b, b_is_unknown);


sqlite> insert into foo values (1, 2, 0); -- normal value
sqlite> insert into foo values (1, 2, 0); -- can't dupcliate normal values
Error: UNIQUE constraint failed: foo.a, foo.b, foo.b_is_unknown
sqlite> insert into foo values (1, 2, 1); -- using bad value for unknown
Error: CHECK constraint failed: case when b_is_unknown then b = -1 end
sqlite> insert into foo values (1, -1, 1); -- proper unknown value
sqlite> insert into foo values (1, -1, 0); -- -1 is an actual value here
sqlite> insert into foo values (1, -1, 1); -- can't submit 2 unknown b's with the same a
Error: UNIQUE constraint failed: foo.a, foo.b, foo.b_is_unknown
sqlite> insert into foo values (1, 2, -1); -- bad value for b_is_unknown flag
Error: CHECK constraint failed: b_is_unknown in (0, 1)

(6) By Marco Bubke (marcob) on 2021-09-01 14:59:47 in reply to 5 [link]

This is what I doing all the time but I would say it is not making the design easier to read and maintain. ;-)