Create Index allows non-existent column
(1) By dbschwartz on 2022-05-29 05:41:05 [link] [source]
I was able to create an index like this:
CREATE TABLE FileInfo (
"fileInfoKey.caseFileVersionId.id" TEXT,
"fileInfoKey.versionId.id" TEXT,
"fileInfoKey" TEXT,
PRIMARY KEY ("fileInfoKey.caseFileVersionId.id", "fileInfoKey.versionId.id")
);
CREATE INDEX "abc" on FileInfo("xxx");
The table exists, but there is no column named "xxx". What column does it index in this case? If I do not use a quoted column name it does properly error out, saying a syntax error on 'xxx'.
(2) By SeverKetor on 2022-05-29 06:05:26 in reply to 1 [link] [source]
Just tried this myself. On my main 3.39.0 install with double-quoted strings disabled it didn't work. On an older 3.36.0 install with double-quoted strings allowed, it did.
I suspect what it's doing is creating an index on the expression 'xxx', which while non-sensical, is valid. Removing the quotes properly causes an error because it can no longer be considered a string.
If you can, I'd just re-compile with double-quoted strings disabled, as is recommended.
(3) By dbschwartz on 2022-05-29 12:59:13 in reply to 2 [source]
Thanks. I need the double quoted strings though, because we have various delimiters in our column names.
If we are careful and make sure the values in the double quotes are valid, will it work properly?
(4) By dbschwartz on 2022-05-29 13:03:35 in reply to 2 [link] [source]
For more info on this, if you look at the column names, those are not database delimiters, they are special things we need in our database. This will all be used only by our code, not an end-user, so from that standpoint it is fine.
If you can confirm that correctly specified it will generate an index on the desired column, then all good, we can work around this for now.
We would specify an index like this:
CREATE TABLE MyTable (
"a.b.c.d" INTEGER,
"x.y.z" TEXT,
);
CREATE INDEX MyIndex ON MyTable("x.y.z");
(5) By dbschwartz on 2022-05-29 13:27:06 in reply to 2 [link] [source]
It looks like this is supported, here is the segment of the documentation:
SQLite accepts both of the above. But, in an effort to be compatible with MySQL 3.x (which was one of the most widely used RDBMSes when SQLite was first being designed) SQLite will also interpret a double-quotes string as string literal if it does not match any valid identifier.
This misfeature means that a misspelled double-quoted identifier will be interpreted as a string literal, rather than generating an error. It also lures developers who are new to the SQL language into the bad habit of using double-quoted string literals when they really need to learn to use the correct single-quoted string literal form.
So as long as we spell them correctly it should be fine.
I verified using EXPLAIN QUERY PLAN. With a LIKE
operator and just a couple of records, it says it is using SCAN
. With an =
operator it showed it is indeed using the index.
This was just a couple of records, I assume with larger data the LIKE
operator would find it more efficient to use the index as well.
(6) By Keith Medcalf (kmedcalf) on 2022-05-29 17:25:49 in reply to 5 [link] [source]
This was just a couple of records, I assume with larger data the LIKE operator would find it more efficient to use the index as well.
Incorrect.
LIKE
is used to search text strings and the index must be useable for the search. By default, LIKE
is case insensitive. That means that if you do not change LIKE
to be case sensitive (there is a pragma for that), then it can only use a COLLATE NOCASE
index (or an index on a tuple value that is declared COLLATE NOCASE
). (Mutatis Mutandis if you have used the pragma so that LIKE
is case sensitive (not the default) then the index must not be COLLATE NOCASE
and the tuple value in the index must not be COLLATE NOCASE
.
Also, the LIKE string must not commence with a wildcard (that is the B in A LIKE B must not start with a wildcard).
If and only if both those conditions are met CAN the index be used for LIKE. If these conditions are not met then an index CANNOT be used, ever, and will not be.
(7) By dbschwartz on 2022-05-29 19:22:02 in reply to 6 [link] [source]
Thanks Keith, this is very helpful. I didn't realize there were special requirements for LIKE index use, I'll make the changes.
(8) By Keith Medcalf (kmedcalf) on 2022-05-29 19:56:56 in reply to 7 [link] [source]
See https://sqlite.org/optoverview.html#the_like_optimization for the full documentation on the how LIKE (and GLOB) are optimized.
(9.1) By Ryan Smith (cuz) on 2022-05-29 21:37:51 edited from 9.0 in reply to 3 [link] [source]
I think you misunderstand...
"Double-Quoted-Strings" refers to a habit of some earlier SQLite DB users to use double-quotes for STRINGS in an SQL statement, as opposed to IDENTIFIERs. IDENTIFIERs (such as column-names, DB-names, index names, etc.) may ALWAYS be double-quoted, and in fact are required to be double-quoted when they contain funky characters, like periods, as is the case for you. This has nothing to do with double-quoted-strings.
To be clear, in this example:
SELECT P.id, "P"."name", P."social.security"
FROM "people.details" AS P
WHERE P."paternal.parent" LIKE 'John'
;
The P is an identifier (Alias), and so are id, name, social.security and paternal.parent, whereas 'John' is a STRING. When Double-Quoted-Strings are turned OFF, all the identifiers as they are quoted currently, MAY still be quoted in the same way, since they are not strings and so that define does not apply to them. In fact, the identifiers "social.security", "people.details" and "paternal.parent" very much require double-quoting since they contain characters that would otherwise parse differently. The string 'John' however MUST be quoted with single quotes in this scenario.
When double-quoted-strings are allowed, the only difference is that the string 'John' may also be written as "John", which is technically wrong in SQL terms, but SQLite will forgivingly still parse it as a string.
To demonstrate why this is a problem, imagine if someone added a column named "John" at some point, now if we still allow double-quoted-strings, who can tell whether the Query designer meant the column "John", or the value 'John'?
Anyway, I'm pretty sure it's not a setting you need - certainly not because of your column names containing periods, that will still work just fine with double-quotes even when double-quoted-strings are turned off.
(10) By dbschwartz on 2022-05-29 22:44:36 in reply to 8 [link] [source]
Thanks very much. I have the LIKE query using an index now, perfect!
(11) By dbschwartz on 2022-05-30 13:44:26 in reply to 9.1 [link] [source]
Thanks Ryan, this is very helpful. You are totally correct, what we need are double-quoted identifiers, never double-quoted STRINGs. I will turn off the Double-Quoted Strings as we never will use those for literal values.
Between all of the responses I now have a full understanding of this to get it to work the way we need.