Indexing JSON array contents
(1) By Andrea Aime (andrea.aime) on 2020-05-08 15:57:16 [link] [source]
Hi, I'm trying to confirm one limit of JSON1 usage, in particular, that it's not possible to create an index speeding up matches on the contents of JSON arrays.
Point in case, see this snipped of SQL, creating a table with JSON contents, and finding all the records that have value 'a' in any position of the array:
-- create the table storing the arrays. The array column is a text one> CREATE TABLE test (
id INTEGER PRIMARY KEY NOT NULL,
name TEXT,
array TEXT NOT NULL
);
-- fill using json arrays for the array column
> INSERT INTO test VALUES
(1, 'first', '["a", "b", "c"]'),
(2, 'second', '["b", "c", "e"]'),
(3, 'third', '["d", "a"]');
-- find all records having 'a' in the json array
> SELECT name FROM test
WHERE EXISTS (
SELECT *
FROM json_each(test.array)
WHERE json_each.value = 'a'
);
first
third
I don't think it's possible to create an index that would speed up this query, without maybe resorting to separate tables and triggers... right?
Best regards Andrea
(2) By Warren Young (wyoung) on 2020-05-08 18:14:35 in reply to 1 [link] [source]
If you could use generated columns with table-valued functions, then you could create a STORED
version of your EXISTS
expression, but you can't, so...
Another approach might be to extend JSON1 with a json_find()
function that returns an array index. Then you could have something like this:
CREATE TABLE test (
id INTEGER PRIMARY KEY NOT NULL,
name TEXT,
array TEXT NOT NULL,
has_a INT AS (json_find(array, 'a') > 0) STORED
);
...and then create the index on test.has_a
.
(3) By Richard Hipp (drh) on 2020-05-08 18:53:46 in reply to 1 [link] [source]
Indexes are one-to-one. There is one entry in the index for each row in the table.
You are asking for an index that is many-to-one - multiple index entries for each table row.
A Full-Text Index is basically a many-to-one index. So if all of your array entries really are words, you could probably get this to work using a Full-Text Index.
But you would have to keep the full-text index up-to-date yourself, either using triggers, or in your application. It would not be automatic. So, I think the answer to your question is probably "no".
(4) By Kapil (kapily) on 2021-01-07 06:46:49 in reply to 1 [link] [source]
Here's an alternative solution using json_each and triggers. It's similar to the solution proposed by drh. If you want to perform count(...) queries on the elem
field, you might prefer this approach over the FTS5 approach. Here's an small example using your schema:
Create test table
CREATE TABLE test (
id INTEGER PRIMARY KEY NOT NULL,
name TEXT,
array TEXT NOT NULL
);
Create index table
CREATE TABLE test_idx (
fid INTEGER NOT NULL,
elem TEXT NOT NULL,
PRIMARY KEY(fid, elem),
FOREIGN KEY(fid) REFERENCES test(id)
);
Note:
- In this table, each row corresponds to one element in the array.
- In this example, we set our primary key to (fid, elem). If the elements inside
array
are not unique, you'll want to just create a new primary key field for this table and remove the PRIMARY KEY across both columns above.
Create Triggers to keep the index up to date
CREATE TRIGGER test_ai AFTER INSERT ON test BEGIN
INSERT INTO test_idx(fid, elem) select new.id, j.value from json_each(new.array) as j;
END;
CREATE TRIGGER test_ad AFTER DELETE ON test BEGIN
DELETE FROM test_idx where fid = old.id;
END;
CREATE TRIGGER test_au AFTER UPDATE ON test BEGIN
DELETE FROM test_idx where fid = old.id;
INSERT INTO test_idx(fid, elem) select new.id, j.value from json_each(new.array) as j;
END;
You'll likely want to add an index to the elem
column in test_idx
for faster querying.
(5) By little-brother on 2021-01-08 10:46:43 in reply to 2 [source]
use generated columns with table-valued functions, then you could create a STORED version of your EXISTS expression
More info here - https://dgl.cx/2020/06/sqlite-json-support :)