SQLite Forum

Indexing JSON array contents
Login

Indexing JSON array contents

(1) By Andrea Aime (andrea.aime) on 2020-05-08 15:57:16

Hi,
I'm trying to confirm one limit of JSON1 usage, in particular, that it's not possible to create an index speeding up matches on the contents of JSON arrays.

Point in case, see this snipped of SQL, creating a table with JSON contents, and finding all the records that have value 'a' in any position of the array:

```sql
-- create the table storing the arrays. The array column is a text one> CREATE TABLE test (
  id INTEGER PRIMARY KEY NOT NULL, 
  name TEXT,
  array TEXT NOT NULL
);

-- fill using json arrays for the array column
> INSERT INTO test VALUES
(1, 'first', '["a", "b", "c"]'),
(2, 'second', '["b", "c", "e"]'),
(3, 'third', '["d", "a"]');

-- find all records having 'a' in the json array
> SELECT name FROM test
WHERE EXISTS (
  SELECT *
  FROM json_each(test.array)
  WHERE json_each.value = 'a'
);
first
third

```

I don't think it's possible to create an index that would speed up this query, without maybe resorting to separate tables and triggers... right?

Best regards
Andrea

(2) By Warren Young (wyoung) on 2020-05-08 18:14:35 in reply to 1 [link]

If you could use generated columns with table-valued functions, then you could create a `STORED` version of your `EXISTS` expression, but you can't, so...

Another approach might be to extend JSON1 with a `json_find()` function that returns an array index. Then you could have something like this:

```SQL
CREATE TABLE test (
  id INTEGER PRIMARY KEY NOT NULL, 
  name TEXT,
  array TEXT NOT NULL,
  has_a INT AS (json_find(array, 'a') > 0) STORED
);
```

...and then create the index on `test.has_a`.

(5) By little-brother on 2021-01-08 10:46:43 in reply to 2 [link]

> use generated columns with table-valued functions, then you could create a STORED version of your EXISTS expression

More info here - [https://dgl.cx/2020/06/sqlite-json-support](https://dgl.cx/2020/06/sqlite-json-support) :)

(3) By Richard Hipp (drh) on 2020-05-08 18:53:46 in reply to 1 [link]

Indexes are one-to-one.  There is one entry in the index for each row in the
table.

You are asking for an index that is many-to-one - multiple index
entries for each table row.

A Full-Text Index is basically a many-to-one index.  So if all of your
array entries really are words, you could probably get this to work
using a [Full-Text Index][1].

[1]: https://www.sqlite.org/fts5.html

But you would have to keep the full-text index up-to-date yourself, either
using triggers, or in your application.  It would not be automatic.  So,
I think the answer to your question is probably "no".

(4) By Kapil (kapily) on 2021-01-07 06:46:49 in reply to 1 [link]

Here's an alternative solution using json_each and triggers. It's similar to the solution proposed by drh. If you want to perform count(...) queries on the `elem` field, you might prefer this approach over the FTS5 approach. Here's an small example using your schema:

**Create test table**

```
CREATE TABLE test (
  id INTEGER PRIMARY KEY NOT NULL, 
  name TEXT,
  array TEXT NOT NULL
);

```

**Create index table**

```
CREATE TABLE test_idx (
  fid INTEGER NOT NULL, 
  elem TEXT NOT NULL,
  PRIMARY KEY(fid, elem),
  FOREIGN KEY(fid) REFERENCES test(id)
);
```

Note:

- In this table, each row corresponds to one element in the array.
- In this example, we set our primary key to (fid, elem). If the elements inside `array` are not unique, you'll want to just create a new primary key field for this table and remove the PRIMARY KEY across both columns above. 


**Create Triggers to keep the index up to date**

```
CREATE TRIGGER test_ai AFTER INSERT ON test BEGIN
  INSERT INTO test_idx(fid, elem) select new.id, j.value from json_each(new.array) as j;
END;

CREATE TRIGGER test_ad AFTER DELETE ON test BEGIN
  DELETE FROM test_idx where fid = old.id;
END;

CREATE TRIGGER test_au AFTER UPDATE ON test BEGIN
  DELETE FROM test_idx where fid = old.id;
  INSERT INTO test_idx(fid, elem) select new.id, j.value from json_each(new.array) as j;
END;
```

You'll likely want to add an index to the `elem` column in `test_idx` for faster querying.