SQLite Forum

Dense vector search in SQLite?
Login

Dense vector search in SQLite?

(1) By Simon Willison (simonw) on 2022-09-05 18:12:46 [link] [source]

I just saw this new feature that's now available in Solr:

https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html

"Solr’s Dense Vector Search adds support for indexing and searching dense numerical vectors."

Made me think: are there existing tricks for doing something like this using SQLite? Is it something that an extension could implement?

It feels like SQLite could offer a really good platform to implement this kind of feature. I'm interested in hearing if there's existing work in this space already - or if it's the kind of thing that SQLite itself might consider implementing as an extension, similar to the existing R*Tree module.

(2) By Gunter Hick (gunter_hick) on 2022-09-06 06:24:40 in reply to 1 [link] [source]

SQLite's virtual table mechanism already allows the implementation of custom indexing methods. I have personally used it to implement bitmap indexing.

As long as it is possible to estimate the effort required for a specific set of constraints - a constraint being a tuple (field, relational-op) - reasonably well, the query planner can find a fast query by itself.

Even if you can only determine which constraints can be handled by your indexing method, you can use CROSS JOIN to force a hand-picked plan.

(3) By Simon Willison (simonw) on 2022-09-06 14:33:18 in reply to 2 [link] [source]

"Even if you can only determine which constraints can be handled by your indexing method, you can use CROSS JOIN to force a hand-picked plan."

That sounds VERY useful - could you expand that a little with an example? I can't quite picture what that would look like.

(4) By Gunter Hick (gunter_hick) on 2022-09-06 15:23:44 in reply to 3 [source]

An SQLite virtual table is implemented by a set of functions (methods) described here https://sqlite.org/vtab.html

The CROSS JOIN is explained in section 7.3 of https://sqlite.org/optoverview.html

Lets assume you have a table of text documents:

CREATE TABLE docs (Id INTEGER PRIMARY KEY, content TEXT);

You then need a virtual table module to generate a vector from a text.

CREATE VIRTUAL TABLE text2vec4 USING vector(4);

which pretends to be a table declared as (v1 float, v2 float, v3 float, v4 float, content text hidden). 

You can then SELECT * from text2vec4 where content = 'sample text'; to return a vector representation of the content. Syntactic sugar allows you to SELECT text2vec4('sample text'); instead

Then another virtual table to store and retrieve vectors.

CREATE VIRTUAL TABLE docs_index USING dense(4);

which pretends to be a table declared as (v1 float hidden, v2 float hidden, v3 float hidden, v4 float hidden, docid integer);

You can then load your data using

INSERT INTO docs_index(docid,v1,v2,v3,v4) SELECT Id,text2vec4(content) FROM docs;

And you can retrieve documents matching a search phrase using

SELECT content FROM docs_index CROSS JOIN docs ON docs_index.docid = docs.Id WHERE (v1,v2,v3,v4) = text2vec4('search phrase');

or shorter

SELECT content FROM docs WHERE Id = docs_index(text2vec4('search phrase'));

(5) By Simon Willison (simonw) on 2022-09-07 00:14:14 in reply to 4 [link] [source]

Thanks very much, I completely understand what you are proposing now. I'll keep this in mind for future projects!

(6) By Simon Willison (simonw) on 2022-09-07 00:49:54 in reply to 5 [link] [source]

I found this explanation of CROSS JOIN as an optimization trick really useful: https://sqlite.org/queryplanner-ng.html#_fixing_the_problem