SQLite User Forum

Virtual table xBestIndex help
Login

Virtual table xBestIndex help

(1) By curmudgeon on 2026-03-06 10:15:35 [source]

In attempting to understand virtual tables I've been coding a module that merely passes on all instructions to an existing table. It's pointless but, as it passes through functions, it prints out the function name and the data that's passed to it whenever a query is run. These print outs are helping my understanding no end but there's one thing (so far) that puzzles me.

The xBestIndex function has two parameters sqlite3_vtab* and sqlite3_index_info*. In the xBestIndex function I can use pIdxInfo->colUsed to determine what columns I need to retrieve from the underlying table and the pIdxInfo->aConstraint to determine the restrictions and suitable index. In this way I can build a stmt for the underlying actual table that can be used in xFilter. If the first parameter to xBestIndex was sqlite3_vtab_cursor* (rather than sqlite3_vtab*) then I could store this stmt in my derived version of the sqlite3_vtab_cursor object.

As things stand, I find the only thing I can do is store the stmt sql in pIdxInfo->idxStr so it's passed to xFilter but that means I then have to reprepare it (among other things) in xFilter.

So what puzzles me is this. Why is xBestIndex 'connection linked' rather than 'cursor linked' (with reference to its first parameter)? I could store the information on my derived version of sqlite3_vtab but it would be cluttered with stmts related to varying cursors associated with the connection.

I hope the above is understandable. I have difficulty with the terminology.

(2) By Alan L (ludlovian) on 2026-03-06 11:01:40 in reply to 1 [link] [source]

Why is xBestIndex 'connection linked' rather than 'cursor linked' (with reference to its first parameter)?

Because cursors only exist during sqlite3_step() of a stmt, and xBestIndex is called whilst sqlite3_prepare() is bulding the stmt.

During preparation, xBestIndex can be called many different times with different sqlite_index_info data as the query planner seeks to decide what approach is best for the query as a whole as it creates the stmt (=byte-code program). So I wouldn't build, create or do anything substantial inside xBestIndex, except return some idxNum/idxStr for each index strategy you are prepared to support, also saying which variables that you would need for that indexing strategy.

The query planner then makes its choice and builds the program/stmt.

When you sqlite3_step() that stmt, the byte-code program first calls xOpen for each cursor it needs on the virtual table, and then calls xFilter with the idxNum/idxStr that it has selected for this search of the virtual table. It also supplies the variables that your indexing strategy requested. Your xFilter will recognise the idxNum/idxStr selected and you can then build the underlying query to position the newly created cursor. It can then respond to xNext, xColumn etc.

(3) By curmudgeon on 2026-03-06 12:55:47 in reply to 2 [link] [source]

Thanks for the reply. I can see from the print out my VT created for a query joining 2 such virtual tables, xBestIndex is called for both tables before open is called for the cursors. That rules out any possibility that xBestIndex could have a cursor parameter.

In your last paragraph you stated "It also supplies the variables that your indexing strategy requested." I'm unsure what you mean as my whole problem revolves around xFilter being short on info due to only having the idxNum and idxStr variables for passing data.

All the data I need to create a parameterised stmt is available inside xBestIndex. xFilter on the other hand doesn't have the colUsed or the list of constraints.

I can pass the sql for the stmt to xFilter via the idxStr parameter but I then have to break that down to create a column map. e.g. If the underlying query is requesting cols 4, 8 and 9 from the real table I have to build a map so I know column 4 is at position 0, col 8 at pos 1 and col 9 at pos 2. This map can then be used in xColumn to extract the column values. Again, the map is easily compiled in xBestIndex but not xFilter.

To my mind it would be better to compile and prepare the stmt inside xBestIndex and add it to a stmt array in my derived version of sqlite3_vtab and pass xFilter the stmt array index via idxNum. Keeping the stmt array tidy (i.e. deleting / adding as necessary) doesn't seem trivial though.

(4) By Roger Binns (rogerbinns) on 2026-03-06 14:21:24 in reply to 3 [link] [source]

You seem to be fighting how virtual tables and best index work, which won't end well!

Can you provide any more information on what the underlying table actually is?

xBestIndex is about the table and the table has fixed columns that you supplied in xCreate so there is no map to make. You are only operating at the level of a table, not a query.

What may be simpler and applicable in your case is a table valued function. The concept is if your programming language had a function that took three parameters (param1, param2, param3 in this example) and provides 4 result columns. The result columns and names are the same no matter what the parameters. Your create would return this SQL:

CREATE TABLE mytable(
  column1,
  column2,
  column3,
  column4,
  param1 HIDDEN,
  param2 HIDDEN,
  param3 HIDDEN
);

For queries you only accept equality with AND in xBestIndex - for example:

SELECT * FROM vtable WHERE param1=3 AND param2='example' AND param3=5.323;

Doing a SELECT * only returns the non-hidden columns. The hidden columns can also be requested explicitly but the values are fixed by the query so that is easy.

If that doesn't work for you, then the documentation for my Python wrapper may provide some insights.

(7) By curmudgeon on 2026-03-06 17:27:49 in reply to 4 [link] [source]

I'll take a look at your python wrapper later Roger. I'll also have to take another look at hidden params to see if they would suit my purpose. Can the hidden params have different values for each cursor?

(10) By Roger Binns (rogerbinns) on 2026-03-06 18:22:12 in reply to 7 [link] [source]

Can the hidden params have different values for each cursor

The cursors are independent of each other so yes.

(5.1) By Alan L (ludlovian) on 2026-03-06 14:35:27 edited from 5.0 in reply to 3 [link] [source]

In your last paragraph you stated "It also supplies the variables that your indexing strategy requested." I'm unsure what you mean as my whole problem revolves around xFilter being short on info due to only having the idxNum and idxStr variables for passing data.

xBestIndex is essentially given a WHERE-like statement that the query planner is proposing.

For example, the planner might be saying "I can give you WHERE location = $param1 and age > $param2 - how about that?". xBestIndex may then reply "Yes, I could do that, with an esimate of 100 rows. Let's call that plan 'location-then-age', and make sure you give me $param1 and $param2 in that order". When the statement is running, the engine may call xOpen to create a new cursor, and then call xFilter on that cursor saying: "Can you position this cursor for the 'location-then-age' search plan, and here are the two parameters you requested for that plan".

I can pass the sql for the stmt to xFilter via the idxStr parameter but I then have to break that down to create a column map

If I understand it, your virtual table will always agree to whatever query plan is proposed (ie it will never return SQLITE_CONSTRAINT) and probably cannot do much to estimate the costs. So the planner will pick one of the plans it proposes, but it might not be optimal. You will have to encode the selection columns & operators you need into idxStr. Or you could put all the data you need into a list held in your subclassed sqlite3_vtab structure, indexed by idxNum and send idxNum back. You'd have to free all that data on disconnect. Or use reference counting and prune it on cursor close.

To my mind it would be better to compile and prepare the stmt inside xBestIndex

The user's query could have two or more cursors onto your table (perhaps it is joined in twice), so each cursor would need its own statement. Each statement may be bound with different parameters and may be stepped (via xNext) at different times. At the time of the xBestIndex you do not know how many cursors the engine will ask to create when the program is run. You only know the query plans it has asked you about.

So I suspect it is cleaner to generate the SQL you need in xBestIndex, but defer the preparation until xFilter and finalize the statement in xClose. Otherwise it gets complex - preparing statements you might not use, or needing to prepare another duplicate statement if more cursors are opened.

The statement_vtab virtual table does something similar and is worth looking at. It allows you to create parameterised SELECTs

sqlite> .load db/statement.so
sqlite> CREATE VIRTUAL TABLE this_and_next USING statement((SELECT :x as y UNION ALL SELECT :x + 1));
sqlite> WITH nums(x) as (VALUES (10),(20),(30)) SELECT * FROM nums JOIN this_and_next USING(x);
 x  | y
----+----
 10 | 10
 10 | 11
 20 | 20
 20 | 21
 30 | 30
 30 | 31

The columns & parameters (which become hidden columns) are analyzed at connection time in order to build the CREATE TABLE statement, but the underlying statement is only prepared in xOpen and then freed in xClose.

(6) By curmudgeon on 2026-03-06 17:27:11 in reply to 5.1 [link] [source]

So I suspect it is cleaner to generate the SQL you need in xBestIndex, but defer the preparation until xFilter and finalize the statement in xClose. Otherwise it gets complex - preparing statements you might not use, or needing to prepare another duplicate statement if more cursors are opened.

Yes, I agree with that. I could probably get away with passing the colUsed through the idxNum param (only first 32 bits passed though) and the sql through the idxStr param.

Will xFilter always be called with the same idxNum and idxStr for a particular cursor?

If that was the case then, on the first pass, I could store the prepared statement in my subclass of sqlite3_vtab_cursor ready for use in subsequent calls of xFilter from that cursor.

(8) By JayKreibich (jkreibich) on 2026-03-06 17:50:10 in reply to 6 [link] [source]

Reading through this, I feel like there is one bit that might be missing from the understanding of the API. That is, xBestIndex() can get called multiple times in a single prepare. The point of the function is for the query optimizer to get a chance to "poke" the virtual table and ask it "well, what about this? or this? or this?" to get an idea of what operations/ordering/filters can be done efficiently by the VT and what cannot. That's part of the reason you'd never want to prepare something in xBestIndex()... there might be multiple calls to xBestIndex(), but not all those responses are going to get used. This also means you need to be careful about how you allocate and store things you return from xBestIndex(), as you need to make sure they're cleaned up later.

(15) By curmudgeon on 2026-03-07 08:04:56 in reply to 8 [link] [source]

Yes. I've given up on the idea of storing anything cursor related in my sqlite3_vtab subclass.

(9) By Alan L (ludlovian) on 2026-03-06 18:01:15 in reply to 6 [link] [source]

Will xFilter always be called with the same idxNum and idxStr for a particular cursor?

I don't know, but I wouldn't rely on it. I can imagine the query planner reusing a cursor. For example, consider:

select * from your_virtual_table
where col1 = $param1

union all

select * from your_virtual_table
where col2 between $param2 and $param3

The engine will call xFilter with two separate plans - two of the possibly many it called xBestIndex with. Will it create a new cursor for each, or reuse the first cursor? Quite likely to reuse I would have thought.

(12) By curmudgeon on 2026-03-06 19:54:31 in reply to 9 [link] [source]

The print out for the union query had two calls to open and two to close. Not that that proves it's one filter per cursor.

(13) By Alan L (ludlovian) on 2026-03-06 21:58:52 in reply to 12 [link] [source]

Thats's interesting. Maybe it won't currently reuse a cursor for a different plan, but it can definitely have two xFilter calls for the same open cursor.

sqlite> .load db/statement.so
sqlite> create virtual table add_one using statement((select :x+1 as y));
sqlite> explain select y from add_one where x=10 or x=11;
addr  opcode         p1    p2    p3    p4             p5  comment      
----  -------------  ----  ----  ----  -------------  --  -------------
0     Init           0     24    0                    0   Start at 24
1     VOpen          0     0     0     vtab:5555D74B3B18 0   
2     BeginSubrtn    0     4     0                    0   r[4]=NULL
3       Once           0     12    0                    0   
4       OpenEphemeral  1     1     0     k(1,B)         0   nColumn=1; RHS of IN operator
5       Integer        10    5     0                    0   r[5]=10
6       MakeRecord     5     1     6     C              0   r[6]=mkrec(r[5])
7       IdxInsert      1     6     5     1              0   key=r[6]
8       Integer        11    5     0                    0   r[5]=11
9       MakeRecord     5     1     6     C              0   r[6]=mkrec(r[5])
10      IdxInsert      1     6     5     1              0   key=r[6]
11      NullRow        1     0     0                    0   
12    Return         4     3     1                    0   
13    Rewind         1     23    0                    0   
14      Column         1     0     3                    0   r[3]= cursor 1 column 0
15      IsNull         3     22    0                    0   if r[3]==NULL goto 22
16      Integer        0     1     0                    0   r[1]=0
17      Integer        1     2     0                    0   r[2]=1
18      VFilter        0     22    1                    0   iplan=r[1] zplan=''
19        VColumn        0     0     7                    0   r[7]=vcolumn(0)
20        ResultRow      7     1     0                    0   output=r[7]
21      VNext          0     19    0                    0   
22    Next           1     14    0                    0   
23    Halt           0     0     0                    0   
24    Transaction    0     0     1     0              1   usesStmtJournal=0
25    Goto           0     1     0                    0   

Steps 2-12 put the values 10 & 11 in ephemeral table. And then 13-22, including the VFilter at 18, is run for each row in that table.

(20) By curmudgeon on 2026-03-07 10:09:55 in reply to 13 [link] [source]

The xFilter can be called multiple times. In fact every time the values for the 'Expressions' change it will be called.

(11) By Joshua Randall (jrandall) on 2026-03-06 19:28:06 in reply to 3 [link] [source]

I think what you may be looking for is that xFilter is also passed argc and argv (int argc, sqlite3_value **argv) but those only include relevant column/expression values if your xBestIndex method asks for them.

xBestIndex does that by populating aConstraintUsage[].argvIndex in sqlite3_index_info.

The way this works is that the aConstraintUsage input array is parallel to the nConstraints input array (i.e. aConstraintUsage[0] corresponds to the constraints specified in nConstraints[0], aConstraintUsage[1] corresponds to nConstraints[1], and so on). By default, all of the aConstraintUsage elements are initialized with argvIndex of 0, which means to exclude them from the call to xFilter.

If your indexing strategy depends on some of the constraints, then you should set some (or all) of those values to consecutive non-zero numbers, which are the positions in the argv array (starting with argvIndex=1) where xFilter will receive the values when it is called.There is a bit of a quirk there in that you specify e.g. argvIndex = 1 for a constraint, and it is passed in to xFilter as argv0 but this is clearly documented:

For example, if the aConstraint[3].argvIndex is set to 1, then when xFilter is called, the argv[0] passed to xFilter will have the EXPR value of the aConstraint[3] constraint.

It sounds like your xBestFilter is not presently capable of doing much interesting work (usually the most relevant thing it can do is calculate estimates for cost and number of rows). If you want to do all the work in xFilter, just arrange for all the constraints to be passed as args to xFilter.

Cheers,

Josh.

(14) By curmudgeon on 2026-03-07 08:02:30 in reply to 11 [link] [source]

I'm not sure what you mean Josh. I'm aware aConstraintUsage passes the values for 'Expression' but I can't see any way of utilising the aConstraintUsage array to pass additional info. argc is set by sqlite for example.

(16) By Alan L (ludlovian) on 2026-03-07 08:24:18 in reply to 1 [link] [source]

In attempting to understand virtual tables I've been coding a module that merely passes on all instructions to an existing table

Thinking about this, I think "merely" is the wrong word. To get a virtual table to do what a real table does you have two main choices at either end of a spectrum.

First, you could do it "properly". By which I mean your xBestIndex() has to assess the constraints being considered and work out the likely cost of such a query, and will be asked about many such plans. Working out the cost is a big part of the query planner for real tables, involving looking at indices & statistics tables. Replicating that in your virtual table is heroic.

The alternative is "cheap & cheerful". You simply reply SQLITE_OK in your xBestIndex(). The default (expensive) cost will apply to all query plans. So the query planner will put your table in the outermost loop that it can, and assume a table scan. And that's what you implement: xFilter() starts an unconditional 'select * from', and xNext steps through it. The core will apply its own filters. It'll work, but it'll be slow.

You could do something in between. For example if the constraint includes EQUALS on the primary key, you can offer a lower cost, or a high cost (due to table scan) if not. Now you support two different query plans (numbered 1 & 2) - a simple select on the primary key, or a table scan.

(17) By curmudgeon on 2026-03-07 08:56:32 in reply to 16 [link] [source]

It's actually fairly easy to know if the actual underlying table has a usable index. In xBestIndex building the sql for the query is easy using the constraints and colUsed. You can then run the query "EXPLAIN QUERY PLAN sql" and check if the 'detail' column contains the word 'SCAN'. If it doesn't it's using an index (I think).

Working out the cost or estimated rows is harder.

(21) By Alan L (ludlovian) on 2026-03-07 11:14:42 in reply to 17 [link] [source]

Working out the cost or estimated rows is harder.

Indeed. But that's the whole point of xBestIndex!

A virtual table is usually for presenting data which looks like a regular table but is somehow different, else you'd simply access it as a regular table.

Perhaps because the underlying data comes from elsewhere - eg csv or zipfile. Or because the data is stored in the database but you have better indices than regular ones - eg FTS or rtree. Or because you want to present tabular data from a function with parameters - eg json_each or dbstat.

In each of those cases, you'll know which access plans are permitted. And so you'll either know the relative cost of those different plans (because they're your routes into the data), or you won't care beyond a certain point (because you will present the whole result table once you have enough parameters).

I appreciate you are trying something different - to replicate in a virtual table what you can already do with existing tables. But I don't think that's a well-trodden route, so I wouldn't expect that to be easy.

(18) By Bo Lindbergh (_blgl_) on 2026-03-07 09:01:31 in reply to 16 [link] [source]

You know what addition to SQLite would be really useful in this context? This one!

extern int sqlite3_estimate(
    sqlite3 *db,
    char const *sql_text,
    int sql_length,
    double *estimated_cost,
    sqlite3_int64 *estimated_rows);

(19) By curmudgeon on 2026-03-07 09:03:51 in reply to 18 [link] [source]

Yip, that would be great.

(22) By Gunter Hick (gunter_hick) on 2026-03-09 06:52:09 in reply to 1 [link] [source]

Consider the following pseudocode as a mental model:

procedure plan_query

foreach query_plan
  foreach table
    if table_is_virtual
      set usable parameters
      call xBestIndex
      save idxStr and usage
    else
      estimate_native_cost
discard inferior plans
code best plan


procedure run_query
  call xOpen
  foreach input_set
    call xFilter(idxStr,<input_values>)
    repeat
      call xColumn
      call xNext
    until call xEof returns TRUE

(23) By curmudgeon on 2026-03-09 14:00:07 in reply to 22 [link] [source]

The trace style printout my VT produces goes along with that Gunter.

(24) By curmudgeon on 2026-03-09 14:11:25 in reply to 1 [link] [source]

Can anyone explain this.

Suppose the real table I use for the VT has sql = 'CREATE TABLE t(c1, c2);'.

If I query

SELECT c1 FROM vt LIMIT 5 OFFSET 3;

the xBestIndex pIdxInfo->nConstraints == 2 (as expected).

If I query

SELECT c1 FROM vt WHERE c1 > c2 LIMIT 5 OFFSET 3

the xBestIndex pIdxInfo->nConstraints == 0

???????????????????

(25) By ddevienne on 2026-03-09 14:30:14 in reply to 24 [link] [source]

xBestIndex is supposed to be called several times, with different combinations of constraints active or not.
Could it be that this asks you for the cost of a full-table-scan? Isn't it called later with different constraints?

(26.1) By curmudgeon on 2026-03-09 15:18:42 edited from 26.0 in reply to 25 [link] [source]

In this case xBestIndex is only called 1 time before xOpen is called.

(27) By curmudgeon on 2026-03-09 16:43:53 in reply to 25 [link] [source]

I can understand why sqlite decides there's no constraint it can pass to xBestIndex that will help with the c1 > c2 condition but why doesn't it pass LIMIT and OFFSET? It did so for the previous sql which also required a full scan.

(28) By JayKreibich (jkreibich) on 2026-03-09 16:54:47 in reply to 27 [link] [source]

Because the limit/offset cannot be applied until after the filter (c1 > c2) has been applied.

In the first case, SQLite knows exactly how many rows are needed from the table to satisfy the query, and knows which rows they are.

In the second case, SQLite does not know. It might need thousands of rows. Since the virtual table doesn't understand the filter, it cannot pre-filter/pre-select the rows.

(29) By ddevienne on 2026-03-09 17:01:58 in reply to 28 [link] [source]

But Jay, the VT could have internal indexes to speed up the WHERE clause.
That's the whole point of xBestIndex, to let the VT know about filtering and see if it can speed that case up.
So somehow SQLite considers that particular WHERE clause isn't optimizable at all?

(30.1) By Alan L (ludlovian) on 2026-03-09 17:40:47 edited from 30.0 in reply to 29 [link] [source]

The main role is to opttmise the loop orders. So if the criterion was

...WHERE vt.c1 > other_table.c3

The it would call xBestIndex() with SQLITE_INDEX_CONSTRAINT_GT in order to see which of those two tables should be in the outer loop, and which in the inner.

But the constraint

...WHERE vt.c1 > vt.c2

is simply a filter test on each row. It's possible the VT could have some additional index allowing it to quickly pick rows where c1 > c2, but that's probably out of scope for the use-cases for VTs. Where they do apply better indices than normal SQL, like FTS or rtree, then the criteria are supplied via functions or as values from elsewhere than the current row.

(32) By JayKreibich (jkreibich) on 2026-03-09 17:51:07 in reply to 29 [link] [source]

Because that's not how databases work at the table level. A traditional filter compares some set of row values to a KNOWN value. Think about prepared statements, value binding, that kind of thing. Specialized indexing via virtual tables like FTS or RTrees are the same thing... they're designed to quickly filter and select rows based off comparisons to "outside" (i.e. not part of the row), constant values. It's no accident that the whole xBestIndex interface is designed around a series of constraints that compare one column to one (not even given) value using one operation. It's all designed around what can be expressed in bound statements, because those are designed around the core database engine and how it is expected to be used.

This query is comparing two columns of the same row. No constant values. There isn't even a way to express that with the current xBestIndex record structure... there is no way to define a single constraint with two columns, so even if the query optimizer could do something with this, it can't express it in the current xBestIndex model... so I'm not sure what you're expecting here.

More to the point, you cannot use a traditional database index to speed up that kind of query, and the whole query optimizer, and the formal Relational Algebra and Relational Calculus mathematics the optimizer is designed around, are all modeled on that type of traditional index. So, no... from an Relational math standpoint, that type of condition is not optimizable. Not in a way that would get passed down to the raw row reads.

And before you try to describe a complex set of "internal indexes" or conditions on the table as a whole that might create this constraint, remember that mathematically speaking database rows are a set, and have no ordering, nor any definable relationship between rows (in SQL query results can; tables do not). You might have a application-logic model that creates a table, virtual or otherwise, with known constraints or implied ways of computing/indexing this condition, but that's an application thing, not a Relational Model thing, and outside the scope of the query optimizer.

Any special system or index you might use to do so would basically boil down to pre-computing the condition and doing a more traditional filter (is this computed column equal to true?, with "true" being an external constant value across rows). You could use a traditional index on such a column, but in the real world it would likely lower performance.

If you change either of those values to a constant (or even a bind value; e.g. "c1 < ?") you should see them in xBestIndex, but not with an expression that doesn't involve a constant.

(31) By curmudgeon on 2026-03-09 17:39:01 in reply to 28 [link] [source]

In the second case sqlite just calls xNext 3 times to skip the OFFSET, then takes the next 5 xNext records to satisfy the LIMIT and make up the result set. Why doesn't it just do that in the first case? I don't understand why it passes the LIMIT, OFFSET in case 1 but not case 2.

(33) By JayKreibich (jkreibich) on 2026-03-09 18:09:32 in reply to 31 [link] [source]

It does not. In the second case it reads rows, looking for rows at satisfy the c1>c2 condition. It discards the first three rows that satisfy that condition. It keeps reading rows until it finds five more rows that satisfy that condition (or runs out of rows). LIMIT/OFFSET are applied to the result set after the WHERE conditions, not the table(s) (how would that even work in a query with multiple tables JOINed together?). In the first case, the table is the result set as the query has no additional filtering or constraints, so the filtering can be applied directly to the row-reads. In the second case it is not, so the filtering can't be done at the table level.

(34) By curmudgeon on 2026-03-09 18:35:09 in reply to 33 [link] [source]

My apologies. I was forgetting the rows it was reading in case 2 also had to pass the c1 > c2 condition.

(35) By Alan L (ludlovian) on 2026-03-09 19:02:48 in reply to 31 [link] [source]

I don't understand why it passes the LIMIT, OFFSET in case 1 but not case 2.

Passing the LIMIT and OFFSET to a virtual table is relatively recent (v3.38 I think). And it doesn't affect the query plan, but is passed down to allow xBestIndex() to better evaluate the cost. If the VT knows that (in your example) it only has to examine 8 rows, then that adjusts the cost accordingly.

As Jay has pointed out, the LIMIT & OFFSET can only be applied to the final result set, so it is only passed down to VTs if they are the only table in that query (or more likely, subquery). The comment in the relevant part of the code says:

/*
** Possibly add terms corresponding to the LIMIT and OFFSET clauses of the
** SELECT statement passed as the second argument. These terms are only
** added if:
**
**   1. The SELECT statement has a LIMIT clause, and
**   2. The SELECT statement is not an aggregate or DISTINCT query, and
**   3. The SELECT statement has exactly one object in its FROM clause, and
**      that object is a virtual table, and
**   4. There are no terms in the WHERE clause that will not be passed
**      to the virtual table xBestIndex method.
**   5. The ORDER BY clause, if any, will be made available to the xBestIndex
**      method.
**
** LIMIT and OFFSET terms are ignored by most of the planner code. They
** exist only so that they may be passed to the xBestIndex method of the
** single virtual table in the FROM clause of the SELECT.
*/

Why would that be useful? Imagine a query where you want all the recent orders made by your top 5 customers. You can either look at each of the top 5 customers and then their recent orders, or you can look at all recent orders and see which come from the top 5. The different costs of the different plans are what the query planner needs, and the "top 5 customers" is probably an IN () expression with a simple 1-table subquery with a LIMIT clause.

(36) By curmudgeon on 2026-03-10 07:38:20 in reply to 35 [link] [source]

Yes, I'm starting to get to grips with the difficulty sqlite has in what it can and can't pass to the xBestIndex. Having the trace print out mentioned in my opening post has helped my understanding no end. Thanks to Richard and the team for this sublime interface.

(37) By JayKreibich (jkreibich) on 2026-03-10 13:31:01 in reply to 36 [link] [source]

As someone that's written numerous virtual tables, both internal (self-referencing data stored in native SQLite tables) and external (access data outside of SQLite), I will say that xBestIndex is, by far, the most difficult part of implementing any virtual table**. It was even more difficult to describe it in my SQLite book, and I'll admit it got somewhat glossed over. In many cases (especially external access tables) I've just ignored it. That's often a valid option (unless, of course, the whole point of the VT is to provide specialty indexing). Regardless, I would get the whole VT working first and tackle xBestIndex last. xBestIndex is all about performance, and it helps to have a working (if possibly slow) VT before tackling indexing, so you have something to compare and validate against. Even if you are writing a speciality index, I would implement a single constraint (like "key_column = fixed_value") before expanding it to cover general cases.

The last VT I wrote allowed access to Apache Arrow or Parquet files via the SQLite engine, and I think we spent about 75% of the development time on xBestIndex, trying to merge it with Arrow's own indexing/lookup system (although no small part of that was tagging the Arrow schema meta-data so SQLite knew which columns were PKs or unique, or whatever). It worked fine without xBestIndex, it was just somewhat slow.

** with the possible exception of external VTs that are transaction safe, but that's usually "nope, not gonna do that."

(38) By Bo Lindbergh (_blgl_) on 2026-03-10 14:29:58 in reply to 37 [link] [source]

** with the possible exception of external VTs that are transaction safe, but that's usually "nope, not gonna do that."

That's actually "nope, can't do that". The current VT interface doesn't distinguish begin immediate from begin deferred.

(39) By Gunter Hick (gunter_hick) on 2026-03-10 16:04:52 in reply to 38 [link] [source]

From my extensive experience in writing virtual tables (Faircom CTree, shared memory, partitioned tables,...), I stopped providing xBegin/xCommit/xRollback methods because SQLite did funny things when they were provided (cursors/tables going out of scope unexpectedly). Maybe that has changed, but we are quite happy with our current non-transaction-safe virtual tables. We did look into transactional memory, but found that too complicated for the delivered benefit.

(40) By David Jones (vman59) on 2026-03-10 22:14:05 in reply to 37 [link] [source]

I ended up writing a library to facilitate processing the idxInfo argument using a common pattern. Combinations of constraints the VT wants to handle will encode as a idxStr with a standard format that the xFilter method can decode and apply to get a result set.

I'd say xUpdate is as hard cognitively as xBestIndex, but most of my virtual tables are readonly.

(41) By edoggie (ephraimstevens) on 2026-03-10 23:37:50 in reply to 1 [link] [source]

I am so thankful for this thread. I hope it keeps going as I'm learning a lot. This thread should be made a sticky.

I've struggled with parts of the virtual modules myself initially. So I figured I'll share my suggestions so that a newbie can successfully write a basic module. There are several virtual table modules out on the web but not as much has been written about how to successfully write them end-to-end.

I recently had the need to develop a virtual table and I had some growing pains -- especially around grasping the concepts and intricacies and cooperation between xBestIndex and xFilter.

Here is my suggestion for a newbie attempting to write a virtual module from scratch:

  1. Look at the templatevtab.c example that comes with the sqlite build library. I recall it simply print ten rows of numbers between 1000 and 1010 (something like that). This is barebones module where you can trace through the code to see how it works and how the functions interact. I don't recommend starting with the advanced modules because they cloud the basics of understanding of how the virtual modules work. C is verbose enough. Therefore, this templatevtab.c module is as barebones as you'll find. You won't have to navigate through a bunch of extra C code that doesn't pertain to the functionality of a virtual table.

  2. Purchase the book "Using SQLITE". I was shocked, and delighted to see the author (Jay Kreibich) is posting in this thread on the virtual table topic (How cool is that?). The book is an incredible resource for learning SQL concepts in general. Moreover, the book has the best chapter (in my opinion) on how to write virtual table modules step-by-step. This resource helped me the most with creating a virtual table because of how detailed it was written.

  3. Lastly, supplement the above resources with an old Dr. Dobbs journal from 2007: "Query anything with SQLite" by Michael Owens. Dr Dobbs website went offline years ago, but luckily some kind souls have archived some of the old articles. The SQLITE virtual table article can be found here: https://jacobfilipp.com/DrDobbs/articles/DDJ/2007/0712/071101mo01/071101mo01.html . This article is very concise and should fill in the blanks for concepts you may not have grabbed from the other two resources.

Between the three of these resources, I was able to implement a virtual table.

SQLite has some of the best documentation on the planet -- by far. However, it would be nice to have a few more trivial examples in the documentation pertaining to virtual tables.

Two extremely trivial suggestions:

  1. An example showing a virtual table that takes an input string and then returns a single row table containing the input string (aka an echo command). This would introduce someone to hidden columns, the match function, and how to get your parameters/arguments down to xFilter via xBestIndex.

  2. An example where a virtual table that takes a five-character string as input and then outputs a table with five columns. Each row corresponds to one character in the string. Column N of a given row will have the Nth character of the string. Row1, Column1 would contain the first character. Row2, Column2 would contain the second character etc... This would introduce someone to the xNext and XColumn concepts.

After following this thread, it's comforting to know that creating virtual table modules are not trivial -- even for the very experienced participants in the SQLite community. It make me feel better because I indeed struggled for a few days getting a module to work.

(42) By Stephan Beal (stephan) on 2026-03-11 08:20:38 in reply to 41 [link] [source]

This thread should be made a sticky.

The forum doesn't support sticky threads (and if we stickied everyone's favorite topic we'd have several pages of sticky posts, some of them (like this one) relevant only for a sub-1% group of users), but....

This thread was just added to the C API section of /wiki/FAQ in the hope that it may point someone along a helpful path in the future. Suggestions for additions to that FAQ are always welcomed.

(43) By curmudgeon on 2026-03-11 08:56:05 in reply to 42 [link] [source]

Wow. A link to my thread appears on the sqlite wiki? Does this mean I've arrived? 😁

(44) By ddevienne on 2026-03-11 09:14:11 in reply to 41 [link] [source]

SQLite has some of the best documentation on the planet

But it's mostly Reference documentation though. Not a Manual.

For advanced topics like Virtual Tables, resources are old, and you're on your own to work it out using the examples out there. There are many, if you know where to look. SQLite in general does not do hand-holding.

There's no technical writers in the SQLite team, but devs that like writing. There's a difference.

(45) By Gerry Snyder (GSnyder) on 2026-03-11 14:25:11 in reply to 44 [link] [source]

Perhaps you could prevail upon Jay Kreibich to update his book.

(46) By JayKreibich (jkreibich) on 2026-03-11 16:57:49 in reply to 45 [link] [source]

That's exceptionally unlikely to happen. The book originally came out in 2010, and covered SQLite 2.6.x. A handful of years ago (2020? Maybe 2022) O'Reilly contacted me about doing a second edition, and I have to admit that my first response was "You still publish books?".

There may be some other options, however... O'Reilly has options to transfer copyright to an author and/or release the copyright into the public domain once a book ages out. If we go through that process, that could allow sections to be used in the documentation, but it would need a lot of editing and updating. I'm likely to retire from full-time work in a few years (I've been with NVIDIA since 2019 and things have been good), and I've pondered doing some volunteer work for the team once I cross that line. No promises, but we'll see what happens. (DRH: I've been meaning to talk to you about that...)

To be clear, I don't own or control the copyright for "Using SQLite." I do not speak for those that do (O'Reilly Media). But from a personal standpoint, should someone accidentally stumble across one of the many pirate copies of the book's PDF on the web and decided to download and read it, it does not bother me.

(48) By Richard Hipp (drh) on 2026-03-11 17:06:48 in reply to 46 [link] [source]

Let me know know when you are ready to start working on SQLite, Jay. We'll be happy to have you on the team.

(49) By JayKreibich (jkreibich) on 2026-03-11 17:16:56 in reply to 44 [link] [source]

All of this is so very very true. The SQLite documentation is very reminiscent of classic 1990s man pages, that are super terse and dense. Every word, every sentience counts. The docs need to be carefully taken at very literal, face value, with zero assumptions. Reading that kind of documentation is almost as hard as writing it. It's a unique language, like legal contracts but for APIs.

It is also very much assumed that if you don't quite understand something, or you're unsure of a condition, you'll either "try it and see," or you'll dig into the source code to verify. It is clear the project is both written by, and developed for, the kind of super-curious, "know thy craft," low-level developer that can do math in binary in their head, convert hex without thinking, and rattle off IEEE 754 formats needing to look them up. That was most developers in the 1990s. It isn't today.

The documentation actually directly lead to writing the book. I was working on a project, trying to stuff all of Wikipedia into one of the first e-ink book readers (iRex iLiad) when I got frustrated with the SQLite APIs and documentation. It felt like the doc for every API call assumed you already knew and understood the doc for every other API call. I was like, "SOMEONE SHOULD WRITE A BOOK!" A moment later I swear there was a "ding!" in the room, and I literally looked up expecting to see a light bulb floating over my head. Almost three years later....

(47) By JayKreibich (jkreibich) on 2026-03-11 17:00:14 in reply to 41 [link] [source]

Thank you so much. In some ways, that book was written in what feels like a previous lifetime, but it makes me happy to hear there are people out there still getting some value out of it.

(50) By curmudgeon on 2026-03-18 20:51:48 in reply to 1 [link] [source]

If I include a LIMIT OFFSET clause when I'm querying a virtual table the OFFSET and LIMIT are included in the aConstraint array in xBestIndex. If I pass those constraints to xFilter and apply them in there I get a wrong answer. It seems to me that sqlite is applying the LIMIT OFFSET 'externally' in addition to them being applied in xFilter.

Can anyone confirm this? It's not something I can easily supply an example for.

Am I missing something. Should LIMIT & OFFSET be ignored in aConstraint? If so, why supply them?

(51) By Bo Lindbergh (_blgl_) on 2026-03-18 21:25:28 in reply to 50 [link] [source]

It works fine for me in this bit of sample code. Don't forget to set both argvIndex and omit.

(52) By curmudgeon on 2026-03-18 22:03:12 in reply to 51 [link] [source]

Why would I set omit in this case Bo? Up to now I was working on the basis that sqlite was supplying them in aConstraint so it expected me to handle (apply) them.

(53) By curmudgeon on 2026-03-18 22:10:24 in reply to 51 [link] [source]

Ah, I just found this.

The omit flag is always honored for SQLITE_INDEX_CONSTRAINT_OFFSET constraints as long as argvIndex is greater than 0. Setting the omit flag on an SQLITE_INDEX_CONSTRAINT_OFFSET constraint indicates to SQLite that the virtual table will itself suppress the first N rows of output, where N is the right operand of the OFFSET operator. If the virtual table implementation sets omit on an SQLITE_INDEX_CONSTRAINT_OFFSET constraint but then fails to suppress the first N rows of output, an incorrect answer will result from the overall query.

No mention of LIMIT though.

I don't feel this clears things up for me though. I'll have to investigate further tomorrow.

(54) By Gunter Hick (gunter_hick) on 2026-03-19 07:02:14 in reply to 50 [link] [source]

All of the constraints that are marked "usable" on input to xBestIndex constitute a question to the virtual table implementation.

The xBestIndex function supplies as answers:
- the order in which actual values for the selected constraints are to be supplied to the xFilter function, by setting an argvIndex value. The xBestIndex function is free to set an argvIndex only on a subset of the "usable" constraints, in which case SQLite would treat the other constraints as irrelevant for/unsupported by the virtual table implementation.
- an estimate of the total cost of a partial table scan specified by the selected constraints.
- (optional) an estimate of the total number of rows returned in a partial table scan specified by the selected constraints.
- (optional) an indication if the selected constraints constitute a unique index, i.e. eactly 0 or 1 rows will be returned
- (optional) an indication of contraints the virtual table implementation guarantees to honor, by setting the omit flag.

If one is writing nontrivial virtual tables, examining the generated bytecode in addition to the query plan is an excellent method to determine if this communication is working as expected.

If the backing storage for a virtual table is external to SQLite, the ratio of cost to records returned may need adjustment. More so, if there are diverse backing storage technologies implemented, e.g. reading from a tree structure located in shared memory is going to be orders of magnitude faster than accessing a Faircom CTree file.

(55) By curmudgeon on 2026-03-19 09:34:48 in reply to 54 [link] [source]

Thanks Gunter, that's helpful. I'm also getting to grips with the 'omit' setting as suggested by Bo. My problem was by not omitting the LIMIT & OFFSET these were being checked again by sqlite. In this particular case (where I'm just passing the sql to an existing table) I should set omit to true for all constraints with argv > 0. I've yet to try this but your suggestion to check out the EXPLAIN should make it clear to me.

(56) By Gunter Hick (gunter_hick) on 2026-03-19 12:40:08 in reply to 55 [link] [source]

You're welcome. The really hard work starts when you begin to estimate costs and returned records. That's a change from syntax (working the interface correctly) to semantics (returning useful values).

Assume a virtual table like

CREATE TABLE virtual (k1 INT, k2 INT, k3 INT, data TEXT, PRIMARY KEY (k1,k2,k3)) WITHOUT ROWID;

NB: You actually cannot supply the PRIMARY KEY and WITHOUT ROWID in the table definition declared to SQLite from inside the xCreate/xConnect methods).

Assume you have 1000 records in the table. Which values should xBestIndex return?

from easy to hard...

xBestIndex(<>): rows=1000, cost=rows*(cost of reading the next sequential row)
xBestIndex(<k1=,k2=,k3=>): rows=1, cost=cost of retrieving 1 record by unique key (usually O(log rows))
xBestIndex(<k1=>): rows = 1000 / (cardinality of k1), cost=(cost of locating the first row)+ rows*(cost of reading the next row)

If the cardinality of the key fields is unkown, estimating the cardinality of the key fields to be equal works; even if it is off, it is off in the same way for each table involved in a join.

(57) By curmudgeon on 2026-03-19 13:44:50 in reply to 56 [link] [source]

Thanks Gunter. I've resolved my 'omit' problems but I've yet to tackle index costs. This will help.