SQLite Forum

Nearest-match join
Login
Thanks everyone for your solutions and sorry for not following-up sooner.

I spend more time writing C++ than SQL and in my mind I have an "obvious" algorithm for this problem - I would concurrently scan both tables sequentially using by-timestamp indexes. If the first table has N rows and the second table has M rows, then that is O(N+M).

If I do an exact join in SQL, it seems to do:

    QUERY PLAN
    |--SCAN TABLE table1
    `--SEARCH TABLE table2 USING INDEX table2_by_datetime (<expr>=?)

That is O(N log M), which is worse than O(N+M) unless M is much larger than N. In this test, table1 has 8000 rows and table 2 has 62000 rows.

There is also an index table1_by_datetime, but it's not using that. Perhaps the query planner has looked at the sizes of the tables and decided that this is better. Does sqlite have a join implementation that scans two indexes?

It may also be significant that my indexes and join use the datetime() function:

    create index table1_by_datetime on table1( datetime(timestamp) );
    create index table2_by_datetime on table2( datetime(timestamp) );
    select * from table1 join table2 on( datetime(table1.timestamp) == datetime(table2.timestamp) );

Anyway... if that query plan for the exact join is the best, then I can believe that some of the solutions you've posted above may be have the same cost. I will investigate.

I envisage eventually having hundreds of thousands of rows in the first table and tens of millions of rows in the second table.