SQLite User Forum: Nearest-match join

I assume that you want to find, for each record in a table T1, the nearest record(s) in table T2 (but not vice versa). As already noted by others, there may be more than one record in T2 matching a record in T1.

Sample data:

    create table T1 (
      id integer primary key,
      ts real    not null,
       a char    not null
    );

    create table T2 (
      id integer primary key,
      ts real    not null,
       b integer not null
    );

    insert into T1(id, ts, a)
    values (1, 1.0, 'A'), (2, 2.0, 'B'), (3, 3.5, 'C');

    insert into T2(id, ts, b)
    values (10, 2.0, 50), (20, 2.5, 100), (30, 9.5, 200), (40, 4.5, 300);

Solution 1:

    select X.id, X.ts, X.a, Y.id, Y.ts, Y.b, abs(Y.ts - X.ts) as diff
      from T1 X, T2 Y
     where not exists (select 1
                         from T2 Z
                        where abs(Z.ts - X.ts) < abs(Y.ts - X.ts)
                      );

Solution 2:


    with ranking_table as (
      select rank() over (partition by T1.id order by abs(T2.ts - T1.ts)) as ranking,
             T1.id as t1id, 
             T1.ts as t1ts,
             T1.a as t1a,
             T2.id as t2id,
             T2.ts as t2ts,
             T2.b as t2b,
             abs(T2.ts - T1.ts) as diff
        from T1, T2
    )
    select t1id, t1ts, t1a, t2id, t2ts, t2b, diff
      from ranking_table
     where ranking = 1;

The latter *should* perform better, but you have to test it. Solution 2 has also the advantage that you may easily find the N timestamps in T2 nearest to each timestamp in T1, for N>1, simply by using `where ranking <= N`.

In both cases, the queries can be made to perform much better if you can set an upper bound to the difference between the timestamps.

This is an instance of the *k-nearest neighbour problem,* if you want to explore it further.