SQLite User Forum: Operation speed

Without an index there are two ways to solve the query:

(1) Scan the entire values_11 table and collect all the uint64_pointIDX column into a temporary table/b-tree, then scan the points_11 table and for each row look up into the ephemeral table and see if the reference does not exist to determine whether to delete the row.

(2) Scan the points_11 table and for each row, scan the entire points_11 table to see if the reference does not exist to determine whether to delete the appropriate row.

Clearly (2) requires one scan of the table values_11 for each row in points_11 so (1) is going to "cost less" (unless statistics indicate that values_11 has a very low number of rows relative to points_11) so this method will be chosen (which is, in fact, what SQLite3 chooses).

Once an index exists then another option can be considered:

(3) Scan the points_11 table and for each row perform an index lookup on the values_11 index to see the condition for row deletion is met to determine whether to delete the row.

If the index exists then (3) will be "faster" than (1) by the "cost" of scanning values_11 and building the ephemeral table since the index can be used directly rather than having to scan the table to build the ephemeral table. (This possibility will also be detected by the SQLite3 optimizer and chosen if an appropriate index exists).  This is effectively "spreading the cost" of the scan and build of the ephemeral table amongst all the insert/delete/update operations performed on values_11.

If pre-defining the index did not significantly reduce the elapsed time that indicates that the actual cost of scanning the values_11 table and building the ephemeral lookup table is very low -- so there may not be much point in maintaining the index at all (that is, the cost of maintaining the index may outweigh the benefit derived therefrom) and that the bottleneck lies elsewhere (that is, it is inherent in the design and methodology being used and that perhaps foreign key enforcement needs to be performed so that points_11 rows are deleted at the time the values_11 rows are deleted thus spreading the "cost" over a large number of operations).