SQLite Forum

Query planner fails to use obvious index
Login
Regarding the assertion,

> there is no case in which scanning a table's information is done faster by scanning any index which references the table.

Put that way, probably not. But if we substitute "a table's information" with "that subset of the table's information which is necessarily contained in an index" (and "scanning any index" with "scanning an index of the table on that same subset"), which is close to the OP's posed issue, then in some cases (but not the OP's case [a]) scanning the index is all that is needed and it **could** be faster in some cases.  To my point, those cases would be where the indexed field subset is not identical with the table's fields. IOW, where the table contains *more* information by volume (or byte count) than the index does. In that case, (but not the OP's where the index and table contain the same information), scanning the index will produce the results queried without having to pull into memory the additional data that is in the table b-tree pages but not the index b-tree pages.

[a. The OP's table contained only a single field beyond the rowid, and that field was indexed; there was no additional data in the table beyond what would have to be in the index. ]

This point is illustrated by an armchair experiment differing from yours:

```
A: // Index scan
for i = 0 to lastitem in Index A
    pull in new b-tree page of Index A if necessary
    read r from A
    do not look-up r in Table T because it was taken from the Index A.
    do not read row r from T because the required data is in Index A.
    do_whatever;
B: // Table scan
for r = 0 to lastrow in Table T
    pull in new b-tree page of Table T if necessary
    read row r from T
    do_whatever;
```

Comparing the actual work in A and B, it looks identical, but it is not when the pages of Table T are more numerous than the pages of Index A due to having more data in them, quite possibly a lot more.

Where we differ, I surmise, is in perception of need for "read row r from Table T" once r has been traversed to in the Index A b-tree. I am pretty sure that the b-tree must contain, in addition to its bare tree structure, the very data upon which the b-tree is ordered and sometimes balanced. Hence, when no data beyond what was indexed upon is needed for a query, what is needed is in the index and only the index need be read.

If I am wrong about the b-tree containing the indexed-upon data, I am willing to concede that my earlier point and above argument are mere fantasy. However, in the [Database File Format](https://sqlite.org/fileformat2.html) doc can be found this sentence confirming my notion: <code>For an index b-tree, the key is always arbitrary in length and hence the payload is the key.</code> Perhaps that payload is in such a form that the data which went into it can no longer be teased apart, and the indexed table must be consulted to retrieve it. If so, depending on how much data must be retrieved versus how much uselessly pulled into memory by using only the index to locate it, the performance contest could go either way. However, I expect that for single-field indexes, scanning on that same field is possible by reading just the index and quite possibly faster.

FWIW, I will be happy to be educated on this where needed.