ROWID primary key works slower than index

(1) By Anton Dyachenko (ahtoh_) on 2021-11-17 03:50:14 [source]

I have a test script to reproduce the issue:

DROP TABLE IF EXISTS Test;
CREATE TABLE Test (id INTEGER PRIMARY KEY, x INTEGER, y BLOB);
INSERT INTO Test
  WITH RECURSIVE
    cte(x) AS (
       SELECT random()
       UNION ALL
       SELECT random()
         FROM cte
        LIMIT 100000
  )
SELECT NULL, x, randomblob(1024)  FROM cte;
-- CREATE INDEX test_id ON Test (id);
SELECT id FROM test
-- ORDER BY id
-- WHERE id % 2 = 0

If create index is commented then the execution time of the last select linearly depends on the size of random blob call otherwise the last select is always fast and doesn't depend on anything

Note: order by slows down both versions by about 7ms, where clause works better with the index as well.

EXPLAIN QUERY PLAN SELECT id FROM test

reports SCAN TABLE test in case there is no index and SCAN TABLE test USING COVERING INDEX test_id if the index created.

The time difference on my PC is 3ms with the index and 70ms without the index for plain select that is about 25x. I have tested this on windows 10 using DB Browser for SQLite Version 3.12.2 that uses SQLite Version 3.35.5, but the same behavior reproduced on Mac and Linux as well.

This behaviour directly contradicst what is advertised in this page

The data for rowid tables is stored as a B-Tree structure containing one entry for each table row, using the rowid value as the key. This means that retrieving or sorting records by rowid is fast. Searching for a record with a specific rowid, or for all records with rowids within a specified range is around twice as fast as a similar search made by specifying any other PRIMARY KEY or indexed value.

Also the descritpion above does not match this page

WITHOUT ROWID tables will work (in the sense that they get the correct answer) for arbitrarily large rows - up to 2GB in size - but traditional rowid tables tend to work faster for large row sizes. This is because rowid tables are implemented as B*-Trees where all content is stored in the leaves of the tree, whereas WITHOUT ROWID tables are implemented using ordinary B-Trees with content stored on both leaves and intermediate nodes.

My questions are:

am I doing something wrong or misreading the documentation or the documentation is misleading?
If the documentation is misleading can somebody explain how does exactly a rowid table works in terms of performance? When should I create an index and when should not?
why is using the index faster in the plain select query?

(2) By David Raymond (dvdraymond) on 2021-11-17 13:16:56 in reply to 1 [link] [source]

I can't answer all of of your questions, but in this case the blob size vs the covering index is important. In the main B tree each record has the rowid and the blob for every value in the tree. With a 1K blob then you're only gonna get 3 records per 4k page (or use overflow pages) for the main storage.

Conversely the index on id only stores id, it doesn't store the blob in the index. Since it's only storing one integer for each record then you're gonna store thousands per page.

Since your query doesn't use any of the other fields, it doesn't have to go to the main table to do the query, it can get everything it needs from the nice compact index, which in this specific case means it's fetching a thousand times fewer pages from the database to complete it. If your query involved any of the other fields in the table, then the extra index on ID is going to be slower and not of any use.