transient indices vs. LIMIT

A heap could help for ORDER BY, but I don't see how it helps with DISTINCT.

Using a smaller index (only holding items that are currently candidates to meet the limit) would change the complexity from O(n log n) to O(n log k), where n is the number of rows in the table and k is the LIMIT.
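To make that concrete, here is a minimal Python sketch of the kind of bounded candidate structure I mean (the function name and arguments are just illustrative; this is not how SQLite is implemented). It keeps at most k distinct candidates in a max-heap plus a set, so each incoming row costs O(log k):

    import heapq

    def distinct_smallest_k(values, k):
        """Return the k smallest distinct values, keeping only k candidates."""
        heap = []        # max-heap (via negation) of the current candidates
        members = set()  # same candidates, for O(1) duplicate checks
        for v in values:
            if v in members:
                continue                      # DISTINCT: already a candidate
            if len(heap) < k:
                heapq.heappush(heap, -v)
                members.add(v)
            elif heap and v < -heap[0]:       # beats the worst current candidate
                worst = -heapq.heapreplace(heap, -v)
                members.discard(worst)
                members.add(v)
            # duplicates of evicted values also fail the v < -heap[0] test,
            # because anything evicted is >= the current worst candidate
        return sorted(-x for x in heap)       # final ORDER BY over k items

Fed 20 million floats, it never holds more than k candidates, so the per-row cost stays O(log k) instead of O(log n).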

In Python (3.6.8), I turned journal_mode off and built a table with 20 million rows of floats (from random.random()). The schema was CREATE TABLE a(b);
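The setup was roughly along these lines (the file name and batch size are incidental):

    import random
    import sqlite3

    conn = sqlite3.connect("floats.db")
    conn.execute("PRAGMA journal_mode=OFF")
    conn.execute("CREATE TABLE a(b)")
    # insert 20 million random floats in batches so memory stays flat
    batch = 100_000
    for _ in range(20_000_000 // batch):
        conn.executemany("INSERT INTO a VALUES (?)",
                         ((random.random(),) for _ in range(batch)))
    conn.commit()
    conn.close()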

At that point, Windows Task Manager showed that my process had done a couple of hundred reads and about 80k writes. The file size was 330 MB.

I did SELECT COUNT(*) FROM a;
That added about 80k reads to the total. As expected, it looks like it had to read the entire database to satisfy that query: 80k reads at SQLite's default 4 KiB page size works out to roughly the 330 MB file size.

I then ran this query twice:

SELECT DISTINCT b FROM a ORDER BY b LIMIT 10;

Each time it resulted in about 80k reads and about 60k writes.

I'm not sure what that says about the implementation. 80k reads is what it takes to read the table once; it isn't enough I/O to do that and also build a full-size transient index.

However, it shouldn't take any writes at all to maintain a 10-entry temporary index, so it isn't clear what those 60k writes were actually doing.
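One way to get a hint (I haven't dug into the bytecode) is EXPLAIN QUERY PLAN on the same statement. The exact wording of its output varies by SQLite version, but it normally says whether a temp B-tree is being used for DISTINCT and/or ORDER BY:

    import sqlite3

    conn = sqlite3.connect("floats.db")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT DISTINCT b FROM a ORDER BY b LIMIT 10").fetchall()
    for row in rows:
        print(row)   # e.g. a scan of table a plus "USE TEMP B-TREE FOR ..."
    conn.close()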

I never saw the Python process's memory go much above 10 MB, so it certainly wasn't holding the full database (or a full-size temporary index) in memory.
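If those 60k writes are the transient B-tree being spilled to temporary storage once it outgrows the page cache, then forcing temp storage into memory should make them disappear (at the cost of RAM). I haven't tried it, but the check would look something like:

    import sqlite3

    conn = sqlite3.connect("floats.db")
    conn.execute("PRAGMA temp_store=MEMORY")   # keep transient B-trees in RAM
    # optionally give the page cache more room, e.g. ~256 MB:
    conn.execute("PRAGMA cache_size=-262144")  # negative value = size in KiB
    rows = conn.execute(
        "SELECT DISTINCT b FROM a ORDER BY b LIMIT 10").fetchall()
    print(rows)
    conn.close()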