pointer to memory as index?
... SELECT price FROM fruitsforsale WHERE fruit='Peach';
In "How Indexes Work" it is explained that, given an index on "fruit", SQLite has to do two binary searches to find the price of peaches... (one search in the index, and then finding the rowid).
Given a read only database in memory, is it possible to bring this down to just one binary search by creating an index with a direct "pointer" instead of the rowid?
The code is likely the same whether the DB is in-memory or not. So most likely not.
In-memory DBs are already faster that disk DBs, so does it really matter?
Having different code paths might make sqlite3_serialize harder too,
and make SQLite bigger. Not something I wouldn't myself welcome, but
many would be against though.
Funny enough, we were just discussing this in another thread.
The hard answer is no, the rowid is already the "pointer" index, but you can create a table with "WITHOUT ROWID" by which the PK will become the "pointer index", but you can't have more than one of those.
Getting it faster, if you know you will only need the fields for "fruit" and "price", then this will reduce that query to a single binary search:
CREATE INDEX fruit_price ON fruitforsale(fruit,price);
Now when you do this query:
SELECT price FROM fruitsforsale WHERE fruit='Peach';
The engine only needs to look up ONE BTREE to obtain both fields it needs to give the result.
Why this is not always feasible:
- The Index costs extra space (it matters if your DB has millions of rows)
- A small bit of extra cycles when inserting (it has to insert new rows into the table and also into every index you have on that table).
That said, it makes look-ups lightning-fast.
As an aside, for a medium-sized normal DB that can wholly fit in memory, I doubt you would see much of a gain. Memory is already pretty speedy, it will be hard to see a difference in human time-scales between one and two look-ups of an in-memory DB.
Many optimizations are possible for an in-memory database. SQLite uses the same data structure for on-disk and in-memory databases to reduce complexity.
Could an index use a "direct pointer" for an on-disk database? Yes it could. And in fact many other RDBMSes work this way. Instead of storing the primary key of the table in the index, they store an offset into the disk file that holds the main table, thus avoiding the second b-tree search.
The problem with storing a pointer rather than the record primary key in the index is this: if the main table is reorganized in any way, such as being vacuumed or compacted or if a record simply moves around to keep a clustered index in balance, all corresponding index entries need to be updated too.
Notice however, that this does not reduce your asymptotic access time. Doing an indexed lookup is O(NlogN) regardless of whether or not the index stores a pointer or a primary key. Any savings is but a constant factor, and it turns out that constant is small, since the fan-out on the table is usually large and the extra search time for the table entry is minimal.
So, yes, you could completely redesign the storage system of the database in order to use different storage mechanisms for tables and indexes (thus requiring different code paths, doubling the code and complexity) in order to get a small constant improvement in indexed lookup time at the cost of a larger constant increase in rebalancing operations for clustered indexes. That is a trade-off that many RDBMSes make. But it won't be happening with SQLite, because:
The current trade-off seems to work extremely well.
The need to move around table entries is greater when multiple tables and indexes are stored together in the same disk file, thus making the trade-off more extreme.
For the common case of a rowid table, storing a file offset rather than a rowid usually takes up more space, thus making the database slightly larger.
There really is a substantial increase in complexity and code size.
Making such a change would result an incompatible file-format.
- [...] incompatible file-format
Isn't that moot, since this thread is about in-memory DBs?
I would assume the two formats being different, yet convertible
to each other, with different optimization goals.
The hypothetical in-memory-optimized format would also have
no need for backward compatibility, since always transient in nature.
Could even not be paged at all for example.
Isn't that moot, since this thread is about in-memory DBs?
Even if it is moot (which i'm not saying it is), Richard provided four more perfectly valid justifications for not applying the requested change.
Thank you for all your answers. I did not at all intend to request a change.
From my beginners perspective sqlite is a very cool swiss knife, and I wondered whether I could use it as data structure in memory, without the need to copy the data elsewhere. When the SQL request yields the data, I want to keep a pointer to the record, for later access, without copying the data (it is in memory already!). So is there a way to do this with sqlite (without hacking)?
I want to keep a pointer to the record, for later access, without copying the data.... So is there a way to do this
No. The content in the database file is encoded for compact storage. The values must be read and be decoded prior to being returned, as the values would be largely useless to the application otherwise.
(12) By Gunter Hick (gunter_hick) on 2020-07-02 06:58:48 in reply to 9 [source]
The only way (I can think of at the moment) to have the rowid be the memory address of a row is with a virtual table implementation that keeps uncompressed rows directly in memory. I have written such a virtual table for a proprietary application that supports tree, hash and array keys. That involved a large amount of coding. And memory addresses are only stable as long as records are never deleted. A different virtual table implements access to Faircom CTree files where the memory address of a row is always the same for a given table, namely the record buffer associated with the cursor. This is of course only valid for the current record and gets overwritten as soon as the next record is requested. I suspect the carray virtual table would return the memory address as the rowid, but that would appear to be a very roundabout way to access rows that you put into known memory locations yourself beforehand
(7) By Larry Brasfield (LarryBrasfield) on 2020-07-01 14:09:08 in reply to 5 [link] [source]
No, it is not moot for that reason.
In order to use pointers for in-memory table row references and rowids with b-tree traversal for on-disk table row references, there would have to be significant duplication of functionality to provide row reference creation and use, in portions of code which, today, have no dependence upon the physical form of storage.
The in-memory and on-disk distinction is only made at the VFS level; the code which produces the file-format (even in memory) is insulated from how the block I/O is implemented. That would cease to be true if this putative optimization were to be effected in the non-VFS portions of the code.
That's point #4. You can't use it to somehow make #5 not moot :)
Changes to how in-memory DBs are implemented are entirely an implementation
detail of SQLite, and wouldn't in any way affect the on-disk and documented
All the other points remain of course, and we all know this won't happen;
Richard made that crystal clear. Your argument does seem very flawed to me!