SQLite Forum

pointer to memory as index?
Login

pointer to memory as index?

(1) By anonymous on 2020-06-30 23:26:45 [link]

... SELECT price FROM fruitsforsale WHERE fruit='Peach';

In "How Indexes Work" it is explained that, given an index on "fruit",
SQLite has to do two binary searches to find the price of peaches...
(one search in the index, and then finding the rowid).

Given a read only database in memory, is it possible to bring this down to just one binary search by creating an index with a direct "pointer" instead of the rowid?

Thank you

(2) By ddevienne on 2020-07-01 08:01:20 in reply to 1 [link]

The code is likely the same whether the DB is in-memory or not. So most likely not.  
In-memory DBs are already faster that disk DBs, so does it really matter?

Having different code paths might make [sqlite3_serialize](1) harder too,  
and make SQLite _bigger_. Not something I wouldn't myself welcome, but  
many would be against though.

[1]: https://www.sqlite.org/capi3ref.html#sqlite3_serialize

(3) By Ryan Smith (cuz) on 2020-07-01 10:32:23 in reply to 1 [link]

Funny enough, we were just discussing this in another thread.

The hard answer is no, the rowid is already the "pointer" index, but you can create a table with "WITHOUT ROWID" by which the PK will become the "pointer index", but you can't have more than one of those.

Getting it faster, if you know you will only need the fields for "fruit" and "price", then this will reduce that query to a single binary search:

```
CREATE INDEX fruit_price ON fruitforsale(fruit,price);
```

Now when you do this query:

```
SELECT price FROM fruitsforsale WHERE fruit='Peach';
```

The engine only needs to look up ONE BTREE to obtain both fields it needs to give the result.

Why this is not always feasible:

- The Index costs extra space (it matters if your DB has millions of rows)
- A small bit of extra cycles when inserting (it has to insert new rows into the table and also into every index you have on that table).

That said, it makes look-ups lightning-fast.

As an aside, for a medium-sized normal DB that can wholly fit in memory, I doubt you would see much of a gain. Memory is already pretty speedy, it will be hard to see a difference in human time-scales between one and two look-ups of an in-memory DB.

(4) By Richard Hipp (drh) on 2020-07-01 11:10:56 in reply to 1 [link]

Many optimizations are possible for an in-memory database.
SQLite uses the same data structure for on-disk and in-memory databases
to reduce complexity.

Could an index use a "direct pointer" for an on-disk database?  Yes it could.
And in fact many other RDBMSes work this way.  Instead of storing
the primary key of the table in the index, they store an offset into the disk
file that holds the main table, thus avoiding the second b-tree search.

The problem with storing a pointer rather than the record primary key in the
index is this: if the main table is reorganized in any way, such as being
vacuumed or compacted or if a record simply moves around to keep
a [clustered index][1] in balance,
all corresponding index entries need to be updated too.

[1]: https://en.wikipedia.org/wiki/Database_index#Clustered

Notice however, that this does not reduce your _asymptotic_ access time.
Doing an indexed lookup is O(NlogN) regardless of whether or not the index
stores a pointer or a primary key.  Any savings is but a constant
factor, and it turns out that constant is small, since the fan-out on
the table is usually large and the extra search time for the table entry
is minimal.

So, yes, you could completely redesign the storage system of the database
in order to use different storage mechanisms for tables and indexes (thus
requiring different code paths, doubling the code and complexity) in order
to get a small constant improvement in indexed lookup time at the cost of a larger constant increase in rebalancing operations for clustered indexes.
That is a trade-off that many RDBMSes make.  But it won't be happening with
SQLite, because:

  1.  The current trade-off seems to work extremely well.

  2.  The need to move around table entries is greater when multiple tables
      and indexes are stored together in the same disk file, thus making
      the trade-off more extreme.

  3.  For the common case of a rowid table, storing a file offset rather than
      a rowid usually takes up more space, thus making the database slightly
      larger.

  4.  There really is a substantial increase in complexity and code size.

  5.  Making such a change would result an incompatible file-format.

(5) By anonymous on 2020-07-01 12:14:25 in reply to 4 [link]

> 5. [...] incompatible file-format

Isn't that moot, since this thread is about *in-memory* DBs?

I would assume the two formats being different, yet convertible  
to each other, with different optimization goals.

The _hypothetical_ in-memory-optimized format would also have  
no need for backward compatibility, since always transient in nature.  
Could even not be paged at all for example.

(6) By Stephan Beal (stephan) on 2020-07-01 12:20:42 in reply to 5 [link]

> Isn't that moot, since this thread is about in-memory DBs?

Even if it is moot (which i'm not saying it is), Richard provided four more perfectly valid justifications for *not* applying the requested change.

(9) By anonymous on 2020-07-01 16:42:19 in reply to 6 [link]

Thank you for all your answers. I did not at all intend to request a change.

From my beginners perspective sqlite is a very cool swiss knife, and I wondered whether I could use it as data structure in memory, without the need to copy the data elsewhere. 
When the SQL request yields the data, I want to keep a pointer to the record, for later access, without copying the data (it is in memory already!). So is there a way to do this with sqlite (without hacking)?

(10) By Richard Hipp (drh) on 2020-07-01 16:49:47 in reply to 9

> I want to keep a pointer to the record, for later access, without copying the data.... So is there a way to do this

No.  The content in the database file is encoded for compact storage.  The
values must be read and be decoded prior to being returned, as the values
would be largely useless to the application otherwise.

(12) By Gunter Hick (gunter_hick) on 2020-07-02 06:58:48 in reply to 9 [link]

The only way (I can think of at the moment) to have the rowid be the memory address of a row is with a virtual table implementation that keeps uncompressed rows directly in memory.

I have written such a virtual table for a proprietary application that supports tree, hash and array keys. That involved a large amount of coding. And memory addresses are only stable as long as records are never deleted.

A different virtual table implements access to Faircom CTree files where the memory address of a row is always the same for a given table, namely the record buffer associated with the cursor. This is of course only valid for the current record and gets overwritten as soon as the next record is requested.

I suspect the carray virtual table would return the memory address as the rowid, but that would appear to be a very roundabout way to access rows that you put into known memory locations yourself beforehand

(7) By Larry Brasfield (LarryBrasfield) on 2020-07-01 14:09:08 in reply to 5 [link]

No, it is not moot for that reason.

In order to use pointers for in-memory table row references and rowids with b-tree traversal for on-disk table row references, there would have to be significant duplication of functionality to provide row reference creation and use, in portions of code which, today, have no dependence upon the physical form of storage.

The in-memory and on-disk distinction is only made at the VFS level; the code which produces the file-format (even in memory) is insulated from how the block I/O is implemented.  That would cease to be true if this putative optimization were to be effected in the non-VFS portions of the code.

(8) By anonymous on 2020-07-01 15:08:53 in reply to 7 [link]

That's point #4. You can't use it to somehow make #5 not moot :)

Changes to how in-memory DBs are implemented are entirely an implementation  
detail of SQLite, and wouldn't in any way affect the on-disk and documented  
file format.

All the other points remain of course, and we all know this won't happen;  
Richard made that crystal clear. Your argument does seem very flawed to me!

(11) By Larry Brasfield (LarryBrasfield) on 2020-07-01 17:17:35 in reply to 8 [link]

> That's point #4. You can't use it to somehow make #5 not moot :)

Conceded.

> Your argument does seem very flawed to me!

The argument itself was not flawed, but certainly misdirected.