SQLite Forum

Index Btree

Index Btree

(1) By anonymous on 2021-05-12 07:25:15 [link] [source]

I'm a CS student and I'm doing some research on index Btree of SQLite. I have read the documentation and my questions are:

  • Is there a solution I can use to know the number of pages of Index Btree based on the number of entries and byte length of entries? (Assume that I create a database that has a table (indexed) of 1 column containing numbers)

  • Can I determine the degree of pages based on these features above?

(2) By curmudgeon on 2021-05-12 11:15:09 in reply to 1 [link] [source]

Would this help?


(3) By anonymous on 2021-05-12 12:35:26 in reply to 2 [source]

I mean can I predict based on the number of entries and byte length of entries without using that table?

(4) By Richard Hipp (drh) on 2021-05-12 14:45:38 in reply to 3 [link] [source]

The byte-length of each index entry is (probably) different. You can't easily know what that byte-length is.

CS textbook treatment of b-trees typically assumes each entry within a single b-tree is the same size in bytes. The b-tree is much easier to explain and analyze that way. But in the real world, data sizes vary. So the entry sizes vary. Any particular page of a b-tree might have just a single entry or it might have thousands of entries, depending on how many will fit. And the number of entries per page can vary widely among pages in the same b-tree.

So, for SQLite, if you want to know:

  1. The number entries in a b-tree
  2. The sizes of all entries in the b-tree
  3. The depth of the b-tree
  4. The number of pages in the b-tree

Then you have to walk the b-tree. There are no shortcuts.

(5) By Kees Nuyt (knu) on 2021-05-12 15:06:25 in reply to 1 [link] [source]

I guess it would be possible to make a reasonable estimate using the file format specification in the documentation, but it will not be easy.

Additionally, you may want to study the btree source code, and other sources. All sources have a lot of informative and accurate comments.

Note that integer numbers are stored in a compact form, so low values consume fewer bytes than high values.

If you develop an algorithm, it would make sense to verify its validity with sqlite3_analyzer, which is available in the sqlite-tools download for your platform, or can be complied from source.