SQLite Forum

Index Btree

Index Btree

(1) By anonymous on 2021-05-12 07:25:15 [link]

I'm a CS student and I'm doing some research on index Btree of SQLite. I have read the documentation and my questions are:

- Is there a solution I can use to know the number of pages of Index Btree based on the number of entries and byte length of entries? (Assume that I create a database that has a table (indexed) of 1 column containing numbers)

- Can I determine the degree of pages based on these features above?

(2) By curmudgeon on 2021-05-12 11:15:09 in reply to 1

Would this help?


(3) By anonymous on 2021-05-12 12:35:26 in reply to 2 [link]

I mean can I predict based on the number of entries and byte length of entries without using that table?

(4) By Richard Hipp (drh) on 2021-05-12 14:45:38 in reply to 3 [link]

The byte-length of each index entry is (probably) different.  You can't easily
know what that byte-length is.

CS textbook treatment of b-trees typically assumes each entry within a single
b-tree is the same size in bytes.  The b-tree is much easier to explain and
analyze that way.  But in the real world, data sizes vary.  So the entry sizes
vary.  Any particular page of a b-tree might have just a single entry or it
might have thousands of entries, depending on how many will fit.  And the
number of entries per page can vary widely among pages in the same b-tree.

So, for SQLite, if you want to know:

  1.  The number entries in a b-tree
  2.  The sizes of all entries in the b-tree
  3.  The depth of the b-tree
  4.  The number of pages in the b-tree

Then you have to walk the b-tree.  There are no shortcuts.

(5) By Kees Nuyt (knu) on 2021-05-12 15:06:25 in reply to 1 [link]

I guess it would be possible to make a reasonable estimate using the [file format specification][1] in the documentation, but it will not be easy.

Additionally, you may want to study the [btree source code][2], and other sources. All sources have a lot of informative and accurate comments.

Note that integer numbers are stored in a [compact form][3], so low values consume fewer bytes than high values.

If you develop an algorithm, it would make sense to verify its validity with sqlite3_analyzer, which is available in the [sqlite-tools download][4] for your platform, or can be complied from source.

[1]: https://www.sqlite.org/fileformat2.html#b_tree_pages
[2]: https://www.sqlite.org/cgi/src/file?udc=1&ln=on&ci=trunk&name=src%2Fbtree.c
[3]: https://www.sqlite.org/cgi/src/file?ci=trunk&name=src/util.c&ln=895-913
[4]: https://www.sqlite.org/download.html