DBSTAT docs example potentially misleading

(1) By mlin (dnamlin) on 2021-06-02 13:05:24 [link] [source]

The DBSTAT docs conclude with this cool example to measure contiguity of the database file.

CREATE TEMP TABLE s(rowid INTEGER PRIMARY KEY, pageno INT);
INSERT INTO s(pageno) SELECT pageno FROM dbstat ORDER BY path;
SELECT sum(s1.pageno+1==s2.pageno)*1.0/count(*)
  FROM s AS s1, s AS s2
 WHERE s1.rowid+1=s2.rowid;
DROP TABLE s;

I don't think this is a good measurement if the database has multiple large btrees, because path is not prefixed by the btree name/id. So the s table ends up with colliding paths interleaved from different btrees, which most all then count against the contiguity, even after a fresh vacuum. The measurement as written is therefore pessimistic (possibly, extremely so).

I have a big database that gets reported as only 60% contiguous by the above algorithm, but if I modify it to look at one btree at a time, actually each one is almost perfect!

(2) By mlin (dnamlin) on 2021-06-03 00:35:44 in reply to 1 [source]

tl;dr the example should say ORDER BY name, path instead of just ORDER BY path