SQLite Forum

Clarification on VACUUMing
Login

Clarification on VACUUMing

(1) By adrian on 2022-10-17 15:43:07 [link] [source]

Hi,

I understand that performing a VACUUM on a DB can reduce it's size by removing 'holes' generated by DELETE commands, whether they be full or partial page 'holes'. However, as I understand it, these 'holes' are only temporary as they will be filled in by subsequent INSERT commands.

My question is then, does a VACUUM do anything else to reduce the size of a DB, maybe by optimizing the data structures that are used for accessing the underlying rows, or perhaps by some other optimization that I'm not considering?

Am I also correct that VACUUM's defragmentation isn't that big a deal on a non-mechanical media in terms of read performance? At what point would non-sequential fragmentation be a substantial performance hit due to cache misses?

Thx.

(2) By adrian on 2022-10-17 18:14:58 in reply to 1 [link] [source]

From https://www.sqlitetutorial.net/sqlite-vacuum/, it would seem that there is some overhead managing the 'holes'. How significant is this overhead?

(3) By Warren Young (wyoung) on 2022-10-17 20:21:32 in reply to 2 [link] [source]

Short of benchmarking actual uses, my advice is, "Do it rarely."

I have databases I haven't vacuumed in over a year, and they're fine.

(4) By DiamondJohn on 2022-10-18 06:05:06 in reply to 1 [link] [source]

Even on a mechanical disk, if the db files is tiny, it will fit in the disk cache, so repeated access to the same disk location will run pretty fast, from memory. However, I use SQLite on an android device (which has a tiny cache when compared to a desktop)

So logically, if my dB is greater than my cache, then the cache will not be reusable as much. ie VACUUM'ing will help in my case. I have to do it off device, as VACUUM'ing fails on the device.

My dB stores a big blob of text in each row. My app has the ability to convert and mark each (selected) row (the blob only) as compressed, saving up to 90% space. The file jumps from 20MB down to 2MB. Enough for the whole file to be kept in the cache and skip direct disk reads.

(5) By Simon Slavin (slavin) on 2022-10-18 09:39:34 in reply to 4 [link] [source]

VACUUM does a great deal of reading and writing and deleting. It should not be done on a solid state drive routinely: it will just wear out the memory with operations which give you no advantage.

The normal operation of SQLite will reuse space in the database which has been freed up. If you need to make the database small because you need to fit it in as small a space as possible – to back it up, or to transmit it over a slow connection – then perhaps you might use VACUUM. Otherwise leave it alone.

(6) By ddevienne on 2022-10-18 09:50:15 in reply to 5 [source]

Vacuuming is also about making full scans read contiguous blocks on disk.
Which can limit seeking, and thus improve IO performance a bit, especially on non-SSD disks.
So it's not only about making the DB smaller.

(7) By adrian on 2022-11-22 16:03:41 in reply to 4 [link] [source]

Our db are GB in size. And when we apply a delta, they can be 100s of MBs of rows added and deleted.

If applying a delta is going to generate holes in the db that sqlite can't utilize later, then this could cause us issues with our limited drive space.

(8) By Keith Medcalf (kmedcalf) on 2022-11-22 16:23:46 in reply to 7 [link] [source]

However it won't.

(9) By adrian on 2022-11-22 17:02:19 in reply to 8 [link] [source]

It won't because? I need actual answers to tell my team. Otherwise, they'll continue to go down this rabbit hole that I don't think is necessary and is a waste of time.

(10) By Richard Damon (RichardDamon) on 2022-11-22 17:17:20 in reply to 9 [link] [source]

Pages that become empty from deletion will be used when a new page is needed in preference to increasing the size of the database.

Yes, SOME empty space will continue to exist, but it will be a limited percentage of the total space used.

(11.1) By Keith Medcalf (kmedcalf) on 2022-11-22 20:52:57 edited from 11.0 in reply to 9 [link] [source]

Because that is how it works. When a "row" is added to a table, its location (the page on which it should be put) is determined. If there is enough "unused space" on that particular page, then the "row" is put on that page without further ado. If it will not fit, then the btree is split to create some "free space" (which may require allocation of a new data page), after which there exists space to put the "row". If all the "row" on a page are deleted, then that page is moved from the "contains active data" list to the "free page" list, where it will be re-used by whatever page allocation subsequently needs to allocate a page.

This is how all btree storage systems and databases have worked for a long time.

Granted, very early database systems did not work this way, but then again, SQLite3 cannot handle a database which takes 27 acres of disk drives (3330) to store (such as RETAIN, for example).

(12.1) By adrian on 2022-11-23 18:13:22 edited from 12.0 in reply to 9 [link] [source]

Deleted