WAL size for large database of 5 TB ?
(1) By anonymous on 2022-07-06 10:11:43 [link] [source]
Hello, I have a database with a single table containing roughly 30 million records. The database file has reached 5 terabytes. The table looks like:

CREATE TABLE tiles (
  id INTEGER PRIMARY KEY NOT NULL,
  zoom_level INTEGER NOT NULL,
  tile_column INTEGER NOT NULL,
  tile_row INTEGER NOT NULL,
  tile_data BLOB NOT NULL,
  UNIQUE (zoom_level, tile_column, tile_row)
)

I had to run this statement:

UPDATE tiles SET zoom_level = $x WHERE zoom_level = $y

This will update nearly all records (and it is currently running). My question is about the WAL file: it is growing, 7 GB so far. Will it grow up to 5 TB? Is there a way to speed things up?
(2) By Stephan Beal (stephan) on 2022-07-06 10:48:45 in reply to 1 [link] [source]
Is there a way to speed up things ?
i don't have a concrete answer to your question, but can suggest:
tile_data BLOB NOT NULL,
Updating a row requires copying it in full. If your blobs are "not tiny" then updating the zoom_level may require a huge amount of copying for every row. You can eliminate much of that I/O by moving the blobs into a separate table and referencing them from your current table. Something like (untested):
create table blob (
  bid INTEGER PRIMARY KEY NOT NULL,
  tile_data BLOB NOT NULL
);

then redefine tiles.tile_data as tile_id and point it to blob.bid.
(Note that many queries can be written more simply if all tables in the queries have unique column names.)
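A minimal sketch of that split, driven through Python's sqlite3 module. The REFERENCES clause, the toy 1 MB payload, and the in-memory database are my additions for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Payloads live in their own table, so rewriting a tiles row
-- no longer copies the (potentially large) blob.
CREATE TABLE blob (
  bid INTEGER PRIMARY KEY NOT NULL,
  tile_data BLOB NOT NULL
);
CREATE TABLE tiles (
  id INTEGER PRIMARY KEY NOT NULL,
  zoom_level INTEGER NOT NULL,
  tile_column INTEGER NOT NULL,
  tile_row INTEGER NOT NULL,
  tile_id INTEGER NOT NULL REFERENCES blob(bid),
  UNIQUE (zoom_level, tile_column, tile_row)
);
""")

# One tile whose payload is a 1 MB blob.
cur = con.execute("INSERT INTO blob(tile_data) VALUES (?)",
                  (b"\x00" * 1_000_000,))
con.execute("INSERT INTO tiles(zoom_level, tile_column, tile_row, tile_id) "
            "VALUES (?, ?, ?, ?)", (5, 1, 1, cur.lastrowid))

# The mass UPDATE now rewrites only small integer rows;
# the megabyte payload in blob is never touched.
con.execute("UPDATE tiles SET zoom_level = 6 WHERE zoom_level = 5")
```

The same UPDATE against the original one-table schema would have copied the full 1 MB row into the WAL for every tile it touched.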
(3) By Ryan Smith (cuz) on 2022-07-06 10:56:38 in reply to 2 [link] [source]
Stephan is of course just giving a possible example form of the new suggested table, a scenario with which I agree wholeheartedly. But in case that wasn't clear: please do not use 'blob' as a table name, as it's a reserved word. Perhaps "tile_blobs" or whatever other good name you fancy. The principle is solid and will save you hours of updating.
(4) By Stephan Beal (stephan) on 2022-07-06 10:59:43 in reply to 3 [link] [source]
please do not use 'blob' as a table name, it's a reserved word
FWIW, the Fossil SCM's primary content table, as designed by Richard himself, is named "blob" and that's caused us no issues whatsoever in fossil's 15 years. There's no harm in using that name.
$ fossil sql
SQLite version 3.39.0 2022-06-25 14:57:57
Enter ".help" for usage hints.
sqlite> .schema blob
CREATE TABLE repository.blob(
rid INTEGER PRIMARY KEY,
rcvid INTEGER,
size INTEGER,
uuid TEXT UNIQUE NOT NULL,
content BLOB,
CHECK( length(uuid)>=40 AND rid>0 )
);
(5) By Ryan Smith (cuz) on 2022-07-06 11:21:48 in reply to 4 [link] [source]
Why the appeal to authority? I'm not trying to suggest it won't work; I'm suggesting that using reserved words as DB object names is not a good idea.
That is objectively true, and you are welcome to argue the contrary, but "The infallible Richard himself has done it at least once" is not a valid counter-argument.
(10) By Stephan Beal (stephan) on 2022-07-06 13:14:41 in reply to 5 [link] [source]
"The infallible Richard himself has done it at least once" is not a valid counter-argument.
The counter-argument is that sqlite demonstrably/provably has no issues whatsoever using "blob" as a table name. Your objection to the use of the name "blob" is presumably based on experience with other DB systems and/or the letter of the SQL standard, and that's fine, but there is no technical reason that name isn't perfectly suitable for use as a table name in an sqlite schema.
(12) By Ryan Smith (cuz) on 2022-07-06 14:28:38 in reply to 10 [link] [source]
and that's fine, but there is no technical reason that name isn't perfectly suitable
You are 100% correct about that, technically it will work, and I honestly hope I never suggested otherwise.
We seem to be at an impasse on technicalities, so I'll drop the subject and instead address the OP (or any interested observer) and simply say that in my opinion:
"It is a bad idea to use reserved words as DB object names."
It is further my opinion that this remains true even when not technically impossible, even if it has been done successfully in the past, for ALL RDBMS systems, including SQLite.
I am however just some guy saying all this and I could be wrong, so don't solely take my word for it, please ask around.
(14) By Stephan Beal (stephan) on 2022-07-06 15:04:04 in reply to 12 [link] [source]
technically it will work, and I honestly hope I never suggested otherwise.
Granted, and my sincere apologies if i seemed to be accusing you of such. My objection, which i admittedly utterly failed to articulate, is that "bad idea" is completely subjective and no explanation for why it's a "bad idea" was given. (Not that i'm asking for any - i do absolutely understand the argument that it's generally poor practice to use reserved words out of their reserved contexts, in particular when portability is of any concern, but an unqualified "bad idea" is perhaps overstated.)
(15) By Ryan Smith (cuz) on 2022-07-06 15:35:53 in reply to 14 [link] [source]
Fair enough, and I also quite agree on the wider issue of "unqualified statements". Perhaps my misconception (or hubris, maybe) led me to believe I'm peddling a notion so ubiquitous as to allow foregoing any qualification. I'll try to remember this and take more care in future.
(16) By jose isaias cabrera (jicman) on 2022-07-06 15:44:00 in reply to 15 [link] [source]
Man, this is beautiful! I love it when disagreements are handled so gracefully. Beautiful. Thanks for the lesson in humility, guys. Kumbayah, my Lord, Kumbayah...
josé
(13) By Eduardo on 2022-07-06 14:59:05 in reply to 10 [link] [source]
Please check https://sqlite.org/quirks.html#keywords_can_often_be_used_as_identifiers, where you can find why you *can* use reserved names as table, column, trigger, index... names, and an example of why you shouldn't do that.
There's a difference between "you can do it" and "you should do it"
(19) By KIT.james (kjames3411) on 2022-07-10 00:36:17 in reply to 4 [link] [source]
No issue for Fossil, because it will stay with SQLite forever. But that might not be the case for other systems, so it is indeed a bad idea.
This is obvious.
(8) By Chris Locke (chrisjlocke1) on 2022-07-06 12:03:59 in reply to 2 [link] [source]
Updating a row requires copying it in full.
Never really appreciated that, and it now explains a slowdown I was experiencing on an old database I had. It was a simple table that held issues, so it had (in RTF) a description of the problem, which could be a large text string (if it had screenshots, etc.). If I updated a date in it, that caused a lot of network I/O (the database was held on a network server). The network had issues anyway, but this compounded the problem. Interesting.
(6) By Ryan Smith (cuz) on 2022-07-06 11:33:17 in reply to 1 [link] [source]
Will it grow up to 5TB ?
Possibly - stop it now if that's not wanted.
Is there a way to speed up things ?
Not really (apart from changing your DB as Stephan suggests in another post), but you can manage it a bit better.
Do something like:
UPDATE tiles SET zoom_level = $x WHERE id BETWEEN 1 AND 1000 AND zoom_level = $y;
measure the time, see the WAL size growth, etc. Then do a WAL checkpoint.
then maybe: UPDATE tiles SET zoom_level = $x WHERE id BETWEEN 1001 AND 2000 AND zoom_level = $y;
Increase the range as needed until it takes a comfortable amount of time and uses a comfortable amount of additional space. Then perhaps write a quick bit of code to keep doing that for higher and higher ranges.
Either way, you cannot do much about the total time on an existing table. Perhaps in future it could be reduced with a different table layout like Stephan suggested, but for now you are stuck time-wise. Space-wise, however, this method should help.
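Ryan's batch-then-checkpoint loop can be sketched in Python's sqlite3. The file path, batch size, and toy zoom levels below are illustrative stand-ins for the real 5 TB setup:

```python
import os
import sqlite3
import tempfile

# A small on-disk database standing in for the 5 TB original
# (WAL mode needs a real file, not :memory:).
db = os.path.join(tempfile.mkdtemp(), "tiles.db")
con = sqlite3.connect(db)
con.execute("PRAGMA journal_mode=WAL")
con.execute("CREATE TABLE tiles (id INTEGER PRIMARY KEY, zoom_level INTEGER NOT NULL)")
con.executemany("INSERT INTO tiles(id, zoom_level) VALUES (?, ?)",
                [(i, 7) for i in range(1, 5001)])
con.commit()

BATCH = 1000  # grow this until each batch takes a comfortable amount of time
max_id = con.execute("SELECT max(id) FROM tiles").fetchone()[0]
for lo in range(1, max_id + 1, BATCH):
    con.execute("UPDATE tiles SET zoom_level = 8 "
                "WHERE id BETWEEN ? AND ? AND zoom_level = 7",
                (lo, lo + BATCH - 1))
    con.commit()
    # Fold the WAL back into the main database and shrink the -wal file,
    # so it never grows toward the full database size.
    con.execute("PRAGMA wal_checkpoint(TRUNCATE)")
```

Because each batch is committed and checkpointed, the WAL only ever needs to hold one batch's worth of changed pages.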
(7) By Richard Damon (RichardDamon) on 2022-07-06 11:50:30 in reply to 1 [link] [source]
One important thing to remember: SQLite lets you roll back to what was there at any point up until you finally commit the results. So if you are touching all the data in the database, and you do it all in one transaction, then yes, the journal file will get to be AT LEAST as big as the database itself.
As for speed, this means that you are going to need to read and write the whole database (and then do that again when the WAL file checkpoints), so you will be limited by the speed of the media you are storing it on.
Others have given some good suggestions on how to improve things.
(9) By anonymous on 2022-07-06 12:31:03 in reply to 7 [link] [source]
In my context, I'm the only one working on the database. What about using pragmas, like journal_mode=off? Would that be better?
(11) By Harald Hanche-Olsen (hanche) on 2022-07-06 13:24:22 in reply to 9 [source]
Well, if it is all done in a single transaction then SQLite needs to be able to roll back the transaction. So data has to be copied, whether to the WAL file or elsewhere. So I don't think that is likely to work.
Here is one possible suggestion: Run a moderate number of updates at a time, and make sure that data is committed between runs. I believe the following UNTESTED code should work:
WITH fixme AS (
  SELECT id FROM tiles
  WHERE zoom_level = $y
  LIMIT 10000)
UPDATE tiles
SET zoom_level = $x
FROM fixme
WHERE tiles.id = fixme.id;
Of course, adjust the number 10000 as you see fit, and set it up to continue doing this until no updates result.
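One way to drive Harald's statement until no updates result is to check the cursor's rowcount after each pass. A sketch in Python's sqlite3 (the toy data and zoom values are made up; note that UPDATE ... FROM needs SQLite 3.33 or newer):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tiles (id INTEGER PRIMARY KEY, zoom_level INTEGER NOT NULL)")
con.executemany("INSERT INTO tiles(zoom_level) VALUES (?)", [(3,)] * 25000)
con.commit()

# Repeat the LIMITed UPDATE ... FROM until a pass touches no rows.
while True:
    cur = con.execute("""
        WITH fixme AS (
          SELECT id FROM tiles
          WHERE zoom_level = ?
          LIMIT 10000)
        UPDATE tiles
        SET zoom_level = ?
        FROM fixme
        WHERE tiles.id = fixme.id""", (3, 4))
    con.commit()  # commit between runs so the WAL can be checkpointed
    if cur.rowcount == 0:
        break
```

With 25,000 matching rows and a limit of 10,000, this makes three updating passes plus one final empty pass, committing between each.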
(17) By Thomas Hess (luziferius) on 2022-07-08 10:43:04 in reply to 9 [link] [source]
You can disable the journal, but only if you don’t value your data or have a recent backup you can restore in case of failures.
Doing a fresh external backup and then updating without a journal may be faster than using a journal, because the system can do a linear copy to an independent hard drive. Then SQLite can do in-place writes.
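A sketch of that backup-first, journal-off sequence, using temporary files as stand-ins for the real database and the independent backup drive. This is only safe because the external copy is the sole recovery path if anything fails mid-update:

```python
import os
import shutil
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
db = os.path.join(workdir, "tiles.db")       # stand-in for the real 5 TB file
bak = os.path.join(workdir, "tiles.db.bak")  # would live on another drive

# A toy database standing in for the original.
con = sqlite3.connect(db)
con.execute("CREATE TABLE tiles (id INTEGER PRIMARY KEY, zoom_level INTEGER)")
con.execute("INSERT INTO tiles(zoom_level) VALUES (1)")
con.commit()
con.close()

# 1. Take the linear external backup FIRST.
shutil.copyfile(db, bak)

# 2. Only then update with the journal disabled: a crash mid-update can
#    corrupt the database, and the backup becomes the only way back.
con = sqlite3.connect(db)
con.execute("PRAGMA journal_mode = OFF")
con.execute("UPDATE tiles SET zoom_level = 2 WHERE zoom_level = 1")
con.commit()
```

With journal_mode = OFF nothing is copied aside, so the update writes each page once, in place, instead of twice (WAL, then checkpoint).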
(18) By anonymous on 2022-07-08 13:59:06 in reply to 1 [link] [source]
If the mapping between the old and new zoom levels is static, perhaps you can consider a virtual generated column?
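If the mapping really is a fixed expression, a sketch could look like the following (the `display_zoom` name and the `+ 1` mapping are invented for illustration). ALTER TABLE can add VIRTUAL (though not STORED) generated columns, and a VIRTUAL column is computed on read, so no rows are rewritten and no multi-terabyte WAL is generated at all. Generated columns need SQLite 3.31 or newer:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tiles (id INTEGER PRIMARY KEY, zoom_level INTEGER NOT NULL)")
con.execute("INSERT INTO tiles(zoom_level) VALUES (5)")

# Adding a VIRTUAL generated column touches only the schema,
# not the 30 million existing rows.
con.execute("ALTER TABLE tiles ADD COLUMN display_zoom INTEGER "
            "GENERATED ALWAYS AS (zoom_level + 1) VIRTUAL")
```

Readers would then query `display_zoom` instead of `zoom_level`; the trade-off is that the expression is evaluated on every read.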