Strategy for frequently updated rows
(1) By Alexandr Burdiyan (burdiyan) on 2022-05-20 12:45:08 [link] [source]
I'm working on a system where small number of rows will be updated very frequently. Basically it's a text editor input where every time user stops typing, the database row gets updated.
I was trying to find details about whether or not SQLite is able to update rows in-place if the new value is of the same size as the previous one, but I couldn't find authoritative information about it.
So I wonder what would be the best strategy for this use case?
- I could naively send UPDATE with the new value directly.
- I could only do INSERTs and periodically delete old values.
- I could write a zeroed buffer as the initial value of size larger than my expected value. This would keep the row value of the same size most of the time. For example I could write zero buffer of 1 MB knowing that my common value is always going to be less than 1 MB. I can afford the size overhead. This way my UPDATEs are always going to be of the same size. Will this force SQLite to do in-place updates?
(2.1) By Ryan Smith (cuz) on 2022-05-20 13:14:11 edited from 2.0 in reply to 1 [link] [source]
I do not think SQLite ever does "in-place" updates, but I could be wrong (imperfect knowledge of SQLite internals). As far as I know the existing values have to be available throughout the transaction, so the pages with removed values only become "free" at the end of the transaction, if indeed enough was removed to free up a page. In subsequent writes, the free pages can certainly be used again, but also isn't guaranteed to be used on any specific change as it may conflict with some internal optimizations.
The above was how I think it works, now onto what I know to be true: SQLite is fast. Probably fast enough for whatever you are planning to do to outperform human typing speed by several orders of magnitude, even on a phone.
That said, you should never try to inline those DB updates, or any other such tasks with the live UI. What you should (nay, MUST) do, is have a separate thread that connects the DB and handles DB read/writes. That way you can let that thread set its own status markers to let you know if it has completed the previous write, or is ready for the next or such. All you need to do, is in stead of just using "user stopped typing" as the instigation event, wait for whatever you like (even a simple timer event) then, when triggered, check if the SQLite-handler thread is ready to accept the new write (or whatever else you wish), and if so, pass it the most up-to-date string to write, or wait till the next triggered event. This way it will never interfere or make your UI input laggy, even if the DB for some reason hits a speed-wobble.
This is the base standard for UI btw... not really groundbreaking information, and a 100 times more effective at keeping system-engine-usage from interfering with user-input than any amount of speed-up you can force out of SQLite (or any other back-end store).
(3) By Richard Hipp (drh) on 2022-05-20 13:23:27 in reply to 1 [link] [source]
The SQLite b-tree engine does indeed do in-place updates if the content is the same size.
That said, I really think you are chasing a premature optimization here. I think you are anticipating performance problems without actually observing them.
I suggest you code up your application using whatever approach is easiest to code and maintain. Then measure the performance. If those measurements show bottlenecks, then we can talk about how to optimize.
(4) By Alexandr Burdiyan (burdiyan) on 2022-05-20 18:20:08 in reply to 2.1 [link] [source]
Thanks a lot for the tips! I will take them into account. Actually the UI part is not on my side in this project, but I'll definitely share the feedback!
(5) By Alexandr Burdiyan (burdiyan) on 2022-05-20 18:25:00 in reply to 3 [link] [source]
That's absolutely true. It was actually more of a thought exercise, and an attempt to understand more of SQLite internals and behavior.
It's good to know that it does in-place updates for the values of the same size. I couldn't find more info about that in the docs though, is it written somewhere already?
And to complete the exercise, I'd be glad to know what would you do if the naive approach would become a bottleneck :)
(6.1) By Simon Slavin (slavin) on 2022-05-21 06:01:04 edited from 6.0 in reply to 5 [link] [source]
I don't think the documentation mentions in-place updates. You would have to read the source code.
I do agree with a previous poster: you are probably worried about something that won't be a problem. If I was working on the project I would definitely write the normal code first and worry about this only if I did get unacceptably slow results.
However, if it does turn out to be a problem, you might split your table up into two: one has all the rows which were changed today (or this week), the other has all the other rows. You can SELECT from both using UNION. And every day/week you run a program which writes all the 'recent' rows back to the 'old' table, then deletes all rows from the 'recent' table.
(7) By KIT.james (kjames3411) on 2022-05-21 13:13:27 in reply to 5 [link] [source]
I couldn't find more info about that in the docs though, is it written somewhere already?
He is basically saying that you should not search for such a mention, and that such a mention is not supposed to be stated in the docs (even if it were) because it is an implementation detail & you are trying to tackle a problem that does not exist yet. So you should not expect an answer because everyone giving you 'more info' here would be helping you and others do 'bad things' (in an IT sense).
If you want to know about the details for a thought exercise, then read the source code.
It's good to know
You don't know really, because it depends on the implementation and on many other factors. It would be more correct to say "only (some of) the sqlite devs know if 'it's good' or not".
In your case I would say it would be better you didn't know. (same for many others)
And to complete the exercise, I'd be glad to know what would you do if the naive approach would become a bottleneck :)
It's your exercise, right? :) It depends on your bottleneck, so no one can tell.
You must first create the problem before people can help you solve it.
(8) By Gary (1codedebugger) on 2022-05-22 06:48:36 in reply to 5 [source]
I caveat this comment by stating that I am nothing but an amateur at best (a hack is likely the more accurate term), and 99% of what I read on this forum is well over my head. However, since you said this was more of a thought exercise at the moment, I wondered if you had considered a piece table for this. https://www.cs.unm.edu/~crowley/papers/sds.pdf
I had a little model of it working about six months ago but had to stop working on it and haven't gotten back to it since.
The only data that gets written is the newly typed text (plus start position and length) and the "coordinates" of the text deleted. So, only inserts until the piece table is used to build the new document, and you start over with one single big piece of text again. Every edit adds a new piece or splits existing pieces.
It was a bit challenging for my limited mental capacity to grasp but I managed to get the basics functioning, but never got it built out far enough to really understand what happens to the undo/redo chain after the pieces are used to construct the new document again. The piece table is supposed to provide an unlimited undo/redo chain per session; but it appears that once its all rolled up into a single piece again, the undo/redo chain is lost such that there is no restoring it between editing session.
If updating is a concern for efficiency, perhaps a model that uses inserts would be worth investigating. But, as I said, I'm just a hack and making this comment might be equivalent to the popcorn vendor suggesting to Ted Williams that he might consider choking up on the bat.