Can BLOB incremental I/O be employed to manage edits to a block of text?
(1) By Gary (1codedebugger) on 2021-09-14 18:21:29 [link] [source]
Hello, I'm not sure this would be considered a "pure" SQLite question and may have more to do with data structure than how to accomplish something in SQLite alone; but I remember someone on here, Keith I think, explaining that a BLOB and a chunk of text are very similar overall, and that got me to thinking there might be a better way than what I've been doing.
I'm trying to add a basic text editor to a library tool in which users can write their interpretations, etcetera concerning specific resource documents. It currently works along with an undo/redo chain but I'm not confident I'm handling the storing and updating steps as efficiently as they ought to be.
My question is how should the state of a text document be updated in the database? I've been tracking the undo/redo data and storing one copy of the current state of the full document. But when a minor edit is made and the equivalent of a change event takes place, the entire document is passed back to the database and used to overwrite the last version or state. That should work fine most of the time because the text blocks are most often relatively small notes but others could be similar to an introduction to a chapter or book and run several pages; but I expect those would still always be around 1 to 2 MB maximum.
Is it okay at these sizes to simply overwrite the full text block or can the text block be treated like a BLOB and something like incremental BLOB I/O employed to apply the last change of the undo/redo chain to it, rather than passing the full document and overwriting the content upon each edit? If so, is it more work for SQLite to perform the incremental I/O than to simply overwrite the full content, but the efficiency gain is in not having to pass the full content at every edit but only the information required to adjust the last saved state to the current state?
I was thinking that, if there were no limit to the number of undo/redo items stored, they are collectively the full document; adn there had to be a better approach than what I was doing.
Obviously, this isn't my area of expertise and I'm likely not using the most accurate terms. If there is a name for this concept in general of how to handle the storage of editable text documents that I could read upon, would you please let me know what it is called? I tried searching on it but without any success.
(2) By anonymous on 2021-09-14 19:12:43 in reply to 1 [link] [source]
You should be aware of the major limitation of incremental blob I/O: it can't change the length of a blob.
(3) By RandomCoder on 2021-09-14 20:04:08 in reply to 1 [source]
At the risk of an off-topic answer:
You might consider looking into text editing data storage structures like a piece table. It might make sense to store in a SQLite database, to some degree, and prevent you from needing to make large changes for small changes in text.
There are of course other techniques around storing and managing documents for a text editor. They could very well all be overkill for your needs, just thought I'd throw one option out there.
(4.2) By Gary (1codedebugger) on 2021-09-16 02:21:33 edited from 4.1 in reply to 3 [link] [source]
Thank you very much. This is what I was trying to find and didn't know enough about the subject to have the right search terms. It appears that piece tables made all the difference for Word 1.0 and 1.1 when machines were not so powerful and quick. I don't want to spend a year trying to build something complicated because I'm not capable of it and building a text editor isn't the objective. But I hate to do something poorly regardless, even if no one really notices a delay. Anyway, pointing me to this as a start is very helpful. Thank you.
I wanted to add that, if I'm understanding the piece table correctly, one issue is that once it is applied to build the new document from the original and the append buffer, the undo/redo chain is useless because the data for any deletions is permanently lost, at least from the original. Thus, the undo/redo chain cannot be restored between sessions if the append buffer is applied and the piece table reset upon open or close.
I thought I had a method figured out that would hold each change event as a table row, passing only additions and no deletions data. Then those rows could be used to build a piece table only when the original is to be updated and overwritten to reflect the changes. I thought building the piece table once at that time would make updating the original more efficient than attempting to apply all the recorded changes. Then realized that the deletion data required for preserving undo/redo between sessions would be lost.
I think the method can still be used in a database set up but the deletions will need to be passed with each event also; and am still working through that. It appears that the method is/was made for an in-RAM only process, apart from the quick save, that didn't preserve undo/redo between sessions.
Thanks again for pointing me to it. It was interesting but all the documents I came across give the method and leave out how to handle the actual splitting of the piece table ranges as edits take place.