[novice] Handling edit conflicts with wiki-style editing
(1) By Arun Prasad (akprasad) on 2022-07-03 01:28:49 [source]
Hello all, Forgive me if this question has been asked here before, but I'm a database novice and don't know the keywords that are relevant to my question. I want to implement wiki-style editing for a project of mine (Flask+SQLAlchemy with gunicorn, if that matters), and I have a basic schema like so: Page: - id - slug - title - ... Revision: - id - created_at - content - page_id - ... I hope the mechanism here is clear: the HTML template for some page displays the content from the page's latest revision, and when a user edits the page, the database creates a new revision. So far so simple. But suppose two users A and B edit the page and submit their change simultaneously, or close enough to simultaneously that this scenario matters. In this case, my preferred behavior is that we commit changes from one user (it doesn't matter which) and notify the other that there are new changes that need to be incorporated into the edit. I'm stuck because I don't know how to implement this logic in a safe way. My naive solution is: ``` def write(...): if latest_revision == revision_that_user_was_using: session.add(Revision(...)) session.commit() ``` But `write` is not an atomic function, so there's a race condition here. Here are some solutions I've considered: - Add a `parent_id` column to the revision and require it to be unique. Problem: how to handle the first revision for the page, which has no parent? I have multiple pages, and it seems that multiple pages would have parent_id=null, which would violate the uniqueness constraint. - I've heard this kind of operation is called `select_for_update`. But I'm appending here (as opposed to updating), and I hear that SQLite doesn't support select_for_update [1]. I'd be very grateful for any advice on this issue, and I'm happy to provide clarifying details if they would be useful. [1]: https://groups.google.com/g/sqlalchemy/c/RIBdLP_s6hk
(2) By Stephan Beal (stephan) on 2022-07-03 02:05:59 in reply to 1 [link] [source]
So far so simple. But suppose two users A and B edit the page and submit their change simultaneously, or close enough to simultaneously that this scenario matters.
Fossil's wiki system works on a "last one wins" principle. It does not participate in the branching, merging, and similar features supported by "real" files. Because of this, it can be tedious to merge in edits from two users who edit the same wiki page, and the wiki is "not the right tool for the job" if multiple people are intended to edit it on a regular bases.
For multi-user documents, "embedded docs" are the preferred way to manage edits. Embedded docs are full-fledged members of the SCM, exist as files in your repository, and participate in all SCM features such as branching and merging.
In this case, my preferred behavior is that we commit changes from one user (it doesn't matter which) and notify the other that there are new changes that need to be incorporated into the edit.
Fossil's notifications are not that fine-grained, users can sign up to be notified of certain types of changes. Log in as an admin, visit the /setup page, then look for an entry labeled "Notifications." Repository-level notifications can be configured there and each user can sign up for the specific notifications they wish to receive, but they'll need to look at those notifications and decide whether they're affected or need to take action. There's now way for fossil to tell them "you need to merge in changes from your edits into this one."
(3) By Arun Prasad (akprasad) on 2022-07-03 04:06:38 in reply to 2 [link] [source]
Thanks for replying! I appreciate that you took the time to help me out.
Fossil's wiki system works on a "last one wins" principle.
Interesting -- that might be sufficient for my use case, but I'll think about it more. For my application, I don't expect that simultaneous edits will be common at all, but I want to make sure that I think through this edge case carefully so that there are no surprises as we scale.
Out of curiosity, do you know how Wikipedia might implement this logic in their RDBMS, and do you know how to implement something similar in sqlite? I found a high-level summary of Wikipedia's approach here (TLDR: the later user is taken to an "edit conflict" page), but I don't know what the backend implementation is that allows this to occur, due to the race condition I mentioned above.
(4) By Stephan Beal (stephan) on 2022-07-03 05:33:42 in reply to 3 [link] [source]
Thanks for replying! I appreciate that you took the time to help me out.
With my apologies: i thought we were in the Fossil SCM forum and that you were specifically talking about Fossil's wiki system, and thus my answer was specific to that platform. Sorry about that!
Out of curiosity, do you know how Wikipedia might implement this logic in their RDBMS...
i have no clue how Wikipedia does it.
but I don't know what the backend implementation is that allows this to occur, due to the race condition I mentioned above.
If their db is centralized, the race-check can possibly always be caught and the 2nd person to save could be alerted. Wikipedia knows exactly when an edit starts and when the finale save would occur, so they can see overlaps in that time frame. If a second save would overlap a first save made 0.001ms earlier, the transaction for the 2nd could be rolled back. To be accurate, that check would presumably have to be implemented as a DB-side after-insert/update trigger:
https://www.sqlite.org/lang_createtrigger.html
However, i'm in no way an expert on such details, but this forum has many experts. Perhaps one of them can provide a concrete approach to doing this.
(5) By Simon Slavin (slavin) on 2022-07-03 08:09:52 in reply to 1 [link] [source]
To do this in the most polite informative way, the way you describe, you need to keep a central table of who entered edit mode on which page and when.
It won't be easy to handle this process because of the way the web works: people can abandon an edit session just by quitting their browser, and unless you trap this and try to do a server communications process when it happens (and please don't do that because it's immensely annoying for a user on a slow connection or if it otherwise takes a long time) your server won't be informed.
Slightly less polite and informative just involves each client keeping a note of which version was current when they started editing, checking to see whether it's still current when they save their changes, and explaining what happened to any editor who didn't finish first. Unless you're up to performing a complete 'diff' on the two new versions, as WikiMedia does, which requires a huge amount of programming, sooner or later you're going to get a complaint from one of your editors that they lost all their work.
This is the sort of thing that requires calmly thinking through possibilities, with pencil and paper if necessary, before you start writing code. But the way you asked the question suggest you already knew that.
(6) By Wout Mertens (wmertens) on 2022-07-03 08:19:56 in reply to 1 [link] [source]
I'd like to chime in with a big-picture solution: CRDTs (https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type)
In particular, an implementation like https://docs.yjs.dev/getting-started/a-collaborative-editor
Your problem is conflict management; CRDTs remove most conflicts.
The way it works is that each operation on a document is stored separately, and as often as possible, everybody sends their operations to everybody else. A CRDT then manages the merging so that everybody has the same document state (which hopefully is a sane document).
So the way that could work for you is that for each wiki page you maintain a yjs document on the server, and people edit that, sending their changes whenever they can. Then you also provide a "publish" button, which copies the current yjs document to the wiki database.
This way, you can:
- support offline editing
- support multiple editors at once (a rare situation mind you)
- keep the full editing history of the entire document
It may not be for you but I thought I'd mention it.
Cheers,
Wout.
(7) By anonymous on 2022-07-03 14:48:52 in reply to 1 [link] [source]
Make your save function atomic by wrappping a database transaction around its database accesses.