SQLite User Forum

Multiple Writers

(1) By pyarlath on 2025-01-03 08:35:02 [link] [source]

I am new to this forum, so please bear with me if I ask some noob-level questions.

Why can't SQLite have multiple writers?

My theoretical understanding is that if you open a file in append-only mode from two processes, each process can append data without overwriting the other. I also understand that support for append is weak, or at least differs, on systems other than Linux; but then what is stopping someone from writing a module that uses shared memory, a file, or something equivalent to maintain a counter and allocate exclusive space to different writers so that they don't overwrite each other in the same file? A little like MVCC, except that instead of versions of the same data it is different data from different writers.

I understand there will be more issues internal to SQLite in managing its other data structures, but I feel those can be handled with some clever solution. The real limitation would be if a file could not accept multiple writers, but it can in append-only mode.

Does this spark any thoughts?

(2.1) By Gunter Hick (gunter_hick) on 2025-01-03 11:16:26 edited from 2.0 in reply to 1 [link] [source]

Not even big RDBMS client/server systems can do multiple writers that affect the same data. They may have sophisticated locking capabilities down to record (or even field) level locks; this gives the illusion of supporting multiple writers, but only as long as there is no contention for locks. The moment that two "independent" writers want to update the same entity, this illusion breaks down and at least one of the writers is forced to roll back its transaction.

SQLite as an embedded library is designed around file level locks which are relatively fast and easy to handle, but restrict you to one writer per database file at a time.
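
A minimal sketch of that single-writer behaviour, using Python's sqlite3 module (the file name demo.db is made up): the second connection that tries to start a write transaction is refused with SQLITE_BUSY instead of interleaving with the first.

import sqlite3

# isolation_level=None: we issue BEGIN/COMMIT ourselves.
# timeout=0: fail immediately instead of waiting for the lock.
con1 = sqlite3.connect("demo.db", timeout=0, isolation_level=None)
con2 = sqlite3.connect("demo.db", timeout=0, isolation_level=None)
con1.execute("CREATE TABLE IF NOT EXISTS t(x)")

con1.execute("BEGIN IMMEDIATE")            # writer 1 takes the write lock
con1.execute("INSERT INTO t VALUES (1)")

try:
    con2.execute("BEGIN IMMEDIATE")        # writer 2 is refused
except sqlite3.OperationalError as e:
    print("second writer blocked:", e)     # "database is locked"

con1.execute("COMMIT")                     # lock released; writer 2 may retry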

(4) By pyarlath on 2025-01-03 13:14:00 in reply to 2.1 [link] [source]

I may be wrong, but my suggestion here is that as a DB system it should be able to provide multiple writers to the same SQLite file. If two writers want to update the same thing at the same time, fine: we should be able to give them that ability with complete data integrity. What meaning that data has for them is application-driven logic.

I am talking here about a lockless system like MVCC, where updates to the same row can be made by just appending to the file at the same time, without any locks. So I think there is no illusion here; it's a well-known practice in the DB world... I feel.

(6) By anonymous on 2025-01-03 14:51:21 in reply to 4 [link] [source]

I see no sign that you read what Hick wrote, so you probably won't read this one either. But here's my take on it.

When SQLite was first designed and built, multiple writers wasn't an issue. It is, after all, a C library, linked into a single program. Where would the second writer be?

So it was built and designed for single writers. That's the machinery it started with.

You're talking theory, but the SQLite creators and maintainers have to work with the machinery they've already built. No amount of theory is going to change that machinery. And if you want that machinery to change, you're going to have to take a good, hard look at what's really there, to see what challenges that poses.

Challenge #1: backwards compatibility for the file format. There are literally a trillion SQLite database files out in the world, in use, so this is not negotiable.

SQLite does not simply append data to the file whenever something changes. The format is much more complex than that.

I'll leave the other challenges to more expert users/developers, but the bottom line is this: regardless of theory, there are practical obstacles to overcome. And simply repeating the theory does not make those practical matters go away.

(8) By Chris Locke (chrisjlocke1) on 2025-01-03 15:13:52 in reply to 4 [link] [source]

If two writers want to update the same thing at the same time, fine

In your logic, how would this work? Two processes run at the same time and both want to update the same record. One wants to write 'foo' and the other 'bah'. Which one wins? It can't be both.

Or say the record has two fields, so field 1 gets updated by writer 1 and field 2 is updated by writer 2. You're creating data corruption.

(9) By pyarlath on 2025-01-03 15:50:39 in reply to 8 [link] [source]

Let's take the foo and bah case for the same field.

Simplifying with plain files: assume W1 comes and appends f1=foo, then W2 comes and appends f1=bah to the same file. When reading, you read the file bottom up; since you found f1 on the last line, the f1 on the earlier line is ignored.

Similarly, when two writers come at the same time they append without waiting for any locks. The reader eliminates the duplicates while reading, so at write time there are no locks involved.

If I look at what resource would be shared, it would be the file size counter incrementing for each append, which is an atomic operation at the CPU level.

Thus you can enable n writers to come at the same point in time and write their values.

Depending on your app logic you can either read only the last one and ignore all the others, or give the application a chance to select which value it wants to read, since theoretically they all happened at the same time.

To avoid a reader crashing into a writer while it is active, you can have SOP/EOP markers to mark valid data that is ready to be read.

Hope this helps
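
As a toy illustration only of the scheme being described (this is not how SQLite stores anything; the file name and helpers are made up): each writer appends a complete key=value line, and a reader keeps the last value seen for each key.

LOG = "toy.log"

def append(key, value):
    # one complete record per line; appends never rewrite earlier data
    with open(LOG, "a", encoding="utf-8") as f:
        f.write(f"{key}={value}\n")

def read(key):
    latest = None
    with open(LOG, encoding="utf-8") as f:
        for line in f:
            k, _, v = line.rstrip("\n").partition("=")
            if k == key:
                latest = v        # later lines win
    return latest

append("f1", "foo")   # writer W1
append("f1", "bah")   # writer W2
print(read("f1"))     # -> "bah": the append order alone decides

As the next reply points out, last-write-wins on its own says nothing about the rest of the transaction each writer was trying to apply.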

(10) By Gunter Hick (gunter_hick) on 2025-01-03 17:28:28 in reply to 9 [link] [source]

This is an excellent demonstration of data corruption. You are absolutely losing any semblance of ACID in your transaction.

Assume the set of related changes (aka transaction T1) applied by W1 includes updating f1 from 'old' to 'foo'.
Assume the set of related changes (aka transaction T2) applied by W2 includes updating f1 from 'old' to 'bah'.

Assume that both transactions, run independently and sequentially, would have resulted in f1 being 'new'.

There is no way to sort out this mess. T2 has lost Atomicity - it is only partly applied; one of its changes is lost. The database has lost Consistency - the correct value of f1 is never attained.

(3) By cj (sqlitening) on 2025-01-03 12:59:40 in reply to 1 [link] [source]

https://www.sqlite.org/wal.html

WAL mode, used together with batching multiple writes into a single transaction, can dramatically increase performance. Many thousands of writes per second can be performed by combining them.
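
A rough sketch of that in Python's sqlite3 module (file and table names are examples): switch to WAL mode and batch the inserts into one transaction instead of committing row by row.

import sqlite3

con = sqlite3.connect("demo.db")
con.execute("PRAGMA journal_mode=WAL")   # readers no longer block the writer
con.execute("CREATE TABLE IF NOT EXISTS t(c TEXT)")

rows = [(f"row {i}",) for i in range(10_000)]
with con:                                # one transaction for the whole batch
    con.executemany("INSERT INTO t(c) VALUES (?)", rows)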

(5) By pyarlath on 2025-01-03 13:16:22 in reply to 3 [link] [source]

Not sure why you thought performance is what I am after; it is not. My request was: just like multiple processes can read the same DB file, I want to understand what the challenge is in providing multiple writers, especially when things like append-only data structures exist.

(7) By cj (sqlitening) on 2025-01-03 14:52:38 in reply to 5 [link] [source]

Good starting point. 
https://www.sqlite.org/lockingv3.html

(11) By Nuno Cruces (ncruces) on 2025-01-03 17:42:50 in reply to 5 [source]

So far, you've shown a very superficial understanding of what SQLite is trying to achieve and how.

SQLite wants to pretend that any transactions you ask it to execute do so as if they were run serially (in some unspecified, but valid, order). Transactions should also be durable, even in the event of a full system crash.

And this is true, even if multiple processes try to execute transactions simultaneously: processes must coordinate through the filesystem (and/or a shared memory file) with no long-lived coordinator process.

Even ignoring the file format, which is fixed for compatibility purposes (and fully specified; you should read up on it), how do you propose that, just because append exists, it is somehow easy to support multiple simultaneous writers?

(12) By pyarlath on 2025-01-03 19:07:57 in reply to 11 [link] [source]

I never said I understand SQLite; in fact I am a noob, like I said in the original post. But I am a big fan of the embedded database strategy.

Transactions in append-only file formats are durable and also have an order: the append order.

I am not asking for a coordinator process; all we need to do is coordinate on the underlying file size and keep appending to it, i.e. treat it as a shared counter.

I completely understand that file format compatibility is the number 1 priority, but we could always achieve this with an extra file, just like the WAL, which would be home for the multiple writers, and then move the data into the SQLite format periodically.

Again, my interest is in suggesting this concept for achieving multiple writers; the community can tell me if it is a feasible idea.

(13) By Nuno Cruces (ncruces) on 2025-01-03 19:48:50 in reply to 12 [link] [source]

Transactions are rarely write-only. Transactions typically read something, then write something, possibly based on what was read.

So, in the presence of multiple writers, you need to prove a previous writer did not change what you read before your writes can be committed.

You can be more or less optimistic about this, but, eventually, you need to know what others are/were doing for this to work (and keep the illusion of a serial order).

I'm not sure how you expect “just append the new data without regard for what everyone else is doing” to just work.

That's not the issue.

(15) By pyarlath on 2025-01-04 11:24:41 in reply to 13 [link] [source]

That's called a snapshot. Each transaction starts with a snapshot (simply say the file size here), so when the final commit happens the transaction just compares against the snapshot (file size): if no data was mutated it is unchanged, otherwise it has changed and we can throw a serialization error for the transaction.

Again, all of this is common practice in PG and other ACID-compliant RDBMSs; these are solved problems. You may want to read up on MVCC and snapshotting.

So I am wondering: what's stopping SQLite from providing multiple writers?

So far I gather that file-format backward compatibility is the P1 issue standing in the way of multiple writers.
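
A toy sketch of the optimistic check being described, with an in-memory version counter standing in for the file size (this is not SQLite code; the class and names are made up). The second committer sees that its snapshot is stale and gets the serialization error.

class Store:
    def __init__(self):
        self.version = 0          # stands in for "file size" in the proposal
        self.data = {}

    def begin(self):
        return self.version       # the transaction's snapshot

    def commit(self, snapshot, changes):
        if self.version != snapshot:             # someone committed meanwhile
            raise RuntimeError("serialization failure: retry the transaction")
        self.data.update(changes)
        self.version += 1

store = Store()
t1 = store.begin()
t2 = store.begin()
store.commit(t1, {"f1": "foo"})                  # first committer wins
try:
    store.commit(t2, {"f1": "bah"})              # stale snapshot
except RuntimeError as e:
    print(e)

Note that even here the commits are checked one at a time against the counter, which is the serialization point described in reply 20.1 below.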

(18) By ingo on 2025-01-04 14:11:10 in reply to 15 [link] [source]

MVCC and Snapshotting

Just read a bit about it. Think not only of multiple writers but also of multiple DBs (files). As speed is not the issue here, you can create multiple DBs with the same schema and give each its own writer. Create one DB of truth and keep it up to date using the online backup API. https://sqlite.org/c3ref/backup_finish.html

;)
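
A sketch of driving the online backup API from Python (file names are examples): each per-writer database can be copied while its writer keeps working; merging the copies into the one DB of truth is then up to the application (e.g. ATTACH plus INSERT ... SELECT).

import sqlite3

src = sqlite3.connect("writer1.db")
dst = sqlite3.connect("writer1_copy.db")
src.backup(dst, pages=100)   # copy in 100-page steps, without stopping the writer
src.close()
dst.close()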

(19) By anonymous on 2025-01-04 14:53:48 in reply to 15 [link] [source]

wondering what's stopping sqlite from providing multiple writers?

If you're still wondering, then you really haven't thought through what the experts here have been telling you.

Accommodating the illusion of multiple simultaneous writers, while maintaining all the guarantees that SQLite currently makes, will take a very non-trivial effort. I suspect that you're not aware of those guarantees, how they interact with each other, and how seriously they must be taken in this project. (Hint: there are millions of lines of tests that must pass.)

If you really want to understand it, then try to code it up. Make sure that your code meets all of those guarantees. They may be subtle, and they may not be familiar to you yet. But it isn't SQLite until it does.

Just the effort of trying to design such a thing should convince you of the obstacles ahead, for anyone trying to make your wish a reality.

I understand that you may not have the time or resources to make that effort. And that's fine. But as you've presented your case so far, getting anywhere will require you to make a genuine effort at understanding.

In the meantime, depending on your other requirements, other databases may meet your needs better. LMDB comes to mind. It does take the MVCC approach you seem to prefer. On the other hand, to do that simply and efficiently, it cannot enact most of SQLite's guarantees. Only you can decide which guarantees are important to your application.

(20.1) By Nuno Cruces (ncruces) on 2025-01-04 17:16:40 edited from 20.0 in reply to 15 [link] [source]

What you just described implies writers serializing (making progress one-at-a-time) when they commit.

Without a central coordinator, there's always a (small) period where some things have to happen in order. Even with a central coordinator, that's possibly the best way to implement it.

Having said that, in SQLite's WAL mode readers do use a snapshot; writers block each other.

But if you're willing to build from a branch you can use BEGIN CONCURRENT. Then you get the behavior you're describing.

One branch to use for this is bedrock; that should be well maintained, and include the feature.

If it works, I'm sure SQLite developers would like the feedback. Unlike other features (e.g.), this should be fully compatible at the file format level.
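
A rough sketch of how BEGIN CONCURRENT would be used, assuming the sqlite3 module is linked against a library built from the begin-concurrent (or bedrock) branch; stock builds reject the statement, and the table t is assumed to exist.

import sqlite3

con = sqlite3.connect("demo.db", isolation_level=None)
con.execute("PRAGMA journal_mode=WAL")       # WAL is required for BEGIN CONCURRENT

while True:
    try:
        con.execute("BEGIN CONCURRENT")      # writers prepare changes in parallel
        con.execute("UPDATE t SET c = 'foo' WHERE rowid = 1")
        con.execute("COMMIT")                # page-level conflict check happens here
        break
    except sqlite3.OperationalError:         # e.g. SQLITE_BUSY_SNAPSHOT on conflict
        if con.in_transaction:
            con.execute("ROLLBACK")          # drop the attempt and retry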

(14) By cj (sqlitening) on 2025-01-03 22:50:57 in reply to 12 [link] [source]

A queue or multiple temp files can be created (it doesn't have to be in SQLite).
If using a server this is not needed; SQLite only takes milliseconds between writers.

Create Thread Writer(UserNum) to hThread   // create writer thread(s)
// WaitForSingleUser hThread               // wait for thread(s) to finish
// Process temp files

THREAD FUNCTION Writer(UserNum AS LONG) AS LONG
  OPEN "temp" + UserNum                    // create per-writer database
  CREATE TABLE IF NOT EXISTS t(c TEXT)     // create table
  INSERT INTO t(c) VALUES(?), binddata     // append
END FUNCTION
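
A runnable sketch of the same idea in Python (file, table, and row contents are made up): each writer thread gets its own temporary database, so there is no write contention, and the results are merged into the main database afterwards with ATTACH.

import sqlite3
import threading

def writer(user_num, rows):
    con = sqlite3.connect(f"temp{user_num}.db")   # per-writer database
    con.execute("CREATE TABLE IF NOT EXISTS t(c TEXT)")
    with con:
        con.executemany("INSERT INTO t(c) VALUES (?)", rows)
    con.close()

threads = []
for i in range(4):
    rows = [(f"user {i} row {j}",) for j in range(1000)]
    threads.append(threading.Thread(target=writer, args=(i, rows)))
for t in threads:
    t.start()
for t in threads:
    t.join()

# Merge the temp files into the main database afterwards.
main = sqlite3.connect("main.db")
main.execute("CREATE TABLE IF NOT EXISTS t(c TEXT)")
for i in range(4):
    main.execute("ATTACH DATABASE ? AS src", (f"temp{i}.db",))
    with main:
        main.execute("INSERT INTO t(c) SELECT c FROM src.t")
    main.execute("DETACH DATABASE src")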

(16) By pyarlath on 2025-01-04 11:30:15 in reply to 14 [link] [source]

If it can be done in SQLite then that's the best option, as it gives everyone the out-of-the-box ability to have multiple writers. I can always create a wrapper; that's not the point.

Also, having a central server is a moot point; for that I would move to ready-made solutions like PG etc.

The best part of SQLite is its embedded nature, which simplifies a lot of things. The only thing holding it back, in my view, is the lack of multiple writers.

(17) By cj (sqlitening) on 2025-01-04 12:38:27 in reply to 16 [link] [source]

PG being Postgres.
SQLite may be 50 times faster when using transactions (I read a recent comparison).
Are the writers other processes that require TCP or some other way to connect?

https://postgrespro.com/blog/pgsql/5967899

(21.1) By cj (sqlitening) on 2025-01-06 12:17:41 edited from 21.0 in reply to 1 [link] [source]

SQLite CAN have multiple writers (they wait for others to finish)

https://www.sqlite.org/rescode.html#busy
https://www.sqlite.org/c3ref/busy_timeout.html

PRAGMA busy_timeout;            -- query the current value
PRAGMA busy_timeout = 10000;    -- value in milliseconds; retry on errors 5 (SQLITE_BUSY) and 6 (SQLITE_LOCKED)
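
A minimal sketch in Python (file and table names are examples): with a busy timeout set, a second writer waits for the current write transaction to finish instead of failing immediately with SQLITE_BUSY ("database is locked").

import sqlite3

con = sqlite3.connect("demo.db")
con.execute("PRAGMA busy_timeout = 10000")   # wait up to 10 seconds for the lock
# Python's own connect() timeout parameter does the same thing:
#   con = sqlite3.connect("demo.db", timeout=10)

con.execute("CREATE TABLE IF NOT EXISTS t(c TEXT)")
with con:
    con.execute("INSERT INTO t(c) VALUES ('from another writer')")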