SQLite Forum

Timeline
Login

36 forum posts by user example-user

2021-09-28
21:04 Reply: Can I detect if the disk supports WAL mode? (artifact: 1d79822528 user: example-user)

A VM and Docker are different things, but when you use the Docker CLI you may also be using a VM underneath.

I agree with you that Docker is just the latest fad, and has lots of issues.

I am using it to easily distribute a CLI: docker run x will work on Mac, Linux and Windows, each having its own VM setup etc. that I do not need to configure. I also want to use it as a basic sandbox.

As I understand it, when the host is Linux and the guest is Linux, the process runs as if it were just a normal Linux-host process, with extra sandbox-like protections applied. In the basic cases I tested, WAL mode seems to work fine here (I guess because the mmap memory is managed by the single Linux host).

When the host is macOS or Windows, a VM is used like this (which breaks WAL):

[diagram: macOS Docker levels]

18:52 Reply: Can I detect if the disk supports WAL mode? (artifact: cf3c224ac2 user: example-user)

Thanks, I will take a look at the tests.

Cross-process memory access is the antithesis of containerisation.

True, but containers kind of work like firewalls where you whitelist files so the guest can read/write them on the host.

https://sqlite.org/wal.html says:

The wal-index is implemented using an ordinary file that is mmapped for robustness

So I guess mmapped files only work between processes on the same OS.

A Linux host with a Linux guest seems to work (same Linux kernel instance), but a non-Linux host always results in a corrupt DB (the Linux kernel runs in a VM).

17:17 Post: Can I detect if the disk supports WAL mode? (artifact: 047f1e7542 user: example-user)

I have a process that defaults to reading/writing a SQLite DB in WAL mode (for the concurrent readers and improved write performance).

An issue that I have (see my previous forum post) is that the process can run in a Docker container or a VM.

I think VM/Docker volumes do not work with WAL mode for the same reason WAL mode does not work on network disks (my guess: the two OSes do not know about each other's locks).

Am I able to detect whether the directory provided will work flawlessly with WAL mode?

At the moment WAL mode seems to corrupt the database if given a VM/docker volume.

I want to proactively detect if this is the case and exit my process with an error.
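One sketch of such a proactive probe, using Python's sqlite3 as the host language (an assumption, since my process could be in anything). Note the caveat: this can only catch hard failures, such as journal_mode silently falling back or immediate I/O errors; a probe passing inside one process does not prove that two OSes sharing the volume will not still corrupt the file.

```python
import os
import sqlite3
import tempfile

def probe_wal(directory):
    """Try WAL on a scratch db in `directory`; True if it appears to work."""
    path = os.path.join(directory, "wal_probe.db")
    conn = sqlite3.connect(path)
    try:
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        if mode.lower() != "wal":
            return False  # SQLite silently fell back to another journal mode
        # Exercise the wal-index with a real write and read.
        conn.execute("CREATE TABLE probe (x)")
        conn.execute("INSERT INTO probe VALUES (1)")
        conn.commit()
        conn.execute("SELECT x FROM probe").fetchall()
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

ok = probe_wal(tempfile.mkdtemp())  # True on a normal local filesystem
```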

01:25 Edit: Mac M1/Docker/QEMU corrupts database (artifact: 29b17a0cff user: example-user)

Hello,

My computer is a Mac M1, I have Docker Desktop installed.

I am running a container, sharing a directory from my host to the guest:

docker run --platform linux/amd64 -v /m1-host-dir:/vol ubuntu

When Docker for M1 runs amd64-based containers, I think it uses QEMU to emulate an amd64 CPU on the ARM-based M1.

This is slow but seems to work for code.

But the issue I have is that there is SQLite code running inside the container, writing to a DB file that should be persisted to the host disk (I assume via some sort of host->QEMU directory-sharing feature).

I am getting strange results; the process continues as if it can write to the database, but both inside and outside the container sqlite3 seems to show a frozen version of the file (not showing the most recent writes). I also get a range of errors.

Is there any way this could work?

I assume it does not work because SQLite's locking/WAL writes rely on being on the same OS, and Docker for Mac M1 is probably emulating a complete second OS with a Linux kernel.

Edit: This seems to be the set up https://news.ycombinator.com/item?id=25454121

00:17 Post: Mac M1/Docker/QEMU corrupts database (artifact: d2432b5dc2 user: example-user)

Hello,

My computer is a Mac M1, I have Docker Desktop installed.

I am running a container, sharing a directory from my host to the guest:

docker run --platform linux/amd64 -v /m1-host-dir:/vol ubuntu

When Docker for M1 runs amd64-based containers, I think it uses QEMU to emulate an amd64 CPU on the ARM-based M1.

This is slow but seems to work for code.

But the issue I have is that there is SQLite code running inside the container, writing to a DB file that should be persisted to the host disk (I assume via some sort of host->QEMU directory-sharing feature).

I am getting strange results; the process continues as if it can write to the database, but both inside and outside the container sqlite3 seems to show a frozen version of the file (not showing the most recent writes). I also get a range of errors.

Is there any way this could work?

I assume it does not work because SQLite's locking/WAL writes rely on being on the same OS, and Docker for Mac M1 is probably emulating a complete second OS with a Linux kernel.

2021-05-06
17:29 Post: Foreign key that is never enforced? (artifact: 013dde7e5f user: example-user)

Hello,

Is it possible to have a foreign key that is never enforced when writing to the database?

I'd like the foreign key to exist as many GUI tools read the schema and allow jumping to the related rows.

I am representing a JSON graph as SQL tables/rows, and some relations are allowed to be violated.

As an example, there are two objects: customer and subscription. A customer can be deleted, but a subscription cannot be deleted.

A FK exists on subscription: FOREIGN KEY(customer) REFERENCES Customer(id).

When the customer is deleted I do not want any error to be thrown.
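For what it's worth, SQLite's default behaviour may already be exactly this: foreign keys are only enforced if PRAGMA foreign_keys=ON is set on the connection. A minimal sketch (Python used just for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# foreign_keys defaults to OFF, so the FK below is schema documentation only.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE subscription (
        id INTEGER PRIMARY KEY,
        customer INTEGER,
        FOREIGN KEY (customer) REFERENCES customer(id)
    )""")
conn.execute("INSERT INTO customer (id) VALUES (1)")
conn.execute("INSERT INTO subscription (id, customer) VALUES (1, 1)")
conn.execute("DELETE FROM customer WHERE id = 1")  # no error raised
orphans = conn.execute(
    "SELECT count(*) FROM subscription WHERE customer = 1").fetchone()[0]
```

GUI tools would still see the FK in the schema; the catch is that any connection which does turn PRAGMA foreign_keys on would start enforcing it.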

2020-10-29
00:07 Reply: Is `JSON` a valid `create table` type? (artifact: 6a5723ff1c user: example-user)

Thanks Keith, that's helpful; it makes sense now.

It seems I would be better off setting TEXT instead of JSON.

2020-10-28
22:08 Post: Is `JSON` a valid `create table` type? (artifact: 36d44878d3 user: example-user)

E.g:

CREATE TABLE row_a (
row_a_id INTEGER PRIMARY KEY AUTOINCREMENT,
doc JSON NOT NULL
);
  • Why does SQLite seem to take any string as a valid create table type?

  • Is JSON a valid type? I've seen it in example snippets online.

  • Is there any benefit to defining the column types as SQLite seems to accept anything? What if I used SOMEINVALIDTYPE for all of the types?
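As I understand the type-affinity rules, almost any identifier is accepted and only the affinity derived from it matters. A quick check (Python used just for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Any identifier is accepted as a column "type"; SQLite derives an affinity
# from it. 'JSON' and 'SOMEINVALIDTYPE' match none of the affinity keywords
# (INT, CHAR, TEXT, BLOB, REAL, ...), so both fall back to NUMERIC affinity.
conn.execute("CREATE TABLE t (a JSON, b SOMEINVALIDTYPE, c TEXT)")
conn.execute("""INSERT INTO t VALUES ('{"k": 1}', '12', '12')""")
types = conn.execute("SELECT typeof(a), typeof(b), typeof(c) FROM t").fetchone()
# NUMERIC affinity converts '12' to an integer; the JSON string stays text
# because it is not a well-formed number; TEXT affinity keeps c as text.
```

So declaring TEXT at least pins the affinity you expect, whereas JSON (or SOMEINVALIDTYPE) quietly gives NUMERIC affinity.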

2020-10-05
14:37 Reply: What is the poll frequency of `sqlite3_busy_timeout`? (artifact: f36022a903 user: example-user)

I see, that makes sense.

The default callback's use of int tmout = ((sqlite3 *)ptr)->busyTimeout reads private SQLite state that would not be used by the application.

2020-10-04
23:18 Reply: What is the poll frequency of `sqlite3_busy_timeout`? (artifact: f9bbb14f0e user: example-user)

Thanks for the explanation,

In your example you are calling both:

1. sqlite3_busy_timeout(db, val)
2. sqlite3_busy_handler(db, cbFn, db)

But for each db connection, you can only use one of these? The second call will clear the first's behaviour?

So to summarise:

  • sqlite3_busy_timeout

    • Will block app code and poll the db file until either the lock is acquired or the timeout expires
  • sqlite3_busy_handler

    • Completely application-defined polling - SQLite itself will not sleep or poll.
      • Return true = SQLite will immediately retry the lock.
      • Return false = SQLite returns BUSY for the statement.
      • If the callback sleeps = SQLite simply waits for its return value.
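The sqlite3_busy_timeout side of this can be seen from Python, whose sqlite3 connect(timeout=...) parameter wraps sqlite3_busy_timeout (a sketch; the handler side is C-API only):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "busy.db")
writer = sqlite3.connect(path, timeout=0.2)   # ~ sqlite3_busy_timeout(db, 200)
waiter = sqlite3.connect(path, timeout=0.2)
writer.execute("CREATE TABLE t (x)")
writer.execute("BEGIN IMMEDIATE")             # take and hold the write lock
got_busy = False
try:
    waiter.execute("BEGIN IMMEDIATE")         # polls/sleeps for up to 200 ms,
except sqlite3.OperationalError:              # then surfaces SQLITE_BUSY
    got_busy = True
writer.rollback()
```
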
19:42 Reply: What is the poll frequency of `sqlite3_busy_timeout`? (artifact: 81450f23b4 user: example-user)

Thanks.

Does the sqlite3_busy_handler callback function get invoked after each of those intervals, or is the intention to let the application sleep inside the callback function then return true to define its own intervals?

18:40 Post: What is the poll frequency of `sqlite3_busy_timeout`? (artifact: 00312b3d02 user: example-user)

As far as I understand sqlite3_busy_timeout, SQLite will poll for up to the time provided in the argument.

https://www.sqlite.org/c3ref/busy_timeout.html

How does SQLite know when the DB file has been unlocked, and how often does it poll to get this status?

Thanks

2020-09-13
00:28 Reply: Is data in the OS buffers faster to read? (artifact: 48ddb51796 user: example-user)

It is true that the application process has no visibility into, and cannot rely on, the behaviour of the OS.

But SQLite does give you the ability to hint that writes can remain in the OS buffers (with PRAGMA synchronous), and you might be able to measure the effect with eBPF/dtrace probes on kernel functions.

I'm just wondering, if I had two processes that exhibit that pattern (new data written by A, immediately read by B), whether they would see large performance increases.

00:12 Post: Is data in the OS buffers faster to read? (artifact: bc72c569f2 user: example-user)

E.g. If there are two processes, A, the writer, and B the reader.

Both are in wal mode, sync=normal.

If process A writes a large amount of data, and the OS has not fsync'd that to disk yet, are those writes now in the RAM of the OS buffers?

Will process B's reads now be faster as some of the data is still in the RAM backed OS buffers which should be faster to read than the disk based data?

How much faster would it be? Would it be Redis-like speeds for simple lookups?

2020-09-06
20:55 Post: Process vs OS level durability (sync=NORMAL, WAL) (artifact: 9d6f13e346 user: example-user)

https://www.sqlite.org/pragma.html#pragma_synchronous

WAL mode is always consistent with synchronous=NORMAL, but WAL mode does lose durability. A transaction committed in WAL mode with synchronous=NORMAL might roll back following a power loss or system crash. Transactions are durable across application crashes regardless of the synchronous setting or journal mode.

This question is about sync=NORMAL in WAL mode:

Why is it that durability is maintained across process crashes, but not OS crashes?

Where is the OS level state kept?

Does this mean that transactions are only durable after a checkpoint has moved data from the WAL file into the db file?

In the above quote, what does "might" mean? Is that a low probability?

Thanks

14:27 Reply: Can Richard do some YouTube videos explaining transaction implementation? (artifact: fca7ba5f03 user: example-user)

Ok

2020-09-05
19:31 Reply: Can Richard do some YouTube videos explaining transaction implementation? (artifact: 5c2382a1c1 user: example-user)

It was just a suggestion that maybe a video or two would be awesome.

I didn't think I would get such strong opposition.

12:12 Reply: Can Richard do some YouTube videos explaining transaction implementation? (artifact: 67f2c91af1 user: example-user)

To be clear:

  • Your point is:

    • Video = wasted time because code changes.
    • Comments = Not wasted time.
  • My point:

    • Video and comments = both time spent communicating about code that changes.
    • Comments are worth doing therefore so is video.
      • Inverse: If video is a waste because code changes, comments are also a waste.
11:55 Edit reply: Can Richard do some YouTube videos explaining transaction implementation? (artifact: defc85c595 user: example-user)

That's completely not what Stephen said - he hints at the idea that making extraneous effort to hard-document (Like books, video, etc.) soft code (stuff that may change daily) is a great waste

  • My perspective is that it is what he said.

  • Code comments and video both aim to achieve the same goal: "put my understanding into your brain".

  • So in general both comments and video are "time spent by one human to communicate to another".

  • Because they are both in the same set, the same logic applies.

    • Comments are also a waste if the underlying code changes, the same as video based communication.

      • This is where my point comes from - I'm saying if it's worth commenting line by line, perhaps it's also worth making a few hours of video-based content to explain an overview of a feature.

Really? You would consider it so? Well then, how many videos have you made and posted that serves as code-comments to your code?

  • Comments are only used because they are text based and are the most efficient way of communicating.

  • If editors had a standard way of attaching video, audio and images inline with code, I think they would be quite popular.

  • I do not do "video as code comments" because it is more expensive with my current tools, but if I were to explain a code base to a newcomer, a face to face interaction explaining the system at a high level is often much more useful than starting with the lower level comments.

Offering unfounded "reasons" why it should do be done in service of the community/code/greater good/support/etc. is just a tad, well, let's say "naughty".

  • As above, I think the reasons are valid.
11:50 Reply: Can Richard do some YouTube videos explaining transaction implementation? (artifact: e72b695b86 user: example-user)

That's completely not what Stephen said - he hints at the idea that making extraneous effort to hard-document (Like books, video, etc.) soft code (stuff that may change daily) is a great waste

  • My perspective is that it is what he said.

  • Code comments and video both aim to achieve the same goal: "put my understanding into your brain".

  • So in general both comments and video are "time spent by one human to communicate to another".

  • Because they are both in the same set, the same logic applies.

    • Comments are also a waste if the underlying code changes, the same as video based communication.

      • This is where my point comes from - I'm saying if it's worth commenting line by line, perhaps it's also worth making a few hours of video-based content to explain an overview of a feature.

Really? You would consider it so? Well then, how many videos have you made and posted that serves as code-comments to your code?

  • Comments are only used because they are text based and are the most efficient way of communicating.

  • If editors had a standard way of attaching video, audio and images inline with code, I think they would be quite popular.

  • I do not do "video as code comments" because it is more expensive with my current tools, but if I were to explain a code base to a newcomer, a face to face interaction explaining the system at a high level is often much more useful than starting with the lower level comments.

Offering unfounded "reasons" why it should do be done in service of the community/code/greater good/support/etc. is just a tad, well, let's say "naughty".

  • As above, I think the reasons are valid.
2020-09-04
20:54 Edit reply: Can Richard do some YouTube videos explaining transaction implementation? (artifact: 2093f3ce1a user: example-user)

Does googling for "how do database transactions work?" not tell you what you're wanting to know?

No - because SQLite is embedded, the implementation details are different from a typical server db (which implements isolation by controlling all the clients in a central process).

I'm more interested in how SQLite implements transactions specifically.

I think Richard could also provide a lot of historical context as to why it was implemented the way it was.

and it seems a shame to waste time documenting, in a "permanent" video, any internal implementation details

Well, commenting and writing code isn't "wasting time", so I would consider the video as code comments encoded in video and audio. Also the time spent is work towards the goal of supporting SQLite until 2050 - any programmers working on the code base could watch the video.

20:52 Reply: Can Richard do some YouTube videos explaining transaction implementation? (artifact: 7125aae6bd user: example-user)

Does googling for "how do database transactions work?" not tell you what you're wanting to know?

No - because SQLite is embedded, the implementation details are different from a typical server db (which implements isolation by controlling all the clients in a central process).

I'm more interested in how SQLite implements transactions specifically.

It differs from server-based DBs, as they have a single central process that controls isolation. SQLite uses OS locks.

I think Richard could also provide a lot of historical context as to why it was implemented the way it was.

and it seems a shame to waste time documenting, in a "permanent" video, any internal implementation details

Well, commenting and writing code isn't "wasting time", so I would consider the video as code comments encoded in video and audio. Also the time spent is work towards the goal of supporting SQLite until 2050 - any programmers working on the code base could watch the video.

19:54 Post: Can Richard do some YouTube videos explaining transaction implementation? (artifact: b4cf8cf87a user: example-user)

On Federico Razzoli's YouTube channel Richard mentions how he does not have time to explain transactions (at around 30min):

https://youtu.be/2eaQzahCeh4?t=1800

I know the code is commented and there is text based documentation, but I think it would be interesting and useful to have Richard walk through at a high level how transactions work.

Richard, is this a possibility? Would any one else be interested in this also?

2020-06-17
10:28 Post: Can SQLite be used with AWS Elastic File System (EFS)? (artifact: 8e84712c9d user: example-user)

The SQLite docs mention not to use NFS because some implementations have broken locking.

But what if the locking implementation works? Can SQLite be used then?

In particular AWS have just released Elastic File System (EFS) which is a type of NFS:

https://aws.amazon.com/blogs/aws/new-a-shared-file-system-for-your-lambda-functions/

In their examples they use Python's fcntl.flock to demonstrate acquiring and releasing a lock.
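Their demo is roughly the following (a POSIX-only sketch; note that SQLite's unix VFS actually uses POSIX fcntl byte-range locks rather than flock, so this only shows the lock call AWS demonstrated, not SQLite's own locking):

```python
import fcntl
import tempfile

# The AWS example's primitive: take an exclusive advisory lock on a shared
# file, do work, release. On EFS this is the kind of call whose correctness
# the whole "can SQLite run on it" question hinges on.
with tempfile.NamedTemporaryFile() as f:
    fcntl.flock(f.fileno(), fcntl.LOCK_EX)   # blocks until exclusive lock held
    # ... critical section: read/write the shared file ...
    fcntl.flock(f.fileno(), fcntl.LOCK_UN)   # release
locked_ok = True
```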

2020-06-08
19:37 Reply: Problem with select from Sqlite Expo. I can't save the array returned by the query through the hook function (artifact: 4e5c0ea7fc user: example-user)
// I think the issue is:
{() => setUsers(temp)}
// This creates a function that is never called, so `setUsers` never executes.
// Replace the wrapper with a direct call:
setUsers(temp)
2020-05-26
22:08 Reply: Best way to observe database operations (artifact: 13bd8a46d5 user: example-user)

but who computes the "update" node, and from what data?

I was thinking of a light layer over the SQLite FFI, so something like:

watch_query("select..", handleEventFunction)

This would be the same process, but possibly different languages (E.g. C inside a Python program).

I think there would need to be rate limiting. If there are 100 writes per second you may want to wait until the writes stop and then re-run the query.

Laudable in principle, but only worth doing if it's actually a net improvement, all things considered.

This is how React and most other front end JS works by default. Most Google web apps do this for example. It is worth doing because the API between the JS VM and the DOM is slow compared to a JS function just updating a JSON structure - so React computes the smallest diff, then uses the slow DOM API to write changes all in one batch.

A web server is typically 10-20 ms

The minimum to the closest CDN is likely 35ms. But any solution has to accept this.

Personally I prefer the diff based API because:

  • If your result set has 1000 rows, and only a few change, you are only sending the changes over the APIs.
    • SQLite -> Host Language -> Network, maybe -> View
21:55 Edit reply: Best way to observe database operations (artifact: 7ef4901361 user: example-user)

I meant computing the difference between two queries.

So the first time you run a query that populates the view initially.

Subsequent changes only update the part of the view that changed.

In the select avg(n) from data case:

  • initial: {a: 2}
  • diff: {update: {a: 2.5}}
  • diff ...
  • diff ...

I am thinking of views in terms of how they work on the web - you have a tree-like document. When your data changes you mutate the smallest number of nodes.

So it's like using https://en.wikipedia.org/wiki/Copy-on-write for both the view and the JSON-like structure in your language. If you have a diff event stream you can update only what has changed and keep the unchanged portion immutable.

I think you're using the view model where you paint every pixel on the canvas, which means re-generating 100% of the frame.

21:38 Edit reply: Best way to observe database operations (artifact: b65ba2b190 user: example-user)

I meant computing the difference between two queries.

So the first time you run a query that populates the view initially.

Subsequent changes only update the part of the view that changed.

In the select avg(n) from data case:

  • initial: {a: 2}
  • diff: {update: {a: 2.5}}
  • diff ...
  • diff ...

I am thinking of views in terms of how they work on the web - you have a tree-like document. When your data changes you mutate the smallest number of nodes.

So it's like using https://en.wikipedia.org/wiki/Copy-on-write for both the view and the JSON-like structure in your language. If you have a diff event stream you can update only what has changed and keep the unchanged portion immutable.

I think you're using the view model where you paint every pixel on the canvas, which means re-generating 100% of the frame.

21:37 Reply: Best way to observe database operations (artifact: 08e9344831 user: example-user)

I meant computing the difference between two queries.

So the first time you run a query that populates the view initially.

Subsequent changes only update the part of the view that changed.

In the select avg(n) from data case:

  • initial: {a: 2}
  • diff: {update: {a: 2.5}}
  • diff ...
  • diff ...

I am thinking of views in terms of how they work on the web - you have a tree-like document. When your data changes you mutate the smallest number of nodes.

So it's like using https://en.wikipedia.org/wiki/Copy-on-write for both the view and the JSON-like structure in your language. If you have a diff event stream you can update only what has changed and keep the unchanged portion immutable.

I think you're using the view model where you paint every pixel on the canvas, which means re-generating 100% of the frame.

20:08 Reply: Best way to observe database operations (artifact: c3ecec4e72 user: example-user)
Yeah, I think that is probably the easiest route.

Reader:
- Query tables, store results in RAM.
- Watch the file for changes.
    - On change, re-run query, diff with RAM results.
    - Take actions on diff result (like update the view).

Writer:
- Write to tables as normal.
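A minimal sketch of that reader loop (re-running the query and diffing against the previous snapshot; the file-watch trigger and rate limiting are left out, and `on_diff` is a hypothetical callback):

```python
import sqlite3

def poll_once(conn, query, last_rows, on_diff):
    """Re-run `query`, diff against the previous snapshot, report changes."""
    rows = conn.execute(query).fetchall()
    added = [r for r in rows if r not in last_rows]
    removed = [r for r in last_rows if r not in rows]
    if added or removed:
        on_diff(added, removed)
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x)")
diffs = []
snapshot = poll_once(conn, "SELECT x FROM t", [],
                     lambda a, r: diffs.append((a, r)))   # empty, no diff
conn.execute("INSERT INTO t VALUES (1)")                  # the "writer" side
snapshot = poll_once(conn, "SELECT x FROM t", snapshot,
                     lambda a, r: diffs.append((a, r)))   # reports the insert
```

A real diff would need row keys to handle duplicates and distinguish updates from delete+insert, but the shape is the same.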
19:31 Reply: Best way to observe database operations (artifact: 957ff4f920 user: example-user)

I'll try it, but according to these links update hooks only work within a single process (in your example my db_sys hook would not fire on writes to db_client?).

https://stackoverflow.com/questions/48378859/sqlite-update-hook-in-a-multi-process-application
https://news.ycombinator.com/item?id=19469033

18:58 Reply: Best way to observe database operations (artifact: f42a2a5d74 user: example-user)

I'm just trying to take SQLite's idea of "will persist data without a server", and add on top "will respond to write events to the SQLite file". All within a single OS - no distribution.

I think maybe I confused things with the example - the "remote store" may just be another GUI that gets updated with whatever is in the SQLite file - like a live stats dashboard.

I think there are only two ways to do this:

  1. Modify writers to write diff events to the file.

  2. Compute diff events in the reader by some type of comparison with an older snapshot.

I do not want to do 1 because that means I cannot observe SQLite files from other software and respond to their writes. Bedrock is in category 1 above - each node writes its diff events to the file/system.

I'm asking if there is an alternative to 2, as it does not sound feasible for large databases.

18:30 Edit reply: Best way to observe database operations (artifact: f9af7ad0fa user: example-user)

You're right that I asked an XY question, but the reason is that Y is a general solution/tool that will let me apply it to a large set of specific X's. The general X is:

IPC without a server, message bus, or another process to receive messages.

You have two processes, one writes to a SQLite file using vanilla SQLite. The other process must respond and run a function based on certain create/update/delete operations.

I do not have control over the first process, but I can use any code in the second process.

As an example, the first process could be a GUI database tool, and the second process could be something updating a remote store. The user deletes a row in the GUI, the second process detects this from the file, and deletes something in a remote store.

Using vanilla SQLite is important in the first process as it would allow any SQLite library to write to the file.

I think the issue is that unless the first process records the change event, the second process will not know what changed so will need to run an expensive diff operation by checking all the rows against the older snapshot.

This is mostly an experiment.

18:04 Reply: Best way to observe database operations (artifact: bc9e5678c6 user: example-user)

My understanding is (please correct me if I'm wrong):

OS processes and threads are distinct. A single OS process can have many threads. Threads within a single process can share memory (and SQLite C pointers to connections).

Two processes cannot easily share C pointers.

Thanks for the sqlite3_preupdate_hook suggestion - I think this only works on the connection it is set on though? If I have another process with another connection it will not fire?

17:56 Reply: Best way to observe database operations (artifact: 1f5ed1755e user: example-user)

You're right that I asked an XY question, but the reason is that Y is a general solution/tool that will let me apply it to a large set of specific X's. The general X is:

IPC without a server, message bus, or another process to receive messages.

You have two processes, one writes to a SQLite file using vanilla SQLite. The other process must respond and run a function based on certain create/update/delete operations.

I do not have control over the first process, but I can use any code in the second process.

As an example, the first process could be a GUI database tool, and the second process could be something updating a remote store. The user deletes a row in the GUI, the second process detects this from the file, and deletes something in a remote store.

Using vanilla SQLite is important in the first process as it would allow any SQLite library to write to the file.

I think the issue is that unless the first process records the change event, the second process will not know what changed so will need to run an expensive diff operation by checking all the rows against the older snapshot.

This is mostly an experiment.

13:03 Post: Best way to observe database operations (artifact: fbb9e00efd user: example-user)

I have two processes writing to a single SQLite file.

I'd like to get an event/notification of any inserts, updates, or deletes per table when the other process writes to the file.

What's the most efficient way to do this?

E.g. Process A inserts x rows, Process B gets an event that x rows were inserted. (and the same for updates and deletes).

I think it would be a combination of triggers to log any create/update/delete operations, and using the OS to watch for file changes?

Questions:

A. Is there a version number for the SQLite file so I can detect that another process has written to it (ignoring the current process's changes)?

B. Is this possible to do in a general way without editing the database schema or using triggers?
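For question A, one candidate I have seen is PRAGMA data_version, which (as I understand it) changes when the database file is modified by another connection, but not for the current connection's own writes. A sketch:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "watched.db")
mine = sqlite3.connect(path)
other = sqlite3.connect(path)   # stand-in for the other process
mine.execute("CREATE TABLE t (x)")

def version(conn):
    # data_version is per-connection: it ticks when *another* connection
    # (or process) has modified the file, ignoring this connection's writes.
    return conn.execute("PRAGMA data_version").fetchone()[0]

before = version(mine)
other.execute("INSERT INTO t VALUES (1)")
other.commit()
changed = version(mine) != before   # True: someone else wrote to the file
```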