SQLite Forum

Timeline
Login

47 forum posts by user wmertens

2021-10-09
07:19 Reply: Updating in-process connections on write (artifact: 69d62c6e3d user: wmertens)

I'll set up a test rig and experiment. Possibly just reading data_version might be enough then.

05:29 Reply: Updating in-process connections on write (artifact: 0851b43f3d user: wmertens)

Ok thank you, that validates me using two connections.

My writes are multi-step affairs (from event sourcing processing) so the transactions should be hidden from reads.

05:23 Reply: Updating in-process connections on write (artifact: c52197c22b user: wmertens)

Yes, the latter, I pass SQLITE_OPEN_READONLY during open().

I wish there was something like PRAGMA really_check_for_change :)

2021-10-08
10:57 Post: Updating in-process connections on write (artifact: b7718ee466 user: wmertens)

Hi,

I'm using the node-sqlite3 library in my program and I'm opening 2 connections to the same DB file in the same process, one read-only (RO) and the other for transacting (RW).

I'm doing this so that reads from the application don't see intermediate values during transactions, but maybe that's misguided. In any case it seems easier than wrapping everything in read transactions.

The problem I'm having is that sometimes, after completing a write transaction, the RO connection doesn't "see" the change. I would have expected connections from the same process to notify one another.

As a workaround, I'm setting pragma user_version in the transaction and then polling the RO connection (using a prepared query) until it sees the same user_version.

Is there a better way to do this?

Thanks,

Wout.

2021-05-22
20:06 Reply: Building a queue based on SQLite (artifact: c983aa9c7a user: wmertens)

I made a queue with sqlite in JS, it's reasonably straightforward, just make sure to keep transactions short so inserting doesn't block.

https://github.com/StratoKit/strato-db/blob/master/src/EventQueue.js

I would definitely use a db file per queue. A modern filesystem won't have issues with 300k files in a single directory, but you can use some subdirectories if you want.

Sqlite should perform fine IMHO, but you'll probably need to use multiple processes.

2021-02-26
17:39 Reply: How do I create a "rolling cursor" ? (artifact: 2cfa137263 user: wmertens)

Keysets won't work for sorting in different directions. They're also unnecessary, syntactic sugar (unless they can be optimized somehow).

I implemented general cursoring using nested and/or, you can see the code at https://github.com/StratoKit/strato-db/blob/master/src/JsonModel/JsonModel.js#L375-L390

The way it works is as follows: given 3 columns a, b, c you sort on, you keep values v0, v1, v2 of the last returned result, and use it to find the next value with

a >= v0 && (a != v0 || (b >= v1 && (b != v1 || (c > v2))))

if b is sorted descending the condition becomes

a >= v0 && (a != v0 || (b <= v1 && (b != v1 || (c > v2))))

if you sort on an extra column d you need to replace the last condition with

c >= v2 || (c != v2 || (d > v3))

This is in one direction. If you want to page backwards, you have to invert the cursor conditions and sorting.

So basically, you have to do a limit query, remember the last values and page by generating a nested comparision condition while retaining the sorting.

Cheers,

Wout.

2021-02-17
15:08 Edit reply: SELECT skips empty fields ? (artifact: 71b5f2aadb user: wmertens)

I tried using fancy row values for paging, but in the end the only way I could make things work with mixed sort directions was by composing individual compares. See https://github.com/StratoKit/strato-db/blob/master/src/JsonModel/JsonModel.js line 375

Anyway, perhaps you wanted to use ifnull(val, replacement)?

15:07 Reply: SELECT skips empty fields ? (artifact: 16b25754ef user: wmertens)

I tried using fancy row values for paging, but in the end the only way I could make things work with mixed sort directions was by composing individual compares. See https://github.com/StratoKit/strato-db/blob/master/src/JsonModel/JsonModel.js

Anyway, perhaps you wanted to use ifnull(val, replacement)?

2021-02-07
11:43 Reply: Tiny C99-ism in sqlite3.c breaks C89 build compatibility (artifact: 632b204860 user: wmertens)

I have to wonder why you don't compile with c99? It's been more than 30 years since c89...

2021-01-02
21:35 Reply: Parallel async page fetching from (read-only) db? (artifact: 6a7d89e276 user: wmertens)

But how will the FaaS SQLite instance access the storage? SQLite does not support writes on network filesystems?

18:33 Edit reply: Parallel async page fetching from (read-only) db? (artifact: d48db73475 user: wmertens)

SQLite requires local storage, so unless you want to pay extremely big bucks, you're better off using a pool of connection workers.

If you want to improve parallelism beyond that, it has to happen at the application layer.

FaaS and SQLite don't mix, except if your db is completely read-only and you have a mechanism for updating it and can handle the rollout lag inconsistencies.

18:32 Reply: Parallel async page fetching from (read-only) db? (artifact: 180bd5e26f user: wmertens)

SQLite requires local storage, so unless you want to pay extremely big bucks, you're better off using a pool of connection workers.

If you want to improve parallelism beyond that, it has to happen at the application layer.

FaaS and SQLite don't mix, except if your db is completely read-only and you have a mechanism for updating it and can handle the rollout last inconsistencies.

15:51 Reply: Parallel async page fetching from (read-only) db? (artifact: 9783d687f8 user: wmertens)

Yes, but the Node docs also clarify that they don't help with i/o because Node is better at it.

SQLite doesn't do threads, so node worker threads running SQLite synchronously will actually see some benefit.

A worker pool will most likely still be a good idea.

10:39 Reply: Parallel async page fetching from (read-only) db? (artifact: 1d23de4a06 user: wmertens)

Define what you mean by "shared memory that WAL uses"

If you use WAL journaling, it creates a .wal file and a .shm file. All processes opening that DB have access to the block of memory represented by the .shm file.

So if that shared block would contain the page cache, that would optimize memory use for the system, but I can imagine that being difficult to orchestrate.

07:29 Reply: Parallel async page fetching from (read-only) db? (artifact: 7c90893579 user: wmertens)

All pages read from persistent storage via an OS syscall are cached in the SQLite3 page cache.

I presume the shared memory that WAL uses isn't used for the page cache? At first glance, that seems like a nice optimization...

@dforsber: > In other words, I get the data much faster with let's say 10 workers compared to main thread only for the same amount of rows fetching from SQLite3 db.

I'm also looking at better-sqlite for my https://www.npmjs.com/package/strato-db wrapper. Did you try with 1 worker vs main thread?

The problem with better-sqlite and http servers is that it isn't asynchronous, and thus it blocks the server for however long the queries run. I opened https://github.com/JoshuaWise/better-sqlite3/issues/32 for that. Now that it can be run in WebWorkers, I wonder what the effect of multiple threads is on throughput and memory use.

It seems to me that, given unlimited cores, running multiple readers at once will only make the query processing run in parallel, not the query I/O. So if it takes 0.5 ms to get all the data and 0.2ms to process it all (in SQLite+JS), then N worker threads would reduce that to 0.5ms + 0.2ms/N. Which seems like a minimal gain.

However, a single worker thread would free up the main thread for HTTP handling, so that's definitely a plus for latency. 100 simultaneous requests would still have to wait their turn for results, but they would get immediate HTTP responses.

2020-12-23
05:55 Reply: Possible Bug - View behaves differently to Table in SELECT (artifact: baa7933954 user: wmertens)

That's quite a database! Thanks for the interesting schema.

Unrelated question, I've never seen the [] surrounding identifiers before, and I can't find them documented in the SQLite docs either. From experimenting, they seem the same as using double-quotes?

I see you mix their use, in your examples you use both [e].[id] and e.id. Is there a particular reason you use it?

Thanks,

Wout.

2020-11-04
15:31 Reply: Disparity in results using like and '=' (artifact: ca301f720d user: wmertens)

I know you found it but next time you can do "genre like 'Jazz%' and genre <> 'Jazz - Fusion'"

2020-11-02
19:31 Reply: JSON literals, how they work (artifact: 91d09974c3 user: wmertens)

Ok, so json() is a sort of proprietary "CAST". Now I understand thanks!

13:43 Reply: JSON literals, how they work (artifact: 41c588bf2b user: wmertens)

But how does JSON('true') then work? What does it convert to? In theory, it should be a string but when I provide 'true', the JSON for a boolean, it thinks it's '"true"'

13:20 Post: JSON literals, how they work (artifact: d3677d7789 user: wmertens)

Hi all,

I'm wondering about why these queries return what they return. I would expect them all to return the same thing, the first result:

sqlite> select json_set('{}','$.t',json('true')); {"t":true}

sqlite> select json_set('{}','$.t','true'); {"t":"true"}

sqlite> select json_set('{}','$.t',CAST('true' as JSON)); {"t":0}

How does it work?

Wout.

2020-10-07
10:30 Reply: Performance Issue: How can I increase a performance? (artifact: b69da08977 user: wmertens)
  • Did you run ANALYZE?
  • What is the result of adding EXPLAIN QUERY PLAN in front of your select query?
  • What is the disk type, SSD I hope?
  • your index of (list_id, item_id) still means scanning for modified_date_time, how many rows does it need to scan? Did you try adding the index (list_id, item_id, modified_date_time)?
2020-10-01
15:48 Reply: Draft Doc on SQLite SAVEPOINT, and its internal implementation (artifact: 01da863fea user: wmertens)

Ah yes, that is nice! I wasn't even aware of it, but I also use BEGIN IMMEDIATE mostly.

2020-09-30
10:14 Reply: Draft Doc on SQLite SAVEPOINT, and its internal implementation (artifact: af082a4226 user: wmertens)

Upon reading this, I'm missing an explicit pointing out of the surprising difference that is promised in the beginning, and the standard definition that other database engines are implementing instead.

For the rest, looks good to me!

2020-09-26
12:14 Reply: FTS5 rows not searchable with match (artifact: db41a55336 user: wmertens)

Silly question: does contentless FTS5 even work on a view? Doesn't it always need a rowid?

07:25 Reply: FTS5 rows not searchable with match (artifact: 87891c708f user: wmertens)

I propose you create a test case too research your issues. Triggers are part of the transaction - how could they not be?

Data not being searchable even after triggers ran is weird.

We don't use triggers, we manage the FTS data from the application so we have more control. We tried contentless, but it is hard to manage since you need to provide the previous content on every update.

We don't encounter the problems you're seeing, in any case.

2020-07-13
05:08 Reply: Charts (artifact: 287f318489 user: wmertens)

Oh wow, that's impressive!

One comment, this is eye-watering ;)

WITH x(x) AS(SELECT $min_x UNION ALL SELECT x+$sample_size FROM x WHERE x<$max_x)
INSERT INTO x SELECT x FROM x;
2020-07-03
10:10 Reply: Drop caches & Sqlite cache management (artifact: 9ce16fa493 user: wmertens)

Reading this makes me wonder, since only the OS has all the information regarding the available memory and other instances of sqlite, would it not be best to leave page caching as much as possible to the OS?

Meaning, only cache pages in memory that you access so much that the round trip to the kernel to get the page from it becomes a bottleneck.

In fact, I'd guess that that is already what is happening :)

2020-06-25
10:23 Reply: Build for Apple Silicon (artifact: 746c184307 user: wmertens)

The question is if the Mac Mini will use the same CPU as the iPad Pro (likely) and the same clockspeed (probably), and if the cooling will be similar (likely).

If the Mac Mini has a larger thermal envelope, it will be faster for high loads.

However, since SQLite is I/O constrained, I expect performance to be completely determined by the speed of the DRAM and the SSD, which might be better specs as the iPad Pro but are likely the same.

2020-06-23
05:49 Reply: Decimal128 (artifact: 0f260b377a user: wmertens)

Or... Or! We could just have some form of DECIMAL in SQLite.

You forgot the other option.

The problem is that money values look a lot like floats but they aren't.

They are countable values with a required precision (4 digits, to avoid divide-by-7-and-add-7-times errors) and display precision (2 digits), PLUS the digit significance changes over the lifetime of the coin, even in the space of years.

So instead of creating an inefficient new type, you can also implement the calculations on this type properly: FIXED_SUM(column, precision = 4) etc. After every step you round to the precision.

Less work, same result but more efficiently.

2020-06-21
16:49 Reply: Decimal128 (artifact: df376d319f user: wmertens)

In the real world the daily rate would be $200 and the weekly rate $800.

Actually, no, not always, this is how one of my customers does it. But I was just illustrating my point, and judging from your other answers I think we're on the same page.

08:52 Reply: table locked when dropping a temp table (artifact: 5c238047de user: wmertens)

Statement 21 is BEGIN and I don't see an END or ROLLBACK.

08:46 Reply: Decimal128 (artifact: d1c08ee2e8 user: wmertens)

Some more about that last point: I am pretty convinced that that sort of rounding problem is what people most associate with floating point precision even though they are not related.

These things crop up when converting from a weekly to a daily rate, for example. If some property is 800$ per week, the daily rate would be $114.29, but a week's worth of daily rate is $800.03.

08:26 Reply: Decimal128 (artifact: fada9f1079 user: wmertens)

0.1 + 0.2 == 0.3 That fails

Well, yes. The binary patterns on both sides are not exactly the same.

When comparing floats, you should test if they are within 1 Epsilon of each other, then they are equivalent.

Furthermore, you should always store all your data in full precision, and display it to the user with their expected precision.

So all in all, a lot of errors come from the fact that languages make it too easy to do the wrong thing. One should be able to express "almost equal" and "round to 2 significant digits" in a non-annoying way. Storing in full precision is easy of course.

There is a caveat with money though: Any intermediate result you show to users should be replaced with its rounded value. Otherwise you run the risk that calculations with irrational numbers (when you're dividing) round to a different last digit than when you repeat the calculation with the rounded values. This problem would also happen with decimal128.

2020-06-20
08:35 Edit reply: Performance issues with query parameters vs. static strings (artifact: a177d6e22c user: wmertens)

My take on this:

  • Since we moved to the forum, we seem to be getting a lot more traffic. All good content, so that's a win
  • I read the forum in my email and reply on the forum. About 50-50 on my phone.
  • Global internet traffic is about 50% mobile (ok I just picked some random site, I couldn't find a more reputable source offhand, I'm on my phone)

So IMHO the people throwing absolutes around are wrong:

  • clearly there is a significant portion of phone users (drh should have the exact proportion in the Fossil logs)
  • clearly a desktop keyboard is better suited to writing about SQL
  • clearly a big screen can show more information than a small screen
  • clearly being able to read the forum anywhere is something people choose to do.
08:35 Reply: Performance issues with query parameters vs. static strings (artifact: ec3527101e user: wmertens)

My take on this:

  • Since we moved to the forum, we seem to be getting a lot more traffic. All good content, so that's a win
  • I read the forum in my email and reply on the forum. About 50-50 on my phone.
  • Global internet traffic is about 50% mobile (ok I just picked some random site, I couldn't find a more reputable source offhand, I'm on my phone)

So IMHO the people throwing absolutes around are wrong: * clearly there is a significant portion of phone users (drh should have the exact proportion in the Fossil logs) * clearly a desktop keyboard is better suited to writing about SQL * clearly a big screen can show more information than a small screen * clearly being able to read the forum anywhere is something people choose to do.

2020-06-18
05:26 Reply: Are result values always the same? (artifact: a3738b4684 user: wmertens)

You might get a speed-up if you index the checkbox (sparsely, WHERE checkbox=1) and make the update condition WHERE pl_id=? OR checkbox=1

2020-06-13
11:41 Reply: unable to open a database file in a btrfs readonly snapshot filesystem (artifact: 9fc9d29f73 user: wmertens)

What you could do is, after a snapshot, mount it as read-write, and open the db so the state is consistent, and maybe also changing the journaling mode.

Note that this will increase the storage since it will rewrite some blocks that can no longer be shared with other snapshots.

2020-06-05
19:56 Reply: Database WAL snapshot (artifact: 6ce9d6d81d user: wmertens)

Would it not be more intuitive if the sqlite snapshot code created the needed wal structure if it's missing?

2020-05-29
17:08 Reply: SQLite turns 20 (artifact: 20132b6197 user: wmertens)
           ~                  ~
     *                   *                *       *
                  *               *
  ~       *                *         ~    *
              *       ~        *              *   ~
                  )         (         )              *
    *    ~     ) (_)   (   (_)   )   (_) (  *
           *  (_) # ) (_) ) # ( (_) ( # (_)       *
              _#.-#(_)-#-(_)#(_)-#-(_)#-.#_
  *         .' #  # #  #  # # #  #  # #  # `.   ~     *
           :   #    #  #  #   #  #  #    #   :
    ~      :.       #     #   #     #       .:      *
        *  | `-.__                     __.-' | *
           |

"""""""""""

     *     |         | ||\ |~)|~)\ /         |
           |         |~||~\|~ |~  |          |       ~
   ~   *   |                                 | *
           |      |~)||~)~|~| ||~\|\ \ /     |         *
   *    _.-|      |~)||~\ | |~|| /|~\ |      |-._
      .'   '.      ~            ~           .'   `.  *
  jgs :      `-.__                     __.-'      :
       `.
"""""""""""
         `-.._                             _..-'
""""-----------""""```


```

2020-05-24
10:46 Reply: ORDER BY not working for a specific DB/table (artifact: 64c90f0402 user: wmertens)

On this forum neé mailinglist we like to flog horses until they are well and truly deceased ;-)

Please ignore us while we frolic in nerdy thoughtscapes.

2020-05-22
21:35 Reply: ORDER BY not working for a specific DB/table (artifact: 26351c5293 user: wmertens)

Very interesting! However, I’m confused about your “exactly rounded” scenario, aren’t there always base-10 rational numbers that can’t be specified exactly as a base-2 floating point number?

I’d like to find some documented consensus on how to treat currency in binary. Fixed point arithmetic just doesn’t seem right to me, and I’d like to think that even a mere 64 bit floating point number has enough precision to represent all the money in the world. It seems to me that one should convert currency to binary once, and then all arithmetic to be done in binary, only converting it back to currency when showing results.

However, that approach can cause rounding to the nearest significant digit to happen one way or another, depending on which way you do a mathematically identical calculation. What is the best approach? When you calculate e.g. pairs that sum up to a fixed total, make sure to calculate the other half of the pair using the rounded first half?

2020-05-20
08:52 Reply: DATA RACE: Found in sqlite3.c (artifact: facfbd2ed9 user: wmertens)

Do you mean that you tested this with working file locks, and it triggers the breakpoints simultaneously?

I think the reason you're not getting replies is because it requires very specic knowledge to verify your findings, and most of us don't have it.

If you could post a step-by-step of how to reproduce, that would really help!

2020-04-07
20:19 Reply: 3.31.1: docs as PDF? (artifact: 796f1ae8cc user: wmertens)

Actually, this is what Web Bundles are meant to do:

A Web Bundle is a file format for encapsulating one or more HTTP resources in a single file. It can include one or more HTML files, JavaScript ...

It has support for signing, and basically allow you to provide an offline copy of an entire website. See also the explainer.

Sadly it doesn't quite replace PDF because PDF has provisions for paging afaik, and this is just a way to encapsulate websites.

2020-04-06
21:38 Reply: 3.31.1: docs as PDF? (artifact: e010a540e4 user: wmertens)

I'm of the contrary opinion: I hate pdf files. They don't reformat for your screen, they're hard to parse, they're an endless source of crash bugs, and they are just good enough to make people not look for alternatives.

In the meantime, we could have had something based on light weight formats like epub, mhtml or now web bundles, but the inertia is tremendous. It makes me a little sad.

2020-03-24
14:40 Reply: Welcome (artifact: c23e770b32 user: wmertens)

Actually, I like that the messages are only the message and none of the thread, it saves on resource usage. I have a threading email client (the GMail web interface), I only archive emails so I always have the context locally.

I like how the forum works and how fast it is. My only gripe is that emails are sent immediately, and then when a poster edits their post that sends another email. I'd prefer some delay, so posting only sends an email after 30s or so, and if you start editing the post it delays it for 5 minutes or until you save + 30s.

2020-03-23
11:33 Reply: Renaming a database (artifact: 64c854682c user: wmertens)

A problem occurs if you use e.g. WAL mode and it actually uses 3 files per database. You can't rename them atomically. You could rename a parent directory.

Anyway, if you want to move "archived" records out of the way and always have a certain filename contain the latest data, it would be better to create the archive db, copy records into it and then delete them from the original once the copy completed. It's more work but no weirdness can ensue from renames.

2020-03-14
19:33 Reply: sqlite write performance is slower than the filesystem (artifact: e8bba5bedb user: wmertens)

Try removing the VACUUM call, there's no empty pages to defragment anyway.

Furthermore, you're doing a bunch of PRAGMA calls that may or may not help. Try not doing those.