SQLite Forum

14 forum posts by user SpacedCowboy

2020-09-21
16:05 Reply: Network filesystems for sqlite DBs (artifact: afde0cca3c user: SpacedCowboy)

This would be worth following up in the NFS case, or if I get around to configuring the clustering filesystem with a userspace personality that understands posix locks, but I don't think it has much impact on the flock() call, judging by the comments in the code:

************************** Begin flock Locking ********************************
**
** Use the flock() system call to do file locking.
**
** flock() locking is like dot-file locking in that the various
** fine-grain locking levels supported by SQLite are collapsed into
** a single exclusive lock.  In other words, SHARED, RESERVED, and
** PENDING locks are the same thing as an EXCLUSIVE lock.  SQLite
** still works when you do this, but concurrency is reduced since
** only a single process can be reading the database at a time.

I had to patch the sqlite code to get flock() to work at all under Linux, but once enabled it did make the critical difference (crashing programs now worked).

I'm not personally too concerned with the locking style in use, as long as it's functional and correct. The queries in use are highly unlikely to be time-consuming, so taking an EXCLUSIVE lock (while reducing concurrency) is likely to be massively outweighed by other processing and network costs in the system as a whole.

Actually, since you mention the bug, I just checked my SQLite amalgamation version number, and it's 3.31.0, so:

  • It's possible the flock() call has been patched properly, in which case disregard the above patch comments
  • I would apparently be vulnerable to that bug you mention (gulp: COMMIT not committing...).
  • It seems it's well past time to re-do that amalgamation...

2020-09-17
17:12 Edit reply: Network filesystems for sqlite DBs (artifact: 0d899a2fd0 user: SpacedCowboy)

So, just to circle back, since I've spent the best part of today getting a clustered filesystem up and running on my little Raspberry Pi Kubernetes cluster...

Here's how standard NFS did, running over Btrfs on the Asustor AS6510T and served out to three of the workers on the cluster:

@k8-w1:~/dbio$ dst/bin/dctest -u /archive/tmp/test.db -i 1 -c -n 10000
Connected to resource: /archive/tmp/test.db
worker  1 value = 29472 after 698.149864 secs

@k8-w2:~/dbio$ dst/bin/dctest -u /archive/tmp/test.db -i 2 -n 10000
Connected to resource: /archive/tmp/test.db
worker  2 value = 29759 after 703.140837 secs

@k8-w3:~/dbio$ dst/bin/dctest -u /archive/tmp/test.db -i 3 -n 10000
Connected to resource: /archive/tmp/test.db
worker  3 value = 30000 after 704.651374 secs

I then set up the iSCSI target on the AS6510T as an OCFS2 filesystem (Oracle Cluster File System v2). I'll spare anyone reading this the tortuous details, but once it was up and running it was rock solid. Configuring the thing isn't exactly straightforward, though.

Running the same program against the iSCSI mount-point resulted in core dumps pretty much as soon as two machines were accessing the same file, which wasn't a great start. It seems that with the kernel personality only flock() locking is supported, not POSIX locks, which are what SQLite uses by default.

So I reconfigured the SQLite amalgamation with -DSQLITE_ENABLE_LOCKING_STYLE, fixed the compile-time errors from missing #includes, and tried again. This time the program ran, but told me it couldn't open the database file when I tried passing the "unix-flock" VFS string to sqlite3_open_v2().
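
(For reference, this is roughly what selecting a VFS at open time looks like at the C level - a minimal sketch rather than the actual dctest code, with error handling mostly trimmed. The VFS can be named either in the fourth argument to sqlite3_open_v2() or, as in the runs below, via a ?vfs= query parameter on a URI filename opened with SQLITE_OPEN_URI; the "unix-flock" and "unix-dotfile" VFSes only exist when the amalgamation is built with -DSQLITE_ENABLE_LOCKING_STYLE.)

    /* Sketch: open a database using a named locking VFS. Illustrative only. */
    #include <stdio.h>
    #include <sqlite3.h>

    static int open_with_vfs(const char *path, const char *vfsName, sqlite3 **db)
        {
        int rc = sqlite3_open_v2(path, db,
                                 SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE,
                                 vfsName);              /* NULL => default VFS */
        if (rc != SQLITE_OK)
            fprintf(stderr, "open failed: %s\n",
                    *db ? sqlite3_errmsg(*db) : "out of memory");
        return rc;
        }

    /* e.g. open_with_vfs("/iscsi/tmp/test.db", "unix-flock", &db);
     * or, equivalently, with SQLITE_OPEN_URI set:
     *     "file:/iscsi/tmp/test.db?vfs=unix-flock"                   */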

A little frustrated, I tried the other VFS locking implementations, and "unix-dotfile" worked, so I could get the comparison below versus NFS. I daresay it's not as efficient as flock() would have been, but for my purposes it works well enough:

@k8-w1:~/dbio$ dst/bin/dctest -u '/iscsi/tmp/test.db?vfs=unix-dotfile' -i 1 -c -n 10000
Connected to resource: sqlite:/iscsi/tmp/test.db?vfs=unix-dotfile
worker  1 value = 10000 after 149.560367 secs

@k8-w3:~/dbio$ dst/bin/dctest -u '/iscsi/tmp/test.db?vfs=unix-dotfile' -i 3 -n 10000
Connected to resource: sqlite:/iscsi/tmp/test.db?vfs=unix-dotfile
worker  3 value = 24595 after 455.827016 secs

@k8-w2:~/dbio$ dst/bin/dctest -u '/iscsi/tmp/test.db?vfs=unix-dotfile' -i 2 -n 10000
Connected to resource: sqlite:/iscsi/tmp/test.db?vfs=unix-dotfile
worker  2 value = 30000 after 531.742586 secs

Points of note:

  1. Both NFS and the clustered FS still maintain the locking semantics, and I still get the correct final result.

  2. For NFS, there is clearly a good spread of the locking/unlocking across the various clients, because the very first worker to return is almost at the end of the count. This seems pretty much ideal, in fact.

  3. Something worked a bit differently (and it could be the dot file approach) for the clustered filesystem, because the first worker got exclusive access until it had finished, and then workers 2 & 3 fought it out between them.

I did look at using the programs that Warren mentioned (kvtest.c, speedtest1.c) but they didn't seem to be what I want to look at. kvtest seemed focussed on blobs (and I don't really have those as part of the usage model) and speedtest1 looked to just be a time-test, not a correctness test. I do, however, intend to run one of the test-harnesses - th3 seems like a good candidate.

If I can figure out why "unix-flock" isn't working for me, or perhaps replace the kernel personality of ocfs2 with a userspace one that supports posix locks, I'll try the above tests again and see if there's any speed increase - it's not exactly a priority right now though.

I am still kind of encouraged that both methods gave me the result I'd hoped for, which presumably means the NFS locking is implemented correctly, or at least correctly enough for this test. We'll see how it works on the test harness.

[edit]

So I got the unix-flock VFS to work, basically by patching the changes here into my amalgamation source. Once patched, it compiled without error on the Pis running Ubuntu 20.04 LTS. I guess -DSQLITE_ENABLE_LOCKING_STYLE isn't normally required on Linux since most Linux filesystems offer POSIX locking.

I also found an OCFS2 lock-test command-line tool to verify that the OCFS2 filesystem did support flock() but not POSIX locks.

The results using flock() rather than dot-file locking were in fact speedier:

flock() iSCSI

@k8-w1:~/dbio$ dst/bin/dctest -u 'sqlite:/iscsi/tmp/test.db?vfs=unix-flock' -i 1 -c -n 10000
Connected to resource: sqlite:/iscsi/tmp/test.db?vfs=unix-flock
worker  1 value = 15671 after 144.488417 secs

@k8-w2:~/dbio$ dst/bin/dctest -u 'sqlite:/iscsi/tmp/test.db?vfs=unix-flock' -i 2 -n 10000
Connected to resource: sqlite:/iscsi/tmp/test.db?vfs=unix-flock
worker  2 value = 29791 after 306.061729 secs

@k8-w3:~/dbio$ dst/bin/dctest -u 'sqlite:/iscsi/tmp/test.db?vfs=unix-flock' -i 3 -n 10000
Connected to resource: sqlite:/iscsi/tmp/test.db?vfs=unix-flock
worker  3 value = 30000 after 307.713789 secs

... although I'm still seeing the same 'first one wins, then the others fight it out' behaviour, I'm still getting the correct answer at the end.

To Warren's point earlier about common code, I noticed there's an 'mptest' subdirectory in the sqlite source code, which can be used to test out parallel access to a database. I'll be giving that a go next.

2020-09-16
17:07 Reply: Network filesystems for sqlite DBs (artifact: 6e280b237f user: SpacedCowboy)

That will perform much better on an SSD than your test predicts

Yep. The test wasn't really about the SSD performance though.

I can tell you from experience that you want to batch that up and save many state changes all at once

So there's another wrinkle. The current system runs with a REDIS database that manages all the flags for the app, and as I said, the app will open multiple connections to the service at once, which means it gets different IP addresses on the load-balancer and so different connections to the REDIS database. If the app round-robins the commands it sends, and command N sets state to X while command N+1 depends on that state, we have a problem if we're batching things.

Performance isn't something I'm overly concerned with, at least compared to reliability and correctness. Batching is a good idea in general, don't get me wrong, but specifically here I think it could be a problem. I don't get to change the way the client application works apart from under really difficult circumstances.

Only if you run it from multiple nodes

Understood, and I intend to rectify that. It will be interesting to see if the results over the network are even vaguely similar.

Informed guess: you aren't testing against NVMe on the server side of the NFS connection

Possibly. The NFS server in question is an Asustor AS6510T, with 10 x 16TB drives, 32 GB of RAM and 2TB of NVMe SSDs as cache for the array. The SSDs on the device are specifically there to cache the RAID (they don't contribute to the storage volume), but it's entirely possible it is using the hierarchy of RAM <-> SSD <-> HDD, given that it's a special-purpose device. And yes, because I'm paranoid, it does have its own UPS*.

There are several organizations that will happily rent you a Gluster or Ceph cluster for an hour or two

The last time I tried to set up an AWS account for a quick test, it cost me ~$500 or so. I'd not noticed the various data-transfer fees, and when I thought I'd closed everything down, I'd missed something (can't remember what, now) so I paid a month's usage of that. These days I'm very leery of setting up things on time-and-data-based services.

The suggestion has merit, if I can persuade someone else to foot any bills :)

[*] I have a bunch of UPSs that I had before the whole-house battery was installed. I deploy them judiciously.

16:11 Reply: Network filesystems for sqlite DBs (artifact: 79b259c0d9 user: SpacedCowboy)

Why not use one of the ones included with SQLite, so we can all play along?

Two reasons: (1) it didn't occur to me, and (2) I didn't know about them :)

What realistic scenario does a pure overwrite test simulate? DB applications are usually read heavy. Even a data logging application is going to be appending pages to the DB, not continually overwriting the same DB page. Further, your application is perversely reimplementing AUTOINCREMENT. What possible correlate to a real-world application does this model?

In this case, the database holds a lot of different types of data. Quite a bit of it is stored once and then only read later and I didn't really see that as the problem. We also have a fair amount of persistent state for a client application, state that is constantly being modified to reflect what the user is doing, and which they expect to be preserved if they move away from the native app and use the same account on the web-app. It's obviously not a simple counter, but the flags are being set and reset constantly, and the counter allows me to tell whether the end-result is what I'd expect it to be.

It seemed like a reasonable analogue of what I thought the database might have problems with doing.

Are you suggesting that SQLite's test regime might be insufficient for ensuring that transactions work as they're supposed to?

Well, no, under normal circumstances, of course not. It did seem, though, that if the database locking were problematic over a network filesystem, this might be a test that would show it up.

This should have been a huge red flag to you: NFS is 3x faster than a local filesystem!?

It was, in fact. Lacking any other ideas, I'd come to the same conclusion as you did that it was the write-heavy environment that was the cause. It still seems odd though.

You haven't said what filesystem you're running over iSCSI

In this case, it's my home setup (COVID and the entire world burning mean I'm working at home), but if it's possible to make it work, the real system would run on the existing ScaleIO setup. iSCSI is the closest thing I can get to it for testing out ideas before going through the whole "submit a proposal in triplicate, convene a committee, ..."

To answer your question, I'm running Btrfs, which isn't a clustered filesystem AFAIK. I'm not actually sure what the "real" system uses, but it pretty much has to be a clustered filesystem for the existing usage.

You say "with caching," which suggests you have write-back caching enabled, as opposed to the safer write-thru mode. That's a highly risky cheat in any application that cares about durable transactions unless you've got battery-backed cache RAM to close the RAID write hole.

... which it does. And the house has 3 Tesla batteries on it as well. I laugh in the face of power outages. Well, actually I don't. I quietly smile on the inside when the neighbor comes around asking why my lights are still on in a power outage, and offer to throw a cable over the fence to power his fridge.

What is your reason for looking into SQLite on network filesystems in the first place?

From the original post:

"I'm trying to figure out if we could use an existing enormous iscsi-type shared filesystem to allow a database-per-person (which would allow easy per-person encryption for privacy) using a bunch of sqlite databases, or go with one humungous database server with everyone's data inside it running with a local disk."

The huge multi-tenant database is the obvious solution, but it doesn't lend itself well to searches across the metadata in a per-user encrypted-data scenario. Encrypting columns of data effectively removes the ability to index them, homomorphic encryption notwithstanding, and not encrypting them leaks privacy.

Conversely, having a bunch of SQLite databases on a shared filesystem accessible to a bunch of nodes makes life a lot easier for the encryption scenario (things like SQLCipher mean the database can be encrypted in toto but still use indexes as if it were not; in my tests on low-data-volume databases the overhead is only ~5% or so, rising to ~15% on data-heavy tables). Managing a large number of individual databases rather than one large one has its own issues, of course.
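
(For anyone curious about the mechanics, the per-database keying with SQLCipher is minimal. This is just a sketch using SQLCipher's standard PRAGMA key interface - the path and passphrase are purely illustrative, and error checking is omitted:)

    /* Open one per-user SQLCipher database and supply its key before any
     * other statement touches the file. A wrong key shows up as SQLITE_NOTADB
     * ("file is not a database") on the first real query. */
    sqlite3 *db;
    sqlite3_open("/shared/users/1234.db", &db);
    sqlite3_exec(db, "PRAGMA key = 'per-user-passphrase';", 0, 0, 0);

    /* Indexes, FTS etc. now behave exactly as on an unencrypted database */
    sqlite3_exec(db, "SELECT count(*) FROM sqlite_master;", 0, 0, 0);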

So what I'm trying to do here is do the groundwork to see what's even feasible, after which we can move onto the proposal stage.

There are a few things we can do at ingress - one of which would be to divert all traffic for a given user account to a specific node, thus giving us single-node access to any given database - and from your and Keith's responses it sounds like this might be feasible.

The "I want it all" approach would be the multiple writer scenario, where read and write requests (the client app can open multiple simultaneous connections to the service) would come in over different load-balanced IP's, thus giving us multiple-writer requirements. Again, from reading both the responses, I need to re-run those tests from different machines (which did occur to me, but only after posting).

Greybeards on the Internet saying "Don't do that" isn't good enough? :)

As a greybeard myself, albeit in a slightly different sphere, I wish that were true; unfortunately, when I'm challenged about why we want to do it this way, it's best to have an answer somewhat more authoritative than "well, some bloke on the internet, who sounded like he knew what he was talking about, said X" :)

I do appreciate the time taken to answer the question, thank you :)

15:15 Reply: Network filesystems for sqlite DBs (artifact: d39ffe3e53 user: SpacedCowboy)

Thanks Keith,

That's the level of detail I was after. The filesystem running on the NFS server is Btrfs, btw.

It occurred to me (after posting, of course) that the NFS tests ought to be run from multiple machines to test things properly; they weren't, in fact.

I'm perfectly happy to trade speed for scale. Unless it's truly glacial in nature, the speed of the filesystem isn't going to be the limiting factor. Reliability, of course, is a totally different matter.

2020-09-15
23:13 Post: Network filesystems for sqlite DBs (artifact: 4c34fa0875 user: SpacedCowboy)

Right, so first off, I understand the basic advice is "Don't do it". I'd like to understand the reasons why not to do it, and if any of those reasons could be mitigated in any way.

I'm trying to figure out if we could use an existing enormous iSCSI-type shared filesystem to allow a database-per-person (which would allow easy per-person encryption for privacy) using a bunch of SQLite databases, or go with one humungous database server with everyone's data inside it running with a local disk. Both approaches have their ups and downs, and to that end I wrote a small test, the guts of which are:

    /**************************************************************************\
    |* Loop for the number of times, reading and incrementing the count
    \**************************************************************************/
	for (int i=0; i<count; i++)
		{
		mgr->beginTransaction();
		int64_t value = mgr->intForQuery("SELECT value FROM counter");
		mgr->update("UPDATE counter set value = ?", value +1);
		mgr->commit();
		}

which would read and write in a loop (yes, I know this could be one statement; I wanted to make sure reads & writes in a transaction wouldn't be compromised). The loop is constantly obtaining and releasing the lock on the database (mgr->beginTransaction() calls "BEGIN EXCLUSIVE TRANSACTION"). The idea is to introduce as much contention as possible to see if the transaction-locking logic breaks down under stressful circumstances.
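
(For anyone wanting to reproduce this without my wrapper classes: each iteration boils down to roughly the following at the raw C API level - a sketch assuming mgr just wraps sqlite3_exec/sqlite3_prepare_v2, with a busy timeout set once after opening so that BEGIN EXCLUSIVE retries rather than failing immediately while another worker holds the lock. Error checking omitted:)

    sqlite3_busy_timeout(db, 60000);       /* once, just after opening */

    sqlite3_exec(db, "BEGIN EXCLUSIVE TRANSACTION", 0, 0, 0);

    sqlite3_stmt *stmt;
    sqlite3_prepare_v2(db, "SELECT value FROM counter", -1, &stmt, 0);
    sqlite3_step(stmt);
    sqlite3_int64 value = sqlite3_column_int64(stmt, 0);
    sqlite3_finalize(stmt);

    sqlite3_prepare_v2(db, "UPDATE counter SET value = ?", -1, &stmt, 0);
    sqlite3_bind_int64(stmt, 1, value + 1);
    sqlite3_step(stmt);
    sqlite3_finalize(stmt);

    sqlite3_exec(db, "COMMIT", 0, 0, 0);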

After all the updating, the code prints out the final value. The idea is to run more than one instance of the program in parallel and see if the last result in the database is (instances) * (count) - i.e., was there ever a time when a database transaction wasn't honoured? Obviously I don't care about anything other than the last value, because the intermediate values are going to vary based on which program got more access at any given time.

The results were kind of interesting - it did better than I expected over a network... Listing them in increasing order of performance:

Local single NVMe SSD:
[@lyonesse DCtest]$ ./RUNME 100000
Connected to resource: /tmp/test.db
Connected to resource: /tmp/test.db

worker  0 value = 199465 after 2089.432570 secs
worker  1 value = 200000 after 2095.206688 secs



NFS, 1G ethernet:
[@lyonesse DCtest]$ ./RUNME 100000
Connected to resource: /archive/tmp/test.db
Connected to resource: /archive/tmp/test.db

worker  1 value = 197756 after 734.510462 secs
worker  0 value = 200000 after 742.014655 secs



iSCSI, 1G ethernet:
[@lyonesse DCtest]$ ./RUNME 100000
Connected to resource: /mnt/iscsi/tmp/test.db
Connected to resource: /mnt/iscsi/tmp/test.db

worker  0 value = 175305 after 342.477881 secs
worker  1 value = 200000 after 392.721822 secs



Local 16-disk RAID:
[@lyonesse DCtest]$ ./RUNME 100000
Connected to resource: /raid/tmp/test.db
Connected to resource: /raid/tmp/test.db

worker  1 value = 177115 after 69.417077 secs
worker  0 value = 200000 after 77.233248 secs

Points of interest:

  1. The final value of the counter after the last worker exited was always what you would hope it to be, even with NFS.

  2. iSCSI is a lot faster than NFS :)

  3. Local (16-way with cache, Areca) RAID is a lot faster than any network, as expected.

  4. A single local PCIe-based SSD is nowhere near any of the others.

The only one I'm really concerned with here is (1). I'm completely comfortable with the speed hit vs local RAID; it's an adequate trade-off for the situation, and the actual deployment would be 10G rather than 1G ethernet, so it's likely to do better than that. The situation doesn't change with multiple programs accessing the same database either, for example:

[@lyonesse DCtest]$ ./RUNME 100000 5
Connected to resource: /mnt/iscsi/tmp/test.db
Connected to resource: /mnt/iscsi/tmp/test.db
Connected to resource: /mnt/iscsi/tmp/test.db
Connected to resource: /mnt/iscsi/tmp/test.db
Connected to resource: /mnt/iscsi/tmp/test.db

worker  4 value = 444736 after 921.902477 secs
worker  1 value = 468154 after 968.464125 secs
worker  0 value = 485641 after 1005.294065 secs
worker  3 value = 495642 after 1025.872606 secs
worker  2 value = 500000 after 1034.514943 secs

Questions are:

a) Is the above the sort of test I ought to be running in order to try and figure out what areas could have issues, or is there some better way to show up any problem ?

b) Am I just being lucky that there's no impact on the reliability over the network in the above ? If so, is there a good way to make it more challenging and flush out problems ?

c) Given that SQLite is supposed to support multiple threads/processes accessing the same file on a local filesystem, then if I could create a VFS layer that provided the same semantics but stored to the network, would that be a more reliable route ? (Assuming I'm missing something with the above results :)

d) Or is there something more fundamental that I'm just getting lucky with, and the whole idea is a non-starter? If this is the case, I'd love a pointer to somewhere it's explained, or some insight :)

A last note. I'm coming to this with an open mind - if there's a genuine reason why it can't be done, I'm very happy to know that. I just like to have something real and tangible to point to, to say "that's why"... :)

2020-09-03
23:19 Reply: Pre-release Versions of System.Data.SQLite packages? (artifact: f23e8ba282 user: SpacedCowboy)

Hey Keith,

Curious as to what your opinion would be on using something like iSCSI/SAN/ScaleIO etc. to make a network-available disk, since (as far as I know) the local-disk semantics are maintained, even if the spinning rust is remote...

Is that what you're actually getting at above ?

I'm not sure what SQLite's performance would be like in that sort of circumstance, the page model would probably help a lot, but there'd be latency in fetching those pages before they're part of the cache. In a mainly-read environment (email database, perhaps) it'd probably work pretty well, assuming those "local disk" semantics are in fact maintained.

2020-08-17
17:10 Reply: Difference between MATCH('x') and MATCH('x*') (artifact: f24bc82fee user: SpacedCowboy)

I think the confusion stemmed from the snippet() function, which highlights any occurrence of 'x' in the string, not just the ones that the query matches. So I'm seeing highlighting of the 'x' in 'my xylophone is broken' as well as 'x marks the spot', even when searching for just 'x'.

2020-08-16
18:22 Post: Word search in FTS5 (artifact: 7b25de0659 user: SpacedCowboy)

Anyone got any hints for how to do a word-search query in FTS5 ?

Basically I want the results to match the start of words, so if I present 's' as the input, then I only get back rows that have words starting with 's'. If I then present 'st' as the input, I only get back words starting with 'st' etc.

I tried presenting ' s', but of course the tokenizer strips out the ' ' character, so that won't work.

18:19 Post: Difference between MATCH('x') and MATCH('x*') (artifact: 5d788f303f user: SpacedCowboy)

I'm a bit confused over exactly what the difference is between the two queries of an FTS index:

			sql = "SELECT rowid FROM fts WHERE fts MATCH ('x*')";
		
			sql = "SELECT rowid FROM fts WHERE fts MATCH ('x')";

... where 'x' is a single character. I'm getting very different results:

tsql> select rowid from fts where fts match ('x');
Time taken: 0.024171 secs, rows=6383

tsql> select rowid from fts where fts match ('x*');
Time taken: 0.147986 secs, rows=19496

So clearly it is different, but I'm not really seeing why - from the description in 3.3, it seems that 'x' will match anything that 'x*' does, apart from perhaps words ending in x (therefore not a prefix), but 'x*' is matching more rows than 'x'.

2020-08-11
22:19 Post: FTS5 Query times (artifact: 0e154d2a82 user: SpacedCowboy)

I'm trying to make SQLite index a bunch of emails, and make them searchable by a bunch of criteria (full-text, flags, from header, ...)

To do this I created a 'body' table for the email body, and created an FTS index on 3 criteria - body, subject and from, thusly:


    /***************************************************************************\
    |* We store the actual content in a different database
    |*
    |* Update: No we don't. It's all in the same database, but in a different
    |* table - I think there's a way to do this using a virtual table that
    |* references a foreign database via a C interface, but you can't ATTACH
    |* a database currently, and use a table in that attached DB as the content.
    |* It'd sure be nice to be able to store the content remotely...
    \***************************************************************************/
    "DROP TABLE IF EXISTS body",
    
    "CREATE TABLE body"
    "	("						// implicit rowid
    "	body				BLOB,"		// The body
    "	subject				BLOB,"		// The subject
    "	fromTxt				BLOB"		// name & email
    "	)",

    /***************************************************************************\
    |* Add the FTS index
    \***************************************************************************/
    "DROP TABLE IF EXISTS fts",
	
    "CREATE VIRTUAL TABLE fts "    			 // full-text search
    "USING fts5("
    "	body,"                  	 		 // from body tbl
    "	subject,"					 // from body tbl
    "	fromTxt,"					 // from body tbl
    "	content=body,"					 // table to store in
    "	tokenize = 'porter unicode61'"			 // handle stemming
    "	)", 

    /***************************************************************************\
    |* Experiment with prefix tables for speed vs space
    \***************************************************************************/
    "CREATE VIRTUAL TABLE fts_prefix "			 // prefix index
    "USING fts5("
    "			body,"				 // from body tbl
    "			subject,"			 // from body tbl
    "			fromTxt,"			 // from body tbl
    "			prefix='2 3 4 5'"		 // Prefix lengths
    "			)",

So, after ingesting my mailboxes, there are about 65k emails in there...

@tanelorn mdir % tsql mail.db
tsql> select count(*) from email;
+----------+
| count(*) |
+----------+
| 65992    |
+----------+
Time taken: 0.012128 secs

... and to make the search query as generic as possible, I opted to do a series of 'select id/rowid using criteria-1 INTERSECT select id/rowid using criteria-2 INTERSECT ...' so I could switch in and out any of the criteria that the user has selected. I figured that I could get an INTERSECTion of all the various criteria to get a bunch (say 25) of row-ids, then go off and get the data using just those row-ids.
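
(The composed query ends up looking something like this - the non-FTS criteria and column names here are purely illustrative, built up in the same C-string style as the schema above:)

    "SELECT rowid FROM fts('foo*')                      "   // FTS criterion
    "  INTERSECT                                        "
    "SELECT id FROM email WHERE unread = 1              "   // flag criterion (illustrative)
    "  INTERSECT                                        "
    "SELECT id FROM email WHERE fromAddr LIKE '%fred%'  "   // header criterion (illustrative)
    "  LIMIT 25"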

This works well on standard tables, where I have an 'id' column defined as INTEGER PRIMARY KEY (Rows omitted for brevity - there's no information content here)

tsql> SELECT id, subject, createDate, size, attachments FROM email WHERE id IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25);
Time taken: 0.000566 secs

but when I try it on the FTS table, I'm getting a far longer response time even though it eventually returns exactly the same rows - this is what led me to try the fts_prefix table creation.

tsql>  SELECT rowId FROM fts('th*') WHERE rowid IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25);
Time taken: 3.644571 secs

However, if I do:

tsql> SELECT rowId, SNIPPET(fts, 0, '<b><u>', '</u></b>', '...', 48) FROM fts('th*') limit 25;
Time taken: 0.134950 secs

... which leads me to believe it's more of a "we're not using an index" problem. I've just re-read the SQLite documentation about rowid, and it does seem to say that any table without an explicit INTEGER PRIMARY KEY automatically gets an implicit 'rowid' column, which is keyed.

I can work around it - the 'body' table has the content, and it's indexed with a PRIMARY KEY, so I can implement the snippet() function myself and just pull the data from there, which is much faster:

tsql>  SELECT rowId,body FROM body WHERE rowid IN (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25);
Time taken: 0.000536 secs

None of the examples I've seen ever apply an INTEGER PRIMARY KEY to the fts5 table either. Is the way I'm going about it just an anti-pattern, or have I missed something ? Because currently it can do the main query with a half-dozen criteria (including one to the FTS table) in the blink of an eye, and then take a subjective eternity to return snippets from the actual 25 rows from the FTS table that I'm interested in :)

Suggestions very much gratefully received :)

2020-08-09
00:16 Reply: FTS index size (artifact: 3baccecae5 user: SpacedCowboy)

So to circle back to this (yes, it's taken this long to properly rewrite it ... :) after reading your reply I went back and looked at my email-ingest code. Turns out I'd goofed there: we were ingesting (treating it as text) a whole bunch of binary data within the email corpus, which was causing the database size to grow significantly.

After rewriting the ingest-to-the-database code, I got an altogether more reasonable

 @atlantis mdir % ls -l
 total 558568
 -rwxr--r--@   1 cowboy  staff          533 Aug  8 16:38 INGEST.wrap
 -rwxr-xr-x    1 cowboy  staff      1390208 Aug  8 16:39 ingest
 drwxr-xr-x  236 cowboy  staff         7552 Aug  6 10:21 src
 -rw-r--r--    1 cowboy  staff    283463680 Aug  8 16:45 total_complete.db

Only a saving of roughly 39x :) vs the one in the original post...

  -rw-r--r--   1 cowboy  staff  11120689152 Aug  5 16:45 total_complete.db

So this post continues a long and storied history of PEBCAK whenever I've had an issue with SQLite. Keep it up guys, it's brilliant :)

2020-08-06
05:13 Post: FTS index size (artifact: ccf1ea60a3 user: SpacedCowboy)

Hi :)

I've been playing with making an FTS5 index on an email corpus (about 80k emails), and I was kind of surprised by the size of the generated index. Here are some files:

 @tanelorn dbtests % ls -l total*.db
 -rw-r--r--  1 cowboy  staff  11120689152 Aug  5 16:45 total_complete.db
 -rw-r--r--  1 cowboy  staff   4610375680 Aug  5 17:44 total_content.db
 -rw-r--r--  1 cowboy  staff   1976729600 Aug  5 17:44 total_fts.db
 -rw-r--r--  1 cowboy  staff     27185152 Aug  5 17:45 total_without_fts.db

total_without_fts.db is the starting database. It's just the metadata without an FTS index on the body, and doesn't contain the actual body content. It's a good representation of what we have at the moment.

total_fts.db is after creating the FTS index with content set to "", detail=none and tokenize='porter unicode61'. The content (the mail body) is stored in the separate database 'total_content.db'

total_complete.db is the default 'detail' setting and the 'body' of the email stored in the same database (it's basically everything all together).

In our setup, the email content is typically stored on a different server from the database (and URLs are generated for clients to fetch stuff directly), so usually we'd only have the equivalent of 'total_without_fts.db' stored. I'm kind of surprised that the size balloons to ~70x the original when adding the FTS index (or almost half the size of the content). Am I being naive, or does that mean I'm doing something wrong?
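
(For concreteness, the FTS table in total_fts.db is created along these lines - same C-string style as elsewhere in my code, with the column list trimmed down to just the body for illustration:)

    "CREATE VIRTUAL TABLE fts "
    "USING fts5("
    "	body,"					// indexed, but text not stored here
    "	content='',"				// contentless: keep only the index
    "	detail=none,"				// no position/offset lists
    "	tokenize = 'porter unicode61'"		// stemming
    "	)"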

Is there a best-practice for getting FTS functionality (really, I only want the inverted index features) and keeping the size down ?