SQLite slow on EFS?
(1) By Swastik (961679) on 2022-10-07 18:40:26 [source]
I've SQLite .db file mounted on EFS. I found that queries were slower on EFS, so to test this I copied same .db file on other folder(on local SSD) which is not EFS and executed same queries on both db files and found that the query that ran on non-EFS mounted file took 1ms and query that I ran on EFS mounted file took 10ms.
This should be fine because EFS is slower than local SSD. But this is acceptable when we run query for first time, for next time if we execute same query I expect SQLite to use cached data to return result and that should be faster on EFS also.
But looks like SQLite is not caching any result despite of having high cache_size, because even in subsequent query execution SQLite is taking same 10ms to execute query on db file mounted on EFS.
Won't SQLite cache data if it reads from file mounted on EFS?. If it caches, then why there is no improvement in query speed in subsequent runs
(2) By anonymous on 2022-10-07 19:32:18 in reply to 1 [link] [source]
Because locking over the network I assume, try opening with an exclusive lock
(3) By Simon Slavin (slavin) on 2022-10-07 22:05:33 in reply to 1 [link] [source]
Are you talking about Amazon Elastic File System or NTFS Encrypted File System ? Sorry for the basic question, but you'd be amazed at the posts we get here.
What API are you using to make SQLite calls ? Is are you using the C API from your C program or using a library from another language ?
Assuming that it's NTFS, can you spin up process manager and tell us whether you can track the extra time to a process ? Is that process one which is part of your program, or one which is part of the file system ?
(4.1) By Swastik (961679) on 2022-10-08 11:30:17 edited from 4.0 in reply to 3 [link] [source]
I'm taking about Amazon EFS.
Also I'm using SQLite with Java with the help of xerial JDBC driver
https://github.com/xerial/sqlite-jdbc
(5) By Simon Slavin (slavin) on 2022-10-08 12:19:33 in reply to 4.1 [link] [source]
Thanks. I'm betting a lot of previous answers thought you were using the Encrypted File System on a local drive.
You won't get a speedup through caching. It's a networking file protocol. Nothing can be cached because the data on the source drive may have changed, and the only way to tell it hasn't been changed is to read from the source drive (across the network) again.
Technically, SQLite is saving time by not reading all the data again. But the time taken to do any network access takes most of the time. The difference between reading 5 bytes and 500 bytes is not big.
(6) By anonymous on 2022-10-08 12:27:24 in reply to 5 [link] [source]
That's why an exclusive lock will help in this case. SQLite cannot safely assume the internal page cache is up to date without constantly checking for this unless it has exclusive access to the file.
(7) By Swastik (961679) on 2022-10-08 12:27:47 in reply to 5 [link] [source]
Sorry if I'm missing any basics, but I didn't understand this part from your answer
Technically, SQLite is saving time by not reading all the data again
If SQLite is not caching how above line will be true?
(8) By Simon Slavin (slavin) on 2022-10-08 12:51:23 in reply to 7 [link] [source]
There's a data version number in the database file header. You can access it yourself using this:
https://www.sqlite.com/pragma.html#pragma_data_version
So SQLite may be reading just that integer across the network to see whether it needs to reread the actual data.
However, most of the time taken for sending data across a network is used no matter how much data is sent (within reason). It's involved in making sure the right computers are talking, and that they both understand what they're talking about. The increase in time taken because they send/receive 500 bytes instead of 5 bytes is small in comparison with the time it takes to create/use/stop the session.
(9.1) By Swastik (961679) on 2022-10-08 16:37:38 edited from 9.0 in reply to 8 [link] [source]
Sorry if I'm wrong, slightly confused here
So can we say that SQLite is caching data but still accessing EFS(through network ) to read data_version number and if data_version is same SQLite is using cache data?
(10) By Keith Medcalf (kmedcalf) on 2022-10-08 16:49:25 in reply to 9.1 [link] [source]
Accessing the network takes 5 seconds, no matter how much data you are reading.
The first query has to read a pooper-scooper full of data -- say 5 Gigabytes. Accessing the network takes 5 seconds. Data transfer takes a few billiseconds. Total elapsed time: 5 seconds.
The second query has to read the "data_version" to see if it can use its cached data or not. Accessing the network takes 5 seconds. Four bytes are transferred (the data_version) in a few billiseconds. THe data version is unchanged, therefore the rest of the query can be serviced from the in-process cache. Total elapsed time: 5 seconds.
I don't know how to make it more clear.
(11) By Swastik (961679) on 2022-10-08 17:57:40 in reply to 10 [link] [source]
Thank you 🙌, now it's clear