SQLite Forum

Disk I/O error on cifs volume
Login

Disk I/O error on cifs volume

(1) By anonymous on 2020-04-10 00:59:23 [link] [source]

Hi, We have a sqlite database that is uploaded to a cifs volume and only accessed in read-only mode. The application (dotnet core) is running on a docker container on a Linux managed machine in Azure (Ubuntu 4.4.0).

This was working fine, but recently when querying the sqlite database a disk I/O error occurs sporadically, even if the database is opened in read-only mode (and no concurrency).

Output from a very simply sqlite3 session with a tiny dummy database (a table with 3 rows):

sqlite3 -readonly test.db
SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
sqlite> select * from orders;
Error: disk I/O error
sqlite> select * from orders;
O1|A|100
O2|B|200
O3|C|300
sqlite> select * from orders;
O1|A|100
O2|B|200
O3|C|300
sqlite> select * from orders;
Error: disk I/O error
sqlite> select * from orders;
O1|A|100
O2|B|200
O3|C|300
sqlite> select * from orders;
Error: disk I/O error
sqlite>

I know the error above could occur if write operations were involved on a cifs volume, as sqlite uses 'byte-range locking' that's not entirely supported on that type of mounted file system, but was expecting that opening a database in read-only mode and querying a table would work without issues (as that was the until recently).

Any idea what could be causing the error or if there are any workarounds? Thank you

(2) By Warren Young (wyoung) on 2020-04-10 08:55:19 in reply to 1 [link] [source]

What happens when you do the same test on the CIFS host machine?

How far apart are the .NET Core app and CIFS server? It's not clear from your question whether they're both in Azure, or if they're in different data centers.

If the two machines aren't side-by-side in the same data center rack, then I'd try to find a way to disable UDP, if that's possible with CIFS. I tried some web searches, but didn't come up with a method to get TCP-only CIFS.

(3) By anonymous on 2020-04-10 13:57:56 in reply to 2 [link] [source]

@wyoung I don't have access to the actual CIFS host machine, just indirectly through the mounted drive. This is done automatically by Azure and it's mounted on /home (any data outside /home is not persisted). The actual app service host and CIFS host are both on the same data center for sure.

For reference, the app service code is located on /home/site/wwwroot and the sqlite database on /home/site/wwwroot/data).

This is how the CIFS volume was mounted (anonymized the output):

mount -l -t cifs
//10.0.AAA.BB/volume-CC-default/aaaaaaaaaaaaaaaaaaaa/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb on /home type cifs (rw,relatime
,vers=3.0,sec=ntlmssp,cache=strict,username=dummyadmin,domain=RDcccccccccccc,uid=0,noforceuid,gid=0,noforcegid,addr=10.0.AAA.BB,file_mode=0777,dir_mode=0777,nounix,serverino,mapposix,mfsymlinks,noperm,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1)

(4) By Keith Medcalf (kmedcalf) on 2020-04-10 19:22:23 in reply to 2 [link] [source]

CIFS is the Microsoft Protocol called MSNet. Encapsulated in SMB. Over TCP. UDP is only used for name resolution. At the time Microsoft coined the term "Common Internet File System" it was neither common nor working very well over the Internet. The moniker was prayerful rather than descriptive.

(5) By anonymous on 2020-04-10 20:47:29 in reply to 1 [link] [source]

I'm adding some extra information for the issue I'm facing.

I managed to compile a version of sqlite3 using the flag SQLITE_ENABLE_IOTRACE which adds the command .iotrace (Enable I/O diagnostic logging to FILE).

Showing below another session replicating the issue and the iotrace log.

# sqlite3

SQLite version 3.31.1 2020-01-27 19:55:54 Enter ".help" for usage hints. Connected to a transient in-memory database. Use ".open FILENAME" to reopen on a persistent database. sqlite> .iotrace iotrace.log sqlite> .open --readonly test.db sqlite> select * from orders; Error: disk I/O error sqlite> select * from orders; Error: disk I/O error sqlite> select * from orders; Error: disk I/O error sqlite> select * from orders; Error: disk I/O error sqlite> select * from orders; Error: disk I/O error sqlite> select * from orders; Error: disk I/O error sqlite> select * from orders; Error: disk I/O error sqlite> select * from orders; Error: disk I/O error sqlite> select * from orders; Error: disk I/O error sqlite> select * from orders; O1|A|100 O2|B|200 O3|C|300 sqlite> select * from orders; O1|A|100 O2|B|200 O3|C|300 sqlite> select * from orders; O1|A|100 O2|B|200 O3|C|300 sqlite> select * from orders; O1|A|100 O2|B|200 O3|C|300 sqlite> select * from orders; Error: disk I/O error sqlite> .quit

# cat iotrace.log OPEN 55FE2505E378 /home/site/wwwroot/testdb/test.db DBHDR 55FE2505E378 0 100 UNLOCK 55FE2505E378 0 UNLOCK 55FE2505E378 0 UNLOCK 55FE2505E378 0 UNLOCK 55FE2505E378 0 UNLOCK 55FE2505E378 0 UNLOCK 55FE2505E378 0 UNLOCK 55FE2505E378 0 UNLOCK 55FE2505E378 0 UNLOCK 55FE2505E378 0 LOCK 55FE2505E378 1 PGIN 55FE2505E378 1 UNLOCK 55FE2505E378 0 LOCK 55FE2505E378 1 CKVERS 55FE2505E378 16 PGIN 55FE2505E378 2 UNLOCK 55FE2505E378 0 LOCK 55FE2505E378 1 CKVERS 55FE2505E378 16 UNLOCK 55FE2505E378 0 LOCK 55FE2505E378 1 CKVERS 55FE2505E378 16 UNLOCK 55FE2505E378 0 LOCK 55FE2505E378 1 CKVERS 55FE2505E378 16 UNLOCK 55FE2505E378 0 UNLOCK 55FE2505E378 0 UNLOCK 55FE2505E378 0 CLOSE 55FE2505E378

The iotrace file shows that a lock operation (SHARED mode) is attempted when doing the Select statement, and also it checks the internal version to see if the database file had changes (CKVERS).

I know that the file won't be changed. Is it possible somehow (pragma?) to prevent sqlite from generating those locks?

(6) By Keith Medcalf (kmedcalf) on 2020-04-10 21:24:48 in reply to 1 [link] [source]

So what changed between when it worked and when it didn't?

(7) By anonymous on 2020-04-10 23:30:27 in reply to 6 [link] [source]

@kmedcalf Nothing changed as far as I'm concerned. We developed the app in November and it was working perfectly fine (tested with different databases).

This strange behaviour appeared recently (or at least it started with those error much more frequently). If the sqlite database is copied to the local file system in Azure (i.e. to folder /tmp/data) then everything works fine. Somehow, the problem only occurs when the database is on a cif volume (don't know if something has changed on the underlying Azure infrastructure).

I compiled a new version of sqlite3 that traces debug information (with flags SQLITE_FORCE_OS_TRACE and SQLITE_DEBUG_OS_TRACE).

With those flags it's evident that the behaviour of the LOCK is not consistent, sometimes it fails and sometimes it succeeds.

SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .open --readonly test.db
OPENX   3   /home/site/wwwroot/testdb/test.db 0400000
OPEN    3   /home/site/wwwroot/testdb/test.db
READ    3     100       0 0
sqlite> select * from orders;
LOCK    3 SHARED was NONE(NONE,0) pid=9931 (unix)
LOCK    3 SHARED failed (unix)
UNLOCK  3 0 was 0(0,0) pid=9931 (unix)
Error: disk I/O error
sqlite> select * from orders;
LOCK    3 SHARED was NONE(NONE,0) pid=9931 (unix)
LOCK    3 SHARED ok (unix)
READ    3    4096       0 0
UNLOCK  3 0 was 1(1,1) pid=9931 (unix)
LOCK    3 SHARED was NONE(NONE,0) pid=9931 (unix)
LOCK    3 SHARED ok (unix)
READ    3      16      24 0
READ    3    4096    4096 0
O1|A|100
O2|B|200
O3|C|300
UNLOCK  3 0 was 1(1,1) pid=9931 (unix)
sqlite>
sqlite> .quit
UNLOCK  3 0 was 0(0,0) pid=9931 (unix)
UNLOCK  3 0 was 0(0,0) pid=9931 (unix)
CLOSE   -1

(8) By Larry Brasfield (LarryBrasfield) on 2020-04-11 02:48:14 in reply to 7 [link] [source]

If you had followed this forum for long, you would wonder why the question, "Why does my use of SQLite over a networked filesystem fail in manner (*)?", is not in the FAQ. If it was there, it would briefly explain that locks and concurrent access protections act inconsistently over the network with many/most implementations, and might link to an article worth perusal by anybody still considering use of SQLite over a network filesystem, titled How To Corrupt An SQLite Database File, particularly section 2.1. "Filesystems with broken or missing lock implementations".

I recommend that reading to you. I think the question posed here, "Why does this use fail now?" could be better recast as "Why did it seem to work for so long?" After that reading, I suspect you will agree.

If you really want to use SQLite over the network, you can find published programs (with source) which run locally on the same machine with some SQLite database(s) residing on reliable attached storage hardware and offer networked-accessible access. The SQLite designers have included your use case in Appropriate Uses For SQLite. (See "Server-side database".)

IMO, while it is interesting to watch yet another attempt to chase down a gremlin plaguing a SQLite-over-networked-filesystem application, it is not going to result in a reliable solution unless whatever transient circumstances led to transient success before can be made less transient. (It's like watching somebody after they say "Watch me build this big pyramid of eggs!)

(9) By anonymous on 2020-04-11 05:36:40 in reply to 8 [source]

@LarryBrasfield When we developed the application we didn't notice that the application and database were stored on a CIFS filesystem. Unfortunately, our application does not write to the database, otherwise the issue would have been identified much earlier (the database is uploaded by an external system and only queries are executed).

I read the articles you mention and completely agree there are chances of corrupting database IF any attempt is made to write to that SQLite database, as the locking implementation on a network filesystem is unreliable.

But again, the scenario we have is a bit different in that the database is created on an external system, and then uploaded to this server. The application opens the database in read-only mode, and just SQL queries are executed. There's no chance that the database is written or corrupted.

I'm aware now (confirmed by the debug logs) that SQLite is calling the lock functions even when opening the database in read-only mode, but was hoping we could use some PRAGMA to completely turn-off the locking, or to ignore any locking errors, before considering other options.

(10) By Keith Medcalf (kmedcalf) on 2020-04-11 07:47:05 in reply to 9 [link] [source]

Open the database using a URI with the parameter nolock=1 or specify the VFS for your OS that does not use locking (ie, unix-none or win32-none or win32-longpath-none).

Basically all the nolock=1 URI parameter does is set the VFS to use to be the -none variant of the default VFS.

https://sqlite.org/vfs.html

(11) By anonymous on 2020-04-11 15:02:32 in reply to 10 [link] [source]

@kmedcalf Thank you Keith! It worked!

Here's the output of sqlite3 compiled with debug logs, no LOCK is shown now:

# sqlite3
SQLite version 3.31.1 2020-01-27 19:55:54
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .open --readonly "file:/home/site/wwwroot/testdb/test.db?nolock=1"
OPENX   3   /home/site/wwwroot/testdb/test.db 0400000
OPEN    3   /home/site/wwwroot/testdb/test.db
READ    3     100       0 0
sqlite> select * from orders;
READ    3    4096       0 0
READ    3      16      24 0
READ    3    4096    4096 0
O1|A|100
O2|B|200
O3|C|300
sqlite> .quit
UNLOCK  3 0 was 0(0,0) pid=35524 (unix)
CLOSE   -1

(12) By Larry Brasfield (LarryBrasfield) on 2020-04-11 16:39:04 in reply to 9 [link] [source]

If, truly,

There's no chance that the database is written or corrupted

, then you could use the immutable=1 parameter in the URI to be opened.

Keith's suggestion (nolock=1) is somewhat safer if you're mistaken and the database is overwritten or modified while your application runs.