Duplicated rows (and missing rows) when SELECTing with ORDER BY

(1) By Marius Shekow (marius.shekow) on 2021-06-30 17:17:49 [link] [source]

Hi,

I have a SQLite database (created with Python 3.7's sqlite3 module) with a table called files that has several columns, many of them having a unique constraint. I think I found a terrible SQLite bug, and it has nothing to do with Python: I can reproduce the bug with the sqlite3.exe obtained from the official download page.

Steps to reproduce:

When I export the files table content as CSV file (as described here), using a simple SELECT * FROM files query, I get a CSV file with 18623 rows (+1 header row), where the values of the rows are unique (following the constraints). Everything looks as expected.
When I export the files table as CSV again, this time using SELECT * FROM files ORDER BY path ASC (where path is a TEXT column with a unique constraint), not only does the CSV file suddenly use "pipe" as delimiter (instead of "comma"), but the CSV now only contains 18622 rows (1 row is missing), and actually several entries are complete duplicates. For instance, the 1285'th and 1242'th row in the CSV file have the exact same content (including the primary key ID column).

I'm willing to share the database file, but I won't upload it to the WWW for everyone to see, as it contains sensitive information. Feel free to contact me directly, or let me know other ways how to send you the file (in case you need it at all).

Best regards, Marius

(2) By Domingo (mingodad) on 2021-06-30 17:47:06 in reply to 1 [link] [source]

In cases like your why not scramble the data and see if the problem remain there so you can publish it ?

Example:
====
id, name, age, nationality
1, DUDE, 24, UK
2, CARL, 12, SP
====

Scramble to (for numbers not unique use randon):
====
id, f2, f3, f4
1, f21, 8902, f41 
2, f22, 1092, f42 
====

(3) By David Raymond (dvdraymond) on 2021-06-30 18:09:46 in reply to 1 [link] [source]

What does pragma integrity_check; return?

(4.1) By Harald Hanche-Olsen (hanche) on 2021-06-30 21:13:24 edited from 4.0 in reply to 1 [source]

Assuming the database file is corrupted, as seems likely (see the answer by David Raymond), you may be able to recover it using the .recover command of the sqlite shell:

sqlite3 original.db .recover | sqlite3 new.db

Run .help recover in the interactive shell to see some possibly useful options.

But after any database corruption, you should view the recovered data with some skepticism.

(5) By Marius Shekow (marius.shekow) on 2021-07-01 05:59:56 in reply to 4.1 [link] [source]

Thank you David and Harald. PRAGMA integrity_check; revealed that some rows were simply missing in the respective indices, the output being something like:

row 10828 missing from index path_index
row 10828 missing from index sqlite_autoindex_files_2
row 10829 missing from index path_index
row 10829 missing from index sqlite_autoindex_files_2
....
wrong # of entries in index path_index
wrong # of entries in index sqlite_autoindex_files_2

As suggested, the recover command was able to repair the database without data loss (although a REINDEX command should have been sufficient as well, in this particular case). Interestingly, the encoding of special chars got broken in the new.db file, not sure why.