Duplicated rows (and missing rows) when SELECTing with ORDER BY
(1) By Marius Shekow (marius.shekow) on 2021-06-30 17:17:49 [link] [source]
I have a SQLite database (created with Python 3.7's
sqlite3 module) with a table called
files that has several columns, many of them having a
unique constraint. I think I found a terrible SQLite bug, and it has nothing to do with Python: I can reproduce the bug with the
sqlite3.exe obtained from the official download page.
Steps to reproduce:
- When I export the
filestable content as CSV file (as described here), using a simple
SELECT * FROM filesquery, I get a CSV file with 18623 rows (+1 header row), where the values of the rows are unique (following the constraints). Everything looks as expected.
- When I export the
filestable as CSV again, this time using
SELECT * FROM files ORDER BY path ASC(where
TEXTcolumn with a
uniqueconstraint), not only does the CSV file suddenly use "pipe" as delimiter (instead of "comma"), but the CSV now only contains 18622 rows (1 row is missing), and actually several entries are complete duplicates. For instance, the 1285'th and 1242'th row in the CSV file have the exact same content (including the primary key ID column).
I'm willing to share the database file, but I won't upload it to the WWW for everyone to see, as it contains sensitive information. Feel free to contact me directly, or let me know other ways how to send you the file (in case you need it at all).
Best regards, Marius
In cases like your why not scramble the data and see if the problem remain there so you can publish it ? Example: ==== id, name, age, nationality 1, DUDE, 24, UK 2, CARL, 12, SP ==== Scramble to (for numbers not unique use randon): ==== id, f2, f3, f4 1, f21, 8902, f41 2, f22, 1092, f42 ====
(3) By David Raymond (dvdraymond) on 2021-06-30 18:09:46 in reply to 1 [link] [source]
What does pragma integrity_check; return?
(4.1) By Harald Hanche-Olsen (hanche) on 2021-06-30 21:13:24 edited from 4.0 in reply to 1 [link] [source]
Assuming the database file is corrupted, as seems likely (see the answer by David Raymond), you may be able to recover it using the
.recover command of the sqlite shell:
sqlite3 original.db .recover | sqlite3 new.db
.help recover in the interactive shell to see some possibly useful options.
But after any database corruption, you should view the recovered data with some skepticism.
(5) By Marius Shekow (marius.shekow) on 2021-07-01 05:59:56 in reply to 4.1 [source]
Thank you David and Harald.
PRAGMA integrity_check; revealed that some rows were simply missing in the respective indices, the output being something like:
row 10828 missing from index path_index row 10828 missing from index sqlite_autoindex_files_2 row 10829 missing from index path_index row 10829 missing from index sqlite_autoindex_files_2 .... wrong # of entries in index path_index wrong # of entries in index sqlite_autoindex_files_2
As suggested, the
recover command was able to repair the database without data loss (although a
REINDEX command should have been sufficient as well, in this particular case). Interestingly, the encoding of special chars got broken in the
new.db file, not sure why.