SQLite Forum

Duplicated rows (and missing rows) when SELECTing with ORDER BY
Login

Duplicated rows (and missing rows) when SELECTing with ORDER BY

(1) By Marius Shekow (marius.shekow) on 2021-06-30 17:17:49 [link]

Hi,

I have a SQLite database (created with Python 3.7's `sqlite3` module) with a table called `files` that has several columns, many of them having a `unique` constraint. I think I found a terrible SQLite bug, and it has nothing to do with Python: I can reproduce the bug with the `sqlite3.exe` obtained from the [official download page](https://www.sqlite.org/2021/sqlite-tools-win32-x86-3360000.zip).

Steps to reproduce:

- When I export the `files` table content as CSV file (as described [here](https://www.sqlitetutorial.net/sqlite-tutorial/sqlite-export-csv/)), using a simple `SELECT * FROM files` query, I get a CSV file with 18623 rows (+1 header row), where the values of the rows are unique (following the constraints). Everything looks as expected.
- When I export the `files` table as CSV again, this time using `SELECT * FROM files ORDER BY path ASC` (where `path` is a `TEXT` column with a `unique` constraint), not only does the CSV file suddenly use "pipe" as delimiter (instead of "comma"), but the CSV now only contains 18622 rows (1 row is missing), and actually several entries are complete duplicates. For instance, the 1285'th and 1242'th row in the CSV file have the exact same content (including the primary key ID column).

I'm willing to share the database file, but I won't upload it to the WWW for *everyone* to see, as it contains sensitive information. Feel free to contact me directly, or let me know other ways how to send you the file (in case you need it at all).

Best regards,
Marius

(2) By Domingo (mingodad) on 2021-06-30 17:47:06 in reply to 1 [link]

In cases like your why not scramble the data and see if the problem remain there so you can publish it ?

Example:
====
id, name, age, nationality
1, DUDE, 24, UK
2, CARL, 12, SP
====

Scramble to (for numbers not unique use randon):
====
id, f2, f3, f4
1, f21, 8902, f41 
2, f22, 1092, f42 
====

(3) By David Raymond (dvdraymond) on 2021-06-30 18:09:46 in reply to 1 [link]

What does [pragma integrity_check;](https://www.sqlite.org/pragma.html#pragma_integrity_check) return?

(4.1) By Harald Hanche-Olsen (hanche) on 2021-06-30 21:13:24 edited from 4.0 in reply to 1 [link]

Assuming the database file is corrupted, as seems likely (see the answer by David Raymond), you may be able to recover it using the `.recover` command of the sqlite shell:

    sqlite3 original.db .recover | sqlite3 new.db

Run `.help recover` in the interactive shell to see some possibly useful options.

But after any database corruption, you should view the recovered data with some skepticism.

(5) By Marius Shekow (marius.shekow) on 2021-07-01 05:59:56 in reply to 4.1

Thank you David and Harald. `PRAGMA integrity_check;` revealed that some rows were simply missing in the respective indices, the output being something like:
```
row 10828 missing from index path_index
row 10828 missing from index sqlite_autoindex_files_2
row 10829 missing from index path_index
row 10829 missing from index sqlite_autoindex_files_2
....
wrong # of entries in index path_index
wrong # of entries in index sqlite_autoindex_files_2
```

As suggested, the `recover` command was able to repair the database without data loss (although a `REINDEX` command should have been sufficient as well, in this particular case). Interestingly, the encoding of special chars got broken in the `new.db` file, not sure why.