SQLite Forum

SQLite diff tool doesn't work with CRLF new line characters
Login

SQLite diff tool doesn't work with CRLF new line characters

(1) By anonymous on 2021-07-21 08:37:41 [link] [source]

I use the SQLDiff tool to keep a backup of a main SQLite database. The backup DB is stored on an external storage device and is written to periodically. The SQL output of the diff tool is used to apply the changes to keep it up to date.

I have noticed that some text columns that contain new lines do not seem to be diffed correctly. The application that writes to the main database is a Universal Windows Platform (UWP) app and so I think it uses CRLF new lines in its text area controls.

When I run the diff tool on the backup and main databases I will get some SQL output like this even though the two DBs are identical:

~\Downloads\sqlite-tools-win32-x86-3360000\sqlite-tools-win32-x86-3360000\sqldiff.exe '~\Desktop\Test Diff\test_diff - Copy.db' '~\Desktop\Test Diff\test_diff.db'
UPDATE "Test Table" SET "Test Text"='THIS TEXT

CONTAINS SOME NEW LINES' WHERE rowid=3;

I can execute this SQL query against the backup DB, run the diff tool again, and get the exact same output ad nauseum.

This becomes as a problem as we create more and more records in our DB with new lines causing the backup process takes longer and longer each time due to an ever increasing number of records to update.

Could this be an issue with the sqldiff.exe tool's handling of certain new line characters or is it my usage of the tool?

I have a minimum reproducible example DB I could share but I'm not sure where to host the file.

Many thanks in advance.

(2) By Dan Kennedy (dan) on 2021-07-21 16:12:46 in reply to 1 [link] [source]

If you pipe the output of that sqldiff.exe command into a file and then inspect the results with a hex editor or similar, does it look like well-formed utf-8?

Or, if you run:

    SELECT hex(tbl."Test Text") FROM "Test Table" AS tbl WHERE rowid=3

does it look like well-formed utf-8 or utf-16 text?

Dan.

(3) By Larry Brasfield (larrybr) on 2021-07-21 17:07:43 in reply to 1 [source]

[... Showing a diff reported by sqldiff.exe. ]

Could this be an issue with the sqldiff.exe tool's handling of certain new line characters or is it my usage of the tool?

I highly doubt either case is the cause of the diffs you see. The sqldiff utility, (source for which is at tool/sqldiff.c in the project source), finds differences by constructing queries which detect missing, extra, and not identical rows in the two DBs. The tool itself makes no effort to compare individual column-intersect-row values; it relies on the SQLite library to do that. For this reason, I highly doubt the precise truth of:

I will get some SQL output like this even though the two DBs are identical:

I think that, if you do a conversion to BLOB of the differing fields, you will find that they are not identical. I suspect you will want to subject your text fields to a newline normalization process to make simple comparison (such as sqldiff does) work as you expect. Or, you could modify the tool to do further processing on differences, subjecting just different fields to newline normalization for re-comparison before reporting them as different (by generating UPDATE statements.)