SQLite diff tool doesn't work with CRLF new line characters
(1) By anonymous on 2021-07-21 08:37:41 [link]
I use the SQLDiff tool to keep a backup of a main SQLite database. The backup DB is stored on an external storage device and is written to periodically. The SQL output of the diff tool is used to apply the changes to keep it up to date. I have noticed that some text columns that contain new lines do not seem to be diffed correctly. The application that writes to the main database is a Universal Windows Platform (UWP) app and so I think it uses CRLF new lines in its text area controls. When I run the diff tool on the backup and main databases I will get some SQL output like this even though the two DBs are identical: ``` ~\Downloads\sqlite-tools-win32-x86-3360000\sqlite-tools-win32-x86-3360000\sqldiff.exe '~\Desktop\Test Diff\test_diff - Copy.db' '~\Desktop\Test Diff\test_diff.db' UPDATE "Test Table" SET "Test Text"='THIS TEXT CONTAINS SOME NEW LINES' WHERE rowid=3; ``` I can execute this SQL query against the backup DB, run the diff tool again, and get the exact same output ad nauseum. This becomes as a problem as we create more and more records in our DB with new lines causing the backup process takes longer and longer each time due to an ever increasing number of records to update. Could this be an issue with the `sqldiff.exe` tool's handling of certain new line characters or is it my usage of the tool? I have a minimum reproducible example DB I could share but I'm not sure where to host the file. Many thanks in advance.
(2) By Dan Kennedy (dan) on 2021-07-21 16:12:46 in reply to 1 [link]
If you pipe the output of that sqldiff.exe command into a file and then inspect the results with a hex editor or similar, does it look like well-formed utf-8? Or, if you run: SELECT hex(tbl."Test Text") FROM "Test Table" AS tbl WHERE rowid=3 does it look like well-formed utf-8 or utf-16 text? Dan.
(3) By Larry Brasfield (larrybr) on 2021-07-21 17:07:43 in reply to 1 [link]
\[... Showing a diff reported by sqldiff.exe. \] > Could this be an issue with the sqldiff.exe tool's handling of certain new line characters or is it my usage of the tool? I highly doubt either case is the cause of the diffs you see. The sqldiff utility, (source for which is at tool/sqldiff.c in the project source), finds differences by constructing queries which detect missing, extra, and not identical rows in the two DBs. The tool itself makes no effort to compare individual column-intersect-row values; it relies on the SQLite library to do that. For this reason, I highly doubt the precise truth of: > I will get some SQL output like this even though the two DBs are identical: I think that, if you do a conversion to BLOB of the differing fields, you will find that they are **not** identical. I suspect you will want to subject your text fields to a newline normalization process to make simple comparison (such as sqldiff does) work as you expect. Or, you could modify the tool to do further processing on differences, subjecting just different fields to newline normalization for re-comparison before reporting them as different (by generating UPDATE statements.)
(4) By Rowan Worth (sqweek) on 2021-07-26 04:16:25 in reply to 3 [link]
Obviously they are not identical, the question is _why_ are they not identical after following the instructions that sqldiff itself gave? It may be that the CRLFs are lost/converted when capturing or running the query -- OP what process are you using to apply the changes specified by sqldiff?
(5) By Larry Brasfield (larrybr) on 2021-07-26 04:48:45 in reply to 4 [link]
> Obviously they are not identical, the question is why are they not identical after following the instructions that sqldiff itself gave? That's a key question with respect to the OP's problem. I would argue that if sqldiff gave the "instructions" that made one DB almost but not quite like the other, then something non-visible in the representation coming out of sqldiff **must** be at work. It's enough to make me wish SQLite had PostgreSQL's C-style string escape sequences. > It may be that the CRLFs are lost/converted when capturing or running the query -- OP what process are you using to apply the changes specified by sqldiff? That question is apropos, but this should be added: OP, how are you capturing the changes specified by sqldiff?
(6.1) By Larry Brasfield (larrybr) on 2021-07-26 20:03:50 edited from 6.0 in reply to 1
(Edited to reflect new sqldiff behavior no longer optional.) The sqldiff utility, in its latest form (on the trunk branch as of https://sqlite.org/src/info/ff74c0cc4cefa527 ), has a new behavior for its normal output (which is SQL to bring named DB's to sameness.) It renders text field values that are to be updated or inserted in a form that should not suffer the ambiguity of non-graphical characters being treated in special ways that has (probably) led to or contributed to the OP's problem. Now, these sqldiff invocations will show differing line-endings or other less visible control characters that make putatively identical databases not quite identical. For example, where line-endings are CRLF in one and LF in another, running<code> sqldiff test-copy.db test.db and sqldiff test.db test-copy.db </code>will visibly show the difference in those control character sequences. BTW, I still recommend newline normalization for the OP's situation. The sqldiff change makes problems such as his easier to see and remedy, but newline normalization would side-step them altogether.