JSON functions stopped working with blobs in 3.45
(1.1) By nalgeon on 2024-01-21 04:47:01 edited from 1.0 [source]
Given this file data.json
:
[1, 2, 3]
This query in SQLite shell 3.44 works fine:
select readfile('data.json') -> 1;
-- 2
The same query in SQLite shell 3.45 returns an error:
select readfile('data.json') -> 1;
-- Runtime error near line 1: malformed JSON
(2) By Larry Brasfield (larrybr) on 2024-01-20 22:18:13 in reply to 1.0 [link] [source]
This should work:
select CAST(readfile('data.json') AS TEXT) -> 1;
What you are seeing is a result of the JSON functions treating blob input as if it is encoded in the new JSON binary format introduced with SQLite release 3.45 .
(3) By Richard Hipp (drh) on 2024-01-20 22:19:22 in reply to 1.0 [link] [source]
The readfile() function returns a BLOB. That you could use a BLOB as the left-hand side argument to -> in 3.44.0 is a previously unreported bug in 3.44.0. The documentation said that you could not - the LHS of -> needed to be TEXT.
The quickest work-around that I can think of is:
SELECT CAST(readfile('data.json') AS TEXT) -> 1;
(4) By nalgeon on 2024-01-21 03:39:34 in reply to 3 [link] [source]
The thing is, every JSON function that used to work with readfile()
(not just in 3.44) now stopped working.
For example, json_each()
:
select *
from json_each(readfile('data.json'));
Seems more like breaking backwards compatibility than a previously unreported bug.
(5.1) By nalgeon on 2024-01-21 03:59:33 edited from 5.0 in reply to 4 [link] [source]
The problem is not only with readfile
. JSON functions stopped working with blobs altogether. So if I had stored my JSONs as blobs, I wouldn't be able to work with them now without explicit type casting:
create table data(val blob);
insert into data values
(cast('[1,2,3]' as blob));
select * from json_each(data.val), data;
-- Runtime error near line 6: malformed JSON
It all worked on previous SQLite versions.
(6.1) By nalgeon on 2024-01-21 04:32:47 edited from 6.0 in reply to 5.1 [link] [source]
Previously, the doc stated:
All json1 functions currently throw an error if any of their arguments are BLOBs.
But in fact they never did. Everything worked just fine.
For example, this query worked on every SQLite version prior to 3.45:
select json_extract(
cast('[1,2,3]' as blob),
'$[1]'
)
So I'm pretty sure people have been using blobs with JSON functions. I did.
(14) By Richard Hipp (drh) on 2024-01-21 11:36:44 in reply to 5.1 [link] [source]
This bug in prior versions of SQLite was unknown to the developers. If I had known about it, I would have done something about it. I probably would have tried to preserve the buggy behavior - to maintain "bug compatibility". But I didn't know.
The work around to accepting BLOBs as JSON TEXT seems dubious and likely to lead to other more subtle problems. For example, what if the "data.json" file contained this text: "Gabcd". That would be understood as valid JSONB. Behold:
SELECT CAST('Gabcd' AS BLOB) -> '$';
Yielding:
"abcd"
I can come up with countless similar examples where ordinary text, when cast to a BLOB, appears to be valid JSONB.
I could, perhaps, come up with a new DBCONFIG option, say "SQLITE_DBCONFIG_BLOB_CANBE_JSON", which is disabled by default, but which if enabled causes the JSON routines to interpret BLOB inputs according to this algorithm:
- If it appears to be valid JSONB, interpret it as such.
- If it appears to be valid TEXT JSON, interpret it as such.
- Raise an error.
The proposed new DBCONFIG causes the addition of step 2. With such an option, applications that depended on the old buggy behavior would still function unchanged after setting the new option, since, as far as I know, there is no valid JSON TEXT that when cast to a BLOB will appears to be JSONB.
The idea of enhancing the "readfile()" function to support an optional second argument that causes it to read text also has merit. Or we could just add an new function: "readfileastext()".
(15) By Bo Lindbergh (_blgl_) on 2024-01-21 19:01:20 in reply to 14 [link] [source]
as far as I know, there is no valid JSON TEXT that when cast to a BLOB will appears to be JSONB
I hope you realise that a statement like that will be read as a challenge.
with ambiguous(j) as
(select '[' || char(9) || '721]')
select
json_valid(j, 0x1) as "RFC 8259",
json_valid(cast(j as blob), 0x8) as "JSONB"
from ambiguous;
Horizontal tab is an allowed whitespace character, even if it's unlikely to appear in this particular context in real-life JSON text.
(16) By cliff (clifff) on 2024-01-22 04:23:08 in reply to 14 [link] [source]
I wonder if there is the potential to build a type of randomized test that could catch issues like this in the future. Specifically, trigger review if a query returns different results for two different versions of sqlite, and returns an error in neither (or, perhaps it could be made stronger). This might not be possible due to so many queries containing randomness (i.e. no ORDER BY). I just wanted to throw the idea out there as a brainstorming exercise. (and, if there is interest in it, I may be willing to work on an implementation)
(18) By Larry Brasfield (larrybr) on 2024-01-22 04:47:14 in reply to 16 [link] [source]
That's an interesting idea, but doing it would be complicated. There are many dataset/query combinations now which do not return results that can be predicted by documented behavior, even when subsequent result set ordering is imposed.
(17) By Larry Brasfield (larrybr) on 2024-01-22 04:43:53 in reply to 14 [link] [source]
The idea of enhancing the "readfile()" function to support an optional second argument that causes it to read text also has merit. Or we could just add an new function: "readfileastext()".
Such a function could also normalize how newlines are translated (or not) and perhaps pay attention to a BOM if present. It might later evolve to do other encoding translations.
The present readfile() function is really just a simple file-to-blob converter.
(25) By Vadim Goncharov (nuclight) on 2024-03-01 11:16:37 in reply to 14 [link] [source]
As I earlier proposed for possible future CBOR, the BLOB field must have some magic number to distinguish it's really JSONB. And this magic should be really binary...
I think it's not yet too late to fix JSONB format to be this, as there were just one release with this feature as experimental.
(7) By Paul van Helden (paulvh) on 2024-01-21 05:16:05 in reply to 4 [link] [source]
Fix your code with CASTs instead of complaining about it. You didn't follow the documentation. It happens. This type of thing is extremely rare in SQLite so be happy and don't cry.
(8) By Larry Brasfield (larrybr) on 2024-01-21 05:32:48 in reply to 7 [link] [source]
Alternatively, the OP can fix the data and whatever updates it to store JSON as TEXT instead of blobs. This will avoid an occassional runtime penalty associated with blob to text conversion where memory must be reallocated to make room for the zero terminator needed for C-style strings.
(9) By nalgeon on 2024-01-21 06:13:53 in reply to 8 [link] [source]
I believe that if a certain feature worked for N years, and then stopped working — that is a breaking change.
At the very least, it deserves to be mentioned in the release notes.
Also, maybe the SQLite team should not encourage ad hominem on their forum.
(10) By Spindrift (spindrift) on 2024-01-21 07:08:20 in reply to 9 [link] [source]
This is an interesting issue, and similar unrecognised and unintended erroneous behaviour has required specific workaround handling in the past too, of course.
I'm sure you realise that this behaviour you have "relied on" is explicitly counter to the documented behaviour (even if it was the way the underlying library functioned).
The argument is that behaviour is defined by the documentation and not the specific implementation that we see. And this erroneous behaviour breaks a very specific intended behaviour now that json blobs exist.
Previously such issues have been solved in both possible ways, and this is a decision that can only be made by the lead engineer in this case. I look forward to seeing which path is taken, though I have strong suspicions I already know.
I think you've raised an interesting and important issue, and I'm uncomfortable about the style of one of the responses you received above. However, I think you are now in "wait and see" territory.
Regards, S
(11) By Larry Brasfield (larrybr) on 2024-01-21 07:09:00 in reply to 9 [link] [source]
I believe that if a certain feature worked for N years, and then stopped working — that is a breaking change.
The development team is well aware of this. SQLite users should be aware that when they rely on undocumented behavior, or behavior contrary to the documentation, they bear the risk of such changes. Nevertheless, the developers often elect to preserve undocumented behavior, and sometimes elect to document it so as to make it part of the public API.
Worth noting in relation to this issue is that the JSON binary encoding feature, with its effect upon the JSON functions, was exposed for several weeks prior to the 3.45 release. The pre-release drops and announcements are intended to give attentive users a chance to bring out issues, such as the one instigating this thread, before they become less subject to remedy by altering behavior or the public API. After a release, behavior changes that conflict with the published API are much less likely to be effected.
At the very least, it deserves to be mentioned in the release notes.
That is a sensible suggestion.
Also, maybe the SQLite team should not encourage ad hominem on their forum.
Technically, the post to which I replied contained no ad hominem.1 I would agree that labeling your complaint as "crying" would be rude, insensitive, and out of place in this forum. Abusive posts are often deleted or partially elided in moderation. We moderators try to keep discussions civil here. However, you were simply urged to not cry. While I saw the weak implication that you might do so or were doing so as a bit over the edge of clear civility, I elected to ignore that minor transgression -- in part because it is good advice when applied to the set of people who may encounter similar difficulties.2
That said, we (moderators) much prefer unquestionably civil discussion. Those who depart from it may either never see their uncivil posts or may see them deleted or correctively edited. Further, no appearance of uncivil remarks or phrasing should be considered to be encouragement of such. At most, it represents neglect or very limited tolerance by the moderator(s).
(12) By Spindrift (spindrift) on 2024-01-21 07:50:28 in reply to 11 [link] [source]
I would additionally (to the OPs comments about this situation) point out that it is extremely convenient for readfile() in the cli to be plumbed into the JSON functions, ideally without the need for casting.
I'm not sure what ultimate solution will be chosen, but reading and writing JSON files is obviously quite an easily predictable use case and one does wonder if it could be preserved, specifically within readfile perhaps. And writing.
(13) By anonymous on 2024-01-21 08:39:05 in reply to 12 [link] [source]
Maybe a new function like readtextfile(FILENAME,ENCODING)
?
(20) By anonymous on 2024-01-22 15:12:53 in reply to 12 [link] [source]
+1 for reading in a file ending in .json being treated as json. Just as a handy convenience method. Kind of like NodeJS require("file.json")
.
(26) By Vadim Goncharov (nuclight) on 2024-03-01 11:22:09 in reply to 11 [link] [source]
was exposed for several weeks prior to the 3.45 release. The pre-release drops and announcements are intended to give attentive users a chance to bring out issues, such as the one instigating this thread, before they become less subject to remedy by altering behavior or the public API. After a release, behavior changes that conflict with the published API are much less likely to be effected.
That's not a very good position for a product used all over the world - majority of users can't be attentive to read forum. Most read only changelogs. A policy when a feature first goes to release as "experimental and subject to change" is much better, and successfully tried by many opensource projects (e.g. Perl).
So I suggested to add binary marker for JSONB in a sibling post.
(21) By J. Zebedee (jzebedee) on 2024-01-23 04:58:37 in reply to 7 [link] [source]
This is definitely not constructive.
FWIW, I am happy to see nalgeon post this, as I woke up to the exact same breaking change at 1:00AM when my SQLite ingestion jobs all broke. By the time I tracked it down to this change and fixed it, I forgot all about bleary-eyed forum posts to document the issue.
This is 100% a compatibility break. Pretty much any CLI ingestion tool, e.g., datasette, sqlite-utils, sqlean, and no doubt others, went up in smoke with no mention of the change after 3.45.0 released.
(22) By Nuno Cruces (ncruces) on 2024-01-23 08:36:48 in reply to 21 [link] [source]
The point of pre-releases is that you test with them (and provide valuable feedback) before you upgrade. Or releases, for that matter; you don't need to upgrade production workloads without testing first.
(23) By anonymous on 2024-01-30 11:45:09 in reply to 21 [link] [source]
This is 100% a compatibility break.
..But only for code that did not adhere to documentation saying:
All json1 functions currently throw an error if any of their arguments are BLOBs.
(24) By Richard Hipp (drh) on 2024-01-30 13:31:58 in reply to 23 [link] [source]
FWIW, the documentation will change in the forthcoming 3.45.1 release to describe the legacy behavior (BLOBs are cast into text and interpreted as text JSON) so that it will be supported for future releases.
(19) By Richard Hipp (drh) on 2024-01-22 14:20:03 in reply to 1.1 [link] [source]
After a recent check-in, if you compile using the -DSQLITE_JSON_BLOB_INPUT_BUG_COMPATIBLE option, the JSON functions should once again start accepting blob inputs that are really disguised text JSON strings. Please try this out and report back whether or not the patch fixes your problem.
(27) By Alexandr Burdiyan (burdiyan) on 2024-04-13 20:34:39 in reply to 1.1 [link] [source]
I'm suspecting that I just got bit by this problem in my system.
Although very surprisingly the issues were only present on Apple-Silicon-based Mac computers, but on Intel-based Mac computers everything was working fine. And it was exactly the same code, and exactly the same SQLite amalgamation file.
I'd assume it's related to some other third-party factor, but I didn't investigate it any further, beyond fixing all my queries to use TEXT instead of BLOB for JSON.
As a side note: it's a bit surprising that SQLite chose to break compatibility here, even if it was opposite to the documented behavior (one could argue it's a bug in the documentation, not an oversight in the implementation), while it would keep things like disabled foreign keys by default to preserve compatibility.