JSON5 with tab characters inside multi-line strings not accepted
(1) By Rusty Hollmann (infrahead) on 2024-01-31 00:06:31 [source]
I think it is common to have JSON5 with multi-line strings with indentation:
{
// these lines are indented with tab chars
example: "start line \
another line \
another line"
}
However, passing this sort of input to SELECT json(?) AS output throws SQLiteError: malformed JSON. I tracked it down to the tab characters inside multi-line strings.
According to the JSON5 Extensions documentation... "Additional white space characters are allowed", and my other tooling also supports this.
Stripping leading tabs from the lines inside a multi-line string (e.g. /(?<=\\\n)\t+/g) is my workaround.
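For readers who want to try the workaround, here is a minimal sketch of how the regex quoted above might be applied in JavaScript. The function name stripTabsAfterLineContinuation is hypothetical, and the sketch assumes LF line endings only. Note that it changes the string's value (the tabs are dropped), so it is a stopgap rather than a fix.

```javascript
// Hypothetical helper; only the regex itself comes from the post above.
function stripTabsAfterLineContinuation(json5Text) {
  // (?<=\\\n) is a lookbehind: it matches a run of tabs only when that run
  // is immediately preceded by a backslash followed by a newline.
  return json5Text.replace(/(?<=\\\n)\t+/g, "");
}

// The example document from the post, with real tab characters for indentation.
const input = '{\n\texample: "start line \\\n\tanother line \\\n\tanother line"\n}';
const output = stripTabsAfterLineContinuation(input);

// JSON.stringify is used here only to make tabs and newlines visible.
console.log(JSON.stringify(output));
```

The tab before the key is left alone because it does not follow a line continuation. The pattern would miss CRLF line endings, and it does not check that the match is actually inside a string literal.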
(2.3) By Stephan Beal (stephan) on 2024-01-31 06:19:02 edited from 2.2 in reply to 1 [link] [source]
Edit: with my apologies, it seems i was focused on the wrong part of your post:
throws SQLiteError: malformed JSON. And I tracked it down to the tab characters inside multi-line strings.
We will look into that sqlite parsing behavior promptly.
What follows is my initial response, which got sidetracked on the other details of your report.
According to the JSON5 Extensions documentation... "Additional white space characters are allowed"
You're quoting that out of context. The page specifically lists what is permitted, vis a vis JSON, for each data type. For strings it says:
Strings
Strings may be single quoted.
Strings may span multiple lines by escaping new line characters.
Strings may include character escapes.
Several blocks down, after the sections on Numbers and Comments, it says:
White Space
Additional white space characters are allowed.
That is not the same as saying arbitrary whitespace in strings should/may be ignored. Nowhere in the spec does it mention that any "extraneous" whitespace should/may be stripped from a string literal. Specifically, the section on strings says nothing to that effect. It very specifically defines LineContinuation as:
\ LineTerminatorSequence
not:
\ LineTerminatorSequence AnyAmountOfWhiteSpace
and my other tooling also supports this.
Then they're in violation of the spec and have a bug-in-waiting. Consider this example:
foo: "if(1){ \
print(1,2,3); \
print(4,5,6); \
}"
Stripping those leading spaces would change the intended representation of that embedded code (which, for argument's sake, let's assume the user entered with conventional code indentation). That whitespace is not meaningful in this example, because C is agnostic regarding leading spaces, but it would be if the string contained Python code.
No library-level code (like sqlite or your aforementioned tooling) has any idea what the contents of a string literal contain and therefore cannot make informed decisions about when it "should" or "must not" strip any of its content.
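To illustrate the LineContinuation rule described above, here is a rough sketch of how a parser might resolve line continuations in the raw body of a string literal: only the backslash and the line terminator sequence are removed, and any indentation that follows stays in the value. The function name removeLineContinuations is hypothetical, and the sketch deliberately ignores every other escape sequence (so it would mishandle an escaped backslash placed directly before a line terminator).

```javascript
// Hypothetical helper: resolves only JSON5 line continuations.
function removeLineContinuations(rawBody) {
  // LineTerminatorSequence is CRLF, LF, CR, U+2028, or U+2029.
  // CRLF is listed first so that it is consumed as a single unit.
  return rawBody.replace(/\\(?:\r\n|\n|\r|\u2028|\u2029)/g, "");
}

// Raw body of a multi-line string whose second line is indented with spaces.
const raw = 'if(1){ \\\n    print(1,2,3); \\\n}';
const value = removeLineContinuations(raw);

// The four spaces before print(...) are still part of the value.
console.log(JSON.stringify(value));
```

Whether the surviving indentation matters depends entirely on what the string is used for, which is why the parser leaves it in place.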
(3.1) By Rusty Hollmann (infrahead) on 2024-01-31 06:18:19 edited from 3.0 in reply to 2.1 [link] [source]
Thanks for the response. I'm definitely NOT expecting the tab characters (or any whitespace or anything else for that matter) to be stripped/modified. I'm saying I had to do that because SQLite was throwing malformed JSON error on acceptable JSON5 input...which I tracked down to the tab characters inside multi-line strings.
At least I thought it should be "acceptable", as I believed I was using a spec-compliant parser to check myself on this, and have never had an issue before.
EDIT: didn't see your edited response at first...
And my own apologies for indeed misquoting the documentation, and confusing the matter.
Reading https://spec.json5.org/, it seems that string characters can be any Unicode sequence that is not one of ' or \ or LineTerminator, but there is some confusion about whether tabs have to be escaped as \t or not.
(5) By Stephan Beal (stephan) on 2024-01-31 06:18:07 in reply to 3.0 [link] [source]
I'm saying I had to do that because SQLite was throwing malformed JSON error on acceptable JSON5 input...which I tracked down to the tab characters inside multi-line strings.
My sincere apologies for the confusion - i was focused on the wrong details. We will look into that promptly to ensure that sqlite is spec-conformant here.
And/or I'm unsure how the escaping is working when using param binding with SQL.
Aside from potentially UTF8/16-related encoding, parameter bindings are retained literally, with no escaping. If you bind tab literals, that's what will be retained. If the binding injects a function argument, though, like json(?1), then it's up to that function to do any escaping it's supposed to do.
(6.1) By Stephan Beal (stephan) on 2024-01-31 08:46:17 edited from 6.0 in reply to 3.1 [link] [source]
Reading https://spec.json5.org/ and seems that characters can be any unicode sequence that is not one of ' or \ or LineTerminator, but there is some confusion on if have to be tab escapes \t or not.
Indeed, it doesn't seem to be explicit, but section 5.1 says:
Any character may be escaped. ...(large snip)... Alternatively, there are two-character sequence escape representations of some popular characters.
The word "alternatively" seems to imply that the characters in that chart may also be provided in raw form. That interpretation is strengthened by the fact that the reference JS implementation permits hard tabs, as seen in this output from a browser dev console:
> JSON5.parse('{foo:"bar<HARD TAB HERE>baz"}')
Object { foo: "bar\tbaz" }
When Richard's time zone wakes up i'll ping him about this. My current understanding is that sqlite needs to be accepting hard tabs in string literals.
Edit: in pre-5 JSON, hard tabs are disallowed according to the chart on json.org, which says that a string may contain...
Any codepoint except " or \ or control characters
noting that a hard tab is a control character. Looking at sqlite's json parser, that's where it seems to inherit that limitation from.
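The strict-JSON side of this can be checked without sqlite at all, since JavaScript's built-in JSON.parse follows the pre-5 grammar. The sketch below (the helper name parsesAsStrictJson is made up for illustration) contrasts a raw tab character inside a string with the two-character escape \t.

```javascript
// Hypothetical helper: reports whether JSON.parse accepts the text.
function parsesAsStrictJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

// In this JS literal, \t becomes a real tab character inside the JSON text.
const withHardTab = '{"foo":"bar\tbaz"}';
// Here \\t leaves a backslash followed by t, i.e. the JSON escape sequence.
const withEscapedTab = '{"foo":"bar\\tbaz"}';

console.log(parsesAsStrictJson(withHardTab));    // false: raw control character
console.log(parsesAsStrictJson(withEscapedTab)); // true: escaped tab is valid JSON
```

This only shows the strict JSON rule; it says nothing about how a JSON5 parser should treat the same input.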
(4.1) By Rusty Hollmann (infrahead) on 2024-01-31 06:13:39 edited from 4.0 in reply to 2.2 [link] [source]
Deleted
(7) By Richard Hipp (drh) on 2024-01-31 15:31:32 in reply to 1 [link] [source]
Please try again with trunk check-in 380f09c194caff55 or later and report back whether or not this resolves your problem.
(8) By Rusty Hollmann (infrahead) on 2024-02-01 00:39:39 in reply to 7 [link] [source]
Thank you both for addressing this! I'll need to figure out how to integrate a custom build of SQLite with my setup to properly test, but the code change looks good.