Help me understand the path syntax used by json_extract()
(1) By Simon Willison (simonw) on 2022-01-18 21:43:41 [link] [source]
The path syntax for the json_extract()
function is currently documented using examples on https://www.sqlite.org/json1.html#jex
json_extract('{"a":2,"c":[4,5,{"f":7}]}', '$') → '{"a":2,"c":[4,5,{"f":7}]}'
json_extract('{"a":2,"c":[4,5,{"f":7}]}', '$.c') → '[4,5,{"f":7}]'
json_extract('{"a":2,"c":[4,5,{"f":7}]}', '$.c[2]') → '{"f":7}'
json_extract('{"a":2,"c":[4,5,{"f":7}]}', '$.c[2].f') → 7
json_extract('{"a":2,"c":[4,5],"f":7}','$.c','$.a') → '[[4,5],2]'
json_extract('{"a":2,"c":[4,5],"f":7}','$.c[#-1]') → 5
I can follow these up to a point, but the $.c[#-1]
one isn't clear to me.
Is there documentation for this syntax somewhere? Alternatively, if people can help me understand it (and maybe point me to some C code) I'd be happy to help contribute improvements to the existing documentation.
(2) By Richard Hipp (drh) on 2022-01-18 21:52:17 in reply to 1 [link] [source]
The "[#-1]
" syntax means the last element of an array. "#
" is the number
of elements in the array and JSON arrays using 0-based indexing, so the last
element will be "[#-1]
".
(4.2) By Simon Willison (simonw) on 2022-01-18 22:00:07 edited from 4.1 in reply to 2 [link] [source]
Is that a SQLite-specific extension? I didn't spot that in the MySQL JSON path language.
Found the commit that added that feature in 2019 here: https://sqlite.org/src/info/35ed68a651f - and the tests are https://sqlite.org/src/file?name=test/json105.test
(7) By Richard Hipp (drh) on 2022-01-18 22:50:33 in reply to 4.2 [link] [source]
It is an SQLite extension.
(10) By tom (younique) on 2022-05-16 23:59:02 in reply to 2 [link] [source]
Wouldn't "[-1]" (without the hash symbol) be sufficient? Corresponding documentation: "Use negative indexes to access array elements starting from the right".
(3.1) By Simon Willison (simonw) on 2022-01-18 21:55:30 edited from 3.0 in reply to 1 [link] [source]
I found some documentation of the equivalent syntax used by MySQL: https://dev.mysql.com/doc/refman/8.0/en/json.html#json-path-syntax
And then more documentation on an advanced form of that syntax here (with things like support for range expressions): https://dev.mysql.com/doc/refman/8.0/en/json.html#json-paths
It looks like the C function that implements path lookups in SQLite is static JsonNode *jsonLookup(
in https://www3.sqlite.org/src/file?name=ext/misc/json.c
(5.1) By Simon Willison (simonw) on 2022-01-18 22:10:09 edited from 5.0 in reply to 3.1 [link] [source]
From reading the C code it looks like both of these should have the same effect:
$.c
$."c"
Presumably the double quote variant is there to support things like the following:
$."key.with.dots.in.it"
It looks to me like there's no mechanism for accessing keys that themselves contain a double quote - so the following JSON:
{
"key\"with a double quote": 5
}
Could not have the 5 extracted using json_extract()
.
(6) By TripeHound on 2022-01-18 22:46:58 in reply to 5.1 [link] [source]
As you found, you can do:
sqlite> select json_extract( '{ "a.b":4 }', '$."a.b"' ) ;
4
to extract a value with a dot in its name. You can also do:
sqlite> select json_extract( '{ "a\"b":4 }', '$.a\"b' ) ;
4
to extract a value with a double-quote in its name. By "playing around" (i.e. I've not tried to read the docs) I've not been able to find a way of combining the two to extract a value with both a dot and a double-quote in its name. I'm also on an oldish version of the SQLite CLI (3.32) so things may have changed.
(8) By Simon Willison (simonw) on 2022-01-18 23:05:04 in reply to 1 [source]
Thanks to tips in this thread I was able to put together some detailed notes on the currently supported syntax here: https://til.simonwillison.net/sqlite/json-extract-path
(9.3) By Simon Willison (simonw) on 2022-01-19 04:51:19 edited from 9.2 in reply to 1 [link] [source]
I had somehow missed the section of the documentation that describes the syntax! It's this bit here: https://sqlite.org/json1.html#path_arguments
I always hop down to the documentation for the individual functions, such as this section for json_extract()
: https://sqlite.org/json1.html#the_json_extract_function - I think because those are linked to prominently at the top of the json1.html
documentation page.
Two suggestions then: first, it would be great if the documentation for each of the individual JSON functions that accept a path linked back to that definition of the PATH rules.
Secondly, I suggest adding a paragraph that explains how double quotes work - my attempt at that is the middle paragraphs here (otherwise quoting the existing description):
For functions that accept PATH arguments, that PATH must be well-formed or else the function will throw an error. A well-formed PATH is a text value that begins with exactly one '$' character followed by zero or more instances of ".objectlabel" or "[arrayindex]".
The object label can be wrapped in double quotes, which is required if the key that it is selecting contains period or square bracket characters - for example
$."key.with.periods".
If a key contains a double quote it can be accessed using a backslash escape, for example
$.key with \" double quote
.The arrayindex is usually a non-negative integer N. In that case, the array element selected is the N-th element of the array, starting with zero on the left. The arrayindex can also be of the form "#-N" in which case the element selected is the N-th from the right. The last element of the array is "#-1". Think of the "#" characters as the "number of elements in the array". Then the expression "#-1" evaluates is the integer that corresponds to the last entry in the array.
Feature suggestion for the path implementation itself: It would be neat if you could use escape sequences inside double quotes, so you could use the following:
$."this key.has a period \"and a double quote"