SQLite Forum

Help me understand the path syntax used by json_extract()
Login

Help me understand the path syntax used by json_extract()

(1) By Simon Willison (simonw) on 2022-01-18 21:43:41 [link] [source]

The path syntax for the json_extract() function is currently documented using examples on https://www.sqlite.org/json1.html#jex

json_extract('{"a":2,"c":[4,5,{"f":7}]}', '$') → '{"a":2,"c":[4,5,{"f":7}]}'
json_extract('{"a":2,"c":[4,5,{"f":7}]}', '$.c') → '[4,5,{"f":7}]'
json_extract('{"a":2,"c":[4,5,{"f":7}]}', '$.c[2]') → '{"f":7}'
json_extract('{"a":2,"c":[4,5,{"f":7}]}', '$.c[2].f') → 7
json_extract('{"a":2,"c":[4,5],"f":7}','$.c','$.a') → '[[4,5],2]'
json_extract('{"a":2,"c":[4,5],"f":7}','$.c[#-1]') → 5

I can follow these up to a point, but the $.c[#-1] one isn't clear to me.

Is there documentation for this syntax somewhere? Alternatively, if people can help me understand it (and maybe point me to some C code) I'd be happy to help contribute improvements to the existing documentation.

(2) By Richard Hipp (drh) on 2022-01-18 21:52:17 in reply to 1 [link] [source]

The "[#-1]" syntax means the last element of an array. "#" is the number of elements in the array and JSON arrays using 0-based indexing, so the last element will be "[#-1]".

(4.2) By Simon Willison (simonw) on 2022-01-18 22:00:07 edited from 4.1 in reply to 2 [link] [source]

Is that a SQLite-specific extension? I didn't spot that in the MySQL JSON path language.

Found the commit that added that feature in 2019 here: https://sqlite.org/src/info/35ed68a651f - and the tests are https://sqlite.org/src/file?name=test/json105.test

(7) By Richard Hipp (drh) on 2022-01-18 22:50:33 in reply to 4.2 [link] [source]

It is an SQLite extension.

(10) By tom (younique) on 2022-05-16 23:59:02 in reply to 2 [link] [source]

Wouldn't "[-1]" (without the hash symbol) be sufficient? Corresponding documentation: "Use negative indexes to access array elements starting from the right".

(3.1) By Simon Willison (simonw) on 2022-01-18 21:55:30 edited from 3.0 in reply to 1 [link] [source]

I found some documentation of the equivalent syntax used by MySQL: https://dev.mysql.com/doc/refman/8.0/en/json.html#json-path-syntax

And then more documentation on an advanced form of that syntax here (with things like support for range expressions): https://dev.mysql.com/doc/refman/8.0/en/json.html#json-paths

It looks like the C function that implements path lookups in SQLite is static JsonNode *jsonLookup( in https://www3.sqlite.org/src/file?name=ext/misc/json.c

(5.1) By Simon Willison (simonw) on 2022-01-18 22:10:09 edited from 5.0 in reply to 3.1 [source]

From reading the C code it looks like both of these should have the same effect:

$.c
$."c"

Presumably the double quote variant is there to support things like the following:

$."key.with.dots.in.it"
It looks to me like there's no mechanism for accessing keys that themselves contain a double quote - so the following JSON:
{
  "key\"with a double quote": 5
}
Could not have the 5 extracted using json_extract().

(6) By TripeHound on 2022-01-18 22:46:58 in reply to 5.1 [link] [source]

As you found, you can do:

sqlite> select json_extract( '{ "a.b":4 }', '$."a.b"' ) ;
4

to extract a value with a dot in its name. You can also do:

sqlite> select json_extract( '{ "a\"b":4 }', '$.a\"b' ) ;
4

to extract a value with a double-quote in its name. By "playing around" (i.e. I've not tried to read the docs) I've not been able to find a way of combining the two to extract a value with both a dot and a double-quote in its name. I'm also on an oldish version of the SQLite CLI (3.32) so things may have changed.

(8) By Simon Willison (simonw) on 2022-01-18 23:05:04 in reply to 1 [link] [source]

Thanks to tips in this thread I was able to put together some detailed notes on the currently supported syntax here: https://til.simonwillison.net/sqlite/json-extract-path

(9.3) By Simon Willison (simonw) on 2022-01-19 04:51:19 edited from 9.2 in reply to 1 [link] [source]

I had somehow missed the section of the documentation that describes the syntax! It's this bit here: https://sqlite.org/json1.html#path_arguments

I always hop down to the documentation for the individual functions, such as this section for json_extract(): https://sqlite.org/json1.html#the_json_extract_function - I think because those are linked to prominently at the top of the json1.html documentation page.

Two suggestions then: first, it would be great if the documentation for each of the individual JSON functions that accept a path linked back to that definition of the PATH rules.

Secondly, I suggest adding a paragraph that explains how double quotes work - my attempt at that is the middle paragraphs here (otherwise quoting the existing description):

For functions that accept PATH arguments, that PATH must be well-formed or else the function will throw an error. A well-formed PATH is a text value that begins with exactly one '$' character followed by zero or more instances of ".objectlabel" or "[arrayindex]".

The object label can be wrapped in double quotes, which is required if the key that it is selecting contains period or square bracket characters - for example $."key.with.periods".

If a key contains a double quote it can be accessed using a backslash escape, for example $.key with \" double quote.

The arrayindex is usually a non-negative integer N. In that case, the array element selected is the N-th element of the array, starting with zero on the left. The arrayindex can also be of the form "#-N" in which case the element selected is the N-th from the right. The last element of the array is "#-1". Think of the "#" characters as the "number of elements in the array". Then the expression "#-1" evaluates is the integer that corresponds to the last entry in the array.

Feature suggestion for the path implementation itself: It would be neat if you could use escape sequences inside double quotes, so you could use the following:

$."this key.has a period \"and a double quote"