SQLite Forum

Help me understand the path syntax used by json_extract()
Login

Help me understand the path syntax used by json_extract()

(1) By Simon Willison (simonw) on 2022-01-18 21:43:41 [link] [source]

The path syntax for the json_extract() function is currently documented using examples on https://www.sqlite.org/json1.html#jex

json_extract('{"a":2,"c":[4,5,{"f":7}]}', '$') → '{"a":2,"c":[4,5,{"f":7}]}'
json_extract('{"a":2,"c":[4,5,{"f":7}]}', '$.c') → '[4,5,{"f":7}]'
json_extract('{"a":2,"c":[4,5,{"f":7}]}', '$.c[2]') → '{"f":7}'
json_extract('{"a":2,"c":[4,5,{"f":7}]}', '$.c[2].f') → 7
json_extract('{"a":2,"c":[4,5],"f":7}','$.c','$.a') → '[[4,5],2]'
json_extract('{"a":2,"c":[4,5],"f":7}','$.c[#-1]') → 5

I can follow these up to a point, but the $.c[#-1] one isn't clear to me.

Is there documentation for this syntax somewhere? Alternatively, if people can help me understand it (and maybe point me to some C code) I'd be happy to help contribute improvements to the existing documentation.

(2) By Richard Hipp (drh) on 2022-01-18 21:52:17 in reply to 1 [link] [source]

The "[#-1]" syntax means the last element of an array. "#" is the number of elements in the array and JSON arrays using 0-based indexing, so the last element will be "[#-1]".

(4.2) By Simon Willison (simonw) on 2022-01-18 22:00:07 edited from 4.1 in reply to 2 [link] [source]

Is that a SQLite-specific extension? I didn't spot that in the MySQL JSON path language.

Found the commit that added that feature in 2019 here: https://sqlite.org/src/info/35ed68a651f - and the tests are https://sqlite.org/src/file?name=test/json105.test

(7) By Richard Hipp (drh) on 2022-01-18 22:50:33 in reply to 4.2 [link] [source]

It is an SQLite extension.

(10) By tom (younique) on 2022-05-16 23:59:02 in reply to 2 [link] [source]

Wouldn't "[-1]" (without the hash symbol) be sufficient? Corresponding documentation: "Use negative indexes to access array elements starting from the right".

(3.1) By Simon Willison (simonw) on 2022-01-18 21:55:30 edited from 3.0 in reply to 1 [link] [source]

I found some documentation of the equivalent syntax used by MySQL: https://dev.mysql.com/doc/refman/8.0/en/json.html#json-path-syntax

And then more documentation on an advanced form of that syntax here (with things like support for range expressions): https://dev.mysql.com/doc/refman/8.0/en/json.html#json-paths

It looks like the C function that implements path lookups in SQLite is static JsonNode *jsonLookup( in https://www3.sqlite.org/src/file?name=ext/misc/json.c

(5.1) By Simon Willison (simonw) on 2022-01-18 22:10:09 edited from 5.0 in reply to 3.1 [link] [source]

From reading the C code it looks like both of these should have the same effect:

$.c
$."c"

Presumably the double quote variant is there to support things like the following:

$."key.with.dots.in.it"
It looks to me like there's no mechanism for accessing keys that themselves contain a double quote - so the following JSON:
{
  "key\"with a double quote": 5
}
Could not have the 5 extracted using json_extract().

(6) By TripeHound on 2022-01-18 22:46:58 in reply to 5.1 [link] [source]

As you found, you can do:

sqlite> select json_extract( '{ "a.b":4 }', '$."a.b"' ) ;
4

to extract a value with a dot in its name. You can also do:

sqlite> select json_extract( '{ "a\"b":4 }', '$.a\"b' ) ;
4

to extract a value with a double-quote in its name. By "playing around" (i.e. I've not tried to read the docs) I've not been able to find a way of combining the two to extract a value with both a dot and a double-quote in its name. I'm also on an oldish version of the SQLite CLI (3.32) so things may have changed.

(8) By Simon Willison (simonw) on 2022-01-18 23:05:04 in reply to 1 [source]

Thanks to tips in this thread I was able to put together some detailed notes on the currently supported syntax here: https://til.simonwillison.net/sqlite/json-extract-path

(9.3) By Simon Willison (simonw) on 2022-01-19 04:51:19 edited from 9.2 in reply to 1 [link] [source]

I had somehow missed the section of the documentation that describes the syntax! It's this bit here: https://sqlite.org/json1.html#path_arguments

I always hop down to the documentation for the individual functions, such as this section for json_extract(): https://sqlite.org/json1.html#the_json_extract_function - I think because those are linked to prominently at the top of the json1.html documentation page.

Two suggestions then: first, it would be great if the documentation for each of the individual JSON functions that accept a path linked back to that definition of the PATH rules.

Secondly, I suggest adding a paragraph that explains how double quotes work - my attempt at that is the middle paragraphs here (otherwise quoting the existing description):

For functions that accept PATH arguments, that PATH must be well-formed or else the function will throw an error. A well-formed PATH is a text value that begins with exactly one '$' character followed by zero or more instances of ".objectlabel" or "[arrayindex]".

The object label can be wrapped in double quotes, which is required if the key that it is selecting contains period or square bracket characters - for example $."key.with.periods".

If a key contains a double quote it can be accessed using a backslash escape, for example $.key with \" double quote.

The arrayindex is usually a non-negative integer N. In that case, the array element selected is the N-th element of the array, starting with zero on the left. The arrayindex can also be of the form "#-N" in which case the element selected is the N-th from the right. The last element of the array is "#-1". Think of the "#" characters as the "number of elements in the array". Then the expression "#-1" evaluates is the integer that corresponds to the last entry in the array.

Feature suggestion for the path implementation itself: It would be neat if you could use escape sequences inside double quotes, so you could use the following:

$."this key.has a period \"and a double quote"

(11.1) By drjdpowell on 2022-05-26 08:19:39 edited from 11.0 in reply to 9.3 [link] [source]

Just a comment, but I am the author of a JSON library for the language I use, and I attacked this problem by allowing my objectlables to be either simple text (no escaping anything) or the actual JSON String that is the label, complete with full use of all characters (\" [] \n etc.).

Example: $."My Name that contains "quotes", []brackets, and dots."

Gives the User full power if they need it, and they can use the JSON functions themselves to build the JSON Paths.