SQLite Forum

Proposed JSON enhancements.

(1) By Richard Hipp (drh) on 2022-01-07 18:15:24 [link] [source]

I'm experimenting with enhancements to SQLite's JSON support. Prototype changes are on the json-enhancements branch. The branch includes a brief documentation sketch.

Tell me your thoughts. What am I doing wrong?

(2) By ddevienne on 2022-01-07 19:25:33 in reply to 1 [link] [source]

Hi Richard,

A few questions popped into my mind while reading the doc:

  • What does the n prefix mean? What's the mnemonic?
  • These new -> and ->> operators are JSON only? That's why you moved JSON into the Core?
  • Do the JSON functions use a subtype? Such that nested JSON functions can know whether the argument is already validated well-formed JSON?

About the new N functions, I don't think they are for me, 'cause I tend to be on the strict side of schemas,
and would have a CHECK constraint with json_valid(), if I expect a column to be JSON. But in the long
tradition of SQLite being flexibly typed, I can see where they'd simplify the SQL sometimes.

Regarding the syntax-sugar operators -> and ->>, why not. I'm surprised, though: they do not seem to fit
the SQLite motto of lite/light, but if JSON becomes part of the core, why not again. I've seen these operators in the PG docs,
and they do not aid the readability of the SQL IMHO, especially since there are tons of such operators in PG.

(3.1) By Richard Hipp (drh) on 2022-01-07 19:49:12 edited from 3.0 in reply to 2 [link] [source]

As far as the core SQLite is concerned, -> and ->> are new undefined operators (kind of like REGEXP) that the parser converts into two-argument SQL functions with the same name as the operator. To complete the magic, the JSON1 extension then defines new extension functions using sqlite3_create_function() named "->" and "->>".

Your application can overload these extension functions. You can define your own SQL functions named "->" and "->>" that do whatever you want. You can also write crazy SQL like this:

SELECT "->>"('[1,2,3,4]',2);

Which generates exactly the same AST as the more readable:

SELECT '[1,2,3,4]' ->> 2;

The "n" prefix means "no-errors". Or it means "null-if-error". Whatever works for you. I'm open to better name suggestions.
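To illustrate the overloading described above, here is a sketch using Python's sqlite3 bindings as a stand-in for sqlite3_create_function() in C. The arrow2 helper and its error handling are illustrative, not the extension's actual implementation:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")

def arrow2(js, key):
    """Hypothetical ->> stand-in: extract js[key], NULL on any error."""
    try:
        val = json.loads(js)[key]
    except (ValueError, KeyError, IndexError, TypeError):
        return None
    # primitives pass through; nested structures come back as JSON text
    if val is None or isinstance(val, (int, float, str)):
        return val
    return json.dumps(val)

# register under the operator's name, as the JSON1 extension does in C
con.create_function("->>", 2, arrow2)

# invoked through the quoted-identifier call syntax from the post
print(con.execute('SELECT "->>"(?, ?)', ("[1,2,3,4]", 2)).fetchone()[0])
```

Either spelling reaches the same registered function, which is the point of the parser converting the operator into a two-argument function call.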

(8) By AlexJ (CompuRoot) on 2022-01-07 21:35:41 in reply to 3.1 [link] [source]

In my opinion "null-if-error" describes n prefix better. It could be also "null-on-error" to match "On Error" concept in BASIC

(28) By anonymous on 2022-01-10 04:16:30 in reply to 3.1 [link] [source]

What is the thinking behind using separate function names (e.g. json_extract for e functions vs json_nextract for n functions), as opposed to an extra "null-on-error" flag argument (e.g. json_extract(j, nflag))?

Is there a particular space/time trade-off, or is the choice simply a result of adding an n function after the e function was written?

So while the utility of the e and n functions is pretty clear, the dual-name nature is less so (at least to me). I am not suggesting one or the other, but am simply curious about the thinking behind it.

(4) By Richard Hipp (drh) on 2022-01-07 19:48:42 in reply to 2 [link] [source]

Yes, the JSON extension uses subtypes. In fact, subtypes were created specifically to support the JSON functions.

(5) By Richard Hipp (drh) on 2022-01-07 20:00:20 in reply to 2 [link] [source]

they do not aid readability of the SQL IMHO

I've never used the operators in MySQL or PG. I've only read about them.

My guess that the new operators would be helpful comes from my experience working on Fossil. The Fossil repository does not use a lot of JSON, but it does use some. (It might have used more if JSON were easier to work with.) Here is a typical Fossil query that involves JSON (taken from here):

     SELECT json_extract(value,'$.base'),
            json_extract(value,'$.hash'),
            json_extract(value,'$.wiki')
       FROM config WHERE name=lower($name)

To me, this seems easier to write and understand as follows:

     SELECT value->'base', value->'hash', value->'wiki'
       FROM config WHERE name=lower($name)

Do you disagree?

(6) By AlexJ (CompuRoot) on 2022-01-07 20:58:32 in reply to 5 [link] [source]

First of all - thank you for this !

To me, this seems easier to write and understand as follows:

To me it looks like a very nice feature: the semantics match the most advanced open-source database, PostgreSQL, and it is easy to understand.

I also wish that one day PostgreSQL's hstore would land in SQLite to handle key/values directly; this -> would bring one more compatibility bridge between the two.

(7) By Harald Hanche-Olsen (hanche) on 2022-01-07 21:35:23 in reply to 5 [link] [source]

I have used the -> and ->> operators in PG quite a bit, and find them extremely useful. However, the semantics are a bit different in PG: -> returns a json (or jsonb) object, while ->> returns text. Both require a json (or jsonb) object on the left, so errors due to malformed json can't happen. This difference could be confusing to some.

(9.2) By Richard Hipp (drh) on 2022-01-07 22:20:22 edited from 9.1 in reply to 7 [link] [source]

errors due to malformed json can't happen

Here is a counter example that I just ran on PG 9.6 over at sqlfiddle:

SELECT '{not valid json}'::json -> 'a';

I am aware that -> and ->> are different in PG. If you stick to using ->>, then the SQL should be portable, I think. I understand that in PG (and in MySQL too) the -> operator returns JSON and the ->> operator returns TEXT. It's the same in SQLite, really. It is just that in SQLite, JSON is indistinguishable from TEXT and so both operators end up working the same.

I did make SQLite's ->> operator use json_extract(), not json_nextract(), so that it had error semantics that are more similar to PG - specifically it has zero tolerance for ill-formed JSON.

Regarding the difference between -> and ->> in PG, I see how ->> can be useful, but I'm having a hard time coming up with examples for when -> would be useful. Can you enlighten me?

(10) By AlexJ (CompuRoot) on 2022-01-08 00:49:18 in reply to 9.2 [link] [source]

AFAIK, the most important difference between those two arrows in PG (given that SQLite doesn't know about JSONB) is that -> returns a double-quoted string, while ->> returns a plain unquoted string. That might be useful when using the result of a query in some languages, or when dynamically generating scripts.

The first case that comes to mind is Microsoft's cmd, which treats double quotes differently: in an echo statement it will print the double quotes, but on assignment cmd consumes them as part of the string. For example, if one needs to set a variable containing some special characters, like


SET "var=Bob & Anna"

then the assignment must be double quoted.

Another example is unix's shell:


#!/bin/sh

a='"a b
c"'

b='a b
c'

eval echo "$a"
echo -------
eval echo "$b"

exit

where variable a will be displayed, while eval echo "$b" will choke and throw an error.
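In SQLite terms, the quoted/unquoted distinction can be reproduced with json_quote(). A sketch run through Python's sqlite3 module (assuming a build with the JSON functions enabled; the sample values are mine):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# json_extract returns the bare SQL text, without JSON string quoting
bare = con.execute("""SELECT json_extract('{"a":"hi"}', '$.a')""").fetchone()[0]
print(bare)  # hi

# json_quote puts the JSON quoting back, resembling PG's ->-style result
quoted = con.execute(
    """SELECT json_quote(json_extract('{"a":"hi"}', '$.a'))"""
).fetchone()[0]
print(quoted)  # "hi"
```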

(13) By Harald Hanche-Olsen (hanche) on 2022-01-08 08:34:58 in reply to 9.2 [link] [source]

errors due to malformed json can't happen

Here is a counter example that I just ran on PG 9.6 over at sqlfiddle:

SELECT '{not valid json}'::json -> 'a';

But here it is the typecast to json that throws the error, not the -> operator. I.e.,

SELECT '{not valid json}'::json;

throws the same error (on PG 14 – for some reason, your sqlfiddle link doesn't open for me).

But I'm nitpicking. Personally, I have no problem with the somewhat different semantics anyhow. But I thought the issue worth mentioning, in case someone thinks it's a big deal.

(14.1) By Harald Hanche-Olsen (hanche) on 2022-01-08 16:07:58 edited from 14.0 in reply to 13 [link] [source]

Now that I thought about it some more, what about choosing a slightly different pair of operators? By all means, pick -> for one, but what about something like ->- or => or ~> for the other? That way, PG (and MySQL?) users will suffer less mental friction when moving between databases. (I, for one, use both regularly, for wildly different purposes, but still. Oh, and I think the tilde and minus signs are easily distinguished in any programming font; less so on this forum.)

(15) By Richard Hipp (drh) on 2022-01-08 16:44:15 in reply to 14.1 [link] [source]

Couple of points:

  1. PG and MySQL appear to be fundamentally incompatible in their JSON functions. You cannot (as far as I can tell) write SQL using JSON that works on both PG and MySQL.

    My objective is to make it so you can write SQL that works on both SQLite and PG or that works on both SQLite and MySQL. But in as much as PG and MySQL are incompatible, it is not possible to write SQL that will work on all three.

  2. The ->> operator is the one that works the same on SQLite as it does on PG and MySQL. You could perhaps make the argument that I should use something other than -> for compatibility. But that argument breaks down when you try to apply it to ->>, because ->> is compatible between PG/MySQL and SQLite. By "compatible" I mean that ->> uses in PG or MySQL should also work in SQLite. It is not true the other way around. Because SQLite's ->> is a union of the capabilities of ->> in PG and MySQL, you can write ->> expressions that work in SQLite but not in PG or not in MySQL.

    If you do try to make the argument that I should choose a different operator other than -> in SQLite, I will push back as follows:

    1. There are no two RDBMSes for which -> works the same way today, so compatibility is not really a factor here, and
    2. The SQLite -> operator, though slightly different from PG and MySQL, is close enough to being the same that many use cases would not notice the difference, and
    3. The -> operator is familiar and convenient to programmers with prior exposure to C/C++.

(18) By Harald Hanche-Olsen (hanche) on 2022-01-08 21:10:23 in reply to 15 [link] [source]

Okay, good points there. Thanks for taking the time to explain.

My experience with MySQL is so limited, you might call it non-existent.

(31) By ddevienne on 2022-01-10 09:24:32 in reply to 5 [link] [source]

To me, this seems easier to write and understand

Yes, I do agree, once you know about ->, and know the value is JSON,
then it both makes sense, and it's easy to write. So no, I do not disagree.
I'm just unsure about discoverability and least surprise, for a reader who didn't write the SQL (i.e. me, after a little while...)

OTOH, I think this below, although verbose, is rather readable, no?

select dict.key, entt.key, field.key,
       json_extract(field.value, '$.DESC'),
       json_extract(field.value, '$.DATA_TYPE'),
       json_extract(field.value, '$.DEFAULT_VALUE'),
       json_extract(field.value, '$.DISCRIMINANT'),
       json_extract(field.value, '$.DEFAULT_QUERY'),
       json_extract(field.value, '$.FIXED_VALUE'),
       json_extract(field.value, '$.READ_ONLY'),
       (select case
          when json_extract(field.value, '$.DB_NULLABLE') is not null then null
          else ifnull(
            json_extract(field.value, '$.REQUIRED'),
            json_extract(field.value, '$.DB_NOT_NULL')
          )
          end
       ),
       json_extract(field.value, '$.CHOICE_VALUE'),
       json_extract(field.value, '$.UNIQUE'),
...
  from pg,
       json_each(pg.js, '$') dict,
       json_each(dict.value, '$.entities') entt,
       json_each(entt.value, '$.fields') field

When I see the above, I wonder how many times we are reparsing the JSON.
Those JSON queries are expensive; it's a largish JSON file (roughly 250 entities, 4'000 fields).
So from my POV, I'd prefer a way to traverse the JSON more directly, and a single time,
to extract the tuples that match a given path. For example, in XPATH, I'd select ./entities/fields
and be able to match on the resulting nodes, to extract their value, if this was XML. Can this be done here?

I also have plenty of queries like this one (and the one above):

select attr.key "name",
       count(distinct entt.key) as "#entity use"
  from pg,
       json_each(pg.js, '$') dict,
       json_each(dict.value, '$.entities') entt,
       json_each(entt.value, '$') attr
 group by attr.key
 order by attr.key

So why only -> for json_extract and not json_each?
Why can't I even chain the -> to implicitly do the joins?
I think an Apple framework did implicit join traversal via a dot or arrow
notation, my ex-boss showed it to me once in an airport.

So could/should -> also work in table-valued contexts and mean json_each?
And json_each support multi-join traversal implicitly?

I guess I'm saying I'd rather have more speed and ease of use around JSON, not so much syntax sugar.

As usual I probably sound too critical, that's not my intention. I'm more thinking aloud here...
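For what it's worth, a single json_tree call can already pick out nested keys in one pass over the document, though it doesn't address the reparsing/lifetime question. A sketch with made-up sample data, run through Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# a miniature stand-in for the dict/entities/fields hierarchy described above
js = """{
  "dict1": {"entities": {
     "E1": {"fields": {"F1": {"DESC": "d1"}, "F2": {"DESC": "d2"}}},
     "E2": {"fields": {"F3": {"DESC": "d3"}}}
  }}
}"""

# json_tree walks the whole document once; filter on the key column
rows = con.execute(
    "SELECT fullkey, value FROM json_tree(?) WHERE key = 'DESC'", (js,)
).fetchall()
for fullkey, value in rows:
    print(fullkey, value)  # e.g. $.dict1.entities.E1.fields.F1.DESC d1
```

This still parses the document internally, of course, but it is one parse per call rather than one per nesting level of json_each joins.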

(33) By ddevienne on 2022-01-10 11:13:45 in reply to 31 [link] [source]

I'd rather have more speed

Maybe this wasn't clear. The query I showed has 20 leaf json_extract calls.
It has 3 nested json_each joins, for traversal of the JSON hierarchy,
to reach each individual field of all entities:

select ...,
       json_extract(field.value, '$.DESC'),
...
  from pg,
       json_each(pg.js, '$') dict,
       json_each(dict.value, '$.entities') entt,
       json_each(entt.value, '$.fields') field

I didn't exactly follow what's going on in the debugger, but here's my understanding of what's going on in this query:

  • SCAN the pg table.
  • for each row (I have 4 of them), parse the js column, which is in JSON format.
  • the first json_each(, '$') fully parses that JSON value,
    and produces a text value for the top-level objects
    (I just realize this one might not be necessary :)) At this point, we've parsed the JSON 1x I believe,
    unless there's an optimization for '$' which might be the identity.
  • the next json_each(, '$.entities') parses dict.value, and produces a distinct row for each value of the entities array.
    So we've basically reparsed almost the entire document another time (2x).
  • then the same happens for the fields array inside each entity.
    We're reparsing most of the JSON another time (3x).
  • and finally we extract of all values of interest from the leaf objects;
    using json_extract. I suspect each extract is a separate partial parse of that object,
    until it finds the key of interest. Assuming 1st key needs only parsing 1/20th of the object,
    last (20th) key needs to parse 100% of the object.
    So 1/20 + 2/20 + ... + 20/20 = 20*21/2/20 = 10.5 times the JSON value.

So if my count is correct, that query parses each of the (4) top-level JSON docs 13.5 times.
That's what I mean when I wish for more speed: only a single parse should be necessary.

So unless SQLite added the possibility to return views into input strings (or blobs)
when it can guarantee the lifetime of the value it returns a view from
(like in the case above I suspect, since SQLite only supports nested loops, and the loop-order is fixed in the query above),
as good/fast as the SQLite JSON parser is, doing the parsing an order of magnitude too many times is not ideal.

That's why I'd hope for changes in SQLite that would allow JSON to be faster in SQLite.
I'd prefer JSON support to stay in the JSON1 extension, until the above behavior can be avoided.
That's assuming my analysis is correct. I'd be happy to be told I'm not looking at this correctly.

(32) By anonymous on 2022-01-10 10:01:06 in reply to 5 [link] [source]

Or, to make it more self-explanatory for non-experts and newbies:

SELECT jsonval->'base', jsonval->'hash', jsonval->'wiki' FROM config WHERE name=lower($name)

(11) By anonymous on 2022-01-08 01:29:42 in reply to 1 [link] [source]

Thanks for your great work. How about adding a new "json" type in the new strict table mode? The parser would translate the column definition into a JSON validity check. For example:

create table t(data json) strict;

would transform into:

create table t(data json check(json_valid(data)=1)) strict;
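The check-constraint half of this suggestion already works today. A sketch via Python's sqlite3 module, using a plain (non-strict) table since STRICT requires SQLite 3.37+:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# what the suggested "json" column type could desugar to
con.execute("CREATE TABLE t(data TEXT CHECK (json_valid(data)))")

con.execute("""INSERT INTO t VALUES ('{"a": 1}')""")  # well-formed: accepted
try:
    con.execute("INSERT INTO t VALUES ('not json')")  # ill-formed: rejected
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```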

(12) By SeverKetor on 2022-01-08 04:27:42 in reply to 1 [link] [source]

Seems interesting to me. I'd use the arrow operators, though the n-functions wouldn't be useful for my (currently existing) code.

On the topic of JSON enhancements, I still think there's a place for a json_contains() function that checks if a JSON array or object has a specified value. This was useful for me. On that note, I ended up making it myself in my extension, so if someone happens to want it, you can use my amateur-ish code (though it does #include json1.c which bloats the file a bit):

static void json_contains_func(
  sqlite3_context *ctx,
  int argc,
  sqlite3_value **argv
){
  /*
  Check JSON for a specified value

  Checks a JSON string for a specific value. Works on both JSON objects and
  JSON arrays. JSON object keys are assumed to be strings. Limitations in
  SQLite mean true and false values cannot be checked for in arrays.

  Args:
    JSON String: The JSON string
    Any Value: The value to check for
    Text Path: An optional path to use when checking for the value

  Returns:
    Integer: 1 or 0 if the value is found or not
  */
  JsonParse *p;
  int b = 0;
  u32 i;
  JsonNode *pNode;
  char* zPath;

  p = jsonParseCached(ctx, argv, ctx);
  if( p==0 ) return;
  assert( p->nNode );
  if( argc==3 ){
    zPath = (char*)sqlite3_value_text(argv[2]);
    pNode = jsonLookup(p, zPath, 0, ctx);
  }else{
    zPath = "$";
    pNode = p->aNode;
  }

  if( pNode==0 ){
    return;
  }

  if( pNode->eType==JSON_ARRAY ){
    assert( (pNode->jnFlags & JNODE_APPEND)==0 );
    JsonNode *iNode;
    int arg_type;

    arg_type = sqlite3_value_type(argv[1]);

    for(i=1; i<=pNode->n; i++){
      iNode = &pNode[i];
      switch (iNode->eType) {
        case JSON_TRUE:
        case JSON_FALSE:
        case JSON_ARRAY:
        case JSON_OBJECT:
          break;
        case JSON_NULL:
          if ( arg_type==SQLITE_NULL ) {
            b = 1;
            goto value_found;
          }
          break;
        case JSON_STRING:
          if (arg_type != SQLITE_TEXT) break;
          char* value;
          value = sqlite3_malloc((iNode->n)-1);
          if ( value==0 ){
            sqlite3_result_error_nomem(ctx);
            return;
          }
          strncpy(value, (char*)(iNode->u.zJContent+1), (iNode->n)-1);
          value[(iNode->n)-2] = '\0';
          if ( !strcmp((char*)sqlite3_value_text(argv[1]), value) ) {
            b = 1;
            sqlite3_free(value);
            goto value_found;
          }
          sqlite3_free(value);
          break;
        case JSON_INT:
          if (arg_type != SQLITE_INTEGER) break;
          sqlite3_int64 i = 0;
          const char *z = iNode->u.zJContent;
          if( z[0]=='-' ){ z++; }
          while( z[0]>='0' && z[0]<='9' ){
            unsigned v = *(z++) - '0';
            if( i>=LARGEST_INT64/10 ){
              if( i>LARGEST_INT64/10 ) goto end;
              if( z[0]>='0' && z[0]<='9' ) goto end;
              if( v==9 ) goto end;
              if( v==8 ){
                if( pNode->u.zJContent[0]=='-' ){
                  goto int_done;
                }else{
                  goto end;
                }
              }
            }
            i = i*10 + v;
          }
          if( iNode->u.zJContent[0]=='-' ){ i = -i; }
          int_done:
          if ( sqlite3_value_int64(argv[1])==i ) {
            b = 1;
            goto value_found;
          }
          end:
          break;
        case JSON_REAL:
          if (arg_type != SQLITE_FLOAT) break;
          double d = strtod(iNode->u.zJContent, 0);
          if ( (double)sqlite3_value_double(argv[1])==d ) {
            b = 1;
            goto value_found;
          }
      }
    }
  } else if ( pNode->eType==JSON_OBJECT ){
    JsonNode *kp;
    char *kPath;
    char *arg_str;
    char *arg;

    arg = (char*)sqlite3_value_text(argv[1]);

    if ( strlen(arg)>2 && arg[0]=='"' && arg[strlen(arg)-1]=='"' ) {
      arg_str = sqlite3_malloc(strlen(arg)-1);
      if ( arg_str==0 ){
        sqlite3_result_error_nomem(ctx);
        return;
      }
      strncpy(arg_str, arg+1, strlen(arg)-2);
      arg_str[strlen(arg)-2] = '\0';  /* strncpy does not NUL-terminate here */
    } else {
      arg_str = sqlite3_malloc(strlen(arg)+1);
      if ( arg_str==0 ){
        sqlite3_result_error_nomem(ctx);
        return;
      }
      strcpy(arg_str, arg);
    }

    kPath = sqlite3_malloc(strlen(zPath)+1+strlen(arg_str)+1);
    if ( kPath==0 ){
      sqlite3_result_error_nomem(ctx);
      return;
    }
    strcpy(kPath, zPath);
    strcat(kPath, ".");
    strcat(kPath, arg_str);

    kp = jsonLookup(p, kPath, NULL, ctx);
    if ( kp != 0 ) {
      b = 1;
    }
    sqlite3_free(kPath);
    sqlite3_free(arg_str);
  }
  value_found:
  sqlite3_result_int(ctx, b);
}
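For readers who want the behavior without compiling against json1.c, a rough application-level equivalent can be registered from Python. This hypothetical version handles only the top-level '$' path, unlike the C code above:

```python
import json
import sqlite3

def json_contains(js, value):
    """Return 1 if a JSON array contains value, or a JSON object has value
    as a key; 0 otherwise; NULL (None) on malformed JSON. Top level only."""
    try:
        node = json.loads(js)
    except ValueError:
        return None
    if isinstance(node, (list, dict)):
        return int(value in node)  # 'in' on a dict checks its keys
    return 0

con = sqlite3.connect(":memory:")
con.create_function("json_contains", 2, json_contains)

print(con.execute("SELECT json_contains('[1,2,3]', 2)").fetchone()[0])        # 1
print(con.execute("""SELECT json_contains('{"a":1}', 'a')""").fetchone()[0])  # 1
```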

(16) By Marco Bubke (marcob) on 2022-01-08 18:47:26 in reply to 1 [link] [source]

I like the shortcuts. I use JSON a lot for saving some meta data.

So something like

SELECT key, value FROM json_type('{"key":"foo","value":1}')

would be nice.

Combined with a walk of fixed depth, it could be a very useful pattern for my code. I know I should normalize it into tables, but I very often receive JSON, and these values are not performance critical.

(17) By MBL (UserMBL) on 2022-01-08 18:59:19 in reply to 1 [link] [source]

Will there be a virtual table / table-valued function which would allow the input of one big JSON text block and return a table of repeated rows from it?

I got inspired to ask this when coming into contact with Python, the pandas module, and pickles via the Python nodes in the KNIME Analytics Platform. A pickle is a kind of blob of a serialized data stream. Isn't a database, or just a table or view out of it, something similar?

It would be great to be able to use JSON text as a kind of pickle to carry configuration and parameter information, and in one pass even a whole table's content. Such JSON text could carry more than only a table, in a human-readable and streamable format. I suggest enabling the CLI to accept JSON as an import format.

I would appreciate a table-valued function (similar to json_each/json_tree) that is able to produce a whole data-table output, while the JSON text itself could also carry additional info, even the whole schema. JSON could even become an import type at the CLI. Let me show you my idea with an example:

sqlite> .mode box
sqlite> select * from concerts;
┌─────┬──────────────┬─────────────┐
│ id  │    artist    │ concertDate │
├─────┼──────────────┼─────────────┤
│ 100 │ FOO FIGHTERS │ 2019-03-22  │
│ 101 │ JOE COCKER   │ 2007-03-22  │
│ 102 │ ELO          │ 1998-04-18  │
│ 103 │ YES          │ 2003-11-02  │
└─────┴──────────────┴─────────────┘
sqlite> .schema
CREATE TABLE concerts (
  id integer not null,
  artist text not null,
  concertDate text not null
);
sqlite> .mode json
sqlite> select * from concerts;
[{"id":100,"artist":"FOO FIGHTERS","concertDate":"2019-03-22"},
{"id":101,"artist":"JOE COCKER","concertDate":"2007-03-22"},
{"id":102,"artist":"ELO","concertDate":"1998-04-18"},
{"id":103,"artist":"YES","concertDate":"2003-11-02"}]

The accepted input could then, for example, follow the JSON definition below; note especially the JSON path "$.DATAPACKET[1].DATA":

{ "DATAPACKET" : [ { "SCHEMA-VERSION":"3.38.0" }
      , { "TABLENAME":"concerts"
        , "FIELDS":[ { "NAME":"id"          , "DATATYPE":"integer", "CONSTRAINT":["not null","primary key"] }
                   , { "NAME":"artist"      , "DATATYPE":"text"   , "CONSTRAINT":["not null"] }
                   , { "NAME":"concertDate" , "DATATYPE":"text"   , "CONSTRAINT":["not null"] }
                   ]
        , "DATA":[ {"id":100, "artist":"FOO FIGHTERS","concertDate":"2019-03-22"}
                 , {"id":101, "artist":"JOE COCKER",  "concertDate":"2007-03-22"}
                 , {"id":102, "artist":"ELO",         "concertDate":"1998-04-18"}
                 , {"id":103, "artist":"YES",         "concertDate":"2003-11-02"}
                 ]
        }
  ]
}
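Pending such a built-in feature, the proposed DATAPACKET layout can already be loaded with a few lines of application code. A sketch (the format and its field names are the hypothetical ones from the example above, trimmed to two rows):

```python
import json
import sqlite3

packet = json.loads("""{
  "DATAPACKET": [
    {"SCHEMA-VERSION": "3.38.0"},
    {"TABLENAME": "concerts",
     "FIELDS": [{"NAME": "id", "DATATYPE": "integer"},
                {"NAME": "artist", "DATATYPE": "text"},
                {"NAME": "concertDate", "DATATYPE": "text"}],
     "DATA": [{"id": 100, "artist": "FOO FIGHTERS", "concertDate": "2019-03-22"},
              {"id": 101, "artist": "JOE COCKER", "concertDate": "2007-03-22"}]}
  ]
}""")

con = sqlite3.connect(":memory:")
tbl = packet["DATAPACKET"][1]                      # the table description
names = [f["NAME"] for f in tbl["FIELDS"]]
cols = ", ".join(f"{f['NAME']} {f['DATATYPE']}" for f in tbl["FIELDS"])
con.execute(f"CREATE TABLE {tbl['TABLENAME']} ({cols})")
con.executemany(
    f"INSERT INTO {tbl['TABLENAME']} VALUES ({', '.join('?' * len(names))})",
    [tuple(row[n] for n in names) for row in tbl["DATA"]],
)
print(con.execute("SELECT count(*) FROM concerts").fetchone()[0])  # 2
```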

(19) By anonymous on 2022-01-08 21:19:23 in reply to 1 [link] [source]

I think the json_ntype and json_nextract functions are good. The subtype codes used for JSON should be documented, though.

JSON structures can sometimes be helpful inside another record (especially when you want to interact with other programs that use JSON formats), but there are problems involving JSON too: JSON does not use 64-bit integers, it uses Unicode, and it cannot store blobs effectively (except by encoding them as hex, base64, etc.). Not all data is text (and even that which is, isn't necessarily Unicode text).

(20) By anonymous on 2022-01-09 03:41:09 in reply to 19 [link] [source]

Additionally, I should think that JSON functions should not be included by default; they should be available as an option in the amalgamation and as a file that can be compiled separately if desired. (However, the -> and ->> syntax should be always included in the core SQLite, so that the extension will work.)

Another thing to note perhaps is that keys in JSON can be arbitrary strings, so mentioning them fully and unambiguously in the path can be helpful maybe (although the json_each virtual table can also be used for this purpose, I suppose).

(21) By Stephan Beal (stephan) on 2022-01-09 08:46:51 in reply to 20 [link] [source]

Additionally, I should think that JSON functions should not be included by default; they should be available as an option in the amalgamation and as a file that can be compiled separately if desired. (However, the -> and ->> syntax should be always included in the core SQLite, so that the extension will work.)

You've just brought up a good argument against the new syntax, perhaps inadvertently: since the new syntax can be overridden with user-defined functions, users moving from one sqlite3 app to another might have misconceptions about what those operators actually do. Much has been said about compatibility with PG and MySQL, and yet allowing the new syntax to be bound to UDFs means that the new syntax isn't even necessarily compatible between two different sqlite3 connections.

(22) By Richard Hipp (drh) on 2022-01-09 11:19:42 in reply to 20 [link] [source]

I should think that JSON functions should not be included by default

Why not? Why should JSON support be default-off rather than default-on?

(24) By anonymous on 2022-01-09 20:46:51 in reply to 22 [link] [source]

Why not? Why should JSON support be default-off rather than default-on?

Most programs will not need them. (However, they should be enabled by default in the command-shell.)

(Exception: If you think the amount of added stuff and the memory/disk-space/compile-time it takes up is small enough (I am not sure that it is), then maybe it is OK for it to be enabled by default; that would be easier to work with, maybe.)

(26) By Richard Hipp (drh) on 2022-01-09 20:54:58 in reply to 24 [link] [source]

The extra code space for the JSON functions is about 20KB (x64, gcc -Os). That amounts to about a 3.3% size increase. You get much, much bigger changes in binary size by adjusting compiler optimization settings.

(23) By AlexJ (CompuRoot) on 2022-01-09 18:21:05 in reply to 20 [link] [source]

I should think that JSON functions should not be included by default

Could you please elaborate on this?

IMHO, JSON should be "on" by default as it is in PostgreSQL & MySQL.

(25) By Richard Hipp (drh) on 2022-01-09 20:52:16 in reply to 1 [link] [source]

The description of the proposed JSON enhancements has been updated for clarity, based on comments received so far. Please re-read and offer any critique.

(27) By AlexJ (CompuRoot) on 2022-01-10 01:22:06 in reply to 25 [link] [source]

Typo in the section 3.1:

... that wworks in SQLite will work for MySQL.

should be:

... that works in SQLite will work for MySQL.

(29) By bokwoon on 2022-01-10 09:07:12 in reply to 25 [link] [source]

I think it is safer to mention that the -> operator is incompatible with MySQL's and Postgres' version of ->. In MySQL and Postgres, the only reason why you would use -> is to get a JSON quoted string, which no SQLite json function supports. Instead, it is only when using the ->> operator that the behaviour between SQLite and MySQL, or SQLite and Postgres, is semantically identical.

# Postgres and SQLite

-- for valid inputs, '->' is not compatible
postgres> SELECT '{"a":{"b":{"c":"value"}}}'::JSON->'a'->'b'->'c';
"value"
sqlite> SELECT '{"a":{"b":{"c":"value"}}}'->'a'->'b'->'c';
value

-- for valid inputs, '->>' is compatible
postgres> SELECT '{"a":{"b":{"c":"value"}}}'::JSON->'a'->'b'->>'c';
value
sqlite> SELECT '{"a":{"b":{"c":"value"}}}'->'a'->'b'->>'c';
value

-- for invalid inputs, '->' is not compatible
postgres> SELECT '{"a":{"b":{"c":"value'::JSON->'a'->'b'->'c';
ERROR:  invalid input syntax for type json
sqlite> SELECT '{"a":{"b":{"c":"value'->'a'->'b'->'c'; -- json_nextract cascades NULL all the way down the chain
NULL

-- for invalid inputs, '->>' is compatible
postgres> SELECT '{"a":{"b":{"c":"value'::JSON->'a'->'b'->>'c';
ERROR:  invalid input syntax for type json
sqlite> SELECT '{"a":{"b":{"c":"value'->'a'->'b'->>'c'; -- json_nextract cascades NULL, but gets caught by the json_extract at the end of the chain
malformed JSON
# MySQL and SQLite
# I'm using json_extract() and json_unquote(json_extract()) as aliases for -> and ->> because no db fiddle website I found seems to support that syntax for MySQL

-- for valid inputs, '->' is not compatible
mysql> SELECT json_extract('{"a":{"b":{"c":"value"}}}', '$.a.b.c');
"value"
sqlite> SELECT '{"a":{"b":{"c":"value"}}}'->'$.a.b.c';
value

-- for valid inputs, '->>' is compatible
mysql> SELECT json_unquote(json_extract('{"a":{"b":{"c":"value"}}}', '$.a.b.c'));
value
sqlite> SELECT '{"a":{"b":{"c":"value"}}}'->>'$.a.b.c';
value

-- for invalid inputs, '->' is not compatible
mysql> SELECT json_extract('{"a":{"b":{"c":"value', '$.a.b.c');
Error: ER_INVALID_JSON_TEXT_IN_PARAM
sqlite> SELECT '{"a":{"b":{"c":"value'->'$.a.b.c'; -- json_nextract cascades NULL all the way down the chain
NULL

-- for invalid inputs, '->>' is compatible
mysql> SELECT json_unquote(json_extract('{"a":{"b":{"c":"value', '$.a.b.c'));
Error: ER_INVALID_JSON_TEXT_IN_PARAM
sqlite> SELECT '{"a":{"b":{"c":"value'->>'$.a.b.c'; -- json_extract catches the error
malformed JSON

(30.3) By bokwoon on 2022-01-10 10:42:04 edited from 30.2 in reply to 29 [source]

Oh, my error example for Postgres/SQLite is wrong: feeding NULL from json_nextract into json_extract should also return NULL. So in order for SQLite to behave similarly to Postgres for invalid inputs, you must use ->> all the way, i.e.

->>'a'->>'b'->>'c' (not valid in postgres)

instead of

->'a'->'b'->>'c'

(34) By bokwoon on 2022-01-10 11:33:08 in reply to 30.3 [link] [source]

Personally, I think that Postgres' -> and ->> syntax was a mistake; it confuses too many people, because ->'a'->'b'->>'c' is unnatural compared to .a.b.c. In that sense, SQLite's and MySQL's choice of json_extract was far more pragmatic, because a JSON path resembles something that people already know. Postgres 12 added proper JSON path support, and Postgres 14 added new JSON subscripting operators to make JSON access even more familiar (you can see people complaining about the old Postgres -> and ->> syntax in these threads: https://news.ycombinator.com/item?id=27247621 https://www.reddit.com/r/PostgreSQL/comments/nq0p1k/better_json_in_postgres_with_postgresql_14/).

The biggest benefit to adding the new -> and ->> operators is that existing Postgres or MySQL JSON queries will just work without modification in SQLite, but that is not always the case due to the semantic mismatch between ->/->> in MySQL/Postgres and json_nextract/json_extract in SQLite. What about decoupling -> and ->> from json_nextract and json_extract and directly emulating Postgres and MySQL behaviour? Something like

sqlite> SELECT '{"a":{"b":{"c":"value"}}}'->'a'->'b'->'c';
"value"

sqlite> SELECT '{"a":{"b":{"c":"value"}}}'->'a'->'b'->>'c';
value

sqlite> SELECT '{"a":{"b":{"c":"value"}}}'->'$.a.b.c';
"value"

sqlite> SELECT '{"a":{"b":{"c":"value"}}}'->>'$.a.b.c';
value

sqlite> SELECT json_extract('{"a":{"b":{"c":"value"}}}', '$.a.b.c');
value

sqlite> SELECT json_extract('{"a":{"b":{"c":"value', '$.a.b.c');
malformed JSON

sqlite> SELECT json_nextract('{"a":{"b":{"c":"value', '$.a.b.c');
NULL

(35) By Richard Hipp (drh) on 2022-01-10 12:08:01 in reply to 1 [link] [source]

Having explored this question further, I now think I better understand why MySQL and PG have separate -> and ->> operators. It is because of their rigid type system, that requires each function to always return values of a specific type. Hence you have -> for returning JSON values and ->> for returning TEXT values. SQLite is not constrained in this way, and so, in theory, SQLite can have a single operator -> that returns either JSON or TEXT (or INTEGER or REAL) as appropriate for the JSON and path expression. This is a simpler and more intuitive approach, but is not possible in MySQL and PG due to the limitations of their type system.
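The flexible return typing described above can already be observed with json_extract() (a sketch using Python's bundled sqlite3 module):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# json_extract() already returns whichever SQL type fits the extracted
# JSON value -- something a rigidly typed engine cannot do.
assert con.execute("""SELECT json_extract('{"a":123}',  '$.a')""").fetchone()[0] == 123
assert con.execute("""SELECT json_extract('{"a":"hi"}', '$.a')""").fetchone()[0] == "hi"
assert con.execute("""SELECT json_extract('{"a":1.25}', '$.a')""").fetchone()[0] == 1.25
```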

Hence, for the sake of compatibility, I'm proposing to completely revamp my SQLite JSON improvement proposal as follows:

  • Omit the json_nextract() and json_ntype() functions, as the same thing can be accomplished using CASE and the need for these does not seem to come up as often as I anticipated.

  • Merge the JSON functions into the SQLite core. This is mostly transparent to applications. It just means the JSON functions are included by default, and they use a little less memory.

  • Add a -> operator that works kind of like json_extract() but that always returns JSON. Thus -> is compatible with MySQL and PG.

  • Add a ->> operator that always returns a primitive SQL type - TEXT, INTEGER, REAL, or NULL - depending on the input JSON and the path expression. This behavior is slightly different from MySQL and PG in that in those others, ->> always returns either TEXT or a NULL, and never INTEGER or REAL. But I haven't been able to come up with a situation where returning INTEGER or REAL when the underlying JSON value is INTEGER or REAL would cause a problem, so I'm going to call this new ->> operator "compatible" too.
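A sketch of the proposed operator semantics, guarded by a version check since -> and ->> only exist in builds that include the proposal (SQLite 3.38.0 or later):

```python
import sqlite3

con = sqlite3.connect(":memory:")

if sqlite3.sqlite_version_info >= (3, 38, 0):
    # -> always returns a JSON representation (TEXT)...
    assert con.execute("""SELECT '{"a":123}' -> '$.a'""").fetchone()[0] == "123"
    # ...while ->> returns a primitive SQL type, here INTEGER.
    assert con.execute("""SELECT '{"a":123}' ->> '$.a'""").fetchone()[0] == 123
```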

My intent is to write up the new proposal in more detail, with examples, and add a working implementation to a branch (probably the json-in-core branch) at some point. But I have some other unrelated matters that will need to take priority so I'm not sure how soon that will happen.

(36.1) By Richard Hipp (drh) on 2022-01-10 15:44:35 edited from 36.0 in reply to 35 [link] [source]

New proposed -> and ->> operators for SQLite: https://sqlite.org/src/artifact/7f67a2e75de23958. These changes have not been implemented as of this writing.

Update: Code changes are now on the json-in-core branch. Your efforts at trying to poke holes in the new design are appreciated.

(38) By anonymous on 2022-01-10 17:23:03 in reply to 36.1 [link] [source]

There is a typo in the first table:

JSON: '{"a":[6,7,8]}'
json_extract() (SQLite): '[6,7,9]'

(39) By Richard Hipp (drh) on 2022-01-10 17:44:56 in reply to 38 [link] [source]

Fixed now in the latest.

(37) By AlexJ (CompuRoot) on 2022-01-10 16:31:35 in reply to 35 [link] [source]

The only suggestion for the latest proposal is: do not glue the -> operator to the JSON functions only, but leave open the possibility of analyzing the left-hand operand, so that if one would like to implement PostgreSQL's hstore functionality in the future, the -> operator could be used with an hstore type too.

(42) By anonymous on 2022-01-11 02:04:23 in reply to 35 [link] [source]

RE: use of json_nextract:

I might be missing something, but it's not clear to me that CASE/iif can effectively take the place of nextract/ntype or a nullonerror flag argument.

Consider the following where we want to know if any given JSON has some target structure (based on the JSON1 doc example):

select iif (json_extract('{"a":2,"c":[4,5,{"f":7}]}', '$') = '{"a":2,"c":[4,5,{"f":7}]}' , 'contains', 'does not contain') as 'answer';

This renders 'contains' or 'does not contain' so long as the JSON is well formed.

The complications start when the JSON is not well formed (here the first "a" is missing its closing quote):

select iif (json_extract('{"a:2,"c":[4,5,{"f":7}]}', '$') = '{"a":2,"c":[4,5,{"f":7}]}' , 'contains', 'does not contain') as 'answer';

This renders a statement error. I think there may be a use-case where we want an easy way to fold that error into the 'does not contain' category. Here we are not concerned about why the JSON does not contain the target, simply whether it does or does not.

If we have a flag on extract/type (or a n-prefix function) that would certainly handle it, but I'm not sure it's as easy with CASE unless we wrap a JSON check first, as in:

select iif(json_check(@j), iif(json_extract(@j, '$') = '{"a":2,"c":[4,5,{"f":7}]}', 'contains', 'does not contain'), 'does not contain') as 'answer';

The above is doable, but having a flag or n-prefix function to nullonerror would simplify that a bit.
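For what it's worth, the wrapper can be written today with the real json_valid() function (json_check above is hypothetical). A sketch via Python's sqlite3 module, using nested CASE so the extract is never evaluated on malformed input:

```python
import sqlite3

con = sqlite3.connect(":memory:")
TARGET = '{"a":2,"c":[4,5,{"f":7}]}'

SQL = """
SELECT CASE WHEN json_valid(?1)
            THEN CASE WHEN json_extract(?1, '$') = ?2
                      THEN 'contains' ELSE 'does not contain' END
            ELSE 'does not contain' END
"""

def answer(j):
    # The json_valid() guard folds the malformed-JSON error
    # into the 'does not contain' category.
    return con.execute(SQL, (j, TARGET)).fetchone()[0]

assert answer(TARGET) == 'contains'
assert answer('{"a:2,"c":[4,5,{"f":7}]}') == 'does not contain'  # malformed
```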

RE: nullonerror with -> and ->>:

If it turns out we do want to sometimes fold malformed JSON into a nullonerror, consider -> and ->> as well. ->> returns various types (including NULL), so it must cause an error on malformed JSON, while -> can only return JSON text, and therefore could also return NULL (aka nullonerror) on malformed JSON.

(40) By anonymous on 2022-01-10 20:22:46 in reply to 1 [link] [source]

IMHO the -> and ->> operators should require the left-hand operand to have correct type and subtype

so that

        select json('{"a":123}')->'a';
should work,

but each of

        select '{"a":123}'->'a';
        select cast('{"a":123}' as blob)->'a';
        select 1->'a';
should throw an error

(41) By Richard Hipp (drh) on 2022-01-10 21:25:48 in reply to 40 [link] [source]

Why? What problem are you trying to solve with this new restriction?

(43) By Wout Mertens (wmertens) on 2022-01-11 07:37:40 in reply to 41 [link] [source]

If the operators are restricted to JSON subtypes, they remain available for other subtypes implemented by extensions.

That way, e.g. alternative JSON extensions could be developed that implement some optimizations, like e.g. object shape encode/decode, dictionaries, streaming parsing etc. These would then still be able to use the operators.

Perhaps though, strings could be accepted as well as JSON, since those are never a subtype and JSON is in core.

(44) By anonymous on 2022-01-11 20:47:43 in reply to 41 [link] [source]

SQLite should not assume that the left-hand operand of -> or ->> is a JSON string; instead the operand format should be recognized by type and subtype so that -> and ->> could work on a variety of formats.

Extensions should be able to register extraction functions for different formats identified by type and subtype so that the following examples would work:

        CREATE TABLE t1(
            id INTEGER PRIMARY KEY,
            json_str TEXT,
            bson_data BLOB
        );

        SELECT json(json_str)->'a', bson(bson_data)->'a' FROM t1;


        CREATE TABLE t2(
            id INTEGER PRIMARY KEY,
            type TEXT,
            data
        );

        CREATE VIEW v2 AS SELECT
            id,
            CASE type
                WHEN 'json' THEN json(data)
                WHEN 'bson' THEN bson(data)
                WHEN 'cbor' THEN cbor(data)
                WHEN 'hstore' THEN hstore(data)
            END AS object
        FROM t2;

        SELECT object->'a', object->>'b' FROM v2;

(45) By Keith Medcalf (kmedcalf) on 2022-01-11 22:05:56 in reply to 44 [link] [source]

I believe you labour under the misapprehension that the "subtype" is stored in the database. It is not. SQLite3 only recognizes five storage types: NULL, BLOB, TEXT, REAL, and INTEGER (plus something called NUMERIC, which is a made-up combination of REAL and INTEGER that does not actually exist as a type).

The "subtype" is something that gets "tagged onto a retrieved value" by "a function execution against the value".

Subtype is not persistent. It must be recalculated every time. Nor is the subtype associated with anything in particular related to the data value itself.

It is merely a way of tagging a retrieved value. If you were to take a TEXT item containing JSON and pass it to the Jimbo functions, then it would get a subtype of Jimbo. Subtype is not an inherent characteristic of a value but is a convenient way to "remember" something about the value after it has been processed by a function.

(46) By Richard Hipp (drh) on 2022-01-12 00:07:05 in reply to 1 [link] [source]

The changes have landed on trunk. The draft 3.38.0 website contains preliminary documentation:

A prerelease snapshot tarball is available on the download page.

Please report any issues seen to this forum, or directly to me at drh at sqlite dot org.

(47) By Arfrever Frehtes Taifersar Arahesis (Arfrever) on 2022-01-12 00:25:26 in reply to 46 [link] [source]

As reported earlier, ./configure --disable-json adds -DSQLITE_OMIT_JSON to flags, and this SQLITE_OMIT_JSON option is used to omit JSON support, but in only one place (src/ctime.c) the spelling used is SQLITE_DISABLE_JSON instead of SQLITE_OMIT_JSON.

Trivial fix:

--- src/ctime.c
+++ src/ctime.c
@@ -179,9 +179,6 @@
 #ifdef SQLITE_DISABLE_INTRINSIC
   "DISABLE_INTRINSIC",
 #endif
-#ifdef SQLITE_DISABLE_JSON
-  "DISABLE_JSON",
-#endif
 #ifdef SQLITE_DISABLE_LFS
   "DISABLE_LFS",
 #endif
@@ -583,6 +580,9 @@
 #ifdef SQLITE_OMIT_INTROSPECTION_PRAGMAS
   "OMIT_INTROSPECTION_PRAGMAS",
 #endif
+#ifdef SQLITE_OMIT_JSON
+  "OMIT_JSON",
+#endif
 #ifdef SQLITE_OMIT_LIKE_OPTIMIZATION
   "OMIT_LIKE_OPTIMIZATION",
 #endif

(48.1) By Keith Medcalf (kmedcalf) on 2022-01-12 00:36:46 edited from 48.0 in reply to 46 [link] [source]

Why are you pissing around directly with ctime.c and not updating the mkctimec.tcl script which is supposed to generate ctime.c?

If mkctimec.tcl is no longer used to generate ctime.c and it is now maintained by direct edit, the tcl script should be deleted as it is useless.

(49) By Richard Hipp (drh) on 2022-01-12 00:52:57 in reply to 48.1 [link] [source]

Because I forgot about mkctimec.tcl. Fixed now.

(50) By Larry Brasfield (larrybr) on 2022-01-12 00:57:20 in reply to 48.1 [link] [source]

mkctimec.tcl is still authoritative. The file it mostly generates, ctime.c, is provided as a convenience but should not be edited as a rule. It contains lines such as: ** BEGIN CODE GENERATED BY tool/mkctime.tcl and ** END CODE GENERATED BY tool/mkctime.tcl to warn users of the ephemeral nature of the intervening content.

(51) By anonymous on 2022-01-12 02:09:39 in reply to 46 [link] [source]

To avoid having to wrap everything in json_valid calls, it would be nice to have an entry in the examples table of section 4.5 as follows :

'{"malformed:json}' -> '$.x' → NULL (for malformed JSON on the -> operator; properly formed JSON required on the ->> operator)

Another argument for it is that it is closer to Postel's law. In the -> case, a NULL return means x does not exist in the json, which is indeed the case for malformed json. It just seems more robust.

In any event, that's all I'll say on this. :)

(53) By Ryan Smith (cuz) on 2022-01-12 10:23:18 in reply to 51 [link] [source]

Not sure this suggestion is exactly in the spirit of Postel's law.

Citing my own law, I will say that I much prefer the return value for "This is invalid/malformed JSON" to be distinct from "The path you specified does not exist in this valid well-formed JSON".

I won't advocate for it in this case because there are other functions that can check the JSON validity, including in CHECK constraints, and in my projects (as I would advise to all) the SCHEMA would be tasked with ensuring JSON validity prior to trying any function upon said JSON values. This means that by the time a value is in the DB, it is already confirmed to be valid, and so any null returns can only mean that either the specified path is wrong, or it doesn't exist in the JSON. Ditto when acting upon the output of a prior JSON function.

I think that is just fine in this case.
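The schema-level approach described above is straightforward to set up; a minimal sketch via Python's sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# A CHECK constraint rejects malformed JSON at INSERT time, so any
# value that reaches a JSON function later is known to be valid.
con.execute("CREATE TABLE docs(body TEXT CHECK (json_valid(body)))")
con.execute("""INSERT INTO docs VALUES ('{"a":1}')""")      # accepted

try:
    con.execute("INSERT INTO docs VALUES ('{not json')")    # rejected
except sqlite3.IntegrityError:
    pass
```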

(55) By anonymous on 2022-01-12 19:48:15 in reply to 53 [link] [source]

Ah, so in your case the JSON is in the DB and the value to check for is the parameter.

And if the JSON is not stored in the DB? Let's say match targets are stored and the JSON is the parameter?

While your use case is the most common, I can see it being used similarly to storing glob/re patterns to be applied to a parameter. In that case the targets are schema-checked, but not the JSON.

In your use case you rely on the schema constraint as your wrapper. In the non-stored-JSON case the proposal requires the wrapper in the query. The suggestion for the -> operator (not the ->>) gives us one way to handle that; another is an argument flag.

(52) By Arfrever Frehtes Taifersar Arahesis (Arfrever) on 2022-01-12 04:16:46 in reply to 46 [link] [source]

SQLITE_ENABLE_JSON1 macro no longer has any effect, but SQLite source tree still contains 20 obsolete references to SQLITE_ENABLE_JSON1 macro:

$ grep -r SQLITE_ENABLE_JSON1 *
Makefile.in:SHELL_OPT = -DSQLITE_ENABLE_JSON1 -DSQLITE_ENABLE_FTS4
Makefile.in:FUZZERSHELL_OPT = -DSQLITE_ENABLE_JSON1
Makefile.in:FUZZCHECK_OPT = -DSQLITE_ENABLE_JSON1 -DSQLITE_ENABLE_MEMSYS5 -DSQLITE_OSS_FUZZ
Makefile.msc:OPT_FEATURE_FLAGS = $(OPT_FEATURE_FLAGS) -DSQLITE_ENABLE_JSON1=1
Makefile.msc:MPTESTER_COMPILE_OPTS = -DSQLITE_ENABLE_JSON1 -DSQLITE_ENABLE_FTS5
Makefile.msc:FUZZERSHELL_COMPILE_OPTS = -DSQLITE_ENABLE_JSON1
Makefile.msc:FUZZCHECK_OPTS = -DSQLITE_ENABLE_JSON1 -DSQLITE_ENABLE_MEMSYS5 -DSQLITE_OSS_FUZZ -DSQLITE_MAX_MEMORY=50000000 -DSQLITE_PRINTF_PRECISION_LIMIT=1000
Makefile.msc:TESTFIXTURE_FLAGS = $(TESTFIXTURE_FLAGS) -DSQLITE_ENABLE_JSON1=1
autoconf/configure.ac:  BUILD_CFLAGS="$BUILD_CFLAGS -DSQLITE_ENABLE_JSON1"
autoconf/README.txt:  "OPTS=-DSQLITE_ENABLE_STAT4=1 -DSQLITE_ENABLE_JSON1=1"
autoconf/Makefile.msc:OPT_FEATURE_FLAGS = $(OPT_FEATURE_FLAGS) -DSQLITE_ENABLE_JSON1=1
doc/json-enhancements.md:-DSQLITE_ENABLE_JSON1 compile-time option was used.
ext/repair/sqlite3_checker.c.in:#define SQLITE_ENABLE_JSON1 1
main.mk:SHELL_OPT += -DSQLITE_ENABLE_JSON1 -DSQLITE_ENABLE_FTS4 -DSQLITE_ENABLE_FTS5
main.mk:FUZZERSHELL_OPT = -DSQLITE_ENABLE_JSON1
main.mk:FUZZCHECK_OPT = -DSQLITE_ENABLE_JSON1 -DSQLITE_ENABLE_MEMSYS5
src/test_config.c:#ifdef SQLITE_ENABLE_JSON1
test/releasetest_data.tcl:            lappend opts -DSQLITE_ENABLE_JSON1
tool/warnings.sh:      -DSQLITE_ENABLE_FTS5 -DSQLITE_ENABLE_JSON1 \
tool/warnings.sh:      -DSQLITE_ENABLE_FTS5 -DSQLITE_ENABLE_JSON1 \

Most of them should be simply deleted.

src/test_config.c is the only place where SQLITE_ENABLE_JSON1 is still checked with preprocessor directives like '#ifdef', '#ifndef', '#if defined'.

Fix for src/test_config.c:

--- src/test_config.c
+++ src/test_config.c
@@ -239,10 +239,10 @@
   Tcl_SetVar2(interp, "sqlite_options", "geopoly", "0", TCL_GLOBAL_ONLY);
 #endif
 
-#ifdef SQLITE_ENABLE_JSON1
+#ifdef SQLITE_OMIT_JSON
-  Tcl_SetVar2(interp, "sqlite_options", "json1", "1", TCL_GLOBAL_ONLY);
+  Tcl_SetVar2(interp, "sqlite_options", "json1", "0", TCL_GLOBAL_ONLY);
 #else
-  Tcl_SetVar2(interp, "sqlite_options", "json1", "0", TCL_GLOBAL_ONLY);
+  Tcl_SetVar2(interp, "sqlite_options", "json1", "1", TCL_GLOBAL_ONLY);
 #endif
 
   Tcl_SetVar2(interp, "sqlite_options", "has_codec", "0", TCL_GLOBAL_ONLY);

(54) By anonymous on 2022-01-12 14:15:55 in reply to 46 [link] [source]

What about a function similar to COALESCE, that returns the first valid json argument or errors out if none is valid.

JSON_FIRSTVALID(p1,p2,p3,p4,...)

With that, one could avoid the somewhat lengthy IF json_valid(p1) THEN ...-constructs and make sure a query with json_extract (or ->, ->>) does not error out but rather returns "something", like

JSON_FIRSTVALID(maybe_not_json_column,'{}')->'$.my_field'   --results in NULL if not valid json

or

JSON_FIRSTVALID(maybe_not_json_column,'{"my_field":"DEFAULT"}')->'$.my_field'  --results in '"DEFAULT"' if not valid json

Or just enhance the existing JSON()-function to have 1+ parameters and do exactly that? Makes that pretty compact:

JSON(maybe_not_json_column,'{}')->'$.my_field'
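JSON_FIRSTVALID does not exist, but the two-argument case can be emulated today with CASE plus coalesce (a sketch using Python's sqlite3 module, with json_extract standing in for ->):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Emulate the proposed JSON_FIRSTVALID(p1, p2): take p1 when it is
# valid JSON, otherwise fall back to the default document p2.
SQL = """
SELECT json_extract(
         coalesce(CASE WHEN json_valid(?1) THEN ?1 END, ?2),
         '$.my_field')
"""

assert con.execute(SQL, ('not json', '{}')).fetchone()[0] is None
assert con.execute(SQL, ('{"my_field":"DEFAULT"}', '{}')).fetchone()[0] == "DEFAULT"
```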

(56) By Simon Willison (simonw) on 2022-01-19 05:00:54 in reply to 1 [link] [source]

I have to admit I've never been a big fan of PostgreSQL's -> and ->> JSON operator design - I find them hard to remember, so I usually opt to use the function equivalents instead. That's a matter of personal preference though.

PostgreSQL 14 introduced new syntax which, as a Python and JavaScript programmer, I really like. It's described in https://blog.crunchydata.com/blog/better-json-in-postgres-with-postgresql-14 which gives the following example:

SELECT *
FROM shirts 
WHERE details['attributes']['color'] = '"neon yellow"'
  AND details['attributes']['size'] = '"medium"'

I would be thrilled to see that syntax added to SQLite!

(57) By Marco Bubke (marcob) on 2022-01-19 09:45:56 in reply to 56 [link] [source]

Can you use bound parameters with it?

SELECT *
FROM shirts 
WHERE details[?1]['color'] = ?2

Or can you use them in the selection:

SELECT details[?1]['color']
FROM shirts

(58) By Richard Hipp (drh) on 2022-01-19 12:33:08 in reply to 56 [link] [source]

The "[]" operator is not available to SQLite. Microsoft SQL Server uses square brackets to quote identifiers. For example, instead of "columnname" they write [columnname]. This is not standard SQL, but it is extremely common among programmers who first learned SQL from SQL Server. SQLite provides square-bracket quoting too, for compatibility with SQL Server, and has for 20 years. Hence, we cannot use [] as an operator.
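The long-standing bracket quoting is easy to demonstrate (a sketch via Python's sqlite3 module):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# [column name] is identifier quoting in SQLite, as in SQL Server,
# so the tokenizer cannot also treat [] as a subscript operator.
con.execute("CREATE TABLE t([column name] TEXT)")
con.execute("INSERT INTO t VALUES ('x')")
assert con.execute("SELECT [column name] FROM t").fetchone()[0] == 'x'
```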

(59) By Marco Bubke (marcob) on 2022-01-19 15:21:15 in reply to 58 [link] [source]

What about

[[]]
and a compile time option to use
[]
if you don't want to use SQL Server syntax? Is
['foo']
valid SQL Server syntax too? But maybe that makes the parser too complicated. I personally get used to almost every syntax, but I have colleagues who are very "sensitive". For them "[]" would be easier.

(60) By Larry Brasfield (larrybr) on 2022-01-19 15:31:05 in reply to 59 [link] [source]

The parser hardly sees those square brackets. The tokenizer deals with them, and without much fuss.

(61) By Holger J (holgerj) on 2022-01-20 13:49:56 in reply to 58 [link] [source]

I regard this (old) decision as very unwise. Even if an SQL Server developer tends to use the brackets, there is nothing wrong in raising an error because it's non-standard and SQL Server works fine without these.

PostgreSQL refuses them (syntax error), as does Oracle (invalid table name).

Of course it's not feasible to remove a "feature" (accepting wrong syntax) after it has been supported for decades. I write "feature" because it's actually a non-feature not to report this error.

What about introducing a flag or pragma to turn on better syntax checking refusing brackets around table names?

(62) By Richard Hipp (drh) on 2022-01-20 14:36:35 in reply to 61 [link] [source]

You want a flag or PRAGMA to disable a non-standard syntax feature from SQL Server that has been in wide-spread use for decades, in order to support a non-standard syntax feature of PG that has been added just recently?

(63) By Richard Hipp (drh) on 2022-01-22 12:33:54 in reply to 58 [link] [source]

Just in case anyone doubts that the use of [...] for quoting of identifiers is pervasive, forum post 525873d1796a4a51 from a first-time participant in this forum uses [...] quoting, and only a couple of days after my previous message.

(64) By Marco Bubke (marcob) on 2022-01-22 18:25:55 in reply to 63 [link] [source]

I think it would be advantageous to have a process for deprecating features in favor of new ones. For example, you can introduce a new feature and disable the old one; people can then decide. After some years you can see whether people have adopted the new feature and left the old one behind. But this process can make the source code more complex, so it has drawbacks too.

I think the standard answer of software development is to build something completely new when change gets very hard (technically or socially). A continuous process of change is, in my perception, not so common, and tooling for it is even less common. But by now we have written so much software that it is getting more and more expensive to abandon a big chunk of code completely and rewrite it. Sometimes it is simply ignored and development stops, but then you get security problems.

(65) By Holger J (holgerj) on 2022-01-23 12:09:32 in reply to 63 [link] [source]

I have no doubts that using [] around identifiers is pervasive, but nonetheless non-standard.

Making a mistake by allowing non-standard syntax and later complaining that still so many people use this, is not the finest thing to do.

It's like allowing your children to misbehave and later complain that they do - even if it harms others.

(66) By Harald Hanche-Olsen (hanche) on 2022-01-23 17:56:15 in reply to 65 [link] [source]

Making a mistake by allowing non-standard syntax and later complaining that still so many people use this, is not the finest thing to do.

I think that is unnecessarily harsh. First, the syntax in question was introduced for a reason that may well have seemed valid at the time (MS SQL Server did it). Second, the SQLite project is extremely careful to maintain backward compatibility, more so than many other projects. You may disagree with this stance, but "not the finest thing" is an unfair characterization.

Of course, there are ways to get rid of such warts. For example, one could introduce a pragma to turn off this syntactical misfeature. Using that pragma might simultaneously enable other uses of the square brackets, thus encouraging people to use it. And after a year or two, the default could switch, so that you now have to use the pragma to enable the misfeature instead. (It would have to take a boolean argument to allow its usage to remain the same.) Should it be done? That is not for me to decide, but I certainly wouldn't mind.

(67) By Richard Hipp (drh) on 2022-01-23 20:27:30 in reply to 65 [link] [source]

I have no doubts that using [] around identifiers is pervasive, but nontheless non-standard.

The uses of [] for accessing JSON elements in PG is non-standard too. My point is: why should we favor the new and mostly unknown non-standard-ism of PG over the ancient and widely used non-standard-ism of SQL Server?

(69) By Marco Bubke (marcob) on 2022-01-24 11:27:50 in reply to 67 [link] [source]

I think there is a demand for both. So maybe we can find a way around it?

Is the

['foo']
syntax actually valid for SQL Server too? Could the tokenizer distinguish between
[foo]
and
['foo']
?

(70) By Larry Brasfield (larrybr) on 2022-01-24 12:46:37 in reply to 69 [link] [source]

Those last two tokens specify different identifiers. The first is 3 characters long and the last is 5 characters long. The tokenizer does not "distinguish" in any way except by faithfully collecting what lies between the functional delimiters.

(71) By Marco Bubke (marcob) on 2022-01-24 17:03:38 in reply to 70 [link] [source]

Sorry, I don't know how the SQLite lexer/parser works. I know only a little bit about the Clang lexer/parser, where it is possible to distinguish between '[' and '[['. I expected that it would produce 3 or 5 tokens, like:

"[" "foo" "]"

or

"[" "'" "foo" "'" "]"

So my idea was that you get:

"['" "foo" "']"

But it was only an idea and if that is not possible you can forget it. ;-)

(72) By Richard Hipp (drh) on 2022-01-24 18:29:26 in reply to 69 [link] [source]

I don't know, but presumably in PG, the array operator accepts any expression that evaluates to a string inside the [...] operation, not just a string literal. Perhaps a string literal is the most commonly used case for JSON, but we shouldn't limit it to that.

(73) By anonymous on 2022-01-27 14:42:23 in reply to 72 [link] [source]

If somebody wants to use an expression, there is json_extract. It would anyway already be a complex expression. What I would expect is something like:

[?1]

(68) By anonymous on 2022-01-24 06:59:49 in reply to 63 [link] [source]

I use the nonstandard `...` syntax, and some others might use the nonstandard [...] syntax. Often the SQL code is embedded in C code, and for this purpose these quoting styles are helpful instead of double quotation marks, since they avoid the need for additional escaping (when the SQL code is in external files, the quotation marks can be used).

I think the JSON-specific syntax is not needed for core SQL. In the case of -> there is no conflict and it can be overridden anyway, so it is not a problem; however, using the square brackets for JSON can be a problem, due to their already being in use for something else.