SQLite User Forum

JSONB has landed
Login

JSONB has landed

(1) By Richard Hipp (drh) on 2023-12-05 20:43:27 [source]

JSONB is a rewrite of the SQLite JSON functions that, depending on usage patterns, could be several times faster than the original JSON functions. This enhancement has now landed on trunk.

Developers who use JSON heavily in their applications are encouraged to download a pre-release snapshot and give the new code a try.

How Is This Different?

Functions that deal with text JSON use a three-step process:

  1. Parse the text JSON into an internal binary format that is more accessible to C-code.

  2. Carry out the requested operation. Maybe this is looking up a field in an object, or perhaps it is modifying the JSON in some way.

  3. If the processing involved changing the JSON, convert the internal binary format back into RFC-8279 JSON text for output and/or storage.

Step 2 is the essential operation that you want to accomplish. Steps 1 and 3 are just overhead.

Historically, SQLite used an internal binary representation of JSON that involved lots of pointers. This fits will into C programs, but it is difficult to serialize. The JSONB rewrite changes the internal-use binary representation of JSON into a contiguous byte array that can read or written as an SQL BLOB. This allows the internal-use representation of JSON to potentially be saved to the database, in place of JSON text, eliminating the overhead of steps 1 and 3.

What has changed?

All legacy functionality is preserved. The only change has been to add new capabilities.

Any JSON function that accepts JSON text as an input will now also accept JSONB binary content for that same parameter. You do not have to tell the function if it getting text or binary data. It figures that out for itself.

JSON functions that output JSON now come in two versions. The historical "json_" functions works as before. But there is now a corresponding "jsonb_" function that returns JSONB rather than text JSON, thus omitting step 3 in the normal processing.

If you don't make any changes to your application, everything should continue to work as it always has, though perhaps slightly (1%) faster.

But if you modify your application to start storing JSONB instead of text JSON, you might see a 3-times performance improvement, at least for the JSON-intensive operations. JSONB is also slightly smaller than text JSON in most cases (about 5% or 10% smaller) so you might also see a modest reduction in your database size if you use a lot of JSON.

Migrating

Note that all functions accept both text JSON and JSONB. So to start using JSONB, you do not have to modify your database files to convert legacy text JSON into JSONB. Just start writing out JSONB for new entries. The old entries will continue to work. The new entries will just work faster.

Or, if you do want to convert all your legacy data to JSONB, you can just run an update operation like:

UPDATE bigtable SET jsonColumn = jsonb(jsonColumn);

Please provide comments

If you find this enhancement useful, or if you try it out and see performance regressions or bugs, please let us know. Leave a follow-up post here, or contact me directly at drh at sqlite dot org.

The current plan is to release the JSONB enhancement in the next major release of SQLite - version 3.45.0. That will probably occur in a month or two.

(2.1) By Alex Garcia (alexgarciaxyz) on 2023-12-05 21:03:09 edited from 2.0 in reply to 1 [link] [source]

Thank you for all your hard work! I'm very excited for the new JSONB features.

I'm curious, would it be possible (or a good idea) to add JSON and JSONB datatypes to STRICT tables? Only int/real/text/blob/any column types are currently supported, but I could see a big benefit to supporting JSON/JSONB column types as well.

CREATE TABLE t1(
  id TEXT PRIMARY KEY,
  name TEXT,
  settings JSONB
) STRICT;

Would essentially be the same as:

CREATE TABLE t1(
  id TEXT PRIMARY KEY,
  name TEXT,
  settings BLOB CHECK (jsonb_valid(settings))
) STRICT;

JSON columns could check if values are json_valid(), and JSONB could use jsonb_valid(). It would be cool if this could be something extensions could use (like an XML extension adding a xml_valid() constraint to strict "XML" column types).

(3) By Richard Hipp (drh) on 2023-12-05 21:04:58 in reply to 2.1 [link] [source]

Something like that might happen someday. But let's get the basic JSONB functionality out the door first.

(4) By Deon Brewis (deonb) on 2023-12-05 21:15:49 in reply to 1 [link] [source]

Is JSONB a directly serializable data format?

i.e. Can we read it as a blob, send it over the wire, and write back into another database as a blob?

(5) By Richard Hipp (drh) on 2023-12-05 21:31:49 in reply to 4 [link] [source]

Yes. The byte-format is defined in the (draft) documentation. See https://sqlite.org/draft/jsonb.html.

(9) By anonymous on 2023-12-06 12:46:25 in reply to 5 [link] [source]

One question about "lazy" convert. What is the design point of not converting the data to full binary format? If the user has some data, that does not worth the convert, they can keep the source JSON, and when they want speed, they can use JSONB. JSONB point seems to be speed (as the reported values), so keeping the numbers as text (and I assume converting text to number is the slowest part of JSON parsing as reported by many active in the field including SIMDJSON), seems a little puzzling.

I am sure, you had something interesting in mind when design, can you elaborate more on the reasoning?

(23) By Richard Hipp (drh) on 2023-12-11 19:15:47 in reply to 9 [link] [source]

The conversion from text-to-binary happens as needed, rather than as a big up-front cost. So if you have an object or array with a million integer values embedded, you don't have to do a million conversions as you change it into JSONB. You instead do each conversion as you extract the value.

And the common case is that the value is never extracted. Remember, that JSONB is the internal parse format for the JSON library in SQLite. When you say "x ->> 'fieldName'" in an SQL expression where x is text JSON, it converts all of x into JSONB, then extracts fieldName. Why would you want to convert all the other integers contained within x into binary when they are never used?

(25) By anonymous on 2023-12-12 18:01:24 in reply to 23 [link] [source]

I would agree with you if SQLite main use was analytics, and many of them store information that they do not use. But for a database like SQLite, I assumed the data will be queried multiple times, and the cost of text to integer is paid as many times.

I asked, as I guessed you have a much better view on the common case, and you are saying that it is the opposite.

One other note is that, on write, it is slow, because of disk and writing SQLite, so I assume again that cost of JSONB can be ignored, and maybe the size save help a bit too. But on read, it is much faster, so slowing down a fast query on a binary tree, to parse the data was a little confusing.

All these talks are not very effective unless we do many benchmarks, and I guess you did already, hence me asking to see your view better.

Thank you.

(10) By Roger Binns (rogerbinns) on 2023-12-06 12:56:19 in reply to 5 [link] [source]

I was delighted to see you allowed for future reserved types. When dealing with lots of JSON data it saves a lot of memory by ensuring there are unique copies of strings - in practical terms instead of storing a string directly, an integer entry into a string table is used. This format allows for that when the inevitable feature request comes in.

(16) By Deon Brewis (deonb) on 2023-12-06 19:07:55 in reply to 10 [link] [source]

Hah! The only reason I'm on the fence about switching from our own binary json format to jsonb is because our own format supports string interning of the keys.

Inevitable is putting it mildly :)

(6) By nalgeon on 2023-12-06 04:37:23 in reply to 1 [link] [source]

For anyone who wants to try it out, I've set up a live playground with the pre-release snapshot.

(7) By anonymous on 2023-12-06 06:48:01 in reply to 6 [link] [source]

Awesome, thanks!

(8) By anonymous on 2023-12-06 12:10:24 in reply to 1 [link] [source]

Are there any plans to conpress this format to save space, like a JSONBZ type or similar

(11) By Bo Lindbergh (_blgl_) on 2023-12-06 14:29:06 in reply to 1 [link] [source]

sqlite> select json(x'E70000000568656C6C6F') as correct;
+---------+
| correct |
+---------+
| "hello" |
+---------+
sqlite> select json(x'F70000000568656C6C6F') as incorrect;
+-----------+
| incorrect |
+-----------+
| "hello"   |
+-----------+

The bug is in jsonbPayloadSize, which treats size field values 14 and 15 as equivalent.

(13) By ddevienne on 2023-12-06 14:58:35 in reply to 11 [link] [source]

Took me a while to see the few missing pixels between E and F...
Maybe next time point them out explicitly :)

(20) By Bo Lindbergh (_blgl_) on 2023-12-06 19:57:35 in reply to 11 [link] [source]

Fixed here. Thank you!

(12) By Marco Bubke (marcob) on 2023-12-06 14:50:46 in reply to 1 [link] [source]

Are --> and -> supporting jsonb as input?

(14) By ingo on 2023-12-06 18:37:00 in reply to 1 [link] [source]

Just out of interest (and lack of knowledge on my side), was there a specific reason to create JSONB instead of using an existing format like CBOR, BJSON, BSON, UBSON? Formats that have existing libraries in several languages.

(15) By Richard Hipp (drh) on 2023-12-06 18:47:41 in reply to 14 [link] [source]

The purpose of SQLite-JSONB is to expose the underlying parse tree for JSON strings, so that the parse can be saved in the database and not need reparsing every time the JSON is used. This makes some kinds of operations on large JSON strings about 3x faster, according to measurement scripts in the test/json subdirectory of the source tree. In addition to running about 3x faster, SQLite-JSONB also has the interesting property of being about 5 to 10% smaller than text JSON, thus taking up less disk space.

SQLite-JSONB is not intended to be an interchange format. It is not intended to be used by any other programming languages. It is not intended to be used by application developers. Application developers should consider JSONB to be an opaque blob. JSONB is an internal format for use by the SQLite implementation only.

I tried to make this point clear in the documentation, but apparently my efforts have been unsuccessful. I'll work on the documentation some more.

(17) By Nuno Cruces (ncruces) on 2023-12-06 19:09:53 in reply to 15 [link] [source]

I think one thing that might be helpful is to explain that the expected way to insert JSONB into an SQL table is something to the effect of:

CREATE TABLE log (tstamp, json);
INSERT INTO log VALUES (datetime(), jsonb(?));

Where you bind RFC 8259 JSON text to the statement, and SQLite does the JSON to JSONB conversion.

Then to pull data from the database, you again leverage SQLite to get the output as RFC 8259 JSON text:

SELECT tstamp, json(json) FROM log;

(18) By jose isaias cabrera (jicman) on 2023-12-06 19:12:33 in reply to 15 [link] [source]

I tried to make this point clear in the documentation, but...

I thought your point was clear and precise. I believe folks just see things, questions come up, and questions are asked without the proper research. I know I do it sometimes.

(19) By Bo Lindbergh (_blgl_) on 2023-12-06 19:51:23 in reply to 14 [link] [source]

CBOR and UBJSON are even worse than JSON for random access. Containers have optional member counts (much less useful than actual byte sizes), which complicates skipping over an item: you need a stack of incomplete object member counts. To skip over a JSON item, a single integer tracking the container nesting level suffices.

CBOR, BJSON, and BSON add extra data types that don't exist in JSON. If you choose an externally specified format, people will expect your code to support the full specification and complain when it doesn't.

(21) By anonymous on 2023-12-06 20:50:35 in reply to 19 [link] [source]

Jens Alfke's Fleece seems to have similar properties to JSONB: very fast to read: No parsing is needed, and the data can be navigated and read without any heap allocation

https://github.com/couchbase/fleece

(22) By ddevienne on 2023-12-07 08:35:56 in reply to 21 [link] [source]

The Delta appending feature is interesting.

There's are tons of C/C++ JSON libraries, and binary variants.

The latest making the buzz in the C++ world that came on my radar is Glaze,
which claims GB/s parse time, similar to simdjson, w/o the disadvantages.

And he has his own binary variant of course, Beve.

Would be interesting to add SQLite's parser to the Glaze benchmarks.
Except it's not available as a standalone JSON parser.
Maybe the reverse should be done, given Richard's mention of an set of benchmark for JSON,
i.e. write JSON1 alternatives using Glaze and others.

In any case, none of these other alternatives are likely to sway Richard on his own format.
And the only thing that matters is that we're gaining faster JSON processing in SQLite.

(32) By Vadim Goncharov (nuclight) on 2024-02-16 13:29:00 in reply to 19 [link] [source]

CBOR and UBJSON are even worse than JSON for random access. Containers have optional member counts (much less useful than actual byte sizes)

CBOR is designed as stable interchange format, suitable for streaming implementations - byte counts would prohibit such usage.

To skip over a JSON item, a single integer tracking the container nesting level suffices.

Nope. Consider {"key}":"value}"} for example. You still need to parse/scan every byte.

CBOR, BJSON, and BSON add extra data types that don't exist in JSON

And this ia A Good Thing.

(24) By anonymous on 2023-12-11 22:23:43 in reply to 1 [link] [source]

That is good, but what I would like is the ability to parse and make blobs in the row values storage format of SQLite, because there are some things I dislike about JSON. (As far as I can tell, the SQLite row value format is handled in the VDBE execution code, so it might be possible to do by adding a special SQL function and/or C function and/or special built-in virtual table and/or pragma; I may be wrong about any part of this though.)

(33) By Vadim Goncharov (nuclight) on 2024-02-16 13:30:34 in reply to 24 [link] [source]

Sounds interesting, but what be the usage? Row format does not nest, and have no key names (is equivalent to simple array in JSON).

(26) By rseymour on 2023-12-22 18:36:52 in reply to 1 [link] [source]

From having a lot of ups and downs with JSON/JSONB in postgres, I have to ask is there a problem with Unicode 1.1 valid character u0000 in sqlite JSONB? I've found it to be one of the premiere unfixable bug sources with postgres' jsonb.

(27) By Bo Lindbergh (_blgl_) on 2023-12-22 23:50:42 in reply to 26 [link] [source]

It seems to work fine.

sqlite> select json(x'7A666F6F00626172') as json;
+----------------+
|      json      |
+----------------+
| "foo\u0000bar" |
+----------------+

That is, the raw 00 byte in the middle gets correctly escaped when a text rendering is requested.

(28) By rseymour on 2023-12-23 00:08:48 in reply to 27 [link] [source]

This is legitimately a big deal to me. Thanks for checking! Looks like SQLite did encoding right!

(29) By Jack (JackLite) on 2023-12-28 06:53:19 in reply to 1 [link] [source]

Just an interesting question.

If JSONB is designed for internal use by SQLite only, then if a malformed JSONB appears in database by some reason, will it or should it pass the PRAGMA integrity_check?

I aware not passing the PRAGMA integrity_check is currently not doable because the format of a malformed JSONB can actually be a well formed BLOB, but it is some kind of a malformed database file, right?

(30) By EricG (EriccG) on 2023-12-28 10:21:23 in reply to 1 [link] [source]

When storing JSON (text) it is possible to use FTS to provide an indexing capability (pre-filtering) for content in the JSON. This is not officially supported, and relies on somewhat ghetto tokenization, but is very effective IME.

For instance {"name1":"value1","name2":value2} can be tokenized as 'name1 value1 name2 value2', which is good enough for FTS pre-filtering:

  • finding records containing specific GUIDs/tokens
  • finding records where a field has a specific value (search for "name value")
  • fuzzy search

The advantage over indexes on expressions is that an FTS index can be created "a priori" to index "everything", with no re-parsing of all JSON documents (which can be prohibitive), and is more resilient to versioned JSON (when structure varies over time).

With JSONB, would it be worth it to provide an "official" FTS tokenizer? Or am I the only one (ab)using FTS for JSON? :) Some of the limitations could be addressed while doing it, like having standardized markers for end of object, end of array, tweaked distances etc.

(34) By Vadim Goncharov (nuclight) on 2024-02-16 13:35:25 in reply to 30 [link] [source]

Interesting trick. How do you deal with multiword values or even keys?

(31.1) By Davide Malpassini (davide.malpassini) on 2024-01-12 09:42:20 edited from 31.0 in reply to 1 [link] [source]

Is in plan to support querying of nested array with wildcard like is already done by other database like postgress, mysql , etc ?

An example could be
json_extract('{"a":2,"c":[{"f:2"},{"f:7"},{"f":7}]}',$.c[*].f) and returning a tuple so we can filter in the where clause for the object that contains in the array some value ?

Thank you Davide Malpassini