SQLite Forum


50 most recent forum posts by user ddevienne

2022-01-26
08:54 Reply: Finding the SQLite version for a feature (artifact: 9d118eecc3 user: ddevienne)

Nice! Thanks for sharing.

2022-01-25
08:33 Reply: Finding the SQLite version for a feature (artifact: a79db6b806 user: ddevienne)

how can I find the SQLite version number where that feature was introduced?

It's unfortunately not something that is tracked.
It's been requested before, but that did not happen.

Searching the release notes is indeed the next best thing.
And there's always finding the code for that feature, and going back to Fossil...

08:26 Reply: Subtle SQLite compliment (artifact: aa5434b363 user: ddevienne)

In the same vein as Simon's post: From https://apenwarr.ca/log/20211229:

The best part of free software is it sometimes produces stuff you never
would have been willing to pay to develop (Linux), and sometimes at
quality levels too high to be rational for the market to provide (sqlite).

2022-01-24
10:39 Reply: binding by reference instead of value (artifact: bfcdcfae20 user: ddevienne)

That's already how it works. If you don't rebind something else, the binds persist.

2022-01-21
16:22 Reply: Is it possible to fix a more than 1G corrupted SQLite file with only the first 16K content corrupted? (artifact: 0b105bb898 user: ddevienne)

the next 3 pages could in theory be anything

Not quite, excluding a vacuum or autovacuum.

(and even then, the vacuum is likely to follow a deterministic pattern for low page IDs I suspect).

Because each CREATE DDL statement will allocate at least one page per table and index(es),
so page #2 is likely to be the 1st table created, page #3 the 2nd table, unless the 1st table had an index.
At least that's my experience.
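
FWIW, a quick way to check that on any given DB is the rootpage column of the schema table
(sqlite_schema since 3.33.0, aka sqlite_master on older versions):

select name, type, rootpage from sqlite_schema order by rootpage;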

2022-01-20
17:46 Reply: Is it possible to fix a more than 1G corrupted SQLite file with only the first 16K content corrupted? (artifact: 970ab35981 user: ddevienne)

Bummer.

How can you be sure only the first 16K are corrupted?

Can you share the schema of the DB?
Along with a hex-dump (e.g. od or xxd) of those first 16K?
(possibly with a way to fetch that 16KB file from some URL).

I'm guessing you can't, but in case you can, it might be an interesting exercise on this list.
You never know what could happen, lots of technical folks here who like challenges :)

07:48 Edit reply: Is it possible to fix a more than 1G corrupted SQLite file with only the first 16K content corrupted? (artifact: 7dec1d2387 user: ddevienne)

In addition to what Larry wrote, what was the page_size of that DB?

If it was 16K, and you are sure only the first 16K were clobbered,
then you might have a better chance of recovery, if you know what
the schema of that DB is supposed to be.

The 1st page of an SQLite DB contains both the 100-byte header,
and the content of the all-important sqlite_schema (aka sqlite_master)
table. If large enough, sqlite_schema will spill to more pages,
but most smallish schemas fit in a single page.

So recreate an empty DB with the same schema, with the same 16K page_size,
and overlay that 1st 16K page of the empty DB on top of the corrupted DB,
(on a copy of course, as Larry mentioned), and see what happens.
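
For illustration, a minimal sketch of that overlay with unix tools, assuming a 16K page_size
and a hypothetical schema.sql holding the known schema (adapt names and sizes to your case):

$ sqlite3 empty.db              # build a fresh DB with the same page_size and schema
sqlite> pragma page_size = 16384;
sqlite> .read schema.sql
sqlite> .exit
$ cp corrupted.db repaired.db   # work on a copy, as Larry mentioned
$ dd if=empty.db of=repaired.db bs=16384 count=1 conv=notrunc
$ sqlite3 repaired.db "pragma integrity_check;"

Note that the rootpage numbers in the fresh schema may not match the old layout, so this is best-effort only.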

If your DB uses sequences, you might have to fix those, but I think they
are in a separate table, thus separate page.

If OTOH your DB uses a 4K page_size, then potentially more tables than just
sqlite_schema are corrupted, since those pages might also hold data and/or non-leaf pages
of B-Trees, and thus it might be more difficult to recover your DB...

I suggest you carefully read the file-format doc, in any case.

There's also at least one person of this list that does commercial DB
recovery using specialized tools, which might be able to help, if that
DB really has a lot of value.

I hope that's helpful. Good luck, --DD

07:46 Reply: Is it possible to fix a more than 1G corrupted SQLite file with only the first 16K content corrupted? (artifact: b0d27f8c5e user: ddevienne)

In addition to what Larry wrote, what was the page_size of that DB?

If it was 16K, and you are sure only the first 16K were clobbered,
then you might have a better chance of recovery, if you know what
the schema of that DB is supposed to be.

The 1st page of an SQLite DB contains both the 100-byte header,
and the content of the all-important sqlite_schema (aka sqlite_master)
table. If large enough, sqlite_schema will spill to more pages,
but most smallish schemas fit in a single page.

So recreate an empty DB with the same schema, with the same 16K page_size,
and overlay that 1st 16K page of the empty DB on top of the corrupted DB,
(on a copy of course, as Larry mentioned), and see what happens.

If your DB uses sequences, you might have to fix those, but I think they
are in a separate table, thus separate page.

If OTOH your DB uses a 4K page_size, then potentially more tables than just
sqlite_schema are corrupted, since those pages might also hold data and/or non-leaf pages
of B-Trees, and thus it might be more difficult to recover your DB...

I suggest you carefully read the file-format doc, in any case.

There's also at least one person of this list that does commercial DB
recovery using specialized tools, which might be able to help, if that
DB really has a lot of value.

I hope that's helpful. Good luck, --DD

2022-01-14
14:15 Reply: Any use case for the Bloom filter for "large analytic queries" (artifact: 1905df603c user: ddevienne)

Bloom filters are also deployed whenever an automatic index is created

Does that mean they could also be useful on regular indexes? In an opt-in manner?

13:55 Reply: Any use case for the Bloom filter for "large analytic queries" (artifact: e25e577f02 user: ddevienne)

Thanks! Very interesting Richard.

08:20 Reply: Any use case for the Bloom filter for "large analytic queries" (artifact: 6f67551c0d user: ddevienne)

Hi. Just curious, what's the ballpark improvement of much faster? 50%? 2x? 10x? More?

I'd also be interested in more information on how / when Bloom filtering kicks in within SQLite.
I have basic notions of Bloom filters; it's the SQLite-specific uses I wonder about.
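
Not having dug in yet, I guess one way to see it in action is EXPLAIN QUERY PLAN.
With hypothetical star-schema tables, something like the below should show whether
the planner inserted Bloom filters (recent builds print BLOOM FILTER ON ... lines in the plan):

explain query plan
select count(*)
  from fact f
  join dim1 d1 on d1.id = f.d1_id
  join dim2 d2 on d2.id = f.d2_id
 where d1.x = 1 and d2.y = 2;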

2022-01-10
11:13 Reply: Proposed JSON enhancements. (artifact: 9c05baa3e4 user: ddevienne)

I'd rather have more speed

Maybe this wasn't clear. The query I showed has 20 leaf json_extract calls.
It has 3 nested json_each joins, for traversal of the JSON hierarchy,
to reach each individual field of all entities:

select ...,
       json_extract(field.value, '$.DESC'),
...
  from pg,
       json_each(pg.js, '$') dict,
       json_each(dict.value, '$.entities') entt,
       json_each(entt.value, '$.fields') field

I didn't exactly follow everything in the debugger, but here's my understanding of what's going on in this query:

  • SCAN the pg table.
  • for each row (I have 4 of them), parse js column, which is in JSON format.
  • the first json_each(, '$') fully parses that JSON value,
    and produces a text value for the top-level objects
    (I just realized this one might not be necessary :)). At this point, we've parsed the JSON 1x I believe,
    unless there's an optimization for '$' which might be the identity.
  • the next json_each(, '$.entities') parses dict.value, and produces a distinct value for all values of the entities array.
    So we've basically reparsed almost the entire document another time (2x).
  • then the same happens for the fields array inside each entity.
    We're reparsing most of the JSON another time (3x).
  • and finally we extract all values of interest from the leaf objects,
    using json_extract. I suspect each extract is a separate partial parse of that object,
    until it finds the key of interest. Assuming 1st key needs only parsing 1/20th of the object,
    last (20th) key needs to parse 100% of the object.
    So 1/20 + 2/20 + ... + 20/20 = 20*21/2/20 = 10.5 times the JSON value.

So if my count is correct, that query parses each of the (4) top-level JSON docs 13.5 times.
That's what I mean when I wish for more speed: that only a single parse should be necessary.

So unless SQLite adds the possibility to return views into input strings (or blobs)
when it can guarantee the lifetime of the value it returns a view of
(as in the case above I suspect, since SQLite only supports nested loops, and the loop order is fixed in the query above),
then as good/fast as the SQLite JSON parser is, doing the parsing an order of magnitude too many times is not ideal.

That's why I'd hope for changes that would allow JSON to be faster in SQLite.
I'd prefer JSON support to stay in the JSON1 extension, until the above behavior can be avoided.
That's assuming my analysis is correct. I'd be happy to be told I'm not looking at this correctly.

09:24 Reply: Proposed JSON enhancements. (artifact: a3522caf04 user: ddevienne)

To me, this seems easier to write and understand

Yes, I do agree: once you know about ->, and know the value is JSON,
then it both makes sense, and it's easy to write. So no, I do not disagree.
I'm just unsure about discoverability and least surprise, for the reader who didn't write the SQL (i.e. me after a little while...)

OTOH, I think this below, although verbose, is rather readable, no?

select dict.key, entt.key, field.key,
       json_extract(field.value, '$.DESC'),
       json_extract(field.value, '$.DATA_TYPE'),
       json_extract(field.value, '$.DEFAULT_VALUE'),
       json_extract(field.value, '$.DISCRIMINANT'),
       json_extract(field.value, '$.DEFAULT_QUERY'),
       json_extract(field.value, '$.FIXED_VALUE'),
       json_extract(field.value, '$.READ_ONLY'),
       (select case
          when json_extract(field.value, '$.DB_NULLABLE') is not null then null
          else ifnull(
            json_extract(field.value, '$.REQUIRED'),
            json_extract(field.value, '$.DB_NOT_NULL')
          )
          end
       ),
       json_extract(field.value, '$.CHOICE_VALUE'),
       json_extract(field.value, '$.UNIQUE'),
...
  from pg,
       json_each(pg.js, '$') dict,
       json_each(dict.value, '$.entities') entt,
       json_each(entt.value, '$.fields') field

When I see the above, I wonder how many times we are reparsing the JSON.
Those JSON queries are expensive; it's a largish JSON file (roughly 250 entities, 4'000 fields).
So from my POV, I'd prefer a way to traverse the JSON more directly, and a single time,
to extract the tuples that match a given path. For example, in XPATH, I'd select ./entities/fields
and be able to match on the resulting nodes, to extract their value, if this was XML. Can this be done here?

I also have plenty of queries like this (and the above):

select attr.key "name",
       count(distinct entt.key) as "#entity use"
  from pg,
       json_each(pg.js, '$') dict,
       json_each(dict.value, '$.entities') entt,
       json_each(entt.value, '$') attr
 group by attr.key
 order by attr.key

So why only -> for json_extract and not json_each?
Why can't I even chain the -> to implicitly do the joins?
I think an Apple framework did implicit join traversal via a dot or arrow
notation, my ex-boss showed it to me once in an airport.

So could/should -> also work in table-valued contexts and mean json_each?
And json_each support multi-join traversal implicitly?

I guess I'm saying I'd rather have more speed and ease of use around JSON, not so much syntax sugar.

As usual I probably sound too critical, that's not my intention. I'm more thinking aloud here...

2022-01-07
19:25 Reply: Proposed JSON enhancements. (artifact: 3d8f6b98ba user: ddevienne)

Hi Richard,

I had few questions pop in my mind reading the doc:

  • What does the n prefix mean? What's the mnemonic?
  • These new -> and ->> operators are JSON only? That's why you moved JSON into the Core?
  • Do the JSON functions use a subtype? Such that nested JSON functions can know whether the argument is already validated well-formed JSON?

About the new N functions, I don't think they are for me, 'cause I tend to be on the strict side of schemas,
and would have a CHECK constraint with json_valid(), if I expect a column to be JSON. But in the long
tradition of SQLite being flexibly typed, I can see where they'd simplify the SQL sometimes.
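
I.e. something as simple as this (hypothetical table) on the strict side:

create table config (
  id integer primary key,
  js text not null check (json_valid(js))  -- reject anything that is not well-formed JSON
);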

Regarding the syntax-sugar operators -> and ->>, why not. I'm surprised, as they do not seem to fit
the SQLite motto of lite/light, but if JSON becomes Core, why not again. I've seen these operators in the PG doc,
and they do not aid readability of the SQL IMHO, especially since there are tons of such operators in PG.

12:25 Reply: Windows nanosecond file times to accurate presentable format and back (artifact: 3d7d577faf user: ddevienne)

forumpost/4a658e3adca5442b might be of interest

2021-12-15
16:24 Reply: persistent in-memory database (artifact: 21bdb930ed user: ddevienne)

There's no such thing as persistent memory in standard C/C++.

So either that memory is exposed as pseudo-files, and thus acts as very fast files.
Or it requires use of non-standard APIs, and you must use a custom VFS to use it.

SQLite memory databases are dynamically allocated, in transient/volatile memory.

14:37 Reply: BLOB Incremental I/O vs direct file system I/O (artifact: 10e60d2ac6 user: ddevienne)

Nobody mentioned it yet, so look at this article.

See also the SO post.

You could also use an append-only file, and start a new file past a given size threshold.
Keep the last two files around, and you have bounded the disk-space usage.
Just remove older files, or compress / archive them instead too.

With some fsync calls, you could be durable too.

2021-12-14
11:38 Reply: FYI: binary dump (artifact: 3a690c3d9c user: ddevienne)

Sorry, does not compute... What do you mean?

If by that software you mean anon's binary dump, it uses the SQLite API,
so gets back native endianness (i.e. LE...) fixed-sized numbers, so that's
neither the on-disk data, nor do I request any translation.

In fact, I suggest NOT translating the SQLite BE format and varints
to LE in memory, and then putting it back in BE, in a different varint encoding;
but only leaving SQLite's own unavoidable translation, and not adding any.

10:01 Reply: FYI: binary dump (artifact: 07f198a09a user: ddevienne)

I know that, of course. I've read the file-format doc many times over :).

But we are talking about a new format here. Thus BE does not make sense to me.
SQLite's own format predates it by a few decades OTOH, and is rooted in the largely BE *NIX philosophy.

09:57 Reply: FYI: binary dump (artifact: acadb2b70f user: ddevienne)

First, user+system time is often misleading, especially with parallelism.
I know you do not have any, but still, real or wall-clock rules IMHO.

Second, your test databases are too small, so the timings are not reliable enough.
MB/sec bandwidth numbers would be helpful, as would be an x-factor or % to a baseline you'd choose.

Third, as I suspected, speed-wise, SQLite's own binary dumps seem to win.

You are perfectly correct that they do not save space.

That they are not streamable is more debatable.
You'd have to be more specific here, at least from my point of view.

I do not want to detract from what you did.
It shows good thinking, and good execution, from what I saw.
But I think it is helpful to put that in context.

Given this thread, perhaps Richard might consider changes to allow
VACUUM INTO or the Backup-API to drop redundant index information,
leaving their definitions, so they could be recreated later. That would
address your space concerns. I'm not sure Richard will consider it though.

And again, thanks for sharing your work publicly. Very interesting.

2021-12-13
17:21 Reply: How to merge two or more tables that have the same structure? (artifact: 6543e3a4b8 user: ddevienne)

Very nice David!

So much nicer than my horrible Window+JSON1 (incomplete) soup!!!

sqlite> select * from t;
┌───────┬────────┬────────┬────────┐
│ time  │ value1 │ value2 │ value3 │
├───────┼────────┼────────┼────────┤
│ 12345 │ 50     │        │        │
│ 12346 │        │ 49     │ 99     │
│ 12347 │        │ 77     │        │
│ 12348 │        │ 88     │        │
│ 12349 │        │        │ 23     │
│ 12350 │ 70     │ 22     │        │
│ 12351 │        │        │ 45     │
│ 12352 │ 80     │        │        │
└───────┴────────┴────────┴────────┘
sqlite> update t set value1 = (select value1 from t as t2 where t2.time < t.time and t2.value1 is not null order by t2.time desc limit 1) where value1 is null;
sqlite> update t set value2 = (select value2 from t as t2 where t2.time < t.time and t2.value2 is not null order by t2.time desc limit 1) where value2 is null;
sqlite> update t set value3 = (select value3 from t as t2 where t2.time < t.time and t2.value3 is not null order by t2.time desc limit 1) where value3 is null;
sqlite> select * from t;
┌───────┬────────┬────────┬────────┐
│ time  │ value1 │ value2 │ value3 │
├───────┼────────┼────────┼────────┤
│ 12345 │ 50     │        │        │
│ 12346 │ 50     │ 49     │ 99     │
│ 12347 │ 50     │ 77     │ 99     │
│ 12348 │ 50     │ 88     │ 99     │
│ 12349 │ 50     │ 88     │ 23     │
│ 12350 │ 70     │ 22     │ 23     │
│ 12351 │ 70     │ 22     │ 45     │
│ 12352 │ 80     │ 22     │ 45     │
└───────┴────────┴────────┴────────┘
sqlite>
17:16 Reply: How to merge two or more tables that have the same structure? (artifact: 4b028b3a7c user: ddevienne)

Well, I have found a horrible way to select the proper value of a single column :)

sqlite> with filler(time, prec_values_as_json) as (
   ...>   select time, json_group_array(value1) FILTER (WHERE value1 is not null) over win
   ...>   from t WINDOW win as (
   ...>     order by time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
   ...>   )
   ...> )
   ...> select time, json_extract(prec_values_as_json, '$[#-1]') "value1 no holes" from filler;
┌───────┬─────────────────┐
│ time  │ value1 no holes │
├───────┼─────────────────┤
│ 12345 │ 50              │
│ 12346 │ 50              │
│ 12347 │ 50              │
│ 12348 │ 50              │
│ 12349 │ 50              │
│ 12350 │ 70              │
│ 12351 │ 70              │
│ 12352 │ 80              │
└───────┴─────────────────┘

I used JSON1 to accumulate the preceding non-null values into an array,
so I could use the tail of that array later (thanks for '#', Richard!).

Updating t for all 3 value columns left as an exercise! :) --DD

PS: Dan, why can't we use FILTER for non-aggregate Window functions?

Error: FILTER clause may only be used with aggregate window functions

I was trying to use lag(), using FILTER to ignore the NULLs, to reach the first preceding value that is NOT NULL.
It seemed logical to me to use filtering, to make rows I didn't want disappear from the window.
What is wrong with that thinking?

16:18 Edit reply: How to merge two or more tables that have the same structure? (artifact: b416fc5767 user: ddevienne)

This is wrong:

sqlite> update t set value1 = (select value1 from t t2 where t2.time < t.time and t2.value1 is not null) where value1 is null;
sqlite> update t set value2 = (select value2 from t t2 where t2.time < t.time and t2.value2 is not null) where value2 is null;
sqlite> update t set value3 = (select value3 from t t2 where t2.time < t.time and t2.value3 is not null) where value3 is null;
sqlite> select * from t;
┌───────┬────────┬────────┬────────┐
│ time  │ value1 │ value2 │ value3 │
├───────┼────────┼────────┼────────┤
│ 12345 │ 50     │        │        │
│ 12346 │ 50     │ 49     │ 99     │
│ 12347 │ 50     │ 77     │ 99     │
│ 12348 │ 50     │ 88     │ 99     │
│ 12349 │ 50     │ 49     │ 23     │
│ 12350 │ 70     │ 22     │ 99     │
│ 12351 │ 50     │ 49     │ 45     │
│ 12352 │ 80     │ 49     │ 99     │
└───────┴────────┴────────┴────────┘

Because the query picks any preceding non-null value, but not the closest one as you want.

This probably needs Window Functions, but I'm not familiar enough with them,
and don't have enough time to play with this more.

16:17 Reply: How to merge two or more tables that have the same structure? (artifact: 7cbdb0380b user: ddevienne)

This is wrong:

sqlite> update t set value1 = (select value1 from t t2 where t2.time < t.time and t2.value1 is not null) where value1 is null;
sqlite> update t set value2 = (select value2 from t t2 where t2.time < t.time and t2.value2 is not null) where value2 is null;
sqlite> update t set value3 = (select value3 from t t2 where t2.time < t.time and t2.value3 is not null) where value3 is null;
sqlite> select * from t;
┌───────┬────────┬────────┬────────┐
│ time  │ value1 │ value2 │ value3 │
├───────┼────────┼────────┼────────┤
│ 12345 │ 50     │        │        │
│ 12346 │ 50     │ 49     │ 99     │
│ 12347 │ 50     │ 77     │ 99     │
│ 12348 │ 50     │ 88     │ 99     │
│ 12349 │ 50     │ 49     │ 23     │
│ 12350 │ 70     │ 22     │ 99     │
│ 12351 │ 50     │ 49     │ 45     │
│ 12352 │ 80     │ 49     │ 99     │
└───────┴────────┴────────┴────────┘

Because the query picks any preceding non-null value, but not the closest one as you want.

This probably needs Window Functions, but I'm not familiar enough with them,
and don't have enough time to play with this more.

15:56 Reply: FYI: binary dump (artifact: cb0cd2dab1 user: ddevienne)

The code still loops through every byte, so there's no speed advantage to be had.

If you don't varint, and are already in the target encoding (which is very often the case if choosing LE),
then you don't need to loop through every byte; that's the point. In your code, you'd simply wd(..., &t, sizeof(t)) basically.

Streamability was one of the design criteria.

One does not prevent the other IMHO.

You can interleave blocks from different tables basically,
as if you are multiplexing the different table-streams.

15:16 Reply: How to merge two or more tables that have the same structure? (artifact: 72aff275d3 user: ddevienne)

Scatter:

C:\Users\ddevienne>sqlite3 db1
SQLite version 3.36.0 2021-06-18 18:36:39
Enter ".help" for usage hints.
sqlite> create table t (time integer primary key, value);
sqlite> insert into t values (12345,50), (12350, 70), (12352, 80);
sqlite> .exit

C:\Users\ddevienne>sqlite3 db2
SQLite version 3.36.0 2021-06-18 18:36:39
Enter ".help" for usage hints.
sqlite> create table t (time integer primary key, value);
sqlite> insert into t values (12346, 49), (12347, 77), (12348, 88), (12350, 22);
sqlite> .exit

C:\Users\ddevienne>sqlite3 db3
SQLite version 3.36.0 2021-06-18 18:36:39
Enter ".help" for usage hints.
sqlite> create table t (time integer primary key, value);
sqlite> insert into t values (12346, 99), (12349, 23), (12351, 45);
sqlite> .exit

Gather:

C:\Users\ddevienne>sqlite3 db-All
SQLite version 3.36.0 2021-06-18 18:36:39
Enter ".help" for usage hints.
sqlite> create table t (time integer primary key, value1, value2, value3);
sqlite> attach db1 as db1;
sqlite> attach db2 as db2;
sqlite> attach db3 as db3;
sqlite> with scale(time) as (
   ...>   select time from db1.t
   ...>   union
   ...>   select time from db2.t
   ...>   union
   ...>   select time from db3.t
   ...> )
   ...> insert into t
   ...> select time, t1.value, t2.value, t3.value
   ...>   from scale
   ...>   left join db1.t as t1 using (time)
   ...>   left join db2.t as t2 using (time)
   ...>   left join db3.t as t3 using (time);
sqlite> .mode box
sqlite> select * from t;
┌───────┬────────┬────────┬────────┐
│ time  │ value1 │ value2 │ value3 │
├───────┼────────┼────────┼────────┤
│ 12345 │ 50     │        │        │
│ 12346 │        │ 49     │ 99     │
│ 12347 │        │ 77     │        │
│ 12348 │        │ 88     │        │
│ 12349 │        │        │ 23     │
│ 12350 │ 70     │ 22     │        │
│ 12351 │        │        │ 45     │
│ 12352 │ 80     │        │        │
└───────┴────────┴────────┴────────┘
sqlite>

If time is not your primary key, the multi-way join risks being slow.
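
In that case an index on time in each source DB should help; a minimal sketch, reusing the names above:

create index if not exists db1.t_time on t(time);
create index if not exists db2.t_time on t(time);
create index if not exists db3.t_time on t(time);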

14:40 Edit reply: FYI: binary dump (artifact: cec44c6a0a user: ddevienne)

Can you please add the times of VACUUM INTO and .backup in the shell?
These are binary dumps as well, so might not suffer as much with blobs,
and would provide a fairer comparison with your new dump.

Also, to have a baseline for your file-system, the size+time of a pure file-copy of the DB?
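
Something along these lines in the shell (file names made up); note that .timer reports
times for SQL statements, so the .backup run may need to be timed externally:

.timer on
vacuum into 'vacuumed.db';
.backup backup.db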

14:37 Reply: FYI: binary dump (artifact: 8ae95f4a0b user: ddevienne)

Can you please add the times of VACUUM INTO and .backup in the shell?
These are binary dumps as well, so might not suffer as much with blobs,
and would provide a fairer comparison with your new dump.

11:59 Reply: FYI: binary dump (artifact: 1abccc03c6 user: ddevienne)

Interesting, thanks for sharing.

After looking at the code a little bit, here are various thoughts that crossed my mind:

  • Big Endian. Given that most of our CPUs are LE (ARM can do both, but is LE by default), why BE?
  • Varint encoding. I just wonder if it's worth it. One day I need to compare the size vs speed tradeoff.
  • No Pages/Blocks. This basically precludes any kind of parallel processing of tables, for example.
  • Custom record-format. Why not reuse the SQLite record-format as-is?
  • Not using blob IO. This increases memory-use for large blob columns.

Thanks, --DD

PS: Although regarding my last point, I don't see how you can deal with that generically...
I think once you've step()'ed, the whole record is loaded in memory,
instead of sqlite3_column_blob() lazily fetching the blob on demand.
So it's basically too late to use incremental blob IO... There would
need to be native support in SQLite to ask for the blob handle instead
of the blob value itself at the SQL level (perhaps with a size threshold),
and a new column API to get that handle.

2021-12-10
15:18 Reply: Wishing CLI could be usefully embedded (artifact: a125daa1a8 user: ddevienne)

(3) The primary DB connection
(5) Coordination between DB closing

In embedded mode, the primary DB connection can come from the host.
I.e. one should neither open nor close it.

In my case, it is the DB with all the vtables representing the state
of the C++ host application, which only the host app can configure, and
which the host app will continue using once the embedded shell exits.

The embedded shell can open other connections (recent addition), or attach other DBs.

If attaching other DBs to the primary host-managed connection, one might
wonder whether the shell should detach those, or leave them attached, to be
available again later if/when another embedded shell is re-started.

I think leaving them as-is makes more sense. The host can set up an authorizer
if it wants to, to prevent or know about those, and detach them after the embedded shell exits.

Regarding your thinking of a struct and a main()-like entry point:
why not replicate the design of SQLite itself, with the shell taking
an optional VFS, defaulted otherwise, and a new VIO (Virtual IO) interface
specific to the shell? I.e. make the shell a stateful object to new/config/close?

2021-12-07
09:06 Reply: Wishing CLI could be usefully embedded (artifact: 1bd9faec0b user: ddevienne)

Good list. Thanks for sharing.

goal [...] gain access to custom virtual tables

That's one of the main reasons for my original post too.

Those are part of a large application that can't be made into a loadable
extension of the regular (official) SQLite shell, so the logical solution
is to embed the shell into that large application.

Others in the company have created a GUI dialog that allows running queries,
including against the vtables, displaying the results in a GUI table control/widget.
But of course one loses all the .commands of the shell, and doing a one-time embedding
is IMHO a losing proposition, given the fast pace of the SQLite sources (including its shell.c).

As I wrote originally, just allowing IO to be fully redirected to the hosting
app is required, while the ability to side-step the text-based .mode processing
so the hosting app can use a table GUI control/widget is more of a nice-to-have.
(Optional side-stepping, since .mode is also used for output to CSV, HTML, etc.,
which should be retained and is part of the appeal of embedding the shell.)

2021-12-03
21:02 Reply: Benchmarking SQLite on Apple's M1 ? (artifact: 981bacd4a9 user: ddevienne)

costs about as much

641 Euros for the CPU (on Amazon, on sale, normally 721), versus
799 Euros for the Mini M1 (bare config). Amazingly close indeed!

08:57 Reply: SQLite WAL Pages Checkpointed (artifact: 690243fc45 user: ddevienne)

Thanks for sharing.

About the hook, well, maybe Richard will answer.
I have not had much luck when asking questions these past few months.

2021-12-02
13:25 Reply: SQLite WAL Pages Checkpointed (artifact: af00ca20a2 user: ddevienne)

The Session Extension is more Logical replication,
while WAL-replication is more Physical replication.

There are PROs and CONs with both approaches.
The latter basically has no limitations, unlike the former.

13:21 Reply: SQLite WAL Pages Checkpointed (artifact: 20dbc24c0d user: ddevienne)

live replication through using the WAL

So your system is similar to Litestream?
Is yours open-source too? If not, will it be eventually?

Richard, you've mentioned in a radio interview not wanting to get
into that space, and knowing of other initiatives similar to Litestream,
if I recall correctly. There's obviously lots of interest in this area,
so is there hope of one day seeing primitives (like a hook?) in SQLite Core
to better support these WAL-replication use cases in a more official way?

Thanks, --DD

07:44 Reply: 64-bit precompiled Linux binaries (artifact: 5d7db84b5c user: ddevienne)

[...] there will never be an official x64 sqlite shell release [...]

Really? Did I miss a post? Absence of response from Richard does not mean never.
For the life of me, I don't see why a 64-bit shell would not be provided, eventually.

Just yesterday I was surprised to read a 2019 embedded software survey
pegging 64-bit CPUs at 15% in that market, and most desktop CPUs have
been 64-bit for years now.

2021-11-30
11:15 Reply: In a trigger insert all columns of NEW into another table (artifact: d4932738c4 user: ddevienne)

Not at all. But many people frown on SELECT *, and prefer to list columns explicitly.
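
I.e. with a hypothetical t(time, value1, value2, value3) and an audit table, the explicit version is:

create table t (time integer primary key, value1, value2, value3);
create table t_log (time, value1, value2, value3);

create trigger t_ai after insert on t
begin
  -- NEW has to be spelled out column by column (no NEW.* shortcut)
  insert into t_log (time, value1, value2, value3)
  values (NEW.time, NEW.value1, NEW.value2, NEW.value3);
end;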

SQLite has no concept of row-values, like PostgreSQL does.
So NEW and OLD in triggers are not row-values like they are in PostgreSQL.

Even a table alias is basically a row-value in PostgreSQL, and defining a table
implicitly defines an equivalent composite type, albeit w/o the constraints.
(which you can manually re-add via a domain type over the composite type.)

2021-11-26
15:00 Reply: Feature Request split string table-valued function (artifact: 456f7239bd user: ddevienne)

If a "split()" function did exist

To wrap up this thread, and hopefully convince Richard and team to add a split
table-valued function to SQLite Core, I'd like to illustrate the stark difference
in SQL necessary w/ and w/o such a function.

So I created two concrete tables, with the same data, one with the original
space-separated values, and the other using JSON. And to build the JSON-formatted
one, I actually used the grid view using a CTE that Ryan helped me define:

create table grids_ssv as
select parent, files from grids

create table grids_json as
select parent, '["'||group_concat(file, '","')||'"]' as files
  from (select parent, file, idx from grid order by parent, idx)
 group by parent

Here's the simple SQL using JSON1:

select g.parent, j.id, j.value
  from grids_json g, json_each(files) j
 order by g.parent, j.id

And here's the complex CTE from Ryan and myself:

with
grid (parent, head, tail, idx) as
(
    select parent, '', files||' ', 0
      from grids_ssv
     UNION ALL
    select parent,
           trim(substr(tail, 1, instr(tail, ' ')-1)) as head,
           substr( tail, instr( tail, ' ' ) + 1) as tail,
           idx + 1 as idx
      from grid
     where instr(tail, ' ') > 0
)
select distinct parent, head as file, idx
  from grid
 where length(trim(head)) > 0
 order by parent, idx

There's no contest IMHO. If this doesn't convince anyone that this belongs in SQLite's built-in functions,
I don't know what will.

In my case, I don't have enough data to contrast performance. 21 rows expand into 82 rows.
Both queries range from 1ms to 4ms in SQLiteSpy, depending on the run. But we mostly all
agree the table-valued one will fare better for larger data, especially larger strings to split.

On the join side, that's a keyword, and group_concat already does that basically,
so I don't know what I was thinking when I wrote my original feature request ;)

So Richard, could we please have something like split? Xmas is not far :) --DD

2021-11-25
16:03 Reply: Feature Request split string table-valued function (artifact: 525f19ee26 user: ddevienne)

Will need to be Unicode-aware I suppose

Not really. Both sides (string and separator) are strings, so unicode already.
So already encoded to some bytes. Just match the bytes.

The real design decision is whether the separator is a set of chars,
or a text-fragment; and whether to compress / ignore repeated separators.
Ideally those are options of split().

16:00 Reply: Feature Request split string table-valued function (artifact: 2f3f09d936 user: ddevienne)

That will be quite slow for larger datasets

It takes milliseconds in my case, no biggy. Not a lot of data.

If a "split()" ... did exist .. could .. speed it up significantly

Indeed. And that's precisely why I requested it!!!

Using the CTE is jumping through significant hoops to split a string into rows IMHO.
Yes, such data is denormalized, but unfortunately that's very common...

if there is a precedent in Postgres

There is, via unnest(string_to_array(col, sep)).
I.e. first to an array (I wish SQLite had native arrays like PostgreSQL :)),
then one uses the unnest table-valued function (RETURNS SETOF).

13:55 Reply: Feature Request split string table-valued function (artifact: 0abd35c09e user: ddevienne)

Thank you Ryan!

Apart from some renaming, for clarity;
adding the index in the space-separated-list;
and not having your vLength column, this is basically your code.

create view grids(parent, files) as ...

with
grid (parent, head, tail, idx) as
(
    select parent, '', files||' ', 0
      from grids
     UNION ALL
    select parent,
           trim(substr(tail, 1, instr(tail, ' ')-1)) as head,
           substr( tail, instr( tail, ' ' ) + 1) as tail,
           idx + 1 as idx
      from grid
     where instr(tail, ' ') > 0
)
select distinct parent, head as file, idx
  from grid
 where length(trim(head)) > 0
 order by parent, idx, file
12:04 Reply: Faster way to insert into WITHOUT ROWID table? (artifact: 5354951e21 user: ddevienne)

You haven't said how that DB / table is used later. But if your use case is really as described,
i.e. only fixed-size columns, then you can have random access into your data, by direct indexing,
whether it's on-disk or in-memory, if you implement your own virtual table (and indexing) on top of
that regular / tabular data.

It might end up bigger than the SQLite DB, which uses varints, if your integers are small.
But the extra space is worth it, for direct random-access addressing, performance-wise.

A long time ago, I measured vtables over C++ std::unordered_map-like containers of structs,
being 5x faster than equivalent regular in-memory SQLite tables. And that was even w/o direct
addressing like in your use-case (node-based hash map, so each row is a separate alloc / address).

Short of the above, you've covered all the bases, from what you wrote. FWIW. --DD

10:09 Reply: How to rerank one row in a big set of rows? (artifact: a26b9bcd34 user: ddevienne)

I don't have a solution to what you describe, but here are some thoughts.

If you make your rank the id PRIMARY KEY column, as an alias to the ROWID,
then updating the ranks means physically moving the rows around, rebalancing the BTREE,
which means a lot of IO. Very expensive.

While if the rank is a separate and explicit column of its own,
then updating it means updating the pages with actually modified rows only.

If you make queries by that rank column, then you need an index on it though.
So the index will need updating too, but then you don't need to move the item columns,
as you would when id and rank are one-and-the-same and the PK.

If your ranks change often, that's a problem indeed. You could switch to a floating-point rank,
to insert in-the-middle w/o disturbing surrounding rows, but that only works so many times:
you eventually run out of precision. But if you can detect those conflicts when they arise,
you've at least reduced the frequency of re-ranking the tail/rest of the rows on a rank-update.

A similar strategy is to split your rank in two, the primary and secondary.
The latter starts out at zero, and when re-ranking, you just update that one instead of the primary,
again limiting the number of rows that need updating on re-ranks. Now some rows share a primary rank
(after 1 or more rerankings), but not a secondary one, and you need to index/order-by both ranks of course,
but again, it's a strategy to limit the number of rows affected by a rerank.
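
A minimal sketch of that split-rank idea, with made-up names:

create table item (
  id        integer primary key,
  rank_main integer not null,
  rank_sub  integer not null default 0,
  label     text
);
create index item_rank on item (rank_main, rank_sub);

-- re-rank item 42 to sit just after the existing rank_main = 7 row,
-- without touching any of the rows that follow:
update item set rank_main = 7, rank_sub = 1 where id = 42;

-- and always read back ordering by both:
select * from item order by rank_main, rank_sub;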

In both cases (REAL ranks or SPLIT ranks),
once in a while you probably want to pay for a full rerank to go back to INT-reals or ALL-0-sub-ranks.

I'm just thinking aloud here, I haven't tried any of the above. --DD

09:49 Post: Feature Request split string table-valued function (artifact: 5b67d64df1 user: ddevienne)

Now that SQLite supports table-valued functions, could we please have split?
And conversely, a join aggregate / window function would also be great, no?
These belong in the SQLite Core IMHO.

And in the mean time, can someone please share how to emulate split with a CTE?

I have an existing DB with a column containing space-separated values,
which I need to split / transpose into rows for further processing (joins, etc...)

2021-11-24
17:44 Reply: Feature Request: ATTACH DB readonly? (artifact: 54c097c13b user: ddevienne)

Are you aware of URIs in SQLite?
I think those allow you to do what you want.
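
I.e. something like this, assuming URI filenames are enabled on your connection:

attach 'file:other.db?mode=ro' as other;

Writes into other.* should then fail with a read-only error.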

17:29 Reply: CBOR Support & challenging DB design problem for keeping history with big schema changes (artifact: cb08840377 user: ddevienne)

OK, I'll bite another time. No one said I was a reasonable man :)

You're going a bit all over the place Vadim, and most of what you write
is not specific to SQLite, so a bit Off Topic here IMHO. But in any case,
here is maybe a bit more information.

Regarding trees in SQLite, you need to look into CTEs, which allow writing recursive queries.
That's how one deals with filesystem-like hierarchies.
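
A minimal sketch, with a made-up parent/child table:

create table node (
  id        integer primary key,
  parent_id integer references node(id),  -- NULL for the root(s)
  name      text
);

with recursive tree(id, name, depth) as (
  select id, name, 0 from node where parent_id is null
  union all
  select n.id, n.name, t.depth + 1
    from node n join tree t on n.parent_id = t.id
)
select id, name, depth from tree order by depth, id;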

Regarding indexing CBOR, well, extensions can create shadow tables,
which store custom preprocessed data used by custom MATCH operators.
That's how the various FTS implementations do it I believe; I'm not
very versed in those, as I never delved into their code. But this is definitely
an advanced subject in SQLite, which very few people have experience with I believe.

I doubt this will help you much, sorry. Good luck. Over and out. --DD

16:02 Reply: CBOR Support & challenging DB design problem for keeping history with big schema changes (artifact: 4052572ca3 user: ddevienne)

FWIW, regarding CBOR, this was brought up a long time ago when JSON1 was introduced,
and it's unlikely to come from the SQLite team itself, given the discussion at the time.
But nothing prevents you from doing it yourself, JSON1 is an extension,
and a CBOR extension can be similarly written, using the same techniques
(a bunch of set-returning functions, i.e. eponymous virtual tables, in SQLite speak :))

Sure, JSON1 comes from Richard, and is part of the amalgamation,
so it has 1st-party status. But since CBOR1 is unlikely to come
as 1st-party, then do it as 3rd-party in OSS, and if it's good,
and, unlike the discussion at the time, brings some speed advantage
over JSON1, maybe it will pick up a following. FWIW.

12:29 Reply: sqlite3_bind_parameter_count question (artifact: a492bfaf9b user: ddevienne)

More precisely, look at the last out param of sqlite3_prepare_v2(..., pzTail),
which tells you where SQLite stopped when preparing the 1st statement of your SQL script,
which in turn allows you to prepare the next statement in your script, and so on...

2021-11-18
08:08 Reply: sqlite3.dll is not returning control to my program (artifact: e5f50a6946 user: ddevienne)

Hi Larry. [OT] Just curious, why use <code> instead of triple-backtick?

And you certainly didn't do the escaping by hand; there are too many of them.
So how did you do that escaping? I don't see a way to do that in the Forum UI.

2021-11-17
16:53 Reply: SQLite + TCL on-line conference starts in less than 2 days (artifact: 5b981a7b13 user: ddevienne)

I managed to join & saw Richard H's presentation.

I didn't. Hopefully a link to a video will be posted eventually.
