SQLite Forum

50 most recent forum posts by user ddevienne

2021-10-14
14:49 Edit reply: Pragma_Function_List() (artifact: 65f256e964 user: ddevienne)

scalar, aggregate, and window functions (I think).

Not sure about the flags. (See Richard's answer below)

13:43 Reply: Pragma_Function_List() (artifact: 7872a4aa18 user: ddevienne)

scalar, aggregate, and window functions (I think).

Not sure about the flags.

11:47 Reply: binary .dump format (artifact: 0a013984da user: ddevienne)

With ZLib compressing at around 60MB/s in my tests, for a 60GB DB file,
dumped to 120GB of SQL text, that's around 2,000s of compression right there...
On top of the actual .dump itself, of course (unless the compression runs concurrently with the dump, when piped, maybe).

The advantage of a custom binary dump, of table-pages only (see David's),
or table-cells only (see hitchmanr's), is that you limit the IO, and don't
need to decode the record cells (in the first case). But then you are on your
own to re-assemble the DB!

You'd need to investigate with DB Stats to find out precisely how much you'd save on that DB,
and whether it's worth it time-wise, given the heavy dev investment needed.

For a completely different take, one could also use WAL mode and Litestream,
backing up the WAL pieces Litestream generates. A very different approach...

PS: Richard mentioned they had activity in that area too, but unfortunately we haven't heard anything since.

06:54 Reply: inconsistent output: insert on conflict with returning (artifact: 9cc8c73efc user: ddevienne)

Well, PostgreSQL again behaves differently here (see the last insert below). So I'd be leaning more toward the bug side:

D:\oss>psql -h pq
Password for user ddevienne:
psql (12.1, server 12.5)
WARNING: Console code page (437) differs from Windows code page (1252)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
Type "help" for help.

ddevienne=> create temp table foo (name text unique);
CREATE TABLE
ddevienne=> insert into foo values ('name1') on conflict(name) do nothing returning *;
 name
-------
 name1
(1 row)


INSERT 0 1
ddevienne=> insert into foo values ('name1') on conflict(name) do nothing returning *;
 name
------
(0 rows)


INSERT 0 0
ddevienne=> drop table foo;
DROP TABLE
ddevienne=> create temp table foo (name text unique, amount int);
CREATE TABLE
ddevienne=> insert into foo values ('name1', 0) on conflict(name) do update set amount=excluded.amount returning *;
 name  | amount
-------+--------
 name1 |      0
(1 row)


INSERT 0 1
ddevienne=> insert into foo values ('name1', 1) on conflict(name) do update set amount=excluded.amount returning *;
 name  | amount
-------+--------
 name1 |      1
(1 row)


INSERT 0 1
ddevienne=>
2021-10-13
15:17 Reply: binary .dump format (artifact: 8b5549ca46 user: ddevienne)

I'm just as much in the dark as you, but I like your plan, David :)

You're still wasting the free space in the middle of pages,
but keeping everything page-aligned has benefits too.

10:18 Reply: binary .dump format (artifact: ae16f90b20 user: ddevienne)

it's not even theoretically possible

Not really. Binary dumps typically just prefix arbitrary byte sequences with a length.
Yes, that makes the dump context-dependent, and any missing or corrupt byte can make
the whole dump corrupt, but there are ways to detect such cases too.

You could also encode (base16, base64, ascii85) small blobs in the main dump file,
and offload larger blobs into a separate paged companion file. One could even use
content-addressable schemes (e.g. SHA-3, like Fossil) for blobs, gaining blob dedup too.

For such a large DB, using a snapshotting file-system, like ZFS, would allow instant
shallow copies, which would allow computing a diff (via SQLite's own DB diff tool)
when time is not as critical, perhaps keeping only the diff for backups.
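For the content-addressable idea, here's a minimal SQL sketch of blob dedup via hashing, assuming the sha3() function from the CLI's shathree extension, and hypothetical table names:

create table blob_store (hash blob primary key, content blob) without rowid;
-- store each distinct blob once, keyed by its digest; duplicates are ignored
insert or ignore into blob_store
  select sha3(data, 256), data from big_table;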

2021-10-12
07:08 Reply: x86_64/amd64 sqlite3 binaries (artifact: f4123b6fb3 user: ddevienne)

the product is quite useless without user modification (ie, application program).

Lots of people beg to differ with this, and consider the CLI as much SQLite as the core library itself, and quite useful in and of itself.

Why provide any binaries at all, given that kind of thinking? It's not as if 64-bit Linux binaries are esoteric, like your imagined 128-bit machine.

2021-10-11
06:49 Reply: Update with table alias (artifact: f0bd746c4e user: ddevienne)

Given that this works on PostgreSQL (see below),
and SQLite strives for compatibility with it,
I suspect this could be considered an actual issue.
Let's see what Richard thinks of this.

D:\trunk> psql -h pq
Password for user ddevienne:
psql (12.1, server 12.5)
WARNING: Console code page (437) differs from Windows code page (1252)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
Type "help" for help.

ddevienne=> create table demo (id serial, name text);
CREATE TABLE
ddevienne=> insert into demo values (1, 'one'), (2, 'two'), (3, 'three');
INSERT 0 3
ddevienne=> update demo as _new_ set name = 'Name 2'
ddevienne-> from (select * from demo as _old_ where _old_.id = 2) as _old_
ddevienne-> where _new_.id = _old_.id
ddevienne-> returning _old_.name as oldName, _new_.name as newName;
 oldname | newname
---------+---------
 two     | Name 2
(1 row)


UPDATE 1
ddevienne=> select * from demo;
 id |  name
----+--------
  1 | one
  3 | three
  2 | Name 2
(3 rows)
2021-09-29
17:35 Reply: Multithread processing, storing in memory and writing to database (artifact: ec4108f4a0 user: ddevienne)

Indeed. Appending to a table is not as cheap as appending to a file,
because of B-tree rebalancing. But if your end result must be an SQLite
DB, you have to accept the cost of appending to tables, unless you create
the DB as a post-processing step from some other append-only file(s),
in which case those rebalances matter little.

Time-series databases certainly do exist, but are there any as small and
beautiful as SQLite, with the same world-class quality, and free too?

17:26 Reply: Multithread processing, storing in memory and writing to database (artifact: b164d4fa43 user: ddevienne)

The advantage of Approach #1 is that writes are independent and concurrent.
But you'll eventually block the threads, to copy the data over.

Approach #2 is bad, since the one DB must serialize everything,
which basically makes all threads wait on a global mutex whenever they need to write anything.

A third approach is to use a concurrent queue, enqueue from the threads,
dequeue from the main thread to write into the DB. If you are in C++,
I'd recommend https://github.com/cameron314/concurrentqueue which is
lock-free and performant. There are others from Boost, Facebook Folly, etc...

Approach #3 is similar to #2, but with the crucial difference that you are
trading more memory use (to store in the queue) for better concurrency.
And since you guarantee serialized use of SQLite, you can build it w/o mutex
support too, gaining a little more speed.

My $0.02. --DD

2021-09-21
09:45 Reply: How to find out whether a table is STRICT? (artifact: 3111642db4 user: ddevienne)

That's a shame, I like the look of your table_list pragma, Keith!

So here's to wishing SQLite grows something like that in the future.

PS: Can table_list be added w/o modifying the SQLite source code, as a pure addition?

2021-09-20
09:13 Reply: adding record if it doesn't already exist (artifact: 73aa2a7748 user: ddevienne)

Not sure if you are asking this tongue-in-cheek or not.

But the answer to this is very simple, from my experience: Drum roll...

To insert child rows, using the surrogate key RETURNING'd from the parent row (which may pre-exist) as a foreign key.

PS: Didn't read the whole thread, so maybe I'm off-topic :)
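A minimal sketch of that pattern, with hypothetical parent/child tables; note the DO UPDATE upsert, since with DO NOTHING, RETURNING yields no row when the parent pre-exists:

create table parent (id integer primary key, name text unique);
create table child (parent_id integer references parent(id), val text);

-- upsert the parent; RETURNING yields the id whether the row is new or pre-existing
insert into parent (name) values ('p1')
  on conflict(name) do update set name = excluded.name
  returning id;

-- then use that id as the foreign key for the child rows
insert into child (parent_id, val) values (1, 'c1'), (1, 'c2');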

2021-09-15
07:44 Post: INSERT INTO ... RETURNING (artifact: 058ac49cc3 user: ddevienne)

Hi. Does point 3.4 of the RETURNING docs apply even to multi-row inserts?

I mean, surely the literal row values will be processed in order, no?
I can understand 3.4 for other DMLs, but in that particular case, isn't it special?
The non-scientific example below supports that, but that's not saying much :)

Otherwise I'll have to also return something identifying the row from one of the inputs,
to know which row the ID was assigned to, and maintain a mapping in the calling code.

Just want to be 100% clear on this. Thanks, --DD

C:\Users\ddevienne>sqlite3
SQLite version 3.36.0 2021-06-18 18:36:39
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create table t(id integer primary key, name text UNIQUE);
sqlite> insert into t (name) values ('one') returning id;
1
sqlite> insert into t (name) values ('two'), ('three') returning id;
2
3
sqlite> insert into t (name) values ('four'), ('five') returning id, name;
4|four
5|five
sqlite>
2021-09-13
14:44 Reply: integer overflow in pager.c (artifact: cb2e9040d9 user: ddevienne)

So the anon-OP is right, and a fix is necessary? Not quite clear from your post.

2021-09-10
17:16 Reply: ZIPVFS questions (artifact: 0476c3ba9f user: ddevienne)

Right. That's 3 orders of magnitude larger than my case :)

I packed 4 DBs totalling 5.4MB into a <900KB zip file.

In your case, it depends how big each DB inside the ZIP would be.
And how many of those DBs you need to keep in memory at any one time.

If OTOH what is big is single cell values, you could use https://www.sqlite.org/src/file/ext/misc/compress.c instead, similar
to how https://sqlite.org/sqlar/doc/trunk/README.md emulates / replicates a ZIP.
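A minimal sketch of that approach, assuming compress.c is built as a loadable extension (it provides the compress() and uncompress() SQL functions), and using the CLI's readfile():

sqlite> .load ./compress
sqlite> create table doc (name text primary key, body blob);
sqlite> insert into doc values ('readme', compress(readfile('README.md')));
sqlite> select length(body), length(uncompress(body)) from doc;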

But it could well be that zipvfs is better for you indeed.
And going commercial, you then have the full support of DRH and Dan! :)

16:39 Reply: ZIPVFS questions (artifact: 680904c104 user: ddevienne)

Do you really need zipvfs? You talk about your storage budget, but what's your RAM budget? Compared to your storage budget?

I've just in the past few days ZIP'd up several SQLite DBs, to make them as small as possible
(also VACUUM'd and optimized the page_size for minimum size) so I can check them in to source control,
for testing, and dynamically decompress them using https://www.sqlite.org/zipfile.html at runtime,
and load the blobs as in-memory DBs using https://www.sqlite.org/c3ref/deserialize.html,
which sounds a little bit like your use case.

Worked like a charm for me, but I'm in C++, not Java, and it's all linked statically.

The only issue I can foresee is that this might waste more RAM than zipvfs, but not having used the latter, I can't say for sure.
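For illustration, the decompress-at-runtime part looks something like this (hypothetical file names); each data blob read from the archive is a complete DB image, ready to hand to sqlite3_deserialize():

sqlite> select name from zipfile('dbs.zip');
sqlite> select data from zipfile('dbs.zip') where name = 'test1.db';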

14:46 Reply: reading databases using a customized virtual file-system (artifact: 001324b7b5 user: ddevienne)

Thanks. Good to know. Takes care of registering stuff then.
But it falls short on the other objectives.

14:45 Edit reply: reading databases using a customized virtual file-system (artifact: fdc585c712 user: ddevienne)

Sure. Two main use-cases to start with.

First, embed the SQLite CLI in an existing CLI app, w/o needing source modification.
The host CLI app is multi-command (like SVN, GIT, etc...), and uses SQLite databases
already. I'd like to add a sql or sqlite command, which embeds the full CLI, so I need
a non-main() entry-point (e.g. sqlite3_cli_main(argc, argv)), and not have that CLI
command ever exit(), but simply return from the entry point.

Note that the CLI has other non-SQLite interactive modes, which can stack: I could
enter the SQLite CLI from another interactive mode, and .exit in the SQLite CLI should
just go back to the previous mode, the prompt changing back to that previous mode's.

Also, the host CLI already uses linenoise for CLI-formatting/coloring, so integration would be nice-to-have.

And the CLI has custom SQLite functions and vtable modules it might want to pre-register with the SQLite CLI.
So a pure main()-like entry-point is not ideal, in terms of flexibility.

That host CLI is a single-exe, kitchen-sink style, no need for an installer. SQLite or Fossil style :)
Update: And of course, SQLite (the library) is already linked into that CLI app.

Second, embed the SQLite CLI in a GUI app, opening in a custom dialog and not a console,
thus not using stdin/stdout/stderr, but having the possibility to have those streams provided by the host application.

Here again, calling exit() is a no-no, and it also needs a non-main() entry-point, like the CLI use-case above.
In this mode, the CLI is meant to use an existing in-memory connection, with lots of virtual tables in it.
SQLite is at the heart of the app, exposing a lot of the state of this large scientific (proprietary) application,
and having the power of the full SQLite CLI app would be great, to troubleshoot/debug that state.

I think the CLI use-case is easily achievable with a few tweaks.

The second is more difficult, and might need the CLI to use a VFS to abstract away the IO streams,
or something of that nature. In that GUI-app use-case, being able to add a .mode to present the
table-data in a native GUI-table-widget would be even better, but that's a different matter and a bit of a stretch goal.

So here you go, this is what I had in mind. FWIW. --DD

13:26 Reply: reading databases using a customized virtual file-system (artifact: 17ece0d15d user: ddevienne)

Sure. Two main use-cases to start with.

First, embed the SQLite CLI in an existing CLI app, w/o needing source modification.
The host CLI app is multi-command (like SVN, GIT, etc...), and uses SQLite databases
already. I'd like to add a sql or sqlite command, which embeds the full CLI, so I need
a non-main() entry-point (e.g. sqlite3_cli_main(argc, argv)), and not have that CLI
command ever exit(), but simply return from the entry point.

Note that the CLI has other non-SQLite interactive modes, which can stack: I could
enter the SQLite CLI from another interactive mode, and .exit in the SQLite CLI should
just go back to the previous mode, the prompt changing back to that previous mode's.

Also, the host CLI already uses linenoise for CLI-formatting/coloring, so integration would be nice-to-have.

And the CLI has custom SQLite functions and vtable modules it might want to pre-register with the SQLite CLI.
So a pure main()-like entry-point is not ideal, in terms of flexibility.

That host CLI is a single-exe, kitchen-sink style, no need for an installer. SQLite or Fossil style :)

Second, embed the SQLite CLI in a GUI app, opening in a custom dialog and not a console,
thus not using stdin/stdout/stderr, but having the possibility to have those streams provided by the host application.

Here again, calling exit() is a no-no, and it also needs a non-main() entry-point, like the CLI use-case above.
In this mode, the CLI is meant to use an existing in-memory connection, with lots of virtual tables in it.
SQLite is at the heart of the app, exposing a lot of the state of this large scientific (proprietary) application,
and having the power of the full SQLite CLI app would be great, to troubleshoot/debug that state.

I think the CLI use-case is easily achievable with a few tweaks.

The second is more difficult, and might need the CLI to use a VFS to abstract away the IO streams,
or something of that nature. In that GUI-app use-case, being able to add a .mode to present the
table-data in a native GUI-table-widget would be even better, but that's a different matter and a bit of a stretch goal.

So here you go, this is what I had in mind. FWIW. --DD

12:01 Reply: reading databases using a customized virtual file-system (artifact: 7e68358523 user: ddevienne)

Accessing DBs using a custom VFS is one thing, but the CLI itself does not use
any VFS; it uses direct C/POSIX APIs to access files. I've long wished
for the CLI to use a few abstractions, so it could be embedded in a larger
application that would provide IO, the main entry-point, etc... Oh well :)

2021-09-08
14:28 Post: Have sqlite3_deserialize() create new attached DBs (artifact: 04c9da4820 user: ddevienne)

Hi. I'm using the zipfile extension and sqlite3_deserialize() together (both for the first time), and they are great. Thanks for them!

But one thing that surprised me, was that the 2nd arg to sqlite3_deserialize() cannot be a new name.
I've worked around that by using a dummy attach 'file:foo?mode=memory' as foo, so I can use "foo" as that 2nd arg.
Repeating with bar, baz, etc... for the other DBs I attach to the main memory DB.

But my question is why not simply do the attach directly in the API?
That way, no need to create an empty throw-away memory DB for nothing, no?
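For reference, a sketch of the workaround described above, with hypothetical names:

-- create an empty named placeholder DB, so the schema name exists...
attach database 'file:foo?mode=memory' as foo;
-- ...then, from C, replace its contents with the in-memory image:
--   sqlite3_deserialize(db, "foo", buf, sz, sz, SQLITE_DESERIALIZE_FREEONCLOSE);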

2021-09-06
15:46 Reply: The characters to be escaped in Sqlite3 to prevent Sql Injection (artifact: 6a9000e140 user: ddevienne)

I didn't realize it was a live Turing test :)

13:58 Reply: The characters to be escaped in Sqlite3 to prevent Sql Injection (artifact: 758729ddd7 user: ddevienne)

And how is that relevant?

No offense, but many of your posts are off-topic and cryptic at best.
I'm even surprised they pass moderation in fact. Could you please post
only about technical content relevant to SQLite? And the OP's question?

07:02 Reply: Issues with sqlite3IsNaN() and HAVE_ISNAN (artifact: 7a22ad674a user: ddevienne)

Not sure, but it's possible it is one of those compile-time options that works only on the canonical sources, not the amalgamation.

There are a few that way, that need to be defined at configure time, not compile time.
Although that's a total guess on my part, from vague recollection of past messages on this ML.

See also how to compile, which has details and lists pre-amalgamation options,
but that list is 13 years old; perhaps that's the reason HAVE_ISNAN is not in there?

Could also be a small bug of course. We'll know soon for sure I suspect.

06:48 Reply: The characters to be escaped in Sqlite3 to prevent Sql Injection (artifact: be6a6fd88b user: ddevienne)

Well, yes. Except... If that were 100% true, then SQLite's printf wouldn't
have %q, %Q, and %w as extensions, and list them as advantages in #3.

So of course prepared statements and binding are better, but in a pinch,
proper use of sqlite3_mprintf() will do the job.

It's not by chance they are in SQLite after all.
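The same substitutions are also exposed at the SQL level by the printf() SQL function, which makes for a quick demo of %Q's quoting and escaping:

sqlite> select printf('insert into t values(%Q);', 'O''Brien');
insert into t values('O''Brien');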

2021-09-03
16:20 Reply: Database disk image is malformed (artifact: 0598ffa771 user: ddevienne)

Thanks for sharing. Very interesting.

2021-08-30
07:38 Reply: Can an application-defined function be used to convert one row of data to multiple rows and insert them into another table? (artifact: a4c47dbab3 user: ddevienne)

And here's the doc you need to get started.

2021-08-24
11:53 Post: Get SQLite via `winget` on Windows (artifact: 5c4c2a1d16 user: ddevienne)

Microsoft is finally releasing an official CLI package manager.

Would be nice to easily get the SQLite CLI tools using it.

From the little I read, the integration looks doable. FWIW.

There are other ecosystems too, of course, like Chocolatey and others,
but with winget being bundled by default with Windows soon, and pushed by
Microsoft itself, it would seem natural to support it. My $0.02.

There does not appear to be a package for SQLite yet:

C:\Users\ddevienne> winget search sqlite
Name                        Id                                    Version       Match
-------------------------------------------------------------------------------------------
Kangaroo                    Taozuhong.Kangaroo                    1.10.1.210731 Tag: sqlite
TablePlus                   TablePlus.TablePlus                   4.2.8         Tag: SQLite
DataGrip (EAP)              JetBrains.DataGrip.EarlyAccess        212.4746.58   Tag: sqlite
DataGrip                    JetBrains.DataGrip                    212.4746.86   Tag: sqlite
DBeaver                     dbeaver.dbeaver                       21.1.5        Tag: SQLite
DB Browser for SQLite       DBBrowserForSQLite.DBBrowserForSQLite 3.12.2        Tag: sqlite
Beekeeper Studio            beekeeper-studio.beekeeper-studio     1.13.2        Tag: sqlite
PremiumSoft Navicat Premium PremiumSoft.NavicatPremium            15.0.26       Tag: sqlite

While one can get PostgreSQL on Windows that way it seems:

C:\Users\ddevienne> winget search postgresql
Name                        Id                         Version       Match
------------------------------------------------------------------------------------
PostgreSQL                  PostgreSQL.PostgreSQL      13
DBeaver                     dbeaver.dbeaver            21.1.5        Tag: PostgreSQL
pgAdmin                     PostgreSQL.pgAdmin         5.6
...
09:53 Edit: Most performant way to test for having 0, 1 or more rows in a table (artifact: 9611bdc934 user: ddevienne)

Hi. We have two mutually-exclusive ON INSERT triggers on a table,
and a 3rd ON DELETE trigger.

The 1st has a WHEN (select count(*) from t) = 1,
while the 2nd has a WHEN (select count(*) from t) > 1,
and the 3rd a WHEN (select count(*) from t) = 0.

It looks simple and innocuous, but I'm worried about performance.
Because it seems to me this introduces quadratic behavior on inserts
and deletes, from the count(*) FULL-SCANs. Is that correct?

I propose to use (select count(*) from (select 1 from t limit 2))
instead of (select count(*) from t), which I think will avoid the FULL-SCANs.
But comparing the query plans, I don't understand the 2nd plan, so I'm not 100%
sure this is the better-performing approach, albeit more verbose.

sqlite> create table t (k integer primary key not null, data blob);
sqlite> select (select count(*) from (select 1 from t limit 2)) = 1;
QUERY PLAN
|--SCAN CONSTANT ROW
`--SCALAR SUBQUERY 2
   |--CO-ROUTINE 1
   |  `--SCAN TABLE t
   `--SCAN SUBQUERY 1
0
sqlite> select count(*) = 1 from t;
QUERY PLAN
`--SCAN TABLE t
0

In both plans, there's a SCAN TABLE t. Doesn't the LIMIT 2 avoid the full-scan, as I expect?
It's not obvious to me given the query plan output. I'm not versed in reading those I'm afraid...

If someone more knowledgeable could confirm my suspicion of the performance issue,
and confirm that my proposed change fixes it, I'd appreciate it. Thanks, --DD

PS: The ON DELETE trigger can/should also use EXISTS, more idiomatic, no?

sqlite> select not exists (select 1 from t);
QUERY PLAN
|--SCAN CONSTANT ROW
`--SCALAR SUBQUERY 1
   `--SCAN TABLE t
1
09:52 Edit: Most performant way to test for having 0, 1 or more rows in a table (artifact: e1c0ad6785 user: ddevienne)

Hi. We have two mutually-exclusive ON INSERT triggers on a table,
and a 3rd ON DELETE trigger.

The 1st has a WHEN (select count(*) from t) = 1,
while the 2nd has a WHEN (select count(*) from t) > 1,
and the 3rd a WHEN (select count(*) from t) = 0.

It looks simple and innocuous, but I'm worried about performance.
Because it seems to me this introduces quadratic behavior on inserts
and deletes, from the count(*) FULL-SCANs. Is that correct?

I propose to use (select count(*) from (select 1 from t limit 2))
instead of (select count(*) from t), which I think will avoid the FULL-SCANs.
But comparing the query plans, I don't understand the 2nd plan, so I'm not 100%
sure this is the better-performing approach, albeit more verbose.

sqlite> create table t (k integer primary key not null, data blob);
sqlite> select (select count(*) from (select 1 from t limit 2)) = 1;
QUERY PLAN
|--SCAN CONSTANT ROW
`--SCALAR SUBQUERY 2
   |--CO-ROUTINE 1
   |  `--SCAN TABLE t
   `--SCAN SUBQUERY 1
0
sqlite> select count(*) = 1 from t;
QUERY PLAN
`--SCAN TABLE t
0

In both plans, there's a SCAN TABLE t. Doesn't the LIMIT 2 avoid the full-scan, as I expect?
It's not obvious to me given the query plan output. I'm not versed in reading those I'm afraid...

If someone more knowledgeable could confirm my suspicion of the performance issue,
and confirm that my proposed change fixes it, I'd appreciate it. Thanks, --DD

PS: The ON DELETE trigger can/should also use EXISTS, more idiomatic, no?

sqlite> select not exists (select 1 from t);
QUERY PLAN
|--SCAN CONSTANT ROW
`--SCALAR SUBQUERY 1
   `--SCAN TABLE t
0
09:46 Edit: Most performant way to test for having 0, 1 or more rows in a table (artifact: b54e02fce9 user: ddevienne)

Hi. We have two mutually-exclusive ON INSERT triggers on a table,
and a 3rd ON DELETE trigger.

The 1st has a WHEN (select count(*) from t) = 1,
while the 2nd has a WHEN (select count(*) from t) > 1,
and the 3rd a WHEN (select count(*) from t) = 0.

It looks simple and innocuous, but I'm worried about performance.
Because it seems to me this introduces quadratic behavior on inserts
and deletes, from the count(*) FULL-SCANs. Is that correct?

I propose to use (select count(*) from (select 1 from t limit 2))
instead of (select count(*) from t), which I think will avoid the FULL-SCANs.
But comparing the query plans, I don't understand the 2nd plan, so I'm not 100%
sure this is the better-performing approach, albeit more verbose.

sqlite> create table t (k integer primary key not null, data blob);
sqlite> select (select count(*) from (select 1 from t limit 2)) = 1;
QUERY PLAN
|--SCAN CONSTANT ROW
`--SCALAR SUBQUERY 2
   |--CO-ROUTINE 1
   |  `--SCAN TABLE t
   `--SCAN SUBQUERY 1
0
sqlite> select count(*) = 1 from t;
QUERY PLAN
`--SCAN TABLE t
0

In both plans, there's a SCAN TABLE t. Doesn't the LIMIT 2 avoid the full-scan, as I expect?
It's not obvious to me given the query plan output. I'm not versed in reading those I'm afraid...

If someone more knowledgeable could confirm my suspicion of the performance issue,
and confirm that my proposed change fixes it, I'd appreciate it. Thanks, --DD

09:30 Post: Most performant way to test for having 0, 1 or more rows in a table (artifact: 15c664f67e user: ddevienne)

Hi. We have two mutually-exclusive ON INSERT triggers on a table,
and a 3rd ON DELETE trigger.

The 1st has a WHEN (select count(*) from t) = 1,
while the 2nd has a WHEN (select count(*) from t) > 1,
and the 3rd a WHEN (select count(*) from t) = 0.

It looks simple and innocuous, but I'm worried about performance.
Because it seems to me this introduces quadratic behavior on inserts
and deletes, from the count(*) FULL-SCANs. Is that correct?

I propose to use (select count(*) from (select 1 from t limit 2))
instead of (select count(*) from t), which I think will avoid the FULL-SCANs.
But comparing the query plans, I don't understand the 2nd plan, so I'm not 100%
sure this is the better-performing approach, albeit more verbose.

sqlite> create table t (k integer primary key not null, data blob);
sqlite> select (select count(*) from (select 1 from t limit 2)) = 1;
QUERY PLAN
|--SCAN CONSTANT ROW
`--SCALAR SUBQUERY 2
   |--CO-ROUTINE 1
   |  `--SCAN TABLE t
   `--SCAN SUBQUERY 1
0
sqlite> select count(*) = 2 from t;
QUERY PLAN
`--SCAN TABLE t
0

In both plans, there's a SCAN TABLE t. Doesn't the LIMIT 2 avoid the full-scan, as I expect?
It's not obvious to me given the query plan output. I'm not versed in reading those I'm afraid...

If someone more knowledgeable could confirm my suspicion of the performance issue,
and confirm that my proposed change fixes it, I'd appreciate it. Thanks, --DD
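A minimal sketch of what the proposed WHEN clause would look like on one of those triggers, with a hypothetical log table standing in for the real trigger body:

create table log (msg text);
create trigger t_first_insert after insert on t
when (select count(*) from (select 1 from t limit 2)) = 1
begin
  insert into log values ('t went from empty to non-empty');
end;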

2021-08-03
06:59 Reply: null character sorts greater than 0x7f and less than 0x80 (artifact: d96aed8aff user: ddevienne)

I don't disagree, but at the same time, since SQLite already has code for UTF-8,
it would be useful to have an is_utf8() built-in function for those who want to opt in
to strict UTF-8 enforcement using a CHECK constraint.
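What that opt-in could look like; to be clear, is_utf8() is the wished-for built-in, it does not exist in SQLite today:

-- hypothetical: is_utf8() is not a real SQLite function (yet)
create table msg (
  body text check (is_utf8(body))
);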

2021-07-30
13:27 Reply: LEFT JOIN Misunderstanding (artifact: 134b052945 user: ddevienne)

FWIW, I much prefer https://sqlite.org/forum/forumpost/bd05c5ea6ee5fb91?t=h,
i.e. ON should only be with columns from the 2 joined tables on both sides,
and WHERE with only a single column from any table and a literal (typically).

The way you mix ON-predicates and WHERE-predicates in ON now,
or everything in WHERE before, confuses things IMHO (and was wrong for the latter).

It's partly a matter of style.
I like to think of it as ON being about stitching tables together, and WHERE about filtering.
The former is also a filtering of sorts, yet different in my mind. YMMV :)
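To illustrate that style with hypothetical tables:

-- join predicates in ON (stitching), filter predicates in WHERE
select o.id, c.name
  from orders o
  left join customers c on c.id = o.customer_id
 where o.status = 'open';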

2021-07-29
08:17 Reply: Virtual RDBMS over SQLite (artifact: e2fd4f50bd user: ddevienne)

The only advantage of the design you describe is that the schema is fixed:
you never have to do any DDL, only DML. But that seems like a pretty terrible schema IMHO.
Is DDL non-transactional in SQL Server, like in Oracle, but unlike SQLite and PostgreSQL?
That could be a reason for such a design, maybe.

SQLite is great at storing structured data, or even semi-structured data, e.g. SQLAR, and you also have JSON1 too.
So sure, I'd go with SQLite myself, assuming I understand where you are going.

It's up to you whether to use a single table, many tables, or even several different DB files.
Unless you have thousands of tables, SQLite will be fine. SQLite is optimized for many rows, not many tables;
still, it will do great with a few hundred to a thousand tables, just don't go into the tens of thousands of tables (or shard them across several DBs).

Have fun! --DD

2021-07-20
16:02 Reply: cycle detection in CHECK constraint with recursive CTE (artifact: cb6417fdf5 user: ddevienne)

OK, sure. What I propose is a pure TREE, a simpler form of DAG indeed.
But still, what you have will not scale well. At least index the from/to tag,
and probably also have a UNIQUE(from, to) constraint too.

That way the WHERE from = ... queries will range-scan that auto-index,
and the WHERE to = ... ones will either skip-scan that same index,
or use an explicit (non-UNIQUE) index you create on it. Look at the plans.

You probably also want ON DELETE CASCADE in your FKs, to auto-delete edges
when tags are removed. Unless your ORM does that in code instead?

And you also don't need an id column for edges, but I suspect it's your ORM adding it, right?
I'd make the edges table a WITHOUT ROWID table, and have the PK be (from, to),
effectively replacing the UNIQUE-index I proposed above.

Still, I somehow feel there's a better way to do what you want, I just don't see it now. Maybe I'm just imagining things.

PS: Also, don't forget to enable pragma foreign_keys = 1, unlike me!
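Putting those suggestions together, a sketch of what the edges table could look like (hypothetical names; from and to must be quoted, being SQL keywords):

pragma foreign_keys = 1;
create table edge (
  "from" integer not null references tag(id) on delete cascade,
  "to"   integer not null references tag(id) on delete cascade,
  primary key ("from", "to")  -- replaces the proposed UNIQUE(from, to)
) without rowid;
create index edge_to on edge("to");  -- if the skip-scan is not enough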

14:35 Reply: ROWID in compound SELECT (artifact: 79fa2cae2e user: ddevienne)

See point #3 of https://www.sqlite.org/changes.html for 3.36

14:32 Edit reply: cycle detection in CHECK constraint with recursive CTE (artifact: 04a82fb13f user: ddevienne)

I think your schema is to blame. If you want a DAG, then use a single table, with a self-referential parent.

Most DBMS's implement immediate Foreign-Keys by default, unlike SQLite. (see correction in follow-up posts)
So with a table such as:

CREATE TABLE IF NOT EXISTS "tag" (
  "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, 
  "name" varchar(40) NOT NULL UNIQUE,
  "parent" integer REFERENCES "tag" ("id"),
  UNIQUE(parent, name) -- or UNIQUE(name), depends what you want
);
You cannot even insert a cycle with that schema, ensuring a DAG.

With deferred FKs, the only mode in SQLite (see later posts), you then need to resort
to a TRIGGER to detect cycles, but that trigger can be much faster,
since it starts from the leaves, instead of the roots like yours above.
It only has to detect a duplicate ID when traversing from the one leaf
(that fired the trigger) to its root, which is log(N).

Maybe I missed something, and the above is wrong. I'll know soon I guess :)

PS: AFAIK, a CHECK constraint can only access the local row, not do an arbitrary query,
so that approach was flawed from the get-go, I believe. Happy to be corrected on that.

14:31 Edit reply: cycle detection in CHECK constraint with recursive CTE (artifact: 2a91f2f203 user: ddevienne)

Well, this looks like deferred to me, no?

C:\Users\ddevienne>sqlite3
SQLite version 3.33.0 2020-08-14 13:23:32
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create table tag (id integer primary key, name text, parent integer references tag (id), UNIQUE(parent, name));
sqlite> begin;
sqlite> insert into tag values (1, "leaf", 0);
sqlite> insert into tag values (0, "root", null);
sqlite> commit;
sqlite> select * from tag order by id;
0|root|
1|leaf|0
sqlite>

Update: Rah, I always forget about pragma foreign_keys = 1; !!! So you are right indeed. My bad!

C:\Users\ddevienne>sqlite3
SQLite version 3.33.0 2020-08-14 13:23:32
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> pragma foreign_keys = 1;
sqlite> create table tag (id integer primary key, name text, parent integer references tag (id), UNIQUE(parent, name));
sqlite> begin;
sqlite> insert into tag values (1, "leaf", 0);
Error: FOREIGN KEY constraint failed
14:30 Edit reply: cycle detection in CHECK constraint with recursive CTE (artifact: 3d34635057 user: ddevienne)

Well, this looks like deferred to me, no?

C:\Users\ddevienne>sqlite3
SQLite version 3.33.0 2020-08-14 13:23:32
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create table tag (id integer primary key, name text, parent integer references tag (id), UNIQUE(parent, name));
sqlite> begin;
sqlite> insert into tag values (1, "leaf", 0);
sqlite> insert into tag values (0, "root", null);
sqlite> commit;
sqlite> select * from tag order by id;
0|root|
1|leaf|0
sqlite>

Update: Rah, I always forget about pragma foreign_keys = 1; !!! So you are right indeed. My bad!

14:28 Reply: cycle detection in CHECK constraint with recursive CTE (artifact: 4c3adefb82 user: ddevienne)

Well, this looks like deferred to me, no?

C:\Users\ddevienne>sqlite3
SQLite version 3.33.0 2020-08-14 13:23:32
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create table tag (id integer primary key, name text, parent integer references tag (id), UNIQUE(parent, name));
sqlite> begin;
sqlite> insert into tag values (1, "leaf", 0);
sqlite> insert into tag values (0, "root", null);
sqlite> commit;
sqlite> select * from tag order by id;
0|root|
1|leaf|0
sqlite>
12:23 Reply: cycle detection in CHECK constraint with recursive CTE (artifact: d13515cf30 user: ddevienne)

I think your schema is to blame. If you want a DAG, then use a single table, with a self-referential parent.

Most DBMS's implement immediate Foreign-Keys by default, unlike SQLite.
So with a table such as:

CREATE TABLE IF NOT EXISTS "tag" (
  "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, 
  "name" varchar(40) NOT NULL UNIQUE,
  "parent" integer REFERENCES "tag" ("id"),
  UNIQUE(parent, name) -- or UNIQUE(name), depends what you want
);
You cannot even insert a cycle with that schema, ensuring a DAG.

With deferred FKs, the only mode in SQLite, you then need to resort
to a TRIGGER to detect cycles, but that trigger can be much faster,
since it starts from the leaves, instead of the roots like yours above.
It only has to detect a duplicate ID when traversing from the one leaf
(that fired the trigger) to its root, which is log(N).

Maybe I missed something, and the above is wrong. I'll know soon I guess :)

PS: AFAIK, a CHECK constraint can only access the local row, not do an arbitrary query,
so that approach was flawed from the get-go, I believe. Happy to be corrected on that.
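A sketch of what such a leaf-to-root cycle-detecting trigger could look like for the re-parenting (UPDATE) case, using a recursive CTE; hypothetical and untested:

create trigger tag_no_cycle before update of parent on tag
when exists (
  with recursive anc(id) as (
    select new.parent
    union
    select t.parent from tag t join anc on t.id = anc.id
  )
  select 1 from anc where id = new.id
)
begin
  select raise(abort, 'cycle detected');
end;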

12:02 Reply: cycle detection in CHECK constraint with recursive CTE (artifact: ad291bd3ec user: ddevienne)

FWIW, WHERE EXISTS often uses a SELECT 1 FROM ... WHERE ... query, because it does not matter what you select;
it's the mere fact that a single row exists that matters. So selecting anything else is just wasting cycles :).
Not that it matters performance-wise, most likely, so just an FYI.

2021-07-19
12:11 Reply: Populating a tree from a flat table of subcategories (artifact: c9993bacc6 user: ddevienne)

Start with https://www.sqlite.org/lang_with.html

But it could be done with a 3-way join too, since you explicitly want only 3 levels.
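A sketch of the 3-way join variant, assuming a hypothetical category(id, name, parent_id) table:

select c1.name as lvl1, c2.name as lvl2, c3.name as lvl3
  from category c1
  left join category c2 on c2.parent_id = c1.id
  left join category c3 on c3.parent_id = c2.id
 where c1.parent_id is null;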

2021-07-08
17:24 Reply: How to detect invalid trigger w/o actually running it? (artifact: eea3380888 user: ddevienne)

Thanks for the suggestion. Although with triggers, it can't be just SELECTs,
and coming up with all kinds of dummy statements to explain, just so one can be
sure there's no mistake hiding in the schema, seems less than ideal.

Surely there has got to be a better way?

And if not at the moment, shouldn't there be a pragma check_schema in the future,
that checks that everything declared in sqlite_schema is OK, referencing existing tables/views/columns? Opt-in, of course.

08:40 Post: How to detect invalid trigger w/o actually running it? (artifact: 9264574ffd user: ddevienne)

Hi. We've had a mishap with a trigger referencing an unknown table,
which was detected too late, by internal users. The SQL script doing
the upgrade (that added the faulty trigger) was run during tests,
and installed successfully w/o errors, but of course actually using it
yielded errors.

Validating all triggers at runtime is currently not something we can do...

So is there a way to quickly detect any invalid objects, like triggers,
views, etc... which does not imply running them (via a DML statement)?

Something we could run post-schema-upgrade or post-schema-creation, to
avoid a repeat of the above mishap?

Thanks, --DD

PS: Note that we may have pragma_legacy_alter_table, I'm not 100% sure,
on the connection doing the upgrade, if that makes a difference.

2021-07-07
09:29 Edit reply: How ddo I specify options with MSVC (artifact: f31f52a8fe user: ddevienne)

You just add /D "SQLITE_MAX_COLUMN=16384" on your CL command line.

You may also want to add /O2 to make it fast too.

But of course, having 16K columns is suspicious from a relational design POV :)
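For instance, a full command line building the CLI that way, following the usual MSVC recipe (a sketch; adjust to your setup):

cl /O2 /D "SQLITE_MAX_COLUMN=16384" shell.c sqlite3.c -Fesqlite3.exe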

09:24 Reply: How ddo I specify options with MSVC (artifact: 0ae9c9a83e user: ddevienne)

You just add /D "SQLITE_MAX_COLUMN=16384" on your CL command line.

But of course, having 16K columns is suspicious from a relational design POV :)

2021-07-06
19:02 Reply: shared in memory db: :memory:+shared cache OR memdb VFS? (artifact: 6700ab1f9f user: ddevienne)

Well, the point is that shared-cache is frowned upon, and kinda deprecated,
except on very low resources devices, which is precisely not the case of wanting
to use it on large multi-core desktop machines. Except shared-cache was the only
known way to have separate SQLite connections sharing the same in-memory DB within
the bounds of a single process, allowing concurrent access to the same DB from several
threads. Until MemDB VFS. But then you say they end up being the same code... that's confusing.

Shared-cache implies normal journal mode, with readers blocking writers, and the writer
blocking readers; WAL is not supported. Yet nothing in theory prevents a VFS from supporting
WAL mode, as long as that VFS supports SHM, which it can if that VFS is in-memory.
Thus my question whether MemDB VFS supports WAL mode for true MVCC.

Hopefully everything will be clearer soon. --DD

14:48 Reply: shared in memory db: :memory:+shared cache OR memdb VFS? (artifact: d33f5d5dd1 user: ddevienne)

No WAL mode and no MVCC then, I guess :(

Too bad. I thought going for the VFS approach was precisely to gain WAL mode
for shared in-memory DBs, so I don't quite understand why a separate approach
was added, if it ends up going through the same code in the end.

There must be something I'm missing. Looking forward to the new doc.
