SQLite Forum

Timeline

50 most recent events by user kmedcalf occurring on or after 2021-11-17 18:12:14.

2021-12-19
21:50 Reply: HEX - What is the reverse function? artifact: 9520fa78ea user: kmedcalf

Yes, that is fine. I would consider it Public Domain. There is no warranty, neither express nor implied. If you break it, you own both halves.

2021-12-12
03:32 Reply: SQLite3 select (with table scan) not saturating NVMe throughput artifact: 73125e3e63 user: kmedcalf

I did some testing on Windows 10 Pro for Workstations.

I/O write (continuous append) to a Samsung SSD (NVMe) was about 280 MB/s and read (table scan) was 380 MB/s. This used one entire "CPU thread" worth of CPU at 3.9 GHz.

I/O write (continuous append) to a Samsung SATA III SSD (with RAPID installed so that it can do proper queueing and scatter/gather) was slightly better at 340 MB/s write and read at about 512 MB/s. This consumed the same CPU (one execution thread 100% at 3.9 GHz).

Doing the same on a PCIe x4 (Thunderbolt) attached RAID array with "standard spinning rust" performed the write at ~240 MB/s and read at ~340 MB/s.

So your result seems in line with expected CPU and I/O performance on Windows.

00:36 Reply: SQLite3 select (with table scan) not saturating NVMe throughput artifact: 9ed7985630 user: kmedcalf

This limit is the same whether running SQLite3 from the command line, or via the Python 3 API.

This is expected since both are single-thread and thus cannot use more than one "thread of execution" on the CPU. Also, without special drivers Windows does not handle scatter-gather, multiple command presentement (queueing), nor block devolution/aggregation from a single thread.

The benchmark testing does direct I/O and bypasses most of the Windows I/O subsystems so that it can perform queueing and multithread presentation (which I/O originating from the Windows Filesystems will not do).

00:10 Reply: SQLite3 select (with table scan) not saturating NVMe throughput artifact: ca57132e2f user: kmedcalf

An AMD Ryzen 7 CPU and X570 chipset will have directly CPU-attached PCIe lanes, eliminating the multiplexed PCIe bottlenecks commonly present in Intel PCIe architecture.

This leaves the Operating System, Microsoft Windows, or the NVMe device or drivers themselves. Which NVMe driver are you using, what storage device, and what are the Windows Device and IO settings?

2021-12-11
22:39 Reply: SQLite3 select (with table scan) not saturating NVMe throughput artifact: 43e016e488 user: kmedcalf

This sounds like a hardware issue (you were sold a "bill of goods" because you did not ask the right questions).

Nor did you mention the Operating System, OS version, nor any particulars about the computer system itself (and the particular CPU and chipset), all of which are relevant to the question asked.

That said, unless you paid a lot of money and purchased your hardware from a very reliable source, it is quite likely that your NVMe device is connected onto the low-bandwidth shared PCIe bus expansion and not CPU connected PCIe lanes (this is typical with all Intel CPUs until you get to the very tippy toppy end -- they simply do not have the PCIe lanes available to be able to obtain any meaningful performance whatsoever).

This is like in the "old days" when many manufacturers built "gigabit" ethernet adapters that used the USB 1.0 bus interface (which cannot even sustain the bandwidth of StarLAN, let alone 10 megabit ethernet), and sold them as "10/100/1000" network adapters. Yes, they did comply with the gigabit ethernet specification, but could only move a couple of Mbps.

Similarly hard drives which were attached to a "super fast ultra whupputy diggity dog" SATA interface and were sold as being ultra fast (because of the raw spec of an unencumbered SATA controller) but had an internal transfer rate that was very slow.

Benchmark utilities (and the people who rely on them) usually do not measure real-world performance. They make a very doctored (as in arranged) benchmark in order to part the foolish with their money.

The same game of "fool the foolish" is played by manufacturers in the RAM market in order to sell common very low speed RAM with crappy access characteristics as if it were high performance RAM.

Similarly, utilizing multiple CPU cores requires (a) a CPU that can run multiple cores at once (this is often a super-secret limitation of Intel CPUs that they do not want you to know about), (b) software which can execute on multiple cores/threads simultaneously, and (c) decent I/O handling.

So no, there is no inherent limitation in SQLite3 (nor could there be) which would limit its throughput. There are no "sleep(10000 - versionnumber)" calls scattered about to make each version appear faster than the previous, such as some software companies do (I will not mention Microsoft by name).

When dinosaurs ruled the earth the usual cause was also defective hardware implementation, often caused by nutbars putting high throughput devices onto low capability channels, or mixing channel usage inappropriately.

14:33 Reply: MAX performance artifact: 6adf81aeae user: kmedcalf

Example (note you extend the list differently but bind the same):

select column1 as stream,
       (
        select max(id)
          from events
         where stream == column1
       ) as id
  from (
        values (?)
       )
;

You make the list of values longer by appending ,(?) to the values line.
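
For example, the same statement extended to two streams (the events table is the one from this thread; bind one stream per placeholder, exactly as before):

select column1 as stream,
       (
        select max(id)
          from events
         where stream == column1
       ) as id
  from (
        values (?), (?)
       )
;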

14:24 Reply: MAX performance artifact: 3f730f3feb user: kmedcalf

Yes, it would work better running the simpler select for each stream one at a time -- you only need to prepare the statement once, just bind one stream at a time to get the result. Alternatively you can send the parameters in a carray or in a temp table or something so that the (sub)query finding the max is executed by itself to guarantee that the minmax optimization is applied.
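
A sketch of the temp-table variant (the table name wanted is made up; the events table is the one from this thread). Each max(id) remains a simple correlated subquery, so the minmax optimization can still apply:

create temp table wanted(stream primary key) without rowid;
insert into wanted values (?), (?), (?);   -- bind the streams of interest

select w.stream,
       (select max(id) from events where events.stream == w.stream) as id
  from temp.wanted as w
;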

13:57 Reply: Close/Dispose does not release lock artifact: 502f172369 user: kmedcalf

What happens if you use the first script and change .output to .once ?

13:50 Reply: MAX performance artifact: 2a8fb12cd0 user: kmedcalf

Using select stream, max(id) as id from events where stream == ? will accept exactly the same bindings and return exactly the same result (except if you send a NULL parameter binding).

To answer your original question: You asked the Query Planner to solve a query that has a GROUP BY and an IN operator (even though the list has only one element, strangely enough). The query optimizer does not know that you asked a bad question thus gave you the answer to the question that you asked, even though it took much longer to answer because the question was bad.

12:39 Reply: MAX performance artifact: fa6e4e3ce5 user: kmedcalf

Since you already know the stream, then why do you not simply do the following:

select max(id)
  from events
 where stream == ?
;

which will accept the same binding and return the only item of information that you do not already have.

2021-12-08
05:56 Reply: how to do "Explain Query Plan" using code (c++ or rust)? artifact: e01882ab44 user: kmedcalf

Slight correction.

If SQLITE_ENABLE_UNKNOWN_SQL_FUNCTION is defined when you build the CLI, then you can open a database that has the unknown function in the schema (ie, in an index).

You can then use EXPLAIN and EXPLAIN QUERY PLAN in the CLI. The unknown function will only cause an error to be thrown if you attempt to execute a statement that needs it (such as updating the table (though you can do whatever the heck you want so long as the function that is missing is not required)).
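
A hypothetical CLI session illustrating this (the database, table, and function names are borrowed from elsewhere in this thread; the plan text will vary):

sqlite> .open thedb.db
sqlite> explain query plan select * from thetable where json_field('name', raw_blob_field) == 'something';

With SQLITE_ENABLE_UNKNOWN_SQL_FUNCTION the plan is produced even though json_field is not registered in the CLI.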

05:30 Reply: how to do "Explain Query Plan" using code (c++ or rust)? artifact: 59f56e8e4c user: kmedcalf

You can add the UDF to SQL by taking the code and compiling it as a load library (called a DLL on Windows, shared object (so) on *nix, other names depending on the OS -- you did not specify the OS), then using the .load CLI command to load the load library into the CLI.

As long as you have created the index, ala:

create index idx on thetable (json_field('name', raw_blob_field));

Once the index has been created, it can be used. The function json_field is only required if (a) you call it explicitly or (b) you insert or update the table on which it is defined. (You may need to compile the CLI with SQLITE_ENABLE_UNKNOWN_SQL_FUNCTION defined if the CLI you have does not have that defined -- you can find out by opening the database in the CLI. Without SQLITE_ENABLE_UNKNOWN_SQL_FUNCTION you will get an error accessing the database.)

If you issue SQL of the following form:

SELECT * FROM thetable WHERE json_field('name', raw_blob_field) == 'something';

then the index will be used. The function json_field does not need to be present unless you attempt to update the table (or otherwise use the function).
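
If you do have the UDF compiled as a loadable extension, a hypothetical session (the extension file name is assumed) to confirm the index is being used might look like:

sqlite> .load ./json_field
sqlite> .eqp on
sqlite> SELECT * FROM thetable WHERE json_field('name', raw_blob_field) == 'something';

The query plan output should then show something along the lines of SEARCH thetable USING INDEX idx (<expr>=?).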

01:31 Reply: How to skip intermediate data of DB in file? artifact: c775de7bbf user: kmedcalf

Note to the OP:

You need to use the facilities of the Operating System to manage the process heap. For example, on Windows you can define SQLITE_WIN32_MALLOC and SQLITE_WIN32_HEAP_CREATE so that a separate arena is allocated to SQLite3.

You can then call sqlite3_win32_compact_heap or sqlite3_win32_reset_heap to either compact or reset the heap.

You can probably do the same thing on other Operating Systems.

2021-12-07
21:24 Reply: How to skip intermediate data of DB in file? artifact: 5fafa6fd62 user: kmedcalf

To clear memory of accumulated data of SQLite, close all SQLite connections.

This is untrue. This will "free" the memory used by SQLite3 (or whatever else you close and free). It will not, however, reduce the size of the virtual arena which has been allocated by the Operating System to satisfy memory allocation requests from the executing process.

That is, doing something like p = malloc(1000000) will allocate 1 million bytes (pointed to by p) from the virtual arena to the process. If the arena is too small, then the address space of the arena will be increased (alternatively another V:V block may be allocated to the process by the Operating System from which the allocation request can be satisfied).

Subsequently executing free(p) will return the block of 1 million bytes back into the memory allocation arena provided to the process by the Operating System.

However the Operating System manages the V:V mapped memory blocks allocated to the process's arena, the size of the arena will not necessarily decrease.

Let's take a specific example on Windows x64.

The process makes memory allocation (malloc, calloc, etc) requests which are satisfied from a "pool" (V:V arena) allocated by the Operating System to the process. When this memory is released by the process (free) nothing at all happens except that the allocation is moved to the process runtime free list.

Only when the process terminates is the V:V memory arena released back to the Operating System free pool.

Some runtimes and some Operating Systems may "coalesce" their respective free lists, so that only when an entire "arena" block allocated to the process (or one of the arena blocks) is entirely free can that arena block be deallocated from the process by the Operating System.

This means that, as has been the case since forever, processes which do much malloc/free will expand so that the total size of all arena blocks is sufficient that the use of malloc/free does not require the Operating System to allocate more V:V address blocks to the process.

Badly written software (including ill-conceived malloc/free) will result in what is called "memory fragmentation".

You may wish to refer to https://sqlite.org/malloc.html

There are many scholarly articles available which discuss memory allocation behaviour, though a proper analysis requires that you have a background in mathematics (not arithmetic).

04:46 Reply: Windows Build default compile options? artifact: 7236426d5a user: kmedcalf

I believe that the default download is cross-compiled but the symbol table has not been stripped.

Again, inclusion of symbols in the symbol table is dependent on the platform. Some platforms export the entire (global and private) symbol tables by default (and you use the strip command to remove them, or tell the compiler on the command line to strip the symbols). Other platforms (such as Microsoft) do not export a symbol table at all unless you specifically direct the compiler/linker to do that.

04:07 Reply: Windows Build default compile options? artifact: ba57ce897a user: kmedcalf

Add the option (define) -DSQLITE_API=__declspec(dllexport) to the command line.

cl -O2 -DSQLITE_API=__declspec(dllexport) -DSQLITE_ENABLE_SESSION -DSQLITE_ENABLE_FTS4 -DSQLITE_ENABLE_RTREE -DSQLITE_DQS=0 -DSQLITE_ENABLE_COLUMN_METADATA sqlite3.c -link -dll -out:sqlite3.dll

Some platforms export all global symbols by default, and some do not. Windows happens to be one of those platforms that does not export globals by default.

You must label them to be exported; or, you may use a method like the default Makefile.msc uses and generate a .def file and pass that in to the linker so that it knows what to do (differently than the default).

It is easier, though: SQLite3 is written such that you can specify modifiers to prefix the various types of symbols (API, PRIVATE, etc). You define the API modifier, SQLITE_API, to the value __declspec(dllexport), which will cause the compiler to add the dllexport to those declarations. The linker will then utilize that tag to export the named symbols from the DLL.

2021-12-06
02:04 Reply: SQLite Testing artifact: 70026e7a73 user: kmedcalf

Hehe.

On bitty-boxen, MDBS and dbVista both implement(ed) CODASYL Network Extended databases. EDMS (Xerox program product running under CP-V -- in this case on a Sigma 7) also provides CODASYL Network Extended database organization. There are others.

Relational or Pure Hierarchical (ie, IMS) need to use a "linkage table" to model N:M relationships.

2021-12-05
02:17 Edit reply: Error: stepping, unable to open database file (14) artifact: 4238f24b08 user: kmedcalf

Have you checked permissions? That is file permissions as controlled and implemented by the Operating System. Unfortunately the CLI does not tell you the extended error code, which might be informative in this situation.

02:15 Reply: Error: stepping, unable to open database file (14) artifact: f8ef1c9342 user: kmedcalf

Have you checked permissions? That is file permissions as controlled and implemented by the Operating System.

02:10 Reply: Proposed new date/time function to return a unix timestamp artifact: 741263fb7b user: kmedcalf

As long as you account for "estimates being nearly equal" there will be no problems.

Storing the timestamp as a scaled integer does not really help as you still have to solve the same problem -- that is, two values which appear identical may originate from a source that is incorrect, and two values which appear to differ by a millisecond may in fact only differ by a picosecond.

2021-12-03
20:43 Reply: SQLite Testing artifact: f70416f7f3 user: kmedcalf

You could just execute it. However, it will not work (neither as intended nor as designed nor as built).

FIrstly, the parent key of a foreign-key relationship must be unique.
Secondly, the child record cannot reference multiple parents.
The parent:child relationship must be 1:N and cannot be N:1.

While a CODASYL database (in particular, a Network Extended CODASYL database) permits a "child" to have more than one "parent" (ie, the parent instance can be a set), such extensions do not apply to either hierarchical databases nor to relational databases.

This has been the state of affairs since about 1956 ...

2021-12-02
18:27 Reply: Will the database file be corrupted if using MEMORY journal mode ? artifact: 7e763a5bd4 user: kmedcalf

This is incorrect.

The journal mode itself is not persistently saved with the database; the database records only whether or not the journal_mode is WAL. If the journal_mode is not WAL, then the journal_mode is reset to the default (the default default is DELETE) when the database is opened.

That is, journal_mode=WAL is persistent.
None of the other journal_mode settings are persisted.
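
A minimal CLI sketch of the difference (test.db is a hypothetical file; the usual default default of DELETE is assumed):

sqlite> .open test.db
sqlite> pragma journal_mode = memory;   -- reports memory, but only for this session
sqlite> .open test.db
sqlite> pragma journal_mode;            -- back to the default delete
sqlite> pragma journal_mode = wal;      -- wal is recorded in the database file
sqlite> .open test.db
sqlite> pragma journal_mode;            -- still wal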

2021-11-30
21:55 Reply: Proposed new date/time function to return a unix timestamp artifact: 8e66227e40 user: kmedcalf

I suggest to add functions along the line of unix_epoch(), unix_epoch_ms(), and maybe even unix_epoch_us(), each converting a datetime value into a sufficiently large integer number with second / millisecond / microsecond precision, respectively.

The internal datetime structure maintains the datetime as a 64-bit integral count of milliseconds since the julian epoch. This necessarily entails that the maximum precision is 1 millisecond. Notwithstanding this limitation on precision, accuracy is also limited by the Operating System. Although the datetime returned on unix is precise to the nanosecond (and on Windows, to the hun -- hundreds of nanoseconds) the granularity of the sampling is controlled by the OS -- Windows only updates the system clock on a "tick", which is usually about 16 ms, although this can be changed.

Using a stamp value which is in the unit days since the julian epoch date, or in seconds since the unix epoch date, expressed as a double precision floating point number, is sufficiently accurate to express the timestamp as a number with millisecond precision with at least one guard digit over the entire range of ISO-8601 datetime text strings (0000-01-01T00:00:00.000Z to 9999-12-31T23:59:59.999Z) recognized by SQLite3.

sqlite> create table x(x);
sqlite> insert into x values ('0000-01-01 00:00:00.000');
sqlite> insert into x values ('1970-01-01 00:00:00.000');
sqlite> insert into x values ('2021-11-30 21:28:14.234');
sqlite> insert into x values ('9999-12-31 23:59:59.999');
sqlite> select x, julianday(x), epsilon(julianday(x))*86400, unixtime(x), epsilon(unixtime(x)) from x;
┌─────────────────────────┬──────────────────┬─────────────────────────────┬──────────────────┬──────────────────────┐
│            x            │   julianday(x)   │ epsilon(julianday(x))*86400 │   unixtime(x)    │ epsilon(unixtime(x)) │
├─────────────────────────┼──────────────────┼─────────────────────────────┼──────────────────┼──────────────────────┤
│ 0000-01-01 00:00:00.000 │ 1721059.5        │ 2.01165676116943e-05        │ -62167219200.0   │ 7.62939453125e-06    │
│ 1970-01-01 00:00:00.000 │ 2440587.5        │ 4.02331352233887e-05        │ 0.0              │ 1.11022302462516e-16 │
│ 2021-11-30 21:28:14.234 │ 2459549.39460919 │ 4.02331352233887e-05        │ 1638307694.234   │ 2.38418579101563e-07 │
│ 9999-12-31 23:59:59.999 │ 5373484.49999999 │ 8.04662704467773e-05        │ 253402300799.999 │ 3.0517578125e-05     │
└─────────────────────────┴──────────────────┴─────────────────────────────┴──────────────────┴──────────────────────┘

NB: the UnixTime code was posted previously; it differs from the new unixepoch function in that it returns a double precision floating point value which includes milliseconds.

which is prone to floating-point precision limits resulting in flipping sub-second digits.

This is what is referred to as an old wives tale. It is promulgated by the same persons who barf up the same demonstration of ignorance when dealing with storing money in floating point. As you can see from the above table, a double does not contain enough precision to express the value reliably to more than millisecond precision.
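
A back-of-envelope check of the epsilon columns above (my arithmetic; ulp is the unit in the last place of an IEEE double):

ulp(x) = 2^(floor(log2(|x|)) - 52)

ulp(253402300799.999)         = 2^(37 - 52)         = 2^-15          ≈ 3.0518e-05 s
ulp(5373484.49999999) * 86400 = 2^(22 - 52) * 86400 = 2^-30 * 86400  ≈ 8.0466e-05 s

Both are comfortably below the 1 millisecond internal resolution.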

2021-11-29
17:20 Edit reply: Proposed new date/time function to return a unix timestamp artifact: 73fc7a8fa7 user: kmedcalf

I simply use this function in the core (cannot be a UDF since it uses internal data structures):

/*
** UnixTime function returns the floating point unix epoch time.
** It uses the iJD maintained by the VDBE and is therefore only
** precise to a millisecond.
*/

static void _UnixTime(sqlite3_context *context, int argc, sqlite3_value **argv) {
    DateTime x;
    if (isDate(context, argc, argv, &x) == 0) {
        sqlite3_result_double(context, ((double)(x.iJD - 210866760000000ll)) / 1000.0);
    }
}

17:19 Reply: Proposed new date/time function to return a unix timestamp artifact: bbdefae9c7 user: kmedcalf

I simply use this function in the core (cannot be a UDF since it uses internal data structures):

/*
** UnixTime function returns the floating point unix epoch time.
** It uses the iJD maintained by the VDBE and is therefore only
** precise to a millisecond.
*/

static void _UnixTime(sqlite3_context *context, int argc, sqlite3_value **argv) {
    DateTime x;
    if (isDate(context, argc, argv, &x) == 0) {
        sqlite3_result_double(context, ((double)(x.iJD - 210866760000000ll)) / 1000.0);
    }
}

2021-11-26
21:35 Reply: sqlite CLI: .timer - should be whole transaction? artifact: 626174198b user: kmedcalf

Excellent. If you need millisecond precision then use the julianday function, and multiply the difference by 86400 to get the difference in seconds.milliseconds.
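
For example, a self-contained sketch (the temp table is just somewhere to stash the start time):

create temp table t0 as select julianday('now') as start;
-- ... run the batch of statements here ...
select (julianday('now') - start) * 86400.0 as elapsed_seconds from t0;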

20:37 Reply: sqlite CLI: .timer - should be whole transaction? artifact: c40c38258d user: kmedcalf

Is this the correct behaviour/useful?

Yes, this is the correct behaviour. Usefulness is in the eye of the beholder.

I happen to be more interested in the total time/summary for the whole batch, rather than a list of individual timings.

As you have discovered, .timer does not do this. If you want to do this, then I would suggest that you do what you require. One such possibility is:

timethis sqlite3 dumdum.db < load.sql

I also assume that this output in fact slows down the overall import?

Of course it does. Significantly. I/O is slow and the fastest way to do I/O is to not do it.

Wouldn't it be more appropriate to just treat the whole block as a batch and show an average timing here?

You are entirely free to do that.

20:32 Reply: group_concat - ORDER BY parameter please. artifact: e26bfd68e4 user: kmedcalf

You will also note that ORDER BY applies sortation for OUTPUT. This is because, although data processing is based on relational set algebra, current technology is restricted to serial (one after each) processing and presentement.

Current technology cannot compute the entire set operation at once, but only piecewise (one bit more after each previously). This is the ONLY reason there is any "ordering" whatsoever even though none exists.

20:25 Reply: group_concat - ORDER BY parameter please. artifact: 228bb96e12 user: kmedcalf

group_concat concatenates the arguments in the order of presentation. The description of this order as "arbitrary" is so because, to an uneducated observer, it appears to operate by "magic". (Remember that any sufficiently advanced technology appears to the addled as "magic").

However, this is not actually true. The "order" is entirely deterministic and is not "arbitrary" or "magical".

Because group_concat (and every other aggregate/window function) operates on the data passed to it "in order of presentement", if you wish to have the aggregate calculated in a specific order, you have to control the order of presentement.

with t(x) as (values (3), (1), (2))
select group_concat(x, ', ')
  from (
           select x
             from t
         order by x
        )
;
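
A hedged alternative sketch, worth testing on your own build: when group_concat is used as a window function, the OVER clause accepts an ORDER BY that controls the order in which rows are presented:

with t(x) as (values (3), (1), (2))
select group_concat(x, ', ') over (
         order by x
         rows between unbounded preceding and unbounded following
       ) as result
  from t
 limit 1;
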
2021-11-25
17:58 Reply: Faster way to insert into WITHOUT ROWID table? artifact: c0af76e5f5 user: kmedcalf

Have you increased the cache size?

17:50 Reply: How to rerank one row in a big set of rows? artifact: c4943eab84 user: kmedcalf

Opening a gap at a rowid and renumbering the table are performed by choosing the traversal order so as to minimize row movement.

17:48 Reply: How to rerank one row in a big set of rows? artifact: 98ddcbd5a8 user: kmedcalf

Python code to maintain an Ordered Table:


def OpenRowid(db, table, rowid, count=1):
    if count < 1:
        raise ValueError('Count must be >= 1')
    if db.cursor().execute(OpenRowid.test.format(table=table), (rowid, count)).fetchone()[0] != 0:
        db.cursor().execute(OpenRowid.sql.format(table=table), (rowid, count))

def RenumberTable(db, table):
    db.cursor().execute(RenumberTable.sql.format(table=table))


OpenRowid.test = '''
select count(*)
  from {table}
 where _rowid_ >= ?1
   and _rowid_ < ?1 + ?2
;
'''

OpenRowid.sql = '''
insert into {table}
     select *
       from {table}
      where _rowid_ >= ?
   order by _rowid_ desc
on conflict (_rowid_) do update
        set _rowid_ = excluded._rowid_ + ?;
'''

RenumberTable.sql = '''
begin immediate;

-- make sure we have the correct temporary table r

drop table if exists temp.r
;
create temporary table if not exists r
(
   old primary key,
   new
)
without rowid
;

-- for all _rowid_ > 0 compute new _rowid_ to close gaps

insert into temp.r
     select _rowid_ as old,
            row_number() over (order by _rowid_) as new
       from {table}
      where _rowid_ > 0
   order by _rowid_
;

-- apply _rowid_ updates in ascending order of _rowid_ (move _rowid_ down)

insert into {table}
     select *
       from {table}
      where _rowid_ in (
                        select old
                          from temp.r
                         where old != new
                       )
   order by _rowid_
on conflict (_rowid_) do update
        set _rowid_ = (
                       select new
                         from temp.r
                        where old == excluded._rowid_
                      )
;

-- clear temporary table

delete from temp.r;

-- for _rowid_ > 0 compute offset _rowid_ so rows with _rowid_ < 1 can be moved into place

insert into r
     select _rowid_ as old,
            row_number() over (order by _rowid_) + (select count(*) from {table} where _rowid_ < 1) as new
       from {table}
      where _rowid_ > 0
   order by _rowid_
;

-- apply _rowid_ updates in descending order of _rowid_ (move _rowid_ up)

insert into {table}
     select *
      from {table}
     where _rowid_ in (
                       select old
                         from temp.r
                        where old != new
                      )
   order by _rowid_ desc
on conflict (_rowid_) do update
        set _rowid_ = (
                       select new
                         from temp.r
                        where old == excluded._rowid_
                      )
;

-- clear temporary table

delete from temp.r;

-- compute final _rowid_ for rows with _rowid_ < 1

insert into temp.r
     select _rowid_ as old,
            row_number() over (order by _rowid_) as new
       from {table}
      where _rowid_ < 1
   order by _rowid_
;

-- apply _rowid_ updates in descending order of _rowid_ (move _rowid_ up)

insert into {table}
     select *
       from {table}
      where _rowid_ in (
                        select old
                          from temp.r
                         where old != new
                       )
   order by _rowid_ desc
on conflict (_rowid_) do update
        set _rowid_ = (
                       select new
                         from temp.r
                        where old == excluded._rowid_
                      )
;

-- clean up our temporary table

drop table if exists temp.r;

-- commit our transaction

commit;
'''

if __name__ == '__main__':
    import apsw
    db = apsw.Connection()
    db.cursor().execute('create table x(id integer primary key, v integer);')
    db.cursor().execute('insert into x select value*10, value from generate_series where start=1 and stop=10;')
    db.cursor().execute('insert into x select -(value-1)*10, -value from generate_series where start=1 and stop=10;')
    db.cursor().execute('delete from x where id=50;')
    db.cursor().execute('delete from x where id=-30;')
    for row in db.cursor().execute('select * from x;'):
        print(row)
    print()
    RenumberTable(db, 'x')
    for row in db.cursor().execute('select * from x;'):
        print(row)
    print()
    OpenRowid(db, 'x', 3)
    OpenRowid(db, 'x', -4)
    for row in db.cursor().execute('select * from x;'):
        print(row)
    print()
    db.cursor().execute('insert into x values (3, 103);')
    db.cursor().execute('insert into x values (-4, -104);')
    for row in db.cursor().execute('select * from x;'):
        print(row)
    print()
    RenumberTable(db, 'x')
    for row in db.cursor().execute('select * from x;'):
        print(row)
    print()

2021-11-24
22:37 Reply: Feature Request: ATTACH DB readonly? artifact: c35e9d87d5 user: kmedcalf

Ooops. You are correct. It applies to the entire connection (it does not recognize the <schema> prefix).
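
For what it is worth, the effect being asked for can already be had with a URI filename on the ATTACH itself (hypothetical file name; URI filenames must be enabled for the connection):

attach database 'file:other.db?mode=ro' as other;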

21:54 Reply: Feature Request: ATTACH DB readonly? artifact: 5208775b10 user: kmedcalf
2021-11-22
20:43 Reply: Proposed slightly-incompatible change to date-time functions artifact: 9a491235f9 user: kmedcalf

I would presume that a BLOB is "interpreted" as a text string and that if the text string is invalid, then the result is NULL. Similarly, the propagation of NULL requires that a NULL in means NULL out. Any value that is not valid would cause a NULL return. This is current/present behaviour.

20:30 Edit reply: Proposed slightly-incompatible change to date-time functions artifact: 9183c02f7f user: kmedcalf

No. REAL values are interpreted as Julian Dates and INT values are interpreted as unix timestamps. Do you know of an application that stores an unix timestamp as a REAL or a Julian date as an INT? I do not. So, I'm wondering if the change will have any impact at all.

I do. Why do you assume that unix timestamps are integers? I also sometimes store (when used for other purposes) Julian Day (integer) counts.

I have also used Rata Die daystamps (aka the unit is day but the epoch is 0001-01-01 0:00:00.000). Once, oddly enough, I had to use ANSI month stamps (that is, the epoch was the ANSI epoch, 1601, and the unit was month and the fractional part was the fraction of the month elapsed -- that makes for an interesting conversion algorithm...)

sqlite> CREATE TABLE x(x);
sqlite> INSERT INTO x VALUES('-4713-11-24 12:00:00.000');
sqlite> INSERT INTO x VALUES('1970-01-01 00:00:00.000');
sqlite> INSERT INTO x VALUES('2021-11-21 17:37:38.000');
sqlite> INSERT INTO x VALUES('9999-12-31 23:59:59.000');
sqlite> select x as timestring, unixtime(x) as UnixEpochSeconds, ulp(unixtime(x)) as ResolutionSecU, julianday(x) as JulianEpochDays, ulp(julianday(x))*86400 ResolutionSecJ from x;
┌──────────────────────────┬──────────────────┬──────────────────────┬──────────────────┬──────────────────────┐
│        timestring        │ UnixEpochSeconds │    ResolutionSecU    │ JulianEpochDays  │    ResolutionSecJ    │
├──────────────────────────┼──────────────────┼──────────────────────┼──────────────────┼──────────────────────┤
│ -4713-11-24 12:00:00.000 │ -210866760000.0  │ 3.0517578125e-05     │ 0.0              │ 9.59232693276135e-12 │
│ 1970-01-01 00:00:00.000  │ 0.0              │ 1.11022302462516e-16 │ 2440587.5        │ 4.02331352233887e-05 │
│ 2021-11-21 17:37:38.000  │ 1637516258.0     │ 2.38418579101563e-07 │ 2459540.23446759 │ 4.02331352233887e-05 │
│ 9999-12-31 23:59:59.000  │ 253402300799.0   │ 3.0517578125e-05     │ 5373484.49998843 │ 8.04662704467773e-05 │
└──────────────────────────┴──────────────────┴──────────────────────┴──────────────────┴──────────────────────┘
sqlite>

The resolution of each format (as a double precision floating point number) is at least capable of containing the entire iJD (milliseconds since the julian epoch) to the internal resolution of a millisecond.

Assuming that an INTEGER means "seconds since the Unix epoch" and a REAL means "days since the Julian epoch" is a breaking change, in that anything that required "seconds since the Unix epoch" already has the appropriate unit (modifier) specified, whereas there is presently no unit (modifier) for "days since the Julian epoch" as that is the default.

This means that all code that requires "days since the julian epoch" will now either (a) require a change to add the appropriate modifier or (b) risk misinterpretation of the value.

** NB: unixtime is a UDF that takes the iJD value from a datetime struct and converts it from days (milliseconds, actually) since the julian epoch to seconds since the unix epoch and returns the full value (a real).

2021-11-21
21:50 Reply: big performance hit when querying 2 values in a single query instead of two artifact: 55ff70e928 user: kmedcalf

Interesting.

For some reason the optimizer chooses to use a separate b-tree for distinct even though there is already an index. It does this even for the base query select count(distinct id) from pro_comment sometimes, even though scanning the existing index would be more efficient (lower cost).

Generating the separate b-tree requires an additional scan for no benefit.

I presume that this is because the optimizer is not considering the more direct solution of merely doing a single scan of the index to generate the result.

20:54 Reply: big performance hit when querying 2 values in a single query instead of two artifact: cb5d1e75cf user: kmedcalf

Use an EXPLAIN or EXPLAIN QUERY PLAN prefix on the queries to see what is happening. In the CLI you can also use .eqp on, which automagically does explain query plan, or .eqp full, which automagically does explain.

Try using this form:

select (select max(id) from pro_comment),
       (select count(distinct id) from pro_comment)
;

This allows each scalar subquery to be optimized (executed) independently. Using both max(id) and count(distinct id) in the same single query prevents the optimizer from applying some optimizations (particularly the minmax optimization).

However, even as expressed, the total time taken for the combined query will be the same. Your last select statement select 'max id & count distinct:', max(id), count(distinct id) from pro_comment; should not take more time than a mere select count(distinct id) from pro_comment; since that requires a full scan of the table -- I cannot explain what is happening on the computer that is using the additional 47 seconds of CPU or 2 seconds of system time other than to observe that some OTHER process is going full-ninja-kaboom using your CPU and I/O.

19:17 Reply: Proposed slightly-incompatible change to date-time functions artifact: 05068ae52c user: kmedcalf

That should work without causing any breakage.

Note that rule 3 could specify that if the timevalue is numeric and "less than 0" or "greater than 5373484.5" then it is seconds since 1970-01-01. I have personally never used (or seen) a JD value less than 0 (though unixepoch values before 1970 (as in less than 0) I have seen and used).
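
Purely as illustration, rule 3 spelled out as SQL (this is NOT current SQLite behaviour, just the proposed rule; the sample value is a unix-epoch second count that also appears in the tables in this thread):

with arg(tv) as (values (1637516258))
select case
         when tv < 0 or tv > 5373484.5
         then datetime(tv, 'unixepoch')   -- interpreted as seconds since 1970-01-01
         else datetime(tv)                -- interpreted as days since the julian epoch
       end as interpreted
  from arg;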

18:25 Reply: Proposed slightly-incompatible change to date-time functions artifact: 382aadc198 user: kmedcalf

There are a number of issues with this.

The first is that internally the datetime value is maintained in "milliseconds since the julian epoch" as a 64-bit integer. This means that there is no way to generate output with more resolution than a millisecond.

Secondly, even though Windows internally maintains time to the hun (hundreds of nanoseconds) the call used to access that time has an update frequency of a tick (about 16 milliseconds). Although you can change the system call used to get the time so that it obtains the "precise" time rather than the tick time, it will still not be used internally by SQLite to any more than millisecond resolution.

Although Unix systems will return the tod with a nanosecond resolution, the granularity is up to the OS, and in any case, only millisecond resolution is used internally.

18:10 Reply: Proposed slightly-incompatible change to date-time functions artifact: bf24576a95 user: kmedcalf

No. REAL values are interpreted as Julian Dates and INT values are interpreted as unix timestamps. Do you know of an application that stores an unix timestamp as a REAL or a Julian date as an INT? I do not. So, I'm wondering if the change will have any impact at all.

I do. Why do you assume that unix timestamps are integers? I also sometimes store (when used for other purposes) Julian Day (integer) counts.

I have also used Rata Die daystamps (aka the unit is day but the epoch is 0001-01-01 0:00:00.000). Once, oddly enough, I had to use ANSI month stamps (that is, the epoch was the ANSI epoch, 1601, and the unit was month and the fractional part was the fraction of the month elapsed -- that makes for an interesting conversion algorithm...)

sqlite> CREATE TABLE x(x);
sqlite> INSERT INTO x VALUES('-4713-11-24 12:00:00.000');
sqlite> INSERT INTO x VALUES('1970-01-01 00:00:00.000');
sqlite> INSERT INTO x VALUES('2021-11-21 17:37:38.000');
sqlite> INSERT INTO x VALUES('9999-12-31 23:59:59.000');
sqlite> select x as timestring, unixtime(x) as UnixEpochSeconds, ulp(unixtime(x)) as ResolutionSecU, julianday(x) as JulianEpochDays, ulp(julianday(x))*86400 ResolutionSecJ from x;
┌──────────────────────────┬──────────────────┬──────────────────────┬──────────────────┬──────────────────────┐
│        timestring        │ UnixEpochSeconds │    ResolutionSecU    │ JulianEpochDays  │    ResolutionSecJ    │
├──────────────────────────┼──────────────────┼──────────────────────┼──────────────────┼──────────────────────┤
│ -4713-11-24 12:00:00.000 │ -210866760000.0  │ 3.0517578125e-05     │ 0.0              │ 9.59232693276135e-12 │
│ 1970-01-01 00:00:00.000  │ 0.0              │ 1.11022302462516e-16 │ 2440587.5        │ 4.02331352233887e-05 │
│ 2021-11-21 17:37:38.000  │ 1637516258.0     │ 2.38418579101563e-07 │ 2459540.23446759 │ 4.02331352233887e-05 │
│ 9999-12-31 23:59:59.000  │ 253402300799.0   │ 3.0517578125e-05     │ 5373484.49998843 │ 8.04662704467773e-05 │
└──────────────────────────┴──────────────────┴──────────────────────┴──────────────────┴──────────────────────┘
sqlite>

The resolution of each format (as a double precision floating point number) is at least capable of containing the entire iJD (milliseconds since the julian epoch) to the internal resolution of a millisecond.

Assuming that an INTEGER means "seconds since the Unix epoch" and a REAL means "days since the Julian epoch" is a breaking change, in that anything that required "seconds since the Unix epoch" already has the appropriate unit (modifier) specified, whereas there is presently no unit (modifier) for "days since the Julian epoch" as that is the default.

This means that all code that requires "days since the julian epoch" will now either (a) require a change to add the appropriate modifier or (b) risk misinterpretation of the value.

** NB: unixtime is a UDF that takes the iJD value from a datetime struct and converts it from days since the julian epoch to seconds since the unix epoch and returns the full value (a real).

03:17 Reply: IS NULL optimization on NOT NULL constraints breaks on corrupted databases artifact: 2194c356e2 user: kmedcalf

During my perplexity on the point, I've been tempted to parody it, along the lines of: "I crossed out 'Apples' on this bag of fruit and wrote 'Oranges', then pulled an apple out." But it was too crude and not quite on point. And likely to seem unkind.

Actually, that is a very good analogue, in more ways than one!

The 'bag of fruit' represents a whole database.
The 'label' represents the database schema.
The 'apple' represents the database contents.

So if you receive a 'bag of fruit' that is labelled 'oranges', and pull out an 'apple' you can conclude that the 'bag of fruit' is corrupt. Clearly when receiving a 'bag of fruit' from a 'source of ill-repute' it might be advantageous to check that the contents of the bag match the label.

Luckily in SQLite3 there are commands to automate this inspection to ensure that the contents of the bag match the label -- pragma integrity_check and pragma foreign_key_check -- if both of these pass then the label (schema) and contents (data) are consistent and not corrupt.
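
For example:

sqlite> pragma integrity_check;    -- a single row reading 'ok' means the structure and contents check out
sqlite> pragma foreign_key_check;  -- no rows returned means there are no foreign key violations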

02:42 Reply: Proposed slightly-incompatible change to date-time functions artifact: 62db41fcfc user: kmedcalf

So as I see it:

CURRENT SITUATION

A 'text' argument is interpreted as an ISO-8601 subset datetime string.
A 'numeric' argument is interpreted as a Julian Date number.
The 'unixepoch' modifier forces the 'numeric' to be interpreted as a unixepoch timestamp rather than a Julian Date number.

** numeric means a numeric type, either integer or real

Interpretation: Numeric values are always interpreted as Julian Date numbers UNLESS the 'unixepoch' modifier is specified.

PROPOSAL

Introduce "magical behaviour" which alters how a 'numeric' argument is interpreted depeding on its presentement type and its value range.

RAMIFICATIONS

Numeric arguments are no longer interpreted as Julian Date numbers.
A new modifier must be introduced to restore the previous behaviour.
All existing code that relies on the old behaviour must be modified so as to pass the new modifier string indicating that the argument is to be interpreted as a Julian Date number.
All existing code that relies on the numeric argument representing a unixepoch value rather than a julian date value already has the 'unixepoch' modifier.

ANALYSIS AND RECOMMENDATION

The addition of the automagical interpretation adds no benefit and will require all pre-existing code that depends on the current behaviour to be modified in order to continue to work as it does currently.

If one wishes to break compatibility in this manner then I would recommend that a new modifier be added to indicate that the numeric argument is to be interpreted as 'julianepoch' or some such, and that the function throws an error if a numeric is presented and no modifier is specified.

This will ensure that the breakage is detected and analysed for its effect where it occurs.

A BETTER CHANGE

A pragma setting that has three values for how to interpret numeric arguments with no modifier specified:

pragma epoch = JULIAN;     -- current state of affairs, default
pragma epoch = UNIX;       -- force 'unixepoch' unless julian is specified
pragma epoch = MAGICAL;    -- use "magical interpretation" unless a specific modifier is specified

This could, conceivably be extended to allow other epoch boundaries. For example:

pragma epoch = JULIAN;  -- -4713-11-24 12:00:00
pragma epoch = DOTNET;  --  0001-01-01 00:00:00
pragma epoch = ANSI;    --  1601-01-01 00:00:00
pragma epoch = MJD;     --  1858-11-17 00:00:00
pragma epoch = EXCEL;   --  1900-01-00 00:00:00
pragma epoch = NTP;     --  1900-01-01 00:00:00
pragma epoch = UNIX;    --  1970-01-01 00:00:00
pragma epoch = IBMPC;   --  1980-01-01 00:00:00
pragma epoch = COCOA;   --  2001-01-01 00:00:00

You could even specify the granularity:

pragma epochunit = SECOND;
pragma epochunit = DAY;

The default would be, of course, epoch = JULIAN and epochunit = DAY, to match the presently extant code.

2021-11-19
21:56 Reply: IS NULL optimization on NOT NULL constraints breaks on corrupted databases artifact: 677e52a522 user: kmedcalf

What do you expect to happen when you corrupt the database?

sqlite> pragma integrity_check;
┌───────────────────┐
│  integrity_check  │
├───────────────────┤
│ NULL value in t.v │
└───────────────────┘

The closest analogy is that you put an advertisement in the newspaper to sell your green car. However, before the advertisement was printed, you painted the car blue. The advertisement, when it appeared, was still trying to sell a green car.

You are not entitled to claim that the advertisement is now incorrect.

If you deliberately and with aforethought decide to make the database inconsistent and incoherent, and cannot be bothered to ensure the integrity of the database following your conniptions, then why would you place the blame for your failure on anyone else?

This reminds me of the old "Doctor, Doctor, it hurts when I do this," the patient complains as he pokes himself in the eye. "Well, do not do that then," replies the Doctor.

I believe the appropriate observation is that you been "hoisted by your own petard".

2021-11-17
23:25 Reply: sqlite3.dll is not returning control to my program artifact: 0be17c459c user: kmedcalf

The problem is most likely located in what you have changed lately. If nothing is changed, then nothing will change. As it worked yesterday, so too will it work tomorrow, for all values of today.

So, what did you change between when "it worked as expected" and when "it did not work as expected"?

(Hint: The most likely culprit is Microsoft Visual Studio ...)

19:41 Reply: SQLite + TCL on-line conference starts in less than 2 days artifact: b2ababc565 user: kmedcalf

especially with REGEXP being part of the standard build

REGEXP is not part of the standard build although it is included in the CLI.

It is one of the many things that have been added in the wrong place (IMHO).

19:32 Reply: SQLite + TCL on-line conference starts in less than 2 days artifact: 5e670438a9 user: kmedcalf

VARCHAR and chums are a holdover from the primitive era of punched cards and fixed record lengths.

Sort of. In the olden days record sizes were fixed. They needed to be because there is no way to access record number 9384378 directly unless the records are fixed length (if they are variable length, you either need a fixed record size index of the starting locations or you need to read all 9384377 preceding records to find the one you want -- if the record length is fixed, it is merely a simple offset calculation).

For this reason "CHAR" and "VARCHAR" required the specification of the maximum length of the field so that the maximum record length (a constant) could be computed in order to permit direct access. "VARCHAR" simply had an "extra couple of bytes" added to the equivalent non-VAR form in order to store the length of the field that was used, the rest was GIGO (random padding provided by happenstance). Every field and record was always a fixed size.

Then along came "INDEXED DIRECT" storage (in the 1960's) which allowed actual variable length records and a "side index" into which the starting offset of each record was stored.

Eventually a number of other file formats replaced BDAM and IDAM organization as the computers and storage media increased speed -- once the CPU and STORAGE were twice as efficient, then twice the amount of time could be "spent" in "fart about locating the data" without affecting perceived performance.

Computing has now progressed to the point where it is thousands or millions of times quicker than it was, so now a huge crapload of time and effort can be consumed "farting around to locate the data" without affecting the perceived performance.

While the prior art will still speed things up from a few millisecond response time to a few nanoseconds of response time, except in very rare circumstance, such efficiency is undetectable by the human bags of mostly water operating the systems -- this is why a program which displays "Hello World" has grown from about 20 bytes to now average just under a gigabyte to achieve the same result in the same amount of time.

19:03 Reply: LockingMode Exclusive for a shared readonly database artifact: feeaf539c8 user: kmedcalf

You would be better served by opening the file as "immutable", assuming that it is, in fact, immutable (this effectively means readonly and nolock) during the time it is open (that is, nothing will change the file, as if it were on WORM media that has been finalized (ie, a closed CD, for example)).

This is a good thing since Docker does not implement the locking semantics necessary for reliable concurrency -- and an "immutable" file does not require any locking.
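
For example (the file name is hypothetical; the CLI processes URI filenames):

.open file:shared.db?immutable=1

or, from code, pass SQLITE_OPEN_URI (together with SQLITE_OPEN_READONLY) to sqlite3_open_v2 with the same file:...?immutable=1 name.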

18:44 Reply: User Define Aggregate Functions with multiple columns and scaler value artifact: f46ded7529 user: kmedcalf

Here is an example UDF that uses 2 arguments. It is a scalar function that takes two arguments -- the first being the number to operate on, and the second optional argument being the number of significant digits to maintain, with the default being 14. The provided number (arg 1) is then scaled, properly rounded, and scaled back so that it contains the number of significant digits specified by the second argument.

SQLITE_PRIVATE void _rounddigitsFunc(sqlite3_context *context, int argc, sqlite3_value **argv)
{
    double x, scale;
    int digits = 14;

    if ((argc==0) || (argc>2))
    {
        return;
    }
    if (sqlite3_value_type(argv[0]) == SQLITE_NULL)
    {
        return;
    }

    x = sqlite3_value_double(argv[0]);
    if (argc == 2)
    {
        digits = sqlite3_value_int(argv[1]);
    }

    digits = max(min(digits, 14), 1) - ceil(log10(fabs(x)));
    scale = pow(10.0, digits);
    x *= scale;
    x -= remainder(x, 1.0);
    x /= scale;

    sqlite3_result_double(context, x);
    return;
}
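
Hypothetical usage once the scalar function is registered (I assume here that it is registered under the name rounddigits):

select rounddigits(123.456789, 4);   -- roughly 123.5 (4 significant digits)
select rounddigits(123.456789);      -- rounded to the default 14 significant digits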

The following (windowed) aggregate calculates the weighted mean (and a bunch of other statistics) by successive approximation. Each call to the aggregate has two arguments, the value and the weight. No default is provided and two arguments are required to the window aggregate function.

typedef struct wmeanCtx wmeanCtx;
struct wmeanCtx
{
    sqlite_int64 n;
    LONGDOUBLE_TYPE m;
    LONGDOUBLE_TYPE w;
    LONGDOUBLE_TYPE s;
};

SQLITE_PRIVATE void wmeanStep(sqlite3_context *context, int argc, sqlite3_value **argv)
{
    wmeanCtx* p = sqlite3_aggregate_context(context, sizeof(wmeanCtx));
    if (p && sqlite3_value_numeric_type(argv[0]) != SQLITE_NULL && sqlite3_value_numeric_type(argv[1]) != SQLITE_NULL)
    {
        register LONGDOUBLE_TYPE x = sqlite3_value_double(argv[0]);
        register LONGDOUBLE_TYPE w = sqlite3_value_double(argv[1]);
        if (p->n)
        {
            register LONGDOUBLE_TYPE newW, delta, R;
            newW = w + p->w;
            delta = x - p->m;
            R = delta * w / newW;
            p->m += R;
            p->s += p->w * delta * R;
            p->w = newW;
        }
        else
        {
            p->m = x;
            p->w = w;
            p->s = 0.0L;
        }
        p->n++;
    }
}

SQLITE_PRIVATE void wmeanUnStep(sqlite3_context *context, int argc, sqlite3_value **argv)
{
    wmeanCtx* p = sqlite3_aggregate_context(context, sizeof(wmeanCtx));
    if (p && sqlite3_value_numeric_type(argv[0]) != SQLITE_NULL && sqlite3_value_numeric_type(argv[1]) != SQLITE_NULL)
    {
        register LONGDOUBLE_TYPE x = sqlite3_value_double(argv[0]);
        register LONGDOUBLE_TYPE w = sqlite3_value_double(argv[1]);
        register LONGDOUBLE_TYPE newW, delta, R;
        newW = p->w - w;
        delta = x - p->m;
        R = delta * w / newW;
        p->m -= R;
        p->s -= p->w * delta * R;
        p->w = newW;
        p->n--;
    }
}

SQLITE_PRIVATE void wmeanFinal(sqlite3_context *context)
{
    wmeanCtx* p = sqlite3_aggregate_context(context, 0);
    if (p && p->n > 0)
        sqlite3_result_double(context, p->m);
}

SQLITE_PRIVATE void wvarPFinal(sqlite3_context *context)
{
    wmeanCtx* p = sqlite3_aggregate_context(context, 0);
    if (p && p->n > 0)
        sqlite3_result_double(context, p->s/p->w);
}

SQLITE_PRIVATE void wvarFinal(sqlite3_context *context)
{
    wmeanCtx* p = sqlite3_aggregate_context(context, 0);
    if (p && p->n > 0)
        sqlite3_result_double(context, p->s/p->w*(LONGDOUBLE_TYPE)p->n/(LONGDOUBLE_TYPE)(p->n - 1));
}

SQLITE_PRIVATE void wstdevPFinal(sqlite3_context *context)
{
    wmeanCtx* p = sqlite3_aggregate_context(context, 0);
    if (p && p->n > 0)
        sqlite3_result_double(context, sqrtl(p->s/p->w));
}

SQLITE_PRIVATE void wstdevFinal(sqlite3_context *context)
{
    wmeanCtx* p = sqlite3_aggregate_context(context, 0);
    if (p && p->n > 0)
        sqlite3_result_double(context, sqrtl(p->s/p->w*(LONGDOUBLE_TYPE)p->n/(LONGDOUBLE_TYPE)(p->n - 1)));
}

SQLITE_PRIVATE void wsemFinal(sqlite3_context *context)
{
    wmeanCtx* p = sqlite3_aggregate_context(context, 0);
    if (p && p->n > 0)
        sqlite3_result_double(context, sqrtl(p->s/p->w*(LONGDOUBLE_TYPE)p->n/(LONGDOUBLE_TYPE)(p->n - 1))/sqrtl(p->n));
}

SQLITE_PRIVATE void wciFinal(sqlite3_context *context)
{
    wmeanCtx* p = sqlite3_aggregate_context(context, 0);
    if (p && p->n > 0)
        sqlite3_result_double(context, sqrtl(p->s/p->w*(LONGDOUBLE_TYPE)p->n/(LONGDOUBLE_TYPE)(p->n - 1))/sqrtl(p->n)*1.96L);
}
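
Hypothetical usage once the aggregate is registered (I assume the registered name is wmean; the trades table and its columns are made up):

select wmean(price, qty) over (order by ts rows 9 preceding) as weighted_moving_mean
  from trades;
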
18:12 Reply: Hot backup database in WAL mode by coping artifact: 1307e36986 user: kmedcalf

That is why I consider just backing up -wal as well (and restoring). It should still be consistent, correct?

Yes. You will have to take and hold an "intent to write" (BEGIN IMMEDIATE) to prohibit any other connection from writing to the wal file.
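
A sketch of the copy window (file names are hypothetical; the copy itself is done by an external tool):

begin immediate;   -- take and hold the intent-to-write lock
-- ... copy mydb.db and mydb.db-wal here ...
rollback;          -- release the lock; nothing was modified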

Out of curiosity, what is the problem there? Can opening another file descriptor for the same file disturb SQLite in some ways?

This is an Operating System issue. The Operating System tracks file locks by process (POSIX/Linux) rather than by handle (Windows). This means that on an Operating System which claims it is POSIX compliant, opening and then closing a file using the platform calls will cause all locks for that file held by the current process to be released without notification of this state of affairs.

On Windows the "files" may remain "locked" even though the process is terminated or the file is closed, and the only way to fix that is to cold-start the Operating System (a reboot). This state of affairs comes to pass because of the way that locks are tracked on Windows.

Some other Operating Systems do things differently by other methods and may have other varying behaviours and design features (and design flaws).

This is documented (at least for the case of POSIX locking) and the SQLite3 library contains code that attempts to work-around the design flaw.

It would be perspicacious, knowing about the frailty and bad design decisions, to avoid the circumstance in which the issue might arise -- especially when it is so simple to do so.

To ensure that the backup method will work properly and without causing the database to become corrupted, a process that accesses the SQLite3 database exclusively via the SQLite3 API must be used to lock out other SQLite3 instances, while a separate process copies the files, so as to not render the database locking fubar.
