SQLite Forum

20 most recent forum posts

2021-03-02
09:15 Reply: Index organized tables (artifact: cb5ba34e72 user: cuz)

That (IOT) is the default mode for SQLite - in fact SQLite doesn't have any alternative way to organize a table.

Technically, in SQLite every table is itself a BTREE Index with data, much like a "Covering Index".
The index key along which the typical table is structured is called the RowID, and a good way to refer to such an SQLite table is "rowid-table".

All access to a table is via lookups of this RowID Key - even if the table has no declared key of any sort in its CREATE TABLE schema. This allows some funky things in SQLite that you do not see in other DBs, for instance:

  • a declared primary Key in SQLite CAN contain NULLs (because it is not typically used as the real Table lookup Key - that is still done via the rowid),
  • a Table without any defined Keys/Indexes can still have very fast lookups via the rowid.

There is an optimization, as Larry mentioned, whereby you can use another primary Key as this table lookup key, by specifying it as the explicit PRIMARY KEY and then declaring the table as "WITHOUT ROWID". (We can refer to that as a non-rowid-table). The advantages here are:

  • The table BTREE Key is organized along the given primary key order,
  • The space that the rowid portion of the table would have taken up is freed, and
  • Lookups via the Primary Key locate the row directly.

Disadvantage: A straight integer lookup in a Rowid BTREE is marginally faster than a text key. Oh, and it's a right pain for DB management applications that were used to managing everything via the rowid! :)
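
As a rough sketch (the table and column names here are made up, not from this thread), the two layouts are declared like so:

  CREATE TABLE words(word TEXT PRIMARY KEY, cnt INTEGER);                  -- rowid-table: the hidden rowid is the BTREE key
  CREATE TABLE words_nr(word TEXT PRIMARY KEY, cnt INTEGER) WITHOUT ROWID; -- non-rowid-table: "word" itself is the BTREE key

Note that WITHOUT ROWID requires an explicit PRIMARY KEY to be declared, since that key takes over the rowid's job.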

A normal rowid-table most closely resembles an Oracle IOT based on an integer primary key, and the non-rowid-table optimization above most closely resembles an Oracle IOT table based on any other type of Key.

As an aside: The table lookup BTREE key in SQLite (the key used for table lookups - i.e. the rowid in rowid-tables, or the PK in non-rowid-tables) has some interesting special rules:

  • It cannot contain NULLs - unlike other "normal" PRIMARY KEYs in SQLite.
  • You can refer to the RowID of any rowid-table by the names "rowid", "oid" or "_rowid_", without it being declared in the schema.
  • You can explicitly declare a field that is a reference/alias to the rowid of a rowid-table by declaring the column using this exact phrase: "{column} INTEGER PRIMARY KEY" (see the small example below this list).
  • A rowid table without any explicit column that references/aliases its rowid can actually rearrange the row-ids if need be - during a vacuum, for instance - so if you intend to depend on the rowid for future lookups using your own stored values, best to properly reference/alias it as shown above.
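
A quick illustration of that aliasing (a throw-away table, just to show the naming):

  CREATE TABLE t(a INTEGER PRIMARY KEY, b TEXT);   -- "a" is an alias for the rowid
  INSERT INTO t(b) VALUES('x');
  SELECT rowid, oid, _rowid_, a FROM t;            -- all four return the same value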

So then, Oracle probably got the IOT idea from SQLite. :)
Actually, the IOT optimization has been commonly understood by DB designers for a long time; SQLite probably chose it at the start for that reason (or perhaps it was just easier to do - I'm just guessing now).

07:28 Reply: Feature request: Import csv into a table with generated columns (artifact: 3dfc90e341 user: anonymous)

Another alternative, which I have used, is to create a view with a trigger attached to it, and then to import into the view. (I did this because I needed special parsing and conversion, although it would probably work for this purpose too.)
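
For the table p(a, b, c) from the original post in this thread, a minimal sketch of that approach (the view and trigger names are made up) could look like:

  CREATE VIEW p_in(a, b) AS SELECT a, b FROM p;
  CREATE TRIGGER p_in_insert INSTEAD OF INSERT ON p_in
  BEGIN
    INSERT INTO p(a, b) VALUES (new.a, new.b);
  END;
  -- then, in the shell:
  -- .mode csv
  -- .import p.csv p_in

The INSTEAD OF INSERT trigger is what makes the view writable, so the shell's generated INSERT statements land in the base table with the generated column left alone.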

03:09 Post: Misoptimization in ieee754func (artifact: d5c9502458 user: yongheng)
In case you missed it, I opened a ticket at https://www.sqlite.org/src/tktview/22dea1cfdb9151e48fe99cebbd3c92519d334de6.

When using optimization level >= 2, many compilers misoptimize ieee754 in SQLite, including the latest release versions of GCC and Clang.

The misoptimized code is in https://github.com/sqlite/sqlite/blob/4ad1a1c74caeee76124091e6e1771fae4afa1a33/ext/misc/ieee754.c#L165

I think the integer overflow at https://github.com/sqlite/sqlite/blob/4ad1a1c74caeee76124091e6e1771fae4afa1a33/ext/misc/ieee754.c#L186 caused the problem.
02:59 Reply: Feature request: Import csv into a table with generated columns (artifact: 104fe6e571 user: larrybr)

The doc on this shell feature shows what is happening. I agree that it would be more convenient if .import could figure out which columns in the receiving table are susceptible to update, and then try the insert if the .csv has the right column count. It is unlikely that .import will do something that sophisticated, since that would require parsing the schema.

If your .csv had a column-name header, it would .import very easily into a to-be-created temp table, from which an INSERT ... SELECT statement can take the values. That statement is also easy to write. This ease and modularity takes some of the steam out of creating .import enhancements.
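
A rough sketch of that two-step route, assuming an incoming.csv whose first line is a header naming columns a and b (the file and scratch-table names here are hypothetical, and the receiving table is the p(a, b, c) from this thread):

  .mode csv
  .import incoming.csv raw_import
  INSERT INTO p(a, b) SELECT a, b FROM raw_import;
  DROP TABLE raw_import;

Because raw_import does not exist yet, .import creates it and takes the column names from the header row, so the INSERT ... SELECT can refer to those names directly.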

In my enhanced shell, when '.header always' is in effect, the .import creates INSERT statement(s) using the .csv (or .tsv) header to name incoming columns, without being picky about mapping them one-to-one with receiving table columns. This is convenient when columns have default or generated values and the file data is short some column(s) as in your case. Unfortunately, I have fallen behind on keeping that shell updated with the last year of enhancements made to shell.c published by the SQLite project.

2021-03-01
22:25 Reply: Index organized tables (artifact: f07fe50186 user: dnamlin)

Any roadmap for index dedup? (Another nice space-saver for certain common cases!)

21:32 Post: SQLite using ADO.Net with Types Other Than Long, Double, and String (artifact: bdd957e82e user: jblackjim)

I have an application that uses ADO.NET with SQL Server, and I now want to have it support SQLite without causing a complete rewrite. The more rigid .NET data types (e.g. DateTime, Single, Int32) are everywhere in in-memory DataTables in the code, so I need to continue using those types in memory.

If I define and fill these tables via DataAdapter.FillSchema and DataAdapter.Fill with a SQLite connection, it makes all columns String, Long, or Double. If I convert the columns in these in-memory DataTables to (DateTime, Single, and Int32) and then try to call DataAdapter.Update back to SQLite with in-memory data types of DateTime, Single, or Int32, I get concurrency exceptions. So I convert the types back to (String, Long, and Double) right before the call to DataAdapter.Update. But this seems to mess up the underlying SQL and parameters in the Update Command.

Is there a practical ADO.NET solution for in-memory data types differing from Long, String, and Double types?

17:01 Edit reply: Tracing why SQLITE is making an index decision (artifact: d586737c58 user: deonb)

In this case, all the DateRangeStart and DateRangeEnd values are identical so there wouldn't be any actual difference in chunking.

The more interesting question is why would 2 database files that have identical schema, similar table sizes, and identical sqlite_stat1 entries for the two indexes behave differently?

What other information (apart from schema and sqlite_stat1) is SQLite looking at when it makes an index decision based on ANALYZE? I don't mind the behavior - I mind not being able to predict the behavior.

PS: <= DateRangeStart and >= DateRangeEnd are both asking SQLite to search an index to a specific point and then count everything after that point until it hits the other value. It should behave similarly if DateRangeStart is DESC, but SQLite clearly has a preference for >= and ASC over <= and DESC.

16:59 Reply: Tracing why SQLITE is making an index decision (artifact: a803755950 user: deonb)

In this case, all the DateRangeStart and DateRangeEnd values are identical so there wouldn't be any actual difference in chunking.

The more interesting question is why would 2 database files that have identical schema, similar table sizes, and identical sqlite_stat1 entries for the two indexes behave differently?

What other information (apart from schema and sqlite_stat1) is SQLite looking at when it makes an index decision?

PS: <= DateRangeStart and >= DateRangeEnd are both asking SQLite to search an index to a specific point and then count everything after that point until it hits the other value. It should behave similarly if DateRangeStart is DESC, but SQLite clearly has a preference for >= and ASC over <= and DESC.

16:55 Post: Feature request: Import csv into a table with generated columns (artifact: dac8559275 user: hanche)

Consider this very simple example (if you'll excuse my unusual prompt):

▶▶▶ create table p(a integer, b integer, c integer as (a + b) stored);
▶▶▶ insert into p values(40,2);
▶▶▶ select * from p;
a   b  c
--  -  --
40  2  42
▶▶▶ .shell cat p.csv
1,3
2,4
▶▶▶ .mode csv
▶▶▶ .import p.csv p
Error: table p has 2 columns but 3 values were supplied

The insert statement works fine, evidently ignoring the generated column. But the .import fails with a rather puzzling error message. (It doesn't matter whether the column is stored or not.)

I think it would be useful to be able to import csv files directly into a table with generated columns. As it is, the easiest workaround is to import the csv into a temporary table, then insert from there.

(I labeled this a feature request, but it seems to me to exist in the grey zone between a feature request and a bug report. Make of it what you will. But the error message is at least misleading.)

– Harald

16:54 Post: SQLite Version 3.35.0 - Beta 1 (artifact: bd047d7107 user: drh)

We have reached "pencils down" for the SQLite 3.35 development cycle. Our goal is no further changes other than minor code formatting, documentation, or bug fixes.

A new prerelease snapshot is available. Call this snapshot 3.35.0-beta-1. Please download and test. Report any problems or concerns to this forum, or via private email to drh at sqlite dot org.

See the change log for a summary of the many enhancements in the 3.35 release.

The target release date is 2021-03-30, though that date might move earlier or later depending on how things go over the next week. A release checklist for version 3.35.0 is on-line. The release will happen when that checklist goes all green. As I type this, the checklist is all white - indicating that nothing has been checked off yet. Green probably won't start appearing until one week prior to the anticipated release.

13:34 Reply: Database UTF-16 support (artifact: 817cda25ee user: kmedcalf)

sqlite3_exec does not do UTF-16, either for the SQL (command) or for the results.

As a primitive interface it only does UTF-8.

If you wish to use UTF-16 then you need to use the full prepare/bind/exec/column interfaces that have UTF-16 versions.

13:29 Reply: Database UTF-16 support (artifact: 5b4de9df31 user: kmedcalf)

You can let SQLite choose a format suited to your setup

Actually no. Left to its own devices SQLite3 will, unless you set the default differently at compile time, default to UTF-8 for the database encoding.

You may specify the encoding to be used for text data stored in the database using pragma encoding as documented.

The encoding of the "internal data" stored by the database (the database encoding) is quite independent of the format used to send or retrieve data between SQLite3 and your application, and format conversions will be carried out between encodings as needed.
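
A minimal sketch (keeping in mind that the database encoding can only be chosen for a brand-new, still-empty database):

  PRAGMA encoding;                -- report the text encoding of the current database
  PRAGMA encoding = 'UTF-16le';   -- request an encoding; only effective before the database has any content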

12:54 Reply: Database UTF-16 support (artifact: bdb855fb77 user: anonymous)

That's nice. So I have read that one can do this. The big question is how to accomplish this.

Let's begin with a hats off for writing the Unicode functions in C.

So I want to query something and use this API call: rc = sqlite3_exec(DB, command, exec_result, 0, &ErrMessage);

DB is of course a pointer. Assume the command is a valid SQL statement using UTF-16. Will this work? The callback is exec_result.

The UTF-8 callback looks like this: exec_result(void *unused, const int columns, char** data, char** column_name)

So how do I define a UTF-16 callback? char16** data e.g.?

By the way is the int 32 bit or 64 bit?

12:37 Reply: Database UTF-16 support (artifact: fee5433f21 user: slavin)

The API for SQLite3 contains routines that accept text as both UTF-8 and UTF-16 in the native byte order of the host machine. Each database file manages text as either UTF-8, UTF-16BE (big-endian), or UTF-16LE (little-endian). You can let SQLite choose a format suited to your setup, or you can specify one yourself:

https://sqlite.org/pragma.html#pragma_encoding

SQLite API calls will accept whatever form you specify and do any needed conversion internally. You can, of course, speed this up by being consistent in the form you use, which reduces the conversions needed.

12:23 Edit reply: Tracing why SQLITE is making an index decision (artifact: 95e0034353 user: slavin)

I've fixed this specific issue by changing the index from

CREATE INDEX dynidx_2f546fcb2a782272b0363f4596c89c7dc0674084 on Media(DateRangeStart,DateRangeEnd) where SearchPartitions=1

to

CREATE INDEX dynidx_2f546fcb2a782272b0363f4596c89c7dc0674084 on Media(DateRangeEnd,DateRangeStart) where SearchPartitions=1

Bingo. Here's what's happening. The ANALYZE command lets SQLite see how 'chunky' each column is. Suppose you are trying to search a 10000 row table on both DateRangeStart and DateRangeEnd. And there are 4 different values for DateRangeStart but 100 different values for DateRangeEnd. The chunks in DateRangeStart are bigger, and there are fewer of them.

If you search using an index on DateRangeStart it's going to narrow down your search to 2500 rows. But if you search using an index on DateRangeEnd you'll get just 100 rows, which will get you closer to the rows you want, and do it faster.

So SQLite is weighing up the advantages of searching an existing index which doesn't help much, against just iterating through the whole table, which gives it all the values it needs (which in this case you don't care about because you're just counting them, but SQLite doesn't know that).

So the answer to what you asked for … a general understanding of the problem rather than a solution to just this one example … is that the order of terms in an index matters.
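
One way to watch which plan the ANALYZE data leads to is EXPLAIN QUERY PLAN. A sketch with a guessed query shape (the real WHERE clause isn't shown in this thread, so the parameters are hypothetical):

  EXPLAIN QUERY PLAN
  SELECT count(*) FROM Media
   WHERE SearchPartitions = 1
     AND DateRangeStart <= :rangeEnd
     AND DateRangeEnd >= :rangeStart;

The output names the index (or full scan) actually chosen, so the two database files can be compared directly.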

12:21 Reply: Tracing why SQLITE is making an index decision (artifact: 8319882076 user: slavin)

I've fixed this specific issue by changing the index from

CREATE INDEX dynidx_2f546fcb2a782272b0363f4596c89c7dc0674084 on Media(DateRangeStart,DateRangeEnd) where SearchPartitions=1

to

CREATE INDEX dynidx_2f546fcb2a782272b0363f4596c89c7dc0674084 on Media(DateRangeEnd,DateRangeStart) where SearchPartitions=1

Bingo. Here's what's happening. The ANALYZE command lets SQLite see how 'chunky' each column is. Suppose you are trying to search a 10000 row table on both DateRangeStart and DateRangeEnd. And there are 4 different values for DateRangeStart but 100 different values for DateRangeEnd. The chunks in DateRangeStart are bigger, and there are fewer of them.

If you search using an index on DateRangeStart it's going to narrow down your search to 2500 rows. But if you search using an index on DateRangeEnd you'll get just 100 rows, which will get you closer to the rows you want, and do it faster.

So SQLite is weighing up the advantages of searching an existing index which doesn't help much, against just iterating through the whole table, which gives it all the values it needs (which in this case you don't care about because you're just counting them, but SQLite doesn't know that).

So the answer to what you asked for … a general understanding of the problem rather than a solution to just this one example … is that the order of terms in an index matters.

11:08 Reply: Index organized tables (artifact: 15c1f932e9 user: larrybr)

It's called WITHOUT ROWID.

10:26 Post: Index organized tables (artifact: d0481d8820 user: walrus)

Oracle database supports so-called index organized tables (IOT), where all data is stored in the index B*-tree, effectively eliminating duplication of data. It also improves the performance of queries which use the primary key.

I wonder whether SQLite supports something similar?

09:36 Reply: Are the changes done after a transaction is performed always available to all connections to a database? (artifact: 2150324614 user: leandro)

Thank you (and all the other folks) for the answers.

From my application's perspective, that's the case, as a read transaction starts (that's guaranteed by the Go channel semantics) only after a commit has been performed, but I confess I'll have to debug it to figure out whether that's done correctly.

I noticed that the Go database wrappers are quite "thick", with many layers between the user and the calls to the database itself, and I'll try to understand better how they decide to start and commit a transaction (apart from the explicit "BEGIN" and "COMMIT").

09:35 Reply: Are the changes done after a transaction is performed always available to all connections to a database? (artifact: 57c9b1003f user: leandro)

I am not sure about either, as I suspect the issue is caused by bugs in my code, but now I know where to start investigating it :-)

The Go sql layer is indeed quite "thick" and that might be the case, but I believe it's far more likely to be caused by my application itself.
