SQLite Forum


20 forum posts by user w.oertl

2021-07-28
20:59 Reply: create sqlite3_analyzer from trunk (artifact: b682b9296b user: w.oertl)

Old thread, but I have an observation to add: when configure is run with the --disable-tcl argument, the Makefile variables needed to build anything Tcl-related, including sqlite3_analyzer, are not set properly (TCC, LIBTCL). "make sqlite3_analyzer" will therefore fail.

So I think the Tcl headers and libraries should be searched for even in the presence of --disable-tcl. See the attached patch for how this might work. I hereby declare this patch to be public domain, no strings attached, which I'd gladly repeat in the form of an affidavit if you tell me how to do this.

BTW, after running automake, I get errors about the recently added AMALGAMATION_LINE_MACROS.

Index: configure.ac
==================================================================
--- configure.ac
+++ configure.ac
@@ -287,13 +287,11 @@
 # This code is derived from the SC_PATH_TCLCONFIG and SC_LOAD_TCLCONFIG
 # macros in the in the tcl.m4 file of the standard TCL distribution.
 # Those macros could not be used directly since we have to make some
 # minor changes to accomodate systems that do not have TCL installed.
 #
-AC_ARG_ENABLE(tcl, AC_HELP_STRING([--disable-tcl],[do not build TCL extension]),
-      [use_tcl=$enableval],[use_tcl=yes])
-if test "${use_tcl}" = "yes" ; then
+use_tcl=yes
   AC_ARG_WITH(tcl, AC_HELP_STRING([--with-tcl=DIR],[directory containing tcl configuration (tclConfig.sh)]), with_tclconfig=${withval})
   AC_MSG_CHECKING([for Tcl configuration])
   AC_CACHE_VAL(ac_cv_c_tclconfig,[
     # First check to see if --with-tcl was specified.
     if test x"${with_tclconfig}" != x ; then
@@ -455,15 +453,17 @@
     AC_SUBST(TCL_STUB_LIB_FILE)
     AC_SUBST(TCL_STUB_LIB_FLAG)
     AC_SUBST(TCL_STUB_LIB_SPEC)
     AC_SUBST(TCL_SHLIB_SUFFIX)
   fi
-fi
-if test "${use_tcl}" = "no" ; then
+
+AC_ARG_ENABLE(tcl, AC_HELP_STRING([--disable-tcl],[do not build TCL extension]),
+      [tcl_ext=$enableval],[tcl_ext=yes])
+if test "${use_tcl}" = "yes" -a "${tcl_ext}" = "yes"; then
+  HAVE_TCL=1
+else
   HAVE_TCL=""
-else
-  HAVE_TCL=1
 fi
 AC_SUBST(HAVE_TCL)
 
 ##########
 # Figure out what C libraries are required to compile programs
16:15 Reply: Searching in text columns with Unicode folding (artifact: f6ef51f436 user: w.oertl)

Thanks for sharing. Unfortunately this query won't use an index, but will perform a full table scan.

2021-07-27
22:57 Post: FAQ for ALTER TABLE is outdated (artifact: 284232ba8b user: w.oertl)

The FAQ entry doesn't mention the ability to do RENAME COLUMN and DROP COLUMN.
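
For reference, both operations work directly in recent SQLite versions (RENAME COLUMN since 3.25.0, DROP COLUMN since 3.35.0). A quick sketch using Python's bundled sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER, b TEXT)")

# Rename a column in place (SQLite >= 3.25.0).
con.execute("ALTER TABLE t RENAME COLUMN b TO label")

# Drop a column in place (SQLite >= 3.35.0).
con.execute("ALTER TABLE t DROP COLUMN a")

cols = [row[1] for row in con.execute("PRAGMA table_info(t)")]
print(cols)  # ['label']
```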

22:44 Post: Searching in text columns with Unicode folding (artifact: 0c8af2da92 user: w.oertl)

I'm still struggling to achieve Unicode-aware, case-insensitive LIKE searching that uses an index. This isn't exotic and should be easy! Think of names containing letters with diaereses in languages like Spanish, German, French, Turkish, etc. The FAQ entry isn't quite satisfying.

A COLLATE NOCASE index will only handle ASCII. What I'd really like to see is a COLLATE UNICODE (or something similar) that simply does the right thing. Is there any chance of this happening in the foreseeable future? The necessary machinery seems to be available already in the FTS5 extension's unicode61 tokenizer, see below.

The icu extension doesn't really help, because it replaces the SQL like() function, which automatically disables the use of any index for the LIKE operator, regardless of the collation, so searches will slow down on large data sets. As it affects every LIKE operator, not just specific ones, it might have unwanted side effects.

The FTS5 extension does Unicode folding. So I can create an FTS5 virtual table and use that for searching. I'm not sure about the overhead (less with the options detail=column and content/content_rowid), but at least it is usable right now. It makes the queries a little more complicated, though, as both the original table and the virtual table are involved.
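
The external-content setup described above can be sketched like this (assuming an FTS5-enabled build; the table and column names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (name TEXT NOT NULL)")

# External-content FTS5 table: only the full-text index is stored;
# row data is fetched from t1 by rowid.
con.execute("""CREATE VIRTUAL TABLE t1_fts USING fts5(
    name, content='t1', content_rowid='rowid', tokenize='unicode61')""")

con.execute("INSERT INTO t1(name) VALUES ('Ägidius')")
# With external content the FTS index must be kept in sync manually
# (or via triggers); here one mirrored insert suffices.
con.execute("INSERT INTO t1_fts(rowid, name) SELECT rowid, name FROM t1")

# unicode61 folds case and strips diacritics, so 'agidius' matches 'Ägidius'.
rows = con.execute(
    "SELECT t1.name FROM t1_fts JOIN t1 ON t1.rowid = t1_fts.rowid "
    "WHERE t1_fts MATCH 'agidius'").fetchall()
print(rows)  # [('Ägidius',)]
```

In a real schema, triggers on INSERT/UPDATE/DELETE of t1 would keep t1_fts in sync.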

Another option I see is to create "shadow" columns for the text fields in the same table and have them filled automatically as generated columns, indexing them like this:

CREATE TABLE t1 (
  name TEXT NOT NULL,
  name2 TEXT NOT NULL AS (fold_unicode(name)) STORED
);
CREATE INDEX t1_name2 ON t1(name2 COLLATE NOCASE);
SELECT rowid, name FROM t1 WHERE name2 LIKE ?;

The missing piece is the folding function, which exists buried somewhere in the FTS5 extension but unfortunately is not exposed as an SQL function. How about exposing it? I could implement it myself, but it won't be as complete, and the database won't be usable without loading the custom extension.
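
To illustrate, here is a minimal sketch of the shadow-column idea with a stand-in fold_unicode written as an application-defined function (NFKD decomposition, strip combining marks, casefold - far less complete than FTS5's folding). It must be registered as deterministic to be usable in a generated column:

```python
import sqlite3
import unicodedata

def fold_unicode(s):
    # Decompose, drop combining marks, then casefold: 'Ägidius' -> 'agidius'.
    decomposed = unicodedata.normalize("NFKD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

con = sqlite3.connect(":memory:")
# Must be registered as deterministic for use in a generated column, and
# re-registered on every connection that opens the database.
con.create_function("fold_unicode", 1, fold_unicode, deterministic=True)

con.execute("""CREATE TABLE t1 (
    name  TEXT NOT NULL,
    name2 TEXT NOT NULL AS (fold_unicode(name)) STORED)""")
con.execute("CREATE INDEX t1_name2 ON t1(name2 COLLATE NOCASE)")
con.execute("INSERT INTO t1(name) VALUES ('Ägidius')")

# Fold the search term with the same function before binding it.
rows = con.execute("SELECT rowid, name FROM t1 WHERE name2 LIKE ?",
                   (fold_unicode("ÄGIDIUS"),)).fetchall()
print(rows)  # [(1, 'Ägidius')]
```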

Are there any other (simpler) options I have overlooked?

2021-07-21
20:43 Reply: SELECT optimization for constant expression (artifact: 70fa1b3ee4 user: w.oertl)

This other thread about LIKE was also started by me and mostly concerned the ICU extension (for Unicode) in conjunction with LIKE. Anyway, the splitting above is unnecessary: SQLite makes a new query plan after the parameters are bound and can therefore detect LIKE patterns that do not start with a wildcard character, in which case it will automatically use an index. A few more conditions have to be met, however - read the fine material (esp. the last paragraph of section 5).

15:26 Reply: SELECT optimization for constant expression (artifact: 6f6d15de09 user: w.oertl)

I've just tried this in PostgreSQL for comparison on a table with 1000 rows and three columns (id, a TEXT, b TEXT) and indices on a and b.

wolfgang=> explain select * from t1 where (a = 'a500' or 'a500' is null);
                           QUERY PLAN                           
----------------------------------------------------------------
 Index Scan using t1_a on t1  (cost=0.28..8.29 rows=1 width=12)
   Index Cond: (a = 'a500'::text)
(2 rows)

wolfgang=> explain select * from t1 where (a = 'a500' or 'a500' is null) and (b = null or null is null);
                           QUERY PLAN                           
----------------------------------------------------------------
 Index Scan using t1_a on t1  (cost=0.28..8.29 rows=1 width=12)
   Index Cond: (a = 'a500'::text)
(2 rows)

wolfgang=> explain select * from t1 where (a = 'a500' or 'a500' is null) and (b = 'b500' or 'b500' is null);
                           QUERY PLAN                           
----------------------------------------------------------------
 Index Scan using t1_b on t1  (cost=0.28..8.29 rows=1 width=12)
   Index Cond: (b = 'b500'::text)
   Filter: (a = 'a500'::text)
(3 rows)

So here the behavior is as I expect, assuming that "Index Scan" means an index lookup - one index is used, and the other condition is either ignored (if the parameter is NULL) or applied as a filter. It seems to work exactly the same with parameters, like this:

PREPARE foo1 (text) AS SELECT * FROM t1 WHERE ($1 IS NULL OR a=$1);
EXPLAIN EXECUTE foo1('a500');
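
For contrast, the SQLite side of the same experiment can be sketched with EXPLAIN QUERY PLAN (here via Python's sqlite3 module): the plain equality uses the index, while the NULL-tolerant form falls back to a full scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, a TEXT, b TEXT)")
con.execute("CREATE INDEX t1_a ON t1(a)")

def plan(sql, params=()):
    # Concatenate the 'detail' column of all query-plan rows.
    rows = con.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(r[3] for r in rows)

p_eq = plan("SELECT * FROM t1 WHERE a = ?", ("a500",))
p_null = plan("SELECT * FROM t1 WHERE (? IS NULL OR a = ?)", ("a500", "a500"))
print(p_eq)    # SEARCH ... USING INDEX t1_a
print(p_null)  # SCAN ...
```
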
08:50 Reply: SELECT optimization for constant expression (artifact: cadcb4d983 user: w.oertl)

Create the index with COLLATE NOCASE; only then will LIKE use it, because LIKE is case insensitive by default. The CREATE INDEX I would have used for my example (not shown) looks like this:

CREATE INDEX df_players_surname ON df_players(surname COLLATE NOCASE);
EXPLAIN QUERY PLAN SELECT * FROM df_players WHERE surname LIKE 'Dennis';
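
A quick way to check this (a sketch; the table definition is made up, and a prefix pattern is used for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE df_players (surname TEXT NOT NULL)")
con.execute("CREATE INDEX df_players_surname "
            "ON df_players(surname COLLATE NOCASE)")

# With a NOCASE index and the default case-insensitive LIKE, a literal
# pattern is rewritten into an index range search.
detail = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM df_players WHERE surname LIKE 'Den%'").fetchall()[0][3]
print(detail)  # SEARCH ... USING COVERING INDEX df_players_surname
```
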
08:26 Reply: SELECT optimization for constant expression (artifact: 212e0589b5 user: w.oertl)

I hadn't thought of this - but it makes the query plan more complicated and does not use the indices, so it is not an improvement.

08:20 Post: shell: .version missing from .help (artifact: bbe01a358f user: w.oertl)

When typing .help into the SQLite shell, the .version dot command isn't mentioned (tested in the current trunk of the fossil repository).

2021-07-20
23:00 Post: SELECT optimization for constant expression (artifact: 4e6ad52a67 user: w.oertl)

I'm trying to construct an SQL SELECT statement that takes parameters which, if NULL, do not influence the result. Performance should be good as well, i.e. indices should be used. The reason is that I want to use stored procedures (CG-SQL), so the SQL statements cannot be constructed dynamically to contain only the expressions for non-NULL parameters.

In this simple example with just one column in the table (apart from the rowid), the first SELECT of course yields no rows if the parameter ?1 is NULL, but it uses the index. The second works as expected (i.e. it would yield all rows if ?1 IS NULL), but performs a full table scan. The third is the same as the second but with literals instead of parameters; even then a table scan is performed.

CREATE TABLE t1 (a TEXT NOT NULL);
CREATE INDEX t1_a ON t1(a);
INSERT INTO t1(a) VALUES ('a1'), ('a2'), ('a3');

.parameter init
.parameter set ?1 'a1'

SELECT * FROM t1 WHERE a=?1;

SELECT * FROM t1 WHERE (?1 IS NULL OR a=?1);

SELECT * FROM t1 WHERE ('a2' IS NULL OR a='a2');

SELECT * FROM t1 WHERE a='a2' OR FALSE;

SELECT * FROM t1 WHERE a='a2' OR TRUE;

The second SELECT could be optimized to an index SEARCH after binding, the third and fourth right away. The last one is optimized and doesn't check that a equals 'a2'; in fact it has no condition at all (according to EXPLAIN).

Is there any way to arrive at an index-using query plan for SELECTs 2 to 4 (I guess not), or would this require an improvement to the query planner?

This is an example of an actually useful query to illustrate what I'm trying to arrive at:

SELECT * FROM person
WHERE (?1 IS NULL OR firstname LIKE ?1)
AND (?2 IS NULL OR lastname LIKE ?2)
AND (?3 IS NULL OR birthdate == ?3)
AND (?4 IS NULL OR maidenname LIKE ?4)
ORDER BY lastname, firstname;

Thanks.

2021-07-19
20:29 Reply: LIKE optimization - limitation to string literals (artifact: 5ac7184892 user: w.oertl)

Thanks for the clear and insightful explanation, as always!

06:43 Reply: LIKE optimization - limitation to string literals (artifact: 19b51a9035 user: w.oertl)

Thank you for your suggestions, which I had tried before, finding that this doesn't work outside of ASCII, e.g. with German umlauts. After loading the ICU extension, like() is overridden and indices aren't used for any LIKE operator anywhere - regardless of any fiddling with COLLATE - a possibly hard-to-detect side effect, even for COLLATE NOCASE indices or fields. I'll have to do without this extension, I think, but then I'm back to an artificial ASCII-only name column.

Note that the ICU extension's LIKE stops working after PRAGMA case_sensitive_like is set. This is an interesting side effect that is documented in ICU's README referenced above. There seems no way to go back to the ICU like() implementation after setting the PRAGMA.

.load libIcu.so
CREATE TABLE t1 (name TEXT NOT NULL);
CREATE INDEX t1_name ON t1(name);
CREATE INDEX t1_name_nc ON t1(name COLLATE NOCASE);
INSERT INTO t1(name) VALUES ('Ägidius');

EXPLAIN QUERY PLAN SELECT * FROM t1 WHERE name LIKE 'ä%';
PRAGMA case_sensitive_like=on;
EXPLAIN QUERY PLAN SELECT * FROM t1 WHERE name LIKE 'ä%';
PRAGMA case_sensitive_like=off;
EXPLAIN QUERY PLAN SELECT * FROM t1 WHERE name LIKE 'ä%';

Output:

QUERY PLAN
`--SCAN t1
QUERY PLAN
`--SEARCH t1 USING COVERING INDEX t1_name (name>? AND name<?)
QUERY PLAN
`--SEARCH t1 USING COVERING INDEX t1_name_nc (name>? AND name<?)

Only the first variant, before any PRAGMA case_sensitive_like, uses the ICU like() function and yields the desired row, but performs a full table scan.

Disregarding all of the above, I think a query like this should use an index (with COLLATE NOCASE):

SELECT * FROM t1 WHERE name LIKE RTRIM(?) || '%';

As I wrote in my first post, the byte-code for this statement contains a "Once" opcode that skips the computation of the right hand side in the second and later iterations, so the constant nature of the RHS is already known to the query planner.

2021-07-18
22:08 Post: LIKE optimization - limitation to string literals (artifact: 16b3e8bbe5 user: w.oertl)

As stated in the documentation, the LIKE operator will only use an index if the right hand side is a string literal or a parameter bound to a string literal (among a few other conditions). Why is an expression yielding a constant string not acceptable? Specifically, I have tried this:

CREATE TABLE t1 (
  id INTEGER NOT NULL PRIMARY KEY,
  name TEXT NOT NULL,
  name_simple TEXT NOT NULL AS (UPPER(name)) STORED
);
CREATE INDEX t1_name_simple ON t1(name_simple);
PRAGMA case_sensitive_like = on;

EXPLAIN QUERY PLAN SELECT *
FROM t1
WHERE name_simple LIKE 'A%';

EXPLAIN QUERY PLAN SELECT *
FROM t1
WHERE name_simple LIKE UPPER('a%');

Output:

QUERY PLAN
`--SEARCH t1 USING INDEX t1_name_simple (name_simple>? AND name_simple<?)
QUERY PLAN
`--SCAN t1

The first SELECT uses the index, while the second doesn't. How about allowing this kind of RHS? When using EXPLAIN, I see that a "Once" opcode skips over the UPPER() invocation on all but the first iteration, so it seems to be known to the query planner that this is a constant.

When ICU is used, the like() function is overloaded and the LIKE optimization is disabled as well. I plan to implement a function in place of UPPER that converts strings (persons' names) into plain uppercase ASCII, thereby allowing efficient searching. Of course I could apply that function in C before binding the parameter, but I'd rather keep the logic in the SQL statement. Thanks for your opinions.
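
The two plans above can be reproduced in a few lines (sketched here with Python's sqlite3 module):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE t1 (
    id INTEGER NOT NULL PRIMARY KEY,
    name TEXT NOT NULL,
    name_simple TEXT NOT NULL AS (UPPER(name)) STORED)""")
con.execute("CREATE INDEX t1_name_simple ON t1(name_simple)")
con.execute("PRAGMA case_sensitive_like = on")

def plan(sql):
    # Return the 'detail' column of the first query-plan row.
    return con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][3]

# Literal RHS: the LIKE optimization applies and the index is used.
p1 = plan("SELECT * FROM t1 WHERE name_simple LIKE 'A%'")
# Constant-expression RHS: no optimization, full table scan.
p2 = plan("SELECT * FROM t1 WHERE name_simple LIKE UPPER('a%')")
print(p1)  # SEARCH ... USING INDEX t1_name_simple
print(p2)  # SCAN ...
```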

2021-06-25
11:01 Post: .lint fkey-indexes output (artifact: 49407cfb74 user: w.oertl)

Hi all,

when .lint fkey-indexes reports missing indices, single quotes are used where double quotes (for identifiers) would be correct, I think. It works, but is not the recommended form. Example output:

CREATE INDEX 'val_parameter_id' ON 'val'('parameter_id'); --> parameter(id)

When looking into shell.c.in at the function lintFkeyIndexes, the SQL statement uses the QUOTE function inappropriately, if I'm not mistaken.
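
Indeed, QUOTE always produces a single-quoted SQL string literal (with embedded quotes escaped), never a quoted identifier; for example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# quote() yields a string literal, not a double-quoted identifier.
val = con.execute("SELECT quote('val_parameter_id')").fetchone()[0]
print(val)  # 'val_parameter_id'  (the single quotes are part of the value)
```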

2021-04-08
21:36 Post: VACUUM resets journal_mode wal2 to delete (artifact: e3acdc2a75 user: w.oertl)

I noticed unexpected behaviour when VACUUMing a db in wal2 mode. Assume that the file cmds.sql contains:

PRAGMA journal_mode;
PRAGMA journal_mode=wal2;
VACUUM;
PRAGMA journal_mode;

When feeding these statements to the sqlite3 shell for an on-disk database (which can be empty), then exiting the shell and repeating, this is the output:

$ sqlite3 test.db < cmds.sql
delete
wal2
wal2
$ sqlite3 test.db < cmds.sql
delete
wal2
wal2

So after VACUUM the journal_mode remains wal2 at first, but after exiting the shell it is in delete mode. This doesn't happen with wal mode.

If two SQLite shell processes access the same database file simultaneously, the reset to delete seems only to happen after both (all) processes have exited and the journal files have been committed and removed.

Shouldn't the journal_mode be unchanged? Using SQLite 3.35.3, begin-concurrent-pnu-wal2 branch.

2021-04-06
20:41 Reply: books / papers on "one sqlite per user" design ? (artifact: c89ce9fae8 user: w.oertl)

You might want to take a look at ActorDB for a working example. It builds on SQLite and splits data to many small database files called actors.

Well, actually it is much more complicated than that; see how it works. You can have one database (actor) per user, for example. It is mostly implemented in Erlang, hasn't seen much activity lately, and the latest binary release doesn't run on my system. YMMV.

2021-04-05
20:21 Reply: .import csv and NULL values (artifact: cf99368fe4 user: w.oertl)

Treating empty values as NULL and "" as empty strings is an excellent answer, as no explicit NULL string is required. Of course this behaviour would need to be off by default, only to be enabled with an option. As Simon says, this isn't in the standard, but OTOH the standard doesn't say there can't be missing values. I'll give it a try.

2021-04-03
20:18 Post: .import csv and NULL values (artifact: 9da85fe4fc user: w.oertl)

There doesn't seem to be a way to import NULL values in the shell using the .import command; missing values are treated as empty strings.

When I import into a table with constraints, this might fail if either NULL or a valid value is enforced and an empty string isn't valid. Furthermore, I have to replace empty strings in nullable fields with NULL using UPDATE commands afterwards. I think .import is much faster than .read with lots of INSERT statements.
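
Until .import can do this itself, the post-import cleanup mentioned above can be sketched like this (table and column names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (name TEXT, nickname TEXT)")
# .import would have inserted '' wherever the CSV had a missing value.
con.execute("INSERT INTO people VALUES ('Ann', ''), ('Bob', 'Bobby')")

# Post-import cleanup: turn empty strings in nullable fields into NULL.
con.execute("UPDATE people SET nickname = NULLIF(nickname, '')")

rows = con.execute("SELECT name, nickname FROM people").fetchall()
print(rows)  # [('Ann', None), ('Bob', 'Bobby')]
```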

So I suggest that .import provide a new option --nullvalue or similar, mirroring how the .nullvalue setting is used for exporting. Another idea is that a missing value becomes NULL while '' is an empty string. I could probably provide a patch unless there is some objection or a better idea.

2020-09-06
19:39 Post: Writer Concurrency - server-process-edition (artifact: 6534ede66e user: w.oertl)

Hi,

If I'm not mistaken, SQLite allows only one writer at a time, regardless of the journal mode. I'd like to use SQLite for an application server, so concurrent writing would be a significant performance factor - at least I think so; this might be a fallacy on my part.

There is/was a branch called server-process-edition, which tackled this issue with page-level locking (instead of file-level) and multiple journal files, one per concurrent writer. While this looks promising, not much has happened there since.

I've tried to compile this server-process-edition branch, but it doesn't build out-of-the-box. After some tinkering it did compile, but so far I have failed to run the benchmark.

Is there any chance for this to be revived? Has development taken a different route, so that concurrency would better be increased by different methods?

Thanks, W. Oertl

2020-07-03
19:26 Post: Bug in .import --skip (v3.33.0, trunk) (artifact: f9bbcd435d user: w.oertl)

I noticed that when using the new --skip option of the shell's .import command, the reported line numbers are incorrect. The reason is that while skipping, xRead (either ascii_read_one_field or csv_read_one_field) already increments ImportCtx.nLine, and the loop that skips lines increments it a second time:

shell.c:8137

while( (nSkip--)>0 ){
  while( xRead(&sCtx) && sCtx.cTerm==sCtx.cColSep ){}
  sCtx.nLine++;
}

The line numbers reported in error messages and (when using -v) the number of used lines are too high by the number of skipped lines.

The fix is, of course, to remove the sCtx.nLine++ statement.

Also, the --skip option does not complain about negative or otherwise unparseable values.