SQLite Forum

Timeline
Login

34 forum posts by user andreas-kupries

2021-10-13
10:15 Reply: binary .dump format (artifact: a67aa42b9a user: andreas-kupries)

A binary dump format is not only not available, it's not even theoretically possible since any byte sequence can be present in any blob which makes format bytes impossible, or at least very difficult - something like MIME boundaries works but is heavy on checking for boundary matches in content, which doesn't matter much for typical e-mail, but will for DBs. It also means even small blobs will add large boundary identifiers, which will take away from the saving.

I disagree here. A format using a length/value format would need no encoding of the blobs, nor boundary markers. I.e. each value, binary or not, would be preceded by its length in bytes. This could a be simple fixed 64bit unsigned int, or some kind of varint coding to keeping coding of small numbers short(er). Lots of varints codings to choose from.

A natural extension of LV is TLV, i.e. adding a tag before the LV sequence. The tag describes what the value is for. Lots of binary formats using this kind of structure (TIFF, JPEG, EXIF, PNG, etc. ...)

Note that I read the original request as a binary format for the entire dump, and not just the binary blobs he has.

07:22 Reply: binary .dump format (artifact: 84c3501432 user: andreas-kupries)

Have you tried using a textual .dump followed by a compression stage, be it gzip, bzip2, xz, or zstd ?

Would that get you back into a size range you would consider acceptable ?

Note that AFAIK no binary .dump format has been defined for sqlite itself.

2021-09-11
17:03 Reply: Terminal Interface to Sqlite (artifact: 40976abe0c user: andreas-kupries)
2021-09-04
09:38 Post: Terminal Interface to Sqlite (artifact: 5cf05a5322 user: andreas-kupries)

HN: SQLite-TUI: A TUI for viewing SQLite databases

Sources: Github

Go-based. MIT licensed.

2021-07-31
07:07 Post: HN article: Static.wiki – read-only Wikipedia using a 43GB SQLite file (artifact: 8537d5c3c1 user: andreas-kupries)

Some discussion of limitations in sqlite's FTS extension.

2021-05-17
11:13 Post: FYI HN: Geocode-sqlite: Geocode rows in an SQLite database table (artifact: 3b78c1f78f user: andreas-kupries)
2021-05-04
11:53 Reply: "Office Space"... I'm losing all those pennies (artifact: bc6fc7b6ea user: andreas-kupries)

Is it possible to store the values as pennies ? That would be integer numbers. The math in your system would need to be adapted, i.e fixed point math, instead of floating point.

Maybe store even integers representing 1/1000 of a penny or some such. Still fixed point math, with some guard digits before you have to round.

I.e. 1.23 [USD] would be 123 [Penny]. Or 123000 [1/1000th penny].

11:49 Post: Sqlite User - Who's On First - Place Gazetteer - Geo Data (artifact: abfc9abfec user: andreas-kupries)
2021-04-07
08:48 Post: Saving Wikipedia to SQLite (artifact: 198e1a104a user: andreas-kupries)

See this reddit post by David Shapiro. Just seen on HN. No discussion yet.

2021-01-19
13:59 Edit reply: Need If column exist option in SQLite (artifact: e9480087e9 user: andreas-kupries)

Right. The idea I was talking about I had seen in code which allowed different engines as backend. It uses Mysql in the default config, and could be switched to use a Postgres backend. So that idea is generally applicable. Good if you don't know what database will be used, or know that things have to be portable across several engines.

deep schema introspection Yes, sqlite has a few pragmas which help there to figure out tables, columns, indices, foreign keys, default values, etc. IIRC I have seen usage of these in the Datasette (python) code.

13:57 Reply: Need If column exist option in SQLite (artifact: d1094e190e user: andreas-kupries)

Right. The ideas I was talking about I had seen in code which allowed different engines as backend. It uses mysql in the default config, and could be switched to use a postgres backend. So that idea is generally applicable. Good if you don't know what database will be used, or know that things have to be portable across several engines.

deep schema introspection Yes, sqlite has a few pragmas which help there to figure out tables, columns, indices, foreign keys, default values, etc. IIRC I have seen usage of these in the Datasette (python) code.

13:01 Reply: Need If column exist option in SQLite (artifact: db5207f2cc user: andreas-kupries)

This kind of thing I have usually seen handled in the application (logic) instead of via SQL statements.

The basic idea was to have a fixed table appschema or the like which has a single row listing the version of the application's schema found in the database.

When the application starts and accesses the database it compares the version from that table with the version it itself supports. Then

  • If the database is ahead of the application you can only abort.

  • If, on the other hand, the application is ahead of the database then it is expected to have code to update the old database schema to the new schema. This kind of code is usually written as a series of functions, each of which will incrementally update the schema from one version to the next. The application then simply invokes the functions for the "(database schema).next" up to "latest", in order, stepping the schema forward in easy increments. After that completes successfully it can then use the database as normal.

2020-11-17
18:28 Reply: FYI: Personal Data Warehouses: Reclaiming Your Data (artifact: c2d057ab7f user: andreas-kupries)

Heh. While I was thinking them complementary too, it was more wondering about how feasible it would be to replace perkeep's indexing with something based on dogsheep. It would at least need some importer for the perkeep schema documents. Plus ways to import information about non-schema documents (like for the meta data supported by various image formats, etc.).

With the reference to Dropbox I now also wonder if perkeep has importers for data in Dropbox, or Google drive, or ... Found an issue for writing an importer from gdrive.

2020-11-14
22:54 Reply: FYI: Personal Data Warehouses: Reclaiming Your Data (artifact: 0a49d28908 user: andreas-kupries)

Replying to myself, now that I have read the article, the first thing coming to mind is perkeep.

DS and PK seem to both have a go at the same kind of thing, although from different angles. PK is more looking at the storage and having that under control. It has indexing and searching, however that is a custom language and limited to what the PK engine is indexing.

DS OTOH looks to me to be all about the indexing, and the searching/analysis. Where the data sources live is not as important, only that they can be indexed (converted to sqlite) in some way, and then datasette can have a go at it.

As another connection, the article mentions Wireguard/Tailscale for personal mesh networking. The main PK developer, Brad Fitzpatrick works at Tailscale.

19:03 Post: FYI: Personal Data Warehouses: Reclaiming Your Data (artifact: 1508e49fe9 user: andreas-kupries)

Article found at HN by our simonw/datasette: Personal Data Warehouses: Reclaiming Your Data

2020-09-20
04:34 Reply: 64-bit Win compile of Tcl interface, revisited (artifact: e6a63820a3 user: andreas-kupries)

Hi Gerry.

The missing symbols look to be something which should be provided by Tcl's stubs library, i.e. libtclstub.a or whatever the equivalent name under Windows(tm).

I suspect that whatever Makefile you use is not supplying this library to the link step, or supplying it incorrectly.

As I am not firm (anymore) with regard building on Windows, I would recommend/propose to ask the question on the tcl-core mailing list, and then feed any corrections you get back to Richard so that he can update the Makefile in question.

IIRC Jan Nijtmans is doing Windows-builds, and listening on Tcl-core. Other names I remember as associated with Tcl on Windows are Harald Oehlmann and Francois Vogel, also on that list.

Hopefully this helps.

2020-09-17
17:46 Edit reply: Trigram indexes for SQLite (artifact: bc02b03610 user: andreas-kupries)

You are welcome.

I do not believe that the idea is original to me, although I do not remember having it seen anywhere else.

I should note that I expect the suffix table to be quite a bit larger than the word table, and the link table to be much larger. Plus the index on link.

So this looks to be very much trading memory/space for speed.

16:26 Reply: Trigram indexes for SQLite (artifact: 369988dac1 user: andreas-kupries)

You are welcome.

I do not believe that the idea is original to me, although I do not remember having it seen anywhere else.

I should note that that I expect the suffix table to be quite a bit larger than the word table, and the link table to be much larger. Plus the index on link.

So this looks to be very much trading memory/space for speed.

03:12 Reply: Trigram indexes for SQLite (artifact: ef05429536 user: andreas-kupries)

An alternate to these trigram indices is a schema of the form

table word   (id, string);
table suffix (id, string);
table link   (suffix, word);
index link_s on link (suffix);

The basic idea behind this is that any substring S of a word W is the prefix of a suffix of W.

Whenever a word W is added to word, all the suffices of W are added to suffix, and link is extended also, to map from the suffices to W.

Now a substring search for the words matching %foo% becomes a prefix search foo% in suffix, and we get the relevant word id by joining link.

Roughly:

SELECT DISTINCT word
FROM   link
WHERE  suffix IN (SELECT id
	          FROM   suffix
	          WHERE  suffix LIKE 'foo%')

Note, this gives us only the word ids. To get the actual words we have to join word as well. That exercise is left for the reader.

2020-08-27
20:15 Reply: SQLITE_ENABLE_STAT4 triggers failures in analyze3.test (artifact: 2d4b0dada7 user: andreas-kupries)

I do not believe so. n! is when order matters, i.e. you have n! permutations of n elements.

However for the sqlite config flags I am pretty sure that order does not matter, thus we have combinations, not permutations, and only 2^87 = 154,742,504,910,672,534,362,390,528 of them (assuming the flags are simple on/off, or other binary).

2020-07-29
20:28 Reply: Typos in SQLite source code (artifact: 334f9cdf94 user: andreas-kupries)

Looks to be. Thank you.

16:59 Reply: Typos in SQLite source code (artifact: 480ff57197 user: andreas-kupries)

Is this tool open somewhere ? If yes, what is its license ?

16:49 Reply: Typos in SQLite source code (artifact: b9cac775b2 user: andreas-kupries)

A compromise and intermediate way of fixing things might be to

  • apply the tool to all newly added/changed files as sqlite development goes forward and
  • fix all the typos it found in these files.

That way we should have less new typos going forward, and old typos are incrementally expunged as the files they are in get modified.

That should leave only the typos in files truly not changed for many years.

2020-06-11
21:03 Edit reply: DROP VIEW IF EXISTS failure (artifact: f7b3ac9fe1 user: andreas-kupries)

I suspect that your basic misconception is the belief that tables and views occupy separate namespaces. IOW that you can have a view "test", and a table "test", both, at the same time.

This is not the case. Views, as a kind of virtual table are in the same namespace as tables itself.

In your example you create an object named "test", which is of type "table". The IF EXISTS of the DROP VIEW finds that object, and then sqlite notes the type mismatch, view vs. table, and complains.

I hope that this explanation helps.

21:03 Reply: DROP VIEW IF EXISTS failure (artifact: b33eceef2d user: andreas-kupries)

I suspect that your basic misconception is the belief that tables and views occupy separate namespaces. IOW that you can have a view "test", and a table "test", both, at the same time.

This is not the case. Wiews, as a kind of virtual table are in the same namespace as tables itself.

In your example you create an object named "test", which is of type "table". The IF EXISTS of the DROP VIEW finds that object, and then sqlite notes the type mismatch, view vs. table, and complains.

I hope that this explanation helps.

2020-05-29
20:56 Reply: shell nit, goofy import leaks memory (artifact: d4ce62d53c user: andreas-kupries)

Why does the code have a triple & ? (&&&) I suspect that the compiler interprets that as && (logical and) followed by an & (address operator), applied to p->xCloser.

2020-05-26
23:05 Reply: Best way to observe database operations (artifact: 82b7a5abb3 user: andreas-kupries)

Welford's incremental variance

A bit later on the same page also a few higher order statistics.

Also John D Cook

2020-04-16
21:59 Reply: Tcl coroutine, db eval ... script, yield, segmentation fault (artifact: c4f6935667 user: andreas-kupries)

I suspect that my non-working 3.25.3 was compiled against Tcl 8.5, which does not have NRE. The 3.32.0 I compiled recently against 8.6 definitely works. And an older 3.20.1 I found installed also works.

Bisecting for when the NRE support was added tells me 2009-10-06 14:59, via commit e9f72f1de4, which went into release 3.6.19.

2020-04-09
22:08 Edit reply: Tcl coroutine, db eval ... script, yield, segmentation fault (artifact: 31768b9886 user: andreas-kupries)

Thanks. Time to update the local sqlite installation (I am using a chunked SELECT with LIMIT in a personal project because yield did not work. Should have checked versions first).

22:03 Reply: Tcl coroutine, db eval ... script, yield, segmentation fault (artifact: 8ea5bd3498 user: andreas-kupries)

Thank. Time to update the local sqlite installation (I am using a chunked SELECT with LIMIT in a personal project because yield did not work. Should have checked versions first).

21:37 Reply: Tcl coroutine, db eval ... script, yield, segmentation fault (artifact: ce14f3ab65 user: andreas-kupries)

Which version of sqlite3 (and associated tcl binding) is that ?

I know that the 3.25.3 I currently use would error on the yield with C stack busy. I.e. that Tcl binding does not properly use the Tcl8.6+ non-recursive engine APIs.

2020-04-03
19:12 Post: PSA: All books in the ACM digital library are free to download for the next 3 months (artifact: 6453c37af7 user: andreas-kupries)
2020-03-27
18:30 Reply: Daily activity alert grouped by thread (artifact: f7d00546cd user: andreas-kupries)

I suspect that this is a request better suited to the fossil forum where the forum software is maintained.

18:25 Reply: Natural sort order (artifact: 4cf020bfca user: andreas-kupries)

The Tcl implementation of dictionary sorting does not convert to integers. The link references the DictionaryCompare function called by the higher framework to compare two strings.