SQLite Forum

Timeline
Login

9 forum posts by user baraherself

2020-09-20
22:19 Reply: lsm1 compression (artifact: 963b4f90b7 user: baraherself)

Thanks, Dan!

Using a Makefile like yours, and your edits, I've been able to build sqlite3 with built-in lsm and lz4. I've inserted several blocks of lines into my 8-million line .lsm file, with no errors.

Now on to sort out how to load such a build in our app...

2020-09-18
06:18 Reply: lsm1 compression (artifact: 9a2d61dae3 user: baraherself)

Thank you, Dan!

I've applied your patch and attempted to build the shell tool with lsm and lz4 support built-in. I've tried a couple of variations on build commands. (I've consulted https://www.sqlite.org/howtocompile.html and the READMEs in the autoconf dir.) In each case, however, I've gotten this error:

Undefined symbols for architecture x86_64:
  "_sqlite3_lsm_init", referenced from:
      _openDatabase in sqlite3.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [libsqlite3.la] Error 1

Do you have any more tips for replicating your build?

I'm developing on MacOS 10.15.6. I could easily try a new clone of sqlite3, or a build on ubuntu, where the app we're prototyping is most likely to run.

Thank you again!

2020-09-13
20:15 Reply: lsm1 compression (artifact: a509240a88 user: baraherself)

Here's a multi-line insert, for what it's worth.

insert into cdx (key, cdxline) values ('ai,ak)/20200621000641','ai,ak)/ 20200621000621 http://ak.ai/ text/html 200 7OJCBOXLSJRUTLFALBSMSD5XHAWSMGO3 - - 5907 12823972 ARCHIVEIT-9810-DAILY-JOB1205229-SEED2290404-20200620230127543-00000-h3.warc.gz'), 
('ai,ak)/20200621000642','ai,ak)/ 20200621000621 http://ak.ai/ text/html 200 7OJCBOXLSJRUTLFALBSMSD5XHAWSMGO3 - - 5907 12823972 ARCHIVEIT-9810-DAILY-JOB1205229-SEED2290404-20200620230127543-00000-h3.warc.gz'), 
('ai,ak)/20200621000643','ai,ak)/ 20200621000621 http://ak.ai/ text/html 200 7OJCBOXLSJRUTLFALBSMSD5XHAWSMGO3 - - 5907 12823972 ARCHIVEIT-9810-DAILY-JOB1205229-SEED2290404-20200620230127543-00000-h3.warc.gz'), 
('ai,ak)/20200621000644','ai,ak)/ 20200621000621 http://ak.ai/ text/html 200 7OJCBOXLSJRUTLFALBSMSD5XHAWSMGO3 - - 5907 12823972 ARCHIVEIT-9810-DAILY-JOB1205229-SEED2290404-20200620230127543-00000-h3.warc.gz'), 
('ai,ak)/20200621000645','ai,ak)/ 20200621000621 http://ak.ai/ text/html 200 7OJCBOXLSJRUTLFALBSMSD5XHAWSMGO3 - - 5907 12823972 ARCHIVEIT-9810-DAILY-JOB1205229-SEED2290404-20200620230127543-00000-h3.warc.gz'), 
('ai,ak)/20200621000646','ai,ak)/ 20200621000621 http://ak.ai/ text/html 200 7OJCBOXLSJRUTLFALBSMSD5XHAWSMGO3 - - 5907 12823972 ARCHIVEIT-9810-DAILY-JOB1205229-SEED2290404-20200620230127543-00000-h3.warc.gz'), 
('ai,ak)/20200621000647','ai,ak)/ 20200621000621 http://ak.ai/ text/html 200 7OJCBOXLSJRUTLFALBSMSD5XHAWSMGO3 - - 5907 12823972 ARCHIVEIT-9810-DAILY-JOB1205229-SEED2290404-20200620230127543-00000-h3.warc.gz'),
('ai,ak)/20200621000648','ai,ak)/ 20200621000621 http://ak.ai/ text/html 200 7OJCBOXLSJRUTLFALBSMSD5XHAWSMGO3 - - 5907 12823972 ARCHIVEIT-9810-DAILY-JOB1205229-SEED2290404-20200620230127543-00000-h3.warc.gz'),
('ai,ak)/20200621000649','ai,ak)/ 20200621000621 http://ak.ai/ text/html 200 7OJCBOXLSJRUTLFALBSMSD5XHAWSMGO3 - - 5907 12823972 ARCHIVEIT-9810-DAILY-JOB1205229-SEED2290404-20200620230127543-00000-h3.warc.gz'),
('ai,ak)/20200621000650','ai,ak)/ 20200621000621 http://ak.ai/ text/html 200 7OJCBOXLSJRUTLFALBSMSD5XHAWSMGO3 - - 5907 12823972 ARCHIVEIT-9810-DAILY-JOB1205229-SEED2290404-20200620230127543-00000-h3.warc.gz');

19:44 Delete reply: lsm1 compression (artifact: edcede11bc user: baraherself)
Deleted
19:43 Reply: lsm1 compression (artifact: c12d86b6a0 user: baraherself)

Thanks, Dan!

With the correct path name for the lsm datastore, I can open it using the sqlite3 cli, and eventually hit an error inserting an additional line. So far, this has been a different error than my webapp has hit.

You may be able to reproduce...

by building lsm.so from a clone of https://github.com/galgeek/sqlite-lsm1-lz4 (my build commands are in the README)

and

loading lsm.so, and the datastore, and inserting one or more lines, as below.
(lsm datastore: https://drive.google.com/file/d/1fJFz9vdXaE8ehBqX5ARUN5toBhKBFaZc/view?usp=sharing)

baraherself@b-ma ~/D/sqlite3 > ./sqlite3
SQLite version 3.34.0 2020-09-01 19:02:52
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .load ext/lsm1/lsm.so
sqlite> CREATE VIRTUAL TABLE cdx USING lsm1 ('/Users/baraherself/Dev/cdxlite/collections/lz4_weetest/lz4_weetest.lsm', key, TEXT, cdxline);
sqlite> select count(*) from cdx;
8741534

# separately, I ran the python/sanic webapp, posting insertions, and again it failed with segmentation fault.
# here, the lsm store remains open, and inserts of several lines succeed...

sqlite> insert into cdx (key, cdxline) values ('ai,ak)/20200621000621','ai,ak)/ 20200621000621 http://ak.ai/ text/html 200 7OJCBOXLSJRUTLFALBSMSD5XHAWSMGO3 - - 5907 12823972 ARCHIVEIT-9810-DAILY-JOB1205229-SEED2290404-20200620230127543-00000-h3.warc.gz');
sqlite> insert into cdx (key, cdxline) values ('ai,ak)/20200621000622','ai,ak)/ 20200621000621 http://ak.ai/ text/html 200 7OJCBOXLSJRUTLFALBSMSD5XHAWSMGO3 - - 5907 12823972 ARCHIVEIT-9810-DAILY-JOB1205229-SEED2290404-20200620230127543-00000-h3.warc.gz');
sqlite> insert into cdx (key, cdxline) values ('ai,ak)/20200621000623','ai,ak)/ 20200621000621 http://ak.ai/ text/html 200 7OJCBOXLSJRUTLFALBSMSD5XHAWSMGO3 - - 5907 12823972 ARCHIVEIT-9810-DAILY-JOB1205229-SEED2290404-20200620230127543-00000-h3.warc.gz');
sqlite> select count(*) from cdx;
8741537

# let's try another... 

sqlite> insert into cdx (key, cdxline) values ('ai,ak)/20200621000625','ai,ak)/ 20200621000621 http://ak.ai/ text/html 200 7OJCBOXLSJRUTLFALBSMSD5XHAWSMGO3 - - 5907 12823972 ARCHIVEIT-9810-DAILY-JOB1205229-SEED2290404-20200620230127543-00000-h3.warc.gz');

# nope!  note: this is a new error, at a different location in the lsm code

Assertion failed: (aSpace[aSort[i].pShm->iShmid - iPrevShmid].pShm==0), function treeRepairList, file lsm_tree.c, line 1312.
fish: './sqlite3' terminated by signal SIGABRT (Abort)

I'm hoping to find some debugging time myself!

And checking my ability to reproduce, I eventually hit another error:

sqlite> insert into cdx (key, cdxline) values ('ai,ak)/20200621000639','ai,ak)/ 20200621000621 http://ak.ai/ text/html 200 7OJCBOXLSJRUTLFALBSMSD5XHAWSMGO3 - - 5907 12823972 ARCHIVEIT-9810-DAILY-JOB1205229-SEED2290404-20200620230127543-00000-h3.warc.gz');
Error: SQL logic error
sqlite3(12092,0x10d671dc0) malloc: Incorrect checksum for freed object 0x7f8936c06c60: probably modified after being freed.
Corrupt value: 0x1
sqlite3(12092,0x10d671dc0) malloc: *** set a breakpoint in malloc_error_break to debug
fish: './sqlite3' terminated by signal SIGABRT (Abort)

2020-09-05
20:18 Reply: lsm1 compression (artifact: 8e3ac028c1 user: baraherself)

The lsm datastore was created using a Python webapp (built using sanic) running under Python 3.8. The webapp logs these version numbers:

[2020-09-05 12:24:57 -0700] [95053] [INFO] sqlite3.version is 2.6.0
[2020-09-05 12:24:57 -0700] [95053] [INFO] sqlite3.sqlite_version is 3.33.0

lsm.so was built from SQLite version 3.34.0 2020-09-01 19:02:52.

Maybe the version mismatch is the underlying issue?

Here's the webapp datastore code (we also use a standard sqlite table):

        self.conn = sqlite3.connect(":memory:")
        self.conn.create_function("REGEXP", 2, regex_match)
        self.conn.enable_load_extension(True)
        self.conn.load_extension("lsm.so")
        self.conn.execute('''CREATE VIRTUAL TABLE cdx USING lsm1 ('collections/%s/%s.lsm', key, TEXT, cdxline);''' % (name, name,) )
        self.replicator = sqlite3.connect("collections/%s/%s.replication.sqlite" % (name, name,))

I've tried to open the .lsm within the sqlite3 cli, but I'm missing something (Error: lsm_open failed with 266):

% ./sqlite3
SQLite version 3.34.0 2020-09-01 19:02:52
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .load ext/lsm1/lsm.so
sqlite> CREATE VIRTUAL TABLE cdx USING lsm1 ('/Users/baraherself/collections/lz4_weetest/lz4_weetest.lsm', key, TEXT, cdxline);
Error: lsm_open failed with 266
sqlite> 

I can provide webapp code, the ingested data, or the troubled .lsm, if helpful.

2020-09-04
22:57 Reply: lsm1 compression (artifact: bbee5ad742 user: baraherself)

The .lsm datastore created by the ingest ending in a seg fault is still usable.

It's 1.9G on disk, while a sqlite db storing the same data uncompressed is 5.4G.

18:23 Reply: lsm1 compression (artifact: 7f773c4e85 user: baraherself)

I'm pretty sure that's the code that I'm using.

I created the build dir Tuesday, September 1:

fossil clone https://www.sqlite.org/src sqlite.fossil
mkdir sqlite3 ; cd sqlite3
fossil open ../sqlite.fossil

I copied code from a clone of https://github.com/thoughtpolice/sqlite4_lsm_lz4:

sqlite4_lsm_lz4 % cp -p *.c *.h lz4/*.c lz4/*.h ../sqlite3/ext/lsm1
sqlite4_lsm_lz4 % ls *.c *.h lz4/*.c lz4/*.h
lsm-lz4.c		lsm-test.c		lz4/lz4.h		lz4/lz4_encoder.h
lsm-lz4.h		lz4/lz4.c		lz4/lz4_decoder.h

I edited the Makefile: https://github.com/galgeek/sqlite-lsm1-lz4/commit/3aae5c28eff92f7a7ff85b75b91b105239ed0de9

I edited vtab.c: https://github.com/galgeek/sqlite-lsm1-lz4/commit/4ed20a55afcd70f3595438e2d3f7d8b47f89cb59

2020-09-03
05:48 Post: lsm1 compression (artifact: a38903fd31 user: baraherself)

Hello!

Many thanks for the lsm1 extension! I'm prototyping an application using the lsm1 extension that will benefit from compression. I have code working for small test ingests, but not for a larger test ingest. Help?

Details...

I found this project, for the sqlite4 lsm:

https://github.com/thoughtpolice/sqlite4_lsm_lz4

Based on that, I've been working with this build directory, which works fine for small test ingests:

https://github.com/galgeek/sqlite-lsm1-lz4

A larger ingest without compression creates an 11 GB .lsm file. Attempts to do the same ingest using compression fail after ~5-10 minutes, with this error:

Assertion failed: (iPg>=fsFirstPageOnBlock(pFS, 1)), function fsPageGet, file lsm_file.c, line 1555.

https://www.sqlite.org/cgi/src/file?name=ext%2Flsm1%2Flsm_file.c&ln=1555

I've tried commenting out the two lines of assertions there.

Then, retrying the larger ingest, again after ~5-10 minutes, the app fails with this error:

Assertion failed: ((rc==LSM_OK && (p || (pnSpace && *pnSpace))) || (rc!=LSM_OK && p==0)), function fsPageGet, file lsm_file.c, line 1632.

https://www.sqlite.org/cgi/src/file?name=ext%2Flsm1%2Flsm_file.c&ln=1630-1632

I've commented out this assertion, too, and tried again.

The latest error: segmentation fault

some detail from the MacOs Problem Report:

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   lsm.so                        	0x000000010d83ce41 pageGetKey + 17
1   lsm.so                        	0x000000010d83e41c pageGetBtreeKey + 332
2   lsm.so                        	0x000000010d84033a seekInBtree + 378
3   lsm.so                        	0x000000010d83b92f sortedWork + 5599
4   lsm.so                        	0x000000010d839c18 doLsmWork + 392
5   lsm.so                        	0x000000010d83a24e lsmSortedAutoWork + 302
6   lsm.so                        	0x000000010d832583 doWriteOp + 323
7   lsm.so                        	0x000000010d8488e6 lsm1Update + 2934
8   libsqlite3.0.dylib            	0x000000010d653ab0 sqlite3VdbeExec + 8425
9   libsqlite3.0.dylib            	0x000000010d62e256 sqlite3_step + 391
10  _sqlite3.cpython-38-darwin.so 	0x000000010d61974f pysqlite_step + 31
11  _sqlite3.cpython-38-darwin.so 	0x000000010d616dfe _pysqlite_query_execute + 952
12  org.python.python             	0x000000010be3718a cfunction_call_varargs + 319
13  _sqlite3.cpython-38-darwin.so 	0x000000010d615b3d pysqlite_connection_executemany + 74
14  org.python.python             	0x000000010be3e012 method_vectorcall_VARARGS + 270