SQLite Forum

Linear algebra extention to core SQLite
Login

Linear algebra extention to core SQLite

(1) By Raoul (raoul314) on 2022-01-07 12:08:20 [link] [source]

Hi,

I'd be very interested to see this linear algebra extension or similar implemented.

It defines new linear algebra types that extend SQL, but it is my understanding that this would require forking core SQLite (i.e. cannot be implemented as an extension such as e.g. R-trees).

Now, I would gladly do the legwork myself but being no C or database expert (am epidemiologist with very little experience in low-level programming), I'd welcome input from experienced people. Could this be tackled by a motivated noob loner? Or is this a multiple person-year amount of work, even for experts?

Best, Raoul

(2) By Simon Slavin (slavin) on 2022-01-07 13:49:10 in reply to 1 [link] [source]

Represent your matrices in JSON format, and write a bunch of functions to manipulate them. It won't be fast but it'll work and it doesn't require any changes to SQLite.

But yes, to do it properly you should introduce a MATRIX type or something like that. Nice project for a maths postgraduate.

(3) By MBL (UserMBL) on 2022-01-07 16:02:34 in reply to 1 [link] [source]

Wouldn't it be more of interest for you to use something simpler and already existing - at least for the majority of work?

You could for example use Euler Math Toolbox (EMT) or MathLab or similar. EMT would give you the possibility to link your own C program functions into it using Tiny C (tcc). The only missing is the wrapper to use SQLite3 directly from inside a mathematical notebook. EMT is capable e.g. to work with complex matrices or to do symbolic computer algebra and has nice graphics capabilities too. Have a look to the examples.

Or you can use Python with many modules supporting math and visualizations and many more - also with SQLite3 connectivity.

If you prefer mouse operations then I can recommend as framework the KNIME Analytics Platform into which you can integrate also nodes like JPython, Java or link to Python or R-Studio for lots of statistics - but it would give you definitely a nice tooling around the math core, which you seem to look for, for capturing, formatting and blending data and many more. Of cause you will get access from and to many data sources.

KNIME Analytics Platform is my favorite tool together with SQLite3 and H2 for local storage - unfortunately the latest JDBC driver for SQLite3 is not yet 3.37.2 as of 2022-01-07; but at least at version 3.36.0. The SQLite3 version built into the distribution package 4.5.0 is further behind, somewhat 3.23.1 if I am not wrong.

Regards MBL

(4) By Raoul (raoul314) on 2022-01-07 16:28:32 in reply to 3 [link] [source]

Thanks for your input. Actually, R provides everything I need and more. But all existing tools including the one you cite suffer from the common problem of "extract-transform-reload" because the storage and analysis systems are completely separate. This has two major consequences:

  1. if you want to implement a custom analysis, you have to meddle with the low-level backend (usually C) otherwise it's going to be unbearably slow

  2. handling even moderately-sized data requires special tools, again because the cost of extraction and reloading data is so high

So, I'm actually looking to build a new thing that would alleviate those problems. Those newly published SQL extensions are very enticing for my workflow.

(5) By MBL (UserMBL) on 2022-01-07 18:43:26 in reply to 4 [link] [source]

I developed several extension functions myself and integrated Lua, a small but fast script engine, easy to integrate in C. By using the user defined functions (UDF) I am able to feed data or use data into and out of one Lua state. Because Lua is really fast I can do what I think is in your mind as long as I find an SQLite3 host application which allows me to load extensions as either a .dll (on Windows) or as an .so (on Linux). I would be able to even speed it up much more when changing from the standard Lua to LuaJIT; but until now I prefer the original. For me as single user with single thread usage it is perfect.

With an UDF match() in Lua I do pattern matching and value extraction out of text lines (which I take out line by line out of flat text file(s) using virtual tables); my match() works similar to instr() and substr().

Another use case for this is an xml configured extraction of values out of a binary blob image with conversion into its correctly typed value (string.pack and string.unpack are doing this job for me).

I am planning now to create a table valued function to use string.gmatch as an iterator through large text blobs to transform the matches into SQLite3 table lines and fields; straight from text to table with a maximum of flexibility of what is filtered out.

For your purpose of speed and "data inside" I would recommend to try out LuaJIT integration. With module "matrix" and "complex" you may find already what you need and with "lpeg" all you need to parse text as input and extract and transform what you need out of it.

TCL integration of scripted logic into SQLite3 for speed and flexibility might be possible also - but I have no experience with this nor do I expect your work to become easier if it could be done. But TCL was the initial environment in which SQLite3 was born.

(6) By Raoul (raoul314) on 2022-01-07 19:48:56 in reply to 5 [link] [source]

I've indeed heard good things about Lua integration into C codebases. I'm going to have a look at that. Thanks!

(9) By Gisbert (gisbert) on 2022-01-10 09:18:49 in reply to 4 [link] [source]

If you're happy working in R, would the RSQLite driver (with DBI as the user-looking front-end) cut the cookie for you? (Possibly in conjunction with dbplyr and/or dplyr.) Of course, the round-trip time is still there, it's not as fast as doing the calculations within the database code proper. And of course, you may run into memory limitations when loading large datasets into R. These may be knock-out criteria for you. Personally, I've used this (with either SQLite or MariaDB) for great good in pharmacoepi and drug utilisation research.

(7) By mlin (dnamlin) on 2022-01-07 22:06:49 in reply to 1 [source]

Raoul,

SciDB and TileDB are existing major projects I know of exploring how to build a DBMS with advanced maths capabilities. I'm quite sure it would be a huge effort to build such capabilities into SQLite in a comparably satisfying/performant way (that is with a clean query syntax and performance comparable to dedicated maths packages). But that's not to say that a smaller side project couldn't get something to work with SQLite as others in the thread have suggested.

-Mike Lin

(8) By Raoul (raoul314) on 2022-01-07 22:37:12 in reply to 7 [link] [source]

Hi Mike,

Thanks for the input. Yes, I already suspected this work to be way beyond my abilities. I know of SciDB and TileDB, but those are not what I have in mind.

As an analogy to "SQLite competes with fopen, not PostgreSQL", what I'd really like (and I'm not alone) is not "Postgres for data analysis" but "fopen for data analysis", i.e. a portable no-setup solution integrating storage and analysis and eliminating the "extract-transform-reload" steps for the lone analyst on a single machine.

The point would be: no more messing about with memory-unsafe stuff for custom analysis, because you would do everything in (extended) SQL.

To illustrate: there was recently a huge scandal regarding the NHS analysis of COVID data, which was done in C++ with no tests at all and memory leaks and undefined behaviour everywhere! COVID highlighted this particular analysis, but you'll find similar programs all across academia, hospitals, space agencies, science labs ... so I strongly suspect such a system to be in high demand. Just not at huge "data science" companies.

It's a real shame that with all the "big data" hype, nobody thinks of "small data" tools anymore! :-(