SQLite User Forum

Dynamic columns from table-valued functions
Login

Dynamic columns from table-valued functions

(1) By Phil Eaton (eatonphil) on 2022-05-22 14:12:27 [source]

Hey! I build some tools on top of SQLite (github.com/multiprocessio/datastation, github.com/multiprocessio/dsq) that let users query data files with SQLite.

Right now it works by transforming the query and INSERTing all of the data file into SQLite. This can take pretty long for large data files. So I was curious about the performance of building a virtual table instead of INSERTing the data.

But when I tried implementing a generic CSV reader in a virtual table, for example, I realized that SQLite is only ok with you having dynamic columns per table, not per query.

As the sample CSV virtual table example shows, you can do:

CREATE VIRTUAL TABLE temp.t1 USING csv(filename='thefile.csv');
SELECT * FROM temp.t1

But it's impossible to build a virtual table/table-valued function like:

SELECT t1.name, t2.organization from csv('users.csv') t1 left join csv('organization.csv') t2 on t1.organization_id = t2.id

Unless they have the same columns.

I don't want to require users to run this CREATE VIRTUAL TABLE statement themselves for every file they want to query.

I'm not the first person to wonder about this possibility, see https://github.com/coleifer/sqlite-vtfunc/issues/8 for example.

Is this something SQLite devs would be interested in supporting?

(2) By Simon Willison (simonw) on 2022-05-22 15:38:20 in reply to 1 [link] [source]

I'd really appreciate this feature - the ability for a table-valued custom SQL query to return an arbitrary set of columns rather than having them predefined.

I explained one of my use-cases in https://github.com/coleifer/sqlite-vtfunc/issues/8 - I'd like to be able to write a function that does this:

select id, title from fetch_json_array(
  "https://api.github.com/repos/simonw/datasette/issues"
) order by comments desc

(3) By Simon Willison (simonw) on 2022-05-22 15:40:19 in reply to 2 [link] [source]

I guess that example right there demonstrates why this is so hard (maybe impossible?) - how would the query planner know that "order by comments" is allowed if it doesn't know the columns in advance?