Feature request: Import csv into a table with generated columns

(1) By Harald Hanche-Olsen (hanche) on 2021-03-01 16:55:48 [link] [source]

Consider this very simple example (if you'll excuse my unusual prompt):

▶▶▶ create table p(a integer, b integer, c integer as (a + b) stored);
▶▶▶ insert into p values(40,2);
▶▶▶ select * from p;
a   b  c
--  -  --
40  2  42
▶▶▶ .shell cat p.csv
1,3
2,4
▶▶▶ .mode csv
▶▶▶ .import p.csv p
Error: table p has 2 columns but 3 values were supplied

The insert statement works fine, evidently ignoring the generated column. But the .import fails with a rather puzzling error message. (It doesn't matter whether the column is stored or not.)

I think it would be useful to be able to import csv files directly into a table with generated columns. As it is, the easiest workaround is to import the csv into a temporary table, then insert from there.

(I labeled this a feature request, but it seems to me to exist in the grey zone between a feature request and a bug report. Make of it what you will. But the error message is at least misleading.)

– Harald

(2) By Larry Brasfield (larrybr) on 2021-03-02 02:59:19 in reply to 1 [source]

The doc on this shell feature shows what is happening. I agree that it would be more convenient if the .import could figure out which columns in the receiving table are susceptible to update, then try the insert if the .csv has the right column count. It is unlikely to do something that sophisticated, which would require parsing the schema.

If your .csv had a column-name header, it would .import very easily into a to-be-created temp table, from which an INSERT statement with a select can take values from that temp table. This can also be easily written. This ease and modularity takes some of the steam out of creating .import enhancements.

In my enhanced shell, when '.header always' is in effect, the .import creates INSERT statement(s) using the .csv (or .tsv) header to name incoming columns, without being picky about mapping them one-to-one with receiving table columns. This is convenient when columns have default or generated values and the file data is short some column(s) as in your case. Unfortunately, I have fallen behind on keeping that shell updated with the last year of enhancements made to shell.c published by the SQLite project.

(3) By anonymous on 2021-03-02 07:28:31 in reply to 2 [link] [source]

Another alternative, which I have used, is to create a view and a trigger attached, and then to import into the view. (I did this because I needed special parsing and conversion, although it would probably work for this purpose too.)

(5.1) By Harald Hanche-Olsen (hanche) on 2021-03-31 09:40:35 edited from 5.0 in reply to 3 [link] [source]

That is a pretty clever idea! I'll file it away for future use. In this particular case, it seems to me that the approach I outlined in a sibling post to yours is easier. But yours seems superior if one has to do many of these imports into the same table, as the creation of the view and trigger only has to be done once.

Edit, much later:

I just had an opportunity to try out this idea, and it worked great! Also, after reading up on triggers, it turned out to be almost ridiculously easy to implement. (And if the format of the csv file ever changes, I can just recreate the view to fit the new format.) Thank you!

(4) By Harald Hanche-Olsen (hanche) on 2021-03-02 14:11:05 in reply to 2 [link] [source]

Ah, thanks; I hadn't quite stopped to think about the difference between information available to the shell and what the core library can discover so easily. Still, however, I don't think the docs explains (to my level of understanding) what happened. I think it all depends what mechanism the shell is using to communicate the tuples to the database layer and telling it what to do with it. As I haven't studied the source, and am not at all familiar with the C API, I really can't know.

Unfortunately, my csv files come from spreadsheets not of my making, so the column headers won't match. But your response (and the other followup) got a handful of my brain cells working again, so I found a better way, demonstrated here with my toy example.

I start sqlite3 with no argument, thus working at first with the in-memory database, then:

▶▶▶ .shell cat n.csv
"fee fie foe foo",bar
1,2
3,4
▶▶▶ .import -csv n.csv tmpn
▶▶▶ attach 'n.db' as n;
▶▶▶ .schema n
CREATE TABLE n.n(a,b,c as (a+b) stored);
▶▶▶ insert into n.n select * from tmpn;
▶▶▶ select * from n;
a  b  c
-  -  -
1  2  3
3  4  7

This seems to me the easiest approach, as the critical parts (.import, attach, insert) are the same no matter the particulars of the database schema and the details of the csv file, so long as the number of columns matches.

(6) By Simon Slavin (slavin) on 2021-03-03 16:09:32 in reply to 4 [link] [source]

In cases where I didn't control the columns of the original database, I often give up trying to import the CSV file directly. Instead import into a table with columns which match the CSV file, then use

INSERT INTO table SELECT ...

to get the data where I want it, then

DELETE FROM …

to remove all rows from the temporary table.

This gives you extreme simplicity of code, making it easy for other people to figure out how things work, but also lets you do almost everything inside SQLite, exploiting how good it is at manipulating data.