.import csv and NULL values

(1) By Wolfgang Oertl (w.oertl) on 2021-04-03 20:18:32 [link] [source]

There doesn't seem a way to import NULL values in the shell using the .import command. Missing values are treated as empty strings.

When I import into a table with constraints this might fail if either NULL or a valid value is enforced, with an empty string not being valid. Furthermore, I have to replace empty strings in nullable fields with NULL using UPDATE commands. I think that .import is much faster than .read with lots of INSERT statements.

So, I suggest that .import should provide a new option --nullvalue or so, just like the .nullvalue command is used for exporting. Another idea is that a missing value is NULL, and '' is an empty string. I could probably provide a patch unless there is some objection or a better idea.

(2) By Simon Slavin (slavin) on 2021-04-05 01:27:21 in reply to 1 [source]

I assume you're talking about CSV file format, since your title refers to it.

CSV files can't contain NULL values. The standard describes no way to express NULL in that format. The standard is described here:

https://tools.ietf.org/html/rfc4180

(3) By Larry Brasfield (larrybr) on 2021-04-05 02:10:59 in reply to 1 [link] [source]

One of the extensions published here, "sqlite3vsv.c" written by Keith Medcalf, creates a virtual table reflecting the content of a delimited text file. Among its useful options is one causing empty fields to be exposed as NULLs.

(4) By Wolfgang Oertl (w.oertl) on 2021-04-05 20:21:10 in reply to 3 [link] [source]

It is an excellent answer to treat empty values as NULL, and "" as empty strings, as no explicit NULL string is required. Of course this behaviour needs to be off by default, only to be enabled with an option. As Simon says, this isn't in the standard, but OTOH it doesn't say there can't be missing values. I'll give it a try.

(5) By anonymous on 2023-01-31 20:43:04 in reply to 2 [link] [source]

It is entirely customary to represent missing data in CSV files as empty cells, and it is entirely customary to represent missing data in relational databases with NULLs. Too bad SQLIte is so rigid it can't consider bridging this gap, even on an optional basis.

(6) By Ryan Smith (cuz) on 2023-02-01 07:27:21 in reply to 5 [link] [source]

"Customary" is not a technical term, nor a technical consideration in any system's veracity or robustness specification.

But, let's imagine for a moment a CSV reader was considering understanding the customary - how exactly do you propose the CSV reader to define the difference between an empty string and a NULL value?

If you say that an empty field means NULL, how then would you indicate an empty string? On a table with a NOT NULL constraint on a TEXT column, that would break the import. So as soon as that change is made, people will be in here shouting about "How do I make SQLite understand my empty strings???". CSV imports may contain both.

You can argue for "Customary" only when you can propose a clear rule that is not contradicted by the standard and does not remove/break any current functionality. That won't guarantee your addition be implemented, only that it would be listened to with real consideration. Right now your "Too bad SQLite can't consider bridging this gap" is misrepresenting what is going on and amounts to nothing more than a whine.

Bring a real rule, demonstrate it's working, perhaps by showing a patch that makes it work correctly for both empty strings and NULLs, without breaking any current functionality, and then maybe SQLite will add it, or if SQLite still refuses to consider it, then you may go right ahead and whine justifiably.

(7.1) By Spindrift (spindrift) on 2023-02-01 10:29:47 edited from 7.0 in reply to 6 [link] [source]

The suggestion (not that I agree, just that it was indeed mentioned) was to differentiate an empty field as NULL, versus doubled empty double quotes ("") as the empty string.

(8) By Robert Hairgrove (bobhairgrove) on 2023-02-01 11:19:50 in reply to 6 [link] [source]

In general, since the CSV data is most likely coming from an external source and can contain good and bad data, I always use a staging table as the target for imports, then massage the data as necessary before inserting it into the working tables. Such things as distinguishing empty strings from NULL values can then be done at a higher level. Otherwise, you might end up importing text fields containing the quoted string "NULL"... :)

And if the same developer or admin has control over both the import and export formats, then if I were that developer, I would definitely choose some other interchange format than CSV; for example, XML or JSON.

(9) By Ryan Smith (cuz) on 2023-02-01 11:38:55 in reply to 7.1 [link] [source]

Quite right, it was suggested, but it doesn't work as a one-size-fits-all change. (Which is what I tried to tease out by asking for an example patch or rule-set).

I believe the extension from Keith already has such options, but it has to be optioned-in (rightly so). It's quite different to saying the current core implementation should from now on work in a new way that may break existing imports.

And, on the off-chance that my statements are construed as kicking against the idea itself - not at all. I'm quite happy for a CSV extension to offer such options, even to extend it to also have quoted-empty-cells be regarded as NULLs (as an option), because lots of CSV exports would leave you with quoted empty fields which you may also like to interpret as NULLs. (Even if exporting CSV from Excel, you can say "Quote all fields").

I hope this makes clear that my gripe with the previous message is not with this suggestion (I'd welcome it in the core in fact, if not adding too much bloat), but only to point out that it isn't required, one specific way of doing isn't vastly-customary or ubiquitous throughout computing, and that suggesting that SQLite, by not changing, is somehow obtusely/stubbornly/rigidly kicking against mainstream opinion, is merely an unsubstantiated whine and not a coherent argument for the change.

(10) By anonymous on 2023-09-07 23:44:44 in reply to 9 [link] [source]

It is completely wrong to import '' to a NUMERIC column. and treat it as 0 during computing. rigid because this bad behaviour is wired ignoring how many people asking for an option at least.

(11) By Aask (AAsk1902) on 2023-09-08 13:13:23 in reply to 8 [link] [source]

since the CSV data is most likely coming from an external source and can contain good and bad data, I always use a staging table as the target for imports, then massage the data as necessary before inserting it into the working tables. Such things as distinguishing empty strings from NULL values can then be done at a higher level.

Sound advice: you are describing ExtractTransformLoad. A quick guide is found here..

Extract - acquire the data in temporary tables

Transform - validate, de-duplicate, apply other business logic to the data, and ensure adherence to regulatory requirements ... mark data that does not meet requirements.

Load - archive the transformed data into working tables (with a marker column that tracks the chronology or history of acquisition).

(12) By anonymous on 2024-08-04 08:14:05 in reply to 2 [link] [source]

The standard for CSVs is not really important so much so as SQLite's ability to understand a null value. Personally, I believe the .nullvalue command should control what string to interpret as a null, and for the null keyword to be the default and have no affect on the current approach. Other database systems support null interpretation so it just seems like a bell that should be present... Though SQLite has it's own approach vs traditional RDBMSes in many ways, so it doesn't surprise me that this is not supported.