SQLite User Forum

[PATCH] Support .import --nulls in CLI
Login

[PATCH] Support .import --nulls in CLI

(1) By Thomas Sibley (tsibley) on 2024-07-10 17:58:29 [source]

I found the following patch minimal yet useful. If there's interest in applying it, I could add some tests. Thanks for such great software!

From c8ef8b3e728dd8902c89b96f74b2aaac96a08d2b Mon Sep 17 00:00:00 2001
From: Thomas Sibley <tom@zulutango.org>
Date: Wed, 10 Jul 2024 10:56:53 -0700
Subject: [PATCH] Support .import --nulls in CLI

Optional flag to respect the .nullvalue string during import and convert
them into NULLs.
---
 src/shell.c.in | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/shell.c.in b/src/shell.c.in
index 58475dd6d..117e38661 100644
--- a/src/shell.c.in
+++ b/src/shell.c.in
@@ -4729,6 +4729,7 @@ static const char *(azHelp[]) = {
   "   Options:",
   "     --ascii               Use \\037 and \\036 as column and row separators",
   "     --csv                 Use , and \\n as column and row separators",
+  "     --nulls               Replace \".nullvalue\" string in input with nulls",
   "     --skip N              Skip the first N rows of input",
   "     --schema S            Target table to be S.TABLE",
   "     -v                    \"Verbose\" - increase auxiliary output",
@@ -5779,6 +5780,7 @@ struct ImportCtx {
   int cTerm;          /* Character that terminated the most recent field */
   int cColSep;        /* The column separator character.  (Usually ",") */
   int cRowSep;        /* The row separator character.  (Usually "\n") */
+  const char *zNullValue;     /* The null value string, if --nulls is given.  (Usually "") */
 };
 
 /* Clean up resourced used by an ImportCtx */
@@ -8844,6 +8846,8 @@ static int do_meta_command(char *zLine, ShellState *p){
         sCtx.cRowSep = '\n';
         xRead = csv_read_one_field;
         useOutputMode = 0;
+      }else if( cli_strcmp(z,"-nulls")==0 ){
+        sCtx.zNullValue = p->nullValue;
       }else{
         oputf("ERROR: unknown option: \"%s\".  Usage:\n", z);
         showHelp(p->out, "import");
@@ -9064,7 +9068,11 @@ static int do_meta_command(char *zLine, ShellState *p){
         if( z==0 && (xRead==csv_read_one_field) && i==nCol-1 && i>0 ){
           z = "";
         }
-        sqlite3_bind_text(pStmt, i+1, z, -1, SQLITE_TRANSIENT);
+        if( sCtx.zNullValue!=0 && cli_strcmp(z,sCtx.zNullValue)==0 ){
+            sqlite3_bind_null(pStmt, i+1);
+        }else{
+            sqlite3_bind_text(pStmt, i+1, z, -1, SQLITE_TRANSIENT);
+        }
         if( i<nCol-1 && sCtx.cTerm!=sCtx.cColSep ){
           eputf("%s:%d: expected %d columns but found %d"
                 " - filling the rest with NULL\n",
-- 
2.45.1

(2) By Larry Brasfield (larrybr) on 2024-07-11 13:46:37 in reply to 1 [link] [source]

I speak as someone who has taken much interest in the CLI and its improvement, and also with some charter to do so. Please do not take this response to indicate lack of appreciation regarding improvement suggestions.

Your suggestion, (irrespective of its simple implementation), is attractive. I am tempted to advocate it to the project team. But I have a serious reservation that you or others might be willing to address.1 I will frame it as a reason to not add the feature, even though that decision has not yet been made.

Currently, the .import command has a narrowly defined function -- to accept well-formed CSV, or equivalents with different field and record separators, and insert them as values into a specified table. It is that simple, without handy features that could be considered an unbounded set when the multitude of semi-structured text input formats are taken into account.

The present (and only) way to deal with adaptation of .import simplicity/limitation to the morass of table-like text formats is to import into a temporary table, then formulate a query to populate a less transient table using that temporary table as a source in the query. Of course, this is some extra "work", but it has a virtue that is often under-appreciated: The technique is simple composition from simpler and logically independent operations.2 This simplicity of the individual operations reduces the mental work needed to get the present task done, and certainly reduces the documentation and .help text for the .import command.

If I was assigned the role of arguing against such a change, I would cite that simplicity and assert that a subsequent "transform/insert query" is going to often be necessary anyway. And I would add that transforming fields which equal the present null output representation can be accomplished trivially as part of such a query.

If one accepts the fact that .import is not going to grow ever-more complex to deal with the vast variety of table-like text formats, what argument supports the proposition that this NULL transformation deserves to become part of the now-empty handy feature set?


  1. ^ I suspect this reservation has already occurred with the other project team members.
  2. ^ This building block simplicity became popularized with the introduction of Unix to the computing world, and is part of what is known as "the Unix way", or "the Unix philosophy.

(3) By Thomas Sibley (tsibley) on 2024-07-11 21:32:02 in reply to 2 [link] [source]

Thanks for the careful and thoughtful comments. I see this handling of .nullvalue during .import to be the missing flip side of its handling during printing ("exporting") of query results. It enables round tripping, for example:

$ sqlite3 "" \
    ".mode csv" \
    ".headers on" \
    "create table a (x text, y text, z text);" \
    "insert into a values ('foo', null, 'bar'), ('baz', 'bat', null);" \
    "select * from a;" \
    >/tmp/a

$ cat /tmp/a
x,y,z
foo,,bar
baz,bat,

$ sqlite3 "" \
    ".mode csv" \
    ".headers on" \
    ".import /tmp/a a" \
    "select * from a;" \
x,y,z
foo,"",bar
baz,bat,""

$ sqlite3 "" \
    ".mode csv" \
    ".headers on" \
    ".import --nulls /tmp/a a" \
    "select * from a;" \
x,y,z
foo,,bar
baz,bat,

(4) By anonymous on 2024-07-12 00:43:41 in reply to 3 [link] [source]

The enabling of 'round trip' (avoiding information loss) is a strong argument in favor of handling the SQLite text, real, int (numeric?) and NULL.

How would your 'round trip' argument stand up if BLOB is considered (this certainly would fall into Larry's point about the slippery slope) ?
Does this mean there is some sort of -blobs as well?  or would some general '-SQLite DB primatives' (obviously a better name must exist) end up enabling both?

(5) By Thomas Sibley (tsibley) on 2024-07-12 04:03:43 in reply to 4 [link] [source]

I see your point, anonymous, but I suppose I see nulls as a special case here: integers and reals (and numeric) benefit from column affinity when importing text values into an existing table. Nulls do not.

Blobs do not either, but their handling is less straightforward than nulls as it's not simple string equality against a known, user-controlled value; it instead requires making a heuristic decision about which values are blobs (all of them? only ones that can't be represented as text?) and which aren't (if any).

Allowing nulls to be simply handled at import time seems like an unqualified win (if a small one) to me. Also, I simply found it strange that .import respects .mode and .separator but not .nullvalue. It seemed like an omission. (There are no equivalents for blob representations.)

I don't really care to argue this much further. I offered a gentle suggestion in the form of a working patch, and it's for the maintainers (or community members) to take or leave as they please.

(6) By Dave Mausner (dmausner) on 2024-07-12 14:30:43 in reply to 2 [link] [source]

May I add the observation that MySQL uses "LOAD DATA INFILE" as an analogy to ".import", and it permits a field value == backslash+"N" as a representation of the null value to be stored. So there is a long-standing precedent for the suggestion made by larrybr.

(7) By Gonzalo Diethelm (gonzus) on 2025-01-07 17:29:55 in reply to 6 [link] [source]

I would like to point out that this is also supported in Postgres, where the COPY command supports a NULL option with a default of \N, used for dumping / loading data to / from a file.

I would personally love to see this patch added to SQLite -- it would simplify a corner case when dealing with NULL values in data dumps.