SQLite Forum

Efficiently checking if a TEXT column exclusively contains integers or floating point numbers (as strings)
SQLite3 does not have "column types" -- it has column affinities.  Each "value" has a storage type and that storage type "prefers" to be the type of that columns "affinity", so a column affinity of "text" will always result in everything being converted to "text" storage for that column (on storage/insert) unless it is a binding that cannot be converted to text (such as a blob).  The csv importer (either the one in the CLI or the CSV extension) does not perform any conversions and the result is always returned as "text" even if it is not.


The "correct" way to force the conversion is to create another table with the same number of columns where the "affinity" of the columns that you think should be numeric or real or integer are declared that way, and then execute an insert into that table (INSERT INTO t2 SELECT * FROM t1).  SQLite3 will "convert" the data into the specified column affinity if it can losslessly do so.

The other alternative (assuming you are using the CLI to perform the import) is to create the table you are importing into BEFORE importing the data with the correct column affinity definitions when you create the table.  This will result in the data being converted from text to the declared affinity on insert if this can be done losslessly.

>type t.csv
1,"", ," "
2,3 or 4,4.5,"5"
8,a,b,c,excess column

SQLite version 3.34.0 2020-09-22 02:06:09
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create table x(a numeric, b numeric, c numeric, d numeric);
sqlite> .mode csv
sqlite> .headers on
sqlite> .import t.csv x
t.csv:6: expected 4 columns but found 1 - filling the rest with NULL
t.csv:9: expected 4 columns but found 5 - extras ignored
t.csv:10: expected 4 columns but found 1 - filling the rest with NULL
sqlite> select typeof(a),a,typeof(b),b,typeof(c),c,typeof(d),d from x;
integer|1|text||text| |text|
integer|2|text|3 or 4|real|4.5|integer|5

Another possible method is to use my VSV extension which is an enhanced CSV extension.  It does not (anymore) set a column affinity for the data though you may do so with the schema= and columns= parameters (though this does nothing other than affect the virtual table definition if, for example, you do a "create table t2 as select * from t1" those column affinities will be carried forth into the new table).  This will not have any effect on the actual returned data values, however, unless you also specify that you want "affinity conversions" applied to the source data by specifying an affinity= parameter.  Specification of the full set of conversions (validatetext,affinity=numeric,nulls) will result in each "value" being returned in its bestest representation (text/blob/integer/real/null) assuming that you are using a compiler that supports LONG DOUBLE (if your compiler does not support LONG DOUBLE as an 80-bit IEEE extended double then the conversion is limited somewhat -- examples of broken compilers are all Microsoft x64 compilers).

sqlite> create virtual table x using vsv(filename=t.csv,header,nulls,validatetext,affinity=numeric);
sqlite> .schema x
CREATE VIRTUAL TABLE x using vsv(filename=t.csv,header,nulls,validatetext,affinity=numeric)
/* x(a,b,c,d) */;
sqlite> select typeof(a),a,typeof(b),b,typeof(c),c,typeof(d),d from x;
│ typeof(a) │ a │ typeof(b) │        b         │ typeof(c) │        c         │ typeof(d) │  d   │
│ integer   │ 1 │ text      │                  │ text      │                  │ text      │      │
│ integer   │ 2 │ text      │ 3 or 4           │ real      │ 4.5              │ integer   │ 5    │
│ integer   │ 3 │ blob      │ test?test        │ integer   │ 5                │ null      │      │
│ integer   │ 4 │ null      │                  │ real      │ 6.1              │ integer   │ 7    │
│ integer   │ 5 │ null      │                  │ null      │                  │ null      │      │
│ integer   │ 6 │ integer   │ 5503599627370495 │ integer   │ 5503599627370495 │ null      │      │
│ integer   │ 7 │ integer   │ 1000             │ integer   │ 1000             │ integer   │ 1000 │
│ integer   │ 8 │ text      │ a                │ text      │ b                │ text      │ c    │
│ integer   │ 9 │ null      │                  │ null      │                  │ null      │      │

You can find VSV.C here:  

and the whole collection of extensions here: