Feature request: .import based on column

(1) By anonymous on 2021-05-21 16:50:53 [source]

I'd like to propose .import based on column number.

--- Example file ---
Name          Address             Phone
John Doe      My street           555-1212
--------------------

For this example, Name is in column 1, Address in 15, and Phone in 35.

A command like

.import file table 1,15,35

would read each column starting at the corresponding column until next (minus one), or end of line (for the last one).

Leading and trailing spaces from each column should be trimmed in case of centered or right aligned fields.

When dealing with such data files (very common) I end up manually adding | (or other) column delimiter chars.

--- Modified file ---
Name          |Address             |Phone
John Doe      |My street           |555-1212
---------------------

This runs the risk of using a character already part of the data which will misread those rows.

One has to first check the file does not contain the delimiter character. And, in rare cases, all printable characters are used.

Thank you for reading.

agp

(2) By Richard Hipp (drh) on 2021-05-21 17:44:08 in reply to 1 [link] [source]

I suggest you write a utility program that converts your data into RFC-4180 CSV. You could probably do this as a script in Python or TCL. Suppose the program is called "column-to-csv" and works like this:

column-to-csv 1,15,35 file >file.csv

Once you have such a converter program, then you can import into SQLite using a command-line like this:

.import -csv '|column-to-csv 1,15,35 file' table

(3) By Kees Nuyt (knu) on 2021-05-21 18:02:43 in reply to 1 [link] [source]

To get you started:

# fixed width import
cat >in.tmp <<EOF
id Name          Address             Phone
1  John Doe      My street           555-1212
2  Marc O'Donnel His street          555-1313
EOF

gawk -v pos="1,4,18,38" -v table=T1 \
'BEGIN{
	if (table) TABLE    = table
	else TABLE = "mytable"
	NROFCOL=split(pos,tmp,",")
	q = "\x27"
	wd=""
	for (i=1;i<NROFCOL;i++) wd = wd sprintf("%d ",tmp[i+1]-tmp[i])
	wd = wd "999"
	FIELDWIDTHS = wd
	fmtins = "INSERT INTO %s (%s) VALUES (%s);\n"
}
function trim(s){
	sub(/^[[:blank:]]+/,"",s)
	sub(/[[:blank:]]+$/,"",s)
	return s
}
function quote(s){
	sub(q,q q,s)
	return s
}
function isnum(s){
	gsub(/[[:digit:]]+/,"",s)
	return (s=="")
}
(FNR==1){
	COLLIST = ""
	for (i=1;i<=NROFCOL;i++){
		COLLIST = COLLIST ((COLLIST)?",":"") trim($i)
	}
	printf "-- fieldwidths %s\n",FIELDWIDTHS
	printf "-- columnlist %s\n",COLLIST
	printf "CREATE TABLE %s (%s);\n",TABLE,COLLIST
	next
}
{
	VALLIST = ""
	for (i=1;i<=NROFCOL;i++){
		val = trim($i)
		if (isnum(val)){
			VALLIST = VALLIST ((VALLIST)?",":"") val
		} else {
			VALLIST = VALLIST ((VALLIST)?",":"") q quote(val) q
		}
	}
	printf fmtins,TABLE,COLLIST,VALLIST
}' in.tmp     # | sqlite3 import.sqlite

Sorry, I'm more proficient in awk than TCL, Python or Perl

(5) By anonymous on 2021-05-22 11:59:10 in reply to 3 [link] [source]

I took the liberty to improve a bit (in my eyes, at least) your code.

Disassociated from Linux only (can now be used on Windows, too).
Allowed column names to possibly contain spaces.
Wrapped INSERT statements into a BEGIN and END transaction.
Added usage message when no arguments present.

Still need to do:

De-tab the file based on another option (-tabs=N) so that column numbers will work correctly even when the file uses tabs instead of spaces.

Thank you.

###############################################################################
# [gawk] col2sql.awk -v p[os]=1,15,35 -v t[able]=T1 [<] filename
###############################################################################

BEGIN{
    if ((t) && (!table)) table = t
    if ((p) && (!pos)) pos = p
    if ((!table) && (!pos)) {
      printf "Usage: col2sql.awk -v p[os]=n1,n2,... -v t[able]=table_name\n"
      exit
    }
    if (table) TABLE = table
    else TABLE = "mytable"
    NROFCOL=split(pos,tmp,",")
    q = "\x27"
    wd=""
    for (i=1; i<NROFCOL; i++)
      wd = wd sprintf("%d ",tmp[i+1]-tmp[i])
    wd = wd "9999"
    FIELDWIDTHS = wd
    fmtins = "INSERT INTO %s (%s) VALUES (%s);\n"
}

###############################################################################

END{
    printf "COMMIT;\n"
}

###############################################################################

function trim(s){
    sub(/^[[:blank:]]+/,"",s)
    sub(/[[:blank:]]+$/,"",s)
    return s
}

################################################################################

function quote(s){
    sub(q,q q,s)
    return s
}

################################################################################

function isnum(s){
    gsub(/[[:digit:]]+/,"",s)
    return (s=="")
}

################################################################################

(FNR==1){
    COLLIST = ""
    for (i=1; i<=NROFCOL; i++){
        COLLIST = COLLIST ((COLLIST)?",":"") "\"" trim($i) "\""
    }
    printf "------------------------------------------\n"
    printf "-- Field widths %s\n",FIELDWIDTHS
    printf "-- Column list %s\n",COLLIST
    printf "------------------------------------------\n"
    printf "CREATE TABLE %s (%s);\n",TABLE,COLLIST
    printf "BEGIN;\n"
    next
}

################################################################################

{
    VALLIST = ""
    for (i=1; i<=NROFCOL; i++){
        val = trim($i)
        if (isnum(val)){
            VALLIST = VALLIST ((VALLIST)?",":"") val
        } else {
            VALLIST = VALLIST ((VALLIST)?",":"") q quote(val) q
        }
    }
    printf fmtins,TABLE,COLLIST,VALLIST
}

################################################################################

(6) By Kees Nuyt (knu) on 2021-05-22 14:54:21 in reply to 5 [link] [source]

I took the liberty to improve a bit (in my eyes, at least) your code.

That was the idea,"to get you started".

(4) By anonymous on 2021-05-21 18:20:25 in reply to 1 [link] [source]

My own trick for dealing with stuff like this is to import into a view. Attach a trigger to the view to reformat the data and insert into the actual table. In this specific case, you can use the substr SQL function to retrieve the substring of the data at the proper offsets. For this simple case, importing into a temporary table and then using INSERT to copy the data into the real table is another way, that does not involve using views or triggers.