Feature request: .import based on column
(1) By anonymous on 2021-05-21 16:50:53 [source]
I'd like to propose .import
based on column number.
--- Example file ---
Name Address Phone
John Doe My street 555-1212
--------------------
For this example, Name
is in column 1, Address
in 15, and Phone
in 35.
A command like
.import file table 1,15,35
would read each column starting at the corresponding column until next (minus one), or end of line (for the last one).
Leading and trailing spaces from each column should be trimmed in case of centered or right aligned fields.
When dealing with such data files (very common) I end up manually adding |
(or other) column delimiter chars.
--- Modified file ---
Name |Address |Phone
John Doe |My street |555-1212
---------------------
This runs the risk of using a character already part of the data which will misread those rows.
One has to first check the file does not contain the delimiter character. And, in rare cases, all printable characters are used.
Thank you for reading.
agp
(2) By Richard Hipp (drh) on 2021-05-21 17:44:08 in reply to 1 [link] [source]
I suggest you write a utility program that converts your data into RFC-4180 CSV. You could probably do this as a script in Python or TCL. Suppose the program is called "column-to-csv" and works like this:
column-to-csv 1,15,35 file >file.csv
Once you have such a converter program, then you can import into SQLite using a command-line like this:
.import -csv '|column-to-csv 1,15,35 file' table
(3) By Kees Nuyt (knu) on 2021-05-21 18:02:43 in reply to 1 [link] [source]
To get you started:
# fixed width import
cat >in.tmp <<EOF
id Name Address Phone
1 John Doe My street 555-1212
2 Marc O'Donnel His street 555-1313
EOF
gawk -v pos="1,4,18,38" -v table=T1 \
'BEGIN{
if (table) TABLE = table
else TABLE = "mytable"
NROFCOL=split(pos,tmp,",")
q = "\x27"
wd=""
for (i=1;i<NROFCOL;i++) wd = wd sprintf("%d ",tmp[i+1]-tmp[i])
wd = wd "999"
FIELDWIDTHS = wd
fmtins = "INSERT INTO %s (%s) VALUES (%s);\n"
}
function trim(s){
sub(/^[[:blank:]]+/,"",s)
sub(/[[:blank:]]+$/,"",s)
return s
}
function quote(s){
sub(q,q q,s)
return s
}
function isnum(s){
gsub(/[[:digit:]]+/,"",s)
return (s=="")
}
(FNR==1){
COLLIST = ""
for (i=1;i<=NROFCOL;i++){
COLLIST = COLLIST ((COLLIST)?",":"") trim($i)
}
printf "-- fieldwidths %s\n",FIELDWIDTHS
printf "-- columnlist %s\n",COLLIST
printf "CREATE TABLE %s (%s);\n",TABLE,COLLIST
next
}
{
VALLIST = ""
for (i=1;i<=NROFCOL;i++){
val = trim($i)
if (isnum(val)){
VALLIST = VALLIST ((VALLIST)?",":"") val
} else {
VALLIST = VALLIST ((VALLIST)?",":"") q quote(val) q
}
}
printf fmtins,TABLE,COLLIST,VALLIST
}' in.tmp # | sqlite3 import.sqlite
Sorry, I'm more proficient in awk than TCL, Python or Perl
(5) By anonymous on 2021-05-22 11:59:10 in reply to 3 [link] [source]
I took the liberty to improve a bit (in my eyes, at least) your code.
- Disassociated from Linux only (can now be used on Windows, too).
- Allowed column names to possibly contain spaces.
- Wrapped
INSERT
statements into aBEGIN
andEND
transaction. - Added usage message when no arguments present.
Still need to do:
De-tab the file based on another option (-tabs=N
) so that column numbers will work correctly even when the file uses tabs instead of spaces.
Thank you.
###############################################################################
# [gawk] col2sql.awk -v p[os]=1,15,35 -v t[able]=T1 [<] filename
###############################################################################
BEGIN{
if ((t) && (!table)) table = t
if ((p) && (!pos)) pos = p
if ((!table) && (!pos)) {
printf "Usage: col2sql.awk -v p[os]=n1,n2,... -v t[able]=table_name\n"
exit
}
if (table) TABLE = table
else TABLE = "mytable"
NROFCOL=split(pos,tmp,",")
q = "\x27"
wd=""
for (i=1; i<NROFCOL; i++)
wd = wd sprintf("%d ",tmp[i+1]-tmp[i])
wd = wd "9999"
FIELDWIDTHS = wd
fmtins = "INSERT INTO %s (%s) VALUES (%s);\n"
}
###############################################################################
END{
printf "COMMIT;\n"
}
###############################################################################
function trim(s){
sub(/^[[:blank:]]+/,"",s)
sub(/[[:blank:]]+$/,"",s)
return s
}
################################################################################
function quote(s){
sub(q,q q,s)
return s
}
################################################################################
function isnum(s){
gsub(/[[:digit:]]+/,"",s)
return (s=="")
}
################################################################################
(FNR==1){
COLLIST = ""
for (i=1; i<=NROFCOL; i++){
COLLIST = COLLIST ((COLLIST)?",":"") "\"" trim($i) "\""
}
printf "------------------------------------------\n"
printf "-- Field widths %s\n",FIELDWIDTHS
printf "-- Column list %s\n",COLLIST
printf "------------------------------------------\n"
printf "CREATE TABLE %s (%s);\n",TABLE,COLLIST
printf "BEGIN;\n"
next
}
################################################################################
{
VALLIST = ""
for (i=1; i<=NROFCOL; i++){
val = trim($i)
if (isnum(val)){
VALLIST = VALLIST ((VALLIST)?",":"") val
} else {
VALLIST = VALLIST ((VALLIST)?",":"") q quote(val) q
}
}
printf fmtins,TABLE,COLLIST,VALLIST
}
################################################################################
(6) By Kees Nuyt (knu) on 2021-05-22 14:54:21 in reply to 5 [link] [source]
I took the liberty to improve a bit (in my eyes, at least) your code.
That was the idea,"to get you started".
(4) By anonymous on 2021-05-21 18:20:25 in reply to 1 [link] [source]
My own trick for dealing with stuff like this is to import into a view. Attach a trigger to the view to reformat the data and insert into the actual table. In this specific case, you can use the substr
SQL function to retrieve the substring of the data at the proper offsets. For this simple case, importing into a temporary table and then using INSERT to copy the data into the real table is another way, that does not involve using views or triggers.