SQLite User Forum

Disabling Quoting in sqlite3 CLI's .import

(1) By hruodr on 2022-11-29 11:05:47 [link] [source]

Dear Sirs (including Madams)!

I have a file containing a table with fields separated by \t and records separated
by \n. The fields do not contain \t, and hence it is not necessary to quote
them. The fields may, however, contain the double-quote character: "

It is a very simple table, but .import may make problems when there are " in
the fields. I would then need to put double quotes around every field and escape
each double quote with a second double-quote character.

My question: Is there a way to disable quoting when reading with .import?

Thanks
Rod.

(2.1) By Keith Medcalf (kmedcalf) on 2022-11-29 12:07:21 edited from 2.0 in reply to 1 [link] [source]

This is known as the GIGO principle (Garbage In -> Garbage Out).

Either your file must conform to the standard, or you must write your own importer using whatever language you wish that can deal with your peculiar file format in the manner in which you intend it be dealt with.

(3.1) By Keith Medcalf (kmedcalf) on 2022-11-29 12:11:02 edited from 3.0 in reply to 1 [link] [source]

may make problems

Does it or does it not? You seem to be ambiguous here. Have you bothered to even try and see what happens? It is unlikely that a "standard format" will be changed to make you happy, especially since you are only guessing that Garbage In will result in Garbage Out.

(4) By hruodr on 2022-11-29 12:23:47 in reply to 3.0 [link] [source]

"Have you bothered to even try and see what happens?" Of course I did it!
All is described in my first posting. Do you want examples?

The following field does not cause problems: "a abc"
This one does: "a" b
But in the first case I want the quotes kept as data, hence I must quote (escape) them.

This quoting in tab-separated files is, as far as I can see, not documented.
I just asked whether I can disable the treatment of quoting out of the box.
That is all. Thanks for saying that it is not possible!

Rod.

(5.1) By Keith Medcalf (kmedcalf) on 2022-11-29 14:01:04 edited from 5.0 in reply to 4 [link] [source]

You said "may make problems" not "does cause an error in the following circumstance". Finally you got around to documenting your issue.

The standard requires that if the first character of a field is a double-quote, then any embedded double-quote must be escaped with a double-quote, the field must end with a double-quote immediately before the field separator or record separator (ie "this""stuff"), and everything between the two enclosing double-quotes is part of the field contents. Embedded double-quotes that are not the first character of the field are not a problem.

For example, this csv file:

a,b,c
test"test,"test""test",doorknob
works exactly according to the standard:
SQLite version 3.41.0 2022-11-28 14:59:27
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .mode csv
sqlite> .import crap.txt t0
sqlite> .mode qbox
sqlite> select * from t0;
┌─────────────┬─────────────┬────────────┐
│      a      │      b      │     c      │
├─────────────┼─────────────┼────────────┤
│ 'test"test' │ 'test"test' │ 'doorknob' │
└─────────────┴─────────────┴────────────┘

The developers may take pity and decide to add a --fubarquoting option (or similar) to the .import command; however, that would have to be supported long-term. It would be easier (and better) if you simply complied with the standard or wrote your own custom importer.

(6) By ddevienne on 2022-11-29 14:10:47 in reply to 5.1 [link] [source]

The standard requires [...]

Which standard? The CSV one?

The OP mentioned a TAB-separated file.

From that page:

Another common convention is to use the CSV convention from RFC 4180 and enclose these special characters in double quotes. This can lead to ambiguities.

So what you state as Garbage-In may be valid instead, for a TSV file, no? Since that's only a convention?

What about Postel's law we've been reading around SQLite recently?
Seems fairly reasonable to be able to load such unquoted TSV files in the CLI IMHO.

(7) By Keith Medcalf (kmedcalf) on 2022-11-29 14:32:34 in reply to 6 [link] [source]

But you can load files containing embedded double-quotes. You just cannot have the first character of a field be a double-quote unless you are quoting the entire field.

It makes no difference what characters are chosen as the FS and RS (Field Separator and Record Separator). The only difference between FS = <tab>, FS = ,, or FS = the ASCII FS control character is which character separates the fields (and likewise for the RS, the record separator).
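
For illustration, here is a sketch of the same test with a tab separator instead of a comma (the file name crap_tab.txt is made up, <TAB> stands for a literal tab character, and the transcript is abbreviated rather than copied from a real session):

   sqlite> .mode tabs
   sqlite> .import crap_tab.txt t1

where crap_tab.txt contains the same two rows, tab-separated:

   a<TAB>b<TAB>c
   test"test<TAB>"test""test"<TAB>doorknob

The embedded quote in test"test should survive, and the quoted "test""test" should decode to test"test, exactly as in the comma-separated case above.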

Of course, everyone is free to define their own format and use it if they so wish.

(8) By Richard Damon (RichardDamon) on 2022-11-29 14:55:15 in reply to 6 [link] [source]

I think the issue is that a field with quotes like "a b" can validly be seen EITHER as a properly (but unnecessarily) quoted field or as an "unquoted" field that happens to contain quotes.
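
Spelled out (an illustrative sketch, not CLI output):

   raw field text ............ "a b"
   read as a quoted field .... a b     (the quotes are syntax and are stripped)
   read as literal text ...... "a b"   (the quotes are data and are kept)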

The de facto standard is that such fields are treated as quoted fields, as that allows for the simplest version of a generator: one that just quotes ALL fields. It also allows a smarter program to omit the quotes when they are not needed because there is no quote (or field separator) in the field.

Yes, it means that a program can't take the simple solution of never quoting, but that tends to have the danger of a field containing a character (like the separator) that breaks the file, and it would require readers to "guess" whether the quotes were there because the writer realized that it needed to quote the field for a special character, likely by trying to read the field both ways and seeing which one makes more sense. This option gets VERY hard to define for all the corner cases and how they should be handled.

Yes, you can define that you are using a restricted character set for your data that never needs quoting, but the "standard" doesn't want to limit files to just that set, which is why quoting is defined at all.

You are basically saying you have a NEW data format, that isn't a "TSV" file, but is an "UNQUOTED TSV" file, so you need to show that enough people would like this ADDITIONAL format to make it worth adding and supporting for the life of the product.

(9.2) By Keith Medcalf (kmedcalf) on 2022-11-29 15:05:07 edited from 9.1 in reply to 8 [link] [source]

You are permitted to have embedded quotes without quoting the field. You only need to quote the field in the following circumstances: the first character is a quote; or, the field data contains a FS or RS character (whatever they happen to be).

This has been the case for as long as I can remember (about 50 years).

NOTE: For the greater certainty, an "embedded quote" means a quote that occurs at a location other than the first character of a field.
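
Applying this to the two example fields from post 4 (a sketch of how the rule reads them, not actual CLI output):

   "a abc"   the first character is a quote, so it parses as a quoted field whose value is:  a abc
   "a" b     the first character is a quote, so the field "ends" at the second quote, and the trailing  b  before the separator violates the format

which is exactly why the second field causes trouble while the first merely loses its surrounding quotes.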

(10.2) By Larry Brasfield (larrybr) on 2022-11-29 17:56:33 edited from 10.1 in reply to 1 [source]

I have edited this post, now replying to the question with a diametrically opposite answer to my previous one.

Is there a way to disable quoting when reading with .import?

I disagree with the technical content of the previous replies including my post 10.0 answer.

Yes, there is a way. The following CLI command sequence will do it. Note that the command order is critical.

   .mode ascii
   .separator "\t" "\n"
   .import my_pure_tsv_file SomeTable
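
(Presumably the order matters because .mode ascii resets the separators to its own defaults, so the .separator line must come after it. As a hedged illustration of the effect, with this setup a raw input line such as

   "a" b<TAB>second field

should import as two columns whose values are, literally, "a" b and second field, with no quote stripping at all. The <TAB> here stands for a real tab character.)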

My previous 10.0 answer, kept for reference: No, there is not a way to get .import to suspend its quoted-field-decoding (unless somebody modifies the CLI code to add such an option.)

It would be almost child's play to create a Perl script, acting as a filter1, to preprocess your TSV for use with .import, and that filter could be applied as the .import command is invoked.2 The script would be a one-liner on *Nix platforms and a quoting nightmare on Windows. All it has to do (assuming that \n also does not appear in your fields) is split each line on tab, replace every double-quote character in each field with two double-quote characters, wrap each field with double-quote characters, then output the fields joined with tab or comma and terminated by \n. (A rough sketch of such a filter appears after the footnotes below.)


  1. ^ "Filter" meaning a program that transforms stdin content to stdout content, useful with inter-process pipes.
  2. ^ This means the .import command tail, with suitable quoting, could contain that filter on a system with Perl available, as many are.
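
Here is a rough sketch of such a filter, written in Python rather than Perl or C purely for readability. The script name quote_tsv.py and the file names below are made up, and the sketch assumes, as above, that fields contain neither \t nor \n:

   #!/usr/bin/env python3
   # quote_tsv.py (hypothetical name): read raw tab-separated records on
   # stdin and write CSV-style-quoted tab-separated records on stdout, so
   # that .import's quote decoding reproduces the original fields exactly.
   import sys

   for line in sys.stdin:
       record = line.rstrip("\n")            # records are separated by \n
       fields = record.split("\t")           # fields never contain \t
       quoted = ['"' + f.replace('"', '""') + '"' for f in fields]
       sys.stdout.write("\t".join(quoted) + "\n")

Per footnote 2, something like the following should then work, although the exact quoting of the command tail may need adjusting per platform:

   sqlite> .mode tabs
   sqlite> .import "| python3 quote_tsv.py < my_raw_file" SomeTable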

(11) By hruodr on 2022-11-29 16:46:06 in reply to 10.0 [link] [source]

Of course I had that alternative in mind, with a very simple filter written
in C; no scripting language is necessary for such a simple task. But I wanted
just an answer to the above question. As said, it is not documented how
.import deals with quoting, one must guess it, and I was not able to guess whether
there is a solution out of the box.

(12.1) By hruodr on 2022-11-29 16:58:53 edited from 12.0 in reply to 8 [link] [source]

I think the technical issue is clear, and also what the problem is (except for
Keith Medcalf at the beginning). It is a question about standards
and documentation. You used the words "de facto standard", and also TSV, so
I googled and found:

"The IANA standard for TSV[2] achieves simplicity by simply disallowing tabs within fields"

In https://en.wikipedia.org/wiki/Tab-separated_values

There was no other way than to just ask what I asked.

(13.1) By Larry Brasfield (larrybr) on 2022-11-29 18:56:08 edited from 13.0 in reply to 11 [link] [source]

(Edited to reflect doc page revision.)

Please see my revised answer.

it is not documented how .import deals with quoting, one must guess it

I agree with this critique. I have modified the CLI doc page source to address these details; the revision will be published at the next website docs refresh.

Ironically, it was while reading code to ensure that said doc revisions are accurate that I saw that there is indeed an existing way to do what you want with the CLI. This is also mentioned in the revised doc page.

(14) By Keith Medcalf (kmedcalf) on 2022-11-29 18:02:09 in reply to 12.1 [link] [source]

.import imports csv files.

It merely allows you to "change" the separator characters.

The documentation is pretty straightforward and included in the debugging shell.

.import FILE TABLE       Import data from FILE into TABLE
   Options:
     --ascii               Use \037 and \036 as column and row separators
     --csv                 Use , and \n as column and row separators
     --skip N              Skip the first N rows of input
     --schema S            Target table to be S.TABLE
     -v                    "Verbose" - increase auxiliary output
   Notes:
     *  If TABLE does not exist, it is created.  The first row of input
        determines the column names.
     *  If neither --csv or --ascii are used, the input mode is derived
        from the ".mode" output mode
     *  If FILE begins with "|" then it is a command that generates the
        input text.

There is no support for any format other than CSV, although you may change the Field and Row separator characters as your heart desires.

(15) By hruodr on 2022-11-29 18:21:19 in reply to 13.0 [link] [source]

Thanks a lot! 

Another command is necessary:

.headers on
.mode ascii
.separator "\t" "\n"
.import my_pure_tsv_file SomeTable

Without ".headers on", the first record, with the names of the fields, is
ignored, and then I do not know which field names are taken.

After reading "man sqlite3" and ".help", I thought that 
.headers and .mode concerned only the output, not the input with .import.

(16) By Larry Brasfield (larrybr) on 2022-11-29 18:23:44 in reply to 14 [link] [source]

There is no support for any format other than CSV, although you may change the Field and Row separator characters as your heart desires.

Earlier today, I would (and did) agree with that assertion. However, upon closer study of the CLI's import implementation, I saw that the above is incorrect. See details here.

Soon, the CLI doc page will allow these subtleties to be predicted without reading the shell's code.

(17) By Larry Brasfield (larrybr) on 2022-11-29 18:27:26 in reply to 15 [link] [source]

Another command is necessary: ...

The interaction of {an initial field names line in imported data} with {pre-existence of the table into which import is done} is already well documented. Nothing I have said goes to gainsay that interaction or its description in the CLI doc page.