SQLite3 Auto Named Columns

(1) By anonymous on 2022-04-10 09:01:48 [link] [source]

I haven't been able to find documentation on SQLite3 auto-naming columns; anyone know where this is?

Thanks.

(2) By Larry Brasfield (larrybr) on 2022-04-10 09:32:57 in reply to 1 [link] [source]

... documentation on SQLite3 auto-naming columns; anyone know where this is?

There is none. It is a hidden feature, activated only when an illegitimate set of column names is seen during import. Its exact behavior is subject to change. The most that would be said, if it were documented, is: The columns will be named per the incoming header line unless it has duplicates, in which case some of the columns will have names differing from the incoming header.

(3) By anonymous on 2022-04-10 09:54:50 in reply to 2 [link] [source]

illegitimate set of column names is seen during import

Not sure I understand illegitimate: is a column name duplicated illegitimate? I am under the impression that any column name is valid albeit some names are required to be within square brackets or double quotes.

Its exact behavior is subject to change.

Just to clarify, are you implying that it is volatile i.e. the names might be different given the same input on different occasions OR that SQLite3 might change the convention, without warning, in a future release?

(4) By Simon Slavin (slavin) on 2022-04-10 11:32:42 in reply to 3 [source]

As the behaviour is undocumented, consider the behaviour to be as inconvenient for your software as possible. In other words, all the worst cases you described might apply.

I think the only thing you can rely on is that you should get the same number of columns every time.

(5) By Larry Brasfield (larrybr) on 2022-04-10 12:09:18 in reply to 3 [link] [source]

On "illegitimate" column names: CSV with an initial line of column names is a common data format, corresponding loosely to a database table. Such a table, in SQLite and other DBMS's, will have named columns^a, so that the names can be used to reference fields in a record (or tuple elements, etc.) Those names must be unique for a given table. If the CSV initial line has duplicated names, it cannot represent a set of properly named columns, hence it is an "illegitimate" representative of the CSV-as-table representation. Such can be made into a table only by some intervention involving invented column names.

As a convenience for users dealing with such mongrel inputs, we have added the auto-column-rename-upon-need feature.

Its exact behavior is subject to change.

Just to clarify, are you implying that it is volatile i.e. the names might be different given the same input on different occasions OR that SQLite3 might change the convention, without warning, in a future release?

I am stating that future releases might invent different names when the incoming names are duplicated. That is unlikely, as the feature serves its purpose and deserves little more attention and development effort. Such a change would not happen gratuitously, and if it did happen, the revised behavior would probably be backward-compatible in some sense (that may or may not please everybody with mongrel CSV inputs.)

As for warning: There is no warning system in place for SQLite CLI or library changes. Every effort is made to avoid creating problems for people who rely on established behavior of the released software, particularly when it is documented but even when it is not.

a. Not discussed here: the empty column set or column sets whose counts are negative, non-integer or imaginary; or how they would be named.

(6) By anonymous on 2022-04-10 12:55:21 in reply to 5 [link] [source]

the feature serves its purpose and deserves little more attention and development effort. Such a change would not happen gratuitously, and if it did happen, the revised behavior would probably be backward-compatible in some sense

My reading of this is that the auto-naming feature is ok to use ... but user beware!

My intention is to get the data into SQLite, dump the script for the initial table and with that as a guide, design another more persistent STRICT table, with checks, triggers, and indices as necessary.

Should be OK - yes?

(7) By Larry Brasfield (larrybr) on 2022-04-10 14:59:13 in reply to 6 [link] [source]

That is well within the set of use cases for which the .import dot command is intended. I would also say it is among the set of good practices.

The auto-rename feature is likely to be stable, and will never touch a set of column names without duplicates. It would be a non-robust solution, for dealing with poorly headed CSV on a regular basis, to rely on the exact format of renamed columns.

The SQL which does the rename need detection and name transformation is in the shell source under conditional compilation such that it can be replaced. Somebody who was extremely attached to a particular algorithm could readily substitute one of their choosing when building the shell, even if that was taken from today's version.

(8) By anonymous on 2022-04-10 17:34:21 in reply to 7 [link] [source]

Somebody who was extremely attached to a particular algorithm could readily substitute one of their choosing when building the shell, even if that was taken from today's version.

Thanks Larry. Where do I start looking in the source code for 3.38.2 to get a better insight into this?

I have encountered auto-naming before and in another context but within a similar scenario.

Using the Provider=Microsoft.ACE.OLEDB.12.0 provider to read CSV (or XLSX) files, when the SQL in question finds missing/invalid column names, it substitutes another names silently. The substitute names are 'f' followed by the column number.

When the first column name cannot be resoled, it is settled as f1.

When, say, the third column name cannot be resolved and the first two and fourth are (say, Name, Sex, Salary) valid, the column names in the result is settled as Name, Sex, f3, Salary; that is, f followed by the column number.

SQLite uses repeating names sometimes, for example:

column1  column2 column2
-------  -------  -------

with SELECT statements (column2 is used twice). However, if the same SELECT statement is preceded by CREATE myTable AS, the column names in the table created are:

 column1  column2  column2:1
-------  -------  ---------

column2 is renamed column2:1

In other words, SQLite tries hard and successfully to eliminate an error condition when it might be perfectly valid to do so.

As a consequence it is very handy to use:

(the usually frowned upon) SELECT * (rather than the recommended SELECT comma separated enumeration of column names)
(the usually frowned upon) square brackets to enclose the column names (I know that double quotes are preferred to square brackets in this forum but I find square brackets enhances readability.)

(9) By Larry Brasfield (larrybr) on 2022-04-10 17:51:04 in reply to 8 [link] [source]

Where do I start looking in the source code for 3.38.2 to get a better insight into [altering the shell's column renaming]?

If you look at the (generated) shell.c source, or its progenitor at src/shell.c.in in the SQLite source tree, you can easily find a function, zAutoColumn(...), (and you can stop looking at that point. ;-)

Re "column1 column2 ...":
One objective of the renamer design was that it be easy to tell where, within the original (but illegitimate) column name set, an invented name came from. This is not so valuable with sensible schema, but people drag in data with absurd column counts sometimes.

Re square brackets:
There is nothing wrong with [some odd column name], but many other DBMS's will choke on it. I much prefer short, lowercase, all-dark names that need no extra delimiters and are more quickly typed.

(10) By anonymous on 2022-04-10 18:11:13 in reply to 9 [link] [source]

I much prefer short, lowercase, all-dark names that need no extra delimiters and are more quickly typed.

Sound & simple.

However, prior experience (with MS SQL SERVER, which also repeats column names in SELECT statements but unlike SQLite's CREATE TABLE myTable AS ..., the equivalent SELECT * INTO myTable) manifests itself intuitively (without thinking) within context (as with the use of square brackets).

Until recommended conventions become self-enforcing, not trying to fix something that is not broken is, I believe, acceptable.

Thanks for

you can easily find a function, zAutoColumn(...)