Ability to: SELECT * EXCEPT(some,columns)

(1) By anonymous on 2021-02-25 22:05:39 [link] [source]

Is there any plan to add the feature of selecting all columns except certain ones?

Google's BigQuery has this feature and it's supremely handy at times when you want to grab dozens or hundreds of columns from a large table while excluding only a handful. Otherwise you'd have to explicitly write out all the columns.

Example:

-- Select all columns except "COLUMN_ONE"
SELECT * EXCEPT(COLUMN_ONE)
FROM SOME_TABLE

(2) By anonymous on 2021-02-26 13:38:04 in reply to 1 [link] [source]

+1 on the idea.

(This is also useful when trying to get all but the generated (virtual) columns.)

But, even this method also has problems if you want both to include many columns while excluding many others. You still have to type a lot explicitly (either the ones to include or the ones to exclude).

I would propose a much simpler and more versatile syntax that would use ranges, something like:

SELECT COLA TO COLJ, COLP, COLS TO COLZ

where all columns between the two specified are included in the order they appear in the table definition.

agp

(3) By JayKreibich (jkreibich) on 2021-02-26 17:44:17 in reply to 2 [link] [source]

Such a syntax would be very difficult to make stable. In some cases the column order is not well defined, especially with complex JOINs using the "USING" syntax, or in self-JOINs, where the same column name exists in multiple places within the query.

In true Relational Theory form, columns are unordered, just as rows are. While the first clause of a SELECT statement defines the column ordering for the output, there shouldn't be strong assumptions about the column ordering at the end of the FROM processing.

(4) By anonymous on 2021-02-26 18:07:48 in reply to 3 [link] [source]

I disagree.

Any query regardless of how it is formed (sub-queries, joins, whatever) has a stable column output when doing SELECT *.

Do you get the columns in different order each time the query is run (without changes)? I don't think so.

Once the query is finalized you know the order of the output and the respective names (even generated ones, e.g., expressions that you can always alias to simpler ones).

agp

(5) By Larry Brasfield (larrybr) on 2021-02-26 18:48:12 in reply to 4 [link] [source]

Your kind of "stable" is difference from Jay's. You mistake repeated results from identical circumstances for Jay's stability, which would apply across query optimizer revisions and DBMS choices as to naming of columns lacking an AS qualifier.

(6) By anonymous on 2021-02-26 19:48:07 in reply to 5 [link] [source]

I see what you're saying. To me, this seems to be a non-issue.

Because it's just as consequential as using *. If one is allowed, why not the other?

Internal changes in the DBMS may also return a different order with *, right?

SQL inside programs should be using explicit column names, not *. And, if they use * they will also risk breaking with a newer DBMS version.

These shortcuts are mostly meant for manual use. Regardless, if one (*) is affected and it's acceptable, why would it be a problem with the other?

agp

(7) By Larry Brasfield (larrybr) on 2021-02-26 20:31:22 in reply to 6 [link] [source]

I see what you're saying. To me, this seems to be a non-issue.

I was disagreeing with your contradiction of Jay. I think lack of assured predictability (or possible "instability") is part of why '*' as a columns specifier is disfavored.

Because it's just as consequential as using *. If one is allowed, why not the other?

I sort of buy into that argument and would not be aghast if somebody were to talk the SQL evolution committee into adopting it.

Internal changes in the DBMS may also return a different order with *, right?

Yes, in general. Not likely, though. It is the chosen names that may change. For example, consider a UNION query. It is allowed to have differing column names for its component selects. Which component select's names should win?

... if one (*) is affected and it's acceptable, why would it be a problem with the other?

(Taking "the other" to be * with some named columns omitted:) It's not SQL, and it's yet another feature tending to take the 'Lite' out of SQLite. The SQLite library is intended for use in applications, including quite resource limited one. The convenience of those who type at the CLI shell is low on the priority list for library features.

(9) By Scott Robison (casaderobison) on 2021-02-26 21:08:51 in reply to 6 [link] [source]

I think the problem is that the column list as it exists is simply a concatenation of columns that are desired. A possibly qualified asterisk says "add all the columns". The proposal says "now go back and remove some of the columns I told you to add previously". While SQL engines tend to do this in the order the columns are defined in the DDL, I don't know that this is a hard requirement.

(Perhaps there should be a reverse_wildcard_columns pragma like there is a reverse_unordered_selects pragma for those who want to ensure they aren't depending on an implementation detail. I'm not recommending it, just a thought that occurred to me.)

There certainly are many things a SQL engine could do to make things easier for the interactive user, but SQLite is not an inherently interactive user tool. That it can be used that way is incidental to its embedded nature.

Anyway, I think enhancing the select list to allow column removal is not a great idea. Not that my opinion matters in the end.

If anything, maybe what would have value would be a table-valued function that takes a query followed by a list of columns to exclude from the final list. That could be done with SQL as it exists today (in as much as SQLite supports table-valued functions). The implementation of such is left as an exercise for the reader.

(10) By anonymous on 2021-02-27 19:37:21 in reply to 9 [link] [source]

SQLite does not currently allow a virtual table to have a variable number of columns; the number of columns must be fixed at the time it is created. The names of the columns are also fixed at the time it is created.

(12) By Scott Robison (casaderobison) on 2021-02-28 20:29:09 in reply to 10 [source]

Then a single string parameter with a quoted identifier list could be used. I think it is doable in some way within the context of the system without adding it to core SQLite.

(8) By anonymous on 2021-02-26 20:48:45 in reply to 1 [link] [source]

I think that if you want a subset of columns, a view can be used to provide a name for that subset of columns (possibly also in a different order, with renamed columns, additional filters, triggers, etc, if wanted).

(11) By anonymous on 2021-02-28 14:59:14 in reply to 1 [link] [source]

Original poster here. I agree the "range of columns" is probably a nightmare to implement. Also schemas change all the time, even in production-level systems that use SQLite as a database, so a range might be a dangerous way to go.

The except(columns,listed,here) method does already exist in some systems (well, BigQuery that I've used) and it's been very handy when needing to exclude certain things like columns with personal info or row_number() values for deduplication.

I see in the commit timeline that the alter table drop column functionality is being worked on - so this could be a proxy for the exclude() feature by creating the table, then dropping those few columns you want to exclude.