Improve sqlite3_column_name() return value documentation

(1) By Erlend E. Aasland (erlendaasland) on 2021-02-22 09:07:19 [link] [source]

The documentation for sqlite3_column_name(), sqlite.org/c3ref/intro.html, implies that NULL is only returned in case of memory error, however NULL may also be returned if the statement pointer is NULL, or the column index is out of bounds. The documentation should reflect this.

I guess NULL may also be returned if the column name is not set, but that seems very unlikely.

(2) By Ryan Smith (cuz) on 2021-02-22 10:24:19 in reply to 1 [link] [source]

implies that NULL is only returned in case of memory error, however NULL may also be returned if the statement pointer is NULL, or the column index is out of bounds.

What do you imagine the term "Memory Error" to mean other than those types of things?

Both the statement being NULL, and the column index being out-of-bounds, are very much basic cases of "Memory error". The term "Memory Error" means that something that is requested from (or written to) memory isn't in the memory or cannot be accessed in the memory where you are pointing to.

I guess NULL may also be returned if the column name is not set, but that seems very unlikely.

I think if the column name is not explicitly set, SQLite always makes up one. I don't think there is a case in which NULL can be returned for a column name of a valid column in a valid statement - but I am not 100% sure about that.

(4) By Simon Slavin (slavin) on 2021-02-22 10:38:13 in reply to 2 [link] [source]

To be honest, I would like the documentation to go in the other direction, and give less information. There are two cases where I need column names to be predictable:

I used AS in my SQL code.
I used SELECT *, possibly with a JOIN<Join> such as

SELECT albums.*,artists.*
    FROM albums INNER JOIN artists ON album.artist = artist.id

I don't want programs depending on column names for anything else. Not only because the values returned differ in different SQL implementations, but because it shows shoddy programming. Yes, it's a little more work having to scatter AS around your code, but at least you know what's going to happen.

(6) By Erlend E. Aasland (erlendaasland) on 2021-02-22 10:47:42 in reply to 4 [link] [source]

To be honest, I would like the documentation to go in the other direction, and give less information.

I would like to produce presise error messages. If NULL implies three or four different error scenarios, I would like to be able to differentiate them.

(20) By Simon Slavin (slavin) on 2021-02-23 12:29:23 in reply to 6 [link] [source]

You can do a lot towards that using sqlite3_column_count(). Using that before iterating columns gets rid of a bunch of situations where you would otherwise get NULLs. Anything left after that may well be unpredictable anyway.

(21.1) By Erlend E. Aasland (erlendaasland) on 2021-02-23 12:39:50 edited from 21.0 in reply to 20 [link] [source]

You can do a lot towards that using sqlite3_column_count().

As mentioned in other replies, both the column index and the statement pointer is valid. Let me rephrase:

Given valid arguments (valid statement pointer, valid column index) and a NULL response, can I be certain that the error was sqlite3_malloc failure? Based on the docs, I would conclude with "yes, given valid arguments and a NULL response, I've now got a memory error, so let's print 'memory error' to the user, and handle the error as an out-of-memory error". Based on the replies in this thread, I'm no longer certain about that.

(28) By Clemens Ladisch (cladisch) on 2021-02-24 10:46:23 in reply to 4 [link] [source]

I used AS in my SQL code.

I used SELECT *

I don't want programs depending on column names for anything else.

Then the SQL standard is not your friend. SQL-92 says in 7.9 9:

a) If the i-th <derived column> in the <select list> specifies an <as clause> that contains a <column name> C, then the <column name> of the i-th column of the result is C.

b) If the i-th <derived column> in the <select list> does not specify an <as clause> and the <value expression> of that <derived column> is a single <column reference>, then the <column name> of the i-th column of the result is C.

c) Otherwise, the <column name> of the i-th column of the <query specification> is implementation-dependent and different from the <column name> of any column, other than itself, of a table referenced by any <table reference> contained in the SQL-statement.

(SELECT * generates <column reference>s as in case b.)

Every SQL database must give predictable column names for direct column references.

And if it were necessary to write SELECT x AS x, y AS y FROM ..., such a database would not find many users.

(5) By Erlend E. Aasland (erlendaasland) on 2021-02-22 10:45:36 in reply to 2 [link] [source]

Both the statement being NULL, and the column index being out-of-bounds, are very much basic cases of "Memory error".

I read "memory error" as "out of memory", and my post was based on that assumption. I'd say that out-of-bounds is a "value error" (programming error), not a "memory error" (read: "out of memory"). Ditto for a NULL statement pointer. Others may disagree.

(11) By Gunter Hick (gunter_hick) on 2021-02-22 16:29:12 in reply to 5 [link] [source]

Software should not be calling sqlite3_column_name() with a NULL statement pointer. This would mean that the return code of the sqlite3_prepare() family of functions was ignored.

Neither knowing nor bothering to check the number of returned columns also strikes me as bad programming practice. If you are executing user composed SQL and returning the results, you must check the number of columns returned (or quit asking on NULL). If you are executing developer-written SQL, then it should be written with an explicit field list (and not SELECT *) for resilience against schema changes. Either way, the number of columns should be known or queried.

Note that the sqlite2_column_<type>() family of functions is explicitly documented to return undefined values if the column index is out of range.

(12) By Erlend E. Aasland (erlendaasland) on 2021-02-22 20:28:11 in reply to 11 [link] [source]

Of course, I agree with all you say here. In my case, both the statement pointer and the index is under control; that is not the problem. The reason I started investigating this was because the docs only say the following:

"If sqlite3_malloc() fails during the processing of either routine (for example during a conversion from UTF-8 to UTF-16) then a NULL pointer is returned."

But I know from experience that argument value errors also leads to sqlite3_column_name() returning NULL. This (obvious) fact is not mentioned in the docs, something I read as "I cannot trust the documentation". Are there other cases not mentioned?

(13) By Larry Brasfield (larrybr) on 2021-02-22 23:25:44 in reply to 12 [link] [source]

I would agree that the treatment of invalid arguments could be better documented. By reading the code, (less than 40 lines), it becomes apparent that:

Passing a NULL sqlite3_stmt pointer is an error, leading to undefined behavior unless the library is built with SQLITE_ENABLE_API_ARMOR #define'd. Then the behavior is to return 0, but I don't think that should be documented. The doc need only say that the pointer must be a valid ..prepare.. return, not yet finalized.^a
Passing an invalid column index into the API (as coded today) causes 0 to be returned instead of producing a crash or returning something else that cannot be dereferenced, regardless of whether SQLITE_ENABLE_API_ARMOR is #define'd. This should be documented, IMO, because of the confusion you evince. This slight deficiency in the documentation has probably survived for so many years because most API users understand that random indexes are not going to lead to anything good and use sqlite3_column_count() to limit index values.

... argument value errors also leads to sqlite3_column_name() returning NULL. This (obvious) fact is not mentioned in the docs, something I read as "I cannot trust the documentation".

That fact is not obvious without reading the code or experimenting. Anybody who programs much in C quickly learns that argument errors often lead to undefined behavior. I think your mistrust is not well founded on the (present) fact that the contours of the "correct argument" space are not precisely or completely spelled out in the place(s) you expect.^b Such facts are normally left unstated in most contexts where they might be relevant to the more adventurous programmer, to the benefit of those who want/need to understand the API's useful functionality rather than the limits and effects of useless calls into it.

I would also make this point of English semantics: To say "if X then Y happens" is not a statement about any result for conditions other than X. You should not read "If <X> fails ... then a NULL pointer is returned" to restrict the return of NULL pointers to other, non-<X> conditions. To convey such restriction, we say "If and only if <X> then ...".^c So your citing this as a misleading doc assertion, leading to mistrust, is erroneous IMHO. At worst, the doc is silent on something, but it is obviously so.

Are there other cases not mentioned?

Probably. Can you point to a few important omissions? The SQLite dev team is pretty good about improving doc defects that make it harder for SQLite library users to use it.

a. It would dilute the docs horribly to mention, at every location where a pointer is passed into or out of an API function, that either it must be a valid pointer or undefined behavior may result.

b. At the sqlite3_prepare() doc, one learns that the ubiquitously used sqlite3_stmt object should ultimately be deleted.

c. This construct is shortened in mathematics-speak to: Iff <X> then ...". This should be in the common lexicon.

(14.1) By Erlend E. Aasland (erlendaasland) on 2021-02-23 13:00:08 edited from 14.0 in reply to 13 [link] [source]

Thank you for your comprehensive reply, Larry.

I would agree that the treatment of invalid arguments could be better documented.

On this, we agree, however I do understand that this is a subjective matter; either you like explicit documentation, or you don't like it (keep it simple, exclude obvious stuff). I obviously (no pun intended) prefer the former.

The doc need only say that the pointer must be a valid [...]

I agree.

Passing an invalid column index into the API (as coded today) causes 0 to be returned [...] This should be documented [..]

I agree.

This slight deficiency in the documentation has probably survived for so many years because most API users understand that random indexes are not going to lead to anything good and use sqlite3_column_count() to limit index values.

Yes, it is obvious. I am of the opinion that all behaviour should be documented, even if it is obvious.

I think your mistrust is not well founded on the (present) fact that the contours of the "correct argument" space are not precisely or completely spelled out in the place(s) you expect.

I should also have chosen a less harsh wording. "There may be other discrepancies in the documentation" is a better description of how I feel about the matter.

Probably. Can you point to a few important omissions?

I'm sorry, I cannot; I'm not familiar with the internals of the SQLite code base.

So, given valid a valid statement pointer and a valid column index, a NULL pointer is returned in case of memory allocation failure, and probably some other cases. IMHO, mentioning that would improve the documentation. Again, it's just a matter of preference. I prefer explicit docs, but that's just my opinion.

Since it is (unless I'm mistaken) not possible to differentiate between memory error and other errors, maybe the best improvement would be to remove the following sentence: "If sqlite3_malloc() fails during the processing of either routine (for example during a conversion from UTF-8 to UTF-16) then a NULL pointer is returned."

One could also consider altering the start of the first sentence from "These routines return [...]" to "If successful, these routines return", but based on the responses in this thread, I assume the current wording is preferred.

The SQLite dev team is pretty good about improving doc defects that make it harder for SQLite library users to use it.

I'm glad to hear.

(15) By Keith Medcalf (kmedcalf) on 2021-02-23 10:19:26 in reply to 14.0 [link] [source]

All that needs to change is the addition of the following statement:

If either the statement pointer or the column index are invalid then the result is undefined.

(16) By Erlend E. Aasland (erlendaasland) on 2021-02-23 10:48:42 in reply to 15 [link] [source]

If either the statement pointer or the column index are invalid then the result is undefined.

The result is not undefined if the column index is invalid; it is NULL. If the statement pointer is NULL, the result is also NULL. If the statement pointer points to a random location, the result is undefined.

(18) By Gunter Hick (gunter_hick) on 2021-02-23 11:28:30 in reply to 16 [link] [source]

That would constitute a (new) guarantee about undocumented behavior, based on the behaviour of the current implementation. Using the code word "undefined" implies "subject to change without notice".

(19.1) By Erlend E. Aasland (erlendaasland) on 2021-02-23 11:45:01 edited from 19.0 in reply to 18 [link] [source]

Right. I was just pointing out the current behaviour, not suggesting to change the docs as such.

This aligns pretty good with the proposals I mentioned in my first answer to Larry. IMHO.

(23) By Larry Brasfield (larrybr) on 2021-02-23 15:00:33 in reply to 18 [link] [source]

That's a good point, one which needs to be understood by API users. The project's founder and chief is very careful about expanding the set of defined behaviors promised by the API. There is a long-term (decades) commitment to backward compatibility that is made harder to meet as the API and its actual promises is grown.

Also, as I mentioned earlier, a NULL statement pointer only yields a certain NULL return when SQLITE_API_ARMOUR has been #define'd for the library build. Otherwise, the code goes on the use the statement pointer and the possible result set expands enormously (and undefinedly.)

(24) By Larry Brasfield (larrybr) on 2021-02-23 15:42:05 in reply to 14.0 [link] [source]

So, given valid a valid statement pointer and a valid column index, a NULL pointer is returned in case of memory allocation failure, and probably some other cases. IMHO, mentioning that would improve the documentation. Again, it's just a matter of preference. I prefer explicit docs, but that's just my opinion.

I see that last sentence as reaching an important consideration. How completely the range of API inputs and outputs is to be described is not a black-and-white issue. It is a trade-off between competing virtues. I agree that clear description of the inputs producing defined results and what those results should be ^a is a virtue. Another virtue is brevity, one valued by busy and productive programmers. A less obvious virtue is time not taken by the dev team for documentation, leaving them more time to improve the library. Personally, I am not so sure of the optimal trade that I would proclaim that it is made wrongly.

... maybe the best improvement would be to remove the following sentence: "If sqlite3_malloc() fails during the processing of either routine (for example during a conversion from UTF-8 to UTF-16) then a NULL pointer is returned."

I heartily disagree. That sentence informs API users who provide sensible inputs that they may get a NULL return, and when. This is a core concern, to be contrasted with what may be expected from nonsensical inputs.

One could also consider altering the start of the first sentence from "These routines return [...]" to "If successful, these routines return", but based on the responses in this thread, I assume the current wording is preferred.

Your suggested wording is implicit (obviously IMO) in what is said now. In fact, your qualifier could be attached accurately to virtually every API in the library. If I were forced (under dire threat) to put that in the docs, it would go in a general "Caveats" section rather than cluttering the rest.

On what is preferred: That is up for discussion. It is practically tautological that the present wording was preferred by its author when written. Those preferences change, with time, authors and feedback. I have seen many small doc improvements result from forum questions and comments.

a. "Should be" absent an implementation flaw.

(17) By Erlend E. Aasland (erlendaasland) on 2021-02-23 11:26:45 in reply to 13 [link] [source]

Based on your reply, Larry, I conclude that given valid inputs and a NULL result, I cannot be certain that the cause of the error was sqlite3_malloc failure.

(22) By Larry Brasfield (larrybr) on 2021-02-23 14:53:12 in reply to 17 [source]

My reading of the sqlite3_column_name() doc is that it will return the column name when given valid inputs, with one exception: out of memory. This interpretation is also supported by the code.

The doc focus is on "the API contract" -- what you can count upon the library to do once you have provided inputs for which the API can serve its purpose.

Many of the APIs return integer result status, and for those an SQLITE_NOMEM return is used to indicate an out-of-memory error. But the simpler APIs which just return directly useful information have no way to indicate that error. Hence such returns as NULL for OOM conditions.

(25) By Erlend E. Aasland (erlendaasland) on 2021-02-23 17:02:33 in reply to 22 [link] [source]

My reading of the sqlite3_column_name() doc is that it will return the column name when given valid inputs, with one exception: out of memory.

This is also my reading of the docs.

Quoting your first reply to one of my questions:

Are there other cases not mentioned? Probably. Can you point to a few important omissions?

I guess I've misinterpreted this reply; what I though you meant was: Given valid inputs sqlite3_column_name() may return NULL if sqlite3_malloc() fails, and possibly other cases.

(26) By Larry Brasfield (larrybr) on 2021-02-23 17:23:32 in reply to 25 [link] [source]

My question, "Can you point to a few important omissions?", was posed in the context of doubt expressed about the docs generally. (The one on sqlite3_column_name() having been heavily trodden upon already.)

I would encourage you or others to mention where the docs fail to mention something important to API users who: seek to provide valid inputs but cannot see or infer what they are; or cannot see or readily and accurately infer from the doc what the expectable and/or assured results will be. From my observation of the project over several years, I believe such deficiencies are likely to be addressed with doc improvements and I believe tips as to where they exist are welcome.

(27) By Erlend E. Aasland (erlendaasland) on 2021-02-23 19:19:48 in reply to 26 [link] [source]

Thank you for the encouragement. I do not find the docs hard to read or hard to interpret. In fact, I find the SQLite C API easy to use and easy to learn. It was just my opinion that the documentation of sqlite3_column_name() was a little bit vague in my quest to provide a precise error message. Thank you for providing me with an answer; I do appreciate that.

If I post here again, I will try to formulate my questions unambiguous and with a clear context to avoid the confusion.

[..] I believe such deficiencies are likely to be addressed with doc improvements and I believe tips as to where they exist are welcome.

I'm glad to hear.

(3) By Gunter Hick (gunter_hick) on 2021-02-22 10:37:57 in reply to 1 [link] [source]

SQLite will invent a name if none is provided in the AS clause. One should not rely on names invented by SQLite, they may change between releases.

What return value did you expect for a NULL statement pointer or an out of range column index, given that the function returns const char*?

(7) By Keith Medcalf (kmedcalf) on 2021-02-22 10:55:22 in reply to 3 [link] [source]

I would have resort to the HaltEP instruction (Halt and Electrocute Programmer).

(8) By Erlend E. Aasland (erlendaasland) on 2021-02-22 10:56:17 in reply to 3 [link] [source]

What return value did you expect for a NULL statement pointer or an out of range column index, given that the function returns const char*?

Assumption is the mother of a lot of programming errors. What if the programmer assumed an empty string was returned in those cases? I prefer clarity. Check the manual page for isdigit(), for example. The return value of isdigit is pretty obvious, but it's still documented.

(9) By Gunter Hick (gunter_hick) on 2021-02-22 12:30:17 in reply to 8 [link] [source]

Consider

select 1 as "";

----------
1

which names the column with an empty string, generating a column name of NULL is not so easy.

select 1 as NULL;
Error: near "NULL": syntax error

explain select 1 as ?;
Error: near "?": syntax error

(10) By Erlend E. Aasland (erlendaasland) on 2021-02-22 13:01:37 in reply to 9 [link] [source]

select 1 as ""

sqlite3_column_name() returns an empty string, as expected.

select 1 as NULL;

This will fail at sqlite3_prepare_v2(), as expected. There is no statement object, so calling sqlite3_step() and sqlite3_column_name() makes no sense.

explain select 1 as ?

This will also fail at sqlite3_prepare_v2(), as expected. There is no statement object, so calling sqlite3_step() and sqlite3_column_name() makes no sense.