Are there any plans / chances of supporting a RETURNING clause?
I've recently been re-using sqlite for personal software, and got annoyed several times at the necessity to work around the lack of
RETURNING clauses when inserting rows, especially bulk-insert (even more so with upserts), as well as deleting rows.
I understand that since sqlite remains a single-writer system the concurrency aspects of
RETURNING are not really a concern regardless of the operation, but
RETURNING is often extremely convenient, and greatly simplifies the application code. I think it would be a boon.
tl;dr: I would love to see this feature; it would personally take sqlite from 'meh, easier to just use json/xml files' to a critical tool in the toolbox.
So much this. Every few years I come and check on SQLite to see if it has the RETURNING clause yet; it never does =(.
I'd even be willing to toss money into the pot to support its development, but currently only see 1 way to fund development and that is the 85k/year business subscription....
Me too. I would pay for more complete ALTER TABLE support, but cannot afford the business subscription. IMHO, there should be a way for private customers to donate, too!
Take a look at this repo that I developed to work around the missing full ALTER TABLE support: https://github.com/marcobambini/sqlite-createtable-parser
Two points of feedback:
Can y'all provide concrete examples of how you would find RETURNING useful? I can imagine cases where it would be useful, but in my own database programming (which these days is mostly confined to working on the Fossil source code) I don't recall ever coming across any such cases.
Anyone who is seriously interested in encouraging the implementation of new features through funding is welcomed to contact me directly at "drh at sqlite dot org".
I don't recall ever coming across any such cases.
Scanning the Fossil source code, I found two potential uses for RETURNING:
But neither of these are a big gain, either in performance or simplicity of code. Perhaps my coding style is atypical? Or perhaps there are other use cases that I'm just not seeing?
I suppose the value of
RETURNING depends on a good answer to this question: Why would the calling code not already have the desired return values? Didn't it just send them to SQLite to update the DB?
To take the Fossil example, the pattern may be "create new artifact; return artifact ID". Didn't the C code creating that artifact create the new artifact's hash, thus doesn't need SQLite to give it back? It already has the data.
Most of my uses of DB update immediately followed by
SELECT over the changed data are sufficiently well addressed by
sqlite3_last_insert_rowid(), since it requires a subsequent trip up through the UI and back down into the app core before I'll know what that subsequent
SELECT actually is. If I knew what data the user was going to request next, I'd just cache it as part of building the DB update query and return that, not ask SQLite to give me back something I just gave it.
That leads me to wonder if
RETURNING is largely useful with complicated SQL queries that do calculation on supplied values, so the updated data aren't obvious from the executed query. This requires either use of custom functions or complicated expressions using standard SQLite functions.
Also, doesn't the SQLite caching layer remove a large part of the value of
RETURNING? As far as I can tell, the main value of this SQL feature is predicated on the idea that a DBMS round-trip is expensive, but that's only true for client/server DBs. If you select from just-modified records, isn't SQLite -- being in-process -- going to return a result from RAM except in the rare instance that another concurrent query modified the same rows in between?
I repeat drh's challenge: what's the use case?
Some reasons the calling code may not have access to the values:
- database defaults.
- non-integer primary key.
- computed values.
- batch inserts, where it's easier to just insert and then scan out the fields, especially when combined with the three other issues above.
RETURNING is really just a generalized form of last_insert_rowid(). I find it very useful for expressing intent: yes, I could write two queries, but I could also just write one that does what I want, and generally the tooling already supports scanning results back out of the query.
I see it as a QOL improvement while developing, beyond the performance benefits it provides in typical client/server environments. Once you have it, you start reaching for it in a bunch of contexts.
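To make the first bullet concrete, here is a minimal sketch using Python's sqlite3 module (the table and column names are hypothetical): a column filled in by a database-side DEFAULT is exactly a value the caller never had, so today it takes a second statement to read it back.

```python
import sqlite3

# Hypothetical schema: the application supplies only "name"; "created"
# is generated by a database-side DEFAULT, so the caller does not have
# that value without reading the row back.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE user (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        created TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

db.execute("INSERT INTO user(name) VALUES ('Alice')")

# Without RETURNING, a second statement is needed to learn the
# database-generated id and timestamp of the row just inserted.
row = db.execute(
    "SELECT id, created FROM user WHERE id = last_insert_rowid()"
).fetchone()
print(row)  # e.g. (1, '2021-01-28 12:38:22')
```

With a RETURNING clause, the INSERT itself could have produced `id` and `created` directly, collapsing the two statements into one.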
Completely agree. I've had these same thoughts for a while now,
but didn't bother writing them down. I'd also add to your list
compatibility with other DBs, including of course SQLite's usual
go-to for compatibility, PostgreSQL. This would make porting to
SQLite easier. I know of at least one large commercial product
swapping a client-server RDBMS backend in the enterprisey solution
with SQLite for the personal edition of that same product, and
it's also something I had in mind. Not having to work around these
types of differences would be a plus for SQLite.
I'm not sure there is much utility in the low-level code, using prepare-step-reset-etc.
The real utility for this comes in the high-level application code where you have abstracted calls to handle the above for you, much in the way that "sqlite3_exec()" does, or when working through a wrapper.
Here are a couple of typical examples with explicit utility:
exec("CREATE TABLE t(a INTEGER PRIMARY KEY, b TEXT);");
exec("INSERT INTO t(b) VALUES ('Johnny'),('Joan'),('Jane'),('Jimmy');");
Now there is no way for me at this high level to know what the PK is for those inserted values unless I do them all one-by-one and get LAST_INSERT_ROWID every time. A simple change like this:
query("INSERT INTO t(b) VALUES ('Johnny'),('Joan'),('Jane'),('Jimmy') RETURNING a,b;")
would do the inserts and instantly give me a result like this:
1, Johnny
2, Joan
3, Jane
4, Jimmy
which can be parsed in my high-level side to know exactly what was inserted with which ID.
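For comparison, the one-by-one workaround that is needed today looks like this in a hypothetical Python sqlite3 sketch (same table and values as above):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t(a INTEGER PRIMARY KEY, b TEXT)")

# Without RETURNING: insert one row at a time and collect the
# last-insert rowid after each statement, instead of one bulk INSERT.
names = ["Johnny", "Joan", "Jane", "Jimmy"]
inserted = []
for name in names:
    cur = db.execute("INSERT INTO t(b) VALUES (?)", (name,))
    inserted.append((cur.lastrowid, name))

print(inserted)  # [(1, 'Johnny'), (2, 'Joan'), (3, 'Jane'), (4, 'Jimmy')]
```

A single bulk INSERT ... RETURNING would yield the same (id, value) pairs in one statement.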
In that example the "inserted" seems obvious, those are the ones I asked for, but in the next example it is less obvious:
query("UPDATE t SET b = 'Mr. ' || b WHERE b LIKE '%y' RETURNING a,b;") will return:
a, b
1, Mr. Johnny
4, Mr. Jimmy
which again, in the high-level parser, will be extremely helpful in giving easy feedback to a user (or programmer) about what was updated - maybe even to construct a next query based on what was changed, if any.
I propose that the reason you do not find many applications for this RETURNING model in current works is merely because it did not exist yet, and any possible use of it was handled another way - and perhaps you mainly use the low-level API and never have to work through a wrapper. Add it and applications will arise (especially in higher-level code).
query("DELETE FROM t WHERE a > 2 RETURNING b"); will return:
b
Jane
Jimmy
This obviates the need for a pre-query to see what will be affected, then a delete query to affect it, then a post-select query to see if all was indeed deleted as expected, and obviates the need for doing a transaction for all of this. One statement (a transaction unto itself) that executes the idea, and returns the result.
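The pre-query-then-delete pattern that RETURNING would replace can be sketched with Python's sqlite3 module (a hypothetical wrapper-free version, same table as above):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t(a INTEGER PRIMARY KEY, b TEXT)")
db.executemany("INSERT INTO t(b) VALUES (?)",
               [("Johnny",), ("Joan",), ("Jane",), ("Jimmy",)])

# Without RETURNING: a pre-query to see what will be affected, then the
# DELETE itself, wrapped in a transaction so nothing changes in between.
with db:  # the connection context manager commits (or rolls back) the pair
    doomed = [b for (b,) in db.execute("SELECT b FROM t WHERE a > 2")]
    db.execute("DELETE FROM t WHERE a > 2")

print(doomed)  # ['Jane', 'Jimmy']
```

A single DELETE ... RETURNING b would deliver the same list in one statement, with no explicit transaction needed.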
On the high-level side of the programming spectrum, perhaps working through a wrapper (which I know is not something you folks ever do), this kind of construct is an invaluable time-saver and code-quantity-saver. And less code means less opportunity for error and less debugging (as Richard always points out when advocating doing stuff in SQL rather than in code).
Thank you for listening!
My use case for DELETE RETURNING would be to have a list of all deleted IDs so that I can easily remove them from a virtual treeview component. Currently, I have to do a SELECT first to get the IDs, then delete them afterwards. So it's 2 steps instead of just 1.
Yeah, the RETURNING clause seems to be primarily associated with those who want to intermix "update" of the data with "display" of the data rather than maintain independence between the "update process" and the "display process".
It has been my experience that down this path is peril and monsters and many failed projects.
Storing and Updating the datastore is a quite independent process from displaying the contents of the datastore. Intermingling them for no good reason is unwise (and being "easier" is no good reason).
It has little to do with displaying the data Keith, and is all about
keeping the in-memory representation of the data in-sync with the
persistent store (SQLite) of that same data, when part of the data
is generated by the datastore (SQLite). That includes the good old
rowid, but now more content can be SQLite-side generated, given
UPSERT, Generated columns, triggers, etc...
last_insert_rowid() is basically global state, which is generally
a bad thing, and makes much more sense as a
RETURNING clause. In fact
clients not interested in that rowid still have to pay for it being
stored, while with
RETURNING only those explicitly asking for it would.
Of course BC mandates everyone will have to pay for it forever. Still,
I don't see why anyone would agree this to be a good thing. Had
RETURNING been available then, we'd never have had last_insert_rowid(),
which, now that there are WITHOUT ROWID tables, feels like a misfeature.
Adding a RETURNING clause does nothing with respect to maintaining consistency between the persistent store and the in-memory view of the database. It is entirely possible and likely that changes to the persistent store will occur independently of the one bitty program that wants to use the RETURNING clause.
Reliance upon RETURNING for this purpose pre-supposes that the database is SINGLE-USER. Unless one is taking precautions to ensure that the datastore is in fact SINGLE-USER and cannot be modified by anyone else, then using RETURNING to maintain consistency is ill-conceived.
Rather than wasting time on ill-conceived endeavours, I would suggest that it would be far more productive to take a more holistic and realistic view which is not dependent on a state of fact that does not exist.
Now you're reaching into unrelated areas...
The DB does not need to be SINGLE-USER at all, the DB is locked by the
statement itself, or by a transaction. If changes can happen out of band,
RETURNING or separate statements to fetch that same data after the fact
are no better or worse. Completely orthogonal.
You fail to understand. Just because you deleted something does not mean that it is the only change to the "tree view" control. Nor does it mean that the nanosecond after you deleted something, it was not added again by someone else.
The only way to obtain accurate results, unless the database is single-user by the application and all other applications are precluded from making changes to it, is to re-read the database and make the "tree view" control consistent with the "authoritative source".
Otherwise, the "tree view" is merely a whimsy and not reflective of any actual state of affairs that actually existed at any point in time.
I'd say the same about you Keith :)
An in-memory view of the data has nothing to do with a UI or a tree view...
I'm talking about a C/C++ data objects here.
And single vs multi-user access to the DB (multi-connection in general)
is again completely unrelated and orthogonal to the benefits of a RETURNING clause.
You're a bit too quick to think that everyone but you fails to understand the issues, IMHO, FWIW.
Sorry you feel that way, but the fact is that if you are assuming that nothing else is making concomitant changes to the datastore and design your code to rely on that fact, then you should be ensuring that this invariant condition is in fact true.
That is to say, in the previous post, that it would be perspicacious to address the manner of making the "in memory tree view" of the database consistent with the extant "persistent datastore" in an efficient manner, rather than to make assumptions that there are no other influences on the "persistent datastore" (unless, of course, one has taken steps to ensure that there can be no other mutators of that persistent datastore).
In other words, reliance on assumptions of fact being true, when they are actually false, will cause erroneous results.
I would point out that your argument goes directly against the "authoritative source" principle. The "authoritative source" of truth is the persistent datastore, not the copy of it that you happen to have in memory.
You must be prepared to deal with the fact that your copy may not be consistent with the "authoritative source" of truth, and the most efficient way to do that is to not have a copy which you have to continually ensure is consistent with the "authoritative source".
If you do choose to maintain a copy of the "authoritative source" for your use, you should be prepared to keep it consistent with the actual "authoritative source" and not with some proxy thereof.
(26) By doug (doug9forester) on 2021-01-30 05:27:30 in reply to 19
Good point. If you want to always use the "auth source", how can SQLite help with that? Suppose you are displaying a list of xyz in your application, and someone adds (or deletes) members of xyz. What is the application supposed to do? It can't continually query the database in hopes of finding a change. Nor can it "listen" for a change and continually update the display of members, which would drive a user crazy.
So the single user design point works always when there actually is a single user. How can you actually design a user interface that faces continual (possible) updates in a multi-user environment? And can SQLite help with that?
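On the narrower question of detecting out-of-band changes cheaply, SQLite does offer one hook: PRAGMA data_version returns a value that changes whenever another connection commits a change to the database file, so a viewer can poll it instead of re-querying the tables. A minimal sketch in Python's sqlite3 module (table and file names are hypothetical):

```python
import os
import sqlite3
import tempfile

# Two connections to the same database file: one "viewer" displaying a
# list, one "writer" standing in for some other user of the database.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE xyz(id INTEGER PRIMARY KEY, name TEXT)")
writer.commit()

viewer = sqlite3.connect(path)
before = viewer.execute("PRAGMA data_version").fetchone()[0]

# Another connection commits a change behind the viewer's back.
writer.execute("INSERT INTO xyz(name) VALUES ('new member')")
writer.commit()

# data_version differs on the viewer's connection: time to refresh.
after = viewer.execute("PRAGMA data_version").fetchone()[0]
changed = (after != before)
print(changed)  # True
```

This only tells the viewer *that* something changed, not *what* changed; it is a cheap trigger for re-reading the authoritative source rather than a substitute for it.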
(27) By Robert Ransom (rransom8774) on 2021-01-30 05:42:34 in reply to 26
Nor can it "listen" for a change and continually update the display of members, which would drive a user crazy.
File management programs do this nowadays (when they can), and it doesn't drive their users crazy; people expect this capability now. In other applications, it may be better to gray out deleted items rather than removing them from the list.
In an SQLite-based system, this feature could be retrofitted in by using a trigger to call a user-defined ‘doorbell’ function, and having all instances of the table-viewer application listen for this event. Ideally this function would also be able to communicate the rowids of changed records to listeners.
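That 'doorbell' idea can be sketched on a single connection with Python's sqlite3 module (names are hypothetical; as noted downthread, triggers fire only on the connection that runs the statement, so cross-process listeners would still need a separate signalling channel):

```python
import sqlite3

changed = []  # rowids reported by the doorbell

db = sqlite3.connect(":memory:")
# Register a user-defined SQL function that records the changed rowid.
db.create_function("doorbell", 1, lambda rowid: changed.append(rowid))

db.execute("CREATE TABLE member(id INTEGER PRIMARY KEY, name TEXT)")
# The trigger rings the doorbell once per deleted row.
db.execute("""
    CREATE TRIGGER member_deleted AFTER DELETE ON member
    BEGIN
        SELECT doorbell(old.id);
    END
""")

db.executemany("INSERT INTO member(name) VALUES (?)",
               [("Ann",), ("Bob",), ("Cid",)])
db.execute("DELETE FROM member WHERE id > 1")

print(sorted(changed))  # [2, 3]
```

The same shape works for INSERT and UPDATE triggers, giving the viewer code a per-row change feed without a second query.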
(29) By Richard Damon (RichardDamon) on 2021-01-30 20:28:56 in reply to 27
I believe that SQLite triggers only fire for the connection that executed the statement that did the deed, not other connections, which would require SQLite to create threads to make it happen.
For Windows, if I remember right, the file system can be asked to send a GUI event on a file change, so the GUI knows to update for the change (if needed). This would be beyond what would be expected of SQLite, but the program could set up a file system notification to the GUI on the database change.
No. The filesystem posts an event to an event semaphore.
This has nothing to do with the gooey application running on top of the Operating System.
The "Windows Gooey" is merely a WIMP interface running on top of the OS/2 New Technology (aka NT) Operating System (in previous incarnations it was a WIMP application running on top of DOS) much like X is a WIMP application that runs on top of Linux.
The operation of such events does not require a WIMP gooey and there is nothing which prevents the WIMP application (nor any application) from utilizing system services.
- Technically, X is not a GUI. X is a presentation system on which various other stuff runs that provides the "window manager" and "shell services"; much like the display layer in "Windows" is called the GDI layer, and on top of that runs a bunch of other stuff that provides the "window manager" and the "shell services".
Microsoft has gone even further and wrapped the GDI layer and "window manager" and "shell services" with yet more gigabytes of layers of cruft which go by a variety of names which change on a daily basis though the underlying Operating System and GDI have remained relatively unchanged for several decades.
(32) By Richard Damon (RichardDamon) on 2021-01-31 02:46:22 in reply to 31
I appear to have misremembered, or that was the effect of some wrapper that was being used. Typically you use the notification to cause the GUI update that reflects the file system change, so making it a message is common usage (and removes the multitude of issues that can pop up if you try to alter stuff in multiple threads).
That still makes it not really suitable to embed into SQLite itself as a way to notify an application of a data change.
You must not have ever worked with a non-bitty multiuser database application. The "view" presented to the user on the Cathode Ray Tube (or the Teletype machine) is not the "ongoing" state of affairs, it is "the truth" as it existed at the exact instant that it was claimed to have existed.
The fact that the bag-of-water reading the Teletype printout takes twenty minutes to read the output has zero relevance to the fact that, at the instant the teletype printed the data, it represented the exact truth at that instant. Not some program's "impression" or "wishful thinking" of what the state of truth was at that instant, but the actual hard-boiled truth.
If the bag-of-water wants a subsequent printout of the truth as it exists at some later time, then the bag-of-water can request that another printout of the truth be obtained which reflects the truth at the time of the request.
If the bag-of-water wishes to change the "state of truth" from state A to state B by modifying C, then the very first step is to ensure that state A is still the truth before applying C to arrive at state B.
This is concurrency management 101. Actually, I think it is concurrency management pre-school (that is, it is a foundational concept) and as such no education is possible without a grounding in the foundational concepts.
This is the reason so many people find concurrent programming hard and why many attempts at concurrent programming fail (and this includes things like multithreading). Great effort is spent addressing symptoms rather than the root cause.
(23.1) Originally by Holger J (holgerj) with edits by Richard Hipp (drh) on 2021-01-28 12:38:22 from 23.0 in reply to 1
If there is anybody who doubts that this is a useful feature, these are good readings:
It saves a lot of round trips to the database after INSERTs and UPDATEs and allows deleting rows from one table while immediately inserting them into another. Advantages:
- Data are not transferred between client and server (in the case of SQLite, between the library and the application).
- No explicit transaction is necessary, as the whole statement is an autocommit transaction.
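For illustration, here is the two-statement shape this would replace, sketched in Python's sqlite3 module with hypothetical table names: moving rows from one table to another today needs an INSERT ... SELECT plus a DELETE inside an explicit transaction.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE task(id INTEGER PRIMARY KEY, title TEXT, done INT)")
db.execute("CREATE TABLE archive(id INTEGER PRIMARY KEY, title TEXT)")
db.executemany("INSERT INTO task(title, done) VALUES (?, ?)",
               [("write", 1), ("test", 0), ("ship", 1)])

# Without RETURNING: copy, then delete, with a transaction keeping the
# pair atomic so rows are never lost or duplicated in between.
with db:  # the connection context manager commits both statements together
    db.execute("INSERT INTO archive(id, title) "
               "SELECT id, title FROM task WHERE done = 1")
    db.execute("DELETE FROM task WHERE done = 1")

moved = [t for (t,) in db.execute("SELECT title FROM archive ORDER BY id")]
print(moved)  # ['write', 'ship']
```

With DELETE ... RETURNING, the application could instead consume the deleted rows directly from the single statement and feed them wherever they need to go.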
(24) By Richard Damon (RichardDamon) on 2021-01-28 13:00:50 in reply to 23.1
Note that there is a BIG difference in the cost of a 'Round Trip' to the database in something like a classical server-based database and SQLite, which is a critical difference. With a server-based database, there are significant communication delays in the 'round trip', reaching out to it and getting a response. With SQLite, there is basically no trip to speak of, as SQLite sits in your process space.
The whole concept of having lots of code to keep a cached version of the database can become a pessimization when you take into account that it will eat up process memory that could otherwise have been used to grow the memory cache for SQLite and the OS, rather than building a duplicate cache for data that is likely sitting in those already.
(25) By doug (doug9forester) on 2021-01-29 18:49:35 in reply to 24
In the real world, for applications which use pulldown lists (sometimes nested), querying a database to gather the list (and marshal the values into the form required by the menu interface) is just too costly for performance. As soon as SQLite provides an interface that matches a SELECT query to an arbitrary in-memory structure, I'll be glad to give it a whirl. However, I think you'd agree that it's the application's responsibility to provide the data in a form that matches the interface, not the database's responsibility.
(30) By Richard Damon (RichardDamon) on 2021-01-30 20:45:01 in reply to 25
Have you actually checked the timing of this sort of thing with SQLite (as opposed to a full-service client-server database)? Remember that SQLite is running in-process, accessing the same sort of memory as the cache system you are writing to try to handle this. Yes, SQLite is likely a bit slower to get the data than a cache store optimized for a particular data model, but you also have to charge against the cache the operations it does to keep itself up to date, and every data 'miss' to that cache. Also, it is using up memory that could have been added to the SQLite or OS caches to help avoid going to the actual disk for data (since it is likely caching the same recently accessed data that those other caches already hold).
If you are submitting a change where you can't just know what the results are going to be (so you need the RETURNING clause), it seems that actually doing the SELECT to get fresh data isn't going to be that costly.