SQLite User Forum

[feature req] VACUUM INTO trigger

(1) By ingo on 2022-10-25 17:32:28

The other thread on "vacuum into" https://sqlite.org/forum/forumpost/a00368874b made me aware of its existence, and the first thing I thought was that it would be nice if it could be used in a trigger.

(2) By Simon Slavin (slavin) on 2022-10-25 22:06:46 in reply to 1

What's your use case? How would it be useful to perform a VACUUM INTO inside a trigger?

(3) By ingo on 2022-10-26 06:24:34 in reply to 2

When doing trigger-based database maintenance, I can see myself using 'vacuum into' in a before-delete trigger (and 'vacuum' in an after-delete trigger). Timestamp-based backups are another use. Backing up attached db's? I can also see it being used with the raise function for mission-critical applications: back up before continuing with anything else.

I have cases where I like to have 'everything' done by the db engine: an almost completely self-contained unit that one can connect to and switch on the data pump.

(4) By Simon Slavin (slavin) on 2022-10-26 12:54:31 in reply to 3

Hmm.

VACUUM is an extremely intensive operation. It does a lot of storage access and takes a lot of time. And the amount of time it takes is unpredictable. Once you have a useful amount of data in your database it will bust every one of your caches, which means that not only will VACUUM take a long time, but the next operation you do will take a long time because none of the things it uses are still cached.

You have the possibility of repeated TRIGGERs. For instance, consider a database of authors and books. If you delete an author, it automatically deletes all the books that author wrote. If you put a TRIGGER on deleting books it could be called many times in a row just by one DELETE command. You could easily tie up your program for minutes just from one command.
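This amplification is easy to reproduce. A minimal sketch in Python's sqlite3 module (the table names and the counting UDF `expensive_maintenance` are made up for illustration): an ON DELETE CASCADE foreign key deletes an author's books, and an AFTER DELETE trigger on book fires once per cascaded row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys=ON")  # FK actions are off by default

calls = []  # records each trigger invocation
con.create_function("expensive_maintenance", 0, lambda: calls.append(1) or 0)

con.executescript("""
  CREATE TABLE author(id INTEGER PRIMARY KEY);
  CREATE TABLE book(id INTEGER PRIMARY KEY,
                    author_id REFERENCES author(id) ON DELETE CASCADE);
  CREATE TRIGGER book_del AFTER DELETE ON book
  BEGIN SELECT expensive_maintenance(); END;
  INSERT INTO author VALUES (1);
  INSERT INTO book(author_id) VALUES (1),(1),(1);
""")

con.execute("DELETE FROM author WHERE id=1")  # one statement...
print(len(calls))  # ...fires the book trigger once per cascaded delete
```

If the trigger body did something as heavy as a VACUUM, that single DELETE would pay the cost three times over.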

You have the possibility of backing up the wrong thing. Suppose you begin a transaction, delete a row, then roll it back instead of committing. Could you figure out what happened from the results of the VACUUM commands ?

If you could VACUUM INTO inside a TRIGGER, which your question implies you can't, I still wouldn't use it. I would definitely not use it for anything mission-critical, since VACUUM INTO is more likely to fail (external drive disconnected, external drive failure, network failure, disk full, folder deleted) than a normal SQLite command.

(5) By ingo on 2022-10-26 14:25:20 in reply to 4

Using your book example: one reads a lot of books and marks them read in the database. On the marking there's a trigger that checks when the last pruning was. If it was more than a month ago and hasn't been done this month, start VACUUM INTO. After that, delete all read books. Once done, a plain VACUUM. OK, a contrived example, but not unlike what can be done with time series: average 1s data into 10m data, remove the 1s data, and back up before that.

VACUUM INTO is also a nice way to write an in-memory database to disk (using a trigger ;) ).
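Outside of a trigger, that in-memory-to-disk idea already works today, assuming SQLite 3.27 or later (the release that added VACUUM INTO). A sketch via Python's sqlite3 module; the file name is arbitrary:

```python
import os
import sqlite3
import tempfile

con = sqlite3.connect(":memory:")        # in-memory database
con.execute("CREATE TABLE t(x)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])
con.commit()                             # VACUUM refuses to run inside a transaction

# VACUUM INTO fails if the target already exists, so use a fresh directory.
target = os.path.join(tempfile.mkdtemp(), "snapshot.db")
con.execute("VACUUM INTO '%s'" % target)

# The on-disk copy is a complete, independent database.
disk = sqlite3.connect(target)
print(disk.execute("SELECT count(*) FROM t").fetchone()[0])  # prints 5
```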

Anyway, it's just an idea that popped up and I agree, with great power come ....

(6) By anonymous on 2022-10-28 10:48:00 in reply to 1

This would need a major rewrite, because VACUUM is not allowed inside any kind of transaction.
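That restriction is easy to demonstrate from Python's sqlite3 module, which opens an implicit transaction before DML. Since a trigger body always runs inside the transaction of the statement that fired it, the same error would hit any VACUUM attempted from a trigger.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(x)")
con.execute("INSERT INTO t VALUES (1)")   # implicitly begins a transaction

try:
    con.execute("VACUUM")                 # still inside that transaction
    outcome = "vacuum succeeded"
except sqlite3.OperationalError as e:
    outcome = str(e)                      # "cannot VACUUM from within a transaction"
print(outcome)
```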

(7.4) By MBL (UserMBL) on 2022-10-31 09:22:44 edited from 7.3 in reply to 1

I'm not sure whether my solution, which uses a user-defined extension function, will work for you, but I did some first tests and it already works for my needs.

  • get the source db handle from the context and use it to get the source filename of schema 'main'
  • create a new connection to the source db and execute the "VACUUM INTO 'target db'" statement
  • close the top-level connection
  • free the resources from the malloc
  • all of this is done while the UDF-calling transaction is still in progress

I grant all rights to this code to the public domain. Do good and not evil.



#ifdef WITH_VACUUMINTOFUNC

// will be called with the file path of the new database copy as 1st parameter
// the 2nd parameter is either a schema name or the file path of a source database
// when called with only one parameter the schema defaults to 'main'
// extracts the name of the database file, or uses the 2nd parameter as the source database
// the 2nd parameter is a schema name if it does not contain the '.' dot symbol
// creates a top-level connection for the source file path
// issues the "VACUUM INTO 'filepath'" command
// returns any error from that call, or SQLITE_OK if the command succeeded

static void vacuumIntoFunc( sqlite3_context *context, int argc, sqlite3_value **argv )
{
  char *outputText;
  const char *dbfilename;
  sqlite3 *sourceDB = NULL;
  int rc1, rc2, rc3;
  char *errMsg;
  sqlite3 *db = sqlite3_context_db_handle(context);
  const char *pBakFile = (const char*)sqlite3_value_text(argv[0]);
  const char *schema = "main";   // default in case the 2nd parameter is not given
  if( argc>=2 )
    schema = (const char*)sqlite3_value_text(argv[1]);
  if( pBakFile )
  {
    if( argc>=2 && strstr(schema,".") )  // if the schema contains a dot it is the file path of the source, not a schema name
      dbfilename = schema;
    else
      dbfilename = sqlite3_db_filename(db, schema);

    if( dbfilename && strlen(dbfilename)>=1 )
      rc1 = sqlite3_open( dbfilename, &sourceDB );  // need a new connection to start VACUUM INTO from top level
    else
      rc1 = SQLITE_ERROR;  // an in-memory database would return 'cannot VACUUM - SQL statements in progress'

    if( dLevel>=3 )  // when dLevel>=3 then show information
    {
      OutputDebugString( schema );
      OutputDebugString( dbfilename ? dbfilename : "(no filename)" );
      OutputDebugString( pBakFile );
    }

    if( rc1==SQLITE_OK )
    {
      // %q doubles any embedded single quotes so the path survives SQL quoting
      outputText = sqlite3_mprintf( "VACUUM INTO '%q'; -- from '%s'.'%s'", pBakFile, schema, dbfilename );
      if( dLevel>=3 )  // when dLevel>=3 then show sql statements
        OutputDebugString( outputText );

      rc2 = sqlite3_exec( sourceDB, outputText, NULL, NULL, &errMsg );
      sqlite3_free( outputText );

      rc3 = sqlite3_close( sourceDB );
      if( rc2==SQLITE_OK )
      {
        if( rc3==SQLITE_OK )
          sqlite3_result_text( context, "SQLITE_OK", -1, SQLITE_TRANSIENT );
        else
          sqlite3_result_text( context, "close failed", -1, SQLITE_TRANSIENT );
      }
      else
      {
        if( dLevel>=3 )  // when dLevel>=3 then show error messages
          OutputDebugString( errMsg );
        sqlite3_result_text( context, errMsg, -1, sqlite3_free );
      }
    }
    else
    {
      sqlite3_close( sourceDB );  // even a failed sqlite3_open() allocates a handle that must be closed
      errMsg = sqlite3_mprintf( "cannot open database '%s' with schema '%s'",
                                dbfilename ? dbfilename : "", schema );
      if( dLevel>=3 )  // when dLevel>=3 then show error messages
        OutputDebugString( errMsg );
      sqlite3_result_text( context, errMsg, -1, sqlite3_free );
    }
  }
  else
  {
    if( dLevel>=3 )  // when dLevel>=3 then show error messages
      OutputDebugString( "no useful target filepath given" );
    sqlite3_result_text( context, "no useful target filepath given", -1, SQLITE_TRANSIENT );
  }
}

#endif // WITH_VACUUMINTOFUNC

...


#ifdef WITH_VACUUMINTOFUNC
  sqlite3_create_function_v2(db, "VacuumInto", 1, SQLITE_UTF8, db, vacuumIntoFunc, 0, 0, NULL);  // called with filepath of the target db, default schema 'main'
  sqlite3_create_function_v2(db, "VacuumInto", 2, SQLITE_UTF8, db, vacuumIntoFunc, 0, 0, NULL);  // called with filepath and given schema name or source filepath
#endif // WITH_VACUUMINTOFUNC

This is how it is used; it should also work from inside a trigger:


D:\SQLite3x\sandbox>sqlite3.exe LoadExTest.SQB
SQLite version 3.39.4 2022-09-29 15:55:41
Enter ".help" for usage hints.
sqlite> .load ../Win32/Debug/sqlite3.dll
sqlite> .mode box
sqlite> attach './slave.db' as slave;
sqlite> select VacuumInto('./Target1.db')              as mainTarget1  -- main as default
   ...>      , VacuumInto('./Target2.db','main')       as mainTarget2  -- main from 2nd param
   ...>      , VacuumInto('./Target3.db','slave')      as slaveTarget1 -- slave from 2nd param
   ...>      , VacuumInto('./Target4.db','./slave.db') as slaveTarget2 -- slave by given file path
   ...>      ;
┌─────────────┬─────────────┬──────────────┬──────────────┐
│ mainTarget1 │ mainTarget2 │ slaveTarget1 │ slaveTarget2 │
├─────────────┼─────────────┼──────────────┼──────────────┤
│ SQLITE_OK   │ SQLITE_OK   │ SQLITE_OK    │ SQLITE_OK    │
└─────────────┴─────────────┴──────────────┴──────────────┘
sqlite> select VacuumInto('./Target2.db','main') as main;
┌────────────────────────────┐
│            main            │
├────────────────────────────┤
│ output file already exists │
└────────────────────────────┘
sqlite>

EDIT1: updated for improved error handling (to avoid memory leaks), 2nd parameter usage as schema name or source file path, test results

EDIT2: close sourceDB connection independent from the success of the VACUUM INTO execution, which fails if the backup database already exists

EDIT3: moved sqlite3_free up some lines, correction in handling of possible error by sqlite3_close (which returned NULL instead of an error text)

EDIT4: :memory: database returns an empty string as dbfilename, which would have written an empty database as copy; changed into an error text return

(8) By ingo on 2022-10-29 07:16:37 in reply to 7.1

Thank you!

(9) By anonymous on 2022-10-29 08:50:47 in reply to 7.2

I believe you are leaking outputText in some cases. Move the free right after the exec.

(10) By MBL (UserMBL) on 2022-10-30 07:01:27 in reply to 9

I did not see where you suspected any memory leak, but I shifted the free command up to just after the last usage of outputText. While doing that I realized that I checked rc2 twice instead of checking rc2 once and rc3 in addition. This has now also been corrected. Thanks for the feedback.

I can see one weakness in my function, as I have only tested with a database file: I did not yet check what the following command returns for an in-memory database, or whether that returned dbfilename would also work with an in-memory database.

dbfilename = sqlite3_db_filename(db, schema );
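The answer can be checked without C: for a :memory: database, the filename SQLite reports for a schema is the empty string, which is also what sqlite3_db_filename() returns. A quick check via Python's sqlite3 module and PRAGMA database_list:

```python
import sqlite3

mem = sqlite3.connect(":memory:")
# PRAGMA database_list reports (seq, name, file) for each attached schema;
# the file column is '' for an in-memory database.
seq, name, filename = mem.execute("PRAGMA database_list").fetchone()
print(repr(filename))  # prints ''
```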

(11) By ingo on 2022-10-30 07:23:56 in reply to 10

From an in-memory db it creates a 4 KB file with no table definitions or data.

But, as I know very little about C programming and compiling, I may have done something wrong. I stuck your code into the run-time loadable extension template, then commented out every if( dLevel>=3 ), as it errored on that: it does not know dLevel.

It seems to work fine for file-based db's.

Thank you.

(13) By Keith Medcalf (kmedcalf) on 2022-10-30 16:27:11 in reply to 11

:memory: databases are private to the (single) connection which "attached" (created) the :memory: database instance. Mutatis mutandis, the same applies to a database "created or attached" using the name '' (an empty string).

They cannot be accessed by any other "private" connection, whether in the same process or a different process, other than the one on which the database was created/attached.

You may, however, use the memdb VFS to create a database which exists in memory (like a :memory: database) but is a "named memory object" (in that process) that can be accessed by multiple private connections from the same process only. (memdb does not create an interprocess named shareable memory region.)

Such a database has a name of the form: /filename and uses the VFS memdb. (That is, file:///filename?vfs=memdb in URL form.)
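In Python this looks like the following, assuming the underlying SQLite is 3.36 or later (where named memdb databases became shareable within a process); the name '/shared_db' is arbitrary:

```python
import sqlite3

# Two independent connections to the same named in-memory database.
c1 = sqlite3.connect("file:/shared_db?vfs=memdb", uri=True)
c1.execute("CREATE TABLE t(x)")
c1.execute("INSERT INTO t VALUES (42)")
c1.commit()

# A second private connection in the same process sees the same data.
c2 = sqlite3.connect("file:/shared_db?vfs=memdb", uri=True)
print(c2.execute("SELECT x FROM t").fetchone()[0])  # prints 42
```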

(14) By MBL (UserMBL) on 2022-10-31 09:26:07 in reply to 13

Thanks for this reply. In my EDIT4 above I now consider the length of the dbfilename and do not try to sqlite3_open this non-filename, which would create an empty new in-memory database and not the one in question; such a call is now converted into an error-text return.

(15) By Ryan Smith (cuz) on 2022-10-31 09:38:45 in reply to 14

Just an aside, but the sqlite3_open_v2() variant takes a flags parameter (e.g. SQLITE_OPEN_READWRITE without SQLITE_OPEN_CREATE) that causes the operation to fail if the target file/path does not exist (or is inaccessible). Would it not be prudent to simply set that flag and let it error out if the name does not work?

(I realize I may be missing some previously mentioned fact that negates this method).

(12) By MBL (UserMBL) on 2022-10-30 07:41:01 in reply to 9

dLevel was used in another recent post of mine about transaction identification.

I use the OutputDebugString function (Windows) during the development phase. I meanwhile also added a UDF Tid() with a 64-bit integer for the purpose of a TransactionID within triggers. My example was just using the dLevel from the DebugLevel() UDF because I had it ready made. Debugging only needs a few levels, but TransactionID's should last forever (at least as long as I am still alive).

I would really appreciate an enhancement of the dbfilename = sqlite3_db_filename(db, schema); command to get a handle, maybe some kind of URI, so the same UDF could also be used for in-memory databases.

Thanks for your feedback.