SQLite Forum

Extract files from sqlar with a directory prepended
Login

Extract files from sqlar with a directory prepended

(1) By 6kEs4Majrd on 2021-06-12 23:49:11 [link] [source]

https://www.sqlite.org/sqlar.html

It shows the following command on how to extract an sqlar file.

sqlite3 example.sqlar -Ax

I'd like to extract the files but prepend the path with some directory name.

For example, if there is a file called myfile.txt, I'd like it to be extracted to mydir/myfile.txt instead of to ./myfile.txt. For files starting with '/', they are considered as illegal, errors should be printed and nothing should be extracted.

Could anybody show me the sqlite3 code to achieve this? If it is not possible to achieve this in raw sqlit3 code, can anybody show me the code in python asqw?

(2.1) By Adrian Ho (lexfiend) on 2021-06-14 03:22:25 edited from 2.0 in reply to 1 [link] [source]

See https://sqlite.org/cli.html#sqlite_archive_support, particularly the -C option. You also probably want to use the dry-run (-n) option to see the SQL queries that do the actual extraction.

(4) By 6kEs4Majrd on 2021-06-13 13:01:19 in reply to 2.0 [link] [source]

Here is what I see. I don't quite understand where dirOnly and dirare defined.

https://sqlite.org/cli.html

writefile() only has two arguments according to the above page. Why does it have 4 arguments below ($dir || name), sqlar_uncompress(data, sz), mode, mtime?

Why does the same select command appear twice below?

What does name NOT GLOB '*..[/\]*' mean? So things like abc../ is ignored? This does not make sense to me.

$ sqlite3 -Axfn my.db
-- open database 'my.db'
SELECT  ($dir || name), writefile(($dir || name), sqlar_uncompress(data, sz), mode, mtime) FROM sqlar WHERE (1) AND (data IS NULL OR $dirOnly = 0) AND name NOT GLOB '*..[/\]*'
SELECT  ($dir || name), writefile(($dir || name), sqlar_uncompress(data, sz), mode, mtime) FROM sqlar WHERE (1) AND (data IS NULL OR $dirOnly = 0) AND name NOT GLOB '*..[/\]*'

(5) By Adrian Ho (lexfiend) on 2021-06-13 14:49:51 in reply to 4 [source]

writefile() only has two arguments according to the above page.

That's just an example. The documentation continues:

Note that the readfile(X) and writefile(X,Y) functions are extension functions and are not built into the core SQLite library. These routines are available as a loadable extension in the ext/misc/fileio.c source file in the SQLite source code repositories.

Clicking on the fileio.c link gives you the source code with the full documentation of the writefile() function. Figuring out what those last two arguments do is a simple exercise for the reader.

where dirOnly and dir are defined

In arExtractCommand(). dir would appear to be the directory specified as the argument to the -C option I mentioned before (default: ""), while dirOnly is set to 0 and 1 successively, to first select the files in the archive, then the directories.

Why does the same select command appear twice below?

From the source:

    /* Run the SELECT statement twice. The first time, writefile() is called
    ** for all archive members that should be extracted. The second time,
    ** only for the directories. This is because the timestamps for
    ** extracted directories must be reset after they are populated (as
    ** populating them changes the timestamp).  */

What does name NOT GLOB '*..[/\]*' mean?

I'm pretty sure it's a cheap way to weed out directory traversal attacks like ../../../../../../../etc/passwd.

(6) By 6kEs4Majrd on 2021-06-13 16:01:14 in reply to 5 [link] [source]

Does ($dir || name) right after 'SELECT' do anything?

So GLOB '*..[/\]*' does match abc../? In other words, this glob can miss some legal paths?

Why is (1) needed?

If I want to ignore abspaths, I should add AND NOT GLOB '/*' at the end?

(8) By Adrian Ho (lexfiend) on 2021-06-14 03:17:49 in reply to 6 [link] [source]

Does ($dir || name) right after SELECT do anything?

It gives you the actual paths extracted, to be printed with --verbose.

So GLOB '*..[/\]*' does match abc../? In other words, this glob can miss some legal paths?

It would seem so. Yes. A more correct filter might be:

name NOT GLOB '..[/\]*' AND name NOT GLOB '*[/\]..[/\]*'

Why is (1) needed?

It's part of the overall fixed-format query because the user can specify paths to extract. Basically, that part of the query means WHERE (<path_filter_condition>), and 1 (or any nonzero value) just says "match 'em all":

$ sqlite3 test.sqlar -Axn 
SELECT  ($dir || name), writefile(($dir || name), sqlar_uncompress(data, sz), mode, mtime) FROM sqlar WHERE (1) AND (data IS NULL OR $dirOnly = 0) AND name NOT GLOB '*..[/\]*'
SELECT  ($dir || name), writefile(($dir || name), sqlar_uncompress(data, sz), mode, mtime) FROM sqlar WHERE (1) AND (data IS NULL OR $dirOnly = 0) AND name NOT GLOB '*..[/\]*'

$ sqlite3 test.sqlar -Axn /etc/logrotate.conf
SELECT  ($dir || name), writefile(($dir || name), sqlar_uncompress(data, sz), mode, mtime) FROM sqlar WHERE ( name = '/etc/logrotate.conf' OR substr(name,1,20) = '/etc/logrotate.conf/') AND (data IS NULL OR $dirOnly = 0) AND name NOT GLOB '*..[/\]*'
SELECT  ($dir || name), writefile(($dir || name), sqlar_uncompress(data, sz), mode, mtime) FROM sqlar WHERE ( name = '/etc/logrotate.conf' OR substr(name,1,20) = '/etc/logrotate.conf/') AND (data IS NULL OR $dirOnly = 0) AND name NOT GLOB '*..[/\]*'

If I want to ignore abspaths, I should add AND NOT GLOB '/*' at the end?

That would be my guess, yes.

(9) By 6kEs4Majrd on 2021-06-14 16:04:52 in reply to 8 [link] [source]

It gives you the actual paths extracted, to be printed with --verbose.

What do you mean? I don't see how to print the first ($dir || name).

$ sqlite3 --verbose test.sqlar -Axn
sqlite3: Error: unknown option: -verbose
Use -help for a list of options.

How to run the SQL commands in an sqlite3 session without using -Ax. I tried the following. But no files are written.

$ sqlite3 test.sqlar
SQLite version 3.35.5 2021-04-19 18:32:05
Enter ".help" for usage hints.
sqlite> SELECT  ($dir || name), writefile(($dir || name), sqlar_uncompress(data, sz), mode, mtime) FROM sqlar WHERE (1) AND (data IS NULL OR $dirOnly = 0);
sqlite>

(10) By Adrian Ho (lexfiend) on 2021-06-14 16:44:47 in reply to 9 [link] [source]

What do you mean? I don't see how to print the first ($dir || name).

The use of --verbose is documented in the help text:

$ sqlite3 -A --help
.archive ...             Manage SQL archives
   Each command must have exactly one of the following options:
     -c, --create               Create a new archive
     -u, --update               Add or update files with changed mtime
     -i, --insert               Like -u but always add even if unchanged
     -t, --list                 List contents of archive
     -x, --extract              Extract files from archive
   Optional arguments:
     -v, --verbose              Print each filename as it is processed
     -f FILE, --file FILE       Use archive FILE (default is current db)
     -a FILE, --append FILE     Open FILE using the apndvfs VFS
     -C DIR, --directory DIR    Read/extract files from directory DIR
     -n, --dryrun               Show the SQL that would have occurred
   Examples:
     .ar -cf ARCHIVE foo bar  # Create ARCHIVE from files foo and bar
     .ar -tf ARCHIVE          # List members of ARCHIVE
     .ar -xvf ARCHIVE         # Verbosely extract files from ARCHIVE
   See also:
      http://sqlite.org/cli.html#sqlar_archive_support

So:

# Extract with long options
$ sqlite3 test.sqlar -A --extract --directory ~/tmp /etc/logrotate.conf
# nothing printed

$ ls -l ~/tmp/etc/logrotate.conf 
-rw-r--r-- 1 aho aho 533 Jan 21  2019 /home/aho/tmp/etc/logrotate.conf

# Let's clean up and try again in verbose mode, this time with short options
$ rm -f ~/tmp/etc/logrotate.conf

$ sqlite3 test.sqlar -AxvC ~/tmp /etc/logrotate.conf
/home/aho/tmp//etc/lo$ ls -l ~/tmp/etc/logrotate.conf 

$ ls -l ~/tmp/etc/logrotate.conf 
-rw-r--r-- 1 aho aho 533 Jan 21  2019 /home/aho/tmp/etc/logrotate.conf

How to run the SQL commands in an sqlite3 session without using -Ax. I tried the following. But no files are written.

SELECT ($dir || name), writefile(($dir || name), sqlar_uncompress(data, sz), mode, mtime) FROM sqlar WHERE (1) AND (data IS NULL OR $dirOnly = 0);

$dir and $dirOnly are substituted internally in the SQLite shell. I've already explained the semantics of both parameters in an earlier reply.

Here's the actual query issued by the above extract, as revealed by the arExtractCommand() source pointed to in my earlier reply, and confirmed via a --dryrun run:

$ rm -f ~/tmp/etc/logrotate.conf

$ sqlite test.sqlar

sqlite> SELECT
   ...> ('/home/aho/tmp/' || name),
   ...> writefile(('/home/aho/tmp/' || name), sqlar_uncompress(data, sz), mode, mtime)
   ...> FROM sqlar WHERE (name = '/etc/logrotate.conf' OR substr(name,1,20) = '/etc/logrotate.conf/') AND (data IS NULL OR 0 = 0)
   ...> AND name NOT GLOB '*..[/\]*';
/home/aho/tmp//etc/logrotate.conf|533

sqlite> .quit

$ ls -l ~/tmp/etc/logrotate.conf 
-rw-r--r-- 1 aho aho 533 Jan 21  2019 /home/aho/tmp/etc/logrotate.conf

$ sqlite3 test.sqlar -AxnC ~/tmp /etc/logrotate.conf
SELECT  ($dir || name), writefile(($dir || name), sqlar_uncompress(data, sz), mode, mtime) FROM sqlar WHERE ( name = '/etc/logrotate.conf' OR substr(name,1,20) = '/etc/logrotate.conf/') AND (data IS NULL OR $dirOnly = 0) AND name NOT GLOB '*..[/\]*'
SELECT  ($dir || name), writefile(($dir || name), sqlar_uncompress(data, sz), mode, mtime) FROM sqlar WHERE ( name = '/etc/logrotate.conf' OR substr(name,1,20) = '/etc/logrotate.conf/') AND (data IS NULL OR $dirOnly = 0) AND name NOT GLOB '*..[/\]*'

(11) By 6kEs4Majrd on 2021-06-14 20:33:31 in reply to 10 [link] [source]

sqlite> SELECT
   ...> ('/home/aho/tmp/' || name),
   ...> writefile(('/home/aho/tmp/' || name), sqlar_uncompress(data, sz), mode, mtime)

I don't want to use the path literally like '/home/aho/tmp/' in a sqlite session. Is there a way to specify the parameter $dir via the command line as in sqlite3 test.sqlar -A --extract --directory ~/tmp /etc/logrotate.conf? So that I can call the command returned by -n without replacing the variables like $dir to their actually value?

(12) By Adrian Ho (lexfiend) on 2021-06-15 02:40:40 in reply to 11 [link] [source]

See https://sqlite.org/cli.html#sql_parameters. You'll have to generate more code to create and populate the necessary temp table just to support parameters, so I don't think it's worth the trouble.

Is there a reason you can't simply generate the fully-expanded sqlar query directly?

(13) By 6kEs4Majrd on 2021-06-15 21:55:28 in reply to 12 [link] [source]

If my understanding of https://sqlite.org/cli.html#sql_parameters is correct, I think it is to specify the parameters in a sqlite3 script or session. But I need the parameters in the command line just like the -C option of sqlite3 -A.

So there is not a way to do so?

Is there a reason you can't simply generate the fully-expanded sqlar query directly?

I need such parameters to be easy to be changed in the command line. Changing command line parameters is easier than changing a sqlite3 script which I have to deal with special parameters that need to be escaped.

(14) By Adrian Ho (lexfiend) on 2021-06-16 03:03:02 in reply to 13 [link] [source]

If my understanding of https://sqlite.org/cli.html#sql_parameters is correct, I think it is to specify the parameters in a sqlite3 script or session. But I need the parameters in the command line just like the -C option of sqlite3 -A.

If you're saying "I need something like sqlite3 -Axf my.sqlar --param \$dir=/my/dir --param \$dirOnly=42 to work", then no, that's not currently a supported use case, and the SQLite shell's archive logic will substitute its own values anyway.

For the general (hypothetical) case of:

sqlite3 -param '$a' Bob -param '$b' 42 my.db 'SELECT * FROM t WHERE name = $a AND age = $b'

that's for Dr. Hipp and the other maintainers to decide if they want to support this. Until then, you'll have to write your own script that feeds sqlite3 with additional parameter table instructions, or fully-expanded queries.

(15) By anonymous on 2021-06-16 06:50:39 in reply to 14 [link] [source]

This "general case" is already possible using multiple -cmd options and the .parameter sqlite-cli-command:

sqlite3 -cmd ".param set :a 'Bob'" -cmd ".param set :b 42" my.db 'SELECT * FROM t WHERE name = :a AND age = :b'

(using : instead of $ just because I'm to lazy to handle the shell quoting of $)

(16) By Adrian Ho (lexfiend) on 2021-06-16 16:10:09 in reply to 15 [link] [source]

While technically true, note that using -cmd forces the same sort of string substitution that make security-conscious folks very nervous indeed, and can fail even if security were not a concern.

For instance, strings with embedded single quotes break .param if not passed in a somewhat non-obvious way:

$ sqlite3 -cmd ".param set :name 'Bobby's Tavern'" :memory: 'select * from sqlite_parameters'
.parameter CMD ...       Manage SQL parameter bindings
   clear                   Erase all bindings
   init                    Initialize the TEMP table that holds bindings
   list                    List the current parameter bindings
   set PARAMETER VALUE     Given SQL parameter PARAMETER a value of VALUE
                           PARAMETER should start with one of: $ : @ ?
   unset PARAMETER         Remove PARAMETER from the binding table
Error: no such table: sqlite_parameters

# How about the standard SQL practice of doubling internal quotes?
$ sqlite3 -cmd ".param set :name 'Bobby''s Tavern'" :memory: 'select * from sqlite_parameters'
.parameter CMD ...       Manage SQL parameter bindings
   clear                   Erase all bindings
   init                    Initialize the TEMP table that holds bindings
   list                    List the current parameter bindings
   set PARAMETER VALUE     Given SQL parameter PARAMETER a value of VALUE
                           PARAMETER should start with one of: $ : @ ?
   unset PARAMETER         Remove PARAMETER from the binding table
Error: no such table: sqlite_parameters

# OK, will double quotes work?
$ sqlite3 -cmd ".param set :name \"Bobby's Tavern\"" :memory: 'select * from sqlite_parameters'
:name|Bobby's Tavern

And if the string you want to substitute contains both single and double quotes? Have fun.

With my hypothetical -param, strings with pretty much any content can be passed safely to your SQLite queries:

sqlite3 -param :name "Bobby's Tavern" ...
sqlite3 -param '$quote' "\"I'm full,\" said Jane." ...

(17) By Keith Medcalf (kmedcalf) on 2021-06-17 03:25:48 in reply to 16 [link] [source]

There is nothing magical. The shell is processing the command (the entire command given to the shell) in accordance with the rules by which every other shell command is processed by that shell. The fact that the target executable word happens to be "sqlite3' is quite irrelevant to the process.

The quoting and escaping rules are exactly and precisely the same as if the target executable word was 'awk' or 'grep' or 'ls'.

(18.1) By Adrian Ho (lexfiend) on 2021-06-17 04:42:27 edited from 18.0 in reply to 17 [link] [source]

Yes, there's nothing magical about it...once you've figured out the quoting rules of both the shell and SQLite.

Take the last string I mentioned, for instance:

$ sqlite3
SQLite version 3.35.5 2021-04-19 18:32:05
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .mode column
sqlite> .param set :quote "\"I'm full\", said Bob."
sqlite> select * from sqlite_parameters;
key     value                
------  ---------------------
:quote  "I'm full", said Bob.

Right off the bat, it's an exception to the "single-quote for strings (double up single quotes internally), double-quote for identifiers" rule of thumb that's been SQL lore since forever.

Setting that aside, it looks like this on the command line:

$ sqlite3 -cmd ".param set :quote \"\\\"I'm full\\\", said Bob.\"" :memory 'select * from sqlite_parameters'

which isn't entirely unmanageable, but then hard-coded parameters aren't all that useful. It's much more likely that the value comes from user input, or from a file:

$ read quote
# user types: "I'm full", said Bob.

# This absolutely will not work, but only experienced shell folks will know why...and how to fix it.
$ sqlite3 -cmd ".param set :quote \"${quote}\"" :memory 'select * from sqlite_parameters'

sqlite3_bind(), and similar facilities in almost every SQL language binding, largely remove the need to deal with SQL quoting issues. The fact that sqlite3 has parameter substitution facilities, but forces both string substitution and SQL parsing to populate them, seems like an "impedance mismatch" to me:

$ read quote
# user types: "I'm full", said Bob.

$ sqlite3 -param :quote "$quote" :memory: 'select * from sqlite_parameters'
# No muss, no fuss, no errors.

Two questions:

  1. Am I missing something here?
  2. If not, any objections to a patch to enable the above facility?

(19) By anonymous on 2021-06-17 11:48:56 in reply to 18.1 [link] [source]

I do like the way jq does it with its --arg (and/or --argsjon) options:

$ read quote
# user types: "I'm full", said Bob.

# see what jq make from that --> properly quoted json
$ jq -n --arg q "${quote}" '$q'
"\"I'm full\", said Bob."

# better, $ARGS.named is a json object of all of them
$ jq -n --arg :quote "${quote}" --arg a1 77 --argjson a2 55 '$ARGS.named'
{
  ":quote": "\"I'm full\", said Bob.",
  "a1": "77",
  "a2": 55
}

So if you don't mind using another tool (jq), json_each + readfile from sqlite proper and a temporary file:

$ jq -n --arg :quote "${quote}" --arg a1 77 --argjson a2 55 '$ARGS.named' > args.json
$ sqlite3 -cmd '.para init' -cmd "insert into temp.sqlite_parameters(key,value) select key, value from json_each(readfile('args.json'))" :memory '.para list'
:quote '"I''m full", said Bob.'
a1     '77'
a2     55

(3) By anonymous on 2021-06-13 07:37:36 in reply to 1 [link] [source]

One way is:

mkdir mydir
cd mydir
sqlite3 ../example.sqlar -Ax

To check for file names starting with /, you could use a SQL query to see if there are any such files first before extracting anything, e.g. select count() from sqlar where name like '/%';

(7) By anonymous on 2021-06-13 18:49:09 in reply to 1 [link] [source]

I think that what might a useful option to add to the archive system would be the option to specify a different table name, if you do not want to use the default table name "sqlar" (e.g. you might have multiple tables, or views which reference it, possibly including triggers). (If you do not specify such an option, then sqlar is used.)