SQLite Forum

encoding on new DB-File
Login

encoding on new DB-File

(1) By anonymous on 2021-03-03 12:59:41 [link] [source]

Dear SQLite-Dev's,

I have to create a new databse in an automated manner (inside a build process) on a windows-machine. I currently do it like this via calling the sqlite-shell with an file-argument and a read-command to run a sql-skript:

sqlite3 DB_FILE ".read \"CREATION_SCRIPT.SQL\""

Inside CREATION_SCRIPT.SQL:
- I first set via "PRAGMA encoding = ..." the desired encoding for the DB
- actual tables and so on are created afterwards by this script

This will create the database within DB_FILE and the desired encoding just fine. It does so inspite of defining/creating the pure file on disk *even before* the 'PRAGMA encoding' is set. This "method" seems to be widely-used and is recommended in some forum-threads around the internet.

My actual question is:
Is this behavior of the sqlite-shell guaranteed to work also in future versions/builds of SQLite? Or may this be subject to be broken?

In the doc, I did not find an unambigous statement/specification on this (at least in my understanding of i.e. https://sqlite.org/pragma.html#pragma_encoding). But maybe i also miss the forest by the trees ...
I would appreciate some link to the doc, where this behaviour may be specified in more clarity or any kind of additional statement on this subject.

(I could possibly circumvent the subject by changing the way of db-creation, but would prefer not to be forced to do so)

Thanks a lot in advance,

- Jey

(2) By David Jones (vman59) on 2021-03-03 14:26:45 in reply to 1 [link] [source]

The same conflict exists at the API level. To execute a pragma statement you need a database connection, which you get by opening the database. Logically, the encoding should be flags on the open call.

(3.3) By Keith Medcalf (kmedcalf) on 2021-03-03 15:48:59 edited from 3.2 in reply to 1 [link] [source]

IMHO, the statement "It is not possible to change the text encoding of a database after it has been created and any attempt to do so will be silently ignored." is incorrect.

The statement should say "It is not possible to change the text encoding of a database after content or free pages exist in that database, and any attempt to do so will be silently ignored." "As long as the database is empty (select * from sqlite_master returns no rows) and there are no pages allocated to the database freelist, the encoding may be changed for the database."

That is, if you open any old database just a chock full of stuff, drop all the tables and indexes and whatnot so that select * from sqlite_master returns nothing, then perform a vacuum to get rid of all the pages save the file header and empty sqlite_master table, you can set the encoding as your little heart desires, and it will have effect.

Once you have created something in the database you can no longer change the encoding (until you empty it out again).

It is the fact of the existence of content in the database which determines the encoding. If there is no content, there is no encoding.

(4) By Larry Brasfield (larrybr) on 2021-03-03 15:45:36 in reply to 3.2 [link] [source]

Just curious about this: Why should the possible content of free pages contained in the freelist make any difference for an encoding change?

(5) By Keith Medcalf (kmedcalf) on 2021-03-03 15:52:10 in reply to 4 [link] [source]

I suspect that the database is not "empty" if it has pages allocated to it even though the sqlite_master is empty. This is perhaps an oversight, but perhaps there is a very good reason for it. I don't know, merely that you have to empty the database including the free pages before you can change the encoding.

(6) By David Raymond (dvdraymond) on 2021-03-03 15:58:07 in reply to 3.3 [link] [source]

That doesn't work. At least not on my machine. Even if you remove everything and vacuum it then you'll still always have the first page with the header and the empty sqlite_master on it.

Running "pragma encoding = <a different option>;" doesn't change it. If you just run a straight "pragma encoding;" afterwards then it will report the new value yeah, but if you close and re-open it, then it's still the original. Setting it and immediately running a vacuum won't change it either like you would do for a page size change.

So any time after that first page is written you're stuck with what it was originally.

(7.1) By Keith Medcalf (kmedcalf) on 2021-03-03 16:03:49 edited from 7.0 in reply to 6 [link] [source]

Running "pragma encoding = <a different option>;" doesn't change it. If you just run a straight "pragma encoding;" afterwards then it will report the new value yeah, but if you close and re-open it, then it's still the original. Setting it and immediately running a vacuum won't change it either like you would do for a page size change.

That is correct. You have to create something in the database in order for the encoding to persist. As long as the database is "empty" whenever you open it you will get the default encoding and can set it as you please for future content.

(8.1) By Larry Brasfield (larrybr) on 2021-03-03 16:13:43 edited from 8.0 in reply to 6 [link] [source]

So any time after that first page is written you're stuck with what it was originally.

v3.35.0 is documented just so.

Running "pragma encoding = <a different option>;" doesn't change it. If you just run a straight "pragma encoding;" afterwards then it will report the new value yeah, but if you close and re-open it, then it's still the original. Setting it and immediately running a vacuum won't change it either like you would do for a page size change.

v3.35.0 does not even report something different. (However, it does accept the encoding change attempt silently, which is also documented but may be considered suboptimal.)

What version acts as you say, reporting an ineffective encoding?

(9) By David Raymond (dvdraymond) on 2021-03-03 17:05:20 in reply to 8.1 [source]

I've using 3.34.0, and am using the CLI.

After way too long of experimenting I think I have it.

You have to vacuum it down to just the first page.

Then you have to close out of the CLI, and open the file again.

Then you have to do something that reads from the database before you set the new encoding.

Then you can create a new table and have it stick.

If you just use the CLI to open the 1 page databse, and immediately run "pragma encoding = 'UTF-16';" for example, then it won't change it. You have to do something like .tables first to get it to read the database, then set the encoding, then make a new table.

C:\Data\Temp\sqlite>sqlite3 DeleteMe.sqlite
SQLite version 3.34.0 2020-12-01 16:14:00
Enter ".help" for usage hints.

sqlite> drop table foo;

sqlite> vacuum;

sqlite> pragma encoding;
encoding
UTF-8

sqlite> .exit

C:\Data\Temp\sqlite>sqlite3 DeleteMe.sqlite
SQLite version 3.34.0 2020-12-01 16:14:00
Enter ".help" for usage hints.

sqlite> pragma encoding = 'UTF-16';

sqlite> create table foo (a);

sqlite> pragma encoding;
encoding
UTF-8

sqlite> drop table foo;

sqlite> vacuum;

sqlite> .exit

C:\Data\Temp\sqlite>sqlite3 DeleteMe.sqlite
SQLite version 3.34.0 2020-12-01 16:14:00
Enter ".help" for usage hints.

sqlite> .tables

sqlite> pragma encoding = 'UTF-16';

sqlite> create table foo (a);

sqlite> pragma encoding;
encoding
UTF-16le

sqlite>

(13) By anonymous on 2021-03-04 08:59:45 in reply to 3.3 [link] [source]

And if I am not mistaken you can't attach the OLD UTF-8 database to a NEW UTF-16 database. There is only one trick: Export the data to a CSV-format file and import the files into the new UTF-16 file.

And you need to compile your own SQLite UTF-16 version. Here is the script for Linux:
#! /bin/bash
<<COMMENT

        SQLit3 build schript
        
        first version     : June 2015 
        last modification : June 2020 R3.1 

        Modification R3.1:
        a) Added DEBUG options (still need some work to fix this)
        b) Improved the script

COMMENT

usage="Usage : '$0 ' [-ahbd]"

if [[ $1 =~ (-[abdh]) ]]; then
        echo "--------------"
else
        echo "Not a valid option is given!"
        echo $Usage
        exit 1
fi

while getopts "abdh" opt; do
  case ${opt} in
    h )
        echo $usage
        echo "Options:"
        echo "    -b            Compile without debug options"
        echo "    -d            Compile with debug options"
        echo "    -h            This menu for help"
        exit 0
        ;;
    a )
        echo "Building a debug version of SQLite3 with all options"
        ./configure --enable-debug --enable-all
        make
        exit 0
       ;;
    b )
        echo "Building SQLite3 without debug options"
        export CFLAGS="-fPIC \
                -DSQLITE_THREADSAFE=1 \
                -DSQLITE_ENABLE_FTS3  \
                -DSQLITE_ENABLE_FTS3_PARENTHESIS  \
                -DSQLITE_ENABLE_FTS4 \
                -DSQLITE_ENABLE_FTS5 \
                -DSQLITE_ENABLE_JSON1 \
                -DSQLITE_ENABLE_UTF16 \
                -DSQLITE_ENABLE_LOAD_EXTENSION \
                -DSQLITE_ENABLE_RTREE \
                -DSQLITE_ENABLE_STAT4 \
                -DSQLITE_ENABLE_UPDATE_DELETE_LIMIT \
                -DSQLITE_TEMP_STORE3 \
                -DSQLITE_USE_URI \
                -O2 "
        ;;
    d )
        echo "Bulding SQLite3 with debug options"
        export CFLAGS="-fPIC \
                -DSQLITE_THREADSAFE=1 \
                -DSQLITE_ENABLE_FTS3 \
                -DSQLITE_ENABLE_FTS3_PARENTHESIS  \
                -DSQLITE_ENABLE_FTS4 \
                -DSQLITE_ENABLE_FTS5 \
                -DSQLITE_ENABLE_JSON1 \
                -DSQLITE_ENABLE_UTF16 \
                -DSQLITE_ENABLE_LOAD_EXTENSION \
                -DSQLITE_ENABLE_RTREE \
                -DSQLITE_ENABLE_STAT4 \
                -DSQLITE_ENABLE_UPDATE_DELETE_LIMIT \
                -DSQLITE_TEMP_STORE3 \
                -DSQLITE_USE_URI \
                -DSQLITE_DEBUG \
                -DSQLITE_ENABLE_EXPLAIN_COMMENTS \
                -DSQLITE_ENABLE_SELECTTRACE \
                -DSQLITE_ENABLE_WHERETRACE \
                -O2"
        ;;
    * )
        echo "Invalid Option: -$OPTARG" 
        echo $usage >&2
        exit 1
      ;;
  esac
done

LIBS="-lm" ./configure --enable-shared --prefix="$PREFIX"

make

sudo make install

Please note that I still need to fix the debug version. Suggestions are very welcome.

P.S. The script is free for everyone to use.

(14) By Keith Medcalf (kmedcalf) on 2021-03-04 09:49:17 in reply to 13 [link] [source]

The define SQLITE_ENABLE_UTF16 does not do anything -- it is unused in the current source.

(16) By anonymous on 2021-03-04 10:11:41 in reply to 14 [link] [source]

Thank you for pointing that out. The flag lurking around the source code. Since I am an optimistic soul, so I set the flag not missing the moment when that day arrives.

(I hope that there will be a day in the foreseeable future where there is support for UTF-16)

(10) By Simon Slavin (slavin) on 2021-03-03 22:32:23 in reply to 1 [link] [source]

I am puzzled by the confusion in this thread.

  1. Create the new database.
  2. Use the PRAGMA to set your preferred encoding
  3. Actually put something into the database

If you skip step (3) then step (2) goes unused.

IMHO the documentation for PRAGMA encoding= should be changed to reflect this.

(11) By Larry Brasfield (larrybr) on 2021-03-03 23:14:22 in reply to 10 [link] [source]

If step 2 is "unused", then there is a default text encoding for the forms of sqlite3_open?(). This could be mentioned (and linked) in the "PRAGMA encoding" doc. It's an obvious question: What happens without the pragma?

(12.3) By Keith Medcalf (kmedcalf) on 2021-03-04 00:16:05 edited from 12.2 in reply to 11 [link] [source]

This information is contained within the documentation at the link you specified. (It is the second paragraph).

The default encoding for databases opened with the narrow character (UTF-8) variants of sqlite3_open* APIs is UTF-8.

The default encoding for databases opened with the wide character (UTF-16) variants of the sqlite3_open* APIs is UTF-16 in the endianness that is the default of the platform on which the database is so opened.

In either case, the default encoding can be set by the pragma encoding statement before content is stored in the database. Once content is stored in the database the encoding is fixed until the database is emptied of all content (and then re-opened) after which the default is determined by the sqlite3_open* API which was used to open the database, as described above.

(15) By anonymous on 2021-03-04 10:02:03 in reply to 12.3 [link] [source]

Thank you all very much for your comments and thoughts (and work) about the topic.

So, if i understand correctly, my initial actual question can be answerd as follows (see my thread-starting post):
"YES, this behaviour is desired and will purposely be kept in the future".

I would agree with Simon Slavin and Larry Brasfiel, that the doc shuold be more clear about it - possibly by adding some links to the pragma_encoding description and/or the sqlite-shell description.

I can now see (for myself), that my question arose partially due to the fact, that I am currently only engaged in the sqlite-shell and using it inside my build process. So i didn't dive too much in the C-API doc until now - which is going to change very soon ;-) . 

Maybe some future users will also go this route. So the extension of the doc for the shell and/or pragma encoding could be quite useful and provide more clarity to them, too.

- Jey (Thread-starter)