[BUG] Precompiled sqlite shell for windows can't open filenames with unicode chars
Thank you for developing SQLite! However, I found that the precompiled sqlite shell for Windows downloaded from the official site (sqlite-tools-win32-x86-3340100.zip) can't open db files with unicode characters in the file name.
Steps to reproduce:
- Create a test database 'test.db'
- Rename 'test.db' to '你好世界.db' (or try some strange names such as emojis)
- Try to open '你好世界.db' with sqlite shell and do some query.
- If the unicode char in filename can be represented in current code page:
The shell will silently create a new db file named '�������.db' and operate on that new file.
- If the unicode char in filename can't be represented in current code page:
The shell will try to create a new db file named '????.db' and say filename is illegal.
It seems like unicode file name support in the sqlite shell is broken. Although I found that shell.c uses wmain() and is unicode aware, the precompiled binary seems to keep using the ANSI versions of the API. Maybe unicode support isn't compiled in?
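The second failure mode above ('????.db') is what a lossy narrowing conversion looks like. Here is a minimal sketch of the idea, not the actual Windows or SQLite code: `narrow_to_latin1` is a hypothetical helper, and Latin-1 merely stands in for "the current code page". Any 16-bit unit the target code page can't represent is replaced with '?', so the original name is unrecoverable by the time the program sees it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: narrow UTF-16 code units to a
** single-byte code page. Latin-1 stands in for the "current code page"
** (an assumption); every unit it cannot represent becomes '?'.
** Returns the number of bytes written, excluding the terminating NUL. */
static size_t narrow_to_latin1(const uint16_t *in, size_t n,
                               char *out, size_t cap){
  size_t i, o = 0;
  assert( in!=NULL && out!=NULL );
  if( cap==0 ) return 0;
  for(i=0; i<n && o+1<cap; i++){
    uint16_t u = in[i];
    /* Representable units pass through; everything else is lost. */
    out[o++] = (u<=0xFF) ? (char)(unsigned char)u : '?';
  }
  out[o] = '\0';
  return o;
}
```

Feeding it the code units of '你好世界.db' (0x4F60, 0x597D, 0x4E16, 0x754C, then ".db") leaves only "????.db" in the buffer, which matches the name reported in the error.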
Try chcp 65001
SQLite shell uses utf-8 internally, but converts between it and active code page of the console when using console functions.
chcp 65001 sets active code page of the current console window to utf-8.
    c:\TEMP>chcp 65001
    Active code page: 65001

    c:\TEMP>sqlite3.exe 你好世界.db
    SQLite version 3.35.0 2021-02-26 15:20:17
    Enter ".help" for usage hints.
    sqlite> select * from sqlite_master;
    table|xx|xx|2|CREATE TABLE xx (i integer primary key, x text)
    sqlite>
Thank you very much!
But I'm using an old version of Windows, which doesn't support 'chcp 65001' well, and I'd like to drag a database onto the icon of sqlite.exe. It would be appreciated if the precompiled sqlite shell had native unicode support.
This advice is a distraction. It changes the code page of the current Windows console, which means it can't affect the default, as you'd need for your particular use case, where the console isn't even up yet when the problem occurs: during program launch.
There are ways to change the Windows console's default code page, but it'll probably break all your other apps that assume UTF-16, so it's bad advice across the board.
It's a solution to a different problem anyway. It would allow output in the sqlite3 shell to show up in the console properly, and it would let you type input that gets sent to the shell as UTF-8. It isn't going to change how sqlite3 interprets file names in its
Now, if you wanted to get involved and start working on fixing such problems yourself, that'd be different, though in this particular case, I think you'd end up with a fork of SQLite rather than getting your fixes for this sort of problem into SQLite proper.
I called this chcp stuff a distraction, because the core problems go much deeper.
On Windows 10 with all the toys, it still doesn't work as you want, and it's not entirely SQLite's fault.
I copied your Chinese database name from your forum post to the clipboard and induced SQLite to create a database of that name with a basic schema inside so I could tell when I'd opened it successfully. I ran into a whole series of failures along the way:
- Drag file into stock cmd.exe window: garbage text that doesn't work even when attempted.
- Drag file into the same window after chcp 65001. Ditto.
- Drag file into same window already running
- Give up and try all of the above with PowerShell: DIR shows mangled file name. Attempting to use it anyway either crashes sqlite3 or creates a new DB with an incorrect file name.
- Install Windows Terminal. Now the name shows up properly in DIR under PowerShell, but the native sqlite3 still can't open it due to the non-UTF-8 argv problem called out above. At least now we've stripped away all of the non-SQLite parts of the problem, so success?
- Try Ubuntu-on-WSL in Windows Terminal. Now we're getting somewhere. ls shows the file name properly and copy-pasting it into SQLite commands works. I can even copy the first character and use Tab-completion to fill in the rest of the name, which means if I could write Chinese, I could type that first character instead and use Tab completion. Not that this helps you, since WSL and Windows Terminal both require Windows 10, but it shows success is in principle possible under Windows.
- Or not. Drag file name to Windows Terminal; it refuses. Known bug, scheduled for a fix in 2.0, which is currently "24% complete" according to GitHub, scheduled for release by May 31, 2021. We'll see.
EDIT: Per knu's deleted reply, I then tried this on a Windows XP test VM I have, and it still doesn't work, even with chcp 65001. The feature's there, but it doesn't help, essentially for the reasons given above, though there's additional brokenness atop it that's since been fixed in later versions of Windows.
Always remember people, Windows is the easy operating system, the one that gives people the least problems. 🙄
tl;dr: Install Linux on the system and be happy. :)
Thank you for your reply. After some digging, I found that unicode argv support is just not compiled into the official binary. I downloaded the sqlite source repo and compiled the sqlite shell myself, and everything worked. (I'm using Visual Studio 2019 x64 native tools + self-compiled tclsh)
In order to get MinGW/GCC on Windows to produce a working unicode (wmain) executable, the -municode option was required on the gcc command line, along with -DSQLITE_SHELL_IS_UTF8=(0), because the current shell.c only sets this for MSVC compilers on Windows, not GCC.
The MSVC compiler seemed to pull in the unicode (wmain) entry point correctly without any changes.
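For readers wondering what a wmain build buys you: the wide argv arrives as UTF-16, and each argument has to be re-encoded as UTF-8 before the rest of the shell sees it. The following is a portable sketch of that re-encoding, assuming nothing about how shell.c actually does it; `utf16_to_utf8` is a hypothetical name, not a SQLite or Windows function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode n UTF-16 code units as NUL-terminated UTF-8.
** Returns the number of bytes written (excluding the NUL), or 0 if the
** output buffer is too small. Unpaired surrogates become U+FFFD. */
static size_t utf16_to_utf8(const uint16_t *in, size_t n,
                            char *out, size_t cap){
  size_t i = 0, o = 0;
  assert( in!=NULL && out!=NULL );
  if( cap==0 ) return 0;
  while( i<n ){
    uint32_t c = in[i++];
    if( c>=0xD800 && c<=0xDBFF && i<n && in[i]>=0xDC00 && in[i]<=0xDFFF ){
      /* Combine a surrogate pair into one code point above U+FFFF. */
      c = 0x10000 + ((c-0xD800)<<10) + (uint32_t)(in[i++]-0xDC00);
    }else if( c>=0xD800 && c<=0xDFFF ){
      c = 0xFFFD;               /* lone surrogate: substitute U+FFFD */
    }
    if( o+5>cap ) return 0;     /* room for up to 4 bytes plus the NUL */
    if( c<0x80 ){
      out[o++] = (char)c;
    }else if( c<0x800 ){
      out[o++] = (char)(0xC0 | (c>>6));
      out[o++] = (char)(0x80 | (c & 0x3F));
    }else if( c<0x10000 ){
      out[o++] = (char)(0xE0 | (c>>12));
      out[o++] = (char)(0x80 | ((c>>6) & 0x3F));
      out[o++] = (char)(0x80 | (c & 0x3F));
    }else{
      out[o++] = (char)(0xF0 | (c>>18));
      out[o++] = (char)(0x80 | ((c>>12) & 0x3F));
      out[o++] = (char)(0x80 | ((c>>6) & 0x3F));
      out[o++] = (char)(0x80 | (c & 0x3F));
    }
  }
  out[o] = '\0';
  return o;
}
```

Unlike a narrowing conversion to the ANSI code page, nothing is lost here: both the Chinese characters and emoji (which need surrogate pairs) survive, which is why the self-compiled wmain build can open such files.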
Changing this block in shell.c.in

    #ifndef SQLITE_SHELL_IS_UTF8
    # if (defined(_WIN32) || defined(WIN32)) && defined(_MSC_VER)
    #  define SQLITE_SHELL_IS_UTF8 (0)
    # else
    #  define SQLITE_SHELL_IS_UTF8 (1)
    # endif
    #endif

to

    #ifndef SQLITE_SHELL_IS_UTF8
    # if (defined(_WIN32) || defined(WIN32)) && (defined(_MSC_VER) || (defined(UNICODE) && defined(__GNUC__)))
    #  define SQLITE_SHELL_IS_UTF8 (0)
    # else
    #  define SQLITE_SHELL_IS_UTF8 (1)
    # endif
    #endif

fixes this so that only the -municode compiler option is necessary to make this work with a GCC compiler targeting Windows, and without the option a narrow character version will be produced.
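If you want to check which way a given toolchain and flag combination resolves, something like the following could be compiled with the same options you intend to use for the shell. It reuses the modified condition from above; `shell_entry_point` is a made-up helper for this sketch and is not part of shell.c.

```c
#include <assert.h>

/* Same selection logic as the modified block quoted above. */
#ifndef SQLITE_SHELL_IS_UTF8
# if (defined(_WIN32) || defined(WIN32)) \
     && (defined(_MSC_VER) || (defined(UNICODE) && defined(__GNUC__)))
#  define SQLITE_SHELL_IS_UTF8 (0)
# else
#  define SQLITE_SHELL_IS_UTF8 (1)
# endif
#endif

/* Hypothetical helper: report which entry point this configuration
** would select (wide wmain when the macro is 0, plain main otherwise). */
static const char *shell_entry_point(void){
  /* A -D override on the command line is expected to be 0 or 1. */
  assert( SQLITE_SHELL_IS_UTF8==0 || SQLITE_SHELL_IS_UTF8==1 );
  return SQLITE_SHELL_IS_UTF8 ? "main" : "wmain";
}
```

With MinGW/GCC this should report "wmain" only when -municode is given (that option predefines UNICODE), and "main" otherwise; on non-Windows targets it reports "main".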