SQLite Forum

CLI on Windows: file name encoding problem

(1) By Matthias Droste (mdrost) on 2020-11-02 16:57:18

Hi,

I'm having problems with passing non-ASCII file name arguments to sqlite3.exe under Windows 10 1909: sqlite3.exe ä.sqlite opens or creates the file <invalid char>.sqlite.

This still happens when I change the code page in cmd (with chcp) from the initial 850 (Western European DOS) to 1252 (Western European) or 65001 (UTF-8). However, it doesn't happen if I activate the "UTF-8 for non-Unicode programs" beta in Settings.

It used to work some months ago in an MSBuild task (which uses cmd) with SQLite 3.31.1. That may still have been under Windows 10 1903. Now the problem occurs with both 3.31 and 3.33.0, in both cmd and PowerShell.

Thanks in advance!

(2) By luuk on 2020-11-02 17:25:27 in reply to 1

https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file

"Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255)..."

They are making things hard for themselves at Microsoft...

I do think this is more a Windows bug than a bug in SQLite?

(3) By little-brother on 2020-11-02 23:35:50 in reply to 1

Try converting the database name from cp437 (used by PowerShell by default) to UTF-8 and then passing that name to sqlite.
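
For illustration, the conversion suggested here amounts to re-encoding the bytes of the name. A minimal Python sketch, assuming the name really does arrive as cp437 bytes (the helper `recode_name` is made up for this example and is not part of SQLite or PowerShell):

```python
def recode_name(raw: bytes, source_codepage: str = "cp437") -> bytes:
    """Re-encode a file name from an OEM code page to UTF-8."""
    # Decode with the code page the bytes were produced in, then
    # encode the resulting text as UTF-8.
    return raw.decode(source_codepage).encode("utf-8")


if __name__ == "__main__":
    # In cp437, "ä" is the single byte 0x84.
    print(recode_name(b"\x84.sqlite"))
```

Pure ASCII names pass through unchanged, so only names with non-ASCII characters would be affected by such a step.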

(4) By Matthias Droste (mdrost) on 2020-11-03 09:03:46 in reply to 3

Since our use case is MSBuild tasks that generate data for unit tests from SQL scripts (for source control friendliness), converting the file names isn't really practical. However, we do have other workarounds:

  • Manually renaming the problematic files is feasible.
  • We could quickly write our own utility program that applies a SQL script to a database.
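
The second workaround could look roughly like the sketch below. It uses Python's standard sqlite3 module rather than the sqlite3.exe shell; the function name `apply_script` and the assumption that the scripts are stored as UTF-8 text are illustrative only and do not come from the thread. Python hands the path over as a Unicode string, so the console code page should not come into play, though that has not been verified on the setup described here.

```python
import sqlite3
import sys
from contextlib import closing


def apply_script(db_path: str, script_path: str) -> None:
    """Run every statement in script_path against the database at db_path."""
    # Assumption: the SQL scripts are stored as UTF-8 text.
    with open(script_path, encoding="utf-8") as f:
        script = f.read()
    # sqlite3.connect() creates the database file if it does not exist yet.
    with closing(sqlite3.connect(db_path)) as conn:
        conn.executescript(script)
        conn.commit()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python apply_script.py ä.sqlite schema.sql
    apply_script(sys.argv[1], sys.argv[2])
```

An MSBuild task could call such a script through an Exec step in place of the sqlite3.exe invocation.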

(5) By Matthias Droste (mdrost) on 2020-11-03 09:43:52 in reply to 2

Looking at the source of the shell (shell.c.in), I think I see the problem:

  1. On Windows, the entry point is wmain() (line 10807), so arguments are passed as UTF-16; the code page doesn't matter here.
  2. In L10877 ff. all arguments are converted to UTF-8. BTW: is overwriting the wchar_t* entries in argv with char* legal C?
  3. In L10925 the UTF-8 file path is assigned to data.zDbFilename.
  4. At least in L11080 that path is passed to _access(), which expects the path in the current code page. I assume the same pattern exists in the actual file creation call. That would explain why it works with the "UTF-8 code page" system setting.
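
The mismatch suspected in step 4 can be reproduced outside of SQLite. The Python sketch below only simulates the effect: it takes the UTF-8 bytes of a name and decodes them as if they were text in another code page. The helper `misread` is made up for this illustration, and whether the shell hits exactly this path is the assumption made in the list above, not something the sketch proves.

```python
def misread(name: str, codepage: str) -> str:
    """Show how the UTF-8 bytes of name look when read in another code page."""
    # The bytes a program holds after converting its arguments to UTF-8.
    utf8_bytes = name.encode("utf-8")
    # errors="replace" covers bytes that are undefined in the target code page.
    return utf8_bytes.decode(codepage, errors="replace")


if __name__ == "__main__":
    for cp in ("cp850", "cp1252", "utf-8"):
        # ascii() keeps the output readable on consoles that cannot
        # print the resulting characters.
        print(cp, ascii(misread("ä.sqlite", cp)))
```

Under cp850 and cp1252 the two UTF-8 bytes of "ä" come back as two unrelated characters, while decoding as UTF-8 returns the original name. That pattern matches the observation that the problem goes away only when the system code page itself is UTF-8.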