SQLite User Forum

Special language signs in output
Login

Foreign language chars in clipboard output (stdout and file output work fine)

(1.2) By 102022 on 2022-06-21 13:20:51 edited from 1.1 [source]

EDITED: file output problem resolved, see https://sqlite.org/forum/forumpost/f491259b0a and below - sorry for the confusion!

Hello,

I've begun to use sqlite3.exe, last precompiled version for Windows (I use Windows 10 English).

I run it on a database which is probably UTF-8 (I am not sure, don't know how to verify the language table or such), and in which, for example in the records title fields, there are special language signs like öäüÖÄÜéàè and so on, and which are correctly displayed within the database.

When I run sqlite3.exe on these tables, the standardoutput (.output stdout), in the command window, preserves those äöü, etc.

Ditto for output into a file (.output filename and .once filename), i.e. äöü, etc. are preserved. (My alleged "problem" was caused by the fact that my file manager, FreeCommander XE, didn't / doesn't update minutes later or even not at all, the "0" content counter for these files (where sqlite3.exe writes into an already existing (i.e. previously created by sqlite3.exe) when the content had already been written to the file.)

The best output is to clipboard, for my means, i.e. by output |clip, but here, with clipboard output, the äöü, etc are NOT preserved, and I get weird ASCII signs instead of them.

How would I set sqlite3.exe clipboard output to another language code, or my clipboard to such other language code, just for sqlite3.exe use then, since any other clipboard operations always are correct, preserving the äöü, etc., just the sqlite3.exe output does not do it.

It seems that getting correct clipboard output will be quite complicated, so I'm also considering to create a virtual disc in my work memory (1 GB or even just 500 MB), to which I then write the file.

I want to process the output further, with AutoHotkey, etc., so writing to a hdd file, then reading again from that file would not be the best solution, but writing to, and reading from, a virtual disc file would be more or less "ok" if I can't get the clipboard output correct, too.

(2) By Donal Fellows (dkfellows) on 2022-06-21 08:49:25 in reply to 1.0 [link] [source]

with clipboard output, the äöü, etc are NOT preservec, and I get weird ASCII signs instead of them

Sounds like an encoding mix-up. (The term is "mojibake".) SQLite is internally UTF-8 (unless you take special steps otherwise, which you probably haven't done) so the problem exists downstream of the database itself. I don't know the clip command you're using well, but it apparently expects its input to be encoded in the system codepage, which is almost certainly not UTF-8.

From reading around a bit, I think the easiest fix for you is to stop using clip and instead use a small Powershell script which sets up the relevant encoding(s) and uses Set-Clipboard. Or your batch script can do chcp 65001 (before calling clip) which might help, but be careful with that as I don't know if that's a local-inheritable or global setting; in the latter case, you can get into horrible trouble. (What's actually missing is a command line option to clip to tell it what the encoding of its input really is, but we don't live in that universe.)

Changing the output encoding of the sqlite3.exe program is much harder.

tl;dr: Find a different way to reach your eventual goal. Moving your data via files is far more likely to work, especially if you have a lot of output.

(3) By Gunter Hick (gunter_hick) on 2022-06-21 08:57:20 in reply to 1.0 [link] [source]

SQLite uses UTF; which flavor (default UTF-8) is stored in the database file is determined when the file is created and is stored in the file header. All output of SQLite, independent of the target, will therefore be UTF.

Windows and ist utilities use ISO encoding by default. You have to make sure that whatever you are using to display the output is set to the same encoding as SQLite. Otherwise, UTF codepoints for characters with diacritical marks tend to "look wierd" when displayed as ASCII or ISO.

(4) By 102022 on 2022-06-21 10:41:36 in reply to 2 [link] [source]

Hello Donal Fellows,
Thank you very much for your tips.

I looked up "windows code page" and got to this helpful page and table: https://ss64.com/nt/chcp.html which corresponds to what you hinted at.

So I did, in a command window,
chcp > gave 437 = U.S., so I did,
chcp 850 (default for Europe, being in Europe), then
chcp > gave 850 (as expected, but clipboard problem persisted), but some minutes later:
chcp > gave 437 again (obviously reset to system default)

So I tried
chcp 65001 (UTF-8, as you said and confirmed by that page), then
chcp > gave 65001

I tried sqlite again, but the problem persisted, so:
chcp > gave 65001, so should be "good", but isnt:

Thus, my db is UTF-8 (I checked that), but with (obviously persistent, at least for the time being) code page 65001, clipboard output scrambles the äöü etc., example between numerous ones: Cholest├⌐rol : attention, il nΓÇÖy a pas que le ┬½ bon ┬╗ et le ┬½ mauvais

Thus, changing the code page works, but has no effect here.

(5) By Florian Balmer (florian.balmer) on 2022-06-21 11:24:59 in reply to 1.0 [link] [source]

Yori is a (free and open-source) Windows command-line shell and collection of (almost 100) standalone utilities to fully operate with UTF-8.

Example shell session (CMD.EXE) with the default Windows clip.exe program vs. the Yori yclip.exe program to get UTF-8 text to/from the clipboard:

C:\...> echo .q | sqlite3.exe -cmd "SELECT 'äöüÄÖÜ';"
äöüÄÖÜ

C:\...> echo .q | sqlite3.exe -cmd "SELECT 'äöüÄÖÜ';" | clip

C:\...> yclip.exe -p
äöüÄÖÜ

C:\...> echo .q | sqlite3.exe -cmd "SELECT 'äöüÄÖÜ';" | yclip.exe

C:\...> yclip.exe -p
äöüÄÖÜ

Example shell session (CMD.EXE) with the Windows default CMD.EXE type command vs. the Yori ytype.exe program get UTF-8 text to/from a temporary file:

C:\...> echo .q | sqlite3.exe -cmd "SELECT 'äöüÄÖÜ';"
äöüÄÖÜ

C:\...> echo .q | sqlite3.exe -cmd "SELECT 'äöüÄÖÜ';" > temp.txt

C:\...> type temp.txt
äöüÄÖÜ

C:\...> ytype.exe temp.txt
äöüÄÖÜ

(Or just open temp.txt with a text editor.)

Note there's also a patch to enable Unicode input and output for the console on Windows.

(6) By Chris Locke (chrisjlocke1) on 2022-06-21 11:30:10 in reply to 1.0 [link] [source]

How would I

What is the goal? To export certain database rows to files?
Using the clipboard is a bit of a fudge ...

I struggled to follow the steps you've tried - 'file output is very erratic' - it usually works or it doesn't. Do you mean output is as expected but doesn't occur, or it does occur but output isn't as expected. If the output isn't as expected, is that throughout the whole file or just certain records?
It seems you're looking for a solution to problem B when there might be a better solution to problem A.

(7) By Florian Balmer (florian.balmer) on 2022-06-21 11:39:09 in reply to 5 [link] [source]

BTW, Yori's yclip.exe also works from within the SQLite shell:

C:\...> sqlite3.exe
SQLite version 3.38.5 2022-05-06 15:25:27
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .once | yclip.exe
sqlite> SELECT 'äöüÄÖÜ';
sqlite> .q

C:\...> yclip.exe -p
äöüÄÖÜ

(8) By KIT.james (kjames3411) on 2022-06-21 13:00:31 in reply to 2 [link] [source]

Do people in english speaking countries really say mojibake??

(9.1) By Donal Fellows (dkfellows) on 2022-06-21 13:21:58 edited from 9.0 in reply to 8 [link] [source]

Say it? No. Write it? Yes. Refer to Wikipedia for it? Definitely!

(10) By RandomCoder on 2022-06-21 13:33:59 in reply to 4 [link] [source]

I'm not sure where the issue lies, but the SQLite REPL does seem to not properly handle UTF-8 on Windows.

## Linux
$ sqlite3
SQLite version 3.37.2 2022-01-06 13:25:41
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> select 'Une chaîne aléatoire de choses aléatoires.';
Une chaîne aléatoire de choses aléatoires.

## Windows 11
C:\> chcp
Active code page: 65001

C:\> python3
Python 3.10.5 (tags/v3.10.5:f377153, Jun  6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('Une chaîne aléatoire de choses aléatoires.')
Une chaîne aléatoire de choses aléatoires.
>>> exit()

C:\> sqlite3.exe
sqlite-tools-win32-x86-3380500\sqlite3.exe
SQLite version 3.38.5 2022-05-06 15:25:27
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> select 'Une chaîne aléatoire de choses aléatoires.';
';
Une cha
sqlite>

Note that I had to close a mysterious quote on a second line to even get some of the output, it appears the î in chaîne is somehow being decomposed to something that SQLite thinks is a quote. No idea what layer is failing, but Python is doing the right thing in the same environment.

(11) By Florian Balmer (florian.balmer) on 2022-06-21 13:45:12 in reply to 10 [link] [source]

You can use the patch linked in post 5 to fix this.

(12.4) By 102022 on 2022-06-22 08:41:21 edited from 12.3 in reply to 1.2 [link] [source]

The db developer says, it "is a Unicode application, and Windows uses UTF-16 encoding for its APIs." and "Text is converted to UTF-8 encoding when stored in the file."

Thus, when I retrieve, by sqlite3.exe, the content of the itemtitle and other plaintext columns of the db file, it should be in the right format, which is proven by stdout and file export, which renders the äöü, etc. correctly, just the clipboard output does not.

Setting the codepage to 65001 did not change this problem, trying out some other codepages did neither.

The above suggestions my be helpful for a developer, but I can't apply them, since in the sqlite3.exe command window, I can NOT enter any other commands than sqlite3.exe commands, for any other command I have to open ANOTHER command window. This implies that it should not be easily possible to use another clipboard tool, than the regular Windows clipboard.

I must say that the above-described problem is the very first time I've got such, obviously unresolvable, clipboard codepage problems; since after processing the data, I will have to write ("update table") the processed (i.e. changed) data back into the db, I will have to avoid the clipboard.

For the "update table" commands though, I obviously have to write them into a file then, then have sqlite3.exe read and process that file, so that probably WILL work as expected. ;-)

Btw, from my research, it seems there is "UTF-8 with BOM" and without, and mixing up the two should create such special-char problems. So perhaps the .output |clip leaves out the BOM where it's needed, or the other way round?