SQLite Forum

Unicode Input/output in sqlite3.exe on Windows 10

(1) By anonymous on 2022-01-10 20:35:42 [link] [source]

Hi all,

I cannot find a way to get the shell (sqlite3.exe 3.37.1) to handle input and output of Unicode text, and I am looking for assistance.

I'm on US Windows 10 64-bit (1903) with a non-English locale, and I've run into some strange behavior in the SQLite shell:

  1. When I start sqlite3.exe in a console and try to input Unicode text, the console hangs and I have to terminate it (Ctrl + C).

  2. When I open a database from the command line (sqlite3 test.db) and then run a query that is expected to return Unicode results, I get garbage text instead. For example I get:

    sqlite> select * from unicode;
    unicode is possible
    Unicode ist moglich
    Unicode ��������
    sqlite>
    
    instead of:
    sqlite> select * from unicode;
    unicode is possible
    Unicode ist möglich
    Unicode возможен
    sqlite>
    

  3. All GUI tools I've checked (DBeaver and SQLiteStudio) work with the same database as expected, with correct input/output of Unicode text. This also confirms that the database itself is not corrupted. Python 3 and Go also work correctly.

  4. I've tried the above with cmd.exe and with Windows Terminal. Both use chcp 65001 by default. Other popular console apps such as git and psql work with Unicode input/output without problems.

  5. I've tried both the precompiled sqlite3.exe binary from sqlite.org and a self-compiled one built with MinGW/gcc (option -municode), with the same results.

What did I do wrong? And what else should be done to make sqlite3.exe work with Unicode? Thank you!
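One way to back up point 3 independently of any console or font is to look at the stored bytes rather than the rendered text. Below is a minimal Python sketch using only the standard sqlite3 module. The in-memory database stands in for test.db, and the column name txt and the helper name stored_utf8_rows are assumptions made for illustration, not something taken from the actual database.

```python
import sqlite3


def stored_utf8_rows(conn, table="unicode", column="txt"):
    """Return (text, hex_of_stored_bytes) pairs for every row.

    hex() is evaluated inside SQLite, so the dump is plain ASCII and
    does not depend on the console code page or font.
    """
    # Table and column names are hypothetical; identifiers cannot be bound
    # as parameters, so they are quoted and interpolated here.
    sql = f'SELECT "{column}", hex("{column}") FROM "{table}" ORDER BY rowid'
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for test.db
conn.execute('CREATE TABLE "unicode" (txt TEXT)')
conn.executemany(
    'INSERT INTO "unicode" (txt) VALUES (?)',
    [("unicode is possible",), ("Unicode ist möglich",), ("Unicode возможен",)],
)

for text, hex_bytes in stored_utf8_rows(conn):
    # Round trip: the hex dump must decode back to the same string as UTF-8.
    assert bytes.fromhex(hex_bytes).decode("utf-8") == text
    print(hex_bytes)
```

If the hex dump decodes cleanly as UTF-8, the data in the file is fine and the problem is limited to how the shell talks to the console.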

(2.1) By AlexJ (CompuRoot) on 2022-01-11 00:11:30 edited from 2.0 in reply to 1 [link] [source]

Try running cmd and issuing the command chcp 65001, then try SQLite3 again.

If it works, then you can either make a cmd script like this:

@echo off
CHCP 65001

:: Your sqlite3 logic goes below

or switch the Windows console permanently to UTF-8:

Run regedit and go to:

HKLM\Software\Microsoft\Command Processor\Autorun

and set that value (Autorun is a string value under the Command Processor key) to @chcp 65001>nul

(3) By anonymous on 2022-01-11 01:52:00 in reply to 2.1 [link] [source]

Please see #4 in my report. The default code page in all consoles (cmd.exe, PowerShell and Windows Terminal) is set to UTF-8. This is also a requirement for working correctly with psql (the Postgres shell). Unfortunately it doesn't help with SQLite.
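For anyone who wants to confirm the active code page from inside a program rather than with chcp, here is a small Python sketch. It assumes the Win32 functions GetConsoleCP and GetConsoleOutputCP reached through ctypes; the helper name console_code_pages is made up for illustration, and on non-Windows systems it simply returns None.

```python
import sys


def console_code_pages():
    """Return (input_cp, output_cp) of the attached console, or None off Windows."""
    if sys.platform != "win32":
        return None
    import ctypes

    kernel32 = ctypes.windll.kernel32
    # 65001 means UTF-8. Either call returns 0 if it fails,
    # for example when no console is attached to the process.
    return kernel32.GetConsoleCP(), kernel32.GetConsoleOutputCP()


print(console_code_pages())
```

The input and output code pages are separate settings, so it is worth checking both when input and output misbehave differently.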

(4) By AlexJ (CompuRoot) on 2022-01-11 04:04:18 in reply to 3 [link] [source]

Please see #4 in my report.

Oh, sorry, my bad, I was too quick to assume.

It could also be a font that doesn't support some characters. First try setting the "Consolas" font in the terminal's properties, or some other TTF font.

Also, instead of the ancient cmd, try ConsoleZ. It is much more feature-rich than the default Windows console.

Could you post an original string that is supposed to go into the database?

(5) By Florian Balmer (florian.balmer) on 2022-01-11 07:00:23 in reply to 1 [link] [source]

And what else should be done to make sqlite3.exe work with Unicode?

There was a patch to enable Unicode console input and output on Windows.

I've tested it, and I think it's well done and works quite well, but the points mentioned in the reply (by me, when I was too lazy to register for a forum account, I'm sorry) should be taken into account.

Related (but off-topic):

Calling directly into ReadConsoleW() (instead of fgets() provided by the CRT) has the advantage that it's possible to intercept Ctrl-C, i.e. the function returns TRUE but sets *lpNumberOfCharsRead to 0, and GetLastError() returns ERROR_OPERATION_ABORTED. (In practice, because other error conditions are unlikely, this is probably the same as fgets() returning an empty string, but the latter does some special processing, for example for Ctrl-Z.)

So something based on the linked patch could even be used to soften the strict handling of Ctrl-C on Windows a bit, i.e. disable the global handler set by SetConsoleCtrlHandler() while calling ReadConsoleW(). I mention this because I commonly end up terminating the shell when I meant to cancel input to the current line (or some multi-line statement, when there's a typo in the no-longer-editable previous line and I want to start over instead of completing the statement), or when I meant to cancel an SQL command being executed that completes moments before I was able to reach Ctrl-C.

(Not sure if the sharp Ctrl-C handling is what people are used to on other platforms, but it seems that at least using Ctrl-C to cancel a partially input line without terminating the shell is not only a Windows thing.)

(6) By anonymous on 2022-01-11 19:23:56 in reply to 5 [link] [source]

Thank you for sharing, Florian. It seems Unicode support in the shell on Windows is broken. I wonder whether the SQLite team is aware of this problem and whether there are plans to fix it.

(7) By anonymous on 2022-01-12 06:21:19 in reply to 1 [link] [source]

Does anyone know how I can submit an issue to the official repo?

(8) By Stephan Beal (stephan) on 2022-01-12 06:52:03 in reply to 7 [link] [source]

Does anyone know how I can submit an issue to the official repo?

The preferred outlet for public problem reports is this forum. Arbitrary users are not permitted to open tickets in the main sqlite repo.

(9) By AlexJ (CompuRoot) on 2022-01-12 11:47:18 in reply to 7 [link] [source]

This time I didn't just skim your messages, and I investigated the issue in depth (everyone needs to know the tools they use).

I tested Unicode support on Windows x86_64 7, 8, 10 and Server 2016 and 2019 with the official sqlite3.exe in code page 65001 (aka UTF-8). Everywhere, sqlite3 abruptly quit to the console on its own (or froze in Windows Terminal) after I issued SQL statements containing non-ASCII characters.

Test case (font used to make sure UTF-8 characters are visible: DejaVu Sans Mono):


console> chcp 65001

sqlite> create temp table uni(txt);
sqlite> insert into uni (txt)values('test'),('Prüfung'),('испытание'),('اختبار'),('测试');

After the last statement, sqlite3 exits on its own back to the console (regardless of whether it is classic cmd, ConEmu or ConsoleZ).

The only official solution, if you are on Windows 10 v1903 or later and need UTF-8 that is well supported by Microsoft, is to install Windows Terminal (a 40 MB "monster"), which is supposed to fix the UTF-8 issue in the console for all Unicode-aware programs, including WSL.

As for sqlite3, it somehow chokes even in Windows Terminal.

The only solution I found that works in sqlite3, and it is not portable, is to use a language-specific code page. I see you used Cyrillic and German characters in your test case; in such cases, chcp 855 or chcp 866 would at least accept Cyrillic and leave the umlaut converted to ASCII. But this is obviously a bad workaround.
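To illustrate why a single legacy code page is a poor workaround, here is a rough Python sketch that checks which of the test strings above can be represented at all in cp855, cp866 and UTF-8. The helper names encodable and coverage are made up for illustration. Note one assumed difference from the console: Python's codecs are strict and report "ü" as unencodable in these code pages, whereas the console substitutes a plain ASCII letter, as described above.

```python
TEST_STRINGS = ["test", "Prüfung", "испытание", "اختبار", "测试"]


def encodable(text, codec):
    """True if every character of text exists in the given code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def coverage(codecs=("cp855", "cp866", "utf-8")):
    """Map each codec name to the test strings it can represent."""
    return {c: [s for s in TEST_STRINGS if encodable(s, c)] for c in codecs}


for codec, ok in coverage().items():
    # Only ASCII is printed, so this is safe on any console code page.
    print(codec, len(ok), "of", len(TEST_STRINGS))
```

Only UTF-8 covers all five strings, which is why switching code pages per language cannot replace proper Unicode console I/O in the shell.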

(10) By anonymous on 2022-01-12 14:06:44 in reply to 1 [source]

Just in case the font is part of the problem, here's the most complete Unicode terminal font I've found so far:

https://www.kreativekorp.com/charset/font/FairfaxHD/

There are nicer-looking fonts. But sometimes you absolutely need to distinguish N from Ñ on sight, or accurately see lesser-used symbols.

(11) By AlexJ (CompuRoot) on 2022-01-12 16:44:50 in reply to 10 [link] [source]

No, the issue isn't with fonts as I already wrote above.

<offtopic>

here's the most complete Unicode terminal font I've found so far:

I tried the font you suggested, but sorry, I don't like how this font places the tilde "~" character, aligned at the top instead of in the middle. I believe a programming font (well, one for everyday use too) should also draw very distinct shapes for look-alike characters, which is why I always compare fonts against the test cases below (placing look-alikes in pairs, and making sure the font doesn't try to be smart and substitute arrows, dashes, underscores, or series of dots with HTML-entity-like characters):

    O0 OQ0 !Il1| ij g9q CG6 8B3 5S FP KX G& C{ C(

    `~-+=>. &  -- __ ({[]}) " '

    <!-- --> <-- ->> <<- -> <- => <=> <==> ==> <== >>= =<< -- := =:= == !== != <= >=

    // /** /* */ && .& || !! :: >> << ¯\_(ツ)_/¯ __ ___ ... ...

IMHO, below are the most helpful fonts for terminals and programming:

- Fantasque Sans Mono
- Ubuntu Mono - Bront
- JetBrains Mono
- JuliaMono
- Fira Code

</offtopic>