SQLite Forum

Unicode Input/output in sqlite3.exe on Windows 10

(1) By anonymous on 2022-01-10 20:35:42

Hi all,

I can't find a way to get the shell (sqlite3.exe 3.37.1) to handle Unicode text input and output, and I'm looking for assistance.

I'm on US Windows 10 64-bit (1903) with a non-English locale, and I've run into some strange behavior in the sqlite shell.

1. When I start sqlite3.exe in the console and try to input Unicode text, the console hangs and I have to terminate it (Ctrl + C).

2. When I open a database from the command line (`sqlite3 test.db`) and run a query that is expected to return Unicode results, I get garbage text instead.
For example I get:
```
sqlite> select * from unicode;
unicode is possible
Unicode ist moglich
Unicode ��������
sqlite>
```
instead of:
```
sqlite> select * from unicode;
unicode is possible
Unicode ist möglich
Unicode возможен
sqlite>
```

3. All GUI tools I've checked (DBeaver and SQLiteStudio) work as expected with the same database, with correct input/output of Unicode text. This also confirms that the database itself is not corrupted. Python 3 and Golang also work correctly.

4. I've tried the above with cmd.exe and with Windows Terminal. Both use `chcp 65001` (UTF-8) by default. Other popular console apps such as `git` and `psql` handle Unicode input/output without problems.

5. I've tried both the precompiled sqlite3.exe binary from sqlite.org and my own build compiled with MinGW/gcc (option `-municode`), with the same results.

What did I do wrong? And what else should be done to make sqlite3.exe work with Unicode? Thank you!
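One possible explanation for the garbled output in point 2, offered only as an unconfirmed hypothesis: the shell converted its UTF-8 output to a legacy Cyrillic code page, and the console, running with `chcp 65001`, then decoded those bytes as UTF-8. The Python sketch below reproduces that mismatch; the code page names (`cp866`, `cp1251`) are guesses about the poster's locale, not facts from the report.

```python
def misdecode(text: str, legacy_cp: str = "cp866") -> str:
    """Simulate output written in a legacy code page but read as UTF-8.

    Assumption (not stated in the report): the shell encoded its output in
    the locale's legacy code page (cp866 by default here), while the
    console was set to UTF-8 (chcp 65001).
    """
    raw = text.encode(legacy_cp)                   # bytes the shell would emit
    return raw.decode("utf-8", errors="replace")   # how a UTF-8 console reads them


for cp in ("cp866", "cp1251"):
    # Each Cyrillic letter becomes an invalid UTF-8 byte, shown as U+FFFD.
    print(cp, misdecode("Unicode возможен", cp))
```

With either guess, the eight letters of "возможен" turn into eight U+FFFD replacement characters, as in the report. The "moglich" line would be consistent with Windows' "best fit" fallback, which substitutes a plain "o" when the target code page lacks "ö"; Python's codecs don't emulate that, so `misdecode("Unicode ist möglich")` raises `UnicodeEncodeError` instead.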

(2.1) By AlexJ (CompuRoot) on 2022-01-11 00:11:30 edited from 2.0 in reply to 1

Try running **`cmd`**, issue the command **`chcp 65001`**, and then try SQLite3 again.

If it works, then you can either make a **cmd** script like this:

```
@echo off
:: Switch this console session to the UTF-8 code page
CHCP 65001

:: Your sqlite3 commands go below

```

or switch the Windows console to UTF-8 permanently:

run **`regedit`** and go to:

```
HKLM\Software\Microsoft\Command Processor\Autorun
```

and set that `Autorun` string value (create it if it doesn't exist) to **`@chcp 65001>nul`**

(3) By anonymous on 2022-01-11 01:52:00 in reply to 2.1

Please see #4 in my report. The default code page in all my consoles (cmd.exe, PowerShell and Windows Terminal) is set to UTF-8, which is also required for psql (the Postgres shell) to work correctly. Unfortunately, this doesn't help with SQLite.

(4) By AlexJ (CompuRoot) on 2022-01-11 04:04:18 in reply to 3

> Please see #4 in my report.

Oh, sorry, my bad; I jumped to conclusions too quickly.

It could also be a font that doesn't support some characters. First, try setting the "Consolas" font, or some other TTF font, in the terminal's properties.

Also, instead of the ancient **`cmd`**, try [ConsoleZ](https://github.com/cbucher/console). It is much more feature-rich than the default Windows console.

Could you post here some of the original strings that are supposed to go into the database?
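When text is mangled on screen, posting the stored bytes instead of the rendered text sidesteps the console entirely: SQLite's `hex()` function returns a text value's encoded bytes as ASCII hex digits, which display correctly under any code page, so a query such as `SELECT hex(txt) FROM unicode;` could be run in sqlite3.exe itself. Below is a minimal Python sketch of the same check, using the strings from the first post; the column name `txt` is hypothetical, since the report doesn't show the schema.

```python
import sqlite3

# Strings from the first post; the column name "txt" is a hypothetical stand-in.
SAMPLES = ["unicode is possible", "Unicode ist möglich", "Unicode возможен"]

con = sqlite3.connect(":memory:")   # a new database uses the UTF-8 text encoding
con.execute("CREATE TABLE unicode(txt TEXT)")
con.executemany("INSERT INTO unicode(txt) VALUES (?)", [(s,) for s in SAMPLES])


def stored_hex(db):
    """Return (text, hex of the stored bytes) pairs in insertion order."""
    return db.execute("SELECT txt, hex(txt) FROM unicode ORDER BY rowid").fetchall()


print("encoding:", con.execute("PRAGMA encoding").fetchone()[0])
for text, hx in stored_hex(con):
    # If the hex digits decode as UTF-8 back to the original, the data is intact.
    intact = bytes.fromhex(hx).decode("utf-8") == text
    print(hx, "OK" if intact else "MISMATCH")
```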

(5) By Florian Balmer (florian.balmer) on 2022-01-11 07:00:23 in reply to 1

> And what else should be done to make sqlite3.exe working with unicode?

There was a [patch][0] to enable Unicode console input and output on Windows.

[0]: https://sqlite.org/forum/forumpost/5231a77f7c

I've tested it, and I think it's well done and works quite well, but the points
mentioned in the reply (by me, when I was too lazy to register for a forum
account, I'm sorry) should be taken into account.
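For context on what such a patch has to handle: `ReadConsoleW()` returns UTF-16 code units (`WCHAR`s), while `sqlite3_prepare_v2()` expects UTF-8, so every line typed at the console needs transcoding (in C this would typically go through `WideCharToMultiByte(CP_UTF8, ...)`), and output needs the reverse conversion before `WriteConsoleW()`. The Python sketch below only simulates the input direction on a byte buffer; it is not taken from the linked patch.

```python
def console_units_to_utf8(units: bytes) -> bytes:
    """Convert a little-endian UTF-16 buffer, as ReadConsoleW() fills it, to UTF-8."""
    return units.decode("utf-16-le").encode("utf-8")


line = "insert into uni values('测试');"
wide = line.encode("utf-16-le")        # simulated contents of the WCHAR buffer
utf8 = console_units_to_utf8(wide)     # what would be passed to sqlite3_prepare_v2()

# The two lengths differ, so buffers must be sized for the UTF-8 result,
# not for the character count the console reports.
print(len(wide) // 2, "UTF-16 code units")
print(len(utf8), "UTF-8 bytes")
```

Characters outside the Basic Multilingual Plane (most emoji, for example) take two UTF-16 code units each, which is another reason not to equate the console's count with a character count.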

Related (but off-topic):

Calling directly into `ReadConsoleW()` (instead of `fgets()` provided by the
CRT) has the advantage that it's possible to intercept Ctrl-C, i.e. the function
returns `TRUE` but sets `*lpNumberOfCharsRead` to `0`, and `GetLastError()`
returns `ERROR_OPERATION_ABORTED`. (In practice, because other error conditions
are unlikely, this is probably the same as `fgets()` returning an empty string,
but the latter does some special processing, for example for Ctrl-Z.)

So something based on the linked patch could even be used to disarm the strict
handling of Ctrl-C on Windows a bit, i.e. disable the global handler set by
`SetConsoleCtrlHandler()` when calling `ReadConsoleW()`. That would help me,
because I commonly end up terminating the shell when I meant to cancel input to
the current line (or some multi-line statement, when there's a typo in the
no-longer-editable previous line, and I want to start over instead of completing
the statement), or when I meant to cancel an SQL command that completes moments
before I manage to press Ctrl-C.

(Not sure if the sharp Ctrl-C handling is what people are used to on other
platforms, but it seems that at least using Ctrl-C to cancel a partially input
line without terminating the shell is not only a Windows thing.)
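The behavior described above (Ctrl-C discards the line or multi-line statement being typed instead of ending the shell) can be sketched in Python. This is not how sqlite3.exe works, and `repl`, `read_line` and `execute` are hypothetical names; the sketch relies on `sqlite3.complete_statement()` to decide when the accumulated input forms a complete SQL statement.

```python
import sqlite3


def repl(read_line, execute):
    """Collect input lines until they form a complete SQL statement, then run it.

    read_line(prompt) returns the next line, or None at end of input, and raises
    KeyboardInterrupt when the user presses Ctrl-C. Both callables are hypothetical.
    """
    buffer = ""
    while True:
        try:
            line = read_line("   ...> " if buffer else "sqlite> ")
        except KeyboardInterrupt:
            buffer = ""          # drop the partial statement but keep the shell alive
            continue
        if line is None:         # end of input ends the loop
            return
        buffer += line + "\n"
        if sqlite3.complete_statement(buffer):
            execute(buffer)
            buffer = ""


# Scripted "user": a line with a typo, Ctrl-C, then a fresh statement, then EOF.
script = iter(["select 1 +", KeyboardInterrupt, "select 2;", None])


def scripted_read(prompt):
    item = next(script)
    if item is KeyboardInterrupt:
        raise KeyboardInterrupt
    return item


executed = []
repl(scripted_read, executed.append)
print(executed)
```

For interactive use, `read_line` could wrap `input()` and return `None` on `EOFError`. Cancelling a query that is already running is a separate problem; Python exposes SQLite's `sqlite3_interrupt()` as `Connection.interrupt()`.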

(6) By anonymous on 2022-01-11 19:23:56 in reply to 5

Thank you for sharing, Florian.
It seems Unicode support in the shell on Windows is broken. I wonder whether the SQLite team is aware of this problem and whether there are plans to fix it.

(7) By anonymous on 2022-01-12 06:21:19 in reply to 1

Does anyone know how I can submit an issue to the official repo?

(8) By Stephan Beal (stephan) on 2022-01-12 06:52:03 in reply to 7

> Does anyone know how I can submit an issue to the official repo?

The preferred outlet for public problem reports is this forum. Arbitrary users are not permitted to open tickets in the main sqlite repo.

(9) By AlexJ (CompuRoot) on 2022-01-12 11:47:18 in reply to 7

This time I didn't just skim your messages; I investigated this issue in depth (everyone needs to know the tools they use).

I tested Unicode support on Windows x86_64 7, 8, 10, Server 2016 and Server 2019 with the official [sqlite3.exe](https://www.sqlite.org/2022/sqlite-tools-win32-x86-3370200.zip) in code page 65001 (aka UTF-8), and everywhere **sqlite3** abruptly quit to the console on its own (or froze in [Windows Terminal](https://github.com/Microsoft/Terminal#installing-and-running-windows-terminal)) after I issued SQL statements containing non-ASCII characters.

Test case (font used to make sure UTF-8 characters are visible: **DejaVu Sans Mono**): 

------

```
console> chcp 65001

sqlite> create temp table uni(txt);
sqlite> insert into uni (txt)values('test'),('Prüfung'),('испытание'),('اختبار'),('测试');
```

------

After the last statement, **sqlite3** exits on its own back to the console (regardless of whether it is the classic **`cmd`**, [**`ConEmu`**](https://conemu.github.io/) or [**`ConsoleZ`**](https://github.com/cbucher/console)).

If you're on Windows 10 v1903 or later and need UTF-8 support that Microsoft officially backs, the only official solution is to install [Windows Terminal (a "monster" at 40 MB)](https://github.com/Microsoft/Terminal#installing-and-running-windows-terminal), which is supposed to fix UTF-8 console issues for all Unicode-aware programs, including WSL.

As for **`sqlite3`**, it somehow chokes even in [Windows Terminal](https://github.com/Microsoft/Terminal#installing-and-running-windows-terminal).

The only (non-portable) solution I found that works in **`sqlite3`** is to use a specific code page for a specific language. I see your test case uses Cyrillic and German characters; in that case, **`chcp 855`** or **`chcp 866`** would at least accept Cyrillic, while umlauts get converted to plain ASCII letters. But this is obviously a bad workaround.
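Why the one-code-page-per-language workaround is limited can be checked directly: a legacy OEM code page covers essentially one script. This Python sketch tests which strings from the test case above survive a strict encode into cp855, cp866 and UTF-8. Python raises on unmappable characters, whereas Windows may apply a "best fit" substitution (ü to u, for instance), which is presumably why umlauts degrade to ASCII instead of failing outright.

```python
# Strings from the test case above.
TEST_STRINGS = ["test", "Prüfung", "испытание", "اختبار", "测试"]


def representable(code_page):
    """Return the test strings that encode into code_page without loss."""
    survivors = []
    for s in TEST_STRINGS:
        try:
            s.encode(code_page)          # strict: raises on unmappable characters
        except UnicodeEncodeError:
            continue
        survivors.append(s)
    return survivors


for cp in ("cp855", "cp866", "utf-8"):
    print(cp, representable(cp))
```

Only UTF-8 keeps all five strings, which matches the conclusion that a per-language code page is a bad workaround.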

(10) By anonymous on 2022-01-12 14:06:44 in reply to 1

Just in case the font is part of the problem, here's the most complete Unicode terminal font I've found so far:

https://www.kreativekorp.com/charset/font/FairfaxHD/

There are nicer-looking fonts.  But sometimes you absolutely need to distinguish N from Ñ on sight, or accurately see lesser-used symbols.

(11) By AlexJ (CompuRoot) on 2022-01-12 16:44:50 in reply to 10

No, the issue isn't with fonts as I already wrote above.

<offtopic>

>  here's the most complete Unicode terminal font I've found so far:

I tried the font you suggested, but sorry, I don't like how it places the tilde "~" character: it's aligned at the top instead of in the middle.
I believe a programming font (well, an everyday font too) should keep look-alike characters clearly distinct, which is why I always compare fonts against the test cases below (look-alikes are placed in pairs; the cases also make sure the font doesn't try to be smart and substitute arrows, dashes, underscores, or runs of dots with HTML-entity-like glyphs):


<strong>

~~~
    O0 OQ0 !Il1| ij g9q CG6 8B3 5S FP KX G& C{ C(

    `~-+=>. &  -- __ ({[]}) " '

    <!-- --> <-- ->> <<- -> <- => <=> <==> ==> <== >>= =<< -- := =:= == !== != <= >=

    // /** /* */ && .& || !! :: >> << ¯\_(ツ)_/¯ __ ___ ... ...

~~~

</strong>


IMHO, the fonts below are the most helpful for terminals and programming:

- Fantasque Sans Mono
- Ubuntu Mono - Bront
- JetBrains Mono
- JuliaMono
- Fira Code

</offtopic>