Unicode Character in TEXT column
(1) By anonymous on 2022-03-30 12:56:57 [link] [source]
I am using the CLI; this query is either not inserting the second row of the first column or prompting me for the end of statement - typing ; does not resolve it.
DROP TABLE IF EXISTS tblCountry;
CREATE TABLE tblCountry (
Country TEXT,
ISO2 TEXT,
ISO3 TEXT,
ISONum INT,
PRIMARY KEY (
Country,
ISO2,
ISO3,
ISONum
)
);
INSERT INTO tblCountry (
Country,
ISO2,
ISO3,
ISONum
)
VALUES ('Afghanistan','AF','AFG',4),('Åland Islands','AX','ALA',248);
The problem appears to be he Unicode character Å. The SQL works when I replace Å by A.
How do I insert unicode characters in a TEXT column (I'm using V3.38.2)?
(2) By Gunter Hick (gunter_hick) on 2022-03-30 13:21:14 in reply to 1 [link] [source]
Are you sure you are running on a console with UTF-8 character set? Maybe you are running under windows in ISO-8859-1? What does hex('Åland Islands') return? SQLite version 3.24.0 2018-06-04 19:24:41 asql> create temp table country(name text,iso2 text, iso3 text, isoNum int); asql> insert into country values ('Afghanistan','AF','AFG',4),('Åland Islands','AX','ALA',248); rows inserted ------------- 2 asql> select * from country; name iso2 iso3 isoNum ------------------- ---- ---- ------ Afghanistan AF AFG 4 Åland Islands AX ALA 248
(4) By anonymous on 2022-03-30 13:37:48 in reply to 2 [link] [source]
I am using Windows & simply starting SQLite3.EXE without any start-up parameters.
SELECT hex('Åland Islands');
prompts me for the end of statement character.
In 22. Command-line Options I do not see an option for starting SQLite3.EXE with UTF-8 character set: presumably, there is a way. What is it?
(9) By Larry Brasfield (larrybr) on 2022-03-30 14:19:14 in reply to 4 [link] [source]
The CLI has no such option. It reads stdin to get interactive input, and writes either UTF-8 (or, on Windows, UTF-16 sometimesa) to stdout. It deals with UTF-8 internally, without exception. (The DB it wraps is another matter.)
Your problem, and the challenge to getting past it, is how to type or paste UTF-8 as input, apparently. The CLI does nothing for input except read it in and treat it as UTF-8 text. This is done exactly the same way for stdin as it is done when .read FILE is used. And, as we have seen, that works without character set issues.
[a. On Windows, when stdout is seen to be the console, the UTF-8 to be output is translated to what the Windows Console API demands. This is not germane to your problem, which is input going awry rather than output. ]
(10) By Larry Brasfield (larrybr) on 2022-03-30 14:28:40 in reply to 4 [link] [source]
A work-around for your strange platform is:
Run the SQLite CLI shell as an inferior shell process in Emacs running in GUI mode. You will want to pass the -interactive flag to sqlite3.exe if you like to see prompts in interactive use.
(3) By Larry Brasfield (larrybr) on 2022-03-30 13:35:09 in reply to 1 [source]
Gunter has his finger on the issue.
If I create a proper, UTF-8 file with the content you posted, in one editor I use (mg, a micro-Emacs descendent), the last line renders as:
VALUES ('Afghanistan','AF','AFG',4),('\303\202land Islands','AX','ALA',248);
.
Then, after I .read it into the shell and query, I get:
select * from tblCountry;
Afghanistan|AF|AFG|4
Âland Islands|AX|ALA|248
.
There are highly platform/tool dependent reasons that your cap A with circumflex never made it to the either the SQLite CLI or the library it exposes. I think that to diagnose such issues, we would need to know: Your platform; what terminal (or console, etc.) you are using the CLI with; how you entered that text that is not reaching the CLI intact.
(5) By anonymous on 2022-03-30 13:49:47 in reply to 3 [link] [source]
When I save my script from NOTEPAD using Encoding UTF-8, the script works.
sqlite> .read e:/temp/sql.txt
Country ISO2 ISO3 ISONum
------------- ---- ---- ------
Afghanistan AF AFG 4
Åland Islands AX ALA 248
Is there a way to start SQLite3.EXE with UTF-8 encoding on Windows?
(6) By Simon Slavin (slavin) on 2022-03-30 14:01:54 in reply to 5 [link] [source]
One way to do it is to use the 'chcp' command:
https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/chcp
This may or may not help with your problem. First I would issue that command without arguments to find out what the current setting is.
(7) By anonymous on 2022-03-30 14:10:01 in reply to 6 [link] [source]
I get the following:
sqlite> .shell chcp
Active code page: 65001
chcp expect an integer:
sqlite> .shell chcp /?
Displays or sets the active code page number.
CHCP [nnn]
nnn Specifies a code page number.
Type CHCP without a parameter to display the active code page number.
What is nnn for UTF-8?
(8) By anonymous on 2022-03-30 14:13:00 in reply to 6 [link] [source]
A quick Google search, code page 65001 is UTF-8!
(I 'm back to not understanding my original query fails.)