SQLite User Forum

Windows 10 RS3 and later builtin ICU

(1.1) By Keith Medcalf (kmedcalf) on 2022-09-15 17:58:31 edited from 1.0

Windows 10 RS3 (and later) contains the ICU library built-in. This library can be used by the SQLite3 ICU extension(s).

You must have Windows 10 SDK 10.0.19041.0 or later installed.

The header is <icu.h> (#include <icu.h>) and the import library is icu.lib (for icu.dll). The include and lib directories are already set if you are using MSVC. For MinGW GCC 10.2 there is the additional requirement that the following be defined before <windows.h> or <winnt.h> is included:

#define _WIN32_WINNT 0x0A00
#define WINVER _WIN32_WINNT
#define NTDDI_VERSION 0x0A000004
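The three values above are not independent. As far as I can tell from <sdkddkver.h>, an NTDDI value carries the _WIN32_WINNT-style OS version in its high 16 bits and the release sub-version in its low byte, so NTDDI_WIN10_RS3 (0x0A000004) is OS 0x0A00, sub-version 4. The sketch below checks that a _WIN32_WINNT / NTDDI_VERSION pair agrees and is at least RS3. The MY_* macros and ntddi_pair_ok() are hypothetical names mirroring my understanding of the SDK masks; they are not SDK code.

```c
#include <assert.h>

/* Hypothetical names: these mirror my understanding of the masks in
** <sdkddkver.h>; they are not taken from the SDK. */
#define MY_NTDDI_WIN10_RS3   0x0A000004UL          /* same value as NTDDI_WIN10_RS3 */
#define MY_WIN32_WINNT_WIN10 0x0A00UL              /* same value as _WIN32_WINNT above */
#define MY_OSVER(v)          ((v) & 0xFFFF0000UL)  /* OS version part */
#define MY_WINNTVER(v)       (MY_OSVER(v) >> 16)   /* OS part as a _WIN32_WINNT value */
#define MY_SUBVER(v)         ((v) & 0x000000FFUL)  /* release sub-version */

/* Returns 1 if the pair agrees on the OS version and the NTDDI value is
** at least RS3 (new enough for <icu.h> per this thread), otherwise 0. */
int ntddi_pair_ok(unsigned long win32_winnt, unsigned long ntddi_version){
  if( MY_WINNTVER(ntddi_version) != win32_winnt ) return 0;
  return ntddi_version >= MY_NTDDI_WIN10_RS3;
}

/* Compile-time check that the constants used above are consistent. */
_Static_assert(MY_WINNTVER(MY_NTDDI_WIN10_RS3) == MY_WIN32_WINNT_WIN10,
               "high word of NTDDI_WIN10_RS3 should be 0x0A00");
_Static_assert(MY_SUBVER(MY_NTDDI_WIN10_RS3) == 4,
               "RS3 is sub-version 4 of Windows 10");
```

For example, ntddi_pair_ok(0x0A00, 0x0A000004) accepts the values above, while a mistyped _WIN32_WINNT of 0xA000 is rejected.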

The following code should replace the loading of the ICU headers in ext/icu/icu.c, ext/fts2/fts2_icu.c and ext/fts3/fts3_icu.c:

/* Include ICU headers */
#ifdef _WIN32
#if (NTDDI_VERSION >= NTDDI_WIN10_RS3)
#include <icu.h>
#endif
#else
#include <unicode/utypes.h>
#include <unicode/uregex.h>
#include <unicode/ustring.h>
#include <unicode/ucol.h>
#endif

so that if the target is Windows and NTDDI_VERSION is new enough (RS3 or later), the built-in <icu.h> header is used rather than the <unicode/...> ones.

FOR MINGW GCC 10.2 (AND PERHAPS OTHER SIMILAR COMPILERS)

The path to the SDK include directory must be added using -idirafter rather than -I so as not to screw up the system header search path (for example, -idirafter "C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um").

(2) By Keith Medcalf (kmedcalf) on 2022-09-15 19:30:55 in reply to 1.1

Note that the scheme above for selecting the correct headers mostly works, but is not correct. It needs to work as follows (and note that any of the symbols might be undefined):

if the target is Windows (aka _WIN32) and (NTDDI_VERSION >= NTDDI_WIN10_RS3): use <icu.h> else use <unicode/...> endif
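To make the intended rule concrete, here is a hypothetical runtime model of the same decision. choose_icu_headers() and its enum are illustration-only names; in the real code the choice is made entirely by the preprocessor. What it shows is that every path that does not select <icu.h>, including a Windows target older than RS3 or one where NTDDI_VERSION is undefined, should fall back to the <unicode/...> headers rather than include nothing.

```c
#include <assert.h>
#include <stdbool.h>

#define RS3_NTDDI 0x0A000004UL   /* value of NTDDI_WIN10_RS3 used in this thread */

/* Hypothetical enum naming the two possible header sets. */
typedef enum { USE_UNICODE_HEADERS, USE_WINDOWS_ICU_H } icu_header_choice;

/* is_win:    target is Windows (_WIN32 defined)
** has_ntddi: NTDDI_VERSION is defined at all
** ntddi:     its value when defined (ignored otherwise) */
icu_header_choice choose_icu_headers(bool is_win, bool has_ntddi,
                                     unsigned long ntddi){
  /* An undefined NTDDI_VERSION is treated as "too old", never as an error. */
  if( is_win && has_ntddi && ntddi >= RS3_NTDDI ){
    return USE_WINDOWS_ICU_H;
  }
  /* Every other case, including Windows targets older than RS3,
  ** falls back to the standalone <unicode/...> headers. */
  return USE_UNICODE_HEADERS;
}
```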

(3) By jose isaias cabrera (jicman) on 2022-09-16 12:39:58 in reply to 1.1

Hi Keith. Is there a link with instructions showing how to build it on Windows? I use Cygwin to build SQLite, so I would like to try and see if I can also build it with that option. Thanks.

(4.1) By Keith Medcalf (kmedcalf) on 2022-09-16 14:27:42 edited from 4.0 in reply to 3

Firstly, you need to make sure that you have the Windows 10 SDK installed (at least version 10.0.19041.0). Unless you change the location, it installs to C:\Program Files (x86)\Windows Kits\10\. Check that the following files exist:

C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um\icu.h
and at least one of the following to match your target:
C:\Program Files (x86)\Windows Kits\10\Lib\10.0.19041.0\um\arm\icu.lib
C:\Program Files (x86)\Windows Kits\10\Lib\10.0.19041.0\um\arm64\icu.lib
C:\Program Files (x86)\Windows Kits\10\Lib\10.0.19041.0\um\x64\icu.lib
C:\Program Files (x86)\Windows Kits\10\Lib\10.0.19041.0\um\x86\icu.lib

The actual DLL is located in the Windows system directories (System32 for 64-bit processes, SysWOW64 for 32-bit processes on 64-bit Windows):

C:\Windows\System32\icu.dll
C:\Windows\SysWOW64\icu.dll

Now you need to modify the source code (what you are compiling) and how you compile it.

As an aside, the following replacement for the three places where the ICU headers are loaded (icu.c, fts2_icu.c and fts3_icu.c, or wherever those files are inserted into the amalgamation) appears to work:

/* Include ICU headers */
#if defined(_WIN32) && defined(NTDDI_VERSION) && defined(NTDDI_WIN10_RS3) && (NTDDI_VERSION >= NTDDI_WIN10_RS3)
#include <icu.h>
#else
#include <unicode/utypes.h>
#include <unicode/uregex.h>
#include <unicode/ustring.h>
#include <unicode/ucol.h>
#endif

As a starting point I would suggest just copying and modifying the single-file ext/icu/icu.c, which you can build as a loadable extension, to make sure it works.

So, take icu.c and fix the ICU header include as above.
Rename the file to icuwin.c and change the init function name at the end (sqlite3_icu_init -> sqlite3_icuwin_init).

You cannot have a loadable extension called icu.dll that loads a different icu.dll without specialized coding. Changing the name is easy (and only required for a loadable extension build in this case).
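The rename matters because of how SQLite picks the entry point when .load (or sqlite3_load_extension()) is not given one: per the SQLite documentation it first tries sqlite3_extension_init, then builds sqlite3_X_init from the lower-cased ASCII letters of the file name, from the last directory separator up to the first '.', dropping a leading "lib". The helper below is a hypothetical re-implementation of that documented rule for illustration only (guess_entry_point() is a made-up name, and it always treats both '/' and '\' as separators, a simplification). It shows why icuwin.dll maps to sqlite3_icuwin_init while icu.dll would map to sqlite3_icu_init.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the guessed entry-point name for zFile into
** out (size n). Returns 0 on success, -1 if the buffer is too small. */
int guess_entry_point(const char *zFile, char *out, size_t n){
  const char *p = zFile + strlen(zFile);
  size_t len;
  assert(out != NULL && n > 0);
  /* Walk back to just after the last '/' or '\'. */
  while( p > zFile && p[-1] != '/' && p[-1] != '\\' ) p--;
  /* Drop an initial "lib", compared case-insensitively. */
  if( tolower((unsigned char)p[0]) == 'l' && tolower((unsigned char)p[1]) == 'i'
   && tolower((unsigned char)p[2]) == 'b' ){
    p += 3;
  }
  len = (size_t)snprintf(out, n, "sqlite3_");
  /* Keep only ASCII letters, lower-cased, up to the first '.'. */
  for( ; *p && *p != '.'; p++ ){
    if( isalpha((unsigned char)*p) ){
      if( len + 1 >= n ) return -1;
      out[len++] = (char)tolower((unsigned char)*p);
    }
  }
  /* Append "_init" plus its terminating NUL. */
  if( len + sizeof("_init") > n ) return -1;
  memcpy(out + len, "_init", sizeof("_init"));
  return 0;
}
```

For example, guess_entry_point("C:\\ext\\icuwin.dll", buf, sizeof buf) yields "sqlite3_icuwin_init", which is the function name the renamed icuwin.c has to export.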

You may need to add the following to the compiler invocation line:

-D_WIN32_WINNT=0x0A00 -DWINVER=0x0A00 -DNTDDI_VERSION=0x0A000004 -DNTDDI_WIN10_RS3=0x0A000004

(icu.c does not #include <windows.h>, which is why the Windows symbols are not defined by default.)

You need to somehow append C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um\ to the include path your compiler is using (append, not prepend). This is compiler-specific: standard GCC uses -idirafter. Or you can cheat by copying the icu.h file to somewhere that is already in the include search path.

Then you need to link against the appropriate .lib/.dll file. How you do this is also compiler-dependent. GCC uses -L<path> to append a library search path, and can link against .lib files directly (e.g. -licu).

You should then be able to fiddle with compiling icuwin.c to generate a loadable extension, and then load it.

.load icuwin
select icu_load_collation('und-u-ks-level1-kc-false', 'Folded');
select char(0xE9)=='E', char(0xE9)=='E' collate Folded;

Which should result in:

sqlite> select char(0xE9)=='E', char(0xE9)=='E' collate Folded;
┌─────────────────┬────────────────────────────────┐
│ char(0xE9)=='E' │ char(0xE9)=='E' collate Folded │
├─────────────────┼────────────────────────────────┤
│ 0               │ 1                              │
└─────────────────┴────────────────────────────────┘

Once that is working, simply apply the same changes to however you are building SQLite3.

This is the best I have at the moment. If you can fiddle and get it working then we should be able to arrive at a "properly working" patch for the source tree so that the amalgamation will be generated with the appropriate detections built-in.

(5) By jose isaias cabrera (jicman) on 2022-09-16 13:38:43 in reply to 4.0

Thanks for this. I will post problems, if any, here. I appreciate your support efforts.

(7) By Keith Medcalf (kmedcalf) on 2022-09-18 20:03:07 in reply to 4.1

This works better to detect whether or not the Windows built-in ICU can be used, and should be used in all three places where ICU is loaded.

If SQLITE_OS_WIN is undefined (meaning this is not compiled as part of the amalgamation), then SQLITE_OS_WIN is defined as 1 or 0 using the same detection method as the core.

If SQLITE_OS_WIN is set and either NTDDI_WIN10_RS3 or NTDDI_VERSION is not defined, then the <sdkddkver.h> header is processed (this is the Windows header that defines the various SDK/DDK versions).

If NTDDI_WIN10_RS3 is still not defined, it is defined as 0x0A000004.
If NTDDI_VERSION is still not defined, it is defined as 0x00000000.

Then, if SQLITE_OS_WIN is set and the target NTDDI_VERSION is sufficient, then the <icu.h> header is loaded, otherwise the standalone icu <unicode/...> headers are used.

/* Include ICU headers */
#if !defined(SQLITE_OS_WIN)
# if defined(_WIN32) || defined(WIN32) || defined(__CYGWIN__) || defined(__MINGW32__) || defined(__BORLANDC__)
#  define SQLITE_OS_WIN 1
# else
#  define SQLITE_OS_WIN 0
# endif
#endif
#if SQLITE_OS_WIN && (!defined(NTDDI_WIN10_RS3) || !defined(NTDDI_VERSION))
# include <sdkddkver.h>
#endif
#ifndef NTDDI_WIN10_RS3
# define NTDDI_WIN10_RS3 0x0A000004
#endif
#ifndef NTDDI_VERSION
# define NTDDI_VERSION 0x0000000
#endif
#if SQLITE_OS_WIN && (NTDDI_VERSION >= NTDDI_WIN10_RS3)
# include <icu.h>
#else
#include <unicode/utypes.h>
#include <unicode/uregex.h>
#include <unicode/ustring.h>
#include <unicode/ucol.h>
#endif

You can find a fixed copy of icu.c at http://www.dessus.com/files/icu.c

Note that I have added some functions and made some additional changes, including duplicating sqlite3_icu_init as sqlite3_icuwin_init, so that you can merely compile the code and rename the output loadable extension to icuwin.dll.

(6) By Keith Medcalf (kmedcalf) on 2022-09-16 13:46:45 in reply to 3

As an aside, if you are not familiar with ICU (I am not), you can specify a whole crapload of languages and flags when creating a collation.

The first part of the specification is the language locale: en, ja, da, de, etc.; und is the undefined (generic) language.

Then you can add a bunch of "option overrides"; the most useful ones are:
-u-ks-level1-kc-false case and accent folded
-u-ks-level1-kc-true accent folded
-u-ks-level2-kc-false case folded

'und-u-ks-level1-kc-false' is the generic language case and accent folding collation.
'en-u-ks-level1-kc-false' is the English language case and accent folding collation.
'de-u-ks-level1-kc-false' is the German language case and accent folding collation.
'de' is the German language full collation.
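Since these strings are just a language tag followed by the BCP 47 "-u-" extension keys ks (collation strength) and kc (case level), they can be assembled mechanically. The sketch below is a hypothetical helper, collation_locale(), that maps the three folding choices listed above (plus "no folding") onto those strings. It is not part of icu.c, and it only covers the combinations given in this post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "<lang>[-u-ks-levelN-kc-...]" into out.
** fold_case / fold_accents pick one of the overrides listed above; with
** neither set, the bare language tag (full collation) is produced.
** Returns the snprintf result, so a value >= n means truncation. */
int collation_locale(const char *lang, bool fold_case, bool fold_accents,
                     char *out, size_t n){
  assert(lang != NULL && out != NULL);
  if( fold_accents ){
    /* level1 ignores accents (and case); kc-true brings case back. */
    return snprintf(out, n, "%s-u-ks-level1-kc-%s", lang,
                    fold_case ? "false" : "true");
  }
  if( fold_case ){
    /* level2 still distinguishes accents but ignores case. */
    return snprintf(out, n, "%s-u-ks-level2-kc-false", lang);
  }
  return snprintf(out, n, "%s", lang);
}
```

The resulting string goes in the first argument of icu_load_collation(), the SQL function registered by ext/icu/icu.c; for example, collation_locale("de", true, false, buf, sizeof buf) gives 'de-u-ks-level2-kc-false' for a case-folded German collation.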