upper and lower have unicode character problem

(1) By suat suphi (suphisuphi) on 2022-09-19 12:20:02 [link] [source]

Sqlite have a big problem with upper and lower. It is not working for unicode character. you cant search anything in it if you dont use English. This database is just for English Language

(2) By nalgeon on 2022-09-19 12:28:26 in reply to 1 [source]

You can use the unicode extension.

(3) By suat suphi (suphisuphi) on 2022-09-26 18:37:47 in reply to 2 [link] [source]

how to use it ? can you explain it ? there is no example for asp.net core project

(4) By Rishikeshan (rishikeshan) on 2022-09-27 11:26:53 in reply to 3 [link] [source]

It seems like none of the popular versions (ref. https://learn.microsoft.com/en-us/dotnet/standard/data/sqlite/custom-versions?tabs=netcore-cli ) include any Unicode support. They are not mentioned there, at least. You may have to build with ICU support on your own, unless you want to cache each locale-normalized versions (which is also possible in a separate column and use _that_ instead over the real values).

The rationale for not including Unicode/ICU case support (by default) is given here:
https://www.sqlite.org/faq.html#q18
> ...The reason for this is that doing full Unicode case-insensitive comparisons and case conversions requires tables and logic that would nearly double the size of the SQLite library. The SQLite developers reason that any application that needs full Unicode case support probably already has the necessary tables and functions and so SQLite should not take up space to duplicate this ability....

https://www.sqlite.org/src/dir?ci=trunk&name=ext/icu seems to have the ICU extensions. You may load it as an extension, it seems, like mentioned here (from DotNet): https://learn.microsoft.com/en-us/dotnet/standard/data/sqlite/extensions . It may seem intimidating first, but it's supposed to be simple (I have done it in Rust, for which I had no real experience before). It's mostly a matter of having the proper system libraries installed then it goes on with something similar to how you build a shared library (.so, .dll or .dylib). It may be platform-dependent, though, when you ship your binaries.

Regards,
Rishikeshan.
oss <at> rishikeshan <dot> com.

(5) By suat suphi (suphisuphi) on 2022-09-27 12:09:39 in reply to 4 [link] [source]

you know better. I do not agree with you. You say do not use upper lower because the database size will grow. or you say change the characters (like locale-normalized) that won't this enlarge?

In my opinion there is a commercial problem here. Fears are that a complete database will take over the market. If you dont belive me, you can release a new version (which doesn't care about growing the database) and see the results. Which one is used more?

(6.1) By KIT.james (kjames3411) on 2022-09-27 15:05:00 edited from 6.0 in reply to 5 [link] [source]

Fears are that a complete database will take over the market.

You might be more more right than many people here would like to think :)

But even if you are right, "SQLite taking over the market" would mean "death of SQLite" and so.. no one would accept that.

Do ICU yourself.

PS: I am not an SQLite developer.

(7) By Stephan Beal (stephan) on 2022-09-27 15:09:29 in reply to 5 [link] [source]

If you dont belive me, you can release a new version (which doesn't care about growing the database) and see the results. Which one is used more?

It's not about the database size, it's about the sizes of source code, compiled object code, and test code.

Which one is used more?

As there are countless billions of sqlite databases installed without ICU, it's clear that most uses do not need full Unicode support built into the library.

(8) By suat suphi (suphisuphi) on 2022-09-27 19:59:40 in reply to 7 [link] [source]

I said that you know better. I guess some database sellers is very happy with this situation :)

(9) By Warren Young (wyoung) on 2022-09-27 20:49:43 in reply to 8 [link] [source]

I said that you know better.

You're not making friends by offering snark to one of the most prolific helpers on this list.

However, I will concede that there's something to your request.

For SQLite proper, the language is C, which predates Unicode by roughly two decades, with the result that you need to go out of your way to do Unicode right. This is why the ICU extension isn't included in all builds of SQLite.

The thing is, though, this thread is about .NET Core, which has always been Unicode-native, going clear back to its pre-Core roots. Is there a good reason why the .NET builds of SQLite don't include ICU? All string I/O is going to be taking .NET UTF-16 String objects as input, and all SQLite result data is going to be translated from its on-disk form to UTF-16. Given that, what possible justification is there for smashing Unicode to whatever 8-bit encoding it presumably uses in between?

(10.3) By suat suphi (suphisuphi) on 2022-09-28 08:16:20 edited from 10.2 in reply to 9 [link] [source]

How is it working on other database. why upper and lower working on other database. why upper and lower dont work on sqlite.

https://github.com/suatsuphi/ConsoleSQLite

I tried something in this repository. There is a function on System.Data.SQLite.Core thas's name is SQLiteFunction. this is not working on Microsoft.Data.Sqlite.Core or Microsoft.EntityFrameworkCore.Sqlite.Core. Removed SQLiteFunction and name changed SQLiteConnection > SqliteConnection.

I wanna use it on codefirst is there anyway? Why is it so hard to use? I understand why there is no an example. this issue is very commercial. I said that you know better. I guess some database sellers is very happy with this situation :)

(11) By Richard Hipp (drh) on 2022-09-28 11:41:43 in reply to 10.3 [link] [source]

why upper and lower dont work on sqlite.

The definitions of "upper" and "lower" case are dynamic. They change from one release of Unicode to the next. (I suppose the definition is more stable these days than in the past, but SQLite has lived through multiple historical changes.)

If an SQLite database uses an index on upper() or lower() or has a CHECK constraint that involves upper() and lower() and if the definition of upper()/lower() change, that could corrupt the database. Hence, everybody would have to rebuild all of their SQLite databases whenever Unicode made subtle changes to the meaning of upper and lower.
I believe that the tables necessary to do proper case folding in unicode are larger than SQLite itself. Maybe this has changed in the years since I last checked, but regardless, the tables are huge.

Including massive tables to support case folding is not a problem if you are doing a multi-gigabyte install of a database server like PostgreSQL or MySQL or SQL Server. But it is a problem for a small library like SQLite.
The ICU extension is available for people who want full Unicode case-folding support.
Some operators (ex: the LIKE operator) are substantially slower when they are required to do full unicode case folding.
For better or for worse, the original language spec for SQLite says that the upper() and lower() functions only work for ASCII. Changing that now would generate an incompatibility, which might cause some of the millions of legacy applications that use SQLite to malfunction.

(12) By Rishikeshan (rishikeshan) on 2022-09-28 15:00:20 in reply to 5 [link] [source]

It's still available as an option (you have to compile yourself); it's just not included by default. Case conversions in non-English Latin scripts or others with case have been a (1) moving target and (2) somewhat 'hard' and technically complicated problems (you have to find what the _user_ wants first). There might be multiple locales installed on the system, with yet another different locale for _your_ application. Case conversions are script-dependent. You have to know what you are getting into, to even use them properly as it comes with multiple footguns.

I believe that it is beyond the scope of this project to find the correct locale as it's system dependent. This is the first step in case conversions. 

Ref: https://unicode-org.github.io/icu/userguide/transforms/casemappings.html
States: ... There are different case mappings for different locales. For instance, unlike English, the character Latin small letter ‘i’ in Turkish has an equivalent Latin capital letter ‘I’ with dot above ( \u0130 ‘İ’). ...

I do not recommend using even case folding without knowing the current preferred locale with ICU as it's locale sensitive.

See also: Unicode NFC/NFD forms.

Regards,
Rishikeshan.
oss <at> rishikeshan <dot> com.

(13) By suat suphi (suphisuphi) on 2022-09-29 15:54:51 in reply to 12 [link] [source]

I asked something

https://learn.microsoft.com/en-us/answers/questions/1028876/sqlitefunction-not-working-so-how-to-override-func.html

SQLiteFunction not working so how to override function for collate

[SQLiteFunction(FuncType = FunctionType.Collation, Name = "UTF8CI")] public class SQLiteCaseInsensitiveCollation : SQLiteFunction { public override int Compare(string x, string y) { return string.Compare(x, y, true, new CultureInfo("tr-TR")); } }

(14) By Tim Streater (Clothears) on 2022-09-29 17:15:04 in reply to 13 [link] [source]

I imagine, but don't actually know, that you have to write your function in C, not in whatever that is. BICBW.

(15) By suat suphi (suphisuphi) on 2022-10-04 14:06:14 in reply to 1 [link] [source]

I solved this issue for asp.net core. Brice Lambson said that Functions (like the ones for string.Contains and ToUpper) don't honor the collation on SQLite--they're only used for comparisons. You'd have to provide your own overrides:

SqliteConnection _connection = new SqliteConnection(Configuration.GetConnectionString("DefaultConnection"));

_connection.CreateCollation( "NOCASE", (x, y) => string.Compare(x, y, true, new CultureInfo("tr-TR")));

_connection.CreateFunction( "upper", (string value) => value.ToUpper(new CultureInfo("tr-TR")));

_connection.CreateFunction( "lower", (string value) => value.ToLower(new CultureInfo("tr-TR")));

services.AddDbContext<ApplicationDbContext>(options => options.UseSqlite(_connection));