SQLite Forum

Searching in text columns with Unicode folding
Login
I'm [still](https://sqlite.org/forum/info/16b3e8bbe5f27f0f) struggling to achieve Unicode aware, case insensitive LIKE searching using an index. This isn't exotic and should be easy! Think about names containing letters with diaeresis in languages like Spanish, German, French, Turkish etc. The [FAQ entry](https://sqlite.org/faq.html#q18) isn't quite satisfying.

A **COLLATE NOCASE** index will only handle ASCII. What I really would like to see is a COLLATE UNICODE (or something similar) that will simply do the right thing. Is there any chance of this happening (in the foreseeable future)? The necessary machinery seems to be already available in the FTS5 extension in the unicode61 tokenizer, see below.

The **icu extension** doesn't really help, because it replaces the SQL like() function which automatically [disables](https://www.sqlite.org/optoverview.html#the_like_optimization) the use of any index for the LIKE operator, regardless of the collation, so this will slow down with large data sets. As it affects every LIKE operator, not just specific ones, it might have unwanted side effects.

The [**FTS5 extension**](https://sqlite.org/fts5.html) does Unicode folding. So I can create an FTS5 virtual table and use that for searching.  I'm not sure about the overhead (less with the options detail=column and content/content_rowid), but at least it is usable right now. It makes the queries a little more complicated, though, as both the original table and the virtual table are involved.

Another option I see is to create **"shadow" columns** for the text fields in the same table and automatically fill them by generating and indexing it like this:

```
CREATE TABLE t1 (
  name TEXT NOT NULL,
  name2 TEXT NOT NULL AS (fold_unicode(name)) STORED
);
CREATE INDEX t1_name2 ON t1(name2 COLLATE NOCASE);
SELECT rowid, name FROM t1 WHERE name2 LIKE ?;
```

The missing piece is the folding function, which exists buried somewhere in the FTS5 extension. Unfortunately, it is not exposed as an SQL function.  How about that?  I could implement it myself, but it won't be as complete and the database won't be usable without loading the custom extension.

Are there any (simpler) other options I have overlooked?