SQLite

All files named "ext/fts3/fts3_unicode2.c"

History for ext/fts3/fts3_unicode2.c

2020-07-29
16:18
[eb122763e3] part of check-in [a80ae2c98b] Dozens and dozens of typo fixes in comments. This change adds no value to the end product and is disruptive, so it is questionable whether or not it will ever land on trunk. (check-in: [a80ae2c98b] user: drh branch: typos, size: 17908)
2019-01-02
23:49
[416eb7e1e8] part of check-in [703029ac6d] Fix harmless compiler warnings in the unicode2 logic of FTS3 and FTS5. (check-in: [703029ac6d] user: drh branch: trunk, size: 17907)
2018-12-28
07:37
[faf0c750a1] part of check-in [c564bf8701] Fix problems in fts5 found by ASAN. (check-in: [c564bf8701] user: dan branch: trunk, size: 17890)
2018-12-14
18:11
[e49f9e015f] part of check-in [27221c6990] Fix harmless compiler warnings. (check-in: [27221c6990] user: drh branch: trunk, size: 17857)
2018-12-03
16:14
[90e65f4291] part of check-in [06177f3f11] Add the "remove_diacritics=2" option to the unicode61 tokenizer in both FTS5 and FTS3/4. (check-in: [06177f3f11] user: dan branch: trunk, size: 17835)
2017-03-20
18:53
[cc04fc672b] part of check-in [16a8e84fa7] Fix some problems in fts3 found by address-sanitizer. (check-in: [16a8e84fa7] user: dan branch: trunk, size: 16696)
2014-08-06
18:50
[c3d01968d4] part of check-in [bcf6d775f9] A couple more harmless compiler warnings eliminated. (check-in: [bcf6d775f9] user: drh branch: trunk, size: 16666)
17:49
[b81e5f0f79] part of check-in [a2a60307ea] Fix two more harmless compiler warnings. Make sure the fts3_unicode2.c file is in sync with mkunicode.tcl. (check-in: [a2a60307ea] user: drh branch: trunk, size: 16663)
2014-07-03
12:18
[c8adda75aa] part of check-in [0cc0230ae9] Change fts3/4 so that the "unicode61" is included in builds by default. It may now be excluded by defining SQLITE_DISABLE_FTS3_UNICODE. (check-in: [0cc0230ae9] user: dan branch: trunk, size: 16663)
2013-06-05
16:17
[0113d3acf1] part of check-in [6cfd9af525] Up until now the fts4 "unicode61" tokenizer has treated all private use codepoints except the first and last of each of the three ranges as alphanumeric (eligible to be part of tokens). This commit fixes this so that all private use codepoints are considered alphanumeric. In other words, it fixes the handling of codepoints 0xE000, 0xF8FF, 0xF0000, 0xFFFFD, 0x100000 and 0x10FFFD. (check-in: [6cfd9af525] user: dan branch: trunk, size: 16670)
2012-06-18
08:00
[a863f05f75] part of check-in [f970a3de61] Fix a few compilation issues that can occur with certain compilers (e.g. GCC 2.95.3, MSVC). (check-in: [f970a3de61] user: mistachkin branch: compiler-compat, size: 16734)
2012-06-06
19:51
[2965d217c3] part of check-in [eccd6b6580] Disable FTS unicode61 by default. It is enabled by specifying compile time option SQLITE_ENABLE_FTS4_UNICODE61. (check-in: [eccd6b6580] user: dan branch: trunk, size: 16718)
19:30
[6381bcfd62] part of check-in [790f76a589] Have the FTS unicode61 strip out diacritics when tokenizing text. This can be disabled by specifying the tokenizer option "remove_diacritics=0". (check-in: [790f76a589] user: dan branch: trunk, size: 16717)
2012-05-28
12:22
[3ddf1728a3] part of check-in [c00bb5d460] Omit the fts3 unicode character class routines from the build if fts3/4 is disabled. (check-in: [c00bb5d460] user: drh branch: trunk, size: 13984)
2012-05-26
18:28
[46ff2289f5] part of check-in [e71495a817] If SQLITE_DISABLE_FTS3_UNICODE is defined, do not build the "unicode61" tokenizer. (check-in: [e71495a817] user: dan branch: fts4-unicode, size: 13840)
17:57
[e43024fe05] part of check-in [b89d3834f6] Change the format of the tables used by sqlite3FtsUnicodeTolower() to make them a little smaller. (check-in: [b89d3834f6] user: dan branch: fts4-unicode, size: 13755)
2012-05-25
19:50
[75fa8f249a] part of check-in [cf7b25d476] Add special fast paths to sqlite3FtsUnicodeTolower() and Isalnum() for codepoints in the ASCII range. (check-in: [cf7b25d476] user: dan branch: fts4-unicode, size: 13179)
18:48
[6989db92af] part of check-in [3dc567ef47] Fix comments in generated file fts3_unicode2.c. (check-in: [3dc567ef47] user: dan branch: fts4-unicode, size: 12930)
17:50
Added: [83ad4e6a2e] part of check-in [0c13570ec7] Add an experimental tokenizer to fts4 - "unicode". This tokenizer works in the same way except that it understands unicode "simple case folding" and recognizes all characters not classified as "Letters" or "Numbers" by unicode as token separators. (check-in: [0c13570ec7] user: dan branch: fts4-unicode, size: 11358)