History of test/fts4unicode.test

2017-05-30
18:14
Omit a test of codepoint 0x202F (non-break narrow space) from the fts3 ICU tests. Different versions of ICU apparently handle this obscure codepoint slightly differently. file: [ceca7642] check-in: [69ae6889] user: dan branch: trunk, size: 18318
2015-06-15
16:40
Adjust ICU tests to account for recent changes in the official Unicode definition of whitespace. file: [27378af7] check-in: [08165253] user: drh branch: trunk, size: 18293
2013-12-19
16:26
Make sure errors encountered while initializing extensions such as FTS4 get reported out from sqlite3_open(). This fixes a bug introduced by check-in [9d347f547e7ba9]. Also remove lots of forgotten "breakpoint" commands left in test scripts over the years. file: [01ec3fe2] check-in: [ca3fdfd4] user: drh branch: trunk, size: 18034
2013-10-12
00:56
Fix Unicode character encoding issues on Windows in the fts4unicode test file. file: [e28ba1a1] check-in: [c9310c9a] user: mistachkin branch: trunk, size: 18049
2013-10-11
22:17
Fix test numbering. file: [20195bca] check-in: [cef39f69] user: mistachkin branch: trunk, size: 18017
2013-09-18
11:16
Test that the unicode61 tokenchars= and separators= options work with the fts3tokenize virtual table. file: [ebd93706] check-in: [ed240514] user: dan branch: trunk, size: 18017
2013-09-13
12:10
Add tests for the fts4 unicode61 tokenchars and separators options. file: [26a0bd30] check-in: [9ce6f40d] user: dan branch: trunk, size: 17692
2013-08-30
13:29
Add a test for fts4 unicode61 option remove_diacritics=0. file: [5fa8e0a7] check-in: [6bf7ae6f] user: dan branch: trunk, size: 14332
2013-06-05
16:17
Up until now the fts4 "unicode61" tokenizer has treated all private use codepoints except the first and last of each of the three ranges as alphanumeric (eligible to be part of tokens). This commit fixes this so that all private use codepoints are considered alphanumeric. In other words, it fixes the handling of codepoints 0xE000, 0xF8FF, 0xF0000, 0xFFFFD, 0x100000 and 0x10FFFD. file: [c8ac4421] check-in: [6cfd9af5] user: dan branch: trunk, size: 13231
2013-01-26
19:26
Add a single test case to fts4unicode.test to verify that title-case maps to lower case. file: [25ccad45] check-in: [46f7c930] user: drh branch: branch-3.7.15, size: 12656
2012-06-19
06:35
Add tests to check that the "unicode61" and "icu" tokenizers both identify white-space codepoints outside the ASCII range. file: [aad033ab] check-in: [bfb2d473] user: dan branch: trunk, size: 12545
2012-06-07
15:53
Add the "tokenchars=" and "separators=" options, for customizing the set of characters considered to be token separators, to the unicode61 tokenizer. file: [247e6c64] check-in: [e56fb462] user: dan branch: trunk, size: 10555
2012-06-06
19:30
Have the FTS unicode61 strip out diacritics when tokenizing text. This can be disabled by specifying the tokenizer option "remove_diacritics=0". file: [f3945851] check-in: [790f76a5] user: dan branch: trunk, size: 7861
2012-05-26
18:28
If SQLITE_DISABLE_FTS3_UNICODE is defined, do not build the "unicode61" tokenizer. file: [c812e9cf] check-in: [e71495a8] user: dan branch: fts4-unicode, size: 7265
16:22
Add coverage tests for fts3_unicode.c. file: [dd0b67a2] check-in: [07d3ea8a] user: dan branch: fts4-unicode, size: 7257
14:54
Change the name of the "unicode" tokenizer to "unicode61" to emphasize that the case folding and separator-character identification routines are based on unicode version 6.1. file: [073546a1] check-in: [8f3e60aa] user: dan branch: fts4-unicode, size: 4793
2012-05-25
17:50
Add an experimental tokenizer to fts4 - "unicode". This tokenizer works in the same way as the "simple" tokenizer except that it understands unicode "simple case folding" and recognizes all characters not classified as "Letters" or "Numbers" by unicode as token separators. file: [0627683f] check-in: [0c13570e] user: dan branch: fts4-unicode, size: 1399 Added