History of test/fts4unicode.test

2017-05-30
18:14
[ceca7642] part of check-in [69ae6889] Omit a test of codepoint 0x202F (narrow no-break space) from the fts3 ICU tests. Different versions of ICU apparently handle this obscure codepoint slightly differently. (check-in: [69ae6889] user: dan branch: trunk, size: 18318)
2015-06-15
16:40
[27378af7] part of check-in [08165253] Adjust ICU tests to account for recent changes in the official Unicode definition of whitespace. (check-in: [08165253] user: drh branch: trunk, size: 18293)
2013-12-19
16:26
[01ec3fe2] part of check-in [ca3fdfd4] Make sure errors encountered while initializing extensions such as FTS4 get reported out from sqlite3_open(). This fixes a bug introduced by check-in [9d347f547e7ba9]. Also remove lots of forgotten "breakpoint" commands left in test scripts over the years. (check-in: [ca3fdfd4] user: drh branch: trunk, size: 18034)
2013-10-12
00:56
[e28ba1a1] part of check-in [c9310c9a] Fix Unicode character encoding issues on Windows in the fts4unicode test file. (check-in: [c9310c9a] user: mistachkin branch: trunk, size: 18049)
2013-10-11
22:17
[20195bca] part of check-in [cef39f69] Fix test numbering. (check-in: [cef39f69] user: mistachkin branch: trunk, size: 18017)
2013-09-18
11:16
[ebd93706] part of check-in [ed240514] Test that the unicode61 tokenchars= and separators= options work with the fts3tokenize virtual table. (check-in: [ed240514] user: dan branch: trunk, size: 18017)
2013-09-13
12:10
[26a0bd30] part of check-in [9ce6f40d] Add tests for the fts4 unicode61 tokenchars and separators options. (check-in: [9ce6f40d] user: dan branch: trunk, size: 17692)
2013-08-30
13:29
[5fa8e0a7] part of check-in [6bf7ae6f] Add a test for fts4 unicode61 option remove_diacritics=0. (check-in: [6bf7ae6f] user: dan branch: trunk, size: 14332)
2013-06-05
16:17
[c8ac4421] part of check-in [6cfd9af5] Up until now the fts4 "unicode61" tokenizer has treated all private use codepoints except the first and last of each of the three ranges as alphanumeric (eligible to be part of tokens). This commit fixes this so that all private use codepoints are considered alphanumeric. In other words, it fixes the handling of codepoints 0xE000, 0xF8FF, 0xF0000, 0xFFFFD, 0x100000 and 0x10FFFD. (check-in: [6cfd9af5] user: dan branch: trunk, size: 13231)
2013-01-26
19:26
[25ccad45] part of check-in [46f7c930] Add a single test case to fts4unicode.test to verify that title-case maps to lower case. (check-in: [46f7c930] user: drh branch: branch-3.7.15, size: 12656)
2012-06-19
06:35
[aad033ab] part of check-in [bfb2d473] Add tests to check that the "unicode61" and "icu" tokenizers both identify white-space codepoints outside the ASCII range. (check-in: [bfb2d473] user: dan branch: trunk, size: 12545)
2012-06-07
15:53
[247e6c64] part of check-in [e56fb462] Add the "tokenchars=" and "separators=" options, for customizing the set of characters considered to be token separators, to the unicode61 tokenizer. (check-in: [e56fb462] user: dan branch: trunk, size: 10555)
2012-06-06
19:30
[f3945851] part of check-in [790f76a5] Have the FTS unicode61 strip out diacritics when tokenizing text. This can be disabled by specifying the tokenizer option "remove_diacritics=0". (check-in: [790f76a5] user: dan branch: trunk, size: 7861)
2012-05-26
18:28
[c812e9cf] part of check-in [e71495a8] If SQLITE_DISABLE_FTS3_UNICODE is defined, do not build the "unicode61" tokenizer. (check-in: [e71495a8] user: dan branch: fts4-unicode, size: 7265)
16:22
[dd0b67a2] part of check-in [07d3ea8a] Add coverage tests for fts3_unicode.c. (check-in: [07d3ea8a] user: dan branch: fts4-unicode, size: 7257)
14:54
[073546a1] part of check-in [8f3e60aa] Change the name of the "unicode" tokenizer to "unicode61" to emphasize that the case folding and separator-character identification routines are based on unicode version 6.1. (check-in: [8f3e60aa] user: dan branch: fts4-unicode, size: 4793)
2012-05-25
17:50
[0627683f] part of check-in [0c13570e] Add an experimental tokenizer to fts4 - "unicode". This tokenizer works in the same way as the "simple" tokenizer, except that it understands unicode "simple case folding" and recognizes all characters not classified as "Letters" or "Numbers" by unicode as token separators. (check-in: [0c13570e] user: dan branch: fts4-unicode, size: 1399) Added