SQLite Forum

Timeline
Login

3 forum posts by user DaveBlake

2020-05-06
21:47 Reply: Natural sort order (artifact: e4dc6f3331 user: DaveBlake)
Thanks TripeHound, I am aware that "accent folding" is not the same as full ICU locale specific collation. However the need is to reproduce the same collation using SQLlite as utf8_general_ci in MySQL (now called utf8mb3_general_ci in MariaDB). That is just simple accent folding.
17:33 Reply: Natural sort order (artifact: f1ae53ffcb user: DaveBlake)

Richard having natural sorting of this kind available as a standard collation in SQLite would be useful.

Actually I am working on a bespoke collation that handles number naturally like this, is case insensitive and does some simple accent folding so 'a', 'á' and 'À' are with 'A'. However the result is slower than I would like, and I think that is partly because of the way I am converting from the const void data of the collation callback function into wstring before processing compare even though most of the data is simple ascii.Need to give that more thought.

2020-03-27
09:24 Post: Custom Collation of a calculated field (artifact: df1aafaa43 user: DaveBlake)

What are the advantages and disadvantages of applying custom collation as part of the table definition rather than applying it to the field as part of the query syntax?

Would default column collation pass to any calculated fields based on that column? I suspect not.

A simple example may clarify. Say I have a created collating function

sqlite3_create_collation(conn, "my_collation", SQLITE_UTF8, 0, myCollatingFunc)
and have table
CREATE myTable(id INTEGER PRIMARY KEY, name1 TEXT, name2 TEXT);
When fetching data I want to sort by a calculated field using the custom collation. I have the following query working

SELECT id, name1, name2, 
  CASE 
     WHEN name2 is not null THEN name2
     WHEN name1 LIKE 'the %' THEN  SUBSTR(name1, 5)    
     WHEN name1 LIKE 'a %' THEN  SUBSTR(name1, 3) 
     WHEN name1 LIKE 'an %' THEN  SUBSTR(name1, 4) 
     ELSE name1
  END AS sortname
ORDER BY sortname COLLATE my_collation;

So far so good.

The main role of the collating function is to give natural number sorting for digits anywhere in the text e.g. "ab3x" before "ab100", it makes no difference to comparison in where clauses e.g "name1 = xxxx". Does defining default column collation have anything to offer this situation?

For the future.... The collating funtion also handles case sensitivity for unicode chars, something that could be nice to have applied in where clauses for LIKE comparisons, create a like function too? Then again using the ICU extension is a future possibility as well, but would I still be able to do natural number sorting? I would need to look closely at the performance impact of doing any of that. It may be that simply storing a derived ascii sortable value (that has the artcles "the", "an" etc. stripped too) is the best way to go.

Would like to know the pros and cons and any experiences of doing similar.