SQLite Forum

Custom Collation of a calculated field
Login

Custom Collation of a calculated field

(1) By Dave B (DaveBlake) on 2020-03-27 09:24:14 [source]

What are the advantages and disadvantages of applying custom collation as part of the table definition rather than applying it to the field as part of the query syntax?

Would default column collation pass to any calculated fields based on that column? I suspect not.

A simple example may clarify. Say I have a created collating function

sqlite3_create_collation(conn, "my_collation", SQLITE_UTF8, 0, myCollatingFunc)
and have table
CREATE myTable(id INTEGER PRIMARY KEY, name1 TEXT, name2 TEXT);
When fetching data I want to sort by a calculated field using the custom collation. I have the following query working

SELECT id, name1, name2, 
  CASE 
     WHEN name2 is not null THEN name2
     WHEN name1 LIKE 'the %' THEN  SUBSTR(name1, 5)    
     WHEN name1 LIKE 'a %' THEN  SUBSTR(name1, 3) 
     WHEN name1 LIKE 'an %' THEN  SUBSTR(name1, 4) 
     ELSE name1
  END AS sortname
ORDER BY sortname COLLATE my_collation;

So far so good.

The main role of the collating function is to give natural number sorting for digits anywhere in the text e.g. "ab3x" before "ab100", it makes no difference to comparison in where clauses e.g "name1 = xxxx". Does defining default column collation have anything to offer this situation?

For the future.... The collating funtion also handles case sensitivity for unicode chars, something that could be nice to have applied in where clauses for LIKE comparisons, create a like function too? Then again using the ICU extension is a future possibility as well, but would I still be able to do natural number sorting? I would need to look closely at the performance impact of doing any of that. It may be that simply storing a derived ascii sortable value (that has the artcles "the", "an" etc. stripped too) is the best way to go.

Would like to know the pros and cons and any experiences of doing similar.