SQLite Forum

how to collate in a union and noaccent
Login

how to collate in a union and noaccent

(1.1) By rbucker on 2021-05-01 13:57:46 edited from 1.0 [link] [source]

As is the life of an ETL programmer I'm constantly matching things that do not really match and while I can de-dupe the number of "collate" expressions when creating the schema it's not always known in advance. and so on...

I started with this example:

select * from (
select 'a'
union
select 'A'
)x

then I added the COLLATE

select * from (
select 'a'
union
select 'A'
collate nocase
)x

It's interesting to note that the results depend on the order of the 'A' and 'a'... I'm not sure it's meaningful or not.

I'm not a language expert but in the example I think the "collate nocase" is being assigned to the column so if I added a FROM the COLLATE would need to be attributed to the column and not the from... (no examples in the doc that I could see)

select * from (
select 'a' from test
union
select 'A' collate nocase from test
)x

And so here are my questions...

  • can there be multiple collate attributes(can they be stacked)?
  • I read someplace that NOACCENT was left to the programmer? The reason given was that it was too big to implement. SHRUG. Given the use-cases for SQLite one might think it should be baked in.(see below)

example from SQL Server(??)

SELECT * FROM Venue WHERE Name COLLATE Latin1_general_CI_AI Like '%cafe%' COLLATE Latin1_general_CI_AI

THANKS!

(2) By Keith Medcalf (kmedcalf) on 2021-05-01 23:17:10 in reply to 1.1 [source]

It's interesting to note that the results depend on the order of the 'A' and 'a'... I'm not sure it's meaningful or not.

No its not because they are the same when compared using collate nocase. Which one you get depends on which one is "first" in the comparison. And while which one is "first" in the comparison is entirely deterministic, it is an implementation detail on which you should not rely.

I'm not a language expert but in the example I think the "collate nocase" is being assigned to the column

No. The collation applies to the compare operator, although it is "carried along as an attribute of a column or a data item", the collate applies to the comparison operator. That is to say that 'A' collate nocase has absolutely no effect on the value of 'A' but rather on comparison operations which refer to the value.

so if I added a FROM the COLLATE would need to be attributed to the column and not the from... (no examples in the doc that I could see)

Cannot parse your meaning here.

can there be multiple collate attributes(can they be stacked)?

Cannot parse your meaning here either. If you mean can a single comparison be carried out with multiple collations, the answer is no. In the face of multiple collations applying to a operation, the documentation explains how the "winning collation in the tournament" as selected. If you mean can you specify a chain of comparisons and have a different collation attached to each comparison, the answer is yes.

I read someplace that NOACCENT was left to the programmer? The reason given was that it was too big to implement. SHRUG. Given the use-cases for SQLite one might think it should be baked in.(see below)

This is correct. Even the NOCASE collation only handles plain ASCII. Plain ASCII does not have accents. If you wish to handle a larger character set than plain ASCII, you have to provide your own collation (comparator function).

If you want to extend the collation comparator functions you can do so. There are numerous extensions to do this including built-in support for using ICU.

(3) By Simon Slavin (slavin) on 2021-05-02 13:28:30 in reply to 1.1 [link] [source]

Do you ever need to do a case sensitive sort or search ? Any sort or search where case matters ? It's unusual to need to do this. If you don't need to do it, I recommend you put the collation in the column definition. This means that all WHERE clauses automatically respect what you think is significant.

CREATE TABLE Contacts (
    familyname TEXT COLLATE NOCASE,
    givenname TEXT COLLATE NOCASE,
    homephone TEXT,
    mobilephone TEXT
);