SQLite Forum

like and virtual tables on database ascii 256 format
Login

like and virtual tables on database ascii 256 format

(1) By Anton Azanov (AAzanov) on 2023-06-15 13:17:56 [link] [source]

Hello! I have encountered a problem with using LIKE. Some program (it doesn't matter which one) uses databases in 256-character ASCII encoding. SQLite is used as an extension through virtual tables. Characters are converted to UTF-8 and back when working with SQLite. There is a table in the database with an indexed DESCR column. When querying DESCR LIKE char%, optimization occurs using DESCR BETWEEN char AND char+1. My optimizer understands BETWEEN and applies a constraint to the database tables by index. But when I execute a query LIKE char(256)%, the value of fromUTF8(toUtf8(char(256))+1) doesn't make sense. I can specify +DESCR, but I can't solve this issue without using "+". Perhaps it is worth replacing the optimization with BETWEEN char-1 and char, instead of BETWEEN char and char+1, but I'm not sure and haven't found where this is located in the SQLite code yet. Any ideas?

(2) By Gunter Hick (gunter_hick) on 2023-06-15 14:11:49 in reply to 1 [link] [source]

Your proposed change would break the functionality of the LIKE optimization because it is plain wrong.

No 256 character ASCII encoding can contain a char(256), since that does not fit into 8 bits. Did you write the fromUTF8() and toUTF8() functions yourself?

It seems you are suffering from character transcoding problems, which unfortunately arise from attempting to convert on the fly between UTF8 and whatever the backing store of your virtual table actually uses. (Some windows codepage perhaps?)

(3) By anonymous on 2023-06-16 02:52:43 in reply to 1 [link] [source]

For this and other reason, I would want a non-Unicode mode (possibly as a compile-time option, or else as a run-time option, without having to mess with everything and do all of the conversion making it inefficient and not work very well (doing it in application code will disable some optimizations and make included functions even though they are not used and a simpler implementation would be needed instead)). This way, it is not necessary to convert to/from UTF-8 (especially if that is not really possible, which sometimes it isn't; and even if it is possible, it is inefficient when working with non-Unicode). This would include changing the behaviour of LIKE and of other functions, although some do not need to be changed.

(4.1) By Simon Slavin (slavin) on 2023-06-16 13:26:48 edited from 4.0 in reply to 3 [link] [source]

Either you, or the programmers for the program you're using, did not allow for a situation where someone might want to do a search like yours.

Text in SQLite databases is handled internally as UTF-8 or UTF-16. SQLite does not have any way to handle (e.g. search and sort, which are the main activities of a DBMS) characters held in any ASCII format. Adding this functionality would involve a huge amount of work since every part of SQLite which handles text would have to be changed, and new tests would have to be created for all the changes. It's not going to happen.

Your programming language or development toolchain may have function calls for converting text in your preferred format to/from either UTF-8 or UTF-16. Alternatively, you could write some conversion functions yourself. If none of these are true, you're using the wrong tools or preferring the wrong format.

You could instead provide your own external function for SQLite which turns text supplied in your preferred format into a format SQLite's API calls can handle.

(5.1) By Anton Azanov (AAzanov) on 2023-06-17 11:43:22 edited from 5.0 in reply to 4.1 [source]

I understand it all, it's just that the main program is very old and proprietary.

There is no way to change the storage format of its data.

But since there are no better ideas on how this could be done, I had to disable in xBestIndex the ability to use indexes in LIKE optimization.

Some remarks on this situation. I don't see https://www.sqlite.org/c3ref/index_info.html giving the current limit values, in my situation I could replace betveen with >= and it would work.