SQLite Forum

SQLite emoji represented as blob, not text regardless specified db field
Login

SQLite emoji represented as blob, not text regardless specified db field

(1) By David Fox (mookie) on 2022-03-08 16:20:04 [source]

Greetings, I have been creating a Web/TK Journal socket application.
Utilising SQLite as the backend, I have been having issues with selecting columns with a hexidemical value while using the following procedures as the exchange for characters to hexidemical. proc 2Hex { input } { binary encode hex [encoding convertto utf-8 "$input"] } proc 2Base { input } { encoding convertfrom utf-8 [binary decode hex "$input"] }

It appears that when a unicode character, such as an emoji and in my case a blueheart: 💙
Code point: U+1F499
Hex: f09f9299 when converted; is not stored as a text-string within SQLite.

When debugging the issue where the query:
SELECT * FROM entries WHERE journal_entry_menu_tag = 'f09f9299';
returns with zero results within a populated database. The encoded hex string of the emoji appears to be encoded as a blob representation leading to a state of confusion. It originally lead me to believe that the hex-string was to be inserted to the database as text as the schema of the table has the column defined as text.

An example piece of code which demostrates this:
package require sqlite3 sqlite3 db db eval { create table t1 ( c1 text ) } set value %F0%9F%92%99 set valueb [encoding convertto utf-8 $value] set valuec [binary encode hex $valueb] puts [list valuec representation [::tcl::unsupported::representation $valuec]] db eval { insert into t1 values ($valuec) } puts [list first result [db eval {select typeof(c1) from t1}]] puts $valuec puts [list second result [db eval {select typeof(c1) from t1 where c1 = $valuec}]]

Which outputs:

valuec representation {value is a bytearray with a refcount of 2, object pointer at 0x12e7370, internal representation 0x1b909d0:0x6064bd, no string representation}
first result blob
254630253946253932253939
second result {}

While workarounds exist and cover up the issue, such as cast, or trimming the binary element from a string, I do not wish to rely on such espcially with emoji's as an example are becoming more potent on the internet and far-beyond.

The versions I am using are: TCL 8.7a5 with TCLSqlite 3.37.2. Without workarounds are there any methods to tackle this bug?

Thanks, D.Fox

(2) By Richard Hipp (drh) on 2022-03-08 17:05:23 in reply to 1 [link] [source]

A TCL bytearray gets stuffed into the database as a BLOB. And TCL and SQLite disagree on how to transform content between TEXT and BLOB. This is, I think, what is getting you into trouble.

What exactly are you trying to accomplish here? What is the source of your data? And what are you trying to store in the database? You seem to be making this much more difficult than it ought to be. I can tell you exactly what SQLite is doing and why, but I'm not sure that would move you forward. Taking content in TCL and stuffing it in the database is normally very straightforward. You seem to be going around and around in encoding circles however. Please tell us why? What is the end goal here?

(3.12) By David Fox (mookie) on 2022-03-08 23:26:57 edited from 3.11 in reply to 2 [link] [source]

Hi Richard, Thank you for your reply.

The application I am trying to produce is nothing more then a CRUD application. The client being a browser, the backend being a dictonary or SQLite Database. The input is from a HTML form which is powered by NaviServer. All the client does is commit incoming data to a socket server. The socket server does the processing where all data captured is converted to hex and then passes to SQLite. This preforms very neat results espcially when trying to manipulate hex data.

Now where the problem resides is that I have three HTML form fields:

Journal Tag

Journal Title

Journal Content

These may be populated as such for example:

- Journal Tag: 💙

- Title: 🚧 In progress

- Content: ⚠️ This website is currently under-construction

When producing a navigational bar of journal posts, I wish to select journal post based on the hexdidemical value of Journal Tag. So the HTML looks like this.

<button name="journal_entry_menu_tag"

formaction="/view"

value="f09f9299"

class="button_menu"> 💙 </button>

And this is where the SQL query comes in.

SELECT * FROM entries WHERE journal_entry_menu_tag = 'f09f9299';

f09f9299 being the Emoji.

The user clicks the button, the value of the form is sent to the client, which is then sent to the socket server (with other information) and then processess. This method I've found more elegant then having the above field in standard BASE32. Between TCL and SQLite and the uplifting to the database; is the issue where the encoded value is as a blob.

Nothing complex, just the following code: (notice the cast for workaround, however it does mess with the neatness)

db eval { INSERT INTO entries (journal_entry_menu_tag, journal_entry_title, journal_entry_image, journal_entry_content, journal_entry_datetime )

VALUES( cast(:journal_entry_menu_tag as text), :journal_entry_title, :journal_entry_image, :journal_entry_content, :journal_entry_datetime ) }

db close } ;#end proc

One way to store the emoji as blob, but I do not wish it to be so as that adds additional complication. "💡 New Idea" may be a potential tag in the future and if the emojii data is being encoded a blob mixed with ascii data this may cause more problems down the road.

Imagine everything as a string is TCL's motto, I want to imagine TCL as everything as a hex-string.

So the end goal is an SQLite Database containing BASE16 for ascii-data rather than BASE32.

Regards, D.Fox

(4) By Larry Brasfield (larrybr) on 2022-03-09 08:08:05 in reply to 3.12 [link] [source]

I think maybe you have misunderstood Richard's question. That was focused on the specific issues arising from your attempts to encode, store, retrieve and (presumably eventually) use your emojii characters in a TCL environment using the SQLite library for TCL. He (and I or anybody else looking closely) sees such a lengthy sequence of encoding/recoding/decoding operations that the objective is difficult to discern. By "the objective", I do not mean "what your program is to accomplish" or "what user need is met by your program". What I mean by it is a teensy sub-objective of all that, whereby the data giving you trouble starts in one form and is to be used in that form or some other form after storage and retrieval. In other words, what are the inputs and outputs, stripped of all the intermediate data-munging you have inserted into the problem solution?

Try to keep it simple, so that people reading need not filter out scads of facts that are irrelevant to the problem that induced your post #1.

A side-issue, which your record-breaking post editing saga brings to the fore: When you change a post, it is a courtesy to indicate what edits you made. This is especially true for long posts. You can hardly expect people to read a post over and over, trying to see what is new. The people who can help are generally too busy to waste their time helping those who take little care to avoid wasting their time.

(5) By David Fox (mookie) on 2022-03-09 11:36:20 in reply to 4 [link] [source]

Hi Larry, That I may of misunderstand Richards question. I wasn't aware that the edits were up for show as I do tend to edit after submit as for to tidy up and ensure it's coherent and grammatically correct. I just assume people to read the final post rather than all the edits; there was no agenda to waste anyone time.

The goal is to store a Base16 string of an emoji as TEXT within a SQLite Database. I want to be able to recall the row data based off the Base16 text-value of the emoji.

Inputs: HTML Form - Data being an emoji

Output: The row of data based off the string of the emoji.

Issue: Unable to retrieve the row of data from the SQL DB based off the Base16 value because the data is being saved to the DB as a BLOB rather than TEXT.

Sorry & Regards, D.Fox

(6) By Richard Hipp (drh) on 2022-03-09 14:46:37 in reply to 1 [link] [source]

I think you need to force the input to have a string representation so that it will be stored as a string and not as a BLOB. The [string map ...] command worked to accomplish that for me, but perhaps there is a better way...

package require sqlite3
sqlite3 db
db eval {
  create table t1 (
    c1 text
  )
}
set value %F0%9F%92%99
set valueb [encoding convertto utf-8 $value]
set valuec [binary encode hex $valueb]
set valued [string map {x y} $valuec]
puts "valued representation: [::tcl::unsupported::representation $valued]"
db eval {
  insert into t1 values ($valued)
}
puts "first result: [list [db eval {select typeof(c1), c1 from t1}]]"
puts "valued: $valued"
puts "second result: [list [db eval {select typeof(c1), c1 from t1 where c1 = $valuec}]]"

(7.1) By David Fox (mookie) on 2022-03-10 02:07:18 edited from 7.0 in reply to 6 [link] [source]

I appreciate the workaround, but it does add length to the actual code, in that example it is only a single entry. I am dealing with multiple inputs. Trying to avoid workarounds as I see that if a db field is defined as text it should strip all data to text but as this appears to be a documented issue and tricky solution to fix, it may have to be a workaround.

[string trimright] is also another alternative to remove the binary meta and using cast() on eval values and might be better.

Am I going to encounter any pitfalls using this? Other than if I actually use blob data.

Thanks & Regards, D.Fox