SQLite Forum

Trouble writing/retrieving a small UTF-8 GIF image.
Login

Trouble writing/retrieving a small UTF-8 GIF image.

(1.3) By Gary (1codedebugger) on 2021-03-30 08:39:24 edited from 1.2 [link] [source]

All on the client, I'm passing a JSON string from the browser to a C file compiled with the SQLite amalgamation version 3.34, in which one property is an image of mime type "image/gif;charset=utf-8" and around 120kb. Writing the object to the console before sending it, displays many symbols. The writing of the data to the SQLite database succeeds without error and at the CLI the data can be selected, but the GIF data is only about 13 characters long.

I thought perhaps it was just a display issue because earlier, when working with Hebrew and Greek text, the CLI displayed a lot of ??? but when the data was retrieved and passed back to the browser it was displayed properly. I tried retrieving the GIF data and passing it back but it still was only 13 characters in length. Even strlen() in C is 13 characters.

I tried passing it as a blob and storing as a blob, converting the blob to array buffer and encoding as UTF-8 before passing (which I think was nothing more than a circle since the GIF was UTF-8) and storing it as text and again as blob; and the only difference is that passing it back from sqlite3_column_blob() fails at the first character and sqlite3_column_text() passes the 13 characters.

I'm sure I'm doing something stupid because I don't know something important, but how can I tell whether or not the GIF was written completely? I can see at the CLI that the rest of the data was written completely.

The JSON sent back to the browser is built in C using sprintf() and %s is used for the GIF from sqlite3_column_text(). If it printed properly at the CLI, I'd know it was something to do with building the JSON. SQLite doesn't care what the data type is; so, if it is writing it to the database then could it be that one of those UTF-8 symbols is acting as a termination character in the string? And, if so, how can that be corrected? An error in the browser when attempting to parse the JSON states "bad control character".

[ I can see now that only 13 characters are written to the database because sqlite3_column_bytes() returns 13. The 14th character is \u0000, which I just learned is the same as \0. Perhaps conversion to base64 will correct this; but I thought that was 133% larger. Converting to Uint8 made content multiple times larger.]

Also, would it be better to change the GIF to another format before storing it? The images are stored to be viewed in Tcl/Tk not the browser, if that impacts what format would be best.

Thank you for any direction you may be able to provide.

(2) By Rowan Worth (sqweek) on 2021-03-30 07:05:44 in reply to 1.1 [link] [source]

mime type "image/gif;charset=utf-8"

This is one of the most bizarre claims I've ever seen; GIF is a binary interchange format and utf-8 is a text encoding. I mean ok the binary data could be base64 encoded, which would kind of make sense as I can't see how else you could embed a GIF in JSON, but judging by your other observations this is not the case:

Writing the object to the console before sending it, displays many symbols. but the GIF data is only about 13 characters long. Even strlen() in C is 13 characters.

I'd wager you're looking at a binary GIF dump including embedded NUL characters (ie. '\0' in C). I'm not a master of web technologies but I'm 99% sure it's not legal to just shove that into a JSON field :)

sqlite will happily store binary data, but you need to use the correct APIs (ie. treat it as blob data and never string data).

The JSON sent back to the browser is built in C using sprintf() and %s is used for the GIF from sqlite3_column_text().

The length of C strings is determined by the presence of a terminating NUL character. You cannot use C string functions such as printf in conjunction with binary data.

but how can I tell whether or not the GIF was written completely?

"SELECT length(column_name) FROM table_name" <- the length will reflect the actual amount of data stored in the database cell. If that returns 13 for this row then the full data was not written.

(4.1) By Gary (1codedebugger) on 2021-03-31 05:28:06 edited from 4.0 in reply to 2 [link] [source]

Thank you for the responses.  Perhaps this will be a bore but I add this clarification for anyone else at my limited level of understanding of these basic topics. I may still be misunderstanding.  Ultimately, it's not a SQLite question apart from whether or not the form of the GIF image passed to C should be converted in C before being stored in the database, or if there is way to store the UTF-8 as passed. 

Computer-science terminology is far from my native language but I'm not fabricating the information. The response returned by the fetch promise is consumed as a blob, which also provides a "type" property providing the mime type.  That is the source of "image/gif;charset=utf-8". I'm not qualified to comment on whether or not that is bizarre and leave that to whoever coded the browser or image to convey that information.  Even after the blob is converted to a base64 data URL the string begins with "data:image/gif;charset=utf-8;base64,...".

Many methods are discussed in the browser documentation and other forums and I tried most of them before posting. Using the text() method of the blob object returns a UTF-8 string.  Converting the blob to an array buffer and then decoding it to the default UTF-8 provides the exact same result. That's what I meant by a circle.  Both of these can be placed in a JSON string, but binary data cannot be directly placed in a JSON string (if you know how, please let me know).  The GIF image cannot be shoved into a JSON string.  Nor can it be converted in C, in this scenario, because it cannot be passed to C since JSON is the only method of communication offered by the browser extension APIs. It simply won't accept it or replaces it with nothing, such as an empty object. Thus, the web-extension native-messaging API cannot directly pass binary to the C application.  Apparently, the browser's object serialization methods take care of this step when storing a blob in indexedDB, which I thought was really SQLite behind it. Perhaps I was naive in thinking that if one can simply pass a blob to the add() and put() methods of indexedDB, it would be as easy in these circumstances using SQLite directly.  The SQLite part is easy but my lack of understanding of the different typed arrays and JSON limitations makes it a bit more challenging to get the GIF there.

Not knowing much, I thought UTF-8 would work in C but the \u0000 is the same as \0 and the read of the strings is terminated before desired completion. Perhaps they could be escaped before passing to C but I don't think UTF-8 is of much value, in that it would have to be converted again before displaying as an image.

It appears that converting the array buffer to string works okay and, perhaps, can be converted to binary or base64 in C.  The array buffer can be passed in the JSON string without converting to string when you build it but the browser will convert it to an object in the messaging API, including every index as text also, before finally converting it to string; and that increases the size by more than double. 

In attempt at following the browser documentation concerning typed arrays, I tried using btoa(), which errors on UTF-8 for characters larger than one byte; but converting the array buffer to Uint8Array() and then using btoa(), as suggested, produces base64.  The ratio of the byte length of base64 to that of the array buffer, in my small number of tests, has matched exactly the predicted 133% in the MDN documents. However, if you try to convert that base64 string back to an image, it doesn't work, even before passing it to C.  Perhaps it will work if you convert it to Uint8Array() again and then try to get that back to an array buffer to make a blob to use in an object URL.  But that is rather ridiculous.

Using the filereader() to read the blob as a data URL converts it to base64 and the image source can be set directly to base64 without conversion to display it. It's quite simple.  I tested this by passing to C, storing in database, retrieving, passing back to browser and displaying by setting image source to exact base64 string, no manual conversion needs.

The Tcl/Tk manual pages seem to state that base64 is better for GIF images than binary, where it reads "base64-encoded data (this is currently guaranteed to be supported for PNG and GIF images)."  So, that may do it, in that no additional conversion will be required to display it through Tk. I still need to see if these base64 strings will work in Tk. You'd think they would but I'm not confident of it.

There are about five thousand of these small GIF images; so I don't know whether or not it would be better to pass them as base64 to C and then convert them to binary before storing, if possible, to reduce storage space, or just save them as base64 and never convert again.

Thank you.


          fetch( iSrc )
            .then( response =>
              { 
                if ( !response.ok )
                  throw new Error( 'Network response was not ok.' );
                else
                  return response.blob();
              } )
            .then( result =>
              { 
                console.log( result.type ); // => "image/gif;charset=utf-8"

                // None of this code inside arrayBuffer produces anything usable that can be retrieved and displayed as an image again.
                // At least not in a few steps.

                result.arrayBuffer().then( ( arrBuf ) =>
                  {
                    const buffer = new Uint8Array( arrBuf );
                    const btoaBuf = btoa( buffer ); // This yields base64.

                    // btoaBuf.length / buffer.toString().length ~= 133%.

                    let utf8decoder = new TextDecoder(); // default 'utf-8'
                    const decBuf = utf8decoder.decode( arrBuf ); // Same as blob.text().

                    buffer.toString() is "179,12,0,..."

                  } );                  

                // This works and is simple.
                var reader = new FileReader();  
                reader.onloadend = function() { let base64 = reader.result; };
                reader.readAsDataURL( result );

(5.1) By Harald Hanche-Olsen (hanche) on 2021-03-31 09:19:12 edited from 5.0 in reply to 4.1 [link] [source]

I admit I have a hard time understanding your setup.

But let me make a few remarks that may help:

Specifying charset=utf-8 for a gif image is definitely wrongheaded. It probably comes from a default configuration of the server. It may or may not be harmless: That depends what the client does when it sees this specification. Worst case scenario, the client will get some mangled data that cannot be reconstituted into the original gif.

I created a small (16x16) gif file for experimental purposes.

I notice that it starts with the characters GIF89a immediately followed by binary data. If you see those characters at the beginning of your data, you are indeed looking at binary data. It will almost certainly contain NUL characters, and thus will not survive a trip through a C string.

Here is a hexdump of the file:

47494638396110000e00f00000ff00000000ff21f90400000000002c0000
000010000e000002178c8fa9cbbd00236452d23a1f0617f7fa599ae794e6
9914003b

Any hex representation of a gif file should start with 474946383961 (that is hex for GIF89a.

Similarly, here is a base64 encoding of the same file:

R0lGODlhEAAOAPAAAP8AAAAA/yH5BAAAAAAALAAAAAAQAA4AAAIXjI+py70AI2RS0jofBhf3+lma55TmmRQAOw==

Any base64 representation of a gif file should start with R0lGODlh, which is GIF89a in base64.

Hopefully, these observations may be a help in understanding what form of the data you are looking at.

Note that JSON cannot encode binary data. So if a GIF file is to be sent via JSON, it has to be encoded as a string. Base64 and hex are perhaps the two most popular ways of doing that. Obviously, you have to know what you've got in order to decode it.

I have some more to say, but I'll defer that to a second reply (for readability, and to give my mind time to clear).

Edited to add:

I ran a tiny experiment with the gif file above (eliding some sqlite3 verbosty):

⬥ sqlite3
▶▶▶ create table g(b);
▶▶▶ insert into g values(readfile('tt.gif'));
▶▶▶ .headers off
▶▶▶ select hex(b) from g;
47494638396110000E00F00000FF00000000FF21F90400000000002C0000000010000E000002178C8FA9CBBD00236452D23A1F0617F7FA599AE794E69914003B

Again, look for the telltale 474946383961 at the beginning of the return value.

(9) By Decker (d3x0r) on 2021-04-01 04:55:06 in reply to 5.1 [link] [source]

Converting binary to utf8 is just making sure it's legal text, which can later be reversed by looking up the appropriate code points, so it doesn't get rejected by strict transport layers. 0xFF is a byte that can never appear in UTF-8 encoding (actually something like 0xF8-FF). So if you take each byte of the image, and convert it to an equivalent codepoint representation you can later reverse it.

Sometimes, this can be more condensed then a base64 conversion; a base64 expands by 33% or 3:4; that for every 3 bytes input you get 4. UTF-8 is 1:1 or 1:2 .. and ideally it also is 200% more 50% of the time or 2:3, which is actually +50%, so encoding in utf-8 is often worse than just using a base64 conversion.

(10) By Harald Hanche-Olsen (hanche) on 2021-04-01 08:35:21 in reply to 9 [link] [source]

We're getting a bit far afield for a sqlite mailing list, perhaps, but anyhow:

Yes, this will work, with the caveat that NUL bytes (0x00) in the original binary will be stored as NUL (0x00) in the resulting UTF-8 encoding, and so you absolutely cannot pass this through code such as the standard C string routines that assume strings are NUL terminated.

You could, in principle, get around that by encoding NUL bytes instead using a different code points such as 0x100 (the letter Ā, letter A with macron), and undoing this on the receiving end before converting back to binary. Either that, or simply ensure that the string is not touched by anything treating NUL as a terminator. In C, that means using a character array and keeping track of the number of bytes separately.

(6) By Harald Hanche-Olsen (hanche) on 2021-03-31 09:33:32 in reply to 4.1 [link] [source]

As indicated, I had more to say. To repeat myself, I am confused.

You speak of a browser, but don't tell us which browser. Is it just a generic web browser? But then there is also a C program, or quite possibly two C programs, and your browser communicates with the C program(s) – how? Do you have a C program acting like a web server that the browser is talking to? Over http(s), or the websocket protocol, or some other means? And last, but not least, since this is after all the sqlite forum, where in all this does your use of sqlite fit in?

In short, without some clear picture of where the data (the gif file) originates, how it is passed to the browser, what you do to it there (using javascript?), before you do what? Pass it to another C program?

In short, I think your problem description has too many words and too little code. I just can't wrap my brain around it, and so feel unable to help. It would be great if you could cook up a toy example and provide code for it, if that is at all possible.

Regarding indexedDB, as far as I know, in most browsers it is indeed based on sqlite. But the specification of the indexedDB API does not assume that, and hence you most likely do not have access to the full sqlite functionality from within the browser. In which case, you just need to stick to whatever the API gives you.

(3) By Gunter Hick (gunter_hick) on 2021-03-30 07:49:34 in reply to 1.2 [source]

I think you are only pretending that your (binary) GIF image (which contains NUL bytes) is text data instead of encoding it properly. This will cause the data to be truncated at the first NUL byte.

You need to actually encode (and later decode) the binary data using base64 or whatever instead of "converting the blob to array buffer" using a typecast in C.

(8) By Decker (d3x0r) on 2021-04-01 04:45:27 in reply to 3 [link] [source]

This only applies for the C runtime, binding column values lets you specify the length, which makes any checks for NUL get bypassed. SQLite itself has no issues with storing '0' or retrieving (except the console will truncate the displayed value retrieved)

(7.1) By little-pan on 2021-04-01 04:47:31 edited from 7.0 in reply to 1.3 [link] [source]

  1. The type "image/gif;charset=utf-8" is correct for git image by javascript Blob API(see "https://developer.mozilla.org/zh-CN/docs/Web/API/Blob" please). But this JSON property of the git image is an utf-8 string, and not a normal C or SQLite string(terminated by zero).

  2. Writing/retrieving your JSON properties by SQLite C API sqlite3_bind_text(), sqlite3_column_text() and sqlite3_column_bytes() are also correct by specifying the length of the field or column.

  3. But at CLI(sqlite3 command), the git image field that got by calling sqlite3_column_text() and strdup() directly is handled and displayed as C/SQLite string, so only 13 characters is showed.Please see exec_prepared_stmt_columnar() in shell.c in sqlite3 source.