SQLite Forum

How to handle double backslashes when binding parameters in C API?

How to handle double backslashes when binding parameters in C API?

(1.1) By Gary (1codedebugger) on 2021-03-04 08:45:28 edited from 1.0 [source]

Hello, I'm sure this a very novice question and due to my weakness in C or programming in general. The code below has worked fine in the C API for the past couple months but is now failing after having to change from forward slashes to backslashes in part of the JSON request. It worked for "Historical Folder/American History" but fails for "Historical Folder\\American History".

If this simple query is run at the command line it succeeds as below.

select * from json_tree( '{"r":"A","c":"A","p":"Historical Folder\\American History","g":0,"tab":"A_87"}')

fullkey value

------- ------------------------------------------------------------------------------

$ {"r":"A","c":"A","p":"Historical Folder\\American History","g":0,"tab":"A_87"}

$.r A

$.c A

$.p Historical Folder\American History

$.g 0

$.tab A_87

If the statement is prepared in the C API and then the JSON string bound as:

sqlite3_prepare_v3( db_mem->handle, "insert into request select * from json_tree( ? )", -1, SQLITE_PREPARE_PERSISTENT, &(db_mem->parse), NULL )

sqlite3_bind_text( db_mem->parse, 1, j, l, SQLITE_STATIC )// Where j points to the JSON string in json_tree() above and l is strlen(j).

The bind returns SQLITE_OK or 0 but any query run to retrieve data from table request fails, returning code 1.

If the following code is run to expand the statement after binding,

char *exp = sqlite3_expanded_sql( db_mem->parse ); printf( "%s\n", exp ); sqlite3_free( exp );

the result is as below with one backslash, which I assume is why it fails.

insert into request select * from json_tree( '{"g":0,"r":"A","c":"A","p":"Historical Folder\American History","tab":"A_119"}' )

Although I don't know what the content of the JSON strings passed to the C application will be, I control the code that builds them.

Would you please tell me what should be done to correct this, such that the C API will return the same result as the command line.

I should add that using four backslashes as "History Folder\\\\American History" works but is that the right way?

Thank you.

(2) By ddevienne on 2021-03-04 08:50:46 in reply to 1.0 [link] [source]

If you look at JSON's grammar, you'll see that in JSON, backslash
must be escaped with a backslash. And the same is true in C as well!

Thus to get two backslashes in the JSON, you need 4 of them in the C literal.

If you have a C++11 compiler, you can use a raw-string-literal to avoid having
to escape your backslashes in the C/C++11 code, i.e. R"(foo\\bar)", instead
of "foo\\\\bar" in a plain C compiler.

I suspect that's your issue. Good luck. --DD

(3) By Gary (1codedebugger) on 2021-03-04 18:21:16 in reply to 2 [link] [source]

Thanks. Perhaps I'm misunderstanding; but I don't need two backslashes in the final text of the object property p passed in the JSON string.

The prepared statement includes json_tree(?) and when the ? has two backslashes (to get one desired) sqlite3_bind_text() escapes it there and, when the statement is executed, it is escaped again, treating the single remaining backslash as if it were escaping the next character in the string.

In this case, "History Folder\\American History" is bound as "History Folder\American History" and when subsequently executed the SQL tries to escape the "A" and fails.

Thus to get one backslash into a text column in table, it appears four backslashes are needed in the literal when binding parameters.

I understand that to get two backslashes, both need escaped requiring four in the literal. It appears that all escapes need to be "doubled" since, even though sqlite3_bind_text() performs the escape, it is again performed before/when the statement is executed.

I wanted to make sure that is the correct way to handle it and that it'll always work.


(4.1) By ddevienne on 2021-03-04 18:51:53 edited from 4.0 in reply to 3 [link] [source]

I don't need two backslashes in the final text of the object property p passed in the JSON string.

Well, according to the JSON grammar, you do in fact, IMHO.
So the JSON text stored in the DB should have two backslashes,
such that the output of a JSON parser has a single-backslash
for that text value.

I.e. the text in the DB supposed to be in JSON format must double
backslashes, if you expect to have those backslashes has text node
values in that JSON document.

When you extract a text node value out of a JSON document, the returned
text is no longer part of a JSON document, so does not need to respect
the JSON grammar, and can have a single backslash.

Basically you have to differentiate the lexical-space, and the value-space.
You also need to differentiate text in JSON-land, and outside JSON-land.

I hope that makes sense...

(5) By Gary (1codedebugger) on 2021-03-04 21:21:16 in reply to 4.1 [link] [source]

Thanks again. I probably wrote the statement you quoted in a less than precise manner. I was speaking of '$.p' in a row of type text, not as a member of a table row of type object that would still be JSON format.

That is an interesting point, however, that I hadn't yet been forced to consider; but I tried it (that is, parsing an object with properties that are also objects) out and the escape characters, double or more backslashes, are all automatically preserved for any row of type object after the parse. Of course, that makes it quite easy.

Although I think one backslash will work in my particular case since I will be building a command using the text of the '$.p' row, I cannot say with certainty at this point; and that is not really my question. Thank you for bringing that to my attention, nonetheless.

I was just, stupidly perhaps, thinking that the text should be escaped only once; but was missing that the string passed in sqlite3_bind_text() is escaped when replacing the ? and then that expression is escaped when executed. I thought that sqlite3_bind_text() would replace the ? with the escaped string and it would not then be escaped a second time when the statement is executed.

Maybe I'm still missing something; but to get two backslashes in the text row of '$.p', the original JSON has to start with eight; because sqlite3_bind_text() will escape it to four, and then the statement will treat those four as needing escaped again to two. As long as it works that way all the time consistently, it's all I need to know.


(6) By Larry Brasfield (larrybr) on 2021-03-04 21:31:07 in reply to 5 [link] [source]

I'm not trying to sort out your whole perplexity, but I can address this:

because sqlite3_bind_text() will escape it to four

Banish that from your brain! sqlite3_bind_text() is responsible for getting some characters, provided in the format it expects, into whatever form the database needs (per its encoding, often the same as what ...bind_text() expects.) It does no escaping. It you seem to pass C "string" literals into it that need some kind of escape sequence processing, the compiler does that, not ...bind_text(). It similarly knows nothing of Json or how anything Json-related might be done with the text after it has been bound to a query parameter.

(7) By Gary (1codedebugger) on 2021-03-04 22:07:46 in reply to 6 [link] [source]

Yeah, that was a most terrible way for me to express that.

Upon further reflection, this is a most useless question. It's not entirely a misunderstanding of C either. It's just my confusion over thinking that somehow sqlite3_bind_text() and the execution of the prepared statement were more closely related. They are two steps and each involves an escape procedure somewhere along the way, such as one in determining what will replace the ? and one in interpreting it as SQL.

If you have the authority to delete this question, please do so.

Perhaps, most likely, I better make it a rule to never post a question at 3:30 AM any longer.

Thank you.

(8.1) By Keith Medcalf (kmedcalf) on 2021-03-04 23:01:59 edited from 8.0 in reply to 5 [link] [source]

the string passed in sqlite3_bind_text() is escaped when replacing the ?


sqlite3_bind_text passes a bag-o-bytes unmolested between the "external" application and the "SQLite3 internals". The bag-o-bytes is a sequence of bytes representing a C string encoded using UTF-8. "C string" means a sequence of bytes followed by a null (0) terminator. If these conditions are violated (UTF-8 encoding and null termination with no embedded null bytes) then "all hell may break loose" which may include the immediate termination of the multiverse.

"escape sequences" are interpreted by "parsers". Converting what you type as a C (or other language) program into the actual "machine code" executed by the actual physical machine requires "parsing" what you typed and interpreting "escape codes" to make up for your inability to type certain characters.

Similarly, JSON glook is "parsed" into what it contains and "parses" escape characters in order to make up for your inability to type them or the JSON parsing protocol specification to represent them directly.

Sometimes there are multiple layers of parsers which mayhaps use the same escape characters. So, for example, to create a C string that respresents JSON containing a string with a special character requires that you multiplicate the relevant escape characters to make the output of the parser from the parsers' parser be what you actually intend.

(9) By Gary (1codedebugger) on 2021-03-10 04:12:39 in reply to 8.1 [link] [source]

Thank you. I was confused and missing the fact that the json_tree() parse performed one escape and then the execution of the SQL statement, after the bind, parsed and escaped again; and was erroneously attributing the first escape to the bind itself since the sql expand showed one escape had been performed by that point. Thanks for the explanation.

May I ask a related bag-o-bytes question? It's part of this same sqlite process of exchanging JSON messages but is not really a sqlite-specific question. I understand if it is too far off topic.

Anytime a JSON string is of a size that is a multiple of 256 plus 10 bytes (such as 266 ad 522) and that size is passed as a uint32 prefixed to the JSON message, the message is ignored on the other end, which is a native-messaging API in a browser extension. I'm an old man but not an old programmer. Four bytes in a uint32; eight bits in a byte; eight bits represent a decimal value range from 0 to 255. What does the remainder of 10 have to do with anything? It appears to be represented by a new-line character and prints to the screen as a smiley face when the C/SQLite .exe is run from the command line. When written to a local file, the JSON string itself is moved to a new line from the first uint32 byte which is a smiley face. All other lengths appear to work fine. Could you please point me in the right direction for where to search for an answer? It might be my novice stupidity and lack of solid background in computer basics or a problem in the native-messaging API. I know sqlite is not causing the issue because it couldn't possibly return weird values for only these string sizes and the strings are built from other values in addition to those retrieved from sqlite tables; and it appears to be the uint32 size not the string itself. Thank you.

(10) By David Empson (dempson) on 2021-03-10 04:51:18 in reply to 9 [link] [source]

Somewhat off-topic for SQLite but easiest to answer the question here.

The appearance of the 10 byte on a new line in your output file means you are probably on a Windows system and you have opened the file in ASCII mode rather than binary mode. This concept only exists on platforms such as Windows which don't follow UNIX conventions for text file formats.

For text files on UNIX/Linux systems, end of line is a newline '\n' (ASCII 10). For text files on DOS/Windows, end of line is a carriage return '\r' (ASCII 13) followed by a newline '\n' (ASCII 10).

C originated on UNIX and follows the UNIX standard, so in C when writing to a file, you represent the end of line with '\n'. Since that would produce non-standard text files on Windows, the standard library for Windows C compilers defaults to ASCII mode when a file is opened. In ASCII mode, any '\n' written by the application is replaced with a "\r\n" pair, with the reverse translation for read.

In ASCII mode, if you write binary data which happens to include a byte with the value 10, the file will have a byte with the value 13 inserted before the 10, which will break your binary formatting protocol.

Solution: open the file in binary mode, which writes the data without modification, e.g. use "wb" mode for fopen() or add the O_BINARY flag for open().

If your application needs to be portable, you might want to put this part of the code in a #if/#else/#endif conditional block so opening in binary mode is specific to Windows. The "b" option for fopen() is part of the C90 standard so should be accepted and ignored on modern UNIX/Linux systems, but the O_BINARY flag for open() is a non-standard extension and may cause a compile error.

(11.2) By Gary (1codedebugger) on 2021-03-10 07:12:55 edited from 11.1 in reply to 10 [link] [source]

Thank you very much for the explanation.

I thought these comments relative to the local text file rather than stdout in general, because I read similar information last night. I don't mess with Chrome browser at all but found in their development area a warning concerning Windows, much like your information, and it directed to a Windows web page that discussed a _setmode() function and it used _O_BINARY flag for stdin and stdout.

I tried using _setmode() on stdout with the _O_BINARY flag, with and without the underscores (for some reason half of Windows function names appear to have been deprecated and prefixed with an underscore) and it failed. It worked for stdin but not stdout.

After reading your response, I went back to that Chrome developer message and the referenced Windows web page and tried it all again; and this time it worked!

It changed _setmode() on stdout to flag _O_BINARY and the extension now reads 266 and 522 successfully. I wouldn't be surprised if I typed an 0 instead of an O in O_BINARY when I added it for stdout or something stupid like that after so many tries.

Thanks again! I can't tell you what a relief it is that it works. I was ready to throw the browser away and Windows away, and either give up or try another UI and OS. I may still do that when I get a chance anyways.