SQLite Forum

Tcl bytearray woes

(1) By Poor Yorick (pooryorick) on 2022-02-08 22:37:42

Because the Tcl SQLite interface handles a Tcl byte array differently depending on whether it currently has a string representation, once a string representation is generated for $bytearray in the script below, that value no longer matches the blob value stored in the table, even though the value in the table was derived from the same Tcl value:

	package require sqlite3
	sqlite3 db :memory:
	db eval {
		create table t1 (
			value
		)
	}
	set string \x80 
	set bytearray [binary format H* 80]
	db eval {
		insert into t1 values ($bytearray)
	}

	set found [db eval {
		select typeof(value) from t1 where value = $bytearray
	}]
	puts [list {pure byte array match} $found]; #-> blob

	# give $bytearray a string representation
	encoding convertto utf-8 $bytearray
	set found [db eval {
		select typeof(value) from t1 where value = $bytearray
	}]
	puts [list {byte array with string rep match} $found]; #-> {}
	puts [list {string and bytearray equal?} [expr {$string eq $bytearray}]]; #-> 1
	

This makes it problematic to provide a Tcl package of routines that manage data in a SQLite database that might contain cryptographic hashes or other arbitrary binary data. Such a package has two options. It can ensure that a string representation exists for each value before storing it in the database, in which case all values, binary values included, are stored as text encoded in the database encoding. Or it can go the other route and detect that a value would be better stored as a blob, in which case it must do the conversion to bytearray itself, ensuring that no string representation is generated for the Tcl_Obj. Furthermore, such a package must carefully document how it transforms values, and instruct the user to apply exactly the same transformation regime to any value that the user places directly into a SQL script.
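
A minimal sketch of the second route, assuming the `@` parameter prefix behaves as the SQLite Tcl interface documentation describes, that is, it binds the value as a BLOB even when the value also has a string representation. The table `t1` mirrors the script above; this is an illustration under that assumption, not a recommended design:

```tcl
package require sqlite3
sqlite3 db :memory:
db eval {create table t1 (value)}

set bytearray [binary format H* 80]

# "@" instead of "$" asks the interface to bind the value as a BLOB
db eval {insert into t1 values (@bytearray)}

# generate a string representation, as in the script above
encoding convertto utf-8 $bytearray

# assumption: with "@" the binding no longer depends on the string rep
set found [db eval {
	select typeof(value) from t1 where value = @bytearray
}]
puts [list {forced blob match} $found]
```

This only helps if every statement that touches the column uses `@` consistently, so it does not remove the documentation burden described above.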

In practice, this needless dichotomy between text and blob makes things complicated enough that I don't know how to proceed with the development of this Tcl package, which provides an API for a generic tree structure but also allows users to access tables directly, depending on their needs. It isn't far-fetched at all that a cryptographic hash value in Tcl might not have a bytearray internal representation, or that the bytearray internal representation might be accompanied by a string representation. One inadvertent mistake by a developer somewhere in their own code on a project that uses this package, and the database will end up with some binary data stored as text and some as blob, and queries will not return complete results. Such mistakes could easily elude detection because they depend on the internal state of a Tcl value, which shouldn't be detectable at all at script level.

Ideally, such a package wouldn't have to be in the business of second-guessing SQLite data typing at all. Tcl elegantly handles binary data by treating it as Unicode data that happens to be constrained to the characters at code points 0 through 255, while internally maintaining a more compact and efficient single-byte representation of the data. SQLite should be able to do the same thing, but currently, for the purpose of making a comparison with text, SQLite considers the blob data to be encoded in the encoding of the database. Is there someone out there who needs that? Such data is better served by the text data type. It shouldn't be a performance concern because either way some sort of "decoding" from blob to text must be done.
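
To make the difference concrete, here is a small sketch in plain Tcl (no SQLite involved) that views the same two bytes first as code points and then as UTF-8 encoded text. The variable names are illustrative only, and `binary encode hex` assumes Tcl 8.6 or later:

```tcl
set bytes [binary format H* 80ff]

# Tcl sees two characters, at code points 128 and 255
puts [string length $bytes];            #-> 2
puts [scan [string index $bytes 0] %c]; #-> 128

# encoded as UTF-8 text, each of those characters takes two bytes
set utf8 [encoding convertto utf-8 $bytes]
puts [binary encode hex $utf8];         #-> c280c3bf
```

The byte array form is `80ff`, while the UTF-8 text form of the same Tcl value is `c280c3bf`, which is why the two stored forms cannot match byte for byte.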

If SQLite's rules were a little different, i.e. if for the sake of a comparison with text each byte of a blob were interpreted as the character having the corresponding code point, the distinction between blob and text would melt away, providing a perfect experience from a Tcl perspective. Is there some chance that a setting governing this behaviour could be introduced, along with the unification of blob and text described above, so that a package such as the one described might have a chance at providing a sane user experience?

-- Poor Yorick

(2) By Poor Yorick (pooryorick) on 2022-02-22 14:42:47 in reply to 1

"set v "" = bytearray object?" (comp.lang.tcl, 2005-06-21) is a previous report and discussion of the issue.