SQLite Forum

Why is utf8 IN char* but utf8 OUT is unsigned char*
I noticed in the sqlite3.c code that UTF-8 input is accepted as char*, e.g.
int sqlite3_bind_text(sqlite3_stmt*,int,const char*,int,void(*)(void*));
whereas UTF-8 output is returned as unsigned char*, e.g.
const unsigned char *sqlite3_column_text(sqlite3_stmt*, int iCol);

Internally, the incoming char* is converted to unsigned char* before it is manipulated. Can anyone explain why? I think I understand how UTF-8 works, but I'm unsure why the conversion is necessary.

For example, this macro plays a big part in the sqlite3.c UTF-8 code (uz is the unsigned char* conversion of the received char* z):

#define SQLITE_SKIP_UTF8(uz) {                  \
  if( (*(uz++))>=0xc0 ){                        \
    while( (*uz & 0xc0)==0x80 ){ uz++; }        \
  }                                             \
}

Couldn't the conversion to unsigned be avoided and the SKIP macro replaced with

do z++; while (*z < -64);

I know the latter doesn't check that the initial byte is the lead byte of a multi-byte UTF-8 sequence, but does that matter? If there are continuation bytes (< -64) without a preceding lead byte, isn't the string malformed UTF-8 anyway? If it did matter, you could always change it to

if ((*z++ & 192) == 192) while (*z < -64) z++;

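(In the above I'm assuming that char is signed and that every compiler uses two's complement, so & on a negative char gives the same bits as on the corresponding unsigned byte.)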

I'm just wondering what I'm missing about utf8.