In commit 95dca8d0c, `fts5TriTokenize()` in ext/fts5/fts5_tokenize.c was patched to prevent the trigram tokenizer from returning tokens that contain embedded nul characters. There is similar logic in `fts5UnicodeTokenize()`, so I think there should be a check on `iCode` after `READ_UTF8()` too. ```c while( 1 ){ if( zCsr>=zTerm ) goto tokenize_done; if( *zCsr & 0x80 ) { /* A character outside of the ascii range. Skip past it if it is ** a separator character. Or break out of the loop if it is not. */ is = zCsr - (unsigned char*)pText; READ_UTF8(zCsr, zTerm, iCode); if( fts5UnicodeIsAlnum(p, iCode) ){ goto non_ascii_tokenchar; } }else{ if( a[*zCsr] ){ is = zCsr - (unsigned char*)pText; goto ascii_tokenchar; } zCsr++; } } ```