ext/fts5/fts5_tokenize: does not handle tokens that contain embedded nul characters

(1) By Xiaohui Zhang (zxh0420) on 2020-10-26 01:27:21 [source]

In commit 95dca8d0c, fts5TriTokenize() in ext/fts5/fts5_tokenize.c was patched to prevent the trigram tokenizer from returning tokens that contain embedded nul characters. There is similar logic in fts5UnicodeTokenize(), so I think there should be a check on iCode after READ_UTF8() too.

    while( 1 ){
      if( zCsr>=zTerm ) goto tokenize_done;
      if( *zCsr & 0x80 ) {
        /* A character outside of the ascii range. Skip past it if it is
        ** a separator character. Or break out of the loop if it is not. */
        is = zCsr - (unsigned char*)pText;
        READ_UTF8(zCsr, zTerm, iCode);
        if( fts5UnicodeIsAlnum(p, iCode) ){
          goto non_ascii_tokenchar;
        }
      }else{
        if( a[*zCsr] ){
          is = zCsr - (unsigned char*)pText;
          goto ascii_tokenchar;
        }
        zCsr++;
      }
    }

(2) By Dan Kennedy (dan) on 2020-10-26 13:28:27 in reply to 1 [link] [source]

Thanks for reporting this. A unicode61 tokenizer configured to treat unicode "control-characters" (class Cc) as token characters was treating embedded nul characters as tokens, which causes all manner of problems. Now fixed here:

https://sqlite.org/src/info/b7b7bde9b7a03665
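
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not the FTS5 source and does not claim to reproduce the linked check-in. The `AsciiTable` type and the `ascii_table_init()` and `scan_tokens()` functions are hypothetical names invented for this example. The sketch assumes a 128-entry ASCII lookup table, similar in spirit to the `a[]` array in the loop quoted in the first post, and shows that once class Cc is counted as token characters, byte 0x00 ends up inside a token unless entry 0 is cleared explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a tokenizer's ASCII lookup table:
** is_token[c] is non-zero if byte c is treated as part of a token. */
typedef struct AsciiTable {
  unsigned char is_token[128];
} AsciiTable;

/* Mark ASCII alphanumerics as token characters. If include_cc is
** non-zero, also mark the control characters (0x00-0x1F and 0x7F),
** imitating a configuration that adds class Cc. If guard_nul is
** non-zero, clear entry 0 afterwards so that a nul byte is never a
** token character. */
static void ascii_table_init(AsciiTable *t, int include_cc, int guard_nul){
  int c;
  assert( t!=NULL );
  memset(t->is_token, 0, sizeof(t->is_token));
  for(c=0; c<128; c++){
    int alnum = (c>='0' && c<='9') || (c>='A' && c<='Z') || (c>='a' && c<='z');
    int cc = (c<0x20 || c==0x7F);
    if( alnum || (include_cc && cc) ) t->is_token[c] = 1;
  }
  if( guard_nul ) t->is_token[0] = 0;
}

/* Return non-zero if byte b is a token character. For simplicity this
** sketch treats every non-ASCII byte as a separator. */
static int is_token_byte(const AsciiTable *t, unsigned char b){
  return (b & 0x80)==0 && t->is_token[b]!=0;
}

/* Scan text[0..n) and return the number of tokens found. The number of
** tokens that contain an embedded nul byte is written to *pnWithNul. */
static int scan_tokens(const AsciiTable *t, const char *text, size_t n,
                       int *pnWithNul){
  size_t i = 0;
  int nToken = 0;
  *pnWithNul = 0;
  while( i<n ){
    int hasNul = 0;
    /* Skip separator bytes. */
    while( i<n && !is_token_byte(t, (unsigned char)text[i]) ) i++;
    if( i>=n ) break;
    /* Consume one token, noting whether it contains a nul byte. */
    while( i<n && is_token_byte(t, (unsigned char)text[i]) ){
      if( text[i]=='\0' ) hasNul = 1;
      i++;
    }
    nToken++;
    *pnWithNul += hasNul;
  }
  return nToken;
}
```

With the seven input bytes `a b \0 c d <space> e`, the unguarded Cc table sees two tokens, one of which spans the nul byte. The guarded table sees three tokens, none of which contains a nul. Clearing a single table entry is only one way to enforce the rule; the check on `iCode` suggested in the first post is another place such a guard could sit.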

(3) By Richard Hipp (drh) on 2022-08-26 13:56:14 in reply to 2 [link] [source]

To clarify: This error is not a vulnerability. No memory errors or other exploit opportunities occur due to this error. The error causes FTS5 to sometimes return an incorrect result. But the error cannot be exploited by an attacker to compromise the system, to our knowledge.

This clarification is added because a third party has written a (bogus) CVE against this error.

(4) By anonymous on 2022-09-12 14:43:20 in reply to 3 [link] [source]

Hi Richard,

I am a developer working at a company that uses SQLite. We are currently using SQLite 3.23.1. Meld reported the CVE-2021-20223 issue to us. I am trying to apply the patch to this version, but it is not straightforward: 3.23.1 is quite old compared to the version the patch was written for, and a lot has changed in between.

I saw your comment and am wondering: if this is not a vulnerability, then the patch is not necessary. Can you please confirm this? Since I am not aware of your role, I am wondering, can I rely on your comment.

Thanks in advance!

Robert

(5) By Stephan Beal (stephan) on 2022-09-12 14:47:52 in reply to 4 [link] [source]

> Can you please confirm this? Since I am not aware of your role, I am wondering, can I rely on your comment.

Richard Hipp is this project's main architect and lead developer. His assessment about this being a non-vulnerability is the most authoritative one you can possibly get.

(6) By anonymous on 2022-09-13 06:20:45 in reply to 5 [link] [source]

Hi Stephan,

Well, that's the answer I was looking for :). Thanks a lot for the quick response!

Best regards, Robert