Documentation Source Text

Check-in [3bf90e0caa]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Enhancements and updates to the discussion of fuzzing in the testing document.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: 3bf90e0caaaa351c6fea10d13659b20f078abacd5e41993b95e98d5aca432d8d
User & Date: drh 2019-12-21 17:23:35.167
Context
2019-12-24
02:33
Document the fact that the R-Tree extension is not capable of doing simultaneous reads and writes. (check-in: a8afb2f905 user: drh tags: trunk)
2019-12-21
17:23
Enhancements and updates to the discussion of fuzzing in the testing document. (check-in: 3bf90e0caa user: drh tags: trunk)
15:22
Enhance CLI documentation to talk more about reading ZIP archives as database files. (check-in: e92fb43396 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/testing.in.
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382



383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
wildly nonsensical SQL statements and feeding them to SQLite to see
what it will do with them.  Usually some kind of error is returned
(such as "no such table").  Sometimes, purely by chance, the SQL
statement also happens to be semantically correct.  In that case, the
resulting prepared statement is run to make sure it gives a reasonable
result.</p>

<p>The SQL fuzz generator tests are part of the TCL test suite.
During a full test run, about <tcl>KB {$stat(nSqlFuzz)}</tcl> 
thousand fuzz SQL statements are
generated and tested.</p>

<tcl>hd_fragment aflfuzz {American Fuzzy Lop fuzzer} {AFL}</tcl>
<h3>SQL Fuzz Using The American Fuzzy Lop Fuzzer</h3>




<p>The <a href="http://lcamtuf.coredump.cx/afl/">American Fuzzy Lop</a>
or "AFL" fuzzer is a recent (circa 2014) innovation from Michal Zalewski.
Unlike most other fuzzers that blindly generate random inputs, the AFL
fuzzer instruments the program being tested (by modifying the assembly-language
output from the C compiler) and uses that instrumentation to detect when
an input causes the program to do something different - to follow
a new control path or loop a different number of times.  Inputs that provoke
new behavior are retained and further mutated.  In this way, AFL is able
to "discover" new behaviors of the program under test, including behaviors
that were never envisioned by the designers.

<p>AFL has proven remarkably adept at finding arcane bugs in SQLite.
Most of the findings have been assert() statements where the conditional
was false under obscure circumstances.  But AFL has also found
a fair number of crash bugs in SQLite, and even a few cases where SQLite 
computed incorrect results.

<p>Because of its past success, AFL became a standard part of the testing
strategy for SQLite beginning with [version 3.8.10] ([dateof:3.8.10]).  
Both SQL statements and database files are fuzzed.
Billions and billions of mutations have been tried, but AFL's 
instrumentation has narrowed them down to less than 50,000 test cases that
cover all distinct behaviors.  Newly discovered test cases are periodically
captured and added to the [TCL test suite] where they can be rerun using
the "make fuzztest" or "make valgrindfuzz" commands.

<p>Update:  As of SQLite [version 3.29.0] ([dateof:3.29.0]) the use of
AFL has been superceded by the new [dbsqlfuzz] fuzzer described below.

<tcl>hd_fragment ossfuzz {OSS Fuzz}</tcl>
<h3>Google OSS Fuzz</h3>

<p>Beginning in 2016, a team of engineers at Google started the
[https://github.com/google/oss-fuzz|OSS Fuzz] project.  
OSS Fuzz uses a AFL-style guided fuzzer running on Google's infrastructure.







<
<
<
<
<



>
>
>
|
<
|
|







|






|
<
<
<
<
<
<
<
|
<







368
369
370
371
372
373
374





375
376
377
378
379
380
381

382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398







399

400
401
402
403
404
405
406
wildly nonsensical SQL statements and feeding them to SQLite to see
what it will do with them.  Usually some kind of error is returned
(such as "no such table").  Sometimes, purely by chance, the SQL
statement also happens to be semantically correct.  In that case, the
resulting prepared statement is run to make sure it gives a reasonable
result.</p>






<tcl>hd_fragment aflfuzz {American Fuzzy Lop fuzzer} {AFL}</tcl>
<h3>SQL Fuzz Using The American Fuzzy Lop Fuzzer</h3>

<p>The concept of fuzz testing has been around for decades, but fuzz
testing was not an effective way to find bugs until 2014 when
Michal Zalewski invented the first practical profile-guided fuzzer,
<a href="http://lcamtuf.coredump.cx/afl/">American Fuzzy Lop</a> or "AFL".

Unlike prior fuzzers that blindly generate random inputs, AFL
instruments the program being tested (by modifying the assembly-language
output from the C compiler) and uses that instrumentation to detect when
an input causes the program to do something different - to follow
a new control path or loop a different number of times.  Inputs that provoke
new behavior are retained and further mutated.  In this way, AFL is able
to "discover" new behaviors of the program under test, including behaviors
that were never envisioned by the designers.

<p>AFL proved adept at finding arcane bugs in SQLite.
Most of the findings have been assert() statements where the conditional
was false under obscure circumstances.  But AFL has also found
a fair number of crash bugs in SQLite, and even a few cases where SQLite 
computed incorrect results.

<p>Because of its past success, AFL became a standard part of the testing
strategy for SQLite beginning with [version 3.8.10] ([dateof:3.8.10]) until







it was superceded by better fuzzers in [version 3.29.0] ([dateof:3.29.0]).


<tcl>hd_fragment ossfuzz {OSS Fuzz}</tcl>
<h3>Google OSS Fuzz</h3>

<p>Beginning in 2016, a team of engineers at Google started the
[https://github.com/google/oss-fuzz|OSS Fuzz] project.  
OSS Fuzz uses a AFL-style guided fuzzer running on Google's infrastructure.
436
437
438
439
440
441
442
443
444
445








446


447








448

















449
450
451
452
453
454
455
456
457
458
459
460
461



































462
463
464
465
466
467
468
[https://github.com/google/fuzzer-test-suite/blob/master/tutorial/structure-aware-fuzzing.md|Structure-Aware Mutator]
on a specialized input file that defines both an input database and SQL
text to be run against that database. Because it mutates both the input
database and the input SQL at the same time, dbsqlfuzz has been able to
find some obscure faults in SQLite that were missed by prior fuzzers that
mutated only SQL inputs or only the database file.

<p>As of this writing (2019-07-16), the SQLite developers have stopped
using AFL for routine testing and instead are focused on running
dbsqlfuzz.  At least one instance of dbsqlfuzz is running on the latest








SQLite source code at all times, in order to catch any new problems that 


might be introduced into the source tree as features are added and routine








maintenance is performed.


















<tcl>hd_fragment fuzzcheck fuzzcheck</tcl>
<h3>The fuzzcheck test harness</h3>

<p>Historical test cases from [AFL], [OSS Fuzz], and [dbsqlfuzz] are
collected in a set of database files in the main SQLite source tree
and then rerun by the "fuzzcheck" utility program whenever one runs
"make test".  Fuzzcheck only runs a few thousand "interesting" cases
out of the hundreds of millions of cases that the various fuzzers have
examined over the years.  "Interesting" cases are cases that exhibit
previously unseen behavior.  Actual bugs found by fuzzers are always
included among the interesting test cases, but most of the cases run
by fuzzcheck were never actual bugs. 




































<h2>Malformed Database Files</h2>

<p>There are numerous test cases that verify that SQLite is able to
deal with malformed database files.
These tests first build a well-formed database file, then add
corruption by changing one or more bytes in the file by some means







|
<
|
>
>
>
>
>
>
>
>
|
>
>
|
>
>
>
>
>
>
>
>
|
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>













>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







425
426
427
428
429
430
431
432

433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
[https://github.com/google/fuzzer-test-suite/blob/master/tutorial/structure-aware-fuzzing.md|Structure-Aware Mutator]
on a specialized input file that defines both an input database and SQL
text to be run against that database. Because it mutates both the input
database and the input SQL at the same time, dbsqlfuzz has been able to
find some obscure faults in SQLite that were missed by prior fuzzers that
mutated only SQL inputs or only the database file.

<p>The SQLite developers usually leave an instance or two of dbsqlfuzz

running on the latest trunk code of SQLite whenever they are away from
the office for an extended period, such as overnight.  The dbsqlfuzz
fuzzer has proven so effective that OSSFuzz has ceased to find new
problems in SQLite, except for cases when a new bug is introduced
and OSSFuzz finds it overnight before dbsqlfuzz has had a chance to
run.  (Examples:
[https://www.sqlite.org/src/timeline?y=ci&c=c422afb507dc8757|&#91;1&#93;]
[https://www.sqlite.org/src/timeline?y=ci&c=0a2eb949f8a759e5|&#91;2&#93;]
[https://www.sqlite.org/src/timeline?y=ci&c=62f2235adf796c72|&#91;3&#93;])

<tcl>hd_fragment 3pfuzz {3rd-party fuzzers}</tcl>
<h3>Other third-party fuzzers</h3>

<p>SQLite seems to be a popular target for third-parties to fuzz.
The developers hear about many attempts to fuzz SQLite
and they do occasionally get bug reports found by independent
fuzzers.  All such reports are promptly fixed, so the product is
improved and that the entire SQLite user community benefits.
This mechanism of having many independent testers is similar to
[https://en.wikipedia.org/wiki/Linus%27s_law|Linus's law]:
"given enough eyeballs, all bugs are shallow".

<p>One fuzzing researcher of particular note is 
<a href="https://www.manuelrigger.at/">Manuel Rigger</a>, currently
(as this paragraph is written on 2019-12-21)
at <a href="https://ethz.ch/en.html">ETH Zurich</a>.
Most fuzzers only look for assertion faults, crashes, undefined behavior (UB),
or other easily detected anomalies.  Dr. Rigger's fuzzers, on the other hand,
are able to find cases where SQLite computes an incorrect answer.
Rigger has found
[https://www.sqlite.org/src/timeline?y=t&u=mrigger&n=all|many such cases].
Most of these finds are fairly obscure corner cases involving type
conversions and affinity transformations, and a good number of the finds
are against unreleased features.  Nevertheless, his finds are still important
as they are real bugs,
and the SQLite developers are grateful to be able to identify and fix
the underlying problems.  Rigger's work is currently unpublished.  When it
is released, it could be as influential as Zalewski's invention of AFL
and profile-guided fuzzing.

<tcl>hd_fragment fuzzcheck fuzzcheck</tcl>
<h3>The fuzzcheck test harness</h3>

<p>Historical test cases from [AFL], [OSS Fuzz], and [dbsqlfuzz] are
collected in a set of database files in the main SQLite source tree
and then rerun by the "fuzzcheck" utility program whenever one runs
"make test".  Fuzzcheck only runs a few thousand "interesting" cases
out of the hundreds of millions of cases that the various fuzzers have
examined over the years.  "Interesting" cases are cases that exhibit
previously unseen behavior.  Actual bugs found by fuzzers are always
included among the interesting test cases, but most of the cases run
by fuzzcheck were never actual bugs. 

<tcl>hd_fragment tension {coverage testing vs. fuzz testing}</tcl>
<h3>Tension Between Fuzz Testing And 100% MC/DC Testing</h3>

<p>Fuzz testing and [MC/DC|100% MC/DC testing] are in tension with
one another.
That is to say, code tested to 100% MC/DC will tend to be
more vulnerable to problems found by fuzzing and code that performs
well during fuzz testing will tend to have (much) less than 
100% MC/DC.
This is because MC/DC testing discourages [defensive code] with
unreachable branches, but without defensive code, a fuzzer is
more likely to find a path that causes problems.  MC/DC testing
seems to work well for building code that is robust during typical
normal use, whereas fuzz testing is good for building code that is
robust against malecious attack.

<p>Of course, users would prefer code that is both robust in normal
use and resistent to malcious attack.  The SQLite developers are
dedicated to providing that.  The purpose of this section is merely
to point out that doing both at the same time is hard.

<p>For much of its history SQLite has been focused on 100% MC/DC testing.
Resistence to fuzzing attacks only became a concern with the introduction
of AFL in 2014.  For a while there, fuzzers were finding many problems
in SQLite.  In more recent years, the testing strategy of SQLite has
evolved to place more emphasis on fuzz testing.  We still maintain
100% MC/DC of the core SQLite code, but most testing CPU cycles are 
now devoted to fuzzing.

<p>While fuzz testing and 100% MC/DC testing are in tension, they
are not completely at cross-purposes.  The fact that the SQlite test
suite does test to 100% MC/DC means that when fuzzers do find problems,
those problems can be fixed quickly and with little risk of introducing
new errors.

<h2>Malformed Database Files</h2>

<p>There are numerous test cases that verify that SQLite is able to
deal with malformed database files.
These tests first build a well-formed database file, then add
corruption by changing one or more bytes in the file by some means
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
<p>Any one of the above test cases would provide 100% statement coverage
but all three are required for 100% branch coverage.  Generally speaking,
100% branch coverage implies 100% statement coverage, but the converse is
not true.  To reemphasize, the
[TH3] test harness for SQLite provides the stronger form of
test coverage - 100% branch test coverage.</p>

<tcl>hd_fragment defensivecode</tcl>
<h2>Coverage testing of defensive code</h2>

<p>A well-written C program will typically contain some defensive
conditionals which in practice are always true or always false.
This leads to a 
programming dilemma:  Does one remove defensive code in order to obtain
100% branch coverage?</p>







|







632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
<p>Any one of the above test cases would provide 100% statement coverage
but all three are required for 100% branch coverage.  Generally speaking,
100% branch coverage implies 100% statement coverage, but the converse is
not true.  To reemphasize, the
[TH3] test harness for SQLite provides the stronger form of
test coverage - 100% branch test coverage.</p>

<tcl>hd_fragment defcode {defensive code}</tcl>
<h2>Coverage testing of defensive code</h2>

<p>A well-written C program will typically contain some defensive
conditionals which in practice are always true or always false.
This leads to a 
programming dilemma:  Does one remove defensive code in order to obtain
100% branch coverage?</p>