SQLite Forum

Timeline
Login

20 forum posts by user gisbert

2020-12-14
12:06 Reply: SQLite3.EXE (artifact: 54ac193056 user: gisbert)

It seems a lot of backslashes in the Perl code were lost in transit.

2020-05-04
12:39 Reply: Feature request: Dynamically execute SQL from shell (artifact: acb0fdb3e4 user: gisbert)

I may have misunderstood the question, but it seems all this (and more) should be possible using the scripting language of your choice to script the shell (as Simon has already suggested), even without any SQLite extensions. Or am I missing the point?

2020-04-12
21:46 Reply: 3.31.1: docs as PDF? (artifact: eb96b65d3d user: gisbert)

Depending in how hard you want it, you can use pandoc (as described above) to convert to texinfo and then use the TeX package texi2roff to convert to troff. (Disclaimer: I haven't tried this and am unlikely to ever do so :-) )

2020-04-11
22:17 Reply: 3.31.1: docs as PDF? (artifact: 7160098d60 user: gisbert)

Not necessarily. Zip files have the so-called central directory at the end, indeed, but they have also local directory entries directly before the individual members, so readily available when streaming. For this purpose, the central directory is redundant. (At least, when I looked last time, which is nearly 25 years ago.)

However, .tar.gz usually has higher compression rates, especially when including lots of small files, because they compress the (tar) stream as a whole, whereas zip compresses individual members, leading to higher overhead.

So, yes, .tar.gz would be the better option. And .tar.bz2 and .tar.lz even better.

2020-04-07
21:59 Edit reply: 3.31.1: docs as PDF? (artifact: 65e8aaf338 user: gisbert)

For what it's worth, here are two pretty straightforward Perl scripts that will amalgamate the SQLite doc structure and produce a PDF file of about 3500 pages.

The first script contains usage hints in the comment section near the top. The second script is only necessary if you, like myself, prefer to have 7-bit clean input for LaTeX. Obviously, there is a lot of room for tweaking the scripts and the pandoc template according to your taste.


#! /usr/bin/perl
#
# File name: pre.pl
#
# Amalgamate and convert the SQLite documentation to PDF format
# (with the help of pandoc and TeX)
#
# Simple usage:
# perl pre.pl docs.html *.html c3ref/*.html session.html syntax/*.html   \
#      releaselog/*.html                                               | \ 
#   pandoc --to pdf --pdf-engine xelatex -o sqlitebook.pdf
# (Start this in the SQLite doc base directory.)
#
# perl pre.pl docs.html *.html c3ref/*.html session.html syntax/*.html   \
#      releaselog/*.html                                               | \ 
#   pandoc --standalone --from html --to latex --table-of-contents       \
#   --toc-depth 3 --number-sections --highlight-style tango              \
#   --variable documentclass=scrreprt --variable fontsize=10pt           \
#   --variable colorlinks=yes                                            \
#   --variable titlegraphic="images/sqlite370_banner.png"  |             \
#   perl post.pl > sqlitebook.tex
# pdflatex sqlitebook.tex    # repeat twice! no xelatex needed!
# (Start this in the SQLite doc base directory.)
#
# Preparatory steps:
# (1) Once only: get
#     https://www.fossil-scm.org/fossil/doc/trunk/www/xkcd-git.gif
#     and put it into the images/ folder. E.g.,
#       wget ... --output-document=images/xkcd-git.gif
# (2) Once only: if necessary, install pandoc, TeX, and possibly ImageMagick
# (3) Once only: if necessary, install the HTML::TreeBuilder Perl through
#       cpan install HTML::TreeBuilder
# (2) Once, but may need updates from time to time: convert all GIF images
#     in the images/ folder and its subfolders to PNG
#     (e.g., ImageMagick will do the trick:  magic foo.gif foo.png  )
#
# Bundling all processing steps into a Makefile is left as an exercise for
# the reader.
#
# TapirSoft Gisbert W. Selke 2020-04-07.
#
# This software may be used under the terms of the Artistic License, i.e.,
# under the same terms as Perl itself:
# https://dev.perl.org/licenses/artistic.html
#

use strict;
use warnings;
use HTML::TreeBuilder;

our $VERSION = '0.1.1';

my( %file_seen, $master );

for my $fn( glob( join( ' ', @ARGV ) ) ) {
  # process all our files in order, but each filename once only

  next if $file_seen{$fn};
  $file_seen{$fn}++;
  print STDERR "$fn...\n";

  # Build the DOM tree of this file:
  open( my $fh, '<:utf8', $fn ) || die "Cannot open '$fn' for reading: $!";
  my $tree = HTML::TreeBuilder->new( );
  $tree->parse_file($fh);
  $tree->elementify( );

  # Essential steps:
  fix_references( $tree, $fn );
  fix_headers( $tree );

  # Nice to have: remove navigation elements
  $_->delete( ) for $tree->look_down( 'class', 'nosearch' );
  # remove script elements (just because we can)
  $_->delete( ) for $tree->find( 'script' );

  # Add a comment near the beginning indicating which file this content is
  # coming from, and an anchor so that we can link here:
  $tree->find('body')->unshift_content(
    HTML::Element->new( 'br' ),
    HTML::Element->new( '~comment', 'text' => 'start of input file ' . $fn ),
    HTML::Element->new( 'a', 'id' => $fn . '__' )
  );

  if ( $master ) {
    # Append the contents of the body of this document to the master element
    $master->find('body')->push_content( $tree->find('body')->content_list( ) );
  } else {
    # Our first document serves as the master container
    $master = $tree;
  }

}

print $master->as_HTML( undef, '  ', { } ); # safe, human-readable, clean tag structure


sub fix_references {
  # Fix links and anchors so that they work within the amalgamated document

  my( $tree, $fn ) = @_;

  # find names and ids and disambiguate them, using our file name as prefix:
  fix_names( [ $tree->descendants() ], $fn, 'name' );
  fix_names( [ $tree->descendants() ], $fn, 'id' );
  # find links to other documents and change to disambiguated names:
  fix_hrefs( [ $tree->find('a') ] , $fn, 'href' );
  # fix images:
  fix_images( [ $tree->find('img') ], $fn, 'src' );

  return $tree;
} ## end sub fix_references


sub fix_names {
  # Disambiguate the anchor names and ids in this document

  my( $elems, $fn, $attrname ) = @_;
  for my $elem ( grep { defined $_->attr($attrname) } @{ $elems } ) {
    # walk through all nodes and fix them
    $elem->attr( $attrname, $fn . '__' . $elem->attr($attrname) );
  }

  return $elems;
} ## end sub fix_names


sub fix_hrefs {
  # Change links so that they work with the disambiguated anchors

  my( $elems, $fn, $attrname ) = @_;
  for my $elem ( grep { defined $_->attr($attrname) } @{ $elems } ) {
    # walk through all these nodes
    my $label = $elem->attr($attrname);
    if ( $label =~ m|^https?://| ) {
      # external link, leave unchanged
    } elsif ( $label =~ m|^#(.+)$| ) {
      # internal link to an anchor within this document => prepend our name
      $elem->attr( $attrname, '#' . $fn . '__' . $1 );
    } elsif ( $label =~ m|^[./]*(.+)#(.+)$| ) {
      # relative link to a different file of ours,  with an anchor therein.
      # remember to flatten relative paths.
      $elem->attr( $attrname, '#' . $1 . '__' . $2 );
    } elsif ( $label =~ m|^[./]*(.+)$| ) {
      # relative link to a different file of ours,  with no anchor therein.
      # remember to flatten relative paths.
      $elem->attr( $attrname, '#' . $1 . '__' );
    } else {
      warn "file $fn: strange $attrname: '$label'";
    }
  }

  return $elems;
} ## end sub fix_hrefs


sub fix_images {
  # Change references to GIF to PNG. Clean up relative paths.
  # Make <img> within <dl> nicer.

  my( $elems, $fn, $attrname ) = @_;
  for my $elem ( grep { defined $_->attr($attrname) } @{ $elems } ) {
    # walk through all these nodes
    my $label = $elem->attr($attrname);
    if ( $label =~ m|^https?://| ) {
      # external link, leave unchanged, but handle the special case of an xkcd
      # comic that is not ordinarily included with the docs bundle but is
      # referenced directly from the sqlite fossil site. We require this to be
      # downloaded separately and placed in the images/ folder.
      if ( lc($label) eq
           'https://www.fossil-scm.org/fossil/doc/trunk/www/xkcd-git.gif' ) {
        $elem->attr( $attrname, 'images/xkcd-git.png' )
      }
    } else {
      # relative link to a different file of ours
      $label =~ s|^[./]+||; # flatten relative paths
      # change .gif to .png.
      # Conversion of gif file to png files must be done outside this script,
      # probably using ImageMagick ("magic picname.gif picname.png").
      $label =~ s|\.gif$|.png|i;
      $elem->attr( $attrname, $label );
    }

    # Prepend <br> to <img> within <p> within <dd> to produce nicer results
    if ( ( lc( $elem->parent( )->tag( ) ) eq 'p' ) &&
         ( lc( $elem->parent( )->parent( )->tag( ) ) eq 'dd' ) &&
         ( !$elem->left( ) ) &&
         ( !$elem->parent( )->left( ) )
       ) {
      $elem->preinsert( HTML::Element->new( 'br' ) );
    }

  }

  return $elems;
} ## end sub fix_images


sub fix_headers {
  # remove explicit numbering from <h...> headers because latex will add its own.

  my( $tree ) = @_;
  for my $level (1..6) {
    for my $node (  $tree->find( "h$level" ) ) {
      $node->objectify_text( );
      my $textnode = $node->find( '~text' );
      if ( $textnode ) {
        my $text = $textnode->attr( 'text' );
        if ( $text =~ m|^\s*[0-9.]+\s+| ) {
          $text =~ s|^\s*[0-9.]+\s*||;
          $textnode->attr( 'text', $text );
        }
      }
      $node->deobjectify_text( );
    }
  }

  return $tree;
} ## end sub fix_headers

And here's the second script:


#! /usr/bin/perl
#
# File name: post.pl
#
# Convert UTF-8 characters in the output of pandoc to clean 7-bit ASCII
# characters by recoding as LaTeX macros, this obviating the need to
# use xelatex.
#
# For usage hints see pre.pl
#
# TapirSoft Gisbert W. Selke 2020-04-07.
#
# This software may be used under the terms of the Artistic License, i.e.,
# under the same terms as Perl itself:
# https://dev.perl.org/licenses/artistic.html
#

use strict;
use warnings;

our $VERSION = '0.1.1';

binmode(STDIN, ':encoding(UTF-8)');

while (<>) {
  s/\N{U+03c0}/\\ensuremath{\\pi}/g;                # pi
  s/\N{U+2007}/\\hspace*{1em}/g;                    # fixed space
  s/\N{U+2190}/\\ensuremath{\\leftarrow}/g;         # <-
  s/\N{U+2191}/\\ensuremath{\\uparrow}/g;           # ^
  s/\N{U+2192}/\\ensuremath{\\rightarrow}/g;        # ->
  s/\N{U+21d2}/\\ensuremath{\\Rightarrow}/g;        # =>
  s/\N{U+2260}/\\ensuremath{\\neq}/g;               # !=
  s/\N{U+2265}/\\ensuremath{\\geq}/g;               # >=
  s/\N{U+2588}/\\rule{1em}{1ex}/g;                  # little black box
  s/\N{U+25ba}/\\ensuremath{\\bigtriangledown}/g;   # down-pointing triangle   (might use dingbat instad)
  s/\N{U+25bc}/\\ensuremath{\\rhd}/g;               # right-ppointing triangle (might use dingbat instad)
  s/\N{U+2714}/\\checkmark{}/g;                     # check mar, (might use dingbat instead)
  s/\N{U+00a0}/~/g;                                 # non-breaking space
  s/\N{U+00b1}/\\ensuremath{\\pm}/g;                # +/-
  s/\N{U+00b2}/\\ensuremath{^2}/g;                  # subscript 2
  s/\N{U+00b9}/\\ensuremath{^1}/g;                  # subscript 1
  s/\N{U+00c0}/\\`{A}/g;                            # A acute
  s/\N{U+00c2}/\\^{A}/g;                            # A circumflex
  s/\N{U+00c3}/\\~{A}/g;                            # A tilde
  s/\N{U+00c6}/\\AE{}/g;                            # AE ligature
  s/\N{U+00d7}/\\ensuremath{\\times}/g;             # multiplication sign
  s/\N{U+00df}/\\ss{}/g;                            # esszet
  s/\N{U+00e0}/\\`{a}/g;                            # a acute
  s/\N{U+00e1}/\\'{a}/g;                            # a gravis
  s/\N{U+00e2}/\\^{a}/g;                            # a circumflex
  s/\N{U+00e3}/\\~{a}/g;                            # a tilde
  s/\N{U+00e4}/\\"{a}/g;                            # a umlaut
  s/\N{U+00e6}/\\ae{}/g;                            # ae ligature
  s/\N{U+00fe}/\\th{}/g;                            # thorn

  print;
}

20:13 Reply: Trivial typo in the documentation (artifact: 3b8dd9552d user: gisbert)

Yes, in fact some have escaped the cleaning process, viz., multiple instances of &sup1 and &sup2 and &sup3.

They seem to come up always together, and the affected files are, based on the HTML files online at sqlite.org as of now (2020-04-07 20:00 UTC):

appfunc.html
c3ref\constlist.html
c3ref\context.html
c3ref\funclist.html
c3ref\objlist.html
c3ref\sqlite3.html
c3ref\stmt.html
c3ref\str.html
c3ref\value.html
capi3ref.html
lang.html
lang_aggfunc.html
lang_corefunc.html
pragma.html
rescode.html
session.html
session\changeset_iter.html
session\constlist.html
session\funclist.html
session\objlist.html
session\session.html
syntax.html

Thank you for caring!

19:26 Reply: Trivial typo in the documentation (artifact: 10e71583d8 user: gisbert)

You're incredible! That message came within half an hour after my post.

Thank you for all that incredible struggle for perfection, benefitting all of us!

Will re-run the check, of course.

16:58 Reply: shell's .load is pathSep picky (artifact: 110ccf5653 user: gisbert)

Yes, I agree. But the OP's point was not that the Windows command shell should be more accepting, but that the rest of the world should accept Windows (command shell) conventions.

16:50 Reply: 3.31.1: docs as PDF? (artifact: 1d9d14844a user: gisbert)

Your wish has been granted: pandoc can output to three different brands of epub and about 50 other formats. Just replace the value for the --to parameter in the sample call described near the top of the script.

16:45 Reply: 3.31.1: docs as PDF? (artifact: b2d8f6d31c user: gisbert)

I'm not sure whether I have understood your wish correctly, but it sounds like a suitable CGI script (or whatever your web server's preferred way of running scripts is) should be able to extract something from a zip archive and stream it to the client without littering the place. That would work either with a remote server or with an ad hoc local server process, similar to docapp described below by drh.

2020-04-06
18:43 Edit reply: 3.31.1: docs as PDF? (artifact: fd31944641 user: gisbert)

For what it's worth, here are two pretty straightforward Perl scripts that will amalgamate the SQLite doc structure and produce a PDF file of about 3400 pages.

The first script contains usage hints in the comment section near the top. The second script is only necessary if you, like myself, prefer to have 7-bit clean input for LaTeX. Obviously, there is a lot of room for tweaking the scripts and the pandoc template according to your taste.


#! /usr/bin/perl
#
# File name: pre.pl
#
# Amalgamate and convert the SQLite documentation to PDF format
# (with the help of pandoc and TeX)
#
# Simple usage:
# perl pre.pl docs.html *.html c3ref/*.html session.html syntax/*.html | \
#   pandoc --to pdf --pdf-engine xelatex -o sqlitebook.pdf
# (Start this in the SQLite doc base directory.)
#
# More flexible usage:
# perl pre.pl docs.html *.html c3ref/*.html session.html syntax/*.html | \
#   pandoc --standalone --from html --to latex --table-of-contents       \
#   --toc-depth 3 --number-sections --highlight-style tango              \
#   --variable documentclass=scrreprt --variable fontsize=10pt           \
#   --variable colorlinks=yes                                            \
#   --variable titlegraphic="images/sqlite370_banner.png"  |             \
#   perl post.pl > sqlitebook.tex
# pdflatex sqlitebook.tex    # repeat twice! no xelatex needed!
# (Start this in the SQLite doc base directory.)
#
# Preparatory steps:
# (1) Once only: get
#     https://www.fossil-scm.org/fossil/doc/trunk/www/xkcd-git.gif
#     and put it into the images/ folder. E.g.,
#       wget ... --output-document=images/xkcd-git.gif
# (2) Once only: if necessary, install pandoc, TeX, and possibly ImageMagick
# (3) Once only: if necessary, install the HTML::TreeBuilder Perl through
#       cpan install HTML::TreeBuilder
# (2) Once, but may need updates from time to time: convert all GIF images
#     in the images/ folder and its subfolders to PNG
#     (e.g., ImageMagick will do the trick:  magic foo.gif foo.png  )
#
# Bundling all processing steps into a Makefile is left as an exercise for
# the reader.
#
# TapirSoft Gisbert W. Selke 2020-04-06.
#
# This software may be used under the terms of the Artistic License, i.e.,
# under the same terms as Perl itself:
# https://dev.perl.org/licenses/artistic.html
#

use strict;
use warnings;
use HTML::TreeBuilder;

our $VERSION = '0.1.0';

my( %file_seen, $master );

for my $fn( glob( join( ' ', @ARGV ) ) ) {
  # process all our files in order, but each filename once only

  next if $file_seen{$fn};
  $file_seen{$fn}++;
  print STDERR "$fn...\n";

  # Build the DOM tree of this file:
  open( my $fh, '<:utf8', $fn ) || die "Cannot open '$fn' for reading: $!";
  my $tree = HTML::TreeBuilder->new( );
  $tree->parse_file($fh);
  $tree->elementify( );

  # Essential steps:
  fix_references( $tree, $fn );
  fix_headers( $tree );

  # Nice to have: remove navigation elements
  $_->delete( ) for $tree->look_down( 'class', 'nosearch' );
  # remove script elements (just because we can)
  $_->delete( ) for $tree->find( 'script' );

  # Add a comment near the beginning indicating which file this content is
  # coming from, and an anchor so that we can link here:
  $tree->find('body')->unshift_content(
    HTML::Element->new( 'br' ),
    HTML::Element->new( '~comment', 'text' => 'start of input file ' . $fn ),
    HTML::Element->new( 'a', 'id' => $fn . '__' )
  );

  if ( $master ) {
    # Append the contents of the body of this document to the master element
    $master->find('body')->push_content( $tree->find('body')->content_list( ) );
  } else {
    # Our first document serves as the master container
    $master = $tree;
  }

}

print $master->as_HTML( undef, '  ', { } ); # safe, human-readable, clean tag structure


sub fix_references {
  # Fix links and anchors so that they work within the amalgamated document

  my( $tree, $fn ) = @_;

  # find names and ids and disambiguate them, using our file name as prefix:
  fix_names( [ $tree->descendants() ], $fn, 'name' );
  fix_names( [ $tree->descendants() ], $fn, 'id' );
  # find links to other documents and change to disambiguated names:
  fix_hrefs( [ $tree->find('a') ] , $fn, 'href' );
  # fix images:
  fix_images( [ $tree->find('img') ], $fn, 'src' );

  return $tree;
} ## end sub fix_references


sub fix_names {
  # Disambiguate the anchor names and ids in this document

  my( $elems, $fn, $attrname ) = @_;
  for my $elem ( grep { defined $_->attr($attrname) } @{ $elems } ) {
    # walk through all nodes and fix them
    $elem->attr( $attrname, $fn . '__' . $elem->attr($attrname) );
  }

  return $elems;
} ## end sub fix_names


sub fix_hrefs {
  # Change links so that they work with the disambiguated anchors

  my( $elems, $fn, $attrname ) = @_;
  for my $elem ( grep { defined $_->attr($attrname) } @{ $elems } ) {
    # walk through all these nodes
    my $label = $elem->attr($attrname);
    if ( $label =~ m|^https?://| ) {
      # external link, leave unchanged
    } elsif ( $label =~ m|^#(.+)$| ) {
      # internal link to an anchor within this document => prepend our name
      $elem->attr( $attrname, '#' . $fn . '__' . $1 );
    } elsif ( $label =~ m|^[./]*(.+)#(.+)$| ) {
      # relative link to a different file of ours,  with an anchor therein.
      # remember to flatten relative paths.
      $elem->attr( $attrname, '#' . $1 . '__' . $2 );
    } elsif ( $label =~ m|^[./]*(.+)$| ) {
      # relative link to a different file of ours,  with no anchor therein.
      # remember to flatten relative paths.
      $elem->attr( $attrname, '#' . $1 . '__' );
    } else {
      warn "file $fn: strange $attrname: '$label'";
    }
  }

  return $elems;
} ## end sub fix_hrefs


sub fix_images {
  # Change references to GIF to PNG. Clean up relative paths.
  # Make <img> within <dl> nicer.

  my( $elems, $fn, $attrname ) = @_;
  for my $elem ( grep { defined $_->attr($attrname) } @{ $elems } ) {
    # walk through all these nodes
    my $label = $elem->attr($attrname);
    if ( $label =~ m|^https?://| ) {
      # external link, leave unchanged, but handle the special case of an xkcd
      # comic that is not ordinarily included with the docs bundle but is
      # referenced directly from the sqlite fossil site. We require this to be
      # downloaded separately and placed in the images/ folder.
      if ( lc($label) eq
           'https://www.fossil-scm.org/fossil/doc/trunk/www/xkcd-git.gif' ) {
        $elem->attr( $attrname, 'images/xkcd-git.png' )
      }
    } else {
      # relative link to a different file of ours
      $label =~ s|^[./]+||; # flatten relative paths
      # change .gif to .png.
      # Conversion of gif file to png files must be done outside this script,
      # probably using ImageMagick ("magic picname.gif picname.png").
      $label =~ s|\.gif$|.png|i;
      $elem->attr( $attrname, $label );
    }

    # Prepend <br> to <img> within <p> within <dd> to produce nicer results
    if ( ( lc( $elem->parent( )->tag( ) ) eq 'p' ) &&
         ( lc( $elem->parent( )->parent( )->tag( ) ) eq 'dd' ) &&
         ( !$elem->left( ) ) &&
         ( !$elem->parent( )->left( ) )
       ) {
      $elem->preinsert( HTML::Element->new( 'br' ) );
    }

  }

  return $elems;
} ## end sub fix_images


sub fix_headers {
  # remove explicit numbering from <h...> headers because latex will add its own.

  my( $tree ) = @_;
  for my $level (1..6) {
    for my $node (  $tree->find( "h$level" ) ) {
      $node->objectify_text( );
      my $textnode = $node->find( '~text' );
      if ( $textnode ) {
        my $text = $textnode->attr( 'text' );
        if ( $text =~ m|^\s*[0-9.]+\s+| ) {
          $text =~ s|^\s*[0-9.]+\s*||;
          $textnode->attr( 'text', $text );
        }
      }
      $node->deobjectify_text( );
    }
  }

  return $tree;
} ## end sub fix_headers

And here's the second script:


#! /usr/bin/perl
#
# File name: post.pl
#
# Convert UTF-8 characters in the output of pandoc to clean 7-bit ASCII
# characters by recoding as LaTeX macros, this obviating the need to
# use xelatex.
#
# For usage hints see pre.pl
#
# TapirSoft Gisbert W. Selke 2020-04-06.
#
# This software may be used under the terms of the Artistic License, i.e.,
# under the same terms as Perl itself:
# https://dev.perl.org/licenses/artistic.html
#

use strict;
use warnings;

our $VERSION = '0.1.0';

binmode(STDIN, ':encoding(UTF-8)');

while (<>) {
  s/\N{U+03c0}/\\ensuremath{\\pi}/g;                # pi
  s/\N{U+2007}/\\hspace*{1em}/g;                    # fixed space
  s/\N{U+2190}/\\ensuremath{\\leftarrow}/g;         # <-
  s/\N{U+2191}/\\ensuremath{\\uparrow}/g;           # ^
  s/\N{U+2192}/\\ensuremath{\\rightarrow}/g;        # ->
  s/\N{U+21d2}/\\ensuremath{\\Rightarrow}/g;        # =>
  s/\N{U+2260}/\\ensuremath{\\neq}/g;               # !=
  s/\N{U+2265}/\\ensuremath{\\geq}/g;               # >=
  s/\N{U+2588}/\\rule{1em}{1ex}/g;                  # little black box
  s/\N{U+25ba}/\\ensuremath{\\bigtriangledown}/g;   # down-pointing triangle   (might use dingbat instad)
  s/\N{U+25bc}/\\ensuremath{\\rhd}/g;               # right-ppointing triangle (might use dingbat instad)
  s/\N{U+2714}/\\checkmark{}/g;                     # check mar, (might use dingbat instead)
  s/\N{U+00a0}/~/g;                                 # non-breaking space
  s/\N{U+00b1}/\\ensuremath{\\pm}/g;                # +/-
  s/\N{U+00b2}/\\ensuremath{^2}/g;                  # subscript 2
  s/\N{U+00b9}/\\ensuremath{^1}/g;                  # subscript 1
  s/\N{U+00c0}/\\`{A}/g;                            # A acute
  s/\N{U+00c2}/\\^{A}/g;                            # A circumflex
  s/\N{U+00c3}/\\~{A}/g;                            # A tilde
  s/\N{U+00c6}/\\AE{}/g;                            # AE ligature
  s/\N{U+00d7}/\\ensuremath{\\times}/g;             # multiplication sign
  s/\N{U+00df}/\\ss{}/g;                            # esszet
  s/\N{U+00e0}/\\`{a}/g;                            # a acute
  s/\N{U+00e1}/\\'{a}/g;                            # a gravis
  s/\N{U+00e2}/\\^{a}/g;                            # a circumflex
  s/\N{U+00e3}/\\~{a}/g;                            # a tilde
  s/\N{U+00e4}/\\"{a}/g;                            # a umlaut
  s/\N{U+00e6}/\\ae{}/g;                            # ae ligature
  s/\N{U+00fe}/\\th{}/g;                            # thorn

  s/.\x08//g;                                       # remove stray backspace and the character before it

  print;
}

18:41 Reply: shell's .load is pathSep picky (artifact: 6c659bc1ee user: gisbert)

Indeed. The programme getting needlessly disturbed was the Windows command shell.

18:37 Reply: Trivial typo in the documentation (artifact: a0abae1eed user: gisbert)

There are also a number of HTML entities in the doc files with the terminating semicolon missing. (Browsers are usually pretty good at covering up for such minor typos, so this is more of an aesthetic issue, cleanliness is next to codeliness.)

Misspelled entity present in file(s)...
'&#195"' fts5.html
'&#91<' json1.html printf.html
'&#93 ' opcode.html
'&#93"' json1.html
'&#93.' opcode.html
'&#93<' changes.html vtab.html
'&#93=' opcode.html
'&#94"' oldnews.html
'&gt)' fts3.html
'&lt<' changes.html
'&nbsp&' nulls.html
'&nbspc' cli.html
'&sup1.' onefile.html
'&sup1' appfunc.html and numerous other files
'&sup2)' limits.html
'&sup2' appfunc.html and numerous other files
'&sup3' appfunc.html and numerous other files
16:52 Reply: shell's .load is pathSep picky (artifact: 7f9802a6b4 user: gisbert)

How could SQLite distinguish the use of the backslash as a path separator from its legitimate use as an escape character in Unix-like environments? Apart from that, I don't think either Windows or Unix is "agnostic" of the path separator. At least, last time I tried "type foo/bar.txt" on Windows it didn't seem to like it.

16:43 Reply: 3.31.1: docs as PDF? (artifact: 754d8a9353 user: gisbert)

For what it's worth, here are two pretty straightforward Perl scripts that will amalgamate the SQLite doc structure and produce a PDF file of about 3400 pages.

The first script contains usage hints in the comment section near the top. The second script is only necessary if you, like myself, prefer to have 7-bit clean input for LaTeX. Obviously, there is a lot of room for tweaking the scripts and the pandoc template according to your taste.


#! /usr/bin/perl
#
# File name: pre.pl
#
# Amalgamate and convert the SQLite documentation to PDF format
# (with the help of pandoc and TeX)
#
# Simple usage:
# perl pre.pl docs.html *.html c3ref/*.html session.html syntax/*.html | \
#   pandoc --to pdf --pdf-engine xelatex -o sqlitebook.pdf
# (Start this in the SQLite doc base directory.)
#
# More flexible usage:
# perl pre.pl docs.html *.html c3ref/*.html session.html syntax/*.html | \
#   pandoc --standalone --from html --to latex --table-of-contents       \
#   --toc-depth 3 --number-sections --highlight-style tango              \
#   --variable documentclass=scrreprt --variable fontsize=10pt           \
#   --variable colorlinks=yes                                            \
#   --variable titlegraphic="images/sqlite370_banner.png"  |             \
#   perl post.pl > sqlitebook.tex
# pdflatex sqlitebook.tex    # repeat twice! no xelatex needed!
# (Start this in the SQLite doc base directory.)
#
# Preparatory steps:
# (1) Once only: get
#     https://www.fossil-scm.org/fossil/doc/trunk/www/xkcd-git.gif
#     and put it into the images/ folder. E.g.,
#       wget ... --output-document=images/xkcd-git.gif
# (2) Once only: if necessary, install pandoc, TeX, and possibly ImageMagick
# (3) Once only: if necessary, install the HTML::TreeBuilder Perl through
#       cpan install HTML::TreeBuilder
# (2) Once, but may need updates from time to time: convert all GIF images
#     in the images/ folder and its subfolders to PNG
#     (e.g., ImageMagick will do the trick:  magic foo.gif foo.png  )
#
# Bundling all processing steps into a Makefile is left as an exercise for
# the reader.
#
# TapirSoft Gisbert W. Selke 2020-04-06.
#
# This software may be used under the terms of the Artistic License, i.e.,
# under the same terms as Perl itself:
# https://dev.perl.org/licenses/artistic.html
#

use Modern::Perl;
use HTML::TreeBuilder;

our $VERSION = '0.1.0';

my( %file_seen, $master );

for my $fn( glob( join( ' ', @ARGV ) ) ) {
  # process all our files in order, but each filename once only

  next if $file_seen{$fn};
  $file_seen{$fn}++;
  print STDERR "$fn...\n";

  # Build the DOM tree of this file:
  open( my $fh, '<:utf8', $fn ) || die "Cannot open '$fn' for reading: $!";
  my $tree = HTML::TreeBuilder->new( );
  $tree->parse_file($fh);
  $tree->elementify( );

  # Essential steps:
  fix_references( $tree, $fn );
  fix_headers( $tree );

  # Nice to have: remove navigation elements
  $_->delete( ) for $tree->look_down( 'class', 'nosearch' );
  # remove script elements (just because we can)
  $_->delete( ) for $tree->find( 'script' );

  # Add a comment near the beginning indicating which file this content is
  # coming from, and an anchor so that we can link here:
  $tree->find('body')->unshift_content(
    HTML::Element->new( 'br' ),
    HTML::Element->new( '~comment', 'text' => 'start of input file ' . $fn ),
    HTML::Element->new( 'a', 'id' => $fn . '__' )
  );

  if ( $master ) {
    # Append the contents of the body of this document to the master element
    $master->find('body')->push_content( $tree->find('body')->content_list( ) );
  } else {
    # Our first document serves as the master container
    $master = $tree;
  }

}

print $master->as_HTML( undef, '  ', { } ); # safe, human-readable, clean tag structure


sub fix_references {
  # Fix links and anchors so that they work within the amalgamated document

  my( $tree, $fn ) = @_;

  # find names and ids and disambiguate them, using our file name as prefix:
  fix_names( [ $tree->descendants() ], $fn, 'name' );
  fix_names( [ $tree->descendants() ], $fn, 'id' );
  # find links to other documents and change to disambiguated names:
  fix_hrefs( [ $tree->find('a') ] , $fn, 'href' );
  # fix images:
  fix_images( [ $tree->find('img') ], $fn, 'src' );

  return $tree;
} ## end sub fix_references


sub fix_names {
  # Disambiguate the anchor names and ids in this document

  my( $elems, $fn, $attrname ) = @_;
  for my $elem ( grep { defined $_->attr($attrname) } @{ $elems } ) {
    # walk through all nodes and fix them
    $elem->attr( $attrname, $fn . '__' . $elem->attr($attrname) );
  }

  return $elems;
} ## end sub fix_names


sub fix_hrefs {
  # Change links so that they work with the disambiguated anchors

  my( $elems, $fn, $attrname ) = @_;
  for my $elem ( grep { defined $_->attr($attrname) } @{ $elems } ) {
    # walk through all these nodes
    my $label = $elem->attr($attrname);
    if ( $label =~ m|^https?://| ) {
      # external link, leave unchanged
    } elsif ( $label =~ m|^#(.+)$| ) {
      # internal link to an anchor within this document => prepend our name
      $elem->attr( $attrname, '#' . $fn . '__' . $1 );
    } elsif ( $label =~ m|^[./]*(.+)#(.+)$| ) {
      # relative link to a different file of ours,  with an anchor therein.
      # remember to flatten relative paths.
      $elem->attr( $attrname, '#' . $1 . '__' . $2 );
    } elsif ( $label =~ m|^[./]*(.+)$| ) {
      # relative link to a different file of ours,  with no anchor therein.
      # remember to flatten relative paths.
      $elem->attr( $attrname, '#' . $1 . '__' );
    } else {
      warn "file $fn: strange $attrname: '$label'";
    }
  }

  return $elems;
} ## end sub fix_hrefs


sub fix_images {
  # Change references to GIF to PNG. Clean up relative paths.
  # Make <img> within <dl> nicer.

  my( $elems, $fn, $attrname ) = @_;
  for my $elem ( grep { defined $_->attr($attrname) } @{ $elems } ) {
    # walk through all these nodes
    my $label = $elem->attr($attrname);
    if ( $label =~ m|^https?://| ) {
      # external link, leave unchanged, but handle the special case of an xkcd
      # comic that is not ordinarily included with the docs bundle but is
      # referenced directly from the sqlite fossil site. We require this to be
      # downloaded separately and placed in the images/ folder.
      if ( lc($label) eq
           'https://www.fossil-scm.org/fossil/doc/trunk/www/xkcd-git.gif' ) {
        $elem->attr( $attrname, 'images/xkcd-git.png' )
      }
    } else {
      # relative link to a different file of ours
      $label =~ s|^[./]+||; # flatten relative paths
      # change .gif to .png.
      # Conversion of gif file to png files must be done outside this script,
      # probably using ImageMagick ("magic picname.gif picname.png").
      $label =~ s|\.gif$|.png|i;
      $elem->attr( $attrname, $label );
    }

    # Prepend <br> to <img> within <p> within <dd> to produce nicer results
    if ( ( lc( $elem->parent( )->tag( ) ) eq 'p' ) &&
         ( lc( $elem->parent( )->parent( )->tag( ) ) eq 'dd' ) &&
         ( !$elem->left( ) ) &&
         ( !$elem->parent( )->left( ) )
       ) {
      $elem->preinsert( HTML::Element->new( 'br' ) );
    }

  }

  return $elems;
} ## end sub fix_images


sub fix_headers {
  # remove explicit numbering from <h...> headers because latex will add its own.

  my( $tree ) = @_;
  for my $level (1..6) {
    for my $node (  $tree->find( "h$level" ) ) {
      $node->objectify_text( );
      my $textnode = $node->find( '~text' );
      if ( $textnode ) {
        my $text = $textnode->attr( 'text' );
        if ( $text =~ m|^\s*[0-9.]+\s+| ) {
          $text =~ s|^\s*[0-9.]+\s*||;
          $textnode->attr( 'text', $text );
        }
      }
      $node->deobjectify_text( );
    }
  }

  return $tree;
} ## end sub fix_headers

And here's the second script:


#! /usr/bin/perl
#
# File name: post.pl
#
# Convert UTF-8 characters in the output of pandoc to clean 7-bit ASCII
# characters by recoding as LaTeX macros, this obviating the need to
# use xelatex.
#
# For usage hints see pre.pl
#
# TapirSoft Gisbert W. Selke 2020-04-06.
#
# This software may be used under the terms of the Artistic License, i.e.,
# under the same terms as Perl itself:
# https://dev.perl.org/licenses/artistic.html
#

use strict;
use warnings;

our $VERSION = '0.1.0';

binmode(STDIN, ':encoding(UTF-8)');

while (<>) {
  s/\N{U+03c0}/\\ensuremath{\\pi}/g;                # pi
  s/\N{U+2007}/\\hspace*{1em}/g;                    # fixed space
  s/\N{U+2190}/\\ensuremath{\\leftarrow}/g;         # <-
  s/\N{U+2191}/\\ensuremath{\\uparrow}/g;           # ^
  s/\N{U+2192}/\\ensuremath{\\rightarrow}/g;        # ->
  s/\N{U+21d2}/\\ensuremath{\\Rightarrow}/g;        # =>
  s/\N{U+2260}/\\ensuremath{\\neq}/g;               # !=
  s/\N{U+2265}/\\ensuremath{\\geq}/g;               # >=
  s/\N{U+2588}/\\rule{1em}{1ex}/g;                  # little black box
  s/\N{U+25ba}/\\ensuremath{\\bigtriangledown}/g;   # down-pointing triangle   (might use dingbat instad)
  s/\N{U+25bc}/\\ensuremath{\\rhd}/g;               # right-ppointing triangle (might use dingbat instad)
  s/\N{U+2714}/\\checkmark{}/g;                     # check mar, (might use dingbat instead)
  s/\N{U+00a0}/~/g;                                 # non-breaking space
  s/\N{U+00b1}/\\ensuremath{\\pm}/g;                # +/-
  s/\N{U+00b2}/\\ensuremath{^2}/g;                  # subscript 2
  s/\N{U+00b9}/\\ensuremath{^1}/g;                  # subscript 1
  s/\N{U+00c0}/\\`{A}/g;                            # A acute
  s/\N{U+00c2}/\\^{A}/g;                            # A circumflex
  s/\N{U+00c3}/\\~{A}/g;                            # A tilde
  s/\N{U+00c6}/\\AE{}/g;                            # AE ligature
  s/\N{U+00d7}/\\ensuremath{\\times}/g;             # multiplication sign
  s/\N{U+00df}/\\ss{}/g;                            # esszet
  s/\N{U+00e0}/\\`{a}/g;                            # a acute
  s/\N{U+00e1}/\\'{a}/g;                            # a gravis
  s/\N{U+00e2}/\\^{a}/g;                            # a circumflex
  s/\N{U+00e3}/\\~{a}/g;                            # a tilde
  s/\N{U+00e4}/\\"{a}/g;                            # a umlaut
  s/\N{U+00e6}/\\ae{}/g;                            # ae ligature
  s/\N{U+00fe}/\\th{}/g;                            # thorn

  s/.\x08//g;                                       # remove stray backspace and the character before it

  print;
}

2020-03-30
13:18 Post: Trivial typo in the documentation (artifact: 8e6cf07cc8 user: gisbert)

unlock_notify.html contains a spurious backspace (x08, ^H) before "Atomically releases..." (and before that, a spurious "a"). This may throw off post processors of the documentation. Thanks to anonymous for spotting that something is amiss.

13:14 Edit reply: 3.31.1: docs as PDF? (artifact: cffc9481fe user: gisbert)

The "unknown character" is a meek backspace (x08, ^H) that is erroneously included in the page https://sqlite.org/unlock_notify.html before "Atomically releases...", so that can be fixed trivially. As for the rest of the messages, I'll have to dig in deeper. Stay tuned.

P.S.: Thanks for the download link!

13:13 Reply: 3.31.1: docs as PDF? (artifact: ddd9e2a310 user: gisbert)

The "unknown character" is a meek backspace (x08, ^H) that is erroneously included in the page https://sqlite.org/unlock_notify.html before "Atomically releases...", so that can be fixed trivially. As for the rest of the messages, I'll have to dig in deeper. Stay tuned.

2020-03-29
19:56 Reply: 3.31.1: docs as PDF? (artifact: c5242bdf9c user: gisbert)

That sounds like it should be possible to fix. Care to let me have a look at the input you sent to Pandoc, or its output? I'd have a go at it, being a regular Pandoc and LaTeX user. (Pick some place from where I could download it.)

19:51 Reply: 3.31.1: docs as PDF? (artifact: 734ff8a796 user: gisbert)

Things have evolved: Pandoc is a marvellous tool (indeed written in Haskell) and works as a standalone executable (at least under Windows, but I assume there will be binaries for other platforms available), no Haskell installation needed at all. For all I know you don't need xelatex, just latex should do for Pandoc output (haven't tried myself yet, though).