3.31.1: docs as PDF?

(1) By Rich S (richs) on 2020-03-27 21:36:51 [link] [source]

Is there a PDF version of the most current docs? I prefer that to a directory full of html files.

(2) By Richard Hipp (drh) on 2020-03-28 12:37:52 in reply to 1 [link] [source]

No.

The documentation is generated by a TCL script that is held in a separate repository. The only output format supported is HTML.

The documentation repository contains much of the documentation source text. But the build script also scans the SQLite source code and extracts a lot of the documentation from comments in the source code. The documentation source repository, like the SQLite source code itself, is open-source. You can clone the documentation repository and rebuild the documentation yourself, if you want.

There are several books about SQLite if you prefer printed documentation.

(3) By anonymous on 2020-03-28 14:41:26 in reply to 1 [link] [source]

You can also try to make one yourself using wkhtmltopdf or Pandoc.

In theory, Pandoc might give you a nicer-looking PDF, but it requires XeLaTeX, and a simple command pandoc --latex-engine=xelatex -o docs.pdf docs.html *html didn't work (XeLaTeX told me that line 150913 of the generated *.tex file contained an invalid character).

(4) By Rich S (richs) on 2020-03-28 15:25:26 in reply to 2 [link] [source]

Thanks, Richard.

I had the first edition of 'The Complete Guide to SQLITE' but printed books quickly expire past their best-used-by dates as the software develops.

It's been a few years since I last used sqlite so I'll use the on-line docs as I need to check syntax converting postgres tables.

Stay healthy,

Rich

(5) By Rich S (richs) on 2020-03-28 15:28:03 in reply to 3 [link] [source]

Thanks for the thought. I looked at pandoc a couple of years ago but it required a complete Haskell installation and I've enough languages already installed that I don't need another one for a single application. I've been using TeXLive for a long time but not xelatex.

Regards,

Rich

(6) By luuk on 2020-03-28 16:14:54 in reply to 1 [link] [source]

I made an attempt, which is not complete yet!, see:

https://github.com/Luuk34/sqlite3pdf/blob/master/sqlite.pdf

(7.1) By Z4us (KlaasZ4usVanBv) on 2020-03-30 12:00:01 edited from 7.0 in reply to 6 [link] [source]

Deleted

(8.1) By Z4us (KlaasZ4usVanBv) on 2020-03-30 12:02:13 edited from 8.0 in reply to 7.0 [link] [source]

Thank you, Luuk

Some basic docs can be found here as well: https://en.wikibooks.org/wiki/SQLite

Everybody may change those and to create a PDF:

https://en.wikibooks.org/w/index.php?title=Special:ElectronPdf&page=SQLite&action=show-download-screen

(9) By Gisbert (gisbert) on 2020-03-29 19:51:40 in reply to 5 [link] [source]

Things have evolved: Pandoc is a marvellous tool (indeed written in Haskell) and works as a standalone executable (at least under Windows, but I assume there will be binaries for other platforms available), no Haskell installation needed at all. For all I know you don't need xelatex, just latex should do for Pandoc output (haven't tried myself yet, though).

(10) By Gisbert (gisbert) on 2020-03-29 19:56:02 in reply to 3 [link] [source]

That sounds like it should be possible to fix. Care to let me have a look at the input you sent to Pandoc, or its output? I'd have a go at it, being a regular Pandoc and LaTeX user. (Pick some place from where I could download it.)

(11) By Rich S (richs) on 2020-03-29 21:10:36 in reply to 9 [link] [source]

Thanks, Gilbert. Now SlackBuilds.org has pandoc-bin which is a repackaged version of the pandoc github binary. There's still the version requiring a full Haskill installation but I just downloaded, built, and installed the self-contained version.

Regards,

Rich

(12) By anonymous on 2020-03-30 07:45:06 in reply to 10 [source]

Care to let me have a look at the input you sent to Pandoc

It's the built HTML documentation from https://sqlite.org/docsrc.

or its output?

pandoc -o sqlite.pdf --latex-engine=xelatex docs/*html fails with an error mentioned above and cleans up temporary files.

cd doc && pandoc -s -o sqlite.tex docs.html *html produces a 6-megabyte *.tex file which can actually be built, albeit not without a lot of warnings (image loading problems and a lot of occurrences of "Extra }, or forgotten endgroup"). Use of XeLaTeX seems to be unavoidable because the resulting document contains symbols like ▼ and ►. Inter-documentation links don't seem to be working, either (they lead to original *.html, instead).

I think that there needs to be a preprocessing step: a script should follow all links in docs.html, cut out the <div> with the relevant content, replace all local links with anchor-links, insert a relevant <span id='...'></span> in the beginning of each file (to make the anchor-links work) and concatenate all that into a single large *.html. That file could be fed to pandoc to produce a more logically sound PDF.

(13.1) By Gisbert (gisbert) on 2020-03-30 13:14:43 edited from 13.0 in reply to 12 [link] [source]

The "unknown character" is a meek backspace (x08, ^H) that is erroneously included in the page https://sqlite.org/unlock_notify.html before "Atomically releases...", so that can be fixed trivially. As for the rest of the messages, I'll have to dig in deeper. Stay tuned.

P.S.: Thanks for the download link!

(14.2) By Gisbert (gisbert) on 2020-04-07 21:59:10 edited from 14.1 in reply to 1 [link] [source]

For what it's worth, here are two pretty straightforward Perl scripts that will amalgamate the SQLite doc structure and produce a PDF file of about 3500 pages.

The first script contains usage hints in the comment section near the top. The second script is only necessary if you, like myself, prefer to have 7-bit clean input for LaTeX. Obviously, there is a lot of room for tweaking the scripts and the pandoc template according to your taste.

#! /usr/bin/perl
#
# File name: pre.pl
#
# Amalgamate and convert the SQLite documentation to PDF format
# (with the help of pandoc and TeX)
#
# Simple usage:
# perl pre.pl docs.html *.html c3ref/*.html session.html syntax/*.html   \
#      releaselog/*.html                                               | \ 
#   pandoc --to pdf --pdf-engine xelatex -o sqlitebook.pdf
# (Start this in the SQLite doc base directory.)
#
# perl pre.pl docs.html *.html c3ref/*.html session.html syntax/*.html   \
#      releaselog/*.html                                               | \ 
#   pandoc --standalone --from html --to latex --table-of-contents       \
#   --toc-depth 3 --number-sections --highlight-style tango              \
#   --variable documentclass=scrreprt --variable fontsize=10pt           \
#   --variable colorlinks=yes                                            \
#   --variable titlegraphic="images/sqlite370_banner.png"  |             \
#   perl post.pl > sqlitebook.tex
# pdflatex sqlitebook.tex    # repeat twice! no xelatex needed!
# (Start this in the SQLite doc base directory.)
#
# Preparatory steps:
# (1) Once only: get
#     https://www.fossil-scm.org/fossil/doc/trunk/www/xkcd-git.gif
#     and put it into the images/ folder. E.g.,
#       wget ... --output-document=images/xkcd-git.gif
# (2) Once only: if necessary, install pandoc, TeX, and possibly ImageMagick
# (3) Once only: if necessary, install the HTML::TreeBuilder Perl through
#       cpan install HTML::TreeBuilder
# (2) Once, but may need updates from time to time: convert all GIF images
#     in the images/ folder and its subfolders to PNG
#     (e.g., ImageMagick will do the trick:  magic foo.gif foo.png  )
#
# Bundling all processing steps into a Makefile is left as an exercise for
# the reader.
#
# TapirSoft Gisbert W. Selke 2020-04-07.
#
# This software may be used under the terms of the Artistic License, i.e.,
# under the same terms as Perl itself:
# https://dev.perl.org/licenses/artistic.html
#

use strict;
use warnings;
use HTML::TreeBuilder;

our $VERSION = '0.1.1';

my( %file_seen, $master );

for my $fn( glob( join( ' ', @ARGV ) ) ) {
  # process all our files in order, but each filename once only

  next if $file_seen{$fn};
  $file_seen{$fn}++;
  print STDERR "$fn...\n";

  # Build the DOM tree of this file:
  open( my $fh, '<:utf8', $fn ) || die "Cannot open '$fn' for reading: $!";
  my $tree = HTML::TreeBuilder->new( );
  $tree->parse_file($fh);
  $tree->elementify( );

  # Essential steps:
  fix_references( $tree, $fn );
  fix_headers( $tree );

  # Nice to have: remove navigation elements
  $_->delete( ) for $tree->look_down( 'class', 'nosearch' );
  # remove script elements (just because we can)
  $_->delete( ) for $tree->find( 'script' );

  # Add a comment near the beginning indicating which file this content is
  # coming from, and an anchor so that we can link here:
  $tree->find('body')->unshift_content(
    HTML::Element->new( 'br' ),
    HTML::Element->new( '~comment', 'text' => 'start of input file ' . $fn ),
    HTML::Element->new( 'a', 'id' => $fn . '__' )
  );

  if ( $master ) {
    # Append the contents of the body of this document to the master element
    $master->find('body')->push_content( $tree->find('body')->content_list( ) );
  } else {
    # Our first document serves as the master container
    $master = $tree;
  }

}

print $master->as_HTML( undef, '  ', { } ); # safe, human-readable, clean tag structure


sub fix_references {
  # Fix links and anchors so that they work within the amalgamated document

  my( $tree, $fn ) = @_;

  # find names and ids and disambiguate them, using our file name as prefix:
  fix_names( [ $tree->descendants() ], $fn, 'name' );
  fix_names( [ $tree->descendants() ], $fn, 'id' );
  # find links to other documents and change to disambiguated names:
  fix_hrefs( [ $tree->find('a') ] , $fn, 'href' );
  # fix images:
  fix_images( [ $tree->find('img') ], $fn, 'src' );

  return $tree;
} ## end sub fix_references


sub fix_names {
  # Disambiguate the anchor names and ids in this document

  my( $elems, $fn, $attrname ) = @_;
  for my $elem ( grep { defined $_->attr($attrname) } @{ $elems } ) {
    # walk through all nodes and fix them
    $elem->attr( $attrname, $fn . '__' . $elem->attr($attrname) );
  }

  return $elems;
} ## end sub fix_names


sub fix_hrefs {
  # Change links so that they work with the disambiguated anchors

  my( $elems, $fn, $attrname ) = @_;
  for my $elem ( grep { defined $_->attr($attrname) } @{ $elems } ) {
    # walk through all these nodes
    my $label = $elem->attr($attrname);
    if ( $label =~ m|^https?://| ) {
      # external link, leave unchanged
    } elsif ( $label =~ m|^#(.+)$| ) {
      # internal link to an anchor within this document => prepend our name
      $elem->attr( $attrname, '#' . $fn . '__' . $1 );
    } elsif ( $label =~ m|^[./]*(.+)#(.+)$| ) {
      # relative link to a different file of ours,  with an anchor therein.
      # remember to flatten relative paths.
      $elem->attr( $attrname, '#' . $1 . '__' . $2 );
    } elsif ( $label =~ m|^[./]*(.+)$| ) {
      # relative link to a different file of ours,  with no anchor therein.
      # remember to flatten relative paths.
      $elem->attr( $attrname, '#' . $1 . '__' );
    } else {
      warn "file $fn: strange $attrname: '$label'";
    }
  }

  return $elems;
} ## end sub fix_hrefs


sub fix_images {
  # Change references to GIF to PNG. Clean up relative paths.
  # Make <img> within <dl> nicer.

  my( $elems, $fn, $attrname ) = @_;
  for my $elem ( grep { defined $_->attr($attrname) } @{ $elems } ) {
    # walk through all these nodes
    my $label = $elem->attr($attrname);
    if ( $label =~ m|^https?://| ) {
      # external link, leave unchanged, but handle the special case of an xkcd
      # comic that is not ordinarily included with the docs bundle but is
      # referenced directly from the sqlite fossil site. We require this to be
      # downloaded separately and placed in the images/ folder.
      if ( lc($label) eq
           'https://www.fossil-scm.org/fossil/doc/trunk/www/xkcd-git.gif' ) {
        $elem->attr( $attrname, 'images/xkcd-git.png' )
      }
    } else {
      # relative link to a different file of ours
      $label =~ s|^[./]+||; # flatten relative paths
      # change .gif to .png.
      # Conversion of gif file to png files must be done outside this script,
      # probably using ImageMagick ("magic picname.gif picname.png").
      $label =~ s|\.gif$|.png|i;
      $elem->attr( $attrname, $label );
    }

    # Prepend <br> to <img> within <p> within <dd> to produce nicer results
    if ( ( lc( $elem->parent( )->tag( ) ) eq 'p' ) &&
         ( lc( $elem->parent( )->parent( )->tag( ) ) eq 'dd' ) &&
         ( !$elem->left( ) ) &&
         ( !$elem->parent( )->left( ) )
       ) {
      $elem->preinsert( HTML::Element->new( 'br' ) );
    }

  }

  return $elems;
} ## end sub fix_images


sub fix_headers {
  # remove explicit numbering from <h...> headers because latex will add its own.

  my( $tree ) = @_;
  for my $level (1..6) {
    for my $node (  $tree->find( "h$level" ) ) {
      $node->objectify_text( );
      my $textnode = $node->find( '~text' );
      if ( $textnode ) {
        my $text = $textnode->attr( 'text' );
        if ( $text =~ m|^\s*[0-9.]+\s+| ) {
          $text =~ s|^\s*[0-9.]+\s*||;
          $textnode->attr( 'text', $text );
        }
      }
      $node->deobjectify_text( );
    }
  }

  return $tree;
} ## end sub fix_headers

And here's the second script:

#! /usr/bin/perl
#
# File name: post.pl
#
# Convert UTF-8 characters in the output of pandoc to clean 7-bit ASCII
# characters by recoding as LaTeX macros, this obviating the need to
# use xelatex.
#
# For usage hints see pre.pl
#
# TapirSoft Gisbert W. Selke 2020-04-07.
#
# This software may be used under the terms of the Artistic License, i.e.,
# under the same terms as Perl itself:
# https://dev.perl.org/licenses/artistic.html
#

use strict;
use warnings;

our $VERSION = '0.1.1';

binmode(STDIN, ':encoding(UTF-8)');

while (<>) {
  s/\N{U+03c0}/\\ensuremath{\\pi}/g;                # pi
  s/\N{U+2007}/\\hspace*{1em}/g;                    # fixed space
  s/\N{U+2190}/\\ensuremath{\\leftarrow}/g;         # <-
  s/\N{U+2191}/\\ensuremath{\\uparrow}/g;           # ^
  s/\N{U+2192}/\\ensuremath{\\rightarrow}/g;        # ->
  s/\N{U+21d2}/\\ensuremath{\\Rightarrow}/g;        # =>
  s/\N{U+2260}/\\ensuremath{\\neq}/g;               # !=
  s/\N{U+2265}/\\ensuremath{\\geq}/g;               # >=
  s/\N{U+2588}/\\rule{1em}{1ex}/g;                  # little black box
  s/\N{U+25ba}/\\ensuremath{\\bigtriangledown}/g;   # down-pointing triangle   (might use dingbat instad)
  s/\N{U+25bc}/\\ensuremath{\\rhd}/g;               # right-ppointing triangle (might use dingbat instad)
  s/\N{U+2714}/\\checkmark{}/g;                     # check mar, (might use dingbat instead)
  s/\N{U+00a0}/~/g;                                 # non-breaking space
  s/\N{U+00b1}/\\ensuremath{\\pm}/g;                # +/-
  s/\N{U+00b2}/\\ensuremath{^2}/g;                  # subscript 2
  s/\N{U+00b9}/\\ensuremath{^1}/g;                  # subscript 1
  s/\N{U+00c0}/\\`{A}/g;                            # A acute
  s/\N{U+00c2}/\\^{A}/g;                            # A circumflex
  s/\N{U+00c3}/\\~{A}/g;                            # A tilde
  s/\N{U+00c6}/\\AE{}/g;                            # AE ligature
  s/\N{U+00d7}/\\ensuremath{\\times}/g;             # multiplication sign
  s/\N{U+00df}/\\ss{}/g;                            # esszet
  s/\N{U+00e0}/\\`{a}/g;                            # a acute
  s/\N{U+00e1}/\\'{a}/g;                            # a gravis
  s/\N{U+00e2}/\\^{a}/g;                            # a circumflex
  s/\N{U+00e3}/\\~{a}/g;                            # a tilde
  s/\N{U+00e4}/\\"{a}/g;                            # a umlaut
  s/\N{U+00e6}/\\ae{}/g;                            # ae ligature
  s/\N{U+00fe}/\\th{}/g;                            # thorn

  print;
}

(15) By luuk on 2020-04-06 19:14:27 in reply to 14.1 [link] [source]

I will (also) take a look at this.

My own try to produce some PDF has stopped on creating an index (read: working index).

Maybe, when i have some time left coming weekend, but who knows....

Thanks for this script!

Luuk

(16) By Wout Mertens (wmertens) on 2020-04-06 21:38:46 in reply to 1 [link] [source]

I'm of the contrary opinion: I hate pdf files. They don't reformat for your screen, they're hard to parse, they're an endless source of crash bugs, and they are just good enough to make people not look for alternatives.

In the meantime, we could have had something based on light weight formats like epub, mhtml or now web bundles, but the inertia is tremendous. It makes me a little sad.

(17) By Larry Brasfield (LarryBrasfield) on 2020-04-06 21:53:26 in reply to 16 [link] [source]

It's too bad browsers cannot just go into a .zip archive. Opening one in the browser should be like opening a directory subtree, where an index.html file is shown when present.

The only reason I like PDFs is because a related set of pages reside in a single file. IIRC, mhtml worked like that.

I wonder if the fossil http page server knows how to peruse .zip archives.

(18) By Richard Hipp (drh) on 2020-04-06 22:05:32 in reply to 17 [link] [source]

OT: You know that the SQLite command-line shell knows how to read and write ZIP archives, right? A ZIP archive is just a key/value database, after all. The filename is the key and the file content is the value. SQLite has a virtual table that translates this key/value store into something usable the the SQLite query engine.

(19) By Larry Brasfield (LarryBrasfield) on 2020-04-06 22:27:47 in reply to 18 [link] [source]

(If I knew how to move this to a new thread, such as could be done in email by editing the subject, I would get this to be on-topic. ;-)

Yes, I am/was aware that the shell can access .zip archives. So can many other tools. However, I do not know of any existing tool or combination of tools that is willing to stream .zip archive content in response to HTTP requests while leaving it merely compressed in place rather than writing a file (and typically gobs of files) in a temp directory.

As I recall, the Java runtime pulls things for use from .jar files without littering files around in the file system.

I also know of sqlar archives and their use in Linux/Unix to present a user-space implemented file system which can be mounted where needed. ("FUSE"?) Making that do the above on my Windows-running workstation might be enough to induce getting the Linux subsystem going. Having that tool is more attractive than a special-cased browser capability with a new transport protocol.

(20) By Warren Young (wyoung) on 2020-04-07 01:17:58 in reply to 17 [link] [source]

The only reason I like PDFs is because a related set of pages reside in a single file.

Do you need a single file to copy from one place to another, or do you just want a single thing to fetch in order to get the whole document set?

If the latter, then you could clone the SQLite docsrc repo and build it locally. This would get you a pile of files in a directory, not a single thing to copy from place to place, but it has the benefit that you can then easy update your local copy with:

    $ fossil up
    $ make all

(21) By Richard Hipp (drh) on 2020-04-07 12:12:43 in reply to 17 [link] [source]

The only reason I like PDFs is because a related set of pages reside in a single file.

I put something like this together for the SQLite docs years ago. You can still try to build it from the SQLite Documentation Sources by using "make docapp", though I found that the Makefile didn't quite work and I had to finish the build manually.

If you successfully build "docapp", then you just run the application and it automatically starts up a small web-server running on "localhost" and hosting all of the SQLite documentation. It also automatically launches your default web-browser and points it to "http://localhost:8080/" (or whatever alternative TCP port the web-server chose) so that you can browse the complete SQLite documentation.

Note that all the documentation and the code to run the local web-server and launch the web-browser are all packaged into a single executable file, which is I believe what you are looking for, is it not?

How Docapp Works

The base executable is "sqltclsh" - a version of "tclsh" that has the "sqlite3" command built-in, along with a few other goodies. You can build "sqltclsh" from the standard SQLite makefile: "make sqltclsh".

The documentation pages are put into an SQLite Archive that contains all of the SQLite documentation file. An SQLite Archive is just an SQLite database. This is appended to the "sqltclsh" executable and accessed using the Append VFS extension.

The built-in web-server is just Wapp. The Wapp script to start up and run the web-server and server files out of the appended SQLite Archive is a mere 23 lines of code.

(22) By Gisbert (gisbert) on 2020-04-07 16:45:07 in reply to 19 [link] [source]

I'm not sure whether I have understood your wish correctly, but it sounds like a suitable CGI script (or whatever your web server's preferred way of running scripts is) should be able to extract something from a zip archive and stream it to the client without littering the place. That would work either with a remote server or with an ad hoc local server process, similar to docapp described below by drh.

(23) By Gisbert (gisbert) on 2020-04-07 16:50:21 in reply to 16 [link] [source]

Your wish has been granted: pandoc can output to three different brands of epub and about 50 other formats. Just replace the value for the --to parameter in the sample call described near the top of the script.

(24) By Larry Brasfield (LarryBrasfield) on 2020-04-07 17:31:12 in reply to 21 [link] [source]

That is very close to an ideal solution. Thanks for the tip (and the creation, of course.)

I think that tool, absent the appended archive and made able to access archives specified at run time, would be very useful. I plan to adapt that code for that purpose, as it is a great toolbox addition. Shall I announce it here, in a separate thread?

(25) By Richard Hipp (drh) on 2020-04-07 19:44:56 in reply to 24 [link] [source]

To view the latest development snapshot of the draft documentation for the 3.32.0 release of SQLite as a "WDF Document", do this:

Download and install Fossil on your machine.
Download the https://sqlite.org/tmp/sqlite-docs-20200407.wdf document (16MB)
Run "fossil ui sqlite-docs-20200407.wdf"

(26) By Wout Mertens (wmertens) on 2020-04-07 20:19:02 in reply to 17 [link] [source]

Actually, this is what Web Bundles are meant to do:

A Web Bundle is a file format for encapsulating one or more HTTP resources in a single file. It can include one or more HTML files, JavaScript ...

It has support for signing, and basically allow you to provide an offline copy of an entire website. See also the explainer.

Sadly it doesn't quite replace PDF because PDF has provisions for paging afaik, and this is just a way to encapsulate websites.

(27) By Peter da Silva (resuna) on 2020-04-09 19:30:50 in reply to 17 [link] [source]

tar.gz would be better, because zip has the index at the end which means you have to wait for the whole file before you can start rendering it. For a file you're streaming over a possibly low bandwidth stream that adds more startup delay.

(28) By Gisbert (gisbert) on 2020-04-11 22:17:39 in reply to 27 [link] [source]

Not necessarily. Zip files have the so-called central directory at the end, indeed, but they have also local directory entries directly before the individual members, so readily available when streaming. For this purpose, the central directory is redundant. (At least, when I looked last time, which is nearly 25 years ago.)

However, .tar.gz usually has higher compression rates, especially when including lots of small files, because they compress the (tar) stream as a whole, whereas zip compresses individual members, leading to higher overhead.

So, yes, .tar.gz would be the better option. And .tar.bz2 and .tar.lz even better.

(29) By anonymous on 2020-04-11 22:31:02 in reply to 27 [link] [source]

tar.gz would be better, because zip has the index at the end which means you have to wait for the whole file before you can start rendering it.

Many programs need to read the entire file to read the index at the end, although bsdtar does support streaming ZIP files (and can also be used to read a truncated ZIP file, which is something I had to do once).

(30) By anonymous on 2020-04-11 22:32:32 in reply to 1 [link] [source]

Another format I would like to have the SQLite documentation in is troff format.

(31) By Gisbert (gisbert) on 2020-04-12 21:46:47 in reply to 30 [link] [source]

Depending in how hard you want it, you can use pandoc (as described above) to convert to texinfo and then use the TeX package texi2roff to convert to troff. (Disclaimer: I haven't tried this and am unlikely to ever do so :-) )