SQLite Forum

Saving Wikipedia to SQLite
Funny story about that...

In 2006, I acquired an [iRex iLiad](https://en.wikipedia.org/wiki/ILiad) ebook reader, one of the first consumer products with an e-ink display.  It ran Linux, and they released an SDK.  It was an amazing piece of hardware, way ahead of its time.

During the 2006 holiday break, I was trying to figure out a project I could do with it, and struck upon the idea of writing an off-line Wikipedia app.  This was inspired by some SF movie or TV show I had recently seen that showed someone using a tablet/ebook reader with different things (like dictionaries, encyclopedias, etc.) stored on rod-shaped data storage units, kind of like small pencils (anyone remember what show/movie that was from?).

The issue was that the iLiad only supported SD cards (like, actual, original SDs, not SDHC or SDXC), so storage was limited to 4GB.  That meant figuring out a way to squeeze all of Wikipedia into 4GB, in a format that would be easily accessible.

Since Wikipedia makes MySQL dumps available, I downloaded the latest (at work) and wrote some scripts to extract the dumps and load them into SQLite.  I did some cool things, like using SQLite's type system to save each article either as text or as a zlib-compressed BLOB, depending on which was smaller, with automatic access functions to unpack the data.  Starting with the formatted text, saving only the current revision of actual articles, without discussions or meta-articles (and using a custom preset dictionary for zlib), I did manage to get it down to 4GB.  Writing a front-end app for the data turned out to be way more trouble than it was worth, however, and never happened, but the database was pretty cool.
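For anyone curious how the "text or compressed BLOB, whichever is smaller" trick can work, here is a minimal sketch in Python. It is not the original scripts, which aren't described beyond the above. Assumptions: the table and column names (`articles`, `title`, `body`), the SQL function name `unpack`, and the contents of `ZDICT` are all hypothetical stand-ins. The key point is that a column with no declared type keeps whatever storage class each value arrives with, so `typeof(body)` reports `text` or `blob` per row, and a registered SQL function can inflate the BLOB rows transparently.

```python
import sqlite3
import zlib

# Hypothetical preset dictionary: byte strings that show up often in article
# text. The real project's dictionary is not described; this is a stand-in.
ZDICT = b"[[Category:]] {{cite web|url=|title=|accessdate=}} the of and in to was is "


def pack(text: str):
    """Return the text as str, or zlib-compressed bytes if that is smaller."""
    raw = text.encode("utf-8")
    comp = zlib.compressobj(level=9, zdict=ZDICT)
    packed = comp.compress(raw) + comp.flush()
    return packed if len(packed) < len(raw) else text


def unpack(value):
    """SQL function body: TEXT passes through, BLOB is inflated back to TEXT."""
    if isinstance(value, bytes):
        decomp = zlib.decompressobj(zdict=ZDICT)
        return (decomp.decompress(value) + decomp.flush()).decode("utf-8")
    return value


def open_db(path=":memory:"):
    con = sqlite3.connect(path)
    # No declared type on body: the column has BLOB ("none") affinity, so
    # SQLite stores each value with its own storage class, TEXT or BLOB.
    con.execute("CREATE TABLE IF NOT EXISTS articles (title TEXT PRIMARY KEY, body)")
    # Hypothetical access function; deterministic= needs Python 3.8+.
    con.create_function("unpack", 1, unpack, deterministic=True)
    return con


def save(con, title, text):
    con.execute("INSERT OR REPLACE INTO articles VALUES (?, ?)", (title, pack(text)))


if __name__ == "__main__":
    con = open_db()
    save(con, "Short", "Tiny.")
    save(con, "Long", "SQLite is a database engine. " * 200)
    query = "SELECT title, typeof(body), unpack(body) FROM articles ORDER BY title"
    for title, kind, text in con.execute(query):
        print(title, kind, len(text))
```

Short strings stay as TEXT because the zlib header and checksums make them larger when compressed, while long, repetitive text is stored as a BLOB. Readers never need to know which is which, because they always select `unpack(body)`.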

During this process, I got really frustrated with the SQLite APIs, the exact sequence they needed to be called in, and the lack of good tutorials.  At one point, I actually shouted at the ceiling, "Someone should write a book about all this!"  And in the most cartoonish fashion, for a moment I swear an actual lightbulb appeared above my head.

You see, I'd been in contact with editors at O'Reilly for a few years at that point in my career, having done numerous tech-reviews and proof-reads.  I had started a book in 2004 ("[Distributed Computing with Mac OS X: Building Clusters and Grids](https://www.amazon.com/dp/059600804X)"), but the book was canceled (along with about 70% of their in-progress titles) when O'Reilly went through a big business shift and the CFO took over the CEO position from Tim, and effectively saved the company (wonderful lady; we talked at a few events and she's absolutely amazing).  It took the company a year or so to recover, but they were looking for other book ideas, and my editor was really pushing me to try my hand at writing something again.  We were working on an idea or two about networking, but the ideas wouldn't gel, and I didn't want to do it without laser focus, so we were trying to come up with another book topic.

And suddenly I had one.  The world needed an(other) SQLite book.  I wrote up a proposal, told my editor it was coming his way, and got everything polished and ready to go for the editors' review meeting.  THAT morning, Tim O'Reilly emailed my editor asking if they had anyone looking at "this SQLite thing," as it was spiking on Google Trends.  "Yeah, we have something in the works."  That afternoon my idea got pitched and was approved.  2.5 years later, in August of 2010, "[Using SQLite](https://www.oreilly.com/library/view/using-sqlite/9781449394592/)" was published.  I still make about US$60/month in royalties.

And it all started because I wanted Wikipedia in SQLite on an SD card.