What Is DjVu and Where Did It Come From?
DjVu (pronounced "déjà vu") is a digital document format developed at AT&T Labs starting in 1996 by a team that included Léon Bottou, Yann LeCun, and Patrick Haffner. LeCun later became one of the founding figures of modern deep learning and went on to serve as Chief AI Scientist at Meta. The format was designed specifically to compress scanned documents far more tightly than the PDF tools of the time could.
AT&T sold DjVu to LizardTech in 2000, and LizardTech published a reference implementation under the GNU GPL. The original authors have maintained the GPL-licensed DjVuLibre project since 2002. Open code allowed the format to be adopted widely by digital library projects. The Internet Archive (archive.org) adopted DjVu early for its massive book scanning initiatives. Wikimedia Commons uses DjVu for historical documents and old books. Google Books used DjVu in its early scanning phases before transitioning toward PDF. Academic and public libraries in Russia, Eastern Europe, and parts of Asia became particularly heavy users, a legacy still visible today.
The Technical Innovation: Three-Layer Compression
What makes DjVu genuinely clever is its approach to document structure. Instead of treating a scanned page as a single image (as naive PDF scanning does), DjVu decomposes each page into three distinct layers:
- Foreground layer: the colors of the text and line art. Ink color changes slowly across a page, so this layer is stored at very low resolution using the lossy wavelet codec IW44.
- Background layer: paper texture, photographs, gradients, and color regions, stored at reduced resolution (typically 100 DPI or less), also using IW44.
- Mask layer: a bilevel (black/white) bitmap at full scan resolution (typically 300–600 DPI) that holds the shapes of text and line art and tells the renderer which pixels belong to the foreground and which to the background. It is compressed with JB2, a bilevel coder that stores each repeated character shape only once.
This separation is the key insight. A text character on a white page needs a sharp, full-resolution outline in the mask but almost no foreground or background data. An illustration needs a well-compressed background and contributes little to the mask. The encoder applies each compression strategy only where it matters.
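The layer idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not DjVu's actual encoder: the names `split_layers` and `render` are hypothetical, the page is a plain grayscale grid, the mask comes from a fixed brightness threshold, the foreground is reduced to a single ink value, and the background is downsampled by simple block averaging rather than wavelet coding.

```python
from typing import List, Tuple

Gray = List[List[int]]  # rows of 0-255 grayscale values


def split_layers(page: Gray, threshold: int = 128,
                 factor: int = 2) -> Tuple[Gray, int, Gray]:
    """Split a grayscale page into (mask, foreground, background)."""
    height, width = len(page), len(page[0])

    # Mask: 1 where a pixel is dark enough to count as ink, else 0.
    mask = [[1 if px < threshold else 0 for px in row] for row in page]

    # Foreground: one average ink value for the whole page (a deliberate
    # simplification; a real encoder keeps a small color image instead).
    ink = [page[y][x] for y in range(height) for x in range(width) if mask[y][x]]
    foreground = sum(ink) // len(ink) if ink else 0

    # Background: average the non-ink pixels of each factor x factor block,
    # which stores the paper at 1/factor of the original resolution.
    background = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            paper = [
                page[y][x]
                for y in range(top, min(top + factor, height))
                for x in range(left, min(left + factor, width))
                if not mask[y][x]
            ]
            # Blocks fully covered by ink fall back to white paper.
            row.append(sum(paper) // len(paper) if paper else 255)
        background.append(row)
    return mask, foreground, background


def render(mask: Gray, foreground: int, background: Gray,
           factor: int = 2) -> Gray:
    """Recombine the layers: ink where the mask is set, paper elsewhere."""
    return [
        [foreground if bit else background[y // factor][x // factor]
         for x, bit in enumerate(row)]
        for y, row in enumerate(mask)
    ]
```

With `factor=2` the background holds a quarter of the original pixels, yet the mask keeps every ink edge at full resolution, which is why text stays sharp after `render` recombines the layers.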
Real File Size Comparison
The results are dramatic:
| Document Type | DjVu | PDF (JBIG2) | Raw TIFF scan |
|---|---|---|---|
| 300-page scanned book | ~3 MB | ~30 MB | ~150 MB |
| Single scanned page (300 DPI) | ~15–40 KB | ~100–300 KB | ~500 KB |
| Mixed text+photo magazine page | ~50–80 KB | ~200–400 KB | ~800 KB |
A 5–10x size advantage over PDF for scanned documents is typical. For purely text-heavy scanned books, the advantage can exceed 20x compared to naive PDF scanning with embedded JPEG images.
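The ratios are easy to check against the table. The short sketch below uses the approximate "300-page scanned book" figures from the table; `size_ratio` is a hypothetical helper name, and the numbers are the article's rough estimates rather than measurements.

```python
def size_ratio(other_mb: float, djvu_mb: float) -> float:
    """How many times larger the other file is than the DjVu file."""
    return other_mb / djvu_mb


# Approximate sizes in MB from the "300-page scanned book" row of the table.
book = {"djvu": 3, "pdf_jbig2": 30, "raw_tiff": 150}

pdf_vs_djvu = size_ratio(book["pdf_jbig2"], book["djvu"])   # 10.0
tiff_vs_djvu = size_ratio(book["raw_tiff"], book["djvu"])   # 50.0

print(f"PDF is {pdf_vs_djvu:.0f}x larger, raw TIFF is {tiff_vs_djvu:.0f}x larger")
```

For this row the PDF comes out at the top of the 5–10x range quoted above.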
Where DjVu Files Are Found Today
Despite its relative obscurity in everyday computing, DjVu files are extremely common in specific contexts:
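- Internet Archive (archive.org): Millions of books scanned by the Archive are available in DjVu format alongside PDF and EPUB. The Archive's scan pipeline historically produced DjVu as a standard derivative, although it stopped generating DjVu files for new scans in 2016, so DjVu downloads are mostly found on older items.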
- Wikimedia Commons — Historical newspapers, maps, manuscripts, and old scientific publications are stored as DjVu files and rendered into PNG thumbnails for Wikipedia pages.
- Google Books (early era) — Google's initial scanning program in the mid-2000s produced DjVu files for many books. Later scans shifted to PDF.
- Academic digital libraries — Russian State Library (RSL), National Digital Library of India, and numerous Eastern European academic repositories host extensive DjVu collections.
- Wikipedia's own source files — Many Wikipedia articles cite DjVu documents that require a special viewer to open directly.
How to Open DjVu Files
Opening a DjVu file is not as simple as double-clicking: neither Windows nor macOS ships with a DjVu viewer, so you usually need to install one first:
Windows:
- WinDjView — Lightweight, fast, free. The most popular Windows DjVu viewer. Includes thumbnails, bookmarks, and zoom.
- Sumatra PDF — Supports DjVu alongside PDF, EPUB, and other formats. Excellent for quick viewing.
- DjView4 — The official reference viewer, cross-platform, slightly heavier but very capable.
macOS:
- DjView4: available via Homebrew (brew install djview4) or as a standalone build.
- No native macOS support exists; converting to PDF first is often the path of least resistance.
Linux:
- Evince — GNOME's document viewer supports DjVu natively on most distributions.
- Okular — KDE's viewer also handles DjVu well.
- The djvulibre package provides both the viewer and command-line tools.
Browser:
- djvu.js — A browser extension available for Firefox and Chrome that renders DjVu files inline. Works well for single-file viewing but is not installed by default anywhere.
Android/iOS:
- EBookDroid (Android) — Supports DjVu among many other formats. Popular among users who read archive.org books on tablets.
- iOS has no built-in DjVu support; install a third-party reader app or convert to PDF first.
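Before choosing a viewer, it can help to confirm that a file really is DjVu, since scans are sometimes shared with a missing or wrong extension. The sketch below checks the file header: DjVu files begin with the bytes `AT&T`, followed by an IFF `FORM` chunk whose type is `DJVU` (single page) or `DJVM` (multi-page document). The function names are hypothetical, and the check is a quick heuristic, not a full validation of the file.

```python
from pathlib import Path

DJVU_FORM_TYPES = (b"DJVU", b"DJVM")  # single-page and multi-page documents


def looks_like_djvu(header: bytes) -> bool:
    """Check the first 16 bytes: 'AT&T' + 'FORM' + 4-byte size + form type."""
    return (
        len(header) >= 16
        and header[:4] == b"AT&T"
        and header[4:8] == b"FORM"
        and header[12:16] in DJVU_FORM_TYPES
    )


def is_djvu_file(path: str) -> bool:
    """Read only the header of a file on disk and test it."""
    with Path(path).open("rb") as handle:
        return looks_like_djvu(handle.read(16))
```

Calling `is_djvu_file("book.djvu")` (a placeholder path) reads just 16 bytes, so it is cheap to run over a large folder of downloads.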
Converting DjVu Files
The primary tool for DjVu conversion is ddjvu, included in the djvulibre package available on all major platforms.
Convert entire DjVu to PDF:
ddjvu -format=pdf input.djvu output.pdf
Extract a single page as TIFF:
ddjvu -format=tiff -page=3 input.djvu page3.tiff
Extract a range of pages:
ddjvu -format=tiff -page=1-10 input.djvu pages1to10.tiff
Using ImageMagick (if the DjVu delegate is installed; ImageMagick 7 uses magick, older installs use the legacy convert command):
magick input.djvu output.pdf
For users who prefer a graphical approach, WinDjView on Windows offers File → Export as PDF, and DjView4 provides similar export functionality. Online converters handle single DjVu files adequately if you don't want to install software.
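For a whole folder of files, the ddjvu calls above can be scripted. The following is a minimal sketch, assuming ddjvu is installed and on the PATH; `build_ddjvu_command` and `convert_folder` are hypothetical helper names, and only the `-format` and `-page` options already shown in this section are used.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_ddjvu_command(src: str, dst: str, fmt: str = "pdf",
                        pages: Optional[str] = None) -> List[str]:
    """Build the ddjvu argument list without running it."""
    cmd = ["ddjvu", f"-format={fmt}"]
    if pages:
        # ddjvu page ranges use a dash, e.g. "3", "1-10" or "1,3,5-7".
        cmd.append(f"-page={pages}")
    cmd += [src, dst]
    return cmd


def convert_folder(folder: str, dry_run: bool = True) -> List[List[str]]:
    """Convert every .djvu file in a folder to a PDF stored next to it."""
    commands = []
    for src in sorted(Path(folder).glob("*.djvu")):
        cmd = build_ddjvu_command(str(src), str(src.with_suffix(".pdf")))
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if ddjvu exits with an error
    return commands
```

`convert_folder("scans")` (a placeholder folder name) returns the commands it would run; pass `dry_run=False` to actually invoke ddjvu. Building the argument list separately keeps paths with spaces safe, because nothing is passed through a shell.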
DjVu vs PDF for Scanned Documents
| Feature | DjVu | PDF (scanned) |
|---|---|---|
| File size for scanned books | Excellent (5–10x smaller) | Moderate to poor |
| Native browser support | None (requires extension) | Full (all modern browsers) |
| Mobile support | Requires EBookDroid or conversion | Native on iOS/Android |
| Searchable text layer | Supported (if OCR was run) | Supported |
| Printing | Supported via viewers | Universal |
| Editing/annotation | Very limited | Good (with Acrobat/Foxit) |
| Long-term archival | Acceptable | PDF/A standard exists |
| Ecosystem & tools | Niche | Ubiquitous |
Why DjVu Lost the Format War
Despite its technical superiority for scanned documents, DjVu failed to become mainstream. The reasons are largely about ecosystem and timing:
- Adobe Reader was free and everywhere. By 2005, nearly every Windows PC had Adobe Reader installed. DjVu required users to install a separate viewer.
- PDF became the web standard. Browsers began embedding PDF renderers. DjVu never achieved this integration.
- PDF/A for archival. Institutions adopted PDF/A as the archival standard because it had ISO backing (ISO 19005). DjVu has no equivalent standardization.
- Microsoft killed its own XPS format too. The document format wars of the 2000s left PDF as the only viable survivor.
- Open-sourcing came too late. By the time the GPL-licensed DjVuLibre appeared in 2002, PDF had already won mindshare, and open code could not undo PDF's head start.
When to Keep DjVu vs When to Convert
Keep DjVu if:
- You're downloading from archive.org purely for reading — DjVu viewers are available and the smaller file size matters on slow connections.
- You're storing a large collection of scanned books and storage space is a concern.
- The source DjVu has a high-quality text/OCR layer you want to preserve exactly.
Convert to PDF if:
- You need to share the document with anyone who isn't technically inclined.
- You need annotation, commenting, or digital signature support.
- You're uploading to any document management system or email attachment.
- Accessibility is a concern (PDF/UA has better accessibility tooling).
For most users who encounter a DjVu file from archive.org, converting it to PDF once and keeping the PDF is the pragmatic choice.