What Is DjVu?
DjVu (pronounced "déjà vu") is a compressed file format optimized for storing scanned documents, particularly those with mixed content — text, line art, and photographic illustrations on the same page. Developed at AT&T Labs by Yann LeCun, Léon Bottou, and colleagues in 1996, DjVu was designed to solve a specific problem: how to distribute high-quality scanned books, magazines, and documents over slow internet connections.
When DjVu was introduced in the late 1990s, a scanned book page at 300 DPI might be a 5 MB TIFF. The same page as a JPEG would be around 200 KB but with unacceptable compression artifacts on text. DjVu brought such a page down to roughly 20-150 KB, depending on content, with excellent text clarity by using a fundamentally different compression strategy: separate the page into its components and compress each with the method best suited to it.
How DjVu Compression Works
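DjVu's key innovation is a document analysis pipeline that separates a scanned page into three layers and compresses each with a codec suited to its content (IW44 wavelets for continuous-tone imagery, JB2 for bitonal shapes). Multi-page documents can add a fourth component, a shared shape dictionary: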
1. Background Layer (BG44)
The background layer contains low-frequency color information: the paper texture, photographic images, and gradients. It is compressed with IW44, a progressive wavelet codec similar in spirit to JPEG 2000.
2. Foreground Layer (FG44)
The foreground layer contains the color information of text and line art. For most text documents this layer is very simple (black text on white) and compresses to near-zero size.
3. Mask / JB2 Layer
The JB2 mask is the cleverest part. JB2 is DjVu's own bitonal codec, related in approach to the JBIG2 standard rather than a variant of JBIG. The mask contains a bitonal (two-color: black or white, no grays) representation of all text and line art, and it tells the decoder which pixels take their color from the foreground layer and which from the background. JB2 uses pattern matching to find repeated character shapes (like all occurrences of the letter "e" or "a") and stores them only once, with pointers to each position where they appear. Because the same character shape can appear hundreds of times on a page, this achieves very high compression.
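The store-once idea can be illustrated with a toy sketch. This is not the real JB2 codec, which matches shapes approximately and entropy-codes the result; it assumes the glyphs have already been segmented into small bitmaps and only merges exact duplicates. The names `encode_page` and `decode_page` are hypothetical.

```python
from typing import Dict, List, Tuple

Glyph = Tuple[str, ...]            # a tiny bitmap: one string of '#'/'.' per row
Placement = Tuple[int, int, int]   # (shape_index, x, y)


def encode_page(glyphs: List[Tuple[Glyph, int, int]]) -> Tuple[List[Glyph], List[Placement]]:
    """Store each distinct glyph bitmap once and reference it by index."""
    shapes: List[Glyph] = []
    index_of: Dict[Glyph, int] = {}
    placements: List[Placement] = []
    for bitmap, x, y in glyphs:
        if bitmap not in index_of:          # first time this shape is seen
            index_of[bitmap] = len(shapes)
            shapes.append(bitmap)
        placements.append((index_of[bitmap], x, y))
    return shapes, placements


def decode_page(shapes: List[Glyph], placements: List[Placement]) -> List[Tuple[Glyph, int, int]]:
    """Rebuild the (bitmap, x, y) list from the dictionary and the placements."""
    return [(shapes[i], x, y) for i, x, y in placements]


# Toy 3x3 glyphs standing in for the letters "e" and "a" (illustrative only).
E = ("###", "#..", "###")
A = (".#.", "###", "#.#")
page = [(E, 0, 0), (A, 4, 0), (E, 8, 0), (E, 12, 0)]

shapes, placements = encode_page(page)
print(len(shapes), "shapes for", len(placements), "glyphs")  # 2 shapes for 4 glyphs
```

Each repeated glyph costs only a small placement record instead of a full bitmap, which is where the savings come from.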
4. Shared Dictionary
The shared dictionary extends JB2's pattern matching across multiple pages: pages in a book can reference a common library of character shapes, so a shape that recurs throughout the book (for example, the letter "e" in the body typeface) is stored once rather than once per page.
The result:
- Scanned text-heavy pages: 20-100 KB (vs 1-3 MB for uncompressed TIFF, 100-300 KB for a typical scanned PDF or compressed TIFF)
- Scanned photographic pages: 50-200 KB (comparable to JPEG)
- Mixed content (text + photos): 50-150 KB (much better than any single-algorithm approach)
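How the layers recombine at display time can be sketched as follows. This is a simplified model that assumes all three layers are already decoded to the same resolution (in real files the foreground and background are usually stored at a lower resolution than the mask and upsampled). `composite` is a hypothetical helper, not part of any DjVu library.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def composite(mask: List[List[int]],
              foreground: List[List[RGB]],
              background: List[List[RGB]]) -> List[List[RGB]]:
    """Pick the foreground color where the mask is 1, else the background color."""
    return [
        [fg if m else bg for m, fg, bg in zip(mask_row, fg_row, bg_row)]
        for mask_row, fg_row, bg_row in zip(mask, foreground, background)
    ]


BLACK, PAPER = (0, 0, 0), (250, 245, 230)
mask = [[0, 1, 0],
        [1, 1, 1]]
foreground = [[BLACK] * 3 for _ in range(2)]   # ink color
background = [[PAPER] * 3 for _ in range(2)]   # paper color

page = composite(mask, foreground, background)
print(page[0])  # [(250, 245, 230), (0, 0, 0), (250, 245, 230)]
```

Because the sharp edges live only in the bitonal mask, the two color layers can be compressed aggressively without blurring the text.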
DjVu File Structure
A DjVu file uses the IFF (Interchange File Format) container — the same container used by AIFF audio. The structure is a hierarchy of named "chunks":
FORM:DJVU ← single-page DjVu
├── INFO ← page dimensions, resolution, version
├── Djbz ← shared JB2 dictionary (character shapes)
├── Sjbz ← JB2-encoded mask layer (text outline)
├── FG44 ← IW44-encoded foreground layer
├── BG44 ← IW44-encoded background layer (may be split into multiple chunks for progressive loading)
├── TXTz ← hidden text layer (OCR text for search/selection)
└── ANTa / ANTz ← annotations (hyperlinks, highlights)
FORM:DJVM ← multi-page DjVu document
├── DIRM ← directory of pages
├── FORM:DJVU ← page 1
├── FORM:DJVU ← page 2
└── ...
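The chunk hierarchy above can be walked with a few lines of code. The following is a minimal sketch, assuming the usual layout of a 4-byte `AT&T` magic followed by IFF chunks (4-byte ID, 4-byte big-endian length, payload padded to an even size). The helper names and the synthetic sample bytes are made up for illustration; the payloads are not valid DjVu data.

```python
import struct
from typing import Iterator, List, Tuple


def iter_chunks(data: bytes, start: int, end: int, depth: int = 0) -> Iterator[Tuple[int, str, int]]:
    """Yield (depth, chunk_id, size) for every chunk in data[start:end]."""
    pos = start
    while pos + 8 <= end:
        chunk_id = data[pos:pos + 4].decode("ascii")
        (size,) = struct.unpack(">I", data[pos + 4:pos + 8])  # big-endian length
        body = pos + 8
        if chunk_id == "FORM":
            # A FORM chunk starts with a 4-byte type, then holds nested chunks.
            form_type = data[body:body + 4].decode("ascii")
            yield depth, f"FORM:{form_type}", size
            yield from iter_chunks(data, body + 4, body + size, depth + 1)
        else:
            yield depth, chunk_id, size
        pos = body + size + (size & 1)  # skip the pad byte after odd-sized chunks


def list_chunks(data: bytes) -> List[Tuple[int, str, int]]:
    if data[:4] != b"AT&T":
        raise ValueError("missing AT&T magic; not a DjVu file")
    return list(iter_chunks(data, 4, len(data)))


def make_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Build one chunk for the synthetic sample below."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack(">I", len(payload)) + payload + pad


# Synthetic single-page structure with dummy payloads.
sample = b"AT&T" + make_chunk(
    b"FORM",
    b"DJVU" + make_chunk(b"INFO", b"\x00" * 10) + make_chunk(b"Sjbz", b"\x01\x02\x03"),
)
for depth, chunk_id, size in list_chunks(sample):
    print("  " * depth + f"{chunk_id} ({size} bytes)")
```

To inspect a real file, pass its contents instead of the sample, for example `list_chunks(open("book.djvu", "rb").read())`.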
The Hidden Text Layer (TXTz)
DjVu files can contain a compressed hidden text layer storing the OCR (Optical Character Recognition) transcript of the page. This enables:
- Text search within the DjVu viewer (without re-OCRing)
- Text selection and copy-paste
- Accessibility (screen readers)
- Indexing by search engines
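Text positions are stored as a hierarchy of zones (page, column, region, paragraph, line, word, character), each with a bounding rectangle on the page. In practice most OCR tools record positions down to the word level, so each word in the transcript is associated with its rectangle.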
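Because each word carries a bounding rectangle, search highlighting reduces to a lookup. The sketch below uses a hypothetical in-memory representation (a list of words with boxes), not the actual TXTz encoding, to show how a viewer might map a query to the rectangles it should highlight.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax) in page pixels


@dataclass(frozen=True)
class Word:
    text: str
    box: Box


def find_boxes(words: List[Word], query: str) -> List[Box]:
    """Return the bounding box of every word equal to the query, ignoring case."""
    needle = query.casefold()
    return [w.box for w in words if w.text.casefold() == needle]


# Made-up OCR output for two lines of a page.
words = [
    Word("DjVu", (100, 50, 180, 80)),
    Word("stores", (190, 50, 290, 80)),
    Word("scanned", (300, 50, 420, 80)),
    Word("djvu", (100, 90, 180, 120)),
]
print(find_boxes(words, "DJVU"))  # [(100, 50, 180, 80), (100, 90, 180, 120)]
```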
DjVu vs PDF for Scanned Documents
| Feature | DjVu | PDF |
|---|---|---|
| Compression for text scans | Excellent (JB2 + IW44) | Good (JBIG2 in PDF 1.4+, JPEG 2000 in PDF 1.5+) |
| File size for 300 DPI scan | 20-100 KB/page | 50-300 KB/page |
| Universal viewer support | No (requires plugin/app) | Yes (PDF is everywhere) |
| Browser native support | No | Via browser PDF viewer |
| Reflowable text | No (image-based) | No (for scanned PDFs) |
| Hyperlinks | Yes (annotations) | Yes |
| OCR text layer | Yes (TXTz) | Yes (invisible text overlay) |
| Editing | Very limited | Limited (for scanned) |
| Open standard | Open specification | ISO standard (ISO 32000) |
| Creation tools | DjVuLibre, Any2DjVu | Virtually everything |
When DjVu wins: archiving large libraries of scanned books or journals at minimum file size while preserving excellent visual quality. A 600-page scanned book that would be 150 MB as PDF may be only 30 MB as DjVu.
When PDF wins: everything else — universal compatibility, browser support, editing, forms, digital signatures, and tool ecosystem. PDF has largely displaced DjVu in most workflows.
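The 600-page comparison above is simple arithmetic on per-page averages. A minimal sketch, assuming 250 KB per scanned PDF page and 50 KB per DjVu page (figures chosen to match the example and the ranges in the table, not measured values):

```python
def collection_size_mb(pages: int, kb_per_page: float) -> float:
    """Estimate total size in MB (using 1 MB = 1000 KB for round numbers)."""
    return pages * kb_per_page / 1000


pages = 600
pdf_mb = collection_size_mb(pages, 250)   # assumed average for a scanned PDF page
djvu_mb = collection_size_mb(pages, 50)   # assumed average for a DjVu page
print(f"PDF: {pdf_mb:.0f} MB, DjVu: {djvu_mb:.0f} MB, ratio: {pdf_mb / djvu_mb:.0f}x")
# PDF: 150 MB, DjVu: 30 MB, ratio: 5x
```

The same function scales to a whole library: multiply by the number of volumes to see where the per-page difference starts to matter for storage and bandwidth.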
Opening DjVu Files
DjVu is not natively supported by most operating systems. You need dedicated software:
Windows:
- WinDjView (free, lightweight)
- DjView (free, cross-platform, from DjVuLibre project)
- Okular (KDE viewer for Windows)
- Sumatra PDF (supports DjVu natively)
macOS:
- DjView (free)
- MacDjView
Linux:
- Evince (GNOME document viewer — supports DjVu)
- Okular (KDE document viewer — supports DjVu)
- DjView
Browser:
- No native support. DjVu.js is a JavaScript renderer for web embedding.
Converting DjVu
DjVu to PDF (Most Common Conversion)
# Using DjVuLibre (djvulibre package)
ddjvu -format=pdf input.djvu output.pdf
# With an explicit rendering resolution (ddjvu has no -resolution option; -scale=N sets the output DPI, see `man ddjvu`)
ddjvu -format=pdf -scale=300 input.djvu output.pdf
# Page range
ddjvu -format=pdf -page=1-50 input.djvu output_partial.pdf
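For a whole directory, the same command can be driven from a script. A minimal sketch, assuming `ddjvu` is on the PATH; `build_ddjvu_command` and `convert_directory` are hypothetical helper names, and only the `-format` and `-page` options shown above are used.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_ddjvu_command(src: Path, dst: Path, pages: Optional[str] = None) -> List[str]:
    """Build the argument list for a DjVu-to-PDF conversion."""
    cmd = ["ddjvu", "-format=pdf"]
    if pages:
        cmd.append(f"-page={pages}")   # e.g. "1-50"
    cmd += [str(src), str(dst)]
    return cmd


def convert_directory(folder: Path) -> None:
    """Convert every .djvu file in a folder to a PDF next to it."""
    for src in sorted(folder.glob("*.djvu")):
        dst = src.with_suffix(".pdf")
        # check=True raises CalledProcessError if ddjvu exits with an error.
        subprocess.run(build_ddjvu_command(src, dst), check=True)


print(build_ddjvu_command(Path("input.djvu"), Path("output_partial.pdf"), pages="1-50"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.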
DjVu to TIFF or PNG (Per-Page Export)
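# Export all pages as individual TIFF files (-eachpage fills the page number into the %04d pattern)
ddjvu -format=tiff -eachpage input.djvu page-%04d.tiff
# Export all pages as PPM (ddjvu writes PBM/PGM/PPM, TIFF, and PDF, but not PNG)
ddjvu -format=ppm -eachpage input.djvu page-%04d.ppm
# Export a single page as TIFF, then convert it to PNG with ImageMagick (use `convert` on ImageMagick 6)
ddjvu -format=tiff -page=5 input.djvu page_5.tiff
magick page_5.tiff page_5.png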
Creating DjVu from Scans
# c44: compress a single JPEG or PPM/PGM image to DjVu (color page; convert PNG input first)
c44 scanned_page.jpg page.djvu
# cjb2: compress a bitonal TIFF or PBM to JB2 (text-only page, best compression)
cjb2 -lossy scanned_text.tiff page_text.djvu
# djvm: bundle multiple page DjVu files into a multi-page document
djvm -c document.djvu page1.djvu page2.djvu page3.djvu
# Any2DjVu web service: uploads and converts automatically
# https://any2djvu.djvuzone.org/
# Document Express (commercial): best quality for production scanning
Python with python-djvulibre
# pip install python-djvulibre (requires the DjVuLibre library and headers)
import djvu.decode
ctx = djvu.decode.Context()
doc = ctx.new_document(djvu.decode.FileURI('book.djvu'))
doc.decoding_job.wait()  # block until the document structure is available
print(f"Pages: {len(doc.pages)}")
for i, page in enumerate(doc.pages):
    page.get_info(wait=True)  # load page metadata without rendering the page
    width = page.width
    height = page.height
    print(f"Page {i+1}: {width} x {height} pixels")
Where DjVu Is Used Today
Despite being overshadowed by PDF, DjVu maintains a presence in specific communities:
Internet Archive (archive.org): long offered DjVu as one of its primary formats for book scans, and many older public domain items are still available in DjVu alongside PDF. It stopped generating DjVu derivatives for new uploads in 2016.
Libraries and academic archives: institutions that scanned large collections in the early 2000s still maintain DjVu archives.
Russian-language internet: DjVu has a particularly strong following in Russian-speaking communities, where many technical manuals, textbooks, and literary works are distributed as DjVu files.
Retro computing: vintage computer magazines and technical manuals from the 1980s-90s are often found as DjVu scans.
For new digitization projects today, the choice is typically between PDF (universal) and JPEG 2000 or HEIF (better compression for photos) rather than DjVu. For an existing DjVu collection, though, keeping the files as DjVu is usually the most compact option, since converting them to PDF tends to produce larger files.