DjVu Format Guide: Compressed Scanned Documents & Digital Libraries

By Pablo Cirre • Updated May 7, 2026

What Is DjVu?

DjVu (pronounced "déjà vu") is a compressed file format optimized for storing scanned documents, particularly those with mixed content — text, line art, and photographic illustrations on the same page. Developed at AT&T Labs by Yann LeCun, Léon Bottou, and colleagues in 1996, DjVu was designed to solve a specific problem: how to distribute high-quality scanned books, magazines, and documents over slow internet connections.

When DjVu was introduced in the late 1990s, a scanned book page at 300 DPI might be a 5 MB TIFF. The same page as a JPEG would be 200 KB but with unacceptable compression artifacts on text. DjVu achieved roughly 20-100 KB per text-heavy page with excellent text clarity by using a fundamentally different compression strategy: separate the page into its components and compress each optimally.

How DjVu Compression Works

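DjVu's key innovation is a document analysis pipeline that separates a scanned page into three layers (background, foreground, and mask) and compresses each with a suited algorithm: IW44 wavelet compression for the two color layers and JB2 for the bitonal mask. A shared dictionary, the fourth item below, extends JB2 across pages: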

1. Background Layer (BG44)

The background layer contains low-frequency color information: the paper texture, photographic images, and gradients. It is compressed with IW44, a progressive wavelet codec in the same family as JPEG 2000, so a coarse image can be shown first and refined as more data arrives.
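To make the "progressive wavelet" idea concrete, here is a minimal sketch. It deliberately swaps in the simplest wavelet, a one-level 1-D Haar transform, rather than the more elaborate wavelet IW44 actually uses, and the function names and pixel values are made up for illustration. It shows how a row of pixels splits into low-frequency averages, which are enough for a coarse first pass, plus detail coefficients that refine the result.

```python
def haar_step(values):
    """One level of a 1-D Haar transform: pairwise averages and differences."""
    pairs = list(zip(values[0::2], values[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]   # low-frequency part
    details = [(a - b) / 2 for a, b in pairs]    # high-frequency part
    return averages, details


def haar_inverse(averages, details):
    """Rebuild the pixel row from averages and detail coefficients."""
    out = []
    for avg, det in zip(averages, details):
        out.extend([avg + det, avg - det])
    return out


# Illustrative row of grayscale values: light paper, then a darker region.
row = [200, 202, 198, 196, 90, 94, 92, 88]
averages, details = haar_step(row)

coarse = haar_inverse(averages, [0] * len(details))  # first pass: averages only
exact = haar_inverse(averages, details)              # refinement restores the row
print(coarse)
print(exact)
```

The coarse pass already conveys the light-to-dark transition; the small detail coefficients are what a progressive decoder fills in later.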

2. Foreground Layer (FG44)

The foreground layer contains the color information of text and line art. For most text documents this layer is very simple (black text on white) and compresses to near-zero size.

3. Mask / JB2 Layer

The JB2 mask is the most clever part. JB2 is DjVu's own bitonal codec, similar in approach to the JBIG2 standard. The mask contains a bitonal (two-color: black or white, no grays) representation of all text and line art. JB2 uses pattern matching to find repeated character shapes (like all occurrences of the letter "e" or "a") and stores them only once, with pointers to each position where they appear. Because the same character shape appears hundreds of times on a page, this achieves enormous compression.

4. Shared Dictionary

The shared dictionary extends JB2's pattern matching across multiple pages: all pages in a book share a single library of character shapes. In the ideal case, one stored shape per distinct glyph (a given letter in a given typeface and size) covers all of its occurrences across the entire book.
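The store-once-and-point idea behind the mask layer and the shared dictionary can be sketched in a few lines. This is a toy model under loud assumptions: glyphs arrive already segmented, shapes are matched by exact equality (a real encoder tolerates small scanning differences and then entropy-codes the result), and `encode_shapes` and the tiny bitmaps are hypothetical names and data, not part of any DjVu tool.

```python
def encode_shapes(pages):
    """Replace repeated glyph bitmaps with references into one shared dictionary.

    pages: list of pages, each a list of (bitmap, x, y) glyph occurrences.
    Returns (dictionary, placements), where placements[i] is a list of
    (shape_id, x, y) tuples for page i.
    """
    dictionary = []   # each distinct shape is stored exactly once
    shape_ids = {}    # bitmap -> index into dictionary
    placements = []
    for page in pages:
        refs = []
        for bitmap, x, y in page:
            if bitmap not in shape_ids:
                shape_ids[bitmap] = len(dictionary)
                dictionary.append(bitmap)
            refs.append((shape_ids[bitmap], x, y))
        placements.append(refs)
    return dictionary, placements


# Toy 3x3 bitmaps standing in for scanned letter shapes.
E = ("###", "#..", "###")
A = (".#.", "###", "#.#")
pages = [
    [(E, 0, 0), (A, 4, 0), (E, 8, 0)],   # page 1
    [(A, 0, 0), (E, 4, 0)],              # page 2 reuses both shapes
]
dictionary, placements = encode_shapes(pages)
print(len(dictionary), "shapes stored for", sum(len(p) for p in pages), "glyphs")
```

Building the dictionary per page corresponds to the mask layer on its own; sharing it across pages, as here, corresponds to the shared dictionary.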

The result:

Scanned text-heavy pages: 20-100 KB (vs 1-3 MB for TIFF, 100-300 KB for PDF/TIFF)
Scanned photographic pages: 50-200 KB (comparable to JPEG)
Mixed content (text + photos): 50-150 KB (much better than any single-algorithm approach)

DjVu File Structure

A DjVu file uses the IFF (Interchange File Format) container — the same container used by AIFF audio. The structure is a hierarchy of named "chunks":

FORM:DJVU         ← single-page DjVu
├── INFO          ← page dimensions, resolution, version
├── Djbz          ← shared JB2 dictionary (character shapes)
├── Sjbz          ← JB2-encoded mask layer (text outline)
├── FG44          ← IW44-encoded foreground layer
├── BG44          ← IW44-encoded background layer (may be split into multiple chunks for progressive loading)
├── TXTz          ← hidden text layer (OCR text for search/selection)
└── ANTa / ANTz   ← annotations (hyperlinks, highlights)

FORM:DJVM         ← multi-page DjVu document
├── DIRM          ← directory of pages
├── FORM:DJVU     ← page 1
├── FORM:DJVU     ← page 2
└── ...
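The chunk hierarchy above can be walked with a short parser. The sketch below assumes the usual IFF conventions (a four-character chunk ID, a big-endian 32-bit payload size, payloads padded to an even length) plus the four-byte `AT&T` prefix that DjVu files carry before the first `FORM`. The function names are hypothetical, and the sample bytes are synthetic placeholders rather than a real page.

```python
import struct


def walk_chunks(buf, start, end, depth=0):
    """Yield (depth, chunk_id, size) for the IFF chunks in buf[start:end]."""
    pos = start
    while pos + 8 <= end:
        chunk_id = buf[pos:pos + 4].decode("ascii", "replace")
        (size,) = struct.unpack(">I", buf[pos + 4:pos + 8])
        body = pos + 8
        if chunk_id == "FORM":
            # A FORM payload starts with its type (DJVU, DJVM, ...) and then
            # holds nested chunks.
            form_type = buf[body:body + 4].decode("ascii", "replace")
            yield depth, "FORM:" + form_type, size
            yield from walk_chunks(buf, body + 4, min(body + size, end), depth + 1)
        else:
            yield depth, chunk_id, size
        pos = body + size + (size & 1)   # skip the pad byte after odd sizes


def list_chunks(data):
    """Return the chunk tree of a DjVu file given its raw bytes."""
    if data[:4] != b"AT&T":
        raise ValueError("missing AT&T prefix; probably not a DjVu file")
    return list(walk_chunks(data, 4, len(data)))


def make_chunk(chunk_id, payload):
    """Build one padded IFF chunk (used only for the synthetic sample)."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack(">I", len(payload)) + payload + pad


# Synthetic single-page file with placeholder payloads.
form_body = b"DJVU" + make_chunk(b"INFO", bytes(10)) + make_chunk(b"Sjbz", b"abc")
sample = b"AT&T" + b"FORM" + struct.pack(">I", len(form_body)) + form_body

for depth, chunk_id, size in list_chunks(sample):
    print("  " * depth + f"{chunk_id} ({size} bytes)")
```

To inspect a real file, pass `open("book.djvu", "rb").read()` to `list_chunks`; the file name here is just an example.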

The Hidden Text Layer (TXTz)

DjVu files can contain a compressed hidden text layer storing the OCR (Optical Character Recognition) transcript of the page. This enables:

Text search within the DjVu viewer (without re-OCRing)
Text selection and copy-paste
Accessibility (screen readers)
Indexing by search engines

The text positions are word-level — each word in the OCR transcript is associated with its bounding rectangle on the page.
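Word-level positions are what make search highlighting possible. The sketch below models the text layer as nested tuples of the form `(kind, xmin, ymin, xmax, ymax, ...)`, loosely following the page, line, and word hierarchy; this in-memory layout, the `find_word` helper, and the sample coordinates are all assumptions for illustration, not the TXTz binary encoding itself.

```python
def find_word(zone, query):
    """Return the bounding boxes of every word matching query (case-insensitive)."""
    kind, xmin, ymin, xmax, ymax, *children = zone
    if kind == "word":
        text = children[0]
        if text.lower() == query.lower():
            return [(xmin, ymin, xmax, ymax)]
        return []
    hits = []
    for child in children:   # descend into lines, paragraphs, and so on
        hits.extend(find_word(child, query))
    return hits


# Hypothetical OCR result for one page (coordinates in pixels).
page_text = (
    "page", 0, 0, 2550, 3300,
    ("line", 300, 3000, 1200, 3060,
        ("word", 300, 3000, 520, 3060, "DjVu"),
        ("word", 540, 3000, 900, 3060, "format")),
    ("line", 300, 2900, 1000, 2960,
        ("word", 300, 2900, 520, 2960, "djvu")),
)

boxes = find_word(page_text, "djvu")
print(boxes)
```

A viewer would draw a highlight over each returned rectangle, which is why search works without re-running OCR.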

DjVu vs PDF for Scanned Documents

Feature	DjVu	PDF
Compression for text scans	Excellent (JB2 + IW44)	Good (JBIG2 + JPEG2000 in PDF 1.5+)
File size for 300 DPI scan	20-100 KB/page	50-300 KB/page
Universal viewer support	No (requires plugin/app)	Yes (PDF is everywhere)
Browser native support	No	Via browser PDF viewer
Reflowable text	No (image-based)	No (for scanned PDFs)
Hyperlinks	Yes (annotations)	Yes
OCR text layer	Yes (TXTz)	Yes (invisible text overlay)
Editing	Very limited	Limited (for scanned)
Open standard	Open specification	ISO standard (ISO 32000)
Creation tools	DjVuLibre, Any2DjVu	Virtually everything

When DjVu wins: archiving large libraries of scanned books or journals at minimum file size while preserving excellent visual quality. A 600-page scanned book that would be 150 MB as PDF may be only 30 MB as DjVu.
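As a quick sanity check, the 600-page example can be converted to per-page sizes and compared with the ranges in the table above. The helper name is hypothetical and 1 MB is taken as 1000 KB for simplicity.

```python
def kb_per_page(total_mb, pages):
    """Average size per page in KB, treating 1 MB as 1000 KB."""
    return total_mb * 1000 / pages


pdf_kb = kb_per_page(150, 600)    # the 150 MB PDF from the example
djvu_kb = kb_per_page(30, 600)    # the 30 MB DjVu from the example
print(pdf_kb, djvu_kb, pdf_kb / djvu_kb)
```

Both averages (250 KB and 50 KB per page) fall inside the per-page ranges quoted in the table, and the ratio works out to a fivefold saving.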

When PDF wins: everything else — universal compatibility, browser support, editing, forms, digital signatures, and tool ecosystem. PDF has largely displaced DjVu in most workflows.

Opening DjVu Files

DjVu is not natively supported by most operating systems. You need dedicated software:

Windows:

WinDjView (free, lightweight)
DjView (free, cross-platform, from DjVuLibre project)
Okular (KDE viewer for Windows)
Sumatra PDF (supports DjVu natively)

macOS:

DjView (free)
MacDjView

Linux:

Evince (GNOME document viewer — supports DjVu)
Okular (KDE document viewer — supports DjVu)
DjView

Browser:

No native support. DjVu.js is a JavaScript renderer for web embedding.

Converting DjVu

DjVu to PDF (Most Common Conversion)

# Using DjVuLibre (djvulibre package)
ddjvu -format=pdf input.djvu output.pdf

# With an explicit rendering resolution (ddjvu uses -scale=N for N dpi; it has no -resolution flag)
ddjvu -format=pdf -scale=300 input.djvu output.pdf

# Page range
ddjvu -format=pdf -page=1-50 input.djvu output_partial.pdf
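For more than a handful of files, the same command can be driven from a script. The following is a minimal sketch, assuming `ddjvu` is installed and on the PATH; the helper names and the folder layout are hypothetical, and only the `-format` and `-page` options shown above are used.

```python
import subprocess
from pathlib import Path


def ddjvu_pdf_command(src, dst, pages=None):
    """Build the ddjvu argument list for a DjVu-to-PDF conversion."""
    cmd = ["ddjvu", "-format=pdf"]
    if pages:                        # e.g. "1-50", same syntax as -page above
        cmd.append(f"-page={pages}")
    cmd += [str(src), str(dst)]
    return cmd


def convert_folder(folder):
    """Convert every .djvu file in folder to a PDF with the same base name."""
    for src in sorted(Path(folder).glob("*.djvu")):
        dst = src.with_suffix(".pdf")
        # check=True raises CalledProcessError if ddjvu reports a failure.
        subprocess.run(ddjvu_pdf_command(src, dst), check=True)


print(ddjvu_pdf_command("input.djvu", "output_partial.pdf", pages="1-50"))
```

Calling `convert_folder("scans")` would then process a whole directory; building the argument list separately keeps the command easy to inspect before anything runs.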

DjVu to TIFF or PNG (Per-Page Export)

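# Export all pages as individual TIFF files (-eachpage fills in %04d per page)
ddjvu -format=tiff -scale=300 -eachpage input.djvu page-%04d.tiff

# Export all pages as PPM (ddjvu writes PNM, TIFF, and PDF, not PNG)
ddjvu -format=ppm -eachpage input.djvu page-%04d.ppm

# Export a single page, then convert it to PNG with ImageMagick (`convert` on ImageMagick 6)
ddjvu -format=ppm -page=5 input.djvu page_5.ppm
magick page_5.ppm page_5.png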

Creating DjVu from Scans

# c44: compress a single JPEG or PPM/PGM image to DjVu (color page; convert PNG input first)
c44 scanned_page.jpg page.djvu

# cjb2 — compress a bitonal TIFF to JB2 (text-only page, best compression)
cjb2 -lossy scanned_text.tiff page_text.djvu

# djvm — bundle multiple page DjVu files into a multi-page document
djvm -c document.djvu page1.djvu page2.djvu page3.djvu

# Any2DjVu web service — uploads and converts automatically
# https://any2djvu.djvuzone.org/

# Document Express (commercial) — best quality for production scanning
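The per-page and bundling steps above can be planned from a script before anything is executed. This sketch only builds the `cjb2` and `djvm` command lines shown above for a list of bitonal TIFF pages; the function name and file names are hypothetical, and running the commands (for example with `subprocess.run`) is left to the caller.

```python
from pathlib import Path


def plan_djvu_build(tiff_pages, output):
    """Return the cjb2 and djvm command lines for a list of bitonal TIFFs."""
    commands = []
    page_files = []
    for tiff in tiff_pages:
        page = str(Path(tiff).with_suffix(".djvu"))   # page1.tiff -> page1.djvu
        commands.append(["cjb2", "-lossy", str(tiff), page])
        page_files.append(page)
    # Bundle the single-page files, in order, into one multi-page document.
    commands.append(["djvm", "-c", str(output), *page_files])
    return commands


for command in plan_djvu_build(["page1.tiff", "page2.tiff"], "document.djvu"):
    print(" ".join(command))
```

Keeping the plan as data makes it easy to review the page order before bundling.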

Python with python-djvulibre

# pip install python-djvulibre  (requires the DjVuLibre library to be installed)
import djvu.decode

ctx = djvu.decode.Context()
doc = ctx.new_document(djvu.decode.FileURI('book.djvu'))
doc.decoding_job.wait()  # block until the document structure is available

print(f"Pages: {len(doc.pages)}")
for i, page in enumerate(doc.pages):
    job = page.decode(wait=True)  # decode the page; the job carries its dimensions
    width = job.width
    height = job.height
    print(f"Page {i+1}: {width} x {height} pixels")

Where DjVu Is Used Today

Despite being overshadowed by PDF, DjVu maintains a presence in specific communities:

Internet Archive (archive.org): historically offered DjVu as one of its derivative formats for book scans. Many older public domain scans still have a DjVu download alongside PDF, though newer uploads generally do not.

Libraries and academic archives: institutions that scanned large collections in the early 2000s still maintain DjVu archives.

Russian-language internet: DjVu has a particularly strong following in Russian-speaking communities, where many technical manuals, textbooks, and literary works are distributed as DjVu files.

Retro computing: vintage computer magazines and technical manuals from the 1980s-90s are often found as DjVu scans.

For new digitization projects today, the choice is typically between PDF (universal) and JPEG 2000 or HEIF (better compression for photos) rather than DjVu. DjVu remains a sensible choice for keeping a legacy DjVu collection as it is, since converting it to PDF would usually increase file size.

Frequently Asked Questions

DjVu is a document format designed for scanned pages with mixed content (text, photos, and line art). Its small file size comes from separating each page into layers and compressing each one with a suited algorithm: (1) the JB2 mask layer stores text and line art as a bitonal (black/white) image using pattern matching, so the same character shape (like the letter "e") is stored once and referenced by position everywhere it appears, achieving extreme compression for text; (2) the IW44 wavelet layers store the color/grayscale background and the foreground colors; (3) on top of the layers, a shared dictionary across all pages means common character shapes are stored only once per document. A 300 DPI scanned text page achieves 20-100 KB in DjVu vs 1-3 MB in TIFF.

Send PDF when the document is final and the layout must be preserved exactly (contracts, invoices, certificates). Send DOCX when reviewers need to edit, comment, or track changes. Many teams send both: PDF as the canonical version + DOCX for editable feedback. PDF/A is the right pick for legal archival (ISO 19005).

DjVu is not natively supported by most operating systems or browsers. On Windows, Sumatra PDF (free, lightweight) and WinDjView open DjVu files. On macOS, DjView (free, from the DjVuLibre project) works well. On Linux, Evince and Okular both support DjVu. If you would rather not keep a DjVu viewer around, convert the file once with the command-line tool `ddjvu` (from the djvulibre package): `ddjvu -format=pdf input.djvu output.pdf` produces a PDF that opens in any PDF viewer. In the browser, DjVu.js is a JavaScript renderer that can be embedded in web pages.

Round-tripping between similar formats (DOCX ↔ ODT, DOCX → PDF) is generally safe. Round-tripping with format-specific features (Word macros, complex tables, footnotes) often loses fidelity. Embedded fonts survive only if both source and target support font embedding (PDF yes, DOCX yes, plain HTML no). Always preview the result before deleting the original.

The standard tool is `ddjvu` from the DjVuLibre package: install on Ubuntu with `apt install djvulibre-bin`, on macOS with `brew install djvulibre`. Then: `ddjvu -format=pdf input.djvu output.pdf`. To render at a specific resolution, use `-scale` (ddjvu has no `-resolution` option): `ddjvu -format=pdf -scale=300 input.djvu output.pdf`. For extracting specific pages: `ddjvu -format=pdf -page=1-50 input.djvu pages_1-50.pdf`. Online tools (Zamzar, Convertio, PDF2Doc) also convert DjVu to PDF without software installation. Note that the resulting PDF contains images (not native PDF text), just like the original DjVu.

If the PDF contains real text (not scanned images), `pdftotext` from poppler-utils or PDF to TXT works in seconds. If the PDF is a scanned image, you need OCR; Tesseract is the open-source standard. KaijuConverter's PDF tools auto-detect text-vs-image PDFs and route accordingly.

The largest source is Internet Archive (archive.org) — search for books, magazines, or technical manuals and look for the DjVu download option (usually alongside PDF, EPUB, and plain text). Many public domain books, scientific journals, and historical documents are available. Academic libraries that digitized collections in the 2000s (Russian State Library, many university libraries) maintain DjVu archives. Russian-language sites (lib.ru, djvu.org) have extensive technical and literary DjVu collections. For retro computing: scans of vintage computer magazines (Byte, PCWorld, Dr. Dobb's) are often found in DjVu format on archive sites.

Light edits (annotations, signatures, form fields) are fine in any PDF reader. Structural edits (changing paragraphs, replacing images) are awkward — PDF is a presentation format, not an editing format. The robust workflow is: keep the source DOCX/MD/HTML as the master, regenerate the PDF when changes are needed. Tools that "edit PDFs" reverse-engineer the layout and frequently break it.