DjVu Format: The Forgotten Champion of Scanned Documents

By Pablo Cirre • Updated May 7, 2026

What Is DjVu and Where Did It Come From?

DjVu (pronounced "déjà vu") is a digital document format developed at AT&T Labs starting in 1996 by a team that included Yann LeCun, Léon Bottou, Patrick Haffner, and Paul Howard. This is the same Yann LeCun who later became one of the founding figures of modern deep learning and Chief AI Scientist at Meta. The format was designed specifically to compress scanned documents far more tightly than the PDF files of the time could.

AT&T open-sourced DjVu in 2002 under the GNU GPL license. This allowed the format to be adopted widely by digital library projects. The Internet Archive (archive.org) adopted DjVu early for its massive book scanning initiatives. Wikimedia Commons uses DjVu for historical documents and old books. Google Books used DjVu in its early scanning phases before transitioning toward PDF. Academic and public libraries in Russia, Eastern Europe, and parts of Asia became particularly heavy users, a legacy still visible today.

The Technical Innovation: Three-Layer Compression

What makes DjVu genuinely clever is its approach to document structure. Instead of treating a scanned page as a single image (as naive PDF scanning does), DjVu decomposes each page into three distinct layers:

Mask layer: a bitonal (black/white) bitmap stored at full scan resolution (typically 300–600 DPI) and compressed with JB2, DjVu's bilevel coder. It carries the shapes of text and line art and tells the renderer which pixels take their color from the foreground and which from the background.
Foreground layer: the colors of the text and line art selected by the mask. Because ink color changes slowly across a page, this layer is stored at low resolution with the lossy wavelet compressor IW44.
Background layer: paper texture, photographs, and gradients, stored at reduced resolution (typically around 100 DPI) and also compressed with IW44.

This separation is the key insight. Text on a plain page needs a sharp, full-resolution mask but almost no color data. A photograph needs a well-compressed background and gains nothing from a full-resolution bitonal layer. The encoder applies each compression strategy only where it matters.
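The layer separation described above can be illustrated with a toy example. This is only a sketch of the idea, not the DjVu encoder: real files compress each layer with dedicated codecs, and the function names (`split_layers`, `reconstruct`), the threshold of 128, and the 2x2 block size are assumptions chosen for illustration.

```python
# Toy illustration of mask/foreground/background separation.
# NOT the real DjVu codec: the threshold and block size are arbitrary,
# and no actual compression is performed.

def split_layers(page, threshold=128, block=2):
    """Split a grayscale page (rows of 0-255 ints) into three layers."""
    height, width = len(page), len(page[0])

    # Mask: full resolution, 1 where the pixel is dark enough to count as ink.
    mask = [[1 if px < threshold else 0 for px in row] for row in page]

    def block_average(select):
        """Average the pixels whose mask bit equals `select` in each tile."""
        out = []
        for top in range(0, height, block):
            out_row = []
            for left in range(0, width, block):
                values = [
                    page[y][x]
                    for y in range(top, min(top + block, height))
                    for x in range(left, min(left + block, width))
                    if mask[y][x] == select
                ]
                # None marks a tile that has no pixels of this kind.
                out_row.append(sum(values) // len(values) if values else None)
            out.append(out_row)
        return out

    foreground = block_average(1)  # ink color, low resolution
    background = block_average(0)  # paper color, low resolution
    return mask, foreground, background


def reconstruct(mask, foreground, background, block=2):
    """Rebuild a page: the mask picks foreground or background per pixel."""
    return [
        [
            (foreground if bit else background)[y // block][x // block]
            for x, bit in enumerate(row)
        ]
        for y, row in enumerate(mask)
    ]


page = [
    [250, 250, 250, 250],
    [250,  10,  20, 250],
    [250,  10,  20, 250],
    [250, 250, 250, 250],
]
mask, fg, bg = split_layers(page)
restored = reconstruct(mask, fg, bg)
```

In this tiny case the two color layers hold a quarter as many samples as the page, yet the full-resolution mask keeps the edges of the dark strokes exactly where they were.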

Real File Size Comparison

Approximate file sizes for typical scans show how large the gap can be:

Document Type	DjVu	PDF (JBIG2)	Raw TIFF scan
300-page scanned book	~3 MB	~30 MB	~150 MB
Single scanned page (300 DPI)	~15–40 KB	~100–300 KB	~500 KB
Mixed text+photo magazine page	~50–80 KB	~200–400 KB	~800 KB

A 5–10x size advantage over PDF for scanned documents is typical. For purely text-heavy scanned books, the advantage can exceed 20x compared to naive PDF scanning with embedded JPEG images.
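To make the ratios concrete, the arithmetic for the 300-page book row can be worked through in a few lines. The sizes are the approximate figures from the table, not measurements, and the helper names are hypothetical.

```python
# Approximate sizes in MB for a 300-page scanned book (figures from the table).
SIZES_MB = {"DjVu": 3, "PDF (JBIG2)": 30, "Raw TIFF scan": 150}


def size_ratio(larger, smaller="DjVu", sizes=SIZES_MB):
    """How many times larger one format's file is than another's."""
    return sizes[larger] / sizes[smaller]


def books_per_gigabyte(fmt, sizes=SIZES_MB):
    """Whole books of this size that fit in 1 GB (taken as 1000 MB)."""
    return 1000 // sizes[fmt]


for fmt in SIZES_MB:
    print(f"{fmt}: {size_ratio(fmt):.0f}x DjVu, "
          f"{books_per_gigabyte(fmt)} books per GB")
```

With these figures, a gigabyte holds roughly 333 books in DjVu, 33 in JBIG2-compressed PDF, and 6 as raw TIFF scans.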

Where DjVu Files Are Found Today

Despite its relative obscurity in everyday computing, DjVu files are extremely common in specific contexts:

Internet Archive (archive.org): Millions of older book scans are available in DjVu format alongside PDF and EPUB. The Archive's scan pipeline historically produced DjVu as a standard derivative format, but it stopped generating DjVu files for new items in 2016.
Wikimedia Commons — Historical newspapers, maps, manuscripts, and old scientific publications are stored as DjVu files and rendered into PNG thumbnails for Wikipedia pages.
Google Books (early era) — Google's initial scanning program in the mid-2000s produced DjVu files for many books. Later scans shifted to PDF.
Academic digital libraries — Russian State Library (RSL), National Digital Library of India, and numerous Eastern European academic repositories host extensive DjVu collections.
Wikipedia's own source files — Many Wikipedia articles cite DjVu documents that require a special viewer to open directly.

How to Open DjVu Files

Opening a DjVu file is not as simple as double-clicking, since no major operating system includes a native DjVu viewer:

Windows:

WinDjView — Lightweight, fast, free. The most popular Windows DjVu viewer. Includes thumbnails, bookmarks, and zoom.
Sumatra PDF — Supports DjVu alongside PDF, EPUB, and other formats. Excellent for quick viewing.
DjView4 — The official reference viewer, cross-platform, slightly heavier but very capable.

macOS:

DjView4 — Available via Homebrew (brew install djview4) or as a standalone build.
No native macOS support exists; converting to PDF first is often the path of least resistance.

Linux:

Evince — GNOME's document viewer supports DjVu natively on most distributions.
Okular — KDE's viewer also handles DjVu well.
The djvulibre package provides the command-line tools (ddjvu, djvutxt, and others); the DjView4 viewer is usually packaged separately.

Browser:

djvu.js — A browser extension available for Firefox and Chrome that renders DjVu files inline. Works well for single-file viewing but is not installed by default anywhere.

Android/iOS:

EBookDroid (Android) — Supports DjVu among many other formats. Popular among users who read archive.org books on tablets.
iOS has no built-in DjVu support; converting to PDF first is recommended.

Converting DjVu Files

The primary tool for DjVu conversion is ddjvu, included in the djvulibre package available on all major platforms.

Convert entire DjVu to PDF:

ddjvu -format=pdf input.djvu output.pdf

Extract a single page as TIFF:

ddjvu -format=tiff -page=3 input.djvu page3.tiff

Extract a range of pages:

ddjvu -format=tiff -page=1-10 input.djvu pages1to10.tiff

Using ImageMagick 7 (if the DjVu delegate is installed; ImageMagick 6 uses convert instead of magick):

magick input.djvu output.pdf

For users who prefer a graphical approach, WinDjView on Windows offers File → Export as PDF, and DjView4 provides similar export functionality. Online converters handle single DjVu files adequately if you don't want to install software.
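For more than a handful of files, the ddjvu invocation can be wrapped in a small script. The following is a minimal sketch, assuming DjVuLibre is installed and `ddjvu` is on the PATH; the helper names `ddjvu_command` and `convert_folder` are hypothetical, and page ranges use the dash form from the ddjvu manual (for example `1-10`).

```python
import shutil
import subprocess
from pathlib import Path


def ddjvu_command(source, fmt="pdf", pages=None):
    """Build the ddjvu argument list for one file (nothing is executed here)."""
    source = Path(source)
    cmd = ["ddjvu", f"-format={fmt}"]
    if pages:
        # Page spec as accepted by ddjvu, e.g. "3" or "1-10".
        cmd.append(f"-page={pages}")
    # Write the output next to the input, swapping only the extension.
    target = source.with_suffix(f".{fmt}")
    cmd += [str(source), str(target)]
    return cmd


def convert_folder(folder, fmt="pdf"):
    """Convert every .djvu file in a folder; returns the output paths."""
    if shutil.which("ddjvu") is None:
        raise RuntimeError("ddjvu not found; install DjVuLibre first")
    outputs = []
    for source in sorted(Path(folder).glob("*.djvu")):
        cmd = ddjvu_command(source, fmt)
        # check=True raises CalledProcessError if ddjvu exits with an error.
        subprocess.run(cmd, check=True)
        outputs.append(Path(cmd[-1]))
    return outputs
```

Calling `convert_folder("scans")` would convert each `.djvu` file in a `scans` directory to a PDF alongside it. Building the command in a separate function keeps it easy to inspect before anything is run.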

DjVu vs PDF for Scanned Documents

Feature	DjVu	PDF (scanned)
File size for scanned books	Excellent (5–10x smaller)	Moderate to poor
Native browser support	None (requires extension)	Full (all modern browsers)
Mobile support	Requires EBookDroid or conversion	Native on iOS/Android
Searchable text layer	Supported (if OCR was run)	Supported
Printing	Supported via viewers	Universal
Editing/annotation	Very limited	Good (with Acrobat/Foxit)
Long-term archival	Acceptable	PDF/A standard exists
Ecosystem & tools	Niche	Ubiquitous

Why DjVu Lost the Format War

Despite its technical superiority for scanned documents, DjVu failed to become mainstream. The reasons are largely about ecosystem and timing:

Adobe Reader was free and everywhere. By 2005, nearly every Windows PC had Adobe Reader installed. DjVu required users to install a separate viewer.
PDF became the web standard. Browsers began embedding PDF renderers. DjVu never achieved this integration.
PDF/A for archival. Institutions adopted PDF/A as the archival standard because it had ISO backing (ISO 19005). DjVu has no equivalent standardization.
Rival formats fell away. Even Microsoft's own XPS failed to gain traction, and the document format wars of the 2000s left PDF as the only widely supported survivor.
AT&T's open-sourcing came too late. By 2002, PDF had already won mindshare, and open-sourcing DjVu couldn't overcome PDF's head start of nearly a decade.

When to Keep DjVu vs When to Convert

Keep DjVu if:

You're downloading from archive.org purely for reading — DjVu viewers are available and the smaller file size matters on slow connections.
You're storing a large collection of scanned books and storage space is a concern.
The source DjVu has a high-quality text/OCR layer you want to preserve exactly.

Convert to PDF if:

You need to share the document with anyone who isn't technically inclined.
You need annotation, commenting, or digital signature support.
You're uploading to a document management system or sending the file as an email attachment.
Accessibility is a concern (PDF/UA has better accessibility tooling).

For most users who encounter a DjVu file from archive.org, converting it to PDF once and keeping the PDF is the pragmatic choice.

Frequently Asked Questions

Why are DjVu files smaller than scanned PDFs?

DjVu's three-layer compression (a full-resolution bitonal mask for text shapes, plus low-resolution foreground and background color layers) is purpose-built for scanned documents. Scanned PDFs typically compress page images with JPEG or JBIG2 but rarely apply the same layer separation. A 300-page scanned book can be 3 MB in DjVu versus 30 MB or more in PDF.

Can I open DjVu files in a web browser?

Not natively. You need the djvu.js browser extension (available for Firefox and Chrome) or an online DjVu viewer. Alternatively, convert the DjVu file to PDF first, which any modern browser can display.

How do I convert DjVu to PDF?

Install DjVuLibre, which provides the `ddjvu` command-line tool, then run `ddjvu -format=pdf input.djvu output.pdf`. If you prefer a GUI, open the file in WinDjView and use File → Export as PDF.

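Is the text in a DjVu file searchable?

Yes, if the document includes a hidden text layer (added via OCR during scanning). Many books from archive.org include searchable DjVu files. Whether that layer survives conversion to PDF depends on the tool: ddjvu's PDF output, for example, contains page images only, so the result may need to be OCRed again or converted with a tool that copies the text layer.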
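One way to check for a text layer before converting is DjVuLibre's `djvutxt` tool, which prints whatever hidden text a file contains. The sketch below assumes `djvutxt` is on the PATH; the helper names and the 20-character cutoff are arbitrary choices for illustration, not part of any tool.

```python
import subprocess


def djvutxt_command(source, page=None):
    """Build a djvutxt argument list; text goes to stdout when no output file is given."""
    cmd = ["djvutxt"]
    if page is not None:
        cmd.append(f"-page={page}")
    cmd.append(str(source))
    return cmd


def looks_searchable(extracted, min_chars=20):
    """Heuristic: treat the file as searchable if enough non-space text came out."""
    return len("".join(extracted.split())) >= min_chars


def has_text_layer(source):
    """Run djvutxt on a file and apply the heuristic to its output."""
    result = subprocess.run(
        djvutxt_command(source), capture_output=True, text=True, check=True
    )
    return looks_searchable(result.stdout)
```

A file with no OCR layer typically yields only whitespace and page separators, which the heuristic treats as not searchable.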
