Why Convert PDF to Images?
Common reasons include generating thumbnails, extracting content, editing pages in Photoshop, supporting software that cannot open PDFs, and preparing pages for OCR.
Ghostscript — The Most Powerful Tool
# All pages to JPEG (150 DPI)
gs -dNOPAUSE -dBATCH -sDEVICE=jpeg -r150 -sOutputFile=page_%02d.jpg document.pdf
# First page only
gs -dNOPAUSE -dBATCH -dFirstPage=1 -dLastPage=1 -sDEVICE=jpeg -r150 -sOutputFile=cover.jpg doc.pdf
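When the page range or resolution changes from job to job, the same Ghostscript call can be assembled in a script. The following is a minimal sketch rather than a definitive wrapper: the function names `build_gs_command` and `render_pages` are hypothetical, the `gs` binary is assumed to be on the PATH, and only the flags from the commands above are used.

```python
import subprocess
from typing import List, Optional


def build_gs_command(pdf_path: str, output_pattern: str, dpi: int = 150,
                     first_page: Optional[int] = None,
                     last_page: Optional[int] = None) -> List[str]:
    """Build the argument list for a Ghostscript PDF-to-JPEG conversion."""
    cmd = ["gs", "-dNOPAUSE", "-dBATCH", "-sDEVICE=jpeg", f"-r{dpi}"]
    # The page-range flags are optional; omitting both converts every page.
    if first_page is not None:
        cmd.append(f"-dFirstPage={first_page}")
    if last_page is not None:
        cmd.append(f"-dLastPage={last_page}")
    cmd.append(f"-sOutputFile={output_pattern}")
    cmd.append(pdf_path)
    return cmd


def render_pages(pdf_path: str, output_pattern: str, **options) -> None:
    """Run Ghostscript (assumed installed); raises CalledProcessError on failure."""
    subprocess.run(build_gs_command(pdf_path, output_pattern, **options), check=True)


if __name__ == "__main__":
    # Print the command for the "first page only" case without running it.
    print(" ".join(build_gs_command("doc.pdf", "cover.jpg", first_page=1, last_page=1)))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces, and a pattern such as `page_%02d.jpg` reaches Ghostscript unchanged.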
Poppler (pdftoppm) — Fast on Linux/macOS
sudo apt install poppler-utils # Ubuntu/Debian
brew install poppler # macOS
pdftoppm -r 150 -png document.pdf page # → page-1.png, page-2.png... (numbers are zero-padded when the PDF has 10 or more pages)
pdftoppm -r 150 -jpeg -jpegopt quality=85 document.pdf page
pdftoppm -r 150 -png -f 2 -l 5 document.pdf page # Pages 2-5 only
Python with pdf2image
# pdf2image wraps Poppler, so poppler-utils must be installed (see above)
from pdf2image import convert_from_path
pages = convert_from_path('document.pdf', dpi=150)
for i, page in enumerate(pages, 1):
    page.save(f'page_{i:02d}.jpg', 'JPEG', quality=85)
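`convert_from_path` returns every page as an in-memory image, which can become expensive for long PDFs at high DPI. One way to bound memory is to convert a few pages at a time with the `first_page` and `last_page` parameters. This is a sketch under assumptions: `page_batches` and `convert_in_batches` are hypothetical helper names, the batch size of 10 is arbitrary, and the page count is read with pdf2image's `pdfinfo_from_path`.

```python
from typing import Iterator, Tuple


def page_batches(total_pages: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive, 1-based (first_page, last_page) ranges."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for first in range(1, total_pages + 1, batch_size):
        yield first, min(first + batch_size - 1, total_pages)


def convert_in_batches(pdf_path: str, dpi: int = 150, batch_size: int = 10) -> int:
    """Convert a PDF to JPEG files batch by batch; returns the pages written."""
    # Imported here so page_batches stays usable without pdf2image installed.
    from pdf2image import convert_from_path, pdfinfo_from_path

    total = pdfinfo_from_path(pdf_path)["Pages"]
    written = 0
    for first, last in page_batches(total, batch_size):
        # Only the pages of the current batch are held in memory.
        pages = convert_from_path(pdf_path, dpi=dpi, first_page=first, last_page=last)
        for number, page in enumerate(pages, first):
            page.save(f"page_{number:02d}.jpg", "JPEG", quality=85)
            written += 1
    return written
```

Calling `convert_in_batches('document.pdf')` would write the same `page_01.jpg`, `page_02.jpg`, ... files as the loop above, at the cost of one Poppler invocation per batch.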
What DPI to Choose?
| Use Case | DPI | Notes |
|---|---|---|
| Web thumbnail | 72 | Small and fast |
| HD display | 150 | Good quality/size balance |
| Print / OCR | 300 | Sharp text |
| High-quality archive | 600 | Critical documents |
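To see what these DPI values mean in practice, the output size in pixels can be estimated from the page size: PDF dimensions are expressed in points (1/72 inch), so pixels = points × DPI / 72. The sketch below assumes an A4 page of roughly 595 × 842 points; the function names are illustrative, and real renderers may round slightly differently.

```python
POINTS_PER_INCH = 72      # PDF page sizes are expressed in points (1/72 inch)
A4_POINTS = (595, 842)    # approximate A4 page size in points (assumption)


def pixel_size(width_pt, height_pt, dpi):
    """Return (width, height) in pixels for a page rendered at the given DPI."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


def raw_rgb_bytes(width_px, height_px):
    """Uncompressed size of an 8-bit RGB bitmap: 3 bytes per pixel."""
    return width_px * height_px * 3


if __name__ == "__main__":
    for dpi in (72, 150, 300, 600):
        w, h = pixel_size(*A4_POINTS, dpi)
        size_mb = raw_rgb_bytes(w, h) / 1e6
        print(f"{dpi:>3} DPI: {w} x {h} px, about {size_mb:.1f} MB uncompressed")
```

An A4 page at 150 DPI comes out at about 1240 × 1754 pixels. Doubling the DPI quadruples the pixel count, which is why 600 DPI is best reserved for documents that really need it.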
Conclusion
Use Ghostscript for bulk server-side conversion, pdftoppm for fast command-line work on Linux and macOS, pdf2image from Python, and online tools for occasional use. DPI is the parameter that matters most: 150 for screen, 300 for print and OCR.
Advanced Use Cases
Legal and business workflows: signed contracts are often distributed as PDF/A (the archival profile of PDF), which is designed so that the document renders identically decades later: all fonts are embedded, and JavaScript and dynamic content are not allowed. Electronic signatures (PAdES, advanced eSignatures) are embedded in the PDF and validated cryptographically.
Academic publishing: theses and papers use LaTeX → PDF for precise mathematical formulas; journals require a specific submission format (Word with styles, LaTeX with the journal class, or both). Converting between LaTeX, DOCX and PDF while preserving semantic structure (citations, references, equations) requires specialized tools such as Pandoc.
E-books: Amazon KDP accepts EPUB uploads and converts them to its own Kindle formats (KFX today, MOBI as the legacy format); Apple Books accepts EPUB natively; Kobo, PocketBook and Nook prefer EPUB; Google Play Books accepts PDF and EPUB. Converting a DOCX manuscript to EPUB requires attention to semantic markup (hierarchical headings, lists, blockquotes) so that the reflowable layout works well.
Translation workflows: professional translators prefer XLIFF as an intermediate format because it keeps segmentation, translation memory matches, and context information that a plain-text export loses.
Best Practices and Professional Tips
Style preservation: when converting between editable formats (DOCX↔ODT↔RTF), always use defined styles instead of direct formatting, for example Heading 1/2/3 rather than "16pt Bold". This preserves the outline, so the hierarchical structure survives the conversion.
Font embedding: for cross-platform distribution as PDF, embed all non-standard fonts. Without embedding, readers substitute whatever fonts are available, which can break the layout.
PDF/A for archival: for legal, regulatory or permanent archive documents, export as PDF/A-1b (basic) or PDF/A-2u (Unicode + JPEG2000). Both forbid dynamic content and guarantee self-contained rendering.
Revision history: DOCX and ODT support tracked changes and comments, while PDF carries comments only as annotations. When converting, decide whether to preserve them (DOCX→DOCX or DOCX→ODT) or flatten them (DOCX→PDF typically keeps only the final state).
Accessibility: PDF documents for public distribution should comply with PDF/UA, which means a heading structure, alt text on images, and a defined reading order for screen readers.
Compatibility and Technical Considerations
KaijuConverter uses LibreOffice 7.6 in headless mode as its main engine, with Pandoc 3.x as a fallback for complex markup conversions. We support more than 60 document formats (PDF, DOCX, DOC, ODT, RTF, TXT, HTML, MD, EPUB, MOBI, AZW3, FB2, LaTeX, RST and more).
Format fidelity: we preserve fonts (with a substitution fallback if the original font is not on the system), sizes, colors, complete paragraphs with indentation and line spacing, nested lists, tables with cell merging and complex borders, embedded images with anchor positioning, headers and footers with dynamic fields (page number, date, document title), footnotes and endnotes.
PDF conversion: we guarantee PDF/A-conforming output when required, with correct font embedding and ICC profile embedding for color fidelity.
Limitations: macros in documents (DOCX with VBA) are not executed during conversion, so only static content is preserved. Scanned PDFs (images without OCR) are not editable; they need OCR first (Tesseract, ABBYY) to extract text.
Privacy: TLS 1.3, isolated Docker containers, deletion after 2 hours.
Performance: a typical 20-page document takes 3-8 seconds; large documents with many images may require 15-30 seconds.
Related conversions
Most teams that read this guide convert images in one of these directions: