PDF/A: The ISO Standard for Long-Term Document Archival
PDF/A is a family of ISO standards that define a subset of PDF specifically designed for long-term archival. Regular PDF allows features that make a file depend on something outside it: unembedded fonts, linked media, viewer-executed JavaScript, and encryption keys. PDF/A instead mandates self-containment, so everything needed to render the document identically must be embedded within the file itself. Understanding PDF/A's requirements, conformance levels, and use cases is essential for anyone working with legal documents, government records, medical records, or any content requiring decades of reliable access.
Why PDF/A? The Problem with Regular PDF
A standard PDF file can reference:
- External fonts: A file saying "use Times New Roman 12pt" relies on that font being installed on the viewing system
- Device-dependent color: without an embedded ICC profile, the same color values can be rendered differently on different systems
- Encryption: An encrypted PDF cannot be read by a future viewer without the key
- JavaScript: Interactive elements may not function in future viewers
- Linked media: External images, audio, or video files that may not exist in the future
- Proprietary extensions: Vendor-specific PDF features that may not be implemented by future software
PDF/A prohibits or tightly restricts all of these. The goal is that any conforming viewer, today or in the year 2100, can reproduce the document's appearance from the file alone.
PDF/A Conformance Levels and Parts
PDF/A has evolved through multiple ISO standards:
PDF/A-1 (ISO 19005-1:2005)
Based on PDF 1.4.
| Level | Description |
|---|---|
| PDF/A-1a | Conformance Level A: full accessibility (tagged structure, Unicode character mapping, logical reading order) |
| PDF/A-1b | Conformance Level B: visual appearance preservation only; no structure requirements |
PDF/A-2 (ISO 19005-2:2011)
Based on PDF 1.7. Adds:
- JPEG 2000 image compression support
- Transparency (part of PDF since 1.4, but forbidden in PDF/A-1)
- Optional content (layers)
- Embedded files, provided they are themselves PDF/A-1 or PDF/A-2 files
- Improved color management
| Level | Description |
|---|---|
| PDF/A-2a | Full accessibility (tagged, Unicode) |
| PDF/A-2b | Visual preservation only |
| PDF/A-2u | Visual preservation + Unicode character mapping |
PDF/A-3 (ISO 19005-3:2012)
Identical to PDF/A-2, with the same A, B, and U conformance levels, except that it allows embedding files of any type (not just other PDF/A files). This lets source data (XML, CSV, the original Word document) travel alongside the rendered PDF. That capability is critical for e-invoicing standards such as ZUGFeRD and Factur-X.
PDF/A-4 (ISO 19005-4:2020)
Based on PDF 2.0. Replaces A/B/U levels with:
- PDF/A-4: basic requirements
- PDF/A-4e: engineering (allows embedded 3D models in U3D/PRC format)
- PDF/A-4f: allows arbitrary file attachments (like PDF/A-3)
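The parts and levels above amount to a small lookup table. The sketch below is a minimal Python version, intended for sanity-checking a (part, conformance) pair such as one read from a file's XMP. `PDFA_PARTS`, `is_valid_claim`, and `flavour_label` are hypothetical names, not part of any library.

```python
from typing import Optional

# Illustrative lookup table of the PDF/A parts and conformance levels.
# Names here are hypothetical, not a library API.
PDFA_PARTS = {
    1: {"iso": "ISO 19005-1:2005", "base_pdf": "1.4", "levels": {"A", "B"}},
    2: {"iso": "ISO 19005-2:2011", "base_pdf": "1.7", "levels": {"A", "B", "U"}},
    3: {"iso": "ISO 19005-3:2012", "base_pdf": "1.7", "levels": {"A", "B", "U"}},
    # PDF/A-4 has no A/B/U levels; None stands for plain "PDF/A-4".
    4: {"iso": "ISO 19005-4:2020", "base_pdf": "2.0", "levels": {None, "E", "F"}},
}


def is_valid_claim(part: int, conformance: Optional[str]) -> bool:
    """Return True if (part, conformance) names a PDF/A flavour that exists."""
    info = PDFA_PARTS.get(part)
    if info is None:
        return False
    level = conformance.upper() if conformance else None
    return level in info["levels"]


def flavour_label(part: int, conformance: Optional[str]) -> str:
    """Format a flavour in the usual short form, e.g. 'PDF/A-2b' or 'PDF/A-4f'."""
    if not is_valid_claim(part, conformance):
        raise ValueError(f"no such PDF/A flavour: part={part}, conformance={conformance!r}")
    return f"PDF/A-{part}{(conformance or '').lower()}"
```

For example, `flavour_label(3, "b")` gives `"PDF/A-3b"`. By contrast, `is_valid_claim(1, "U")` is False because PDF/A-1 has no U level.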
What PDF/A Prohibits
| Prohibited Feature | Reason |
|---|---|
| Unembedded fonts | Font unavailability makes text unrenderable |
| Encryption | Prevents future access |
| JavaScript | Behavior may change with viewer versions |
| Audio/video content | Playback depends on codecs and players outside the file |
| External content references | Links to external files may break |
| LZW compression | Patent concerns when PDF/A-1 was drafted; the prohibition was kept in later parts |
| Device-dependent color spaces without an ICC profile or OutputIntent | Color rendering would vary by device |
| Transparency (PDF/A-1 only) | Complex rendering requirements; PDF/A-2 and later permit it |
| Embedded OpenType font programs (PDF/A-1 only) | Not supported by PDF 1.4, on which PDF/A-1 is based; PDF/A-2 permits them |
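Several of the prohibited features leave recognizable PDF name tokens in the file, such as `/Encrypt`, `/JavaScript`, and `/LZWDecode`. The sketch below is a rough triage aid, not a validator:

- It only sees tokens that appear in the raw bytes, so anything inside a compressed object stream is missed.
- A hit means "look closer", not "non-conforming".
- `SUSPECT_NAMES`, `scan_pdf_bytes`, and `scan_pdf_file` are hypothetical names, and the token list is illustrative rather than complete.

```python
import re
from pathlib import Path

# Heuristic list of PDF names associated with features PDF/A prohibits.
# A rough triage aid only; it is NOT a conformance check.
SUSPECT_NAMES = {
    b"Encrypt": "encryption",
    b"JavaScript": "JavaScript",
    b"JS": "JavaScript",
    b"Launch": "launch action (external program or file)",
    b"LZWDecode": "LZW compression",
    b"Sound": "audio/video",
    b"Movie": "audio/video",
    b"Screen": "audio/video",
}

# A PDF name ends at whitespace or at one of the delimiter characters.
_NAME_END = rb"(?=[\x00\t\n\f\r ()<>\[\]{}/%]|\Z)"


def scan_pdf_bytes(data: bytes) -> dict[str, set[str]]:
    """Map feature category -> suspect names found in the raw (uncompressed) bytes."""
    hits: dict[str, set[str]] = {}
    for name, category in SUSPECT_NAMES.items():
        pattern = rb"/" + re.escape(name) + _NAME_END
        if re.search(pattern, data):
            hits.setdefault(category, set()).add(name.decode("ascii"))
    return hits


def scan_pdf_file(path: str) -> dict[str, set[str]]:
    """Convenience wrapper that reads a file from disk."""
    return scan_pdf_bytes(Path(path).read_bytes())
```

Calling `scan_pdf_file("input.pdf")` returns an empty dict when nothing suspicious is visible. Treat any non-empty result as a reason to run a real validator, not as a verdict.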
What PDF/A Requires
| Required Feature | Purpose |
|---|---|
| All fonts embedded | Identical text rendering |
| Color spaces with ICC profiles (or DeviceGray/DeviceRGB/DeviceCMYK with OutputIntent) | Consistent color reproduction |
| XMP metadata | Structured, machine-readable document properties |
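| Valid PDF file structure (header, cross-reference data, trailer) | Lets any conforming reader parse the file reliably |
| Document information dictionary consistent with XMP (if present) | Title, author, subject, and keywords must not contradict their XMP equivalents |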
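Font embedding is the requirement most often checked by hand. In a PDF, a simple font's FontDescriptor carries its embedded program under `/FontFile`, `/FontFile2`, or `/FontFile3`.

The sketch below assumes fonts have already been read into plain Python dicts keyed by PDF names. That representation is hypothetical, for example the result of converting objects from a PDF library. The check only looks for the presence of a font program; PDF/A also requires that every glyph actually used be present, which this does not verify. `font_is_embedded` and `unembedded_fonts` are hypothetical helpers.

```python
from typing import Any

# FontDescriptor keys that hold an embedded font program, per the PDF specification.
FONT_FILE_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")


def font_is_embedded(font: dict[str, Any]) -> bool:
    """Check whether a font dictionary (modelled as a plain dict) embeds its program."""
    subtype = font.get("/Subtype")
    if subtype == "/Type3":
        # Type 3 glyphs are content streams inside the PDF itself.
        return True
    if subtype == "/Type0":
        # Composite fonts: the descriptor belongs to the descendant CIDFont.
        descendants = font.get("/DescendantFonts", [])
        return bool(descendants) and all(font_is_embedded(d) for d in descendants)
    descriptor = font.get("/FontDescriptor")
    if not descriptor:
        # e.g. a bare standard-14 font such as Helvetica: nothing is embedded.
        return False
    return any(key in descriptor for key in FONT_FILE_KEYS)


def unembedded_fonts(fonts: dict[str, dict[str, Any]]) -> list[str]:
    """Return the resource names (e.g. '/F1') whose fonts are not embedded."""
    return sorted(name for name, font in fonts.items() if not font_is_embedded(font))
```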
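The XMP metadata requirement is particularly important. For PDF/A-1 through PDF/A-3, the XMP must contain both a pdfaid:part and a pdfaid:conformance property. PDF/A-4 instead uses pdfaid:part together with pdfaid:rev, and uses pdfaid:conformance only for PDF/A-4e and PDF/A-4f.
```xml
<!-- excerpt: in a complete packet this element sits inside <x:xmpmeta><rdf:RDF> -->
<rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
  <pdfaid:part>2</pdfaid:part>
  <pdfaid:conformance>B</pdfaid:conformance>
</rdf:Description>
```
Without these properties, a file does not identify itself as PDF/A. Validators will then report it as non-conforming even if it meets every other requirement.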
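To check the identification programmatically, the sketch below parses an XMP packet with the standard library and returns the pdfaid:part and pdfaid:conformance values. It assumes you already have the packet as text; extracting it from a PDF is a separate step. It accepts both the element form shown above and the equivalent RDF attribute form. `read_pdfa_id` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def read_pdfa_id(xmp: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (part, conformance) from an XMP packet; None where a property is absent."""
    root = ET.fromstring(xmp)
    found: dict = {"part": None, "conformance": None}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for prop in found:
            qname = f"{{{PDFAID_NS}}}{prop}"
            elem = desc.find(qname)
            if elem is not None and elem.text and elem.text.strip():
                # Element form: <pdfaid:part>2</pdfaid:part>
                found[prop] = elem.text.strip()
            elif desc.get(qname):
                # Attribute form: <rdf:Description pdfaid:part="2" ... />
                found[prop] = desc.get(qname)
    return found["part"], found["conformance"]
```

A result such as `("2", "B")` can be cross-checked against the expected flavour. `(None, None)` means the packet makes no PDF/A claim at all.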
Creating PDF/A Files
From Microsoft Office
Word/Excel/PowerPoint:
File → Save As → PDF
Options → ISO 19005-1 compliant (PDF/A) ✓
LibreOffice
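```bash
# Export as PDF/A-1b (value 2 selects PDF/A-2b, value 3 selects PDF/A-3b)
# The filter options are JSON, so keys and values are double-quoted inside single quotes
libreoffice --headless --convert-to \
  'pdf:writer_pdf_Export:{"SelectPdfVersion":{"type":"long","value":"1"}}' \
  document.docx
```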
Or via GUI: File → Export as PDF → General tab → PDF/A-1b ✓
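For batch jobs, the same export can be driven from Python. The sketch below assumes a LibreOffice build that accepts JSON-style filter options on the command line, and a Writer document as input. The helper names and the flavour-to-code table are illustrative, not a LibreOffice API.

```python
import json
import shutil
import subprocess
from pathlib import Path

# SelectPdfVersion values for LibreOffice's PDF export filter (1/2/3 select PDF/A-1b/2b/3b).
PDFA_VERSION_CODES = {"1b": 1, "2b": 2, "3b": 3}


def libreoffice_pdfa_command(source: str, flavour: str = "2b", outdir: str = ".",
                             binary: str = "libreoffice") -> list[str]:
    """Build (but do not run) a headless Writer-to-PDF/A export command."""
    if flavour not in PDFA_VERSION_CODES:
        raise ValueError(f"unsupported flavour {flavour!r}; expected one of {sorted(PDFA_VERSION_CODES)}")
    options = {"SelectPdfVersion": {"type": "long", "value": str(PDFA_VERSION_CODES[flavour])}}
    # writer_pdf_Export is the Writer filter; Calc and Impress documents use other filter names.
    target = "pdf:writer_pdf_Export:" + json.dumps(options, separators=(",", ":"))
    # Passing a list (no shell) means the JSON needs no extra quoting.
    return [binary, "--headless", "--convert-to", target, "--outdir", outdir, source]


def export_pdfa(source: str, flavour: str = "2b", outdir: str = ".") -> Path:
    """Run the export; the executable may be installed as 'libreoffice' or 'soffice'."""
    binary = shutil.which("libreoffice") or shutil.which("soffice")
    if binary is None:
        raise FileNotFoundError("LibreOffice executable not found on PATH")
    subprocess.run(libreoffice_pdfa_command(source, flavour, outdir, binary), check=True)
    # LibreOffice names the output after the input file's stem.
    return Path(outdir) / (Path(source).stem + ".pdf")
```

`export_pdfa("report.docx", "3b")` would write report.pdf into the current directory. That file still needs validating.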
Ghostscript (convert an existing PDF to PDF/A)
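```bash
# PDFA_def.ps ships in Ghostscript's lib directory; its path varies by installation,
# and its ICCProfile entry must point to an ICC profile that exists.
# -dPDFACompatibilityPolicy=1 drops features PDF/A forbids (with a warning) instead of
# keeping them and writing a file that is not PDF/A conformant (the default, policy 0).

# Convert to PDF/A-2b
gs -dPDFA=2 -dBATCH -dNOPAUSE \
  -sProcessColorModel=DeviceRGB \
  -sColorConversionStrategy=RGB \
  -sDEVICE=pdfwrite \
  -dPDFACompatibilityPolicy=1 \
  -sOutputFile=output_pdfa2b.pdf \
  /usr/share/ghostscript/PDFA_def.ps \
  input.pdf

# Convert to PDF/A-1b
gs -dPDFA=1 -dBATCH -dNOPAUSE \
  -sProcessColorModel=DeviceRGB \
  -sColorConversionStrategy=RGB \
  -sDEVICE=pdfwrite \
  -dPDFACompatibilityPolicy=1 \
  -sOutputFile=output_pdfa1b.pdf \
  /usr/share/ghostscript/PDFA_def.ps \
  input.pdf
```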
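For batch conversion, the Ghostscript invocation can be wrapped in Python. This sketch makes the following assumptions:

- `gs` is on PATH.
- You pass the real location of PDFA_def.ps; the default `"PDFA_def.ps"` only works if the file is in the working directory.
- Only `-dPDFA` values 1, 2, and 3 are accepted.

`ghostscript_pdfa_command` and `convert` are hypothetical helpers.

```python
import shutil
import subprocess


def ghostscript_pdfa_command(src: str, dst: str, part: int = 2,
                             pdfa_def: str = "PDFA_def.ps", gs: str = "gs") -> list[str]:
    """Build a Ghostscript pdfwrite command that targets PDF/A-<part>b."""
    if part not in (1, 2, 3):
        raise ValueError("this sketch only handles -dPDFA=1, 2 or 3")
    return [
        gs, f"-dPDFA={part}", "-dBATCH", "-dNOPAUSE",
        "-sProcessColorModel=DeviceRGB",
        "-sColorConversionStrategy=RGB",
        "-sDEVICE=pdfwrite",
        # 1 = drop features PDF/A forbids (with a warning) rather than keep them
        "-dPDFACompatibilityPolicy=1",
        f"-sOutputFile={dst}",
        pdfa_def,  # the PDFA_def.ps prologue is given before the input file
        src,
    ]


def convert(src: str, dst: str, part: int = 2, pdfa_def: str = "PDFA_def.ps") -> None:
    """Run Ghostscript; raises CalledProcessError if gs exits with an error."""
    gs = shutil.which("gs")
    if gs is None:
        raise FileNotFoundError("Ghostscript ('gs') not found on PATH")
    subprocess.run(ghostscript_pdfa_command(src, dst, part, pdfa_def, gs), check=True)
```

Building the argument list separately from running it keeps the command easy to log and to test without Ghostscript installed.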
Python: pikepdf
```python
import pikepdf

# pikepdf only writes the PDF/A identification. The content must already satisfy
# PDF/A (embedded fonts, an OutputIntent, no prohibited features), so validate
# the result afterwards.
with pikepdf.open("input.pdf") as pdf:
    # Add XMP metadata that identifies the file as PDF/A-2b
    with pdf.open_metadata() as meta:
        meta["pdfaid:part"] = "2"
        meta["pdfaid:conformance"] = "B"
    pdf.save("output_pdfa.pdf")
```
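Because pikepdf only writes the claim, a quick follow-up check is whether the saved file carries the pdfaid namespace at all. This standard-library sketch scans the raw bytes for an x:xmpmeta element, with two assumptions and caveats:

- The metadata stream must be stored uncompressed, which is common but not guaranteed.
- It takes the first packet it finds. In a file with page-level or image-level XMP, that may not be the document-level packet.

`extract_xmp` and `claims_pdfa` are hypothetical names.

```python
import re
from typing import Optional

PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"
# Non-greedy and DOTALL: from an x:xmpmeta start tag to the first closing tag.
_XMP_RE = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)


def extract_xmp(data: bytes) -> Optional[str]:
    """Return the first XMP packet stored uncompressed in raw PDF bytes, if any."""
    match = _XMP_RE.search(data)
    return match.group(0).decode("utf-8", errors="replace") if match else None


def claims_pdfa(data: bytes) -> bool:
    """True if that packet declares the pdfaid namespace (a claim, not proof of conformance)."""
    xmp = extract_xmp(data)
    return xmp is not None and PDFAID_NS in xmp
```

For example, call `claims_pdfa(pathlib.Path("output_pdfa.pdf").read_bytes())`. A True result only says that the file claims PDF/A; conformance still has to be established by a validator.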
Validating PDF/A Compliance
Exporting a file "as PDF/A" is not the same as producing a valid PDF/A file: tools can produce non-compliant output while claiming compliance. Always validate:
veraPDF (open-source reference validator)
```bash
# Validate against PDF/A-2b
verapdf --flavour 2b document.pdf

# Plain-text report that includes details of failed checks
verapdf --format text --verbose document.pdf > report.txt

# Batch-validate every PDF in the current directory (shell glob)
verapdf --flavour 1b *.pdf
```
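To gate a pipeline on veraPDF results, its XML report can be parsed instead of read by eye. This sketch assumes `verapdf` is on PATH. It also assumes the report layout that recent veraPDF releases emit: job elements containing item/name and a validationReport with an isCompliant attribute. Check those element names against your version's output. `parse_verapdf_report` and `validate` are hypothetical helpers.

```python
import subprocess
import xml.etree.ElementTree as ET


def parse_verapdf_report(xml_text: str) -> dict[str, bool]:
    """Map each validated file name to True/False using validationReport/@isCompliant."""
    root = ET.fromstring(xml_text)
    results: dict[str, bool] = {}
    for job in root.iter("job"):
        name = job.findtext("item/name", default="?")
        report = job.find("validationReport")
        # No validationReport (e.g. the file could not be parsed): treat as failing.
        results[name] = report is not None and report.get("isCompliant") == "true"
    return results


def validate(paths: list[str], flavour: str = "2b") -> dict[str, bool]:
    """Run the veraPDF CLI and parse its XML report from stdout."""
    proc = subprocess.run(
        ["verapdf", "--flavour", flavour, "--format", "xml", *paths],
        capture_output=True, text=True,
    )
    # The exit code is deliberately not used as the verdict; the report is parsed instead.
    return parse_verapdf_report(proc.stdout)
```

Treating a missing report as a failure is a conservative choice: a file that veraPDF cannot even open should not pass an archival gate.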
Apache PDFBox Preflight (validates PDF/A-1b only)
```bash
# Preflight is distributed as its own jar, separate from pdfbox-app
java -jar preflight-app-3.x.jar document.pdf
```
Common validation failures:
- Missing font embedding (most common)
- Missing ICC color profile or OutputIntent
- Missing XMP metadata
- Invalid XMP structure
- Use of prohibited features (JavaScript, encryption)
- Incorrect PDF structure (corrupt cross-reference table)
PDF/A Use Cases and Industry Requirements
| Industry | Standard | Conformance Level | Notes |
|---|---|---|---|
| Government / legal | PDF/A-1 or 2 | A or B | Many jurisdictions require PDF/A for official submissions |
| Healthcare | PDF/A-2 | B | Common choice for medical records archival; HIPAA itself does not mandate a file format |
| E-invoicing (EU) | PDF/A-3 | B | ZUGFeRD, Factur-X embed XML invoice data |
| Patent offices | PDF/A-1 | B | USPTO, EPO accept PDF/A-1b |
| Banking/finance | PDF/A-1 or 2 | B | Regulatory document retention |
| Museums/libraries | PDF/A-2 | A | Highest accessibility for cultural heritage |
| Engineering | PDF/A-4e | n/a (PDF/A-4 has no A/B/U levels) | Permits embedded 3D content (U3D/PRC) |
PDF/A vs. Regular PDF vs. PDF/X vs. PDF/UA vs. PDF/E
| Standard | Purpose | Key Requirement |
|---|---|---|
| PDF/A | Long-term archival | Self-contained, no external dependencies |
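| PDF/X | Print production | Embedded fonts and a defined output intent; PDF/X-1a limits color to CMYK and spot colors, and PDF/X-1a/-3 forbid transparency (PDF/X-4 allows it) |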
| PDF/UA | Universal accessibility | Full tagging, Alt text, logical reading order |
| PDF/E | Engineering | 3D content support |
| Regular PDF | General use | No constraints beyond the PDF specification itself |
Summary
PDF/A answers a specific question: "Will this document render identically in 50 years?" It mandates font embedding, ICC-based color, and XMP identification, and it prohibits encryption and external dependencies. The result is a self-contained archival artifact. For organizations with long-term retention obligations, such as government, legal, healthcare, financial, and cultural heritage institutions, PDF/A is frequently a compliance requirement rather than an option. Always validate PDF/A files with veraPDF after creation, since some tools produce non-compliant output despite advertising PDF/A export.