ODT (OpenDocument Text) is the word-processing format of the OpenDocument standard (ISO/IEC 26300) and the native format of LibreOffice Writer and Apache OpenOffice Writer. An ODT file is a ZIP archive containing XML, images, and metadata; the format is fully open and does not depend on any proprietary software.
## ODT vs DOCX: comparison
| Feature | ODT | DOCX |
|---|---|---|
| Standard | ISO/IEC 26300 | ECMA-376 / ISO/IEC 29500 |
| Developed by | OASIS / ISO | Microsoft |
| Support | LibreOffice, OpenOffice, Google Docs, Microsoft Word | Word, LibreOffice, Google Docs |
| Internal format | ODF XML in ZIP | Office Open XML in ZIP (Word saves the "Transitional" variant by default) |
| Historical compatibility | Very stable | Varies across Word versions |
| Government adoption | Very high (EU, Spain) | High |
## Internal ODT structure
```bash
# An ODT is a ZIP archive: list its contents
unzip -l document.odt
# Archive:  document.odt
#   Length  Name                   (abridged: real output also has Date/Time columns)
#  -------  ----
#       39  mimetype
#     2847  content.xml            <- Document text
#     1024  styles.xml             <- Styles and formatting
#      512  meta.xml               <- Metadata (author, dates, statistics)
#      256  settings.xml           <- Application settings (view, zoom...)
#      128  META-INF/manifest.xml  <- List of files in the package

# Extract content.xml and pretty-print it
unzip -p document.odt content.xml | python3 -c "import sys, xml.dom.minidom; print(xml.dom.minidom.parseString(sys.stdin.buffer.read()).toprettyxml())"
```
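The same inspection can be done from Python with the standard-library `zipfile` module. The sketch below lists the package entries and checks the packaging rule the ODF specification sets for the `mimetype` entry: it is expected to be the first entry in the archive and stored without compression, so tools can identify the file type from its first bytes. The function name `inspect_odt` is hypothetical.

```python
import zipfile

ODT_MIMETYPE = 'application/vnd.oasis.opendocument.text'


def inspect_odt(path):
    """List the package entries and check the mimetype packaging rule."""
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        names = [info.filename for info in infos]
        mime = zf.read('mimetype').decode('ascii') if 'mimetype' in names else None
    first = infos[0] if infos else None
    return {
        'entries': names,
        'mimetype': mime,
        # The spec expects 'mimetype' first and stored without compression
        'mimetype_first_and_stored': bool(
            first is not None
            and first.filename == 'mimetype'
            and first.compress_type == zipfile.ZIP_STORED
        ),
        'is_odt': mime == ODT_MIMETYPE,
    }
```

Calling `inspect_odt('document.odt')` on a file saved by LibreOffice should report the MIME type `application/vnd.oasis.opendocument.text`.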
## Install odfpy
```bash
pip install odfpy
```
## Create an ODT document with odfpy
```python
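from odf.opendocument import OpenDocumentText
from odf.style import Style, TextProperties, ParagraphProperties
from odf.text import P, H, Span

# Create document
doc = OpenDocumentText()

# Define paragraph styles (common styles go in doc.styles)
heading_style = Style(name='MyHeading', family='paragraph')
heading_style.addElement(TextProperties(fontsize='18pt', fontweight='bold'))
heading_style.addElement(ParagraphProperties(textalign='center'))
doc.styles.addElement(heading_style)

body_style = Style(name='MyBody', family='paragraph')
body_style.addElement(TextProperties(fontsize='11pt'))
doc.styles.addElement(body_style)

# Character style for inline bold text
bold_style = Style(name='MyBold', family='text')
bold_style.addElement(TextProperties(fontweight='bold'))
doc.styles.addElement(bold_style)

# Add heading: pass the Style object so the reference matches its name
h = H(outlinelevel=1, stylename=heading_style)
h.addText('My First ODT Document')
doc.text.addElement(h)

# Add paragraphs
p1 = P(stylename=body_style)
p1.addText('This document was created with Python and odfpy.')
doc.text.addElement(p1)

p2 = P(stylename=body_style)
p2.addText('odfpy lets you create OpenDocument files ')
p2.addElement(Span(stylename=bold_style, text='without LibreOffice installed'))
p2.addText('.')
doc.text.addElement(p2)

# Save
doc.save('my_document.odt')
print("ODT document created successfully")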
```
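To see what an ODF library ultimately has to write, here is a minimal sketch that builds a bare-bones ODT with only the standard library: a stored `mimetype` entry first, then `content.xml` and `META-INF/manifest.xml`. It is a teaching sketch, not a replacement for odfpy, whose own output also includes parts such as `styles.xml` and `meta.xml`. It assumes that consumers such as LibreOffice tolerate those missing parts, and the names `write_minimal_odt`, `CONTENT_TEMPLATE`, and `MANIFEST` are hypothetical.

```python
import zipfile
from xml.sax.saxutils import escape

MIMETYPE = 'application/vnd.oasis.opendocument.text'

CONTENT_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.2">
  <office:body>
    <office:text>
{paragraphs}
    </office:text>
  </office:body>
</office:document-content>
"""

MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest
    xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
    manifest:version="1.2">
  <manifest:file-entry manifest:full-path="/" manifest:version="1.2"
      manifest:media-type="application/vnd.oasis.opendocument.text"/>
  <manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
</manifest:manifest>
"""


def write_minimal_odt(path, paragraphs):
    """Write a minimal ODT with one <text:p> per string (hypothetical helper)."""
    body = '\n'.join(f'      <text:p>{escape(p)}</text:p>' for p in paragraphs)
    with zipfile.ZipFile(path, 'w') as zf:
        # mimetype goes first and uncompressed so tools can sniff the type
        zf.writestr(zipfile.ZipInfo('mimetype'), MIMETYPE,
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr('content.xml', CONTENT_TEMPLATE.format(paragraphs=body),
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr('META-INF/manifest.xml', MANIFEST,
                    compress_type=zipfile.ZIP_DEFLATED)
```

For example, `write_minimal_odt('minimal.odt', ['Hello', 'A & B'])` escapes the ampersand so that `content.xml` stays well-formed XML.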
## Extract text from an ODT
```python
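from odf.opendocument import load
from odf import teletype
from odf.namespaces import TEXTNS


def extract_odt_text(odt_path):
    """Extract the text of every paragraph and heading in document order,
    including those nested inside lists, tables, and sections."""
    doc = load(odt_path)
    lines = []

    def walk(node):
        for child in node.childNodes:
            if child.nodeType != child.ELEMENT_NODE:
                continue  # skip stray text nodes between block elements
            if child.qname in ((TEXTNS, 'p'), (TEXTNS, 'h')):
                line = teletype.extractText(child).strip()
                if line:
                    lines.append(line)
            else:
                walk(child)  # descend into lists, tables, sections, ...

    walk(doc.text)
    return '\n'.join(lines)


content = extract_odt_text('document.odt')
print(content[:500])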
```
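If installing odfpy is not an option, a rougher sketch can read `content.xml` directly with `zipfile` and `xml.etree.ElementTree`. It walks into lists, tables, and sections, and expands the ODF whitespace markers `text:s` (with its `text:c` repeat count), `text:tab`, and `text:line-break`. The function names are hypothetical, and elements such as footnotes or annotations are simply flattened into the surrounding paragraph.

```python
import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0'
P_TAG, H_TAG = f'{{{TEXT_NS}}}p', f'{{{TEXT_NS}}}h'
S_TAG, TAB_TAG = f'{{{TEXT_NS}}}s', f'{{{TEXT_NS}}}tab'
BR_TAG, C_ATTR = f'{{{TEXT_NS}}}line-break', f'{{{TEXT_NS}}}c'


def _inline_text(elem):
    """Concatenate one element's text, expanding ODF whitespace markers."""
    parts = [elem.text or '']
    for child in elem:
        if child.tag == S_TAG:
            parts.append(' ' * int(child.get(C_ATTR, '1')))
        elif child.tag == TAB_TAG:
            parts.append('\t')
        elif child.tag == BR_TAG:
            parts.append('\n')
        else:
            parts.append(_inline_text(child))  # spans, links, etc.
        parts.append(child.tail or '')
    return ''.join(parts)


def _collect(elem, lines):
    """Append paragraph/heading text; recurse into any other container."""
    for child in elem:
        if child.tag in (P_TAG, H_TAG):
            line = _inline_text(child).strip()
            if line:
                lines.append(line)
        else:
            _collect(child, lines)


def extract_text_stdlib(odt_path):
    """Return the text of every text:p / text:h in content.xml, one per line."""
    with zipfile.ZipFile(odt_path) as zf:
        root = ET.fromstring(zf.read('content.xml'))
    lines = []
    _collect(root, lines)
    return '\n'.join(lines)
```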
## Read metadata
```python
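from odf.opendocument import load
from odf import dc, meta, teletype

doc = load('document.odt')


def first_text(element_class):
    """Return the text of the first matching element in meta.xml, or None."""
    nodes = doc.meta.getElementsByType(element_class)
    return teletype.extractText(nodes[0]) if nodes else None


print(f"Title: {first_text(dc.Title)}")
print(f"Author: {first_text(meta.InitialCreator)}")
print(f"Last modified by: {first_text(dc.Creator)}")
print(f"Created: {first_text(meta.CreationDate)}")

# Word and page counts are attributes of <meta:document-statistic>
stats = doc.meta.getElementsByType(meta.DocumentStatistic)
if stats:
    print(f"Words: {stats[0].getAttribute('wordcount')}")
    print(f"Pages: {stats[0].getAttribute('pagecount')}")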
```
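The same fields can be read without odfpy by parsing `meta.xml` with the standard library. In ODF, `dc:creator` records the person who last modified the document, while `meta:initial-creator` records who created it; word and page counts are attributes of `meta:document-statistic`. This is a sketch: the function name `read_odt_meta` and the returned keys are hypothetical, and every value is returned as the raw string found in the file.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used in meta.xml (ODF and Dublin Core)
NS = {
    'office': 'urn:oasis:names:tc:opendocument:xmlns:office:1.0',
    'meta': 'urn:oasis:names:tc:opendocument:xmlns:meta:1.0',
    'dc': 'http://purl.org/dc/elements/1.1/',
}


def read_odt_meta(odt_path):
    """Return selected metadata from meta.xml as strings (None if absent)."""
    with zipfile.ZipFile(odt_path) as zf:
        if 'meta.xml' not in zf.namelist():
            return {}
        root = ET.fromstring(zf.read('meta.xml'))
    office_meta = root.find('office:meta', NS)
    if office_meta is None:
        return {}

    def child_text(path):
        node = office_meta.find(path, NS)
        return node.text if node is not None else None

    stats = office_meta.find('meta:document-statistic', NS)

    def stat(name):
        # Statistics are attributes in the meta namespace, e.g. meta:word-count
        return stats.get(f"{{{NS['meta']}}}{name}") if stats is not None else None

    return {
        'title': child_text('dc:title'),
        'initial_creator': child_text('meta:initial-creator'),
        'last_modified_by': child_text('dc:creator'),
        'created': child_text('meta:creation-date'),
        'words': stat('word-count'),
        'pages': stat('page-count'),
    }
```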
## Convert ODT with LibreOffice CLI
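LibreOffice can convert ODT to DOCX, PDF, HTML, and other formats in headless mode (no GUI). Depending on the installation, the executable may be named `soffice` instead of `libreoffice`: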
```bash
# ODT → DOCX
libreoffice --headless --convert-to docx document.odt
# ODT → PDF
libreoffice --headless --convert-to pdf document.odt
# ODT → HTML
libreoffice --headless --convert-to html document.odt
# Specify output directory
libreoffice --headless --convert-to pdf --outdir ./pdfs/ document.odt
# Batch convert all ODTs in directory
libreoffice --headless --convert-to pdf *.odt
```
## Batch conversion with Python
```python
import subprocess
from pathlib import Path


def convert_odt_to_pdf(input_dir, output_dir):
    """Convert all ODT files in a folder to PDF using LibreOffice."""
    source = Path(input_dir)
    dest = Path(output_dir)
    dest.mkdir(parents=True, exist_ok=True)
    odt_files = sorted(source.glob('*.odt'))
    print(f"Found: {len(odt_files)} ODT files")
    converted = 0
    for odt in odt_files:
        print(f"  Converting: {odt.name}")
        result = subprocess.run(
            ['libreoffice', '--headless', '--convert-to', 'pdf',
             '--outdir', str(dest), str(odt)],
            capture_output=True, text=True,
        )
        pdf_path = dest / (odt.stem + '.pdf')
        # LibreOffice may exit with status 0 even when nothing was written,
        # so also check that the output file exists before reading its size
        if result.returncode == 0 and pdf_path.exists():
            size_kb = pdf_path.stat().st_size / 1024
            print(f"    OK → {pdf_path.name} ({size_kb:.0f} KB)")
            converted += 1
        else:
            print(f"    ERROR: {(result.stderr or result.stdout)[:200]}")
    print(f"\nDone: {converted}/{len(odt_files)} files converted")


if __name__ == '__main__':
    convert_odt_to_pdf('odt_documents/', 'generated_pdfs/')
```
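One practical caveat: headless LibreOffice uses the user profile of the account running it, and a conversion started while another instance is open (including the desktop application) may fail to produce output. A commonly used workaround is to give each conversion its own profile through LibreOffice's `-env:UserInstallation=` option. The sketch below only locates the executable and builds such a command; the helper names `find_libreoffice` and `build_convert_command` are hypothetical.

```python
import shutil
from pathlib import Path


def find_libreoffice():
    """Return the first LibreOffice executable found on PATH, or None."""
    for name in ('libreoffice', 'soffice'):
        path = shutil.which(name)
        if path:
            return path
    return None


def build_convert_command(executable, src, outdir, fmt, profile_dir):
    """Build a headless conversion command that uses its own user profile."""
    # UserInstallation expects a file:// URL, so resolve to an absolute path first
    profile_uri = Path(profile_dir).resolve().as_uri()
    return [
        executable,
        f'-env:UserInstallation={profile_uri}',
        '--headless',
        '--convert-to', fmt,
        '--outdir', str(outdir),
        str(src),
    ]
```

In use, each parallel worker would pass its own directory from `tempfile.TemporaryDirectory()` as `profile_dir` and hand the resulting list to `subprocess.run`.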
## Convert DOCX to ODT
```python
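import subprocess

# Most reliable: LibreOffice CLI (check=True raises if the process fails)
subprocess.run(
    ['libreoffice', '--headless', '--convert-to', 'odt', 'document.docx'],
    check=True,
)

# Alternative: Pandoc keeps the document structure (headings, lists, tables,
# basic emphasis) but not Word's full page layout and styling.
# pip install pypandoc  (needs the pandoc binary; pypandoc_binary bundles it)
import pypandoc
pypandoc.convert_file('document.docx', 'odt', outputfile='document.odt')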
```
## When to use ODT
- **Long-lived documents**: the format is an openly published ISO standard, so keeping ODT files readable does not depend on a single vendor's future software
- **Government and public sector**: required by many European administrations
- **Avoiding Microsoft Office dependency**: completely free and open ecosystem
- **Interoperability**: compatible with LibreOffice, OpenOffice, Google Docs, Zoho, and more