ODT Format with Python: odfpy, LibreOffice CLI & conversion

ODT (OpenDocument Text) is the ISO-standardized text document format (ISO/IEC 26300) and the default format in LibreOffice and Apache OpenOffice. It is a ZIP file containing XML, images, and metadata. The format is fully open and does not depend on any proprietary software.

## ODT vs DOCX: comparison

| Feature | ODT | DOCX |
|---|---|---|
| Standard | ISO/IEC 26300 | ECMA-376 / ISO/IEC 29500 |
| Developed by | OASIS / ISO | Microsoft |
| Supported by | LibreOffice, OpenOffice, Google Docs | Word, LibreOffice, Google Docs |
| Internal format | OpenDocument XML in ZIP | Office Open XML in ZIP |
| Historical compatibility | Very stable | Varies across Word versions |
| Government adoption | Very high (EU, Spain) | High |

## Internal ODT structure

```bash
# An ODT is a ZIP: inspect its contents
unzip -l document.odt

# Archive:  document.odt
#   Length  Name
# --------  ----
#       39  mimetype
#     2847  content.xml           <- Document text
#     1024  styles.xml            <- Styles and formatting
#      512  meta.xml              <- Metadata (author, date...)
#      256  settings.xml          <- LibreOffice settings
#      128  META-INF/manifest.xml

# Extract content.xml and pretty-print it
unzip -p document.odt content.xml | python3 -c "import sys, xml.dom.minidom; print(xml.dom.minidom.parse(sys.stdin.buffer).toprettyxml())"
```

## Install odfpy

```bash
pip install odfpy
```

## Create an ODT document with odfpy

```python
from odf.opendocument import OpenDocumentText
from odf.style import Style, TextProperties, ParagraphProperties
from odf.text import P, H

# Create document
doc = OpenDocumentText()

# Define styles
heading_style = Style(name='MyHeading', family='paragraph')
heading_style.addElement(TextProperties(fontsize='18pt', fontweight='bold'))
heading_style.addElement(ParagraphProperties(textalign='center'))
doc.styles.addElement(heading_style)

# Add heading, using the style defined above
h = H(outlinelevel=1, stylename=heading_style)
h.addText('My First ODT Document')
doc.text.addElement(h)

# Add paragraphs (no stylename: the default paragraph style applies)
p1 = P()
p1.addText('This document was created with Python and odfpy.')
doc.text.addElement(p1)

p2 = P()
p2.addText('odfpy lets you create OpenDocument files without LibreOffice installed.')
doc.text.addElement(p2)

# Save
doc.save('my_document.odt')
print("ODT document created successfully")
```

## Extract text from an ODT

```python
from odf.opendocument import load
from odf import teletype

def extract_odt_text(odt_path):
    """Extract the text of all top-level paragraphs and headings."""
    doc = load(odt_path)
    lines = []

    # Only direct children of the document body are visited, so text
    # nested inside tables or lists is not included
    for element in doc.text.childNodes:
        if element.qname[1] in ('p', 'h'):
            line = teletype.extractText(element).strip()
            if line:
                lines.append(line)

    return '\n'.join(lines)

content = extract_odt_text('document.odt')
print(content[:500])
```

## Read metadata

In odfpy, `doc.meta` is the raw `<office:meta>` element and has no `title` or `creator` attributes, so the individual metadata elements are looked up by type:

```python
from odf import dc, meta, teletype
from odf.opendocument import load

doc = load('document.odt')

def first_text(element_type):
    """Text of the first element of this type, or None if absent."""
    found = doc.getElementsByType(element_type)
    return teletype.extractText(found[0]) if found else None

print(f"Title:   {first_text(dc.Title)}")
print(f"Author:  {first_text(dc.Creator)}")
print(f"Created: {first_text(meta.CreationDate)}")

# Word and page counts are attributes of <meta:document-statistic>
for stats in doc.getElementsByType(meta.DocumentStatistic):
    print(f"Words: {stats.getAttribute('wordcount')}")
    print(f"Pages: {stats.getAttribute('pagecount')}")
```

## Convert ODT with LibreOffice CLI

LibreOffice can convert ODT to DOCX, PDF, HTML and other formats headlessly (no GUI):

```bash
# ODT → DOCX
libreoffice --headless --convert-to docx document.odt

# ODT → PDF
libreoffice --headless --convert-to pdf document.odt

# ODT → HTML
libreoffice --headless --convert-to html document.odt

# Specify output directory
libreoffice --headless --convert-to pdf --outdir ./pdfs/ document.odt

# Batch convert all ODTs in directory
libreoffice --headless --convert-to pdf *.odt
```

## Batch conversion with Python

```python
import subprocess
from pathlib import Path

def convert_odt_to_pdf(input_dir, output_dir):
    """Convert all ODT files in a folder to PDF using LibreOffice."""
    source = Path(input_dir)
    dest = Path(output_dir)
    dest.mkdir(parents=True, exist_ok=True)

    odt_files = sorted(source.glob('*.odt'))
    print(f"Found: {len(odt_files)} ODT files")

    converted = 0
    for odt in odt_files:
        print(f"  Converting: {odt.name}")
        result = subprocess.run(
            ['libreoffice', '--headless', '--convert-to', 'pdf',
             '--outdir', str(dest), str(odt)],
            capture_output=True, text=True,
        )
        pdf_path = dest / (odt.stem + '.pdf')
        # LibreOffice may exit with 0 even when no file was written,
        # so also check that the PDF exists before reading its size
        if result.returncode == 0 and pdf_path.exists():
            size = pdf_path.stat().st_size / 1024
            print(f"    OK → {pdf_path.name} ({size:.0f} KB)")
            converted += 1
        else:
            print(f"    ERROR: {result.stderr[:200]}")

    print(f"\nDone: {converted} of {len(odt_files)} files converted")

convert_odt_to_pdf('odt_documents/', 'generated_pdfs/')
```

## Convert DOCX to ODT

```python
import subprocess

# Most reliable: LibreOffice CLI
subprocess.run(
    ['libreoffice', '--headless', '--convert-to', 'odt', 'document.docx'],
    check=True,
)

# Alternative: Pandoc (handles basic styles)
# pip install pypandoc  (the pandoc binary must also be installed)
import pypandoc
pypandoc.convert_file('document.docx', 'odt', outputfile='document.odt')
```

## When to use ODT

- **Long-lived documents**: the ODT specification has changed little between versions, so old files tend to open correctly in newer software
- **Government and public sector**: required or recommended by many European administrations
- **Avoiding Microsoft Office dependency**: completely free and open ecosystem
- **Interoperability**: compatible with LibreOffice, OpenOffice, Google Docs, Zoho, and more
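
## Read an ODT with only the standard library

Because an ODT is a ZIP archive with the document text in `content.xml`, the same inspection done with `unzip` can be sketched in Python without odfpy. The example below is a minimal illustration, not a replacement for odfpy: the function names `read_odt_paragraphs` and `write_sample_odt` are hypothetical, the sample archive is a stripped-down file built only for the demo, and special elements such as tabs and line breaks (which odfpy's `teletype` handles) are ignored.

```python
import tempfile
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace URIs defined by the OpenDocument specification
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def read_odt_paragraphs(odt_path):
    """Return the text of every heading and paragraph in content.xml."""
    with zipfile.ZipFile(odt_path) as archive:
        # The 'mimetype' entry identifies the archive as an ODT
        mimetype = archive.read("mimetype").decode("ascii").strip()
        if mimetype != ODT_MIMETYPE:
            raise ValueError(f"Not an ODT file: mimetype is {mimetype!r}")
        root = ET.fromstring(archive.read("content.xml"))

    wanted = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}
    lines = []
    for element in root.iter():
        if element.tag in wanted:
            # itertext() joins text that is split across nested spans
            text = "".join(element.itertext()).strip()
            if text:
                lines.append(text)
    return lines


def write_sample_odt(odt_path):
    """Write a tiny hand-built archive for the demo (hypothetical helper)."""
    content = (
        f'<office:document-content xmlns:office="{OFFICE_NS}" '
        f'xmlns:text="{TEXT_NS}">'
        "<office:body><office:text>"
        '<text:h text:outline-level="1">Title</text:h>'
        "<text:p>Hello <text:span>ODT</text:span></text:p>"
        "</office:text></office:body></office:document-content>"
    )
    with zipfile.ZipFile(odt_path, "w") as archive:
        # The mimetype entry goes first and is stored uncompressed
        archive.writestr("mimetype", ODT_MIMETYPE,
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("content.xml", content,
                         compress_type=zipfile.ZIP_DEFLATED)


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.odt"
    write_sample_odt(sample)
    paragraphs = read_odt_paragraphs(sample)
    print(paragraphs)
```

For real documents, `read_odt_paragraphs('document.odt')` can be called directly on a file saved by LibreOffice. odfpy remains the better choice when styles, tables, or whitespace elements matter.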
