## What Is XML?
**XML** (eXtensible Markup Language) is an extensible markup language defined by the W3C in 1998. Unlike HTML, XML lets you create custom tags to describe any data structure.
Standard in: SOAP services, configurations (Maven, Spring, Android), .docx/.xlsx internals, RSS/Atom feeds, industrial data (HL7, FIXML).
## XML Syntax
```xml
Laptop Pro X
1299.99
characters & symbols]]>
true
```
**Rules:** one root element, tags must close, attributes in double quotes, case-sensitive.
**Special chars:** `&` (&), `<` (<), `>` (>), `"` (").
## XPath Navigation
```python
import lxml.etree as ET
xml = b"""
Apple 1.20
Orange 0.80
"""
tree = ET.fromstring(xml)
expensive = tree.xpath("//product[price > 1]")
for p in expensive:
print(p.find("name").text)
prod = tree.xpath("//product[@id='1']")[0]
print(prod.find("name").text) # Apple
```
## Process XML in Python
```python
import xml.etree.ElementTree as ET
tree = ET.parse("catalog.xml")
root = tree.getroot()
for product in root.findall("product"):
name = product.find("name").text
price = float(product.find("price").text)
print(f"{name}: USD {price}")
# 10% price increase
for price_el in root.iter("price"):
price_el.text = str(round(float(price_el.text) * 1.1, 2))
# Add product
new = ET.SubElement(root, "product")
ET.SubElement(new, "name").text = "Grape"
ET.SubElement(new, "price").text = "2.00"
ET.indent(tree, space=" ")
tree.write("catalog_updated.xml", encoding="UTF-8", xml_declaration=True)
```
## Convert XML to JSON
```python
import xmltodict, json
with open("catalog.xml", "rb") as f:
doc = xmltodict.parse(f)
print(json.dumps(doc, ensure_ascii=False, indent=2))
```
## Validate with XSD
```python
from lxml import etree
schema = etree.XMLSchema(etree.parse("catalog.xsd"))
doc = etree.parse("catalog.xml")
if schema.validate(doc):
print("XML is valid")
else:
for err in schema.error_log:
print(f"Error: {err.message}")
```
## XSLT: Transform XML to HTML
```python
from lxml import etree
dom = etree.parse("catalog.xml")
xslt = etree.parse("catalog-html.xsl")
html = etree.XSLT(xslt)(dom)
with open("catalog.html", "wb") as f:
f.write(bytes(html))
```
## XML vs JSON
| Situation | XML | JSON |
|-----------|-----|------|
| Modern REST APIs | β | β
|
| SOAP/legacy services | β
| β |
| RSS/Atom feeds | β
| β |
| .docx/.xlsx internals | β
| β |
| Strict validation | XSD | JSON Schema |
## Conclusion
**XML remains essential** in legacy systems, SOAP, and office documents. For new REST projects, JSON is preferable. Mastering XPath and Python's `lxml` covers 95% of XML use cases.
Guide