ZIP and 7z: Archive Formats — Structure, Compression, and Best Practices
ZIP is the most widely understood archive format: all major desktop operating systems open it without extra software, nearly every file manager supports it, and users recognize it instantly. 7z (the native format of 7-Zip) typically delivers significantly better compression ratios using more modern algorithms. Understanding how both work, how they are laid out internally, and when to use each is essential for anyone handling large file collections, software distribution, or automated backup pipelines.
ZIP: Universal Archive Format
ZIP was created by Phil Katz and published as an open specification in 1989. The design choice that made ZIP uniquely successful: each file inside a ZIP is independently compressed. You can extract any single file without decompressing the entire archive. This random-access capability, combined with the format's open specification, made ZIP the de facto standard for software distribution and file exchange.
ZIP File Structure
A ZIP file has a distinctive "end-to-beginning" design: metadata (the Central Directory) lives at the end of the file, with file data at the beginning:
ZIP File Layout
├── Local File Header 1 (30 + n bytes)
│ ├── Signature: PK\x03\x04
│ ├── Version needed
│ ├── General purpose bit flag
│ ├── Compression method (0=Store, 8=Deflate, 14=LZMA, 93=Zstd)
│ ├── Last modified time/date
│ ├── CRC-32
│ ├── Compressed size
│ ├── Uncompressed size
│ ├── File name length
│ ├── Extra field length
│ └── File name + Extra fields
├── Compressed file data 1
├── [Data Descriptor 1] (if bit 3 set in flags — sizes written after data)
├── Local File Header 2 + data 2
├── ... (repeat for each file)
│
├── Central Directory
│ ├── Central Directory Header 1 (46 + n bytes)
│ │ ├── Signature: PK\x01\x02
│ │ ├── Version made by
│ │ ├── Version needed
│ │ ├── All local header fields repeated...
│ │ ├── Disk number start
│ │ ├── Internal/External attributes
│ │ └── Offset to local header
│ └── Central Directory Headers 2, 3, ...
│
└── End of Central Directory Record (22 bytes minimum)
  ├── Signature: PK\x05\x06
  ├── Disk number
  ├── Disk where the central directory starts
  ├── Number of entries on this disk
  ├── Total entries
  ├── Central directory size
  ├── Offset to central directory
  └── Comment length + optional comment
The Central Directory at the end is why you can list ZIP contents (and extract individual files) without reading the entire archive — just seek to the EOCD, read the Central Directory, find the entry you want, and seek to its Local Header.
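The lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: `list_entries` is a hypothetical helper, it ignores ZIP64 and multi-disk archives, and it assumes the EOCD signature does not also occur inside the archive comment. Because the EOCD can be followed by a comment of up to 65,535 bytes, the sketch searches backwards for the signature rather than assuming a fixed position.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"
CDH_SIG = b"PK\x01\x02"


def list_entries(data: bytes) -> list[tuple[str, int]]:
    """Return (name, local_header_offset) pairs using only the Central Directory."""
    # Search backwards: the EOCD is 22 bytes plus an optional comment.
    eocd_pos = data.rfind(EOCD_SIG, max(0, len(data) - 22 - 65535))
    if eocd_pos < 0:
        raise ValueError("no End of Central Directory record found")
    (_sig, _disk, _cd_disk, _n_disk, total, _cd_size, cd_offset,
     _comment_len) = struct.unpack_from("<4sHHHHIIH", data, eocd_pos)

    entries = []
    pos = cd_offset
    for _ in range(total):
        if data[pos:pos + 4] != CDH_SIG:
            raise ValueError("bad central directory header")
        # Name, extra, and comment lengths sit at byte 28 of the 46-byte header.
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, pos + 28)
        # The offset of the matching Local File Header sits at byte 42.
        (local_offset,) = struct.unpack_from("<I", data, pos + 42)
        name = data[pos + 46:pos + 46 + name_len].decode("utf-8")
        entries.append((name, local_offset))
        pos += 46 + name_len + extra_len + comment_len
    return entries


# Build a small archive in memory, with a comment so the EOCD is not the last 22 bytes.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "alpha")
    zf.writestr("docs/b.txt", "beta")
    zf.comment = b"demo comment"

entries = list_entries(buf.getvalue())
for name, offset in entries:
    print(f"{name:12s} local header at byte {offset}")
```

Note that no file data is touched: only the tail of the archive and the Central Directory are read, which is what makes listing fast even for very large archives.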
ZIP Compression Methods
| Method | Code | Algorithm | Notes |
|---|---|---|---|
| Stored | 0 | None | Files added without compression |
| Deflate | 8 | LZ77 + Huffman | Default for most ZIP tools; the method virtually every ZIP reader supports |
| Deflate64 | 9 | Enhanced Deflate | Better ratio, limited tool support |
| BZIP2 | 12 | Burrows-Wheeler | Better than Deflate, slower |
| LZMA | 14 | Lempel-Ziv-Markov chain | 7-Zip's LZMA algorithm in ZIP container |
| Zstandard | 93 | LZ77 + FSE/Huffman entropy coding | Very fast at good ratios; code 20 is an earlier, deprecated assignment; tool support still uneven |
| XZ | 95 | LZMA2 | Best ratio, very slow |
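Python's `zipfile` module exposes four of these methods as constants whose values are the method codes from the table. The sketch below writes the same sample with each one and reads back the code and compressed size recorded in the archive. The sample text is made up for illustration, and highly repetitive input like this exaggerates the differences compared with real files.

```python
import io
import zipfile

# The constant values match the method codes stored in the ZIP headers.
METHODS = {
    "Stored": zipfile.ZIP_STORED,      # 0
    "Deflate": zipfile.ZIP_DEFLATED,   # 8
    "BZIP2": zipfile.ZIP_BZIP2,        # 12
    "LZMA": zipfile.ZIP_LZMA,          # 14
}

sample = b"The quick brown fox jumps over the lazy dog.\n" * 2000

sizes = {}
for label, method in METHODS.items():
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        zf.writestr("sample.txt", sample)
    with zipfile.ZipFile(buf) as zf:
        info = zf.getinfo("sample.txt")
        sizes[label] = info.compress_size
        print(f"{label:8s} code={info.compress_type:<3d} "
              f"{info.file_size:,} -> {info.compress_size:,} bytes")
```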
ZIP64: Files Larger Than 4 GB
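The original ZIP format stores sizes and offsets in 32-bit fields and entry counts in 16-bit fields, which limits it to 4 GiB per file (and per archive) and 65,535 entries. ZIP64 lifts these limits with 64-bit values carried in extra fields of the Local and Central Directory headers, plus a ZIP64 End of Central Directory record and locator placed before the regular EOCD: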
import zipfile

# Python's zipfile module switches to ZIP64 automatically for large files.
# allowZip64=True is already the default; it is spelled out here for clarity.
with zipfile.ZipFile('large_archive.zip', 'w',
                     compression=zipfile.ZIP_DEFLATED,
                     allowZip64=True) as zf:
    zf.write('10gb_file.bin')
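To see the ZIP64 extra field without creating a multi-gigabyte file, one option is the `force_zip64=True` argument of `ZipFile.open()`, which reserves the ZIP64 field even for a tiny entry. The sketch below parses the resulting Local File Header by hand and looks for extra-field ID `0x0001`, the ID the ZIP specification assigns to ZIP64. It assumes the member was written with no other extra fields, so the ZIP64 field comes first.

```python
import io
import struct
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # force_zip64=True asks zipfile to emit ZIP64 fields regardless of size.
    with zf.open("small.txt", "w", force_zip64=True) as member:
        member.write(b"hello")

data = buf.getvalue()

# Fixed 30-byte portion of the Local File Header at the start of the archive.
(sig, version_needed, flags, method, mtime, mdate, crc,
 comp_size, uncomp_size, name_len, extra_len) = struct.unpack_from(
    "<4sHHHHHIIIHH", data, 0)

# The extra field area follows the file name.
extra = data[30 + name_len:30 + name_len + extra_len]
header_id, field_size = struct.unpack_from("<HH", extra, 0)

print(f"signature       : {sig!r}")
print(f"extra field id  : 0x{header_id:04x}")
print(f"extra field size: {field_size} bytes (two 64-bit sizes)")
```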
Working with ZIP in Python
import zipfile
from pathlib import Path

# ── Creating ZIP archives ───────────────────────────
with zipfile.ZipFile('archive.zip', 'w', compression=zipfile.ZIP_DEFLATED,
                     compresslevel=6) as zf:
    # Add single file
    zf.write('document.pdf', arcname='docs/document.pdf')
    # Add entire directory tree
    root = Path('project/')
    for filepath in root.rglob('*'):
        if filepath.is_file():
            zf.write(filepath, arcname=filepath.relative_to(root.parent).as_posix())
    # Add in-memory data
    zf.writestr('metadata.json', '{"version": "1.0", "created": "2024-01-15"}')
    print(f"Created archive with {len(zf.namelist())} files")

# ── Reading ZIP archives ───────────────────────────
with zipfile.ZipFile('archive.zip', 'r') as zf:
    # List contents
    for info in zf.infolist():
        ratio = (1 - info.compress_size / info.file_size) * 100 if info.file_size else 0
        print(f"{info.filename:40s} {info.file_size:>10,} → {info.compress_size:>10,} ({ratio:.1f}%)")
    # Extract specific file
    zf.extract('docs/document.pdf', path='./extracted/')
    # Extract all
    zf.extractall(path='./output/')
    # Read file contents without extracting to disk
    with zf.open('metadata.json') as f:
        content = f.read().decode('utf-8')
        print(content)

# ── Testing ZIP integrity ───────────────────────
with zipfile.ZipFile('archive.zip', 'r') as zf:
    # testzip() returns the name of the first corrupt member, or None
    first_bad = zf.testzip()
    if first_bad:
        print(f"First bad file: {first_bad}")
    else:
        print("Archive is OK")
# ── Inspecting ZIP structure (low-level) ───────────
import struct

def inspect_zip_eocd(filepath):
    """Read the End of Central Directory (EOCD) record.

    Assumes the archive has no trailing comment, so the EOCD is exactly
    the last 22 bytes of the file.
    """
    with open(filepath, 'rb') as f:
        f.seek(-22, 2)  # 22 bytes before the end of the file (whence=2)
        eocd = f.read(22)
    (sig, disk_num, start_disk, disk_entries, total_entries,
     cd_size, cd_offset, comment_len) = struct.unpack('<4sHHHHIIH', eocd)
    if sig != b'PK\x05\x06':
        raise ValueError("EOCD not found in the last 22 bytes (archive comment present?)")
    print(f"Signature: {sig}")
    print(f"Total entries: {total_entries}")
    print(f"Central directory offset: {cd_offset}")
    print(f"Central directory size: {cd_size} bytes")
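To see what `testzip()` actually catches, the following self-contained sketch builds a small archive in memory, flips one byte of a member's data, and runs the integrity check. It uses `ZIP_STORED` so that the position of the file data is easy to compute (30-byte local header plus the file name); the file names and contents are made up for the demonstration.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("good.txt", "untouched")
    zf.writestr("bad.txt", "payload to damage")
    offset = zf.getinfo("bad.txt").header_offset

raw = bytearray(buf.getvalue())
# Stored data begins right after the 30-byte local header and the file name.
data_start = offset + 30 + len("bad.txt")
raw[data_start] ^= 0xFF  # flip every bit of the first data byte

with zipfile.ZipFile(io.BytesIO(bytes(raw))) as zf:
    # The CRC-32 recorded in the headers no longer matches the data.
    first_bad = zf.testzip()

print(f"First corrupt member: {first_bad}")
```

Because each member carries its own CRC-32, damage is reported per file, and undamaged members such as `good.txt` remain readable.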
7z: The High-Compression Archive Format
7z is the native format of 7-Zip, developed by Igor Pavlov and first released in 1999. The format was designed from the start for high compression: it originally used LZMA and now defaults to LZMA2, both of which typically compress considerably better than ZIP's Deflate.
7z vs ZIP Compression Comparison
| Archive type | Input | Compressed size | Space saved | Speed |
|---|---|---|---|---|
| ZIP (Deflate -6) | 100 MB text | ~38 MB | 62% | Fast |
| ZIP (Deflate -9) | 100 MB text | ~36 MB | 64% | Slow |
| 7z (LZMA2 -5) | 100 MB text | ~22 MB | 78% | Medium |
| 7z (LZMA2 -9) | 100 MB text | ~19 MB | 81% | Slow |
| .tar.xz | 100 MB text | ~19 MB | 81% | Very slow |
| .tar.gz | 100 MB text | ~37 MB | 63% | Fast |
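One reason for the gap is dictionary size: Deflate can only reference the previous 32 KiB of input, while LZMA-family compressors use dictionaries measured in megabytes. The sketch below makes that visible with a deliberately artificial input, a 64 KiB block of pseudo-random bytes repeated twice. It uses Python's `zlib` (Deflate) and `lzma` (XZ container, LZMA2) modules as stand-ins for the ZIP and 7z codecs, so the absolute numbers are not those of real archives.

```python
import lzma
import random
import zlib

# 64 KiB of pseudo-random bytes: incompressible on its own.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(64 * 1024))

# The second copy starts 64 KiB after the first, beyond Deflate's 32 KiB window.
data = block + block

deflate_size = len(zlib.compress(data, 9))
lzma_size = len(lzma.compress(data))

print(f"input   : {len(data):,} bytes")
print(f"Deflate : {deflate_size:,} bytes")
print(f"LZMA2   : {lzma_size:,} bytes")
```

Deflate cannot see the repetition and stores both copies in full, while LZMA2 encodes the second copy as references to the first.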
7-Zip Command Line
# Install 7-Zip
# Ubuntu: sudo apt install p7zip-full
# macOS: brew install sevenzip (command: 7zz or 7z)
# Windows: winget install 7zip.7zip
# Create 7z archive (default LZMA2 compression)
7z a archive.7z files/
# Create with maximum compression (-mx=9) and solid mode (-ms=on)
7z a -mx=9 -ms=on archive.7z files/
# Create password-protected archive (AES-256 encryption)
# A bare -p makes 7z prompt for the password, which keeps it out of shell history
7z a -p -mhe=on archive.7z files/
# -mhe=on encrypts file headers (hides filenames)
# Create ZIP with 7-Zip (often better compression than OS default)
7z a -tzip archive.zip files/
# Extract archive
7z x archive.7z -o./output/
# List archive contents
7z l archive.7z
# Test archive integrity
7z t archive.7z
# Extract a specific file into the current directory
# (e drops stored directory paths; use x to keep them)
7z e archive.7z specific_file.txt
# Split archive into volumes (100 MB each)
7z a -v100m archive.7z large_directory/
# Benchmark compression
7z b
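In an automated backup pipeline these commands are usually assembled by a script. The sketch below is one possible wrapper, not part of 7-Zip: `build_7z_args` and `run_7z` are hypothetical helpers that only use the switches shown above (`a`, `-mx`, `-ms`, `-p`, `-mhe`), and the executable name `7z` is an assumption that may need to be `7zz` on some systems.

```python
import shutil
import subprocess


def build_7z_args(archive, sources, level=5, solid=True,
                  encrypt_headers=False, exe="7z"):
    """Assemble an argument list for `7z a` without invoking a shell."""
    args = [exe, "a", f"-mx={level}", f"-ms={'on' if solid else 'off'}"]
    if encrypt_headers:
        # A bare -p makes 7z prompt for the password interactively.
        args += ["-p", "-mhe=on"]
    args.append(archive)
    args.extend(sources)
    return args


def run_7z(args):
    """Run the command if the executable is installed; raise on failure."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError(f"{args[0]} not found on PATH")
    subprocess.run(args, check=True)


cmd = build_7z_args("backup.7z", ["files/"], level=9)
print(" ".join(cmd))
# run_7z(cmd)  # uncomment to actually create the archive
```

Passing a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.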
7z File Format Internals
Unlike ZIP, 7z does not scatter per-file headers through the archive. A fixed 32-byte signature header at the start of the file points to a single metadata header stored after the packed data:
7z File Structure
├── Signature (6 bytes): 0x37 0x7A 0xBC 0xAF 0x27 0x1C
├── Archive Version (2 bytes): Major.Minor
├── Start Header CRC (4 bytes): CRC of next 20 bytes
├── Next Header Offset (8 bytes)
├── Next Header Size (8 bytes)
├── Next Header CRC (4 bytes)
├── [Packed Streams / Data Blocks]
│ └── Compressed file contents (LZMA2 compressed)
└── Header (at byte 32 + NextHeaderOffset, i.e. counted from the end of the 32-byte start header)
  ├── kHeader property ID
  ├── ArchiveProperties
  ├── MainStreamsInfo → PackInfo, CodersInfo, SubStreamsInfo
  ├── FilesInfo → Names, Sizes, Attributes, Times
  └── [End]
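The fixed 32-byte start of the file is simple enough to decode with `struct`. The sketch below is a minimal illustration under two stated assumptions: that the start header CRC is the standard CRC-32 that `zlib.crc32` computes, and that NextHeaderOffset is counted from the end of the 32-byte start header. `parse_signature_header` is a hypothetical helper, and the header it parses is synthetic with made-up values, not taken from a real archive.

```python
import struct
import zlib

SIGNATURE = b"7z\xbc\xaf\x27\x1c"


def parse_signature_header(header: bytes) -> dict:
    """Decode the first 32 bytes of a 7z file."""
    if len(header) < 32 or header[:6] != SIGNATURE:
        raise ValueError("not a 7z archive")
    major, minor = header[6], header[7]
    (start_crc,) = struct.unpack_from("<I", header, 8)
    tail = header[12:32]  # the 20 bytes covered by the start header CRC
    if zlib.crc32(tail) != start_crc:
        raise ValueError("start header CRC mismatch")
    next_offset, next_size, next_crc = struct.unpack("<QQI", tail)
    return {
        "version": (major, minor),
        "next_header_pos": 32 + next_offset,  # absolute file position
        "next_header_size": next_size,
        "next_header_crc": next_crc,
    }


# Synthetic header for demonstration only (values are made up).
tail = struct.pack("<QQI", 1000, 120, 0xDEADBEEF)
header = SIGNATURE + bytes([0, 4]) + struct.pack("<I", zlib.crc32(tail)) + tail

info = parse_signature_header(header)
print(info)
```

A reader would then seek to `next_header_pos` and read `next_header_size` bytes to obtain the metadata header.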
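The key insight: all metadata (file names, sizes, attributes, timestamps) is stored in a single Header block at the end of the file, and that block can itself be compressed. Because file data is kept separately as packed streams, 7z can also use solid compression: many files are concatenated and compressed as one block, so the LZMA2 dictionary can find matches across file boundaries. For many similar files, solid mode can shrink the archive by a further 30–50%, although the gain depends heavily on the data. The trade-off is random access: extracting one file from a solid block means decompressing everything that precedes it in that block.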
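The effect of solid compression can be approximated with Python's `lzma` module. This is a sketch of the idea, not of the 7z container itself: it compares compressing 200 small, similar records one by one (as ZIP does per file) against compressing their concatenation as a single stream (as a solid block does). The record contents are invented for the demonstration, and the per-stream overhead of the XZ container exaggerates the gap somewhat.

```python
import lzma

# 200 small files that share most of their structure.
files = {
    f"record_{i:03d}.json": (
        '{"schema": "inventory/v1", "warehouse": "north", '
        f'"item_id": {i}, "status": "in_stock", "tags": ["a", "b", "c"]}}\n'
    ).encode("utf-8")
    for i in range(200)
}

raw_size = sum(len(body) for body in files.values())

# Non-solid: every file is its own compressed stream.
separate_size = sum(len(lzma.compress(body)) for body in files.values())

# Solid: one stream over all files, so matches can cross file boundaries.
solid_size = len(lzma.compress(b"".join(files.values())))

print(f"raw      : {raw_size:,} bytes")
print(f"separate : {separate_size:,} bytes")
print(f"solid    : {solid_size:,} bytes")
```

Reading the last record from the solid stream requires decompressing all the records before it, which is the random-access cost of solid mode.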
Archive Format Comparison
| Format | Compression | Ratio | Platform | Encryption | Solid | Split |
|---|---|---|---|---|---|---|
| ZIP | Deflate | Baseline | Universal | AES-256 (WinZip) / ZipCrypto (weak) | No | Yes |
| 7z | LZMA2/PPMd | Best | Universal | AES-256 | Yes | Yes |
| TAR.GZ | Gzip | Good | Unix | None (wrap with GPG) | Yes (whole) | No |
| TAR.BZ2 | Bzip2 | Better | Unix | None | Yes | No |
| TAR.XZ | XZ/LZMA2 | Excellent | Unix | None | Yes | No |
| RAR | Custom | Good-Excellent | Universal | AES-256 | Yes | Yes |
| TAR.ZST | Zstandard | Fast+Good | Unix | None | Yes | No |
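When a script has to handle several of these formats, the first few bytes of a file are more reliable than its extension. The sketch below checks well-known magic numbers; `sniff_format` is a hypothetical helper, and it identifies only the outer layer, so a `.tar.gz` is reported as gzip.

```python
import bz2
import gzip
import io
import lzma
import zipfile

# (magic bytes at offset 0, format name)
MAGIC = [
    (b"PK\x03\x04", "zip"),
    (b"PK\x05\x06", "zip"),            # empty ZIP: only an EOCD record
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"Rar!\x1a\x07", "rar"),
]


def sniff_format(head: bytes) -> str:
    """Guess the container or compression format from leading bytes."""
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    return "unknown"


zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("hello.txt", "hi")

samples = {
    "zip": zip_buf.getvalue(),
    "gzip": gzip.compress(b"hi"),
    "bzip2": bz2.compress(b"hi"),
    "xz": lzma.compress(b"hi"),
}
for expected, blob in samples.items():
    print(f"{expected:6s} -> {sniff_format(blob[:8])}")
```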
When to Use Which Format
Use ZIP when:
- Sharing with Windows/macOS users who need to open without installing software
- Files are already compressed (JPEG, MP4, DOCX) — additional compression gains nothing
- You need random access to individual files without full extraction
- Distributing software packages that auto-extract
Use 7z when:
- Maximum compression ratio is the priority (backups, archival)
- Files are compressible text/code/data
- AES-256 encryption with header encryption is required
- Solid compression across many similar files
Use TAR.GZ / TAR.XZ when:
- Target platform is Linux/macOS
- Preserving Unix file permissions and ownership is important
- The tree contains hard links, symlinks, or device files, all of which TAR preserves
Summary
ZIP's universal compatibility makes it the right choice for file distribution and end-user exchange. 7z's LZMA2 compression and solid mode make it the right choice for archival and backup when file size matters. Understanding each format's internal structure (ZIP's end-anchored Central Directory, and 7z's start header pointing to an end-of-file metadata block with solid data streams) helps you choose the right tool for each job and troubleshoot corrupted archives more effectively.