CSV and TSV Format: The Complete Technical Guide

By Pablo Cirre

Frequently Asked Questions

Excel does not detect UTF-8 on its own when you open a .csv file by double-clicking it. Without a byte order mark it assumes the system default encoding (e.g., Windows-1252 on Western European systems), so accented characters and symbols are misinterpreted. Fix options: (1) add a UTF-8 BOM (the bytes \xEF\xBB\xBF) at the start of the file, which Excel respects; (2) use the Data → From Text/CSV import wizard and manually select UTF-8; (3) save as .xlsx instead of .csv when Excel is the final consumer.
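
As a minimal sketch of option (1), Python's built-in "utf-8-sig" codec writes the BOM automatically. The helper name write_csv_for_excel, the file name, and the sample rows are illustrative assumptions, not part of any library.

```python
import csv
import os
import tempfile


def write_csv_for_excel(path, rows):
    """Write rows as UTF-8 CSV with a leading BOM (hypothetical helper)."""
    # "utf-8-sig" makes Python emit the bytes EF BB BF before the first character.
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)


# Illustrative data written to a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "customers.csv")
write_csv_for_excel(demo_path, [["name", "city"], ["José", "Málaga"]])

# Confirm that the file really starts with the UTF-8 BOM.
with open(demo_path, "rb") as f:
    first_bytes = f.read(3)
print(first_bytes == b"\xef\xbb\xbf")
```

When reading such a file back in Python, opening it with encoding="utf-8-sig" strips the BOM again, so the first header name does not pick up an invisible prefix.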

Send PDF when the document is final and the layout must be preserved exactly (contracts, invoices, certificates). Send DOCX when reviewers need to edit, comment, or track changes. Many teams send both: PDF as the canonical version plus DOCX for editable feedback. PDF/A is the right pick for legal archival (ISO 19005).

CSV uses a comma (or locale-specific character like semicolon) as the field delimiter and requires quoting and escaping for fields containing the delimiter, double quotes, or newlines. TSV uses a tab character, which rarely appears in data, so quoting is almost never needed — producing simpler files. Use TSV for bioinformatics pipelines (GFF3, BED, VCF), Unix command-line tools (cut -f works natively on tabs), and clipboard paste from spreadsheets. Use CSV when the receiving system mandates it or when values may contain tabs.
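
The quoting difference can be seen with Python's standard csv module. This is a small sketch with made-up rows; serialize is a hypothetical helper, and the writer uses its default minimal-quoting behavior.

```python
import csv
import io


def serialize(rows, delimiter):
    """Serialize rows with the given delimiter (hypothetical helper)."""
    buf = io.StringIO()
    # Default quoting is QUOTE_MINIMAL: a field is quoted only when it
    # contains the delimiter, the quote character, or a line break.
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buf.getvalue()


rows = [["id", "note"], ["1", "Smith, John"]]

csv_text = serialize(rows, ",")   # the comma inside the field forces quoting
tsv_text = serialize(rows, "\t")  # no tab inside the field, so no quoting

print(csv_text)
print(tsv_text)
```

Because the TSV output needs no quoting here, a naive line.split("\t") recovers the fields, which is why tools like cut -f handle it directly.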

Round-tripping between similar formats (DOCX ↔ ODT, DOCX → PDF) is generally safe. Round-tripping with format-specific features (Word macros, complex tables, footnotes) often loses fidelity. Embedded fonts survive only if both source and target support font embedding (PDF yes, DOCX yes, plain HTML no). Always preview the result before deleting the original.

Stream the file row by row instead of loading it all at once. In Python, csv.reader is already a lazy iterator, so never call list(reader) on a huge file. With pandas, use chunksize: for chunk in pd.read_csv("big.csv", chunksize=100000): process(chunk). For SQL-style queries that do not require loading the whole file into memory first, DuckDB can query CSV directly: SELECT * FROM read_csv_auto('big.csv') WHERE price > 100. For repeated queries, convert once to Parquet (columnar, typed, and often 5-10× smaller) using pyarrow or DuckDB's COPY ... TO statement.
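
A minimal sketch of the row-by-row approach, using only the standard library. The function name, the column names, and the tiny stand-in file are assumptions for illustration; a real file would simply be much longer.

```python
import csv
import os
import tempfile


def sum_column_streaming(path, column, threshold):
    """Count and sum values of `column` above `threshold`, one row at a time."""
    total = 0.0
    matched = 0
    with open(path, newline="", encoding="utf-8") as f:
        # DictReader is lazy: only the current row is held in memory.
        for row in csv.DictReader(f):
            price = float(row[column])
            if price > threshold:
                total += price
                matched += 1
    return matched, total


# Tiny stand-in for a large file (illustrative data).
demo_path = os.path.join(tempfile.mkdtemp(), "big.csv")
with open(demo_path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(
        [["item", "price"], ["a", "50"], ["b", "150"], ["c", "250.5"]]
    )

matched, total = sum_column_streaming(demo_path, "price", 100)
print(matched, total)
```

Memory use stays flat regardless of file size because nothing accumulates except the two running totals.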

If the PDF contains real text (not scanned images), pdftotext from poppler-utils or PDF to TXT (/convert/pdf-to-txt) works in seconds. If the PDF is a scanned image, you need OCR; Tesseract is the open-source standard. KaijuConverter's PDF tools auto-detect text-vs-image PDFs and route accordingly.

Excel applies automatic type inference when opening CSV files, converting "001" to the integer 1 (dropping the leading zero) and "1/2" to a date (January 2nd in US locales). To prevent this: (1) use the Data → From Text/CSV import wizard and explicitly set the column type to Text; (2) prefix the value with a tab or apostrophe, which only works interactively. Quoting every value in the CSV does not help, because Excel still infers types for quoted numbers, so the import wizard is the real fix for CSV input. Alternatively, generate .xlsx directly with openpyxl or xlsxwriter, where you control cell formats explicitly.
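
Before handing a CSV to Excel users, it can help to know which columns need the Text type in the import wizard. The sketch below flags values resembling the two cases mentioned above. The regular expressions are a rough heuristic of my own, not Excel's actual inference rules, and find_risky_values is a hypothetical helper.

```python
import csv
import io
import re

# Heuristic patterns (assumptions, not Excel's real logic):
LEADING_ZERO = re.compile(r"0\d+")                            # e.g. "001"
DATE_LIKE = re.compile(r"\d{1,2}[/-]\d{1,2}([/-]\d{2,4})?")   # e.g. "1/2"


def find_risky_values(csv_text):
    """Return (line, column, value, reason) for values Excel may reinterpret."""
    risky = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Line numbers assume one record per line (no multi-line quoted fields).
    for line_no, row in enumerate(reader, start=2):
        for name, value in zip(header, row):
            if LEADING_ZERO.fullmatch(value):
                risky.append((line_no, name, value, "leading zero"))
            elif DATE_LIKE.fullmatch(value):
                risky.append((line_no, name, value, "date-like"))
    return risky


sample = "sku,ratio,qty\n001,1/2,7\n42,0.5,10\n"
hits = find_risky_values(sample)
for hit in hits:
    print(hit)
```

Any column that shows up in the result is a candidate for the Text type during import, or for writing to .xlsx with an explicit cell format instead.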

Light edits (annotations, signatures, form fields) are fine in any PDF reader. Structural edits (changing paragraphs, replacing images) are awkward — PDF is a presentation format, not an editing format. The robust workflow is: keep the source DOCX/MD/HTML as the master, regenerate the PDF when changes are needed. Tools that "edit PDFs" reverse-engineer the layout and frequently break it.