JSON Format: The Complete Technical Guide
JSON (JavaScript Object Notation) is a lightweight, text-based data interchange format. Specified in RFC 8259 (2017) and ECMA-404 (2nd edition, 2017), JSON is derived from JavaScript object literal syntax but is language-independent — virtually every programming language has JSON parsing and serialization libraries. JSON has become the dominant data format for web APIs, configuration files, and data storage, supplanting XML in most new web development contexts.
Historical Context
JSON was formalized by Douglas Crockford around 2001 and popularized through json.org. The format is essentially a subset of JavaScript's object and array literals. Crockford noted that JSON already existed implicitly — JavaScript developers were using eval() to parse JavaScript object literals sent from servers. Formalizing it as a separate format enabled cross-language use. The first major standardization was RFC 4627 (2006). RFC 8259 (2017) supersedes earlier RFCs and is the current standard, aligned with ECMA-404.
JSON Grammar
JSON defines exactly six value types:
1. String
A sequence of Unicode code points enclosed in double quotes. Required escapes:
\"— double quote\\— backslash\/— forward slash (optional)\b— backspace (U+0008)\f— form feed (U+000C)\n— newline (U+000A)\r— carriage return (U+000D)\t— horizontal tab (U+0009)\uXXXX— Unicode code point (4 hex digits)
Surrogate pairs for characters outside BMP: 😀 = 😀 (U+1F600)
Single quotes are not valid in JSON. Control characters (U+0000–U+001F) must be escaped.
2. Number
Integers and floating-point numbers in decimal notation:
- Integer:
-123,0,42 - Decimal:
3.14,-0.5 - Scientific:
1.23e10,2.5E-4,1e+100
No special values: NaN, Infinity, and -Infinity are not valid JSON. Octal (0777) and hexadecimal (0xFF) notation are not valid. JSON numbers have no specified precision limit — parsers may implement them as IEEE 754 double-precision (64-bit float), losing precision for integers > 2^53.
3. Boolean
Lowercase only: true or false. (True, TRUE, False, FALSE are invalid.)
4. Null
null (lowercase only). Represents intentional absence of value.
5. Array
Ordered list of values: [value, value, ...]. Values can be of any JSON type, including mixed types. Trailing commas are not valid in JSON ([1, 2, 3,] is invalid).
6. Object
Unordered set of key/value pairs: {"key": value, "key2": value2}. Keys must be strings (double-quoted). Key order is not guaranteed by the specification. Duplicate keys are technically allowed but their behavior is undefined — parsers may keep the last, first, or throw an error. Trailing commas are not valid.
JSON Structure Example
{
"user": {
"id": 1234,
"name": "María García",
"email": "maria@example.com",
"active": true,
"created_at": "2024-01-15T09:30:00Z",
"score": 98.6,
"tags": ["premium", "verified"],
"address": null,
"preferences": {
"theme": "dark",
"notifications": false,
"language": "es"
}
}
}
JSON Schema
JSON Schema (draft-07, 2019-09, 2020-12) is a vocabulary for annotating and validating JSON documents:
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"name": {
"type": "string",
"minLength": 1,
"maxLength": 100
},
"age": {
"type": "integer",
"minimum": 0,
"maximum": 150
},
"email": {
"type": "string",
"format": "email"
},
"tags": {
"type": "array",
"items": { "type": "string" },
"uniqueItems": true
}
},
"required": ["name", "email"],
"additionalProperties": false
}
Validators: ajv (Node.js, fastest), jsonschema (Python), NetworkX (Java), Newtonsoft.Json (C#).
JSON5, JSONC, and Extensions
Standard JSON has limitations that led to extensions:
JSON5 (json5.org): Allows unquoted keys, single-quoted strings, comments, trailing commas, hex numbers, multiline strings. Compatible superset of JSON. Not an IETF standard.
JSONC (JSON with Comments): JSON + // and /* */ comments. Used in VS Code settings.json, TypeScript tsconfig.json. Not standard JSON.
NDJSON / JSON Lines (Newline-Delimited JSON): One JSON value per line. Used for streaming data and log files:
{"event": "login", "user": 1}
{"event": "purchase", "user": 1, "amount": 49.99}
{"event": "logout", "user": 1}
JSON Merge Patch (RFC 7396): A format for describing changes to a JSON document.
JSON Patch (RFC 6902): Array of operations (add, remove, replace, move, copy, test) for JSON transformation.
Processing JSON: Language Reference
# Python
import json
# Parse JSON string to dict
data = json.loads('{"name": "Alice", "age": 30}')
print(data["name"]) # Alice
# Parse JSON file
with open("data.json") as f:
data = json.load(f)
# Serialize to JSON string
json_str = json.dumps(data, indent=2, ensure_ascii=False)
# Write JSON to file
with open("output.json", "w", encoding="utf-8") as f:
json.dump(data, f, indent=2, ensure_ascii=False)
// JavaScript
const data = JSON.parse('{"name": "Alice", "age": 30}');
console.log(data.name); // Alice
const jsonStr = JSON.stringify(data, null, 2); // pretty-print with 2-space indent
// Replacer function to filter keys
const filtered = JSON.stringify(data, ["name", "age"]); // only include these keys
// Reviver function to transform values
const parsed = JSON.parse(jsonStr, (key, value) => {
if (key === "date") return new Date(value);
return value;
});
# jq — command-line JSON processor
# Extract a field
jq '.user.name' data.json
# Filter array
jq '.users[] | select(.active == true)' data.json
# Transform structure
jq '{name: .user.name, id: .user.id}' data.json
# Pretty-print minified JSON
jq '.' minified.json
# Minify JSON
jq -c '.' pretty.json
# Count array elements
jq '.items | length' data.json
# Extract specific fields from array
jq '.products[] | {name, price}' data.json
# Sort by field
jq '.items | sort_by(.price)' data.json
# Convert JSON to CSV
jq -r '.[] | [.name, .price, .category] | @csv' data.json > output.csv
JSON vs Other Data Formats
| Format | Human readable | Size | Schema | Comments | Types | Best for |
|---|---|---|---|---|---|---|
| JSON | ✓ | Medium | JSON Schema | ✗ | 6 basic | Web APIs, config |
| YAML | ✓ (verbose) | Small | JSON Schema | ✓ | More rich | Config files, DevOps |
| XML | ✓ (verbose) | Largest | XSD | ✓ | Strings only | Enterprise, documents |
| CSV | ✓ (tabular) | Smallest | — | ✗ | Strings only | Tabular data |
| MessagePack | ✗ | Smallest binary | — | ✗ | JSON types + binary | High-performance APIs |
| Protocol Buffers | ✗ | Smallest binary | .proto files | ✗ | Strongly typed | gRPC, microservices |
| CBOR | ✗ | Small binary | — | ✗ | JSON types + binary | IoT, COSE |
| TOML | ✓ | Small | — | ✓ | More rich | Config files |
Security Considerations
JSON injection: Never use eval() to parse JSON — eval('{"a": 1}; maliciousCode()') executes arbitrary code. Always use JSON.parse().
Prototype pollution: In JavaScript, JSON with key __proto__ can pollute Object prototype. Use JSON.parse() which is safe, and validate keys when merging JSON into objects.
Large number precision: JSON numbers > 2^53 (9,007,199,254,740,992) lose precision in IEEE 754 double. For large integers (Twitter IDs, snowflake IDs), transmit as strings. Use BigInt in JavaScript: JSON.parse(str, (k, v) => typeof v === 'number' && v > Number.MAX_SAFE_INTEGER ? BigInt(v) : v).
Circular references: JSON.stringify() throws on circular references. Use a replacer or library like flatted.
Deeply nested JSON: Maliciously crafted deeply nested JSON can cause stack overflow during recursive parsing. Most parsers have depth limits (Node.js: 512, Python: default recursion limit).
Related conversions
Document conversions that follow this topic naturally: