# Python JSON Parsing Best Practices

A comprehensive guide to JSON parsing in Python, with a focus on performance, security, and scalability.

## Quick Start

### Basic JSON Parsing
```python
import json

# Parse a JSON string
data = json.loads('{"name": "Alice", "age": 30}')

# Parse a JSON file
with open("data.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# Write a JSON file
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)
```

**Key Rule:** Always specify `encoding="utf-8"` when reading or writing files.
## When to Use This Skill

Use this skill when:
- Working with JSON APIs or data interchange
- Optimizing JSON performance in high-throughput applications
- Handling large JSON files (> 100MB)
- Securing applications against JSON injection
- Extracting data from complex nested JSON structures
## Performance: Choose the Right Library

### Library Comparison (10,000 records benchmark)
| Library | Serialize (s) | Deserialize (s) | Best For |
|---|---|---|---|
| orjson | 0.42 | 1.27 | FastAPI, web APIs (3.9x faster) |
| msgspec | 0.49 | 0.93 | Maximum performance (1.7x faster deserialization) |
| json (stdlib) | 1.62 | 1.62 | Universal compatibility |
| ujson | 1.41 | 1.85 | Drop-in replacement (2x faster) |
**Recommendation:**
- Use orjson for FastAPI/web APIs (native support, fastest serialization)
- Use msgspec for data pipelines (fastest overall, typed validation)
- Use json when compatibility is critical
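The libraries above share a similar surface, so a small compatibility shim lets code prefer the fast path while still running anywhere. This is a sketch of a common pattern, not part of orjson's API; note that `orjson.dumps` returns `bytes`, while `json.dumps` returns `str`, so the stdlib fallback encodes for a consistent interface:

```python
import json

# Prefer orjson when available; fall back to the stdlib otherwise.
try:
    import orjson

    def dumps(obj) -> bytes:
        return orjson.dumps(obj)

    def loads(data):
        return orjson.loads(data)

except ImportError:
    def dumps(obj) -> bytes:
        # Minified output to roughly match orjson's compact format
        return json.dumps(obj, separators=(",", ":")).encode("utf-8")

    def loads(data):
        return json.loads(data)

record = {"name": "Alice", "age": 30}
encoded = dumps(record)       # bytes on both code paths
assert loads(encoded) == record
```

Callers then use `dumps`/`loads` without caring which backend is installed.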
### Installation

```bash
# High-performance libraries
pip install orjson msgspec ujson

# Advanced querying
pip install jsonpath-ng jmespath

# Streaming large files
pip install ijson

# Schema validation
pip install jsonschema
```
## Large Files: Streaming Strategies

For files larger than 100 MB, avoid loading the entire document into memory.

### Strategy 1: JSONL (JSON Lines)
Convert large JSON arrays to line-delimited format:
```python
import json

# Stream-process JSONL: one JSON object per line,
# so only a single record is in memory at a time
with open("large.jsonl", "r", encoding="utf-8") as infile, \
     open("output.jsonl", "w", encoding="utf-8") as outfile:
    for line in infile:
        obj = json.loads(line)
        obj["processed"] = True
        outfile.write(json.dumps(obj) + "\n")
```
### Strategy 2: Streaming with ijson

```python
import ijson

# Iterate items under the top-level "products" array
# without loading the whole file into memory
with open("huge.json", "rb") as f:
    for item in ijson.items(f, "products.item"):
        process(item)  # handle one item at a time
```
See: patterns/streaming-large-json.md
## Security: Prevent JSON Injection

**Critical Rules:**

- Always use `json.loads()`, never `eval()`
- Validate input with `jsonschema`
- Sanitize user input before serialization
- Escape special characters (`"` and `\`)
**Vulnerable Code:**

```python
# NEVER DO THIS: user input interpolated into a JSON string
username = request.GET["username"]  # user input: admin", "role": "admin
json_string = f'{{"user":"{username}","role":"user"}}'
# Result: privilege escalation via an injected "role" key
```
**Secure Code:**

```python
import json

# Use json.dumps for serialization; special characters are escaped
data = {"user": username, "role": "user"}
json_string = json.dumps(data)  # properly escaped
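The fix can be verified directly: `json.dumps` escapes the quote characters, so the payload stays a plain string value instead of injecting a new key (the payload below mirrors the attack shown above):

```python
import json

# The same injection payload as in the vulnerable example
username = 'admin", "role": "admin'
json_string = json.dumps({"user": username, "role": "user"})

# Round-tripping shows the quotes were escaped, not interpreted
parsed = json.loads(json_string)
assert parsed["role"] == "user"    # no privilege escalation
assert parsed["user"] == username  # payload preserved as data
```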
See: anti-patterns/security-json-injection.md, anti-patterns/eval-usage.md
## Advanced: JSONPath for Complex Queries

Extract data from nested JSON without complex loops:

```python
# Filter expressions require the extended parser,
# not the base jsonpath_ng.parse
from jsonpath_ng.ext import parse

data = {
    "products": [
        {"name": "Apple", "price": 12.88},
        {"name": "Peach", "price": 27.25},
    ]
}

# Filter by price
query = parse("$.products[?(@.price > 20)].name")
results = [match.value for match in query.find(data)]
# results: ["Peach"]
```
**Key Operators:**

- `$` - root selector
- `..` - recursive descent
- `*` - wildcard
- `[?<predicate>]` - filter (e.g., `[?(@.price > 20)]`)
- `[start:end:step]` - array slicing
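When jsonpath-ng is unavailable, the recursive-descent operator `..` can be emulated with a short generator. This is a plain-Python sketch of the `$..key` query, not part of any library:

```python
def find_all(obj, key):
    """Yield every value stored under `key` at any nesting depth
    (the behavior of the JSONPath query $..key)."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            yield from find_all(v, key)  # descend into nested values
    elif isinstance(obj, list):
        for item in obj:
            yield from find_all(item, key)

data = {
    "store": {
        "book": [{"price": 8.95}, {"price": 12.99}],
        "bicycle": {"price": 19.95},
    }
}
prices = list(find_all(data, "price"))  # [8.95, 12.99, 19.95]
```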
See: patterns/jsonpath-querying.md
## Custom Objects: Serialization

Handle datetime, UUID, Decimal, and custom classes:

```python
import json
from datetime import datetime
from decimal import Decimal
from uuid import UUID

class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, (UUID, Decimal)):
            return str(obj)
        if isinstance(obj, set):
            return list(obj)
        return super().default(obj)

# Usage
data = {"timestamp": datetime.now(), "tags": {"python", "json"}}
json_str = json.dumps(data, cls=CustomEncoder)
```
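Decoding is the mirror image: `json.loads` accepts an `object_hook` that is called for every decoded object and can restore rich types. A minimal round-trip sketch; the `"timestamp"` key name is a convention of this example, not a standard:

```python
import json
from datetime import datetime

def decode_hook(obj):
    # Restore ISO-8601 strings under a known key back to datetime.
    # Real code would typically check the value's format too.
    if "timestamp" in obj:
        obj["timestamp"] = datetime.fromisoformat(obj["timestamp"])
    return obj

json_str = '{"timestamp": "2025-01-15T09:30:00", "event": "login"}'
data = json.loads(json_str, object_hook=decode_hook)
# data["timestamp"] is now a datetime object
```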
See: patterns/custom-object-serialization.md
## Performance Checklist

- Use orjson/msgspec for high-throughput applications
- Specify UTF-8 encoding when reading/writing files
- Use streaming (ijson/JSONL) for files > 100MB
- Minify JSON for production (`separators=(',', ':')`)
- Pretty-print for development (`indent=2`)
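The last two checklist items trade size for readability; a quick comparison of the two `json.dumps` settings:

```python
import json

data = {"name": "Alice", "scores": [95, 87, 92]}

# Production: no whitespace between tokens
minified = json.dumps(data, separators=(",", ":"))

# Development: human-readable, nested structure indented
pretty = json.dumps(data, indent=2)

assert len(minified) < len(pretty)
```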
## Security Checklist

- Never use `eval()` for JSON parsing
- Validate input with `jsonschema`
- Sanitize user input before serialization
- Use `json.dumps()` to prevent injection
- Escape special characters in user data
## Reference Documentation

**Performance:**

- `reference/python-json-parsing-best-practices-2025.md` - Comprehensive research with benchmarks

**Patterns:**

- `patterns/streaming-large-json.md` - ijson and JSONL strategies
- `patterns/custom-object-serialization.md` - Handle datetime, UUID, custom classes
- `patterns/jsonpath-querying.md` - Advanced nested data extraction

**Security:**

- `anti-patterns/security-json-injection.md` - Prevent injection attacks
- `anti-patterns/eval-usage.md` - Why never to use eval()

**Examples:**

- `examples/high-performance-parsing.py` - orjson and msgspec code
- `examples/large-file-streaming.py` - Streaming with ijson
- `examples/secure-validation.py` - jsonschema validation

**Tools:**

- `tools/json-performance-benchmark.py` - Benchmark different libraries