---
name: python-json-parsing
description: >
  Python JSON parsing best practices covering performance optimization
  (orjson/msgspec), handling large files (streaming/JSONL), security
  (injection prevention), and advanced querying (JSONPath/JMESPath). Use when
  working with JSON data, parsing APIs, handling large JSON files, or
  optimizing JSON performance.
---

# Python JSON Parsing Best Practices

A comprehensive guide to JSON parsing in Python, with a focus on performance, security, and scalability.

## Quick Start

### Basic JSON Parsing

```python
import json

# Parse a JSON string
data = json.loads('{"name": "Alice", "age": 30}')

# Parse a JSON file
with open("data.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# Write a JSON file
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)
```

**Key rule:** always specify `encoding="utf-8"` when reading or writing files.

## When to Use This Skill

Use this skill when:

- Working with JSON APIs or data interchange
- Optimizing JSON performance in high-throughput applications
- Handling large JSON files (> 100 MB)
- Securing applications against JSON injection
- Extracting data from complex nested JSON structures

## Performance: Choose the Right Library

### Library Comparison (10,000-record benchmark)

| Library | Serialize (s) | Deserialize (s) | Best For |
| --- | --- | --- | --- |
| orjson | 0.42 | 1.27 | FastAPI, web APIs (3.9x faster) |
| msgspec | 0.49 | 0.93 | Maximum performance (1.7x faster deserialization) |
| json (stdlib) | 1.62 | 1.62 | Universal compatibility |
| ujson | 1.41 | 1.85 | Drop-in replacement (2x faster) |

**Recommendations:**

- Use orjson for FastAPI/web APIs (native support, fastest serialization)
- Use msgspec for data pipelines (fastest overall, typed validation)
- Use json when compatibility is critical
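A minimal sketch of the two recommended libraries side by side; the payload and the `User` struct are illustrative, not taken from the benchmark above:

```python
import orjson
import msgspec

payload = b'{"name": "Alice", "age": 30}'

# orjson: near drop-in replacement, but dumps() returns bytes, not str
data = orjson.loads(payload)
raw = orjson.dumps(data)  # b'{"name":"Alice","age":30}'

# msgspec: decode straight into a typed Struct, validating while parsing
class User(msgspec.Struct):
    name: str
    age: int

user = msgspec.json.decode(payload, type=User)  # User(name='Alice', age=30)
raw2 = msgspec.json.encode(user)  # also bytes
```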

## Installation

```bash
# High-performance libraries
pip install orjson msgspec ujson

# Advanced querying
pip install jsonpath-ng jmespath

# Streaming large files
pip install ijson

# Schema validation
pip install jsonschema
```

## Large Files: Streaming Strategies

For files larger than 100 MB, avoid loading the entire document into memory.

### Strategy 1: JSONL (JSON Lines)

Convert large JSON arrays to line-delimited format and process them one record at a time:

```python
import json

# Stream-process JSONL: one JSON object per line, constant memory
with open("large.jsonl", "r", encoding="utf-8") as infile, \
     open("output.jsonl", "w", encoding="utf-8") as outfile:
    for line in infile:
        obj = json.loads(line)
        obj["processed"] = True
        outfile.write(json.dumps(obj) + "\n")
```
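The conversion step itself can also be streamed. A sketch using ijson, assuming a top-level JSON array and ijson >= 3.1 (`use_float=True` avoids the `Decimal` values that `json.dumps` cannot serialize):

```python
import json
import ijson

# One-time conversion: stream a top-level JSON array into JSONL.
# "item" is ijson's prefix for each element of a top-level array.
with open("large.json", "rb") as src, \
     open("large.jsonl", "w", encoding="utf-8") as dst:
    for obj in ijson.items(src, "item", use_float=True):
        dst.write(json.dumps(obj) + "\n")
```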

### Strategy 2: Streaming with ijson

```python
import ijson

# Process large JSON without loading it into memory;
# "products.item" yields each element of the "products" array
with open("huge.json", "rb") as f:
    for item in ijson.items(f, "products.item"):
        process(item)  # handle one item at a time
```

See: `patterns/streaming-large-json.md`

## Security: Prevent JSON Injection

**Critical rules:**

1. Always use `json.loads()`, never `eval()`
2. Validate input with `jsonschema`
3. Sanitize user input before serialization
4. Escape special characters (`"` and `\`)

**Vulnerable code:**

```python
# NEVER DO THIS
username = request.GET["username"]  # user input: admin", "role": "admin
json_string = f'{{"user":"{username}","role":"user"}}'
# Result: privilege escalation
```

**Secure code:**

```python
# Use json.dumps for serialization
data = {"user": username, "role": "user"}
json_string = json.dumps(data)  # properly escaped
```

See: `anti-patterns/security-json-injection.md`, `anti-patterns/eval-usage.md`
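Rules 1 and 2 combine naturally. A minimal sketch using jsonschema; the schema and field names here are illustrative:

```python
import json

from jsonschema import ValidationError, validate

USER_SCHEMA = {
    "type": "object",
    "properties": {
        "user": {"type": "string", "maxLength": 64},
        "role": {"type": "string", "enum": ["user", "admin"]},
    },
    "required": ["user", "role"],
    "additionalProperties": False,
}

def parse_user(raw: str) -> dict:
    obj = json.loads(raw)  # parse first; never eval()
    validate(instance=obj, schema=USER_SCHEMA)  # raises ValidationError on bad input
    return obj
```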

## Advanced: JSONPath for Complex Queries

Extract data from nested JSON without complex loops:

```python
from jsonpath_ng.ext import parse  # filter syntax requires the extended parser

data = {
    "products": [
        {"name": "Apple", "price": 12.88},
        {"name": "Peach", "price": 27.25},
    ]
}

# Filter by price
query = parse("$.products[?price > 20].name")
results = [match.value for match in query.find(data)]  # ['Peach']
```

**Key operators:**

- `$` - root selector
- `..` - recursive descent
- `*` - wildcard
- `[?<predicate>]` - filter (e.g., `[?price > 20]`)
- `[start:end:step]` - array slicing
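JMESPath (installed above) can express the same query with a flatter API. A sketch of the equivalent lookup, assuming the same `data` dict; note that JMESPath wraps numeric literals in backticks:

```python
import jmespath

# Same filter as the JSONPath query above
results = jmespath.search("products[?price > `20`].name", data)
print(results)  # ['Peach']
```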

See: `patterns/jsonpath-querying.md`

## Custom Objects: Serialization

Handle datetime, UUID, Decimal, sets, and custom classes:

```python
import json
import uuid
from datetime import datetime
from decimal import Decimal

class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, uuid.UUID):
            return str(obj)
        if isinstance(obj, Decimal):
            return str(obj)  # str() preserves precision; use float() if loss is acceptable
        if isinstance(obj, set):
            return list(obj)
        return super().default(obj)

# Usage
data = {"timestamp": datetime.now(), "tags": {"python", "json"}}
json_str = json.dumps(data, cls=CustomEncoder)
```

See: `patterns/custom-object-serialization.md`

## Performance Checklist

- Use orjson/msgspec for high-throughput applications
- Specify UTF-8 encoding when reading/writing files
- Use streaming (ijson/JSONL) for files > 100 MB
- Minify JSON for production with `separators=(",", ":")` (see below)
- Pretty-print for development with `indent=2`
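The last two items side by side (the data dict is illustrative):

```python
import json

data = {"name": "Alice", "age": 30}

# Production: minified, no spaces after separators
compact = json.dumps(data, separators=(",", ":"))  # '{"name":"Alice","age":30}'

# Development: pretty-printed for readability
pretty = json.dumps(data, indent=2)
```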

## Security Checklist

- Never use `eval()` for JSON parsing
- Validate input with `jsonschema`
- Sanitize user input before serialization
- Use `json.dumps()` to prevent injection
- Escape special characters in user data

## Reference Documentation

**Performance:**

- `reference/python-json-parsing-best-practices-2025.md` - comprehensive research with benchmarks

**Patterns:**

- `patterns/streaming-large-json.md` - ijson and JSONL strategies
- `patterns/custom-object-serialization.md` - handling datetime, UUID, and custom classes
- `patterns/jsonpath-querying.md` - advanced nested data extraction

**Security:**

- `anti-patterns/security-json-injection.md` - preventing injection attacks
- `anti-patterns/eval-usage.md` - why never to use eval()

**Examples:**

- `examples/high-performance-parsing.py` - orjson and msgspec code
- `examples/large-file-streaming.py` - streaming with ijson
- `examples/secure-validation.py` - jsonschema validation

**Tools:**

- `tools/json-performance-benchmark.py` - benchmark different libraries