Outlines: Structured Text Generation
When to Use This Skill
Use Outlines when you need to:
- Guarantee valid JSON/XML/code structure during generation
- Use Pydantic models for type-safe outputs
- Support local models (Transformers, llama.cpp, vLLM)
- Maximize inference speed with zero-overhead structured generation
- Generate against JSON schemas automatically
- Control token sampling at the grammar level
GitHub Stars: 8,000+ | From: dottxt.ai (formerly .txt)
Installation
    # Base installation
    pip install outlines

    # With specific backends
    pip install outlines transformers       # Hugging Face models
    pip install outlines llama-cpp-python   # llama.cpp
    pip install outlines vllm               # vLLM for high throughput
Quick Start
Basic Example: Classification
    import outlines

    # Load model
    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

    # Generate with a choice constraint
    prompt = "Sentiment of 'This product is amazing!': "
    generator = outlines.generate.choice(model, ["positive", "negative", "neutral"])
    sentiment = generator(prompt)

    print(sentiment)  # "positive" (guaranteed to be one of the three choices)
With Pydantic Models
    from pydantic import BaseModel
    import outlines

    class User(BaseModel):
        name: str
        age: int
        email: str

    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

    # Generate structured output
    prompt = "Extract user: John Doe, 30 years old, [email protected]"
    generator = outlines.generate.json(model, User)
    user = generator(prompt)

    print(user.name)   # "John Doe"
    print(user.age)    # 30
    print(user.email)  # "[email protected]"
Core Concepts
1. Constrained Token Sampling
Outlines uses Finite State Machines (FSM) to constrain token generation at the logit level.
How it works:
- Convert the schema (JSON Schema or Pydantic model) to a regular expression; regex constraints are used directly
- Compile the regular expression into a Finite State Machine (FSM) over the model's token vocabulary
- Filter invalid tokens at each step during generation
- Fast-forward when only one valid token exists
Benefits:
- Zero overhead: allowed tokens for each FSM state are precomputed, so filtering is a per-step lookup and logit mask
- Speed improvement: Fast-forward through deterministic paths
- Guaranteed validity: tokens that would break the structure are never sampled
    from pydantic import BaseModel
    import outlines

    # Pydantic model -> JSON schema -> regex -> FSM
    class Person(BaseModel):
        name: str
        age: int

    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

    # Behind the scenes:
    # 1. Person -> JSON schema
    # 2. JSON schema -> regular expression
    # 3. Regular expression -> FSM
    # 4. FSM filters tokens during generation
    generator = outlines.generate.json(model, Person)
    result = generator("Generate person: Alice, 25")
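To make the token-filtering step concrete, here is a toy sketch in plain Python. It is not Outlines code: the vocabulary, the hand-written state machine for the pattern `[0-9]{2}-[0-9]{2}`, and all function names are assumptions made up for illustration. It shows how an FSM state decides which tokens survive the logit mask, even when the model strongly prefers an invalid token.

```python
import math
from typing import List, Optional

# Toy vocabulary (hypothetical); real tokenizers have tens of thousands of tokens.
VOCAB = ["0", "1", "7", "42", "-", "a", "-9", "<eos>"]
DIGITS = "0123456789"
PATTERN_LEN = 5  # the pattern "dd-dd" is five characters long


def step(state: int, char: str) -> Optional[int]:
    """Advance the FSM by one character; None means the character is invalid."""
    if state >= PATTERN_LEN:
        return None
    if state == 2:  # third character must be the dash
        return state + 1 if char == "-" else None
    return state + 1 if char in DIGITS else None


def walk(state: int, token: str) -> Optional[int]:
    """Advance through a whole token, which may span several characters."""
    if token == "<eos>":
        return state if state == PATTERN_LEN else None
    current: Optional[int] = state
    for char in token:
        current = step(current, char)
        if current is None:
            return None
    return current


def allowed_tokens(state: int) -> List[str]:
    """Tokens that keep the output inside the pattern from this state."""
    return [token for token in VOCAB if walk(state, token) is not None]


def mask_logits(logits: List[float], state: int) -> List[float]:
    """Set the logit of every disallowed token to -inf."""
    allowed = set(allowed_tokens(state))
    return [x if t in allowed else -math.inf for t, x in zip(VOCAB, logits)]


def greedy_decode(logits: List[float]) -> str:
    """Greedy decoding with a fixed logit vector standing in for a model."""
    state, output = 0, []
    while True:
        masked = mask_logits(logits, state)
        token = VOCAB[masked.index(max(masked))]
        if token == "<eos>":
            return "".join(output)
        output.append(token)
        state = walk(state, token)


# The fake model prefers "a" (logit 9.0), which the pattern never allows.
LOGITS = [0.1, 0.2, 0.3, 0.5, 0.0, 9.0, 0.4, 1.0]
print(allowed_tokens(2))      # ['-', '-9']
print(allowed_tokens(5))      # ['<eos>']
print(greedy_decode(LOGITS))  # 42-97
```

In state 5 only one token is allowed, which is the situation where fast-forwarding applies: the token can be emitted without consulting the model at all.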
2. Structured Generators
Outlines provides specialized generators for different output types.
Choice Generator
    # Multiple choice selection
    generator = outlines.generate.choice(
        model,
        ["positive", "negative", "neutral"]
    )

    sentiment = generator("Review: This is great!")
    # Result: One of the three choices
JSON Generator
    from pydantic import BaseModel

    class Product(BaseModel):
        name: str
        price: float
        in_stock: bool

    # Generate valid JSON matching schema
    generator = outlines.generate.json(model, Product)
    product = generator("Extract: iPhone 15, $999, available")

    # Guaranteed valid Product instance
    print(type(product))  # <class '__main__.Product'>
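As a rough illustration of how a schema can become a pattern, the sketch below turns a flat JSON schema into a regular expression using only the standard library. It is a deliberately simplified stand-in, not the converter Outlines ships: `flat_schema_to_regex`, `TYPE_PATTERNS`, and `PRODUCT_SCHEMA` are hypothetical names, every property is assumed required and emitted in order, and string escapes and exponents are ignored.

```python
import json
import re

# Simplified per-type patterns (assumption: no escapes, no exponents).
TYPE_PATTERNS = {
    "string": r'"[^"\\]*"',
    "integer": r"-?(0|[1-9][0-9]*)",
    "number": r"-?(0|[1-9][0-9]*)(\.[0-9]+)?",
    "boolean": r"(true|false)",
}


def flat_schema_to_regex(schema: dict) -> str:
    """Build a regex for a flat JSON object with all properties required, in order."""
    parts = []
    for name, spec in schema["properties"].items():
        key = re.escape(json.dumps(name))          # quoted property name
        parts.append(key + r": ?" + TYPE_PATTERNS[spec["type"]])
    return r"\{ ?" + ", ?".join(parts) + r" ?\}"


# Hand-written schema mirroring the Product model above.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "in_stock": {"type": "boolean"},
    },
}

regex = flat_schema_to_regex(PRODUCT_SCHEMA)
valid = '{"name": "iPhone 15", "price": 999.0, "in_stock": true}'
invalid = '{"name": "iPhone 15", "price": "999", "in_stock": true}'

print(re.fullmatch(regex, valid) is not None)    # True
print(re.fullmatch(regex, invalid) is not None)  # False
print(json.loads(valid)["price"])                # 999.0
```

Any string the pattern accepts is parseable JSON with the expected field types, which is why constraining generation to the pattern removes the need for a separate validation pass.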
Regex Generator
    # Generate text matching regex
    generator = outlines.generate.regex(
        model,
        r"[0-9]{3}-[0-9]{3}-[0-9]{4}"  # Phone number pattern
    )

    phone = generator("Generate phone number:")
    # Result: "555-123-4567" (guaranteed to match pattern)
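Before handing a pattern to the generator, it can help to check which strings it accepts. The snippet below uses only Python's `re` module; `matches_phone` is a hypothetical helper, and the check assumes the pattern must cover the complete output rather than a substring.

```python
import re

PHONE_PATTERN = r"[0-9]{3}-[0-9]{3}-[0-9]{4}"


def matches_phone(text: str) -> bool:
    """True only if the whole string matches the phone pattern."""
    return re.fullmatch(PHONE_PATTERN, text) is not None


print(matches_phone("555-123-4567"))    # True
print(matches_phone("555-0123"))        # False (too few digit groups)
print(matches_phone("(555) 123-4567"))  # False (parentheses not in the pattern)
```

If the source text contains numbers in a different shape, such as the seven-digit `555-0123`, the pattern has to be widened or the model will be forced to produce digits that are not in the input.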
Integer/Float Generators
    # Generate specific numeric types
    int_generator = outlines.generate.integer(model)
    age = int_generator("Person's age:")  # Guaranteed integer

    float_generator = outlines.generate.float(model)
    price = float_generator("Product price:")  # Guaranteed float
3. Model Backends
Outlines supports multiple local and API-based backends.
Transformers (Hugging Face)
    import outlines

    # Load from Hugging Face
    model = outlines.models.transformers(
        "microsoft/Phi-3-mini-4k-instruct",
        device="cuda"  # Or "cpu"
    )

    # Use with any generator (YourModel is a placeholder for your Pydantic model)
    generator = outlines.generate.json(model, YourModel)
llama.cpp
    # Load GGUF model
    model = outlines.models.llamacpp(
        "./models/llama-3.1-8b-instruct.Q4_K_M.gguf",
        n_gpu_layers=35
    )

    generator = outlines.generate.json(model, YourModel)
vLLM (High Throughput)
    # For production deployments
    model = outlines.models.vllm(
        "meta-llama/Llama-3.1-8B-Instruct",
        tensor_parallel_size=2  # Multi-GPU
    )

    generator = outlines.generate.json(model, YourModel)
OpenAI (Limited Support)
    # Basic OpenAI support
    model = outlines.models.openai(
        "gpt-4o-mini",
        api_key="your-api-key"
    )

    # Note: Some features limited with API models
    generator = outlines.generate.json(model, YourModel)
4. Pydantic Integration
Outlines has first-class Pydantic support with automatic schema translation.
Basic Models
    from pydantic import BaseModel, Field
    import outlines

    class Article(BaseModel):
        title: str = Field(description="Article title")
        author: str = Field(description="Author name")
        word_count: int = Field(description="Number of words", gt=0)
        tags: list[str] = Field(description="List of tags")

    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
    generator = outlines.generate.json(model, Article)

    article = generator("Generate article about AI")
    print(article.title)
    print(article.word_count)  # Guaranteed > 0
Nested Models
    class Address(BaseModel):
        street: str
        city: str
        country: str

    class Person(BaseModel):
        name: str
        age: int
        address: Address  # Nested model

    generator = outlines.generate.json(model, Person)
    person = generator("Generate person in New York")

    print(person.address.city)  # "New York"
Enums and Literals
    from enum import Enum
    from typing import Literal

    from pydantic import BaseModel

    class Status(str, Enum):
        PENDING = "pending"
        APPROVED = "approved"
        REJECTED = "rejected"

    class Application(BaseModel):
        applicant: str
        status: Status  # Must be one of enum values
        priority: Literal["low", "medium", "high"]  # Must be one of literals

    generator = outlines.generate.json(model, Application)
    app = generator("Generate application")

    print(app.status)  # Status.PENDING (or APPROVED/REJECTED)
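One way to picture why an enum field can only take its declared values: at the pattern level, a fixed set of values is just an alternation of quoted strings. The sketch below builds that alternation with the standard library; `enum_to_regex` is a hypothetical helper written for this illustration, not an Outlines function.

```python
import json
import re
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


def enum_to_regex(enum_cls) -> str:
    """Alternation of the JSON-encoded enum values, e.g. ("a"|"b")."""
    options = (re.escape(json.dumps(member.value)) for member in enum_cls)
    return "(" + "|".join(options) + ")"


pattern = enum_to_regex(Status)

print(re.fullmatch(pattern, '"approved"') is not None)  # True
print(re.fullmatch(pattern, '"maybe"') is not None)     # False
```

A `Literal["low", "medium", "high"]` field can be thought of the same way, since it also reduces to a closed list of allowed strings.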
Common Patterns
Pattern 1: Data Extraction
    from pydantic import BaseModel
    import outlines

    class CompanyInfo(BaseModel):
        name: str
        founded_year: int
        industry: str
        employees: int

    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
    generator = outlines.generate.json(model, CompanyInfo)

    text = """
    Apple Inc. was founded in 1976 in the technology industry.
    The company employs approximately 164,000 people worldwide.
    """

    prompt = f"Extract company information:\n{text}\n\nCompany:"
    company = generator(prompt)

    print(f"Name: {company.name}")
    print(f"Founded: {company.founded_year}")
    print(f"Industry: {company.industry}")
    print(f"Employees: {company.employees}")
Pattern 2: Classification
    from typing import Literal

    from pydantic import BaseModel
    import outlines

    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

    # Binary classification
    generator = outlines.generate.choice(model, ["spam", "not_spam"])
    result = generator("Email: Buy now! 50% off!")

    # Multi-class classification
    categories = ["technology", "business", "sports", "entertainment"]
    category_gen = outlines.generate.choice(model, categories)
    category = category_gen("Article: Apple announces new iPhone...")

    # With a self-reported confidence score (not a calibrated probability)
    class Classification(BaseModel):
        label: Literal["positive", "negative", "neutral"]
        confidence: float

    classifier = outlines.generate.json(model, Classification)
    result = classifier("Review: This product is okay, nothing special")
Pattern 3: Structured Forms
    from pydantic import BaseModel
    import outlines

    class UserProfile(BaseModel):
        full_name: str
        age: int
        email: str
        phone: str
        country: str
        interests: list[str]

    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
    generator = outlines.generate.json(model, UserProfile)

    prompt = """
    Extract user profile from:
    Name: Alice Johnson
    Age: 28
    Email: [email protected]
    Phone: 555-0123
    Country: USA
    Interests: hiking, photography, cooking
    """

    profile = generator(prompt)
    print(profile.full_name)
    print(profile.interests)  # ["hiking", "photography", "cooking"]
Pattern 4: Multi-Entity Extraction
    from typing import Literal

    from pydantic import BaseModel
    import outlines

    class Entity(BaseModel):
        name: str
        type: Literal["PERSON", "ORGANIZATION", "LOCATION"]

    class DocumentEntities(BaseModel):
        entities: list[Entity]

    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
    generator = outlines.generate.json(model, DocumentEntities)

    text = "Tim Cook met with Satya Nadella at Microsoft headquarters in Redmond."
    prompt = f"Extract entities from: {text}"

    result = generator(prompt)
    for entity in result.entities:
        print(f"{entity.name} ({entity.type})")
Pattern 5: Code Generation
    from pydantic import BaseModel
    import outlines

    class PythonFunction(BaseModel):
        function_name: str
        parameters: list[str]
        docstring: str
        body: str

    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
    generator = outlines.generate.json(model, PythonFunction)

    prompt = "Generate a Python function to calculate factorial"
    func = generator(prompt)

    # The schema guarantees the fields exist, not that the body is valid Python
    print(f"def {func.function_name}({', '.join(func.parameters)}):")
    print(f'    """{func.docstring}"""')
    print(f"    {func.body}")
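The structured output guarantees that the four fields are present, but the `body` string is free text and may span several lines. A possible way to assemble and sanity-check the result with the standard library is sketched below; `render_function` is a hypothetical helper, and it assumes the docstring itself contains no triple quotes.

```python
import ast
import textwrap


def render_function(name, parameters, docstring, body):
    """Assemble function source from its parts and verify that it parses."""
    indent = " " * 4
    lines = [
        f"def {name}({', '.join(parameters)}):",
        textwrap.indent(f'"""{docstring}"""', indent),
        # Indent every line of the body, not only the first one
        textwrap.indent(textwrap.dedent(body).strip("\n"), indent),
    ]
    source = "\n".join(lines) + "\n"
    ast.parse(source)  # raises SyntaxError if the generated code is malformed
    return source


source = render_function(
    "factorial",
    ["n"],
    "Return n! for a non-negative integer n.",
    "if n < 2:\n    return 1\nreturn n * factorial(n - 1)",
)
print(source)
```

Parsing with `ast.parse` only checks syntax. Whether the generated function is correct still needs tests or review before it is executed.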
Pattern 6: Batch Processing
    from pydantic import BaseModel
    import outlines

    def batch_extract(texts: list[str], schema: type[BaseModel]):
        """Extract structured data from multiple texts."""
        # Load the model and build the generator once, then reuse them for every text
        model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
        generator = outlines.generate.json(model, schema)

        results = []
        for text in texts:
            result = generator(f"Extract from: {text}")
            results.append(result)
        return results

    class Person(BaseModel):
        name: str
        age: int

    texts = [
        "John is 30 years old",
        "Alice is 25 years old",
        "Bob is 40 years old"
    ]

    people = batch_extract(texts, Person)
    for person in people:
        print(f"{person.name}: {person.age}")
Backend Configuration
Transformers
    import outlines

    # Basic usage
    model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

    # GPU configuration
    model = outlines.models.transformers(
        "microsoft/Phi-3-mini-4k-instruct",
        device="cuda",
        model_kwargs={"torch_dtype": "float16"}
    )

    # Popular models
    model = outlines.models.transformers("meta-llama/Llama-3.1-8B-Instruct")
    model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
    model = outlines.models.transformers("Qwen/Qwen2.5-7B-Instruct")
llama.cpp
    # Load GGUF model
    model = outlines.models.llamacpp(
        "./models/llama-3.1-8b.Q4_K_M.gguf",
        n_ctx=4096,        # Context window
        n_gpu_layers=35,   # GPU layers
        n_threads=8        # CPU threads
    )

    # Full GPU offload
    model = outlines.models.llamacpp(
        "./models/model.gguf",
        n_gpu_layers=-1  # All layers on GPU
    )
vLLM (Production)
    # Single GPU
    model = outlines.models.vllm("meta-llama/Llama-3.1-8B-Instruct")

    # Multi-GPU
    model = outlines.models.vllm(
        "meta-llama/Llama-3.1-70B-Instruct",
        tensor_parallel_size=4  # 4 GPUs
    )

    # With quantization
    model = outlines.models.vllm(
        "meta-llama/Llama-3.1-8B-Instruct",
        quantization="awq"  # Or "gptq"
    )
Best Practices
1. Use Specific Types
    # ✅ Good: Specific types
    class Product(BaseModel):
        name: str
        price: float    # Not str
        quantity: int   # Not str
        in_stock: bool  # Not str

    # ❌ Bad: Everything as string
    class Product(BaseModel):
        name: str
        price: str     # Should be float
        quantity: str  # Should be int
2. Add Constraints
    from pydantic import BaseModel, Field

    # ✅ Good: With constraints
    class User(BaseModel):
        name: str = Field(min_length=1, max_length=100)
        age: int = Field(ge=0, le=120)
        email: str = Field(pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$")

    # ❌ Bad: No constraints
    class User(BaseModel):
        name: str
        age: int
        email: str
3. Use Enums for Categories
    from enum import Enum

    from pydantic import BaseModel

    # ✅ Good: Enum for fixed set
    class Priority(str, Enum):
        LOW = "low"
        MEDIUM = "medium"
        HIGH = "high"

    class Task(BaseModel):
        title: str
        priority: Priority

    # ❌ Bad: Free-form string
    class Task(BaseModel):
        title: str
        priority: str  # Can be anything
4. Provide Context in Prompts
    # ✅ Good: Clear context
    prompt = """
    Extract product information from the following text.
    Text: iPhone 15 Pro costs $999 and is currently in stock.
    Product:
    """

    # ❌ Bad: Minimal context
    prompt = "iPhone 15 Pro costs $999 and is currently in stock."
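When the same kind of prompt is built in several places, a small helper keeps the instruction, the source text, and the trailing cue consistent. The function below is a hypothetical convenience written for this guide, not part of Outlines.

```python
def build_extraction_prompt(text: str, target: str) -> str:
    """Wrap raw text with an instruction and a trailing cue for the model."""
    return (
        f"Extract {target} information from the following text.\n\n"
        f"Text: {text.strip()}\n\n"
        f"{target.capitalize()}:"
    )


prompt = build_extraction_prompt(
    "iPhone 15 Pro costs $999 and is currently in stock.",
    "product",
)
print(prompt)
```

Ending the prompt with a cue such as `Product:` gives the model a natural place to begin the structured answer.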
5. Handle Optional Fields
    from typing import Optional

    from pydantic import BaseModel

    # ✅ Good: Optional fields for incomplete data
    class Article(BaseModel):
        title: str                    # Required
        author: Optional[str] = None  # Optional
        date: Optional[str] = None    # Optional
        tags: list[str] = []          # Default empty list

    # Can succeed even if author/date missing
Comparison to Alternatives
| Feature | Outlines | Instructor | Guidance | LMQL |
|---|---|---|---|---|
| Pydantic Support | ✅ Native | ✅ Native | ❌ No | ❌ No |
| JSON Schema | ✅ Yes | ✅ Yes | ⚠️ Limited | ✅ Yes |
| Regex Constraints | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
| Local Models | ✅ Full | ⚠️ Limited | ✅ Full | ✅ Full |
| API Models | ⚠️ Limited | ✅ Full | ✅ Full | ✅ Full |
| Zero Overhead | ✅ Yes | ❌ No | ⚠️ Partial | ✅ Yes |
| Automatic Retrying | ❌ No | ✅ Yes | ❌ No | ❌ No |
| Learning Curve | Low | Low | Low | High |
When to choose Outlines:
- Using local models (Transformers, llama.cpp, vLLM)
- Need maximum inference speed
- Want Pydantic model support
- Require zero-overhead structured generation
- Control token sampling process
When to choose alternatives:
- Instructor: Need API models with automatic retrying
- Guidance: Need token healing and complex workflows
- LMQL: Prefer declarative query syntax
Performance Characteristics
Speed:
- Zero overhead: Structured generation as fast as unconstrained
- Fast-forward optimization: Skips deterministic tokens
- 1.2-2x faster than post-generation validation approaches
Memory:
- FSM compiled once per schema (cached)
- Minimal runtime overhead
- Efficient with vLLM for high throughput
Accuracy:
- 100% valid outputs (guaranteed by FSM)
- No retry loops needed
- Deterministic token filtering
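The "compiled once per schema" point can be pictured as ordinary memoization. The sketch below is not how Outlines implements its cache; `compile_schema` is a hypothetical stand-in for the expensive schema-to-FSM step, and the cache key is assumed to be the canonical JSON text of the schema.

```python
import json
from functools import lru_cache

compile_calls = []  # records each real compilation, for demonstration only


@lru_cache(maxsize=None)
def compile_schema(schema_key: str) -> tuple:
    """Hypothetical stand-in for an expensive schema -> FSM compilation."""
    compile_calls.append(schema_key)
    return ("fsm-for", schema_key)


def get_compiled(schema: dict) -> tuple:
    # Canonical JSON so that equal schemas map to the same cache key
    return compile_schema(json.dumps(schema, sort_keys=True))


schema = {"type": "object", "properties": {"name": {"type": "string"}}}
first = get_compiled(schema)
second = get_compiled(dict(schema))  # equal content, different dict object

print(first is second, len(compile_calls))  # True 1
```

In practice this means the compilation cost is paid on the first use of a schema, so building the generator once and reusing it across prompts is the cheaper pattern.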
Resources
- Documentation: https://outlines-dev.github.io/outlines
- GitHub: https://github.com/outlines-dev/outlines (8k+ stars)
- Discord: https://discord.gg/R9DSu34mGd
- Blog: https://blog.dottxt.co
See Also
- references/json_generation.md - Comprehensive JSON and Pydantic patterns
- references/backends.md - Backend-specific configuration
- references/examples.md - Production-ready examples