DNS Management
Configure and automate DNS records with proper TTL strategies, DNS-as-code patterns, and troubleshooting techniques.
Purpose
Guide DNS configuration for applications, infrastructure, and services with focus on:
- Record type selection (A, AAAA, CNAME, MX, TXT, SRV, CAA)
- TTL strategies for propagation and caching
- DNS-as-code automation (external-dns, OctoDNS, DNSControl)
- Cloud DNS services comparison and selection
- DNS-based load balancing patterns
- Troubleshooting tools and techniques
When to Use This Skill
Apply DNS management patterns when:
- Setting up DNS for new applications or services
- Automating DNS updates from Kubernetes workloads
- Configuring DNS-based failover or load balancing
- Troubleshooting DNS propagation or resolution issues
- Migrating DNS between providers
- Planning DNS changes with minimal downtime
- Implementing GeoDNS for global users
Record Type Selection
Quick Reference
Address Resolution:
- A Record: Map hostname to IPv4 address (example.com → 192.0.2.1)
- AAAA Record: Map hostname to IPv6 address (example.com → 2001:db8::1)
- CNAME Record: Alias to another domain (www.example.com → example.com)
  - Cannot use at zone apex (@)
  - Cannot coexist with other records at same name
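To catch both CNAME rules before records reach a provider, a check like the following can run over a zone's record list. This is a minimal sketch. The Record tuple and cname_problems function are hypothetical and not part of any DNS tool. The check only tests the two rules above, and it ignores the DNSSEC types (RRSIG, NSEC) that may legally share a name with a CNAME.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Record(NamedTuple):
    name: str   # owner name relative to the zone, "@" for the apex
    rtype: str  # "A", "CNAME", "TXT", ...
    value: str


# DNSSEC types that may legally share an owner name with a CNAME.
DNSSEC_TYPES = {"RRSIG", "NSEC"}


def cname_problems(records: Iterable[Record]) -> List[str]:
    """Report CNAMEs at the apex or sharing a name with other records."""
    types_by_name = defaultdict(set)
    for rec in records:
        types_by_name[rec.name].add(rec.rtype.upper())

    problems = []
    for name, types in sorted(types_by_name.items()):
        if "CNAME" not in types:
            continue
        if name == "@":
            problems.append("CNAME at zone apex (@)")
        others = sorted(types - {"CNAME"} - DNSSEC_TYPES)
        if others:
            problems.append(f"CNAME at {name} coexists with {', '.join(others)}")
    return problems
```

For example, a zone with both a CNAME and a TXT record at www is reported as "CNAME at www coexists with TXT".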
Email Configuration:
- MX Record: Direct email to mail servers with priority
- TXT Record: Email authentication (SPF, DKIM, DMARC) and verification
Service Discovery:
- SRV Record: Specify service location (protocol, priority, weight, port, target)
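Clients combine SRV priority and weight as follows (RFC 2782). The lowest priority value wins, and weight splits traffic among the records that share it. The sketch below shows only that selection step. SrvRecord and pick_srv_target are hypothetical names. The RFC's full procedure also orders the remaining records for fallback and gives weight-0 records a small chance, which this simplification skips.

```python
import random
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    priority: int  # lower value = preferred
    weight: int    # relative share within the same priority
    port: int
    target: str


def pick_srv_target(records: List[SrvRecord], rng: random.Random) -> SrvRecord:
    """Choose one target: lowest priority first, then weighted random.

    Simplified: weight-0 records are only picked when every record in the
    group has weight 0, and no fallback order is built.
    """
    if not records:
        raise ValueError("no SRV records")
    best = min(r.priority for r in records)
    group = [r for r in records if r.priority == best]
    weights = [r.weight for r in group]
    if sum(weights) == 0:
        return rng.choice(group)  # all weights zero: pick uniformly
    return rng.choices(group, weights=weights, k=1)[0]
```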
Delegation and Security:
- NS Record: Delegate subdomain to different nameservers
- CAA Record: Restrict which Certificate Authorities can issue certificates
Cloud-Specific:
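- ALIAS/ANAME Record: CNAME-like alias that works at the zone apex; provider-specific rather than a standard record type (Route53 Alias records, Cloudflare CNAME flattening)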
Decision Tree
```
Need to point domain to:
├─ IPv4 Address? → A record
├─ IPv6 Address? → AAAA record
├─ Another Domain?
│  ├─ Zone apex (@) → ALIAS/ANAME or A record
│  └─ Subdomain → CNAME
├─ Mail Server? → MX record (with priority)
├─ Email Authentication? → TXT record (SPF/DKIM/DMARC)
├─ Service Discovery? → SRV record
├─ Domain Verification? → TXT record
├─ Certificate Control? → CAA record
└─ Subdomain Delegation? → NS record
```
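The address branches of the tree can be decided from the target string alone: an IP literal means A or AAAA, and a hostname means an alias. The sketch below assumes a hypothetical choose_record_type helper. The remaining branches (mail, verification, delegation) depend on intent rather than on the target string, so they are left out.

```python
import ipaddress


def choose_record_type(target: str, at_apex: bool) -> str:
    """Record type for pointing a name at `target` (address branches only)."""
    try:
        ip = ipaddress.ip_address(target)
    except ValueError:
        # Not an IP literal, so the target is another domain name.
        return "ALIAS/ANAME (or A)" if at_apex else "CNAME"
    return "A" if ip.version == 4 else "AAAA"
```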
For detailed record type examples and patterns, see references/record-types.md.
TTL Strategy
Standard TTL Values
By Change Frequency:
- Stable records: 3600-86400s (1-24 hours) - NS, stable A/AAAA
- Normal operation: 3600s (1 hour) - Standard websites, MX
- Moderate changes: 300-1800s (5-30 min) - Development, A/B testing
- Failover scenarios: 60-300s (1-5 min) - Critical records needing fast updates
Key Principle: Lower TTL = faster propagation but higher DNS query load
Pre-Change Process
When planning DNS changes:
```
T-48h: Lower TTL to 300s
T-24h: Verify TTL propagated globally
T-0h:  Make DNS change
T+1h:  Verify new records propagating
T+6h:  Confirm global propagation
T+24h: Raise TTL back to normal (3600s)
```
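The timeline can be generated from the planned change time. This sketch uses plan_ttl_change, a hypothetical helper. It also encodes the assumption behind the T-24h check. A cache that fetched the record just before T-48h keeps it for one full old TTL, so the check is only meaningful if the old TTL is at most 24 hours.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# (hours relative to the change, action), copied from the timeline above.
STEPS = [
    (-48, "Lower TTL to 300s"),
    (-24, "Verify TTL propagated globally"),
    (0, "Make DNS change"),
    (1, "Verify new records propagating"),
    (6, "Confirm global propagation"),
    (24, "Raise TTL back to normal (3600s)"),
]


def plan_ttl_change(change_at: datetime, old_ttl: int) -> List[Tuple[datetime, str]]:
    """Return (timestamp, action) pairs for a change scheduled at change_at."""
    # A cache that fetched the record just before T-48h holds it for old_ttl,
    # so the T-24h verification only holds if old_ttl <= 24h.
    if timedelta(seconds=old_ttl) > timedelta(hours=24):
        raise ValueError("old TTL exceeds 24h; lower the TTL earlier than T-48h")
    return [(change_at + timedelta(hours=h), action) for h, action in STEPS]
```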
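Propagation Formula: Max Time = TTL cached before the change + time for the change to reach every authoritative nameserver. With the two-step process above, the total measured from lowering the TTL is Old TTL + New TTL (+ authoritative sync time), which is why the old TTL must fully expire before T-0.
Example: Changing a record directly at a 3600s TTL can leave resolvers serving the old value for up to 1 hour. Lowering the TTL to 300s first costs one hour of waiting. After that, the change itself reaches all caches within about 5 minutes.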
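The helpers below give a worked version of the formula, assuming resolvers that honor TTLs exactly. They are hypothetical and are not the interface of scripts/calculate-ttl-propagation.py. The auth_sync_lag parameter stands for the time a change takes to reach every authoritative nameserver.

```python
def stale_window(cached_ttl: int, auth_sync_lag: int = 0) -> int:
    """Worst-case seconds a resolver can keep serving the previous answer.

    A resolver that cached the record just before the change reached every
    authoritative nameserver holds it for up to one full TTL after that.
    """
    return auth_sync_lag + cached_ttl


def two_step_total(old_ttl: int, new_ttl: int, auth_sync_lag: int = 0) -> int:
    """Seconds from lowering the TTL until the new value is visible everywhere.

    Step 1 waits out the old TTL; step 2 (the actual change) waits out the new one.
    """
    return stale_window(old_ttl, auth_sync_lag) + stale_window(new_ttl, auth_sync_lag)
```

Take a 3600s record lowered to 300s: two_step_total(3600, 300) gives 3900 seconds end to end, but only the last 300 seconds come after the value actually changes.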
TTL by Use Case
| Use Case | TTL | Rationale |
|---|---|---|
| Production (stable) | 3600s | Balance speed and load |
| Before planned change | 300s | Fast propagation |
| Development/staging | 300-600s | Frequent changes |
| DNS-based failover | 60-300s | Fast recovery |
| Mail servers | 3600s | Rarely change |
| NS records | 86400s | Very stable |
For detailed TTL scenarios and calculations, see references/ttl-strategies.md.
DNS-as-Code Tools
Tool Selection by Use Case
Kubernetes DNS Automation → external-dns
- Annotation-based configuration on Services/Ingresses
- Automatic sync to DNS providers (20+ supported)
- No manual DNS updates required
- See examples/external-dns/
Multi-Provider DNS Management → OctoDNS or DNSControl
- Version control for DNS records
- Sync configuration across multiple providers
- Preview changes before applying
- OctoDNS (Python/YAML): see examples/octodns/
- DNSControl (JavaScript): see examples/dnscontrol/
Infrastructure-as-Code → Terraform
- Manage DNS alongside cloud resources
- Provider-specific resources (aws_route53_record, etc.)
- See examples/terraform/
Tool Comparison
| Tool | Language | Best For | Kubernetes | Multi-Provider |
|---|---|---|---|---|
| external-dns | Go | K8s automation | ★★★★★ | ★★★★ |
| OctoDNS | Python/YAML | Version control | ★★★ | ★★★★★ |
| DNSControl | JavaScript | Complex logic | ★★ | ★★★★★ |
| Terraform | HCL | IaC integration | ★★★ | ★★★★ |
Quick Start: external-dns
```yaml
# Kubernetes Service with DNS annotation
apiVersion: v1
kind: Service
metadata:
  name: app
  annotations:
    external-dns.alpha.kubernetes.io/hostname: app.example.com
    external-dns.alpha.kubernetes.io/ttl: "300"
spec:
  type: LoadBalancer
  ports:
    - port: 80
```
Deploy the external-dns controller once per cluster. After that, it creates DNS records automatically for every annotated Service or Ingress.
For complete examples, see examples/external-dns/ and references/dns-as-code-comparison.md.
Cloud DNS Provider Selection
Provider Characteristics
AWS Route53
- Best for AWS-heavy infrastructure
- Advanced routing policies (weighted, latency, geolocation, failover)
- Health checks with automatic failover
- ALIAS records for AWS resources (ELB, CloudFront, S3)
- Pricing: $0.50/month per zone + $0.40 per million queries
Google Cloud DNS
- Best for GCP-native applications
- Strong DNSSEC support with automatic key rotation
- Private zones for VPC internal DNS
- Split-horizon DNS (different internal/external records)
- Pricing: $0.20/month per zone + $0.40 per million queries
Azure DNS
- Best for Azure-native applications
- Integration with Azure Traffic Manager
- Azure Private DNS zones
- Azure RBAC for access control
- Pricing: $0.50/month per zone + $0.40 per million queries
Cloudflare
- Best for multi-cloud or cloud-agnostic
- Consistently among the fastest authoritative DNS response times in public benchmarks (e.g., DNSPerf)
- Built-in DDoS protection
- Free tier with unlimited queries
- CDN integration
- Pricing: Free tier, $20/month Pro, $200/month Business
Selection Decision Tree
```
Choose based on:
├─ AWS-heavy? → Route53
├─ GCP-native? → Cloud DNS
├─ Azure-native? → Azure DNS
├─ Multi-cloud? → Cloudflare or OctoDNS/DNSControl
├─ Need fastest global DNS? → Cloudflare
├─ Need DDoS protection? → Cloudflare
└─ Budget-conscious? → Cloudflare (free tier) or Cloud DNS (lowest zone cost)
```
For detailed provider comparisons and examples, see references/cloud-providers.md.
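For a first cost comparison, the per-zone and per-query prices listed above combine linearly. This is a minimal sketch with a hypothetical monthly_dns_cost helper. It ignores volume discounts and extra charges such as health checks, so treat the result as an estimate.

```python
from typing import Dict, Tuple

# (USD per zone per month, USD per million queries), from the list above.
PRICES: Dict[str, Tuple[float, float]] = {
    "Route53": (0.50, 0.40),
    "Cloud DNS": (0.20, 0.40),
    "Azure DNS": (0.50, 0.40),
}


def monthly_dns_cost(provider: str, zones: int, million_queries: float) -> float:
    """Flat-rate estimate: zones x zone price + queries x query price."""
    zone_price, query_price = PRICES[provider]
    return zones * zone_price + million_queries * query_price
```

For example, 10 zones serving 50 million queries a month come to $25 on Route53 (10 × $0.50 + 50 × $0.40) and $22 on Cloud DNS. Cloudflare is left out because its DNS is covered by the flat plan price listed above.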
DNS-Based Load Balancing
GeoDNS (Geographic Routing)
Return different IP addresses based on client location to:
- Reduce latency (route to nearest data center)
- Comply with data residency requirements
- Distribute load across regions
Example Pattern:
```
Client Location → DNS Response
├─ North America → 192.0.2.1 (US data center)
├─ Europe → 192.0.2.10 (EU data center)
└─ Default → CloudFront edge (global CDN)
```
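One detail matters when reasoning about GeoDNS answers. The authoritative server sees the recursive resolver's address, not the end client's, unless the resolver forwards an EDNS Client Subnet (ECS) hint. The sketch below mirrors the pattern above under that assumption. The region codes, the GEO_ANSWERS table and geo_answer are all hypothetical.

```python
from typing import Optional

# Hypothetical region codes mapped to the answers in the pattern above.
GEO_ANSWERS = {"NA": "192.0.2.1", "EU": "192.0.2.10"}
DEFAULT_ANSWER = "CloudFront edge"  # stands in for the default (CDN) record


def geo_answer(resolver_region: str, ecs_region: Optional[str] = None) -> str:
    """Answer by location: prefer the ECS hint, else the resolver's region."""
    region = ecs_region or resolver_region
    return GEO_ANSWERS.get(region, DEFAULT_ANSWER)
```

In this model, a client in Europe using a US-located resolver that sends no ECS receives the US answer.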
Weighted Routing
Distribute traffic by percentage for:
- Blue-green deployments
- Canary releases (10% to new version)
- A/B testing
Example Pattern:
```
DNS Responses:
├─ 90% → 192.0.2.1 (stable version)
└─ 10% → 192.0.2.2 (canary version)
```
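Weighted policies such as Route53's return each record with probability weight / sum of weights, so weights of 90 and 10 produce the 90%/10% split above. Below is a minimal simulation; weighted_answer is a hypothetical helper.

```python
import random
from typing import Dict


def weighted_answer(weights: Dict[str, int], rng: random.Random) -> str:
    """Return one IP with probability weight / sum(weights)."""
    ips = list(weights)
    return rng.choices(ips, weights=[weights[ip] for ip in ips], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)
    split = {"192.0.2.1": 90, "192.0.2.2": 10}
    picks = [weighted_answer(split, rng) for _ in range(10_000)]
    print("canary share:", picks.count("192.0.2.2") / len(picks))
```

Each resolver caches whichever answer it received for the length of the TTL. Real client traffic therefore only approximates the configured split, especially when a few large resolvers dominate.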
Health Check-Based Failover
Automatically route traffic away from unhealthy endpoints.
Pattern:
```
Primary: 192.0.2.1 (health checked every 30s)
├─ Healthy → Return primary IP
└─ Unhealthy → Return secondary IP (192.0.2.2)
```
Failover time: ~2-3 minutes = health check failures (3 failed checks × 30s = 90s) + TTL expiration (60s)
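The failover estimate is simple arithmetic over the health check and record settings. The sketch below uses the values from Pattern 3 (failure_threshold = 3, request_interval = 30, ttl = 60). The function names are hypothetical, and the model ignores the extra delay for health-checker results to reach the DNS servers.

```python
def worst_case_failover_seconds(failure_threshold: int, request_interval: int, ttl: int) -> int:
    """Detection time (consecutive failed checks x interval) + cache expiry."""
    return failure_threshold * request_interval + ttl


def failover_answer(primary_healthy: bool, primary: str = "192.0.2.1", secondary: str = "192.0.2.2") -> str:
    """What the failover record set returns for the current health state."""
    return primary if primary_healthy else secondary


# Values from Pattern 3: 3 failures x 30s + 60s TTL = 150s (2.5 minutes).
print(worst_case_failover_seconds(3, 30, 60))
```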
For complete load balancing examples, see examples/load-balancing/.
Troubleshooting
Essential Commands
Check DNS Resolution:
```bash
# Basic query
dig example.com

# Clean output (just IP)
dig example.com +short

# Query specific DNS server
dig @8.8.8.8 example.com
dig @1.1.1.1 example.com

# Trace resolution path
dig +trace example.com
```
Check TTL:
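```bash
dig example.com | grep -A1 "ANSWER SECTION"
# Look for the TTL value (the number before IN A)

# Answer section only
dig +noall +answer example.com
# A recursive resolver reports the remaining (counting-down) TTL;
# query an authoritative nameserver with @ to see the configured TTL.
```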
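When scripting TTL checks, each ANSWER SECTION line splits into name, TTL, class, type and data fields. This is a minimal sketch with a hypothetical parse_answer_line helper. It assumes dig's default whitespace-separated output and will collapse repeated spaces inside quoted TXT data.

```python
from typing import Dict, Union


def parse_answer_line(line: str) -> Dict[str, Union[str, int]]:
    """Split one dig ANSWER SECTION line into its five fields."""
    name, ttl, rclass, rtype, *data = line.split()
    return {
        "name": name,
        "ttl": int(ttl),  # seconds remaining (or configured, if authoritative)
        "class": rclass,
        "type": rtype,
        "data": " ".join(data),
    }
```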
Check Propagation:
```bash
# Multiple resolvers
dig @8.8.8.8 example.com +short         # Google
dig @1.1.1.1 example.com +short         # Cloudflare
dig @208.67.222.222 example.com +short  # OpenDNS
```
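The same comparison can be scripted: query each resolver with dig +short and flag any whose answer set differs from the expected one. This sketch assumes dig is installed, and RESOLVERS, query and lagging_resolvers are hypothetical names. With +short, a CNAME chain also prints the CNAME target, so include that target in the expected set.

```python
import subprocess
from typing import Dict, List, Set

RESOLVERS = {"Google": "8.8.8.8", "Cloudflare": "1.1.1.1", "OpenDNS": "208.67.222.222"}


def query(server: str, name: str, rtype: str = "A") -> Set[str]:
    """Run `dig @server name rtype +short` and return the answer lines."""
    result = subprocess.run(
        ["dig", f"@{server}", name, rtype, "+short"],
        capture_output=True, text=True, check=True, timeout=10,
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}


def lagging_resolvers(answers: Dict[str, Set[str]], expected: Set[str]) -> List[str]:
    """Labels of resolvers whose answer set differs from the expected one."""
    return sorted(label for label, got in answers.items() if got != expected)


# Usage (needs network access):
# answers = {label: query(ip, "example.com") for label, ip in RESOLVERS.items()}
# print(lagging_resolvers(answers, {"192.0.2.1"}))
```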
Flush Local DNS Cache:
```bash
# macOS
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder

# Windows
ipconfig /flushdns

# Linux (systemd-resolved)
sudo resolvectl flush-caches
# Older systemd releases: sudo systemd-resolve --flush-caches
```
Common Problems
Slow Propagation:
- Check current TTL (old TTL must expire first)
- Lower TTL 24-48 hours before changes
- Use propagation checkers: whatsmydns.net, dnschecker.org
CNAME at Zone Apex:
- Error: Cannot use CNAME at @ (zone apex)
- Solution: Use an ALIAS record (Route53 Alias, Cloudflare CNAME flattening) or an A record
external-dns Not Creating Records:
- Verify annotation spelling: external-dns.alpha.kubernetes.io/hostname
- Check the domain filter matches: --domain-filter=example.com
- Review external-dns logs for errors
- Confirm provider credentials configured
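The first two checks can be automated against a parsed Service manifest, for example the dict returned by a YAML loader. This is a hypothetical helper, not part of external-dns. It approximates domain matching as an exact match or a suffix match on the filter domain. It splits the hostname annotation on commas because external-dns accepts multiple hostnames there.

```python
from typing import Any, Dict, List

HOSTNAME_KEY = "external-dns.alpha.kubernetes.io/hostname"


def check_service(manifest: Dict[str, Any], domain_filter: str) -> List[str]:
    """Flag a missing/misspelled hostname annotation or an out-of-filter host."""
    annotations = (manifest.get("metadata") or {}).get("annotations") or {}
    if HOSTNAME_KEY not in annotations:
        # Surface near-misses such as a typo in the annotation key.
        near = sorted(k for k in annotations if "external-dns" in k)
        hint = f" (found: {', '.join(near)})" if near else ""
        return [f"missing {HOSTNAME_KEY}{hint}"]
    problems = []
    for host in annotations[HOSTNAME_KEY].split(","):
        host = host.strip().rstrip(".")
        if host != domain_filter and not host.endswith("." + domain_filter):
            problems.append(f"{host} is outside --domain-filter={domain_filter}")
    return problems
```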
For detailed troubleshooting, see references/troubleshooting.md.
Common Patterns
Pattern 1: Kubernetes DNS Automation
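```bash
# Deploy external-dns (once per cluster)
helm repo add external-dns https://kubernetes-sigs.github.io/external-dns/
helm install external-dns external-dns/external-dns \
  --set provider.name=aws \
  --set 'domainFilters[0]=example.com' \
  --set policy=sync
# Charts before 1.14 take --set provider=aws instead of provider.name
```

```yaml
# Then annotate Services
apiVersion: v1
kind: Service
metadata:
  name: api
  annotations:
    external-dns.alpha.kubernetes.io/hostname: api.example.com
    external-dns.alpha.kubernetes.io/ttl: "300"
spec:
  type: LoadBalancer
  ports:
    - port: 80
```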
Pattern 2: Multi-Provider Sync with OctoDNS
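```yaml
# octodns-config.yaml
# Preview: octodns-sync --config-file=octodns-config.yaml (add --doit to apply)
providers:
  config:
    class: octodns.provider.yaml.YamlProvider
    directory: ./config
  route53:
    class: octodns_route53.Route53Provider
    # env/NAME reads the value from an environment variable
    access_key_id: env/AWS_ACCESS_KEY_ID
    secret_access_key: env/AWS_SECRET_ACCESS_KEY
  cloudflare:
    class: octodns_cloudflare.CloudflareProvider
    token: env/CLOUDFLARE_TOKEN
zones:
  example.com.:
    sources:
      - config
    targets:
      - route53
      - cloudflare
```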
Pattern 3: DNS-Based Failover
```hcl
# Route53 with health checks
resource "aws_route53_health_check" "primary" {
  fqdn              = "primary.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30
}

resource "aws_route53_record" "primary" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "primary"

  failover_routing_policy {
    type = "PRIMARY"
  }

  health_check_id = aws_route53_health_check.primary.id
  records         = ["192.0.2.1"]
}

resource "aws_route53_record" "secondary" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "secondary"

  failover_routing_policy {
    type = "SECONDARY"
  }

  records = ["192.0.2.2"]
}
```
Integration with Other Skills
infrastructure-as-code:
- Manage DNS via Terraform/Pulumi alongside other resources
- Zone configuration in IaC repositories
kubernetes-operations:
- external-dns automates DNS for Kubernetes workloads
- Ingress controller integration for automatic DNS
load-balancing-patterns:
- DNS-based load balancing (GeoDNS, weighted routing)
- Health checks and failover configurations
security-hardening:
- DNSSEC for DNS integrity
- CAA records for certificate authority control
- DNS-based DDoS mitigation
secret-management:
- Store DNS provider API credentials in vaults
- Secure DDNS update mechanisms
Additional Resources
Reference Documentation:
- references/record-types.md - Detailed record type guide with examples
- references/ttl-strategies.md - TTL scenarios and propagation calculations
- references/cloud-providers.md - Provider comparison and detailed features
- references/troubleshooting.md - Common problems and solutions
- references/dns-as-code-comparison.md - Tool comparison matrix
Examples:
- examples/external-dns/ - Kubernetes DNS automation
- examples/octodns/ - Multi-provider sync with YAML
- examples/dnscontrol/ - Multi-provider with JavaScript DSL
- examples/terraform/ - Cloud provider configurations
- examples/load-balancing/ - GeoDNS and failover patterns
Scripts:
- scripts/check-dns-propagation.sh - Verify propagation across resolvers
- scripts/validate-dns-config.py - Validate DNS configuration
- scripts/export-dns-records.sh - Export existing DNS records
- scripts/calculate-ttl-propagation.py - Calculate propagation time
Quick Reference
Record Types Cheat Sheet
| Record | Purpose | Example |
|---|---|---|
| A | IPv4 address | example.com → 192.0.2.1 |
| AAAA | IPv6 address | example.com → 2001:db8::1 |
| CNAME | Alias to domain | www → example.com |
| MX | Mail server | 10 mail.example.com |
| TXT | Text/verification | "v=spf1 include:_spf.google.com ~all" |
| SRV | Service location | 10 60 5060 sip.example.com |
| NS | Nameserver delegation | ns1.provider.com |
| CAA | CA authorization | 0 issue "letsencrypt.org" |
TTL Cheat Sheet
| Scenario | TTL | Why |
|---|---|---|
| Stable production | 3600s | Balance speed/load |
| Before change | 300s | Fast propagation |
| Failover | 60-300s | Fast recovery |
| NS records | 86400s | Very stable |
Provider Cheat Sheet
| Provider | Best For | Key Feature |
|---|---|---|
| Route53 | AWS | Advanced routing, health checks |
| Cloud DNS | GCP | DNSSEC, private zones |
| Azure DNS | Azure | Traffic Manager integration |
| Cloudflare | Multi-cloud | Fastest, DDoS protection, free tier |
Tool Cheat Sheet
| Tool | Use When |
|---|---|
| external-dns | Kubernetes DNS automation |
| OctoDNS | Multi-provider, Python shop |
| DNSControl | Multi-provider, JavaScript preference |
| Terraform | Managing DNS with other infrastructure |