YAML for Configuration Files: A Practical Guide
YAML (YAML Ain't Markup Language) is a human-readable data serialization format that has become the de facto standard for configuration files across the DevOps ecosystem. From Docker Compose and Kubernetes manifests to GitHub Actions workflows and Ansible playbooks, YAML's clean indentation-based syntax makes it easier to read and write than JSON for configuration purposes. However, that readability comes with subtle pitfalls that have tripped up developers for decades.
YAML Basics
YAML uses indentation (spaces only — tabs are not allowed) to represent structure. At its core, a YAML document consists of key-value pairs called mappings, where a key and value are separated by a colon and a space:
# A simple YAML document
name: my-application
version: 2.4.1
description: A web service for processing uploads
port: 8080
debug: false
Nesting is expressed through indentation. Each level of indentation typically uses two spaces, though any consistent number of spaces works. The important rule is that sibling elements must share the same indentation level:
server:
  host: 0.0.0.0
  port: 3000
  ssl:
    enabled: true
    cert: /etc/ssl/cert.pem
    key: /etc/ssl/key.pem
Data Types
YAML automatically infers data types, which is both convenient and a source of bugs:
# Strings
name: hello world # Unquoted string
title: "Hello: World" # Quoted (necessary when the value contains a colon followed by a space)
path: '/usr/local/bin' # Single-quoted (no escape processing)
# Numbers
count: 42 # Integer
price: 19.99 # Float
hex_value: 0xff # Hexadecimal integer (255)
octal_value: 0o777 # Octal integer (511) in YAML 1.2; YAML 1.1 writes octal as 0777
# Booleans
enabled: true # Boolean true
disabled: false # Boolean false
# Null
value: null # Explicit null
other: ~ # Tilde is also null
empty: # Empty value is null
# Arrays (sequences)
fruits:
  - apple
  - banana
  - cherry
# Inline array (flow style)
colors: [red, green, blue]
# Inline mapping (flow style)
point: {x: 10, y: 20}
Multi-line Strings
YAML provides two powerful block scalar styles for multi-line strings, controlled by the indicator character after the key:
Literal Block Scalar (|)
The pipe character preserves all line breaks exactly as written. This is ideal for embedding scripts, SQL queries, or any content where newlines are significant:
script: |
  #!/bin/bash
  echo "Starting deployment"
  npm install
  npm run build
  echo "Done"
# Result: each line break is preserved in the string
Folded Block Scalar (>)
The greater-than character folds newlines into spaces, creating a single long line. Empty lines become literal newlines. This is useful for long descriptions that you want to wrap in the YAML file for readability:
description: >
  This is a long description
  that spans multiple lines
  in the YAML source file.
# Result: "This is a long description that spans multiple lines in the YAML source file.\n"
Both block scalars support chomping indicators to control trailing newlines: |- and >- strip the final newline, while |+ and >+ keep all trailing newlines.
Anchors and Aliases
YAML supports DRY (Don't Repeat Yourself) principles through anchors and aliases. An anchor (&) marks a node for reuse, and an alias (*) references it:
defaults: &default_settings
  timeout: 30
  retries: 3
  log_level: info
production:
  <<: *default_settings # Merge key: inherits all default_settings
  log_level: warn # Override specific values
staging:
  <<: *default_settings
  timeout: 60
The << merge key combines the anchor's mappings into the current mapping, with local values taking precedence. The merge key is defined as an optional YAML 1.1 type rather than in the YAML 1.2 core schema, so support depends on the parser. It is particularly useful in Docker Compose files for sharing configuration across services.
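
As a rough sketch of the Docker Compose use case mentioned above, shared settings can live under a top-level key that starts with x- (Compose treats such keys as extension fields and ignores them) and be merged into each service. The x-common key, the &common anchor, and the specific settings are hypothetical names chosen for illustration:

```yaml
# Hypothetical shared block; Compose ignores top-level keys starting with "x-"
x-common: &common
  restart: unless-stopped
  environment:
    NODE_ENV: production

services:
  web:
    <<: *common          # pulls in restart and environment
    image: nginx:alpine
  api:
    <<: *common
    build: ./api
```

Note that the merge is shallow: if a service defines its own environment mapping, it replaces the merged one entirely rather than being combined with it.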
Common Pitfalls
The Norway Problem
This is YAML's most infamous gotcha. In YAML 1.1 (which many parsers still implement by default), certain unquoted strings are interpreted as booleans:
# YAML 1.1 boolean interpretations: all of these become true or false!
countries:
  - GB # String "GB", fine
  - FR # String "FR", fine
  - NO # Boolean false! Norway disappears
  - DE # String "DE", fine
# Also affected:
enabled: yes # Boolean true (not the string "yes")
enabled: on # Boolean true (not the string "on")
answer: NO # Boolean false (not the string "NO")
value: off # Boolean false (not the string "off")
The fix is to always quote strings that could be misinterpreted: "NO", "yes", "on", "off". YAML 1.2 restricted booleans to only true and false, but many tools still use YAML 1.1 parsers.
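
Applied to the example above, the quoted version might look like the following sketch. Quoting every country code, not just "NO", is a style choice made here for consistency rather than a requirement:

```yaml
countries:
  - "GB"
  - "FR"
  - "NO"   # now the string "NO" under both YAML 1.1 and 1.2
  - "DE"
answer: "NO"    # string, not boolean false
value: "off"    # string, not boolean false
enabled: true   # when a boolean is actually intended, write true or false
```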
Implicit Type Coercion
Numbers and version strings can also be silently misinterpreted:
version: 1.0 # Float 1.0 (not the string "1.0")
version: 1.20 # Float 1.2 (trailing zero dropped!)
phone: 0123456 # Octal number 42798 in YAML 1.1!
timestamp: 2024-01-15 # Parsed as a Date object, not a string
# Safe versions — always quote when in doubt:
version: "1.0"
version: "1.20"
phone: "0123456"
timestamp: "2024-01-15"
Indentation Errors
Since YAML uses indentation for structure, a single misplaced space can change the meaning of your document or cause a parse error. Common mistakes include mixing tabs and spaces (YAML forbids tabs for indentation), inconsistent indentation depths, and accidentally indenting a key under the wrong parent.
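
The "wrong parent" case is the hardest to spot because the result is still valid YAML. A small illustration, reusing the server settings from earlier in this guide and split into two documents with ---:

```yaml
# Intended: key belongs to the ssl mapping
server:
  ssl:
    enabled: true
    key: /etc/ssl/key.pem
---
# Two spaces too shallow: key is now a sibling of ssl, directly under server.
# A parser accepts this without any error.
server:
  ssl:
    enabled: true
  key: /etc/ssl/key.pem
```

In the second document, an application looking up server.ssl.key would find nothing, and the problem would only surface at runtime.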
YAML in Practice
Docker Compose
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./html:/usr/share/nginx/html:ro
    depends_on:
      - api
  api:
    build: ./api
    environment:
      DATABASE_URL: postgres://db:5432/myapp
      NODE_ENV: production
    ports:
      - "3000:3000"
GitHub Actions
name: CI
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "20"
      - run: npm ci
      - run: npm test
Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  labels:
    app: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: myapp:latest
          ports:
            - containerPort: 8080
YAML vs JSON for Configuration
JSON and YAML can represent the same data structures, but they serve different purposes well. JSON's strict syntax (no comments, mandatory quoting, explicit commas) makes it ideal for machine-to-machine data exchange where ambiguity is unacceptable. YAML's human-friendly features — comments, multi-line strings, anchors, and implicit typing — make it the preferred choice for configuration files that humans read and edit frequently.
That said, YAML's flexibility is a double-edged sword. The implicit type coercion that makes YAML concise also causes the Norway problem and version string bugs. JSON's explicitness eliminates these issues entirely. For programmatic config generation, JSON is often safer. For hand-authored config files with comments, YAML is usually more pleasant.
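
The two formats are closer than they look: YAML 1.2 was designed so that JSON documents are also valid YAML. As an illustration, both documents below describe the same data, the first in block style and the second in JSON syntax that a YAML 1.2 parser should also accept:

```yaml
# Block style, with implicit typing and room for comments
server:
  host: 0.0.0.0
  port: 3000
---
# JSON syntax: every string is quoted, so nothing is left to type inference
{"server": {"host": "0.0.0.0", "port": 3000}}
```

This means generated configuration can be emitted as JSON and still be consumed by tools that expect YAML.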
Best Practices
- Always quote strings that could be misinterpreted: country codes, version numbers, values like "yes", "no", "on", "off", and anything starting with a special character
- Use a linter: tools like yamllint catch indentation errors, implicit typing issues, and style inconsistencies before they reach production
- Set a consistent indentation: two spaces is the most common convention; enforce it with editor settings and linter rules
- Add comments: YAML supports # comments, so use them to explain non-obvious configuration choices
- Use anchors sparingly: they reduce repetition but can make files harder to follow if overused
- Validate against a schema: tools like JSON Schema (which also validates YAML) or dedicated YAML schema validators catch structural errors early
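
Several of these practices can be enforced together in a yamllint configuration file. The following is a minimal sketch of a .yamllint file, assuming a reasonably recent yamllint release; the rule selection is a suggestion, and option names should be checked against the yamllint documentation for the installed version:

```yaml
# .yamllint (illustrative settings, not a recommended baseline)
extends: default

rules:
  indentation:
    spaces: 2                      # enforce two-space indentation
  truthy:
    allowed-values: ['true', 'false']   # flag yes/no/on/off used as values
    check-keys: false              # do not flag keys such as "on:" in GitHub Actions
  octal-values:
    forbid-implicit-octal: true    # flag values like 0123456
    forbid-explicit-octal: true    # flag values like 0o777
```

With this file at the repository root, running yamllint on the directory reports violations, which makes it straightforward to add as a CI step.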