YAML for Configuration Files: A Practical Guide

YAML (YAML Ain't Markup Language) is a human-readable data serialization format that has become the de facto standard for configuration files across the DevOps ecosystem. From Docker Compose and Kubernetes manifests to GitHub Actions workflows and Ansible playbooks, YAML's clean indentation-based syntax makes it easier to read and write than JSON for configuration purposes. However, that readability comes with subtle pitfalls that have tripped up developers for years.

YAML Basics

YAML uses indentation (spaces only; tabs are not allowed) to represent structure. The basic building block is the mapping: a collection of key-value pairs in which each key is separated from its value by a colon and a space:

# A simple YAML document
name: my-application
version: 2.4.1
description: A web service for processing uploads
port: 8080
debug: false

Nesting is expressed through indentation. Each level of indentation typically uses two spaces, though any consistent number of spaces works. The important rule is that sibling elements must share the same indentation level:

server:
  host: 0.0.0.0
  port: 3000
  ssl:
    enabled: true
    cert: /etc/ssl/cert.pem
    key: /etc/ssl/key.pem

Data Types

YAML automatically infers data types, which is both convenient and a source of bugs:

# Strings
name: hello world         # Unquoted string
title: "Hello: World"     # Quoted (necessary when value contains a colon)
path: '/usr/local/bin'    # Single-quoted (no escape processing)

# Numbers
count: 42                 # Integer
price: 19.99              # Float
hex_value: 0xff           # Hexadecimal integer (255)
octal_value: 0o777        # Octal integer (511) in YAML 1.2; YAML 1.1 writes octal as 0777

# Booleans
enabled: true             # Boolean true
disabled: false           # Boolean false

# Null
value: null               # Explicit null
other: ~                  # Tilde is also null
empty:                    # Empty value is null

# Arrays (sequences)
fruits:
  - apple
  - banana
  - cherry

# Inline array (flow style)
colors: [red, green, blue]

# Inline mapping (flow style)
point: {x: 10, y: 20}
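
Sequences and mappings can be nested inside one another, which is how most real configuration files end up being structured. As a small sketch (the keys and values here are hypothetical), a sequence whose items are mappings looks like this:

```yaml
# Hypothetical example: each "-" starts a new mapping in the sequence
users:
  - name: alice
    roles: [admin, dev]     # flow-style sequence inside a block mapping
  - name: bob
    roles:
      - dev                 # the same kind of value in block style
```

The keys of one list item (name and roles) must line up with each other, just like sibling keys anywhere else in the document.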

Multi-line Strings

YAML provides two powerful block scalar styles for multi-line strings, controlled by the indicator character after the key:

Literal Block Scalar (|)

The pipe character preserves all line breaks exactly as written. This is ideal for embedding scripts, SQL queries, or any content where newlines are significant:

script: |
  #!/bin/bash
  echo "Starting deployment"
  npm install
  npm run build
  echo "Done"
# Result: each line break is preserved in the string

Folded Block Scalar (>)

The greater-than character folds newlines into spaces, creating a single long line. Empty lines become literal newlines. This is useful for long descriptions that you want to wrap in the YAML file for readability:

description: >
  This is a long description
  that spans multiple lines
  in the YAML source file.
# Result: "This is a long description that spans multiple lines in the YAML source file.\n"

Both block scalars support chomping indicators to control trailing newlines: |- and >- strip the final newline, while |+ and >+ keep all trailing newlines.
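
The chomping indicators are easiest to compare side by side. In this sketch the key names are made up, and the comment on each header line shows the string that value should produce:

```yaml
clip: |         # default: exactly one trailing newline -> "text\n"
  text

strip: |-       # final newline removed -> "text"
  text

keep: |+        # all trailing newlines kept -> "text\n\n"
  text

folded: >-      # lines folded into spaces, then stripped -> "one two"
  one
  two
```

The strip form is usually the one you want for single values such as tokens or commands, where a stray trailing newline can cause surprises.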

Anchors and Aliases

YAML supports DRY (Don't Repeat Yourself) principles through anchors and aliases. An anchor (&) marks a node for reuse, and an alias (*) references it:

defaults: &default_settings
  timeout: 30
  retries: 3
  log_level: info

production:
  <<: *default_settings    # Merge key: inherits all default_settings
  log_level: warn          # Override specific values

staging:
  <<: *default_settings
  timeout: 60

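The << merge key copies the key-value pairs of the aliased mapping into the current mapping, with local values taking precedence. Note that << is an optional YAML 1.1 type rather than part of the YAML 1.2 core schema, so support depends on the parser. It is particularly useful in Docker Compose files for sharing configuration across services.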
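
A sketch of that Docker Compose pattern, assuming a Compose version that supports x- extension fields and a parser that honors the merge key. The x-common key, service names, and image names are hypothetical placeholders:

```yaml
# Shared settings live under an extension field that Compose itself ignores
x-common: &common
  restart: unless-stopped
  environment:
    LOG_LEVEL: info

services:
  app:
    <<: *common                 # pulls in restart and environment
    image: example/app:1.0
  worker:
    <<: *common
    image: example/worker:1.0
    restart: "no"               # local key overrides the merged value
```

The merge is shallow: if a service defines its own environment key, that mapping replaces the shared one as a whole rather than being combined with it.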

Common Pitfalls

The Norway Problem

This is YAML's most infamous gotcha. In YAML 1.1 (which many parsers still implement by default), certain unquoted strings are interpreted as booleans:

# In YAML 1.1, some unquoted country codes are read as booleans
countries:
  - GB    # String "GB" — fine
  - FR    # String "FR" — fine
  - NO    # Boolean false! Norway disappears
  - DE    # String "DE" — fine

# Also affected:
enabled: yes    # Boolean true (not the string "yes")
logging: on     # Boolean true (not the string "on")
answer: NO      # Boolean false (not the string "NO")
value: off      # Boolean false (not the string "off")

The fix is to quote any string that could be misinterpreted: "NO", "yes", "on", "off". YAML 1.2 restricts booleans to true and false, but many tools still use YAML 1.1 parsers.
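
Applied to the list above, the quoted form might look like the following sketch. The feature_flag key is an illustrative addition showing how to write a value that really is meant to be a boolean:

```yaml
countries:
  - "GB"
  - "FR"
  - "NO"      # quoted, so it stays the string "NO"
  - "DE"

answer: "yes"          # the literal string "yes"
feature_flag: true     # an intended boolean, spelled true or false
```

Quoting every item in the list, not just the risky one, keeps the style consistent and avoids having to remember which codes are affected.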

Implicit Type Coercion

Numbers and version strings can also be silently misinterpreted:

version: 1.0      # Float 1.0 (not the string "1.0")
version: 1.20     # Float 1.2 (trailing zero dropped!)
phone: 0123456    # Octal number 42798 in YAML 1.1!
timestamp: 2024-01-15  # Parsed as a date by many parsers (a YAML 1.1 type), not a string

# Safe versions — always quote when in doubt:
version: "1.0"
version: "1.20"
phone: "0123456"
timestamp: "2024-01-15"
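
Quoting is the simplest fix, but it is not the only one. As a hedged alternative, the standard !!str tag tells the parser to treat a value as a string. Most parsers support it, though it is worth checking yours. The key names below are illustrative:

```yaml
version: !!str 1.20             # string "1.20", trailing zero kept
build_date: !!str 2024-01-15    # string, not a date
```

In hand-written configuration, quotes are usually easier to read; explicit tags are more common in generated files.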

Indentation Errors

Since YAML uses indentation for structure, a single misplaced space can change the meaning of your document or cause a parse error. Common mistakes include mixing tabs and spaces (YAML forbids tabs for indentation), inconsistent indentation depths, and accidentally indenting a key under the wrong parent.
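
The last mistake is the hardest to spot because the file still parses. In this sketch the --- line separates two independent documents: the first is what was intended, the second is the same text with one block de-indented:

```yaml
# Intended: ssl is a child of server
server:
  host: 0.0.0.0
  ssl:
    enabled: true

---
# Mistake: ssl lost its indentation and is now a top-level key
server:
  host: 0.0.0.0
ssl:
  enabled: true
```

Both documents are valid YAML, so a parser reports no error; an application looking for server.ssl simply finds nothing in the second one.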

YAML in Practice

Docker Compose

services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./html:/usr/share/nginx/html:ro
    depends_on:
      - api

  api:
    build: ./api
    environment:
      DATABASE_URL: postgres://db:5432/myapp
      NODE_ENV: production
    ports:
      - "3000:3000"

GitHub Actions

name: CI
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "20"
      - run: npm ci
      - run: npm test

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  labels:
    app: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: myapp:latest
          ports:
            - containerPort: 8080

YAML vs JSON for Configuration

JSON and YAML can represent the same data structures, but they serve different purposes well. JSON's strict syntax (no comments, mandatory quoting, explicit commas) makes it ideal for machine-to-machine data exchange where ambiguity is unacceptable. YAML's human-friendly features — comments, multi-line strings, anchors, and implicit typing — make it the preferred choice for configuration files that humans read and edit frequently.

That said, YAML's flexibility is a double-edged sword. The implicit type coercion that makes YAML concise also causes the Norway problem and version string bugs. JSON's explicitness eliminates these issues entirely. For programmatic config generation, JSON is often safer. For hand-authored config files with comments, YAML is usually more pleasant.
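
The two formats are closer than they look: YAML 1.2 was designed so that JSON documents are also valid YAML. As an illustration (the keys are hypothetical), the two documents below, separated by ---, should load as the same data:

```yaml
# Block style, as typically written by hand
server:
  host: 0.0.0.0
  port: 3000
  tags:
    - web
    - prod

---
# The same data in JSON syntax, accepted by YAML parsers as flow style
{"server": {"host": "0.0.0.0", "port": 3000, "tags": ["web", "prod"]}}
```

This makes it possible to generate configuration as JSON programmatically while still feeding it to tools that expect YAML.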

Best Practices

  • Always quote strings that could be misinterpreted: country codes, version numbers, values like "yes", "no", "on", "off", and anything starting with a special character
  • Use a linter: tools like yamllint catch indentation errors, implicit typing issues, and style inconsistencies before they reach production
  • Set a consistent indentation: two spaces is the most common convention; enforce it with editor settings and linter rules
  • Add comments: YAML supports # comments, so use them to explain non-obvious configuration choices
  • Use anchors sparingly: they reduce repetition but can make files harder to follow if overused
  • Validate against a schema: JSON Schema validators (which work on YAML once it has been parsed into data) or dedicated YAML schema validators catch structural errors early
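
Several of these practices can be enforced mechanically. The following is a minimal sketch of a yamllint configuration file (conventionally named .yamllint), using rule names as documented by yamllint; check them against the version you have installed:

```yaml
# .yamllint (sketch; adjust rules to your project)
extends: default

rules:
  indentation:
    spaces: 2                          # enforce two-space indentation
  truthy:
    allowed-values: ["true", "false"]  # flag yes/no/on/off as booleans
  octal-values:
    forbid-implicit-octal: true        # flag values like 0123456
    forbid-explicit-octal: true        # flag values like 0o777
  key-duplicates: enable               # flag repeated keys in one mapping
```

Running the linter in CI, alongside editor integration, catches most of the pitfalls described above before a file is ever deployed.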
