Data Encoding Formats: Base64, Hex, URL Encoding Explained

Encoding is the process of converting data from one representation to another so it can be safely transmitted, stored, or processed by different systems. It is critical to understand that encoding is not a security mechanism — it provides no confidentiality whatsoever. Anyone who knows the encoding scheme (and the schemes are public by design) can reverse the process instantly. Encoding exists to solve compatibility and transport problems, not to protect data from unauthorized access.

Base64 Encoding

Base64 is the most widely used binary-to-text encoding on the web. It maps every three bytes of binary data to four printable ASCII characters drawn from an alphabet of 64 characters (A-Z, a-z, 0-9, +, /), with = used for padding. This makes it safe to embed binary content in text-based formats like JSON, XML, HTML, and email bodies.

The trade-off is a 33% size overhead — every 3 bytes of input produce 4 bytes of output. A 3 MB image becomes roughly 4 MB when Base64-encoded. Despite this cost, Base64 is the standard choice for embedding images as data URIs, encoding email attachments via MIME, transmitting binary payloads in JSON APIs, and encoding JWT header and payload segments.

// Base64 encoding example
"Hello World" → "SGVsbG8gV29ybGQ="

// Three bytes (24 bits) become four Base64 characters (24 bits)
// Input:  01001000 01100101 01101100
// Split:  010010 000110 010101 101100
// Index:  18     6      21     44
// Output: S      G      V      s
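
The bit-splitting above can be reproduced in a few lines of JavaScript. This is a minimal sketch rather than a full encoder: it handles exactly one three-byte group and ignores padding, and the helper name encodeTriple is a hypothetical one chosen for illustration.

```javascript
// Standard Base64 alphabet (RFC 4648), indices 0..63.
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode exactly three bytes into four Base64 characters.
// encodeTriple is a hypothetical helper name, not a built-in.
function encodeTriple(b0, b1, b2) {
  // Pack the three bytes into one 24-bit integer.
  const bits = (b0 << 16) | (b1 << 8) | b2;
  // Slice the 24 bits into four 6-bit indices, most significant first.
  const indices = [
    (bits >> 18) & 0x3f,
    (bits >> 12) & 0x3f,
    (bits >> 6) & 0x3f,
    bits & 0x3f,
  ];
  return indices.map((i) => BASE64_ALPHABET[i]).join("");
}

const firstGroup = encodeTriple(0x48, 0x65, 0x6c); // the bytes of "Hel"
console.log(firstGroup); // "SGVs"
```

In real code you would normally rely on a built-in such as btoa() or Node's Buffer rather than a hand-written encoder.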

A URL-safe variant replaces + with - and / with _, preventing conflicts with URL reserved characters. JWTs use this URL-safe variant (often called Base64URL) and omit the padding = characters entirely.
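
As a rough illustration of the substitution described above, the sketch below converts between standard Base64 and the unpadded URL-safe form. It assumes a Node.js environment (for Buffer); the helper names toBase64Url and fromBase64Url and the example.com URL are hypothetical.

```javascript
// Convert standard Base64 text to the URL-safe, unpadded form.
function toBase64Url(base64) {
  return base64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Convert back, restoring padding so the length is a multiple of four.
function fromBase64Url(base64url) {
  const base64 = base64url.replace(/-/g, "+").replace(/_/g, "/");
  return base64 + "=".repeat((4 - (base64.length % 4)) % 4);
}

// Bytes chosen so that the standard encoding contains both + and /.
const bytes = Buffer.from([0xfb, 0xff, 0xfe]);
const standard = bytes.toString("base64"); // "+//+"
const urlSafe = toBase64Url(standard);     // "-__-"

// The URL-safe form can be placed in a query parameter as-is.
const url = new URL("https://example.com/download"); // hypothetical endpoint
url.searchParams.set("token", urlSafe);
console.log(standard, urlSafe, url.searchParams.get("token"));
```

Dropping the padding is lossless because the decoder can recompute it from the string length, which is what fromBase64Url does.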

Hexadecimal Encoding

Hexadecimal (hex) encoding represents each byte as two characters from the alphabet 0-9 and a-f. Since each hex digit encodes exactly 4 bits (a nibble), a single byte (8 bits) always maps to exactly two hex characters. This 1:2 byte-to-character ratio means hex encoding has a 100% size overhead — double the size of the original data — making it less space-efficient than Base64.

// Hexadecimal encoding
"Hello" → "48656c6c6f"

// Each byte becomes two hex characters
H = 0x48, e = 0x65, l = 0x6c, l = 0x6c, o = 0x6f
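
The mapping is simple enough to write by hand. The following sketch shows one possible pair of helpers (toHex and fromHex are hypothetical names) that work on a Uint8Array; it is meant to illustrate the two-characters-per-byte rule, not to replace a library routine.

```javascript
// Encode a byte sequence as lowercase hex, two characters per byte.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Decode a hex string back into bytes.
// Rejects odd lengths and any character outside 0-9, a-f, A-F.
function fromHex(hex) {
  if (hex.length % 2 !== 0 || /[^0-9a-fA-F]/.test(hex)) {
    throw new Error("invalid hex string");
  }
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

const helloBytes = new TextEncoder().encode("Hello");
console.log(toHex(helloBytes)); // "48656c6c6f"
console.log(new TextDecoder().decode(fromHex("48656c6c6f"))); // "Hello"
```

The padStart call matters: without it, a byte such as 0x0a would be emitted as "a" and the byte boundaries would be lost.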

Despite the higher overhead, hex is invaluable for debugging and low-level work because each byte boundary is visually obvious. You will encounter hex encoding in:

  • Hash digests — SHA-256 outputs are almost always displayed as 64 hex characters (32 bytes)
  • CSS color codes — #ff6600 represents three bytes: red (255), green (102), blue (0)
  • MAC addresses — 00:1A:2B:3C:4D:5E is six bytes, written as colon-separated hex pairs
  • Memory dumps and hex editors — byte-level inspection of files and network packets
  • UUID strings — 550e8400-e29b-41d4-a716-446655440000 is 16 bytes written as 32 hex digits with dashes
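
To make two of these cases concrete, here is a small sketch that reads a six-digit CSS color as three bytes and counts the bytes in a UUID string. parseHexColor is a hypothetical helper and deliberately ignores the 3-, 4-, and 8-digit color forms.

```javascript
// Parse a six-digit CSS hex color such as "#ff6600" into its byte values.
function parseHexColor(color) {
  const match = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(color);
  if (!match) throw new Error("expected a color like #rrggbb");
  const [red, green, blue] = match.slice(1).map((pair) => parseInt(pair, 16));
  return { red, green, blue };
}

console.log(parseHexColor("#ff6600")); // { red: 255, green: 102, blue: 0 }

// A UUID string is 32 hex digits plus four dashes, so it holds 16 bytes.
const uuid = "550e8400-e29b-41d4-a716-446655440000";
const uuidByteCount = uuid.replace(/-/g, "").length / 2;
console.log(uuidByteCount); // 16
```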

URL / Percent Encoding

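URL encoding (also called percent encoding) is defined in RFC 3986 and ensures that arbitrary data can be safely included in a URL. Characters that have special meaning in URLs (such as ?, &, =, #, and /), along with characters that are not allowed in a URL at all (such as spaces), are replaced with a percent sign followed by the two-digit hex value of the byte. Non-ASCII characters are first encoded as UTF-8, and each resulting byte is percent-encoded separately, which is why é becomes %C3%A9.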

// URL encoding examples
"hello world"        → "hello%20world"
"price=10&qty=2"     → "price%3D10%26qty%3D2"
"café"              → "caf%C3%A9"

// In JavaScript
encodeURIComponent("hello world")  // "hello%20world"
encodeURIComponent("a=1&b=2")     // "a%3D1%26b%3D2"

RFC 3986 defines unreserved characters that never need encoding: A-Z, a-z, 0-9, -, _, ., and ~. Everything else in a query parameter or path segment should be percent-encoded. In JavaScript, encodeURIComponent() is the right tool for individual components, although it leaves !, ', (, ), and * unescaped even though RFC 3986 lists them as reserved. encodeURI() preserves structural characters like / and ? and is meant for encoding full URIs.
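
The difference between the two built-ins is easiest to see side by side. The sketch below also includes a stricter wrapper, encodeRfc3986 (a hypothetical name), that additionally escapes the characters !, ', (, ), and *, which encodeURIComponent() leaves untouched. Treat it as one possible approach for cases where only unreserved characters should remain literal.

```javascript
const value = "a/b?c=d&e f";

// Escapes structural characters, so the value is safe inside one component.
console.log(encodeURIComponent(value)); // "a%2Fb%3Fc%3Dd%26e%20f"

// Leaves structural characters alone; intended for whole URIs.
console.log(encodeURI(value)); // "a/b?c=d&e%20f"

// Stricter variant: also escape ! ' ( ) * so only unreserved characters stay literal.
function encodeRfc3986(component) {
  return encodeURIComponent(component).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeRfc3986("it's (ok)!*")); // "it%27s%20%28ok%29%21%2A"

// Decoding reverses the UTF-8 byte escapes.
console.log(decodeURIComponent("caf%C3%A9")); // "café"
```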

ASCII85 / Base85

ASCII85 (also called Base85) is a more space-efficient alternative to Base64 that encodes four bytes of binary data into five ASCII characters, yielding only a 25% overhead compared to Base64's 33%. It uses 85 printable characters from the ASCII range 33 (!) through 117 (u).

ASCII85 was popularized by Adobe for use in PostScript and PDF files, where it appears between <~ and ~> delimiters. A variant called Z85 is used in the ZeroMQ messaging library. Despite its better efficiency, ASCII85 never achieved the ubiquity of Base64 because its larger character set includes characters that are problematic in certain contexts — for example, the double quote, backslash, and angle brackets that would need escaping in JSON, XML, or HTML.
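
The four-bytes-to-five-characters step can be sketched as follows. This is a simplified illustration under stated assumptions: ascii85Encode is a hypothetical helper, it does not emit the <~ and ~> delimiters, and it omits the "z" shorthand that Adobe's format uses for an all-zero group.

```javascript
// Encode bytes as ASCII85 using characters 33 ('!') through 117 ('u').
// Simplified: no <~ ~> delimiters and no "z" shorthand for zero groups.
function ascii85Encode(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 4) {
    const n = Math.min(4, bytes.length - i); // bytes in this group
    // Build a 32-bit big-endian value, treating missing bytes as zero.
    // Multiplication is used instead of << to avoid signed 32-bit overflow.
    let value = 0;
    for (let j = 0; j < 4; j++) {
      value = value * 256 + (j < n ? bytes[i + j] : 0);
    }
    // Extract five base-85 digits, least significant first, filling from the right.
    const digits = new Array(5);
    for (let j = 4; j >= 0; j--) {
      digits[j] = String.fromCharCode(33 + (value % 85));
      value = Math.floor(value / 85);
    }
    // A partial group of n bytes emits only n + 1 characters.
    out += digits.slice(0, n + 1).join("");
  }
  return out;
}

const encoder = new TextEncoder();
console.log(ascii85Encode(encoder.encode("Man "))); // "9jqo^"
console.log(ascii85Encode(encoder.encode(".")));    // "/c"
```

Note that the output alphabet includes characters such as the double quote, the backslash, and angle brackets, so a result like this would still need escaping before being embedded in JSON or HTML.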

Encoding Comparison

The following table summarizes the key characteristics of each encoding scheme:

Encoding     | Overhead | Alphabet Size | Primary Use Cases
-------------|----------|---------------|----------------------------------
Base64       | ~33%     | 64 chars      | Email, data URIs, APIs, JWTs
Hex          | 100%     | 16 chars      | Hash digests, debugging, colors
URL/Percent  | Variable | %XX per byte  | URLs, query strings, form data
ASCII85      | ~25%     | 85 chars      | PostScript, PDF, ZeroMQ
Base32       | ~60%     | 32 chars      | TOTP secrets, case-insensitive contexts
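
The overhead figures in the table follow directly from each scheme's group size. As a rough check, the sketch below computes the encoded length for a given number of input bytes; the encodedLength table and overheadPercent helper are hypothetical names, padded output is assumed for Base32 and Base64, and the Base64 and hex figures are compared against Node's Buffer.

```javascript
// Encoded length in characters for n input bytes.
const encodedLength = {
  hex: (n) => 2 * n,                   // 1 byte  -> 2 chars
  base32: (n) => 8 * Math.ceil(n / 5), // 5 bytes -> 8 chars (padded)
  base64: (n) => 4 * Math.ceil(n / 3), // 3 bytes -> 4 chars (padded)
  // 4 bytes -> 5 chars; a partial group of k bytes emits k + 1 chars.
  ascii85: (n) => 5 * Math.floor(n / 4) + (n % 4 === 0 ? 0 : (n % 4) + 1),
};

function overheadPercent(n, scheme) {
  return ((encodedLength[scheme](n) - n) / n) * 100;
}

const inputSize = 3000;
for (const scheme of Object.keys(encodedLength)) {
  const length = encodedLength[scheme](inputSize);
  console.log(scheme, length, overheadPercent(inputSize, scheme).toFixed(1) + "%");
}
// hex 6000 100.0%
// base32 4800 60.0%
// base64 4000 33.3%
// ascii85 3750 25.0%

// Cross-check two of the formulas against Node's built-in encoders.
const sample = Buffer.alloc(inputSize);
console.log(sample.toString("base64").length, sample.toString("hex").length); // 4000 6000
```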

Choosing the Right Encoding

The best encoding depends on your constraints and the environment where the data will be used:

  • Embedding binary in JSON or XML — Use Base64. It is universally supported and expected in these contexts.
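  • Passing data in URL parameters — Use URL encoding (encodeURIComponent). If the data is binary, first Base64-encode it with the URL-safe alphabet. The unpadded output contains only unreserved characters and needs no further escaping; if you keep the = padding, URL-encode the result.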
  • Displaying hash values or byte sequences — Use hexadecimal. Its fixed two-characters-per-byte format makes byte boundaries obvious.
  • Minimizing size overhead in controlled environments — Consider ASCII85 if you control both ends and the character set is safe for your transport.
  • Case-insensitive systems or human transcription — Use Base32. Its alphabet (A-Z and 2-7 in RFC 4648) is single-case and leaves out the digits 0 and 1, which are easily confused with O and with I or l. That is why it is used for TOTP secret keys and backup codes.
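
Base32 has no dedicated example elsewhere in this guide, so here is a minimal sketch of an encoder, assuming the RFC 4648 alphabet (A-Z, 2-7) with = padding. The function name base32Encode is hypothetical, and decoding is left out for brevity.

```javascript
// RFC 4648 Base32 alphabet: 26 letters plus the digits 2-7.
const BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

// Encode bytes as padded Base32: every 5 bytes become 8 characters.
function base32Encode(bytes) {
  let out = "";
  let buffer = 0;   // bit accumulator
  let bitCount = 0; // number of unread bits in the accumulator
  for (const byte of bytes) {
    buffer = (buffer << 8) | byte;
    bitCount += 8;
    // Emit one character for every complete 5-bit group.
    while (bitCount >= 5) {
      out += BASE32_ALPHABET[(buffer >> (bitCount - 5)) & 0x1f];
      bitCount -= 5;
    }
    // Keep only the bits not yet emitted so the accumulator stays small.
    buffer &= (1 << bitCount) - 1;
  }
  // Left-align any leftover bits into a final 5-bit group.
  if (bitCount > 0) {
    out += BASE32_ALPHABET[(buffer << (5 - bitCount)) & 0x1f];
  }
  // Pad to a multiple of eight characters.
  while (out.length % 8 !== 0) out += "=";
  return out;
}

const text = new TextEncoder();
console.log(base32Encode(text.encode("Hello")));  // "JBSWY3DP"
console.log(base32Encode(text.encode("foobar"))); // "MZXW6YTBOI======"
```

Because the output uses a single case and no easily confused digits, it survives being read aloud or typed by hand, at the cost of the larger size shown in the comparison table.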

In most web development scenarios, Base64 and URL encoding cover the vast majority of needs. Hex is essential for debugging and cryptographic output. ASCII85 and Base32 serve niche roles where their specific properties offer advantages.

Regardless of which encoding you choose, remember the fundamental principle: encoding transforms the representation of data, not its security posture. If you need to protect data, encoding is the wrong tool — use encryption instead.
