How Base64 Encoding Works: Step by Step
Base64 encoding converts binary data into a text representation using 64 printable ASCII characters. While the concept is simple, the mechanics involve bit manipulation that can seem opaque at first glance. This guide walks through the algorithm step by step, working through a concrete example so you can see exactly how input bytes become Base64 output.
The Core Idea: 6 Bits Instead of 8
Computers store data in bytes of 8 bits, giving 256 possible values (0-255) per byte. Many of those values correspond to non-printable or control characters that text systems cannot handle safely. Base64 sidesteps this by working in 6-bit groups instead of 8-bit bytes. Six bits give 2^6 = 64 possible values, each mapped to a character from the Base64 alphabet, all of which are safe, printable ASCII characters.
The mathematical consequence: every 3 input bytes (3 x 8 = 24 bits) map to exactly 4 output characters (4 x 6 = 24 bits). This is why Base64 output is roughly one-third larger than the input, and slightly more than that when padding is added.
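The size relationship can be checked with a few lines of Python. This is a minimal sketch: the helper name encoded_length is made up for illustration, and it assumes padded output with no line breaks.

```python
import base64

def encoded_length(n_bytes: int) -> int:
    """Padded Base64 length: 4 characters per (possibly partial) 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)

# Compare the formula with Python's standard-library encoder.
for n in (0, 1, 2, 3, 5, 300):
    actual = len(base64.b64encode(bytes(n)))
    assert actual == encoded_length(n)
    print(f"{n} input bytes -> {actual} output characters")
```

For inputs whose length is a multiple of 3 the overhead is exactly one-third; short inputs such as 1 or 2 bytes pay proportionally more because of padding.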
The Base64 Alphabet
Each 6-bit value (0 through 63) maps to a specific character. Here is the complete lookup table defined by RFC 4648:
Index  Char    Index  Char    Index  Char    Index  Char
-----  ----    -----  ----    -----  ----    -----  ----
    0  A          16  Q          32  g          48  w
    1  B          17  R          33  h          49  x
    2  C          18  S          34  i          50  y
    3  D          19  T          35  j          51  z
    4  E          20  U          36  k          52  0
    5  F          21  V          37  l          53  1
    6  G          22  W          38  m          54  2
    7  H          23  X          39  n          55  3
    8  I          24  Y          40  o          56  4
    9  J          25  Z          41  p          57  5
   10  K          26  a          42  q          58  6
   11  L          27  b          43  r          59  7
   12  M          28  c          44  s          60  8
   13  N          29  d          45  t          61  9
   14  O          30  e          46  u          62  +
   15  P          31  f          47  v          63  /
The pattern is straightforward: uppercase letters first (A-Z = 0-25), then lowercase (a-z = 26-51), then digits (0-9 = 52-61), and finally + and / for indices 62 and 63. The = character is reserved for padding and does not appear in this table.
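Because the alphabet follows this regular pattern, it can be built from standard character ranges rather than typed out. A small sketch in Python, where the constant name ALPHABET is chosen here for illustration:

```python
import string

# Uppercase (0-25), lowercase (26-51), digits (52-61), then "+" and "/".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

assert len(ALPHABET) == 64
print(ALPHABET[18], ALPHABET[44], ALPHABET[60])  # S s 8
```

Indexing the string with a 6-bit value performs the table lookup directly.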
Step-by-Step Example: Encoding "Hello"
Let us walk through encoding the string Hello from start to finish.
Step 1: Convert Characters to ASCII Byte Values
H = 72
e = 101
l = 108
l = 108
o = 111
Step 2: Convert Byte Values to Binary
H = 01001000
e = 01100101
l = 01101100
l = 01101100
o = 01101111
Step 3: Concatenate All Bits
Join all the binary representations into one continuous bit stream:
01001000 01100101 01101100 01101100 01101111
That gives us 40 bits total (5 bytes x 8 bits).
Step 4: Split into 6-Bit Groups
Regroup the bits into chunks of 6. Since 40 is not a multiple of 6, the last group will be short and needs to be padded with zero bits on the right:
010010 | 000110 | 010101 | 101100 | 011011 | 000110 | 1111[00]
                                                           ^^ padding zeros
We now have 7 six-bit groups. The last group, 111100, consists of the original bits 1111 followed by two padding zeros.
Step 5: Look Up Each 6-Bit Value in the Alphabet
010010 = 18 → S
000110 = 6 → G
010101 = 21 → V
101100 = 44 → s
011011 = 27 → b
000110 = 6 → G
111100 = 60 → 8
Step 6: Apply Padding
The output length must be a multiple of 4 characters. We have 7 characters so far, so we need 1 padding character to reach 8:
SGVsbG8=
And there it is: Hello encodes to SGVsbG8=.
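The six steps translate almost line for line into Python. The following is a teaching sketch that works on strings of "0" and "1" for readability; it is not how production encoders are written. The function name encode_stepwise is an assumption made for this example.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_stepwise(data: bytes) -> str:
    # Steps 2-3: write each byte as 8 binary digits and join them.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Step 4: right-pad with zero bits to a multiple of 6, then split.
    bits += "0" * (-len(bits) % 6)
    groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    # Step 5: look up each 6-bit value in the alphabet.
    chars = "".join(ALPHABET[int(group, 2)] for group in groups)
    # Step 6: pad with "=" so the length is a multiple of 4.
    return chars + "=" * (-len(chars) % 4)

# Step 1: turn the characters into bytes (ASCII here).
encoded = encode_stepwise("Hello".encode("ascii"))
print(encoded)  # SGVsbG8=

# Cross-check against the standard library.
assert encoded == base64.b64encode(b"Hello").decode("ascii")
```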
Why 3 Bytes Become 4 Characters
The math is clean when the input length is a multiple of 3. Three bytes contain 24 bits. Dividing 24 bits into groups of 6 gives exactly 4 groups, and 4 groups produce 4 characters. No padding is needed.
3 bytes = 24 bits = 4 groups of 6 bits = 4 Base64 characters
Input:  [byte1][byte2][byte3]
        [8 bits][8 bits][8 bits] = 24 bits
Output: [char1][char2][char3][char4]
        [6 bits][6 bits][6 bits][6 bits] = 24 bits
This 3-to-4 ratio is the source of the 33% size overhead inherent in Base64. Every three bytes of input produce exactly four characters of output.
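Real encoders usually do this regrouping with integer shifts and masks rather than bit strings. A minimal sketch for a single complete 3-byte group, with encode_triplet as a hypothetical helper name:

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_triplet(b1: int, b2: int, b3: int) -> str:
    # Pack the three bytes into one 24-bit integer.
    n = (b1 << 16) | (b2 << 8) | b3
    # Extract four 6-bit values, most significant first (0x3F = 0b111111).
    return "".join(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))

print(encode_triplet(*b"ABC"))  # QUJD
print(encode_triplet(*b"Hel"))  # SGVs
```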
Padding Rules
When the input byte count is not a multiple of 3, padding ensures the output length is always a multiple of 4. There are three cases:
Case 1: Input Length is a Multiple of 3
No padding needed. The output divides evenly into 4-character groups.
"ABC" (3 bytes) → "QUJD" (4 chars, no padding)
Case 2: Input Length Leaves a Remainder of 1
One leftover byte provides 8 bits. You need 12 bits (two 6-bit groups), so 4 zero bits are appended. The two resulting characters are followed by == to pad the output to 4 characters.
"A" (1 byte) → "QQ==" (2 data chars + 2 padding chars)
Case 3: Input Length Leaves a Remainder of 2
Two leftover bytes provide 16 bits. You need 18 bits (three 6-bit groups), so 2 zero bits are appended. The three resulting characters are followed by a single =.
"AB" (2 bytes) → "QUI=" (3 data chars + 1 padding char)
Summary Table
Input bytes mod 3   Padding chars   Output chars
-----------------   -------------   ------------
0                   0               4n
1                   2               4n + 4
2                   1               4n + 4
Here n is the number of complete 3-byte groups in the input. By looking at the number of = signs at the end of a Base64 string, you can immediately tell how many "extra" bytes were in the last group: two = signs mean one extra byte, one = sign means two extra bytes, and no padding means the input was evenly divisible by three.
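The same rule lets you compute the original byte count from an encoded string without decoding it. A sketch under the assumption that the input is well-formed, padded Base64 with no embedded whitespace; decoded_length is a name invented for this example.

```python
import base64

def decoded_length(encoded: str) -> int:
    # Count trailing "=" characters (0, 1, or 2 in valid input).
    padding = len(encoded) - len(encoded.rstrip("="))
    # Each 4-character group carries 3 bytes, minus one byte per "=".
    return len(encoded) // 4 * 3 - padding

for raw in (b"ABC", b"A", b"AB", b"Hello"):
    encoded = base64.b64encode(raw).decode("ascii")
    assert decoded_length(encoded) == len(raw)
    print(raw, "->", encoded, "->", decoded_length(encoded), "bytes")
```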
Decoding: The Reverse Process
Decoding Base64 is the exact reverse:
- Remove any = padding characters.
- Convert each Base64 character back to its 6-bit index value using the alphabet table.
- Concatenate all 6-bit groups into a continuous bit stream.
- Split the bit stream into 8-bit bytes.
- Discard any trailing bits that were added as zero-padding during encoding (these will always be zero and not part of the original data).
For SGVsbG8=, the decoder strips the =, looks up S=18, G=6, V=21, s=44, b=27, G=6, 8=60, reconstructs the bit stream, regroups into bytes, and recovers the original Hello.
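The decoding steps can be sketched the same way as encoding. This version again uses bit strings for clarity and assumes valid standard-alphabet input; any other character raises KeyError. The names decode_stepwise and INDEX are illustrative only.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
# Reverse lookup: character -> 6-bit index.
INDEX = {char: i for i, char in enumerate(ALPHABET)}

def decode_stepwise(encoded: str) -> bytes:
    # Remove the padding characters.
    stripped = encoded.rstrip("=")
    # Map each character to its 6-bit value and concatenate the bits.
    bits = "".join(f"{INDEX[char]:06b}" for char in stripped)
    # Keep only whole bytes; leftover bits are the encoder's zero padding.
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))

print(decode_stepwise("SGVsbG8="))  # b'Hello'
```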
Common Encoding Pitfalls
A few issues that frequently trip up developers:
- Character encoding matters. Base64 operates on bytes, not characters. The ASCII-only string "Hello" happens to produce the same bytes in UTF-8 and ISO-8859-1, but different bytes in UTF-16, and any non-ASCII character (such as é) is encoded differently in all three. Always be explicit about which character encoding you use before applying Base64.
- Line breaks in MIME Base64. The MIME variant limits encoded lines to at most 76 characters, separated by CRLF. Decoders must be prepared to ignore whitespace. The "plain" Base64 from RFC 4648 does not include line breaks.
- Whitespace handling. Some systems insert spaces, tabs, or newlines into Base64 strings. A robust decoder should strip whitespace before processing, but not all implementations do so by default.
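These pitfalls are easy to reproduce with Python's standard base64 module. This is an illustrative sketch: the sample string "Héllo" and the variable names are chosen here and are not taken from any particular system.

```python
import base64
import binascii

# Pitfall 1: the same text gives different Base64 under different encodings.
text = "Héllo"
by_codec = {
    codec: base64.b64encode(text.encode(codec)).decode("ascii")
    for codec in ("utf-8", "utf-16-le", "iso-8859-1")
}
for codec, encoded in by_codec.items():
    print(codec, encoded)

# Pitfalls 2 and 3: a strict decoder rejects embedded line breaks.
wrapped = "SGVs\r\nbG8="
try:
    base64.b64decode(wrapped, validate=True)
    strict_rejected = False
except binascii.Error:
    strict_rejected = True

# Stripping all whitespace first makes the input acceptable.
cleaned = "".join(wrapped.split())
decoded = base64.b64decode(cleaned, validate=True)
print(strict_rejected, decoded)
```

Without validate=True, Python's b64decode silently discards characters outside the alphabet, which is convenient for MIME input but can hide corrupted data.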
Related Guides
- What Is Base64?: Overview of Base64 and its real-world applications
- URL-Safe Base64: The variant that replaces + and / for use in URLs
- Base64 in JavaScript: Implementing Base64 encoding and decoding in the browser and Node.js