How Base64 Encoding Works: Step by Step

Base64 encoding converts binary data into a text representation using 64 printable ASCII characters. While the concept is simple, the mechanics involve bit manipulation that can seem opaque at first glance. This guide walks through the algorithm step by step, working through a concrete example so you can see exactly how input bytes become Base64 output.

The Core Idea: 6 Bits Instead of 8

Computers store data in bytes of 8 bits, giving 256 possible values (0-255) per byte. Many of those values correspond to non-printable or control characters that text systems cannot handle safely. Base64 sidesteps this by working in 6-bit groups instead of 8-bit bytes. Six bits give 2^6 = 64 possible values, each mapped to a character from the Base64 alphabet, all of which are safe, printable ASCII characters.

The mathematical consequence: every 3 input bytes (3 x 8 = 24 bits) map to exactly 4 output characters (4 x 6 = 24 bits). This is why Base64 output is about one-third larger than the input, and slightly more than that when padding is added to a final partial group.
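
The 3-to-4 relationship can be turned into a small length calculation. The following is a minimal sketch, assuming padded output; the function name base64_encoded_length is made up for this illustration and is not part of any library.

```python
def base64_encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input."""
    # Every group of up to 3 input bytes becomes one 4-character block,
    # so round the number of groups up before multiplying by 4.
    return 4 * ((n_bytes + 2) // 3)


for n in (3, 5, 300):
    print(n, "bytes ->", base64_encoded_length(n), "characters")
```

For 300 bytes this gives 400 characters, which is the one-third growth described above; for 5 bytes it gives 8, because the final partial group still occupies a full 4-character block.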

The Base64 Alphabet

Each 6-bit value (0 through 63) maps to a specific character. Here is the complete lookup table defined by RFC 4648:

Index  Char    Index  Char    Index  Char    Index  Char
-----  ----    -----  ----    -----  ----    -----  ----
  0      A       16     Q       32     g       48     w
  1      B       17     R       33     h       49     x
  2      C       18     S       34     i       50     y
  3      D       19     T       35     j       51     z
  4      E       20     U       36     k       52     0
  5      F       21     V       37     l       53     1
  6      G       22     W       38     m       54     2
  7      H       23     X       39     n       55     3
  8      I       24     Y       40     o       56     4
  9      J       25     Z       41     p       57     5
 10      K       26     a       42     q       58     6
 11      L       27     b       43     r       59     7
 12      M       28     c       44     s       60     8
 13      N       29     d       45     t       61     9
 14      O       30     e       46     u       62     +
 15      P       31     f       47     v       63     /

The pattern is straightforward: uppercase letters first (A-Z = 0-25), then lowercase (a-z = 26-51), then digits (0-9 = 52-61), and finally + and / for indices 62 and 63. The = character is reserved for padding and does not appear in this table.
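
Because the alphabet follows that regular pattern, it does not need to be typed out by hand. As a sketch, it can be assembled from constants in Python's standard string module; the name BASE64_ALPHABET is an assumption chosen for this example.

```python
import string

# Uppercase, lowercase, digits, then '+' and '/': 64 characters in index order.
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)

print(len(BASE64_ALPHABET))  # 64
# Indexing the string is the same as looking up a row in the table.
print(BASE64_ALPHABET[18], BASE64_ALPHABET[44], BASE64_ALPHABET[60])  # S s 8
```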

Step-by-Step Example: Encoding "Hello"

Let us walk through encoding the string Hello from start to finish.

Step 1: Convert Characters to ASCII Byte Values

H = 72
e = 101
l = 108
l = 108
o = 111

Step 2: Convert Byte Values to Binary

H = 01001000
e = 01100101
l = 01101100
l = 01101100
o = 01101111

Step 3: Concatenate All Bits

Join all the binary representations into one continuous bit stream:

01001000 01100101 01101100 01101100 01101111

That gives us 40 bits total (5 bytes x 8 bits).
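
Steps 1 through 3 can be reproduced in a few lines of Python. This is only an illustrative sketch; the variable names are arbitrary, and the bit stream is held as a string of '0' and '1' characters for readability rather than efficiency.

```python
text = "Hello"

# Step 1: characters to byte values (ASCII is enough for this string).
data = text.encode("ascii")
byte_values = list(data)

# Step 2: each byte as an 8-digit binary string, zero-padded on the left.
binary = [format(b, "08b") for b in data]

# Step 3: join everything into one continuous bit stream.
bit_stream = "".join(binary)

print(byte_values)
print(binary)
print(len(bit_stream), "bits")
```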

Step 4: Split into 6-Bit Groups

Regroup the bits into chunks of 6. Since 40 is not a multiple of 6, the last group will be short and needs to be padded with zero bits on the right:

010010 | 000110 | 010101 | 101100 | 011011 | 000110 | 1111[00]
                                                          ^^ padding zeros

We now have 7 six-bit groups. The last group 111100 includes two padding zeros appended to the original bits 1111.
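
The regrouping can be sketched as follows. The helper name split_into_sextets is hypothetical, and the input is the 40-bit stream from Step 3 written as a string.

```python
def split_into_sextets(bit_stream: str) -> list[str]:
    """Split a string of bits into 6-bit groups, zero-padding the last one."""
    # (-length % 6) is the number of zeros needed to reach a multiple of 6.
    padded = bit_stream + "0" * (-len(bit_stream) % 6)
    return [padded[i:i + 6] for i in range(0, len(padded), 6)]


bits = "0100100001100101011011000110110001101111"
groups = split_into_sextets(bits)
print(groups)
```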

Step 5: Look Up Each 6-Bit Value in the Alphabet

010010 = 18 → S
000110 =  6 → G
010101 = 21 → V
101100 = 44 → s
011011 = 27 → b
000110 =  6 → G
111100 = 60 → 8
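
The lookup itself is just an index into the alphabet string. A short sketch, assuming the alphabet is built from Python's string module and the groups come from Step 4 (the variable names are illustrative):

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

groups = ["010010", "000110", "010101", "101100", "011011", "000110", "111100"]

# int(g, 2) parses a binary string; the result indexes the alphabet.
indices = [int(g, 2) for g in groups]
chars = "".join(ALPHABET[i] for i in indices)

print(indices)
print(chars)
```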

Step 6: Apply Padding

The output must be a multiple of 4 characters. We have 7 characters so far, so we need 1 padding character to reach 8:

SGVsbG8=

And there it is: Hello encodes to SGVsbG8=.
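
Putting the six steps together gives a compact, deliberately naive encoder. It is a teaching sketch rather than a production implementation: the function name encode_base64 is invented here, and the string-of-bits approach favors clarity over speed. The result is checked against base64.b64encode from Python's standard library.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_base64(data: bytes) -> str:
    # Steps 2-3: bytes to one continuous bit stream.
    bits = "".join(format(b, "08b") for b in data)
    # Step 4: zero-pad on the right to a multiple of 6 bits.
    bits += "0" * (-len(bits) % 6)
    # Step 5: map each 6-bit group to its alphabet character.
    chars = "".join(ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))
    # Step 6: pad with '=' to a multiple of 4 characters.
    return chars + "=" * (-len(chars) % 4)


print(encode_base64(b"Hello"))
assert encode_base64(b"Hello") == base64.b64encode(b"Hello").decode("ascii")
```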

Why 3 Bytes Become 4 Characters

The math is clean when the input length is a multiple of 3. Three bytes contain 24 bits. Dividing 24 bits into groups of 6 gives exactly 4 groups, and 4 groups produce 4 characters. No padding is needed.

3 bytes = 24 bits = 4 groups of 6 bits = 4 Base64 characters

Input:   [byte1][byte2][byte3]
         [8 bits][8 bits][8 bits]   = 24 bits

Output:  [char1][char2][char3][char4]
         [6 bits][6 bits][6 bits][6 bits] = 24 bits

This 3-to-4 ratio is the source of the roughly 33% size overhead inherent in Base64. Every three bytes of input produce exactly four characters of output, and each of those characters occupies one byte in ASCII.
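
Real encoders usually perform this regrouping with integer shifts and masks rather than strings of bits. The sketch below handles exactly one full 3-byte group; encode_triplet is a hypothetical helper name, and padding is out of scope here.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_triplet(b1: int, b2: int, b3: int) -> str:
    # Pack the three bytes into one 24-bit integer.
    n = (b1 << 16) | (b2 << 8) | b3
    # Peel off four 6-bit slices, most significant first (0x3F == 0b111111).
    return "".join(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))


print(encode_triplet(*b"ABC"))  # QUJD
```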

Padding Rules

When the input byte count is not a multiple of 3, padding ensures the output length is always a multiple of 4. There are three cases:

Case 1: Input Length is a Multiple of 3

No padding needed. The output divides evenly into 4-character groups.

"ABC" (3 bytes) → "QUJD" (4 chars, no padding)

Case 2: Input Length Leaves a Remainder of 1

One leftover byte provides 8 bits. You need 12 bits (two 6-bit groups), so 4 zero bits are appended. The two resulting characters are followed by == to pad the output to 4 characters.

"A" (1 byte) → "QQ==" (2 data chars + 2 padding chars)

Case 3: Input Length Leaves a Remainder of 2

Two leftover bytes provide 16 bits. You need 18 bits (three 6-bit groups), so 2 zero bits are appended. The three resulting characters are followed by a single =.

"AB" (2 bytes) → "QUI=" (3 data chars + 1 padding char)

Summary Table

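Input bytes mod 3    Padding chars    Output chars
------------------   -------------    ------------
       0                  0               4n
       1                  2               4n + 4
       2                  1               4n + 4

Here n is the number of complete 3-byte groups in the input, that is, the input length divided by 3 and rounded down.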

By looking at the number of = signs at the end of a Base64 string, you can immediately tell how many "extra" bytes were in the last group: two = signs mean one extra byte, one = sign means two extra bytes, and no padding means the input was evenly divisible by three.
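
That observation can be expressed as a short calculation. The sketch below uses base64.b64encode from Python's standard library to produce the three cases and a hypothetical helper, decoded_length, to infer the original byte count from the padding; it assumes a well-formed, padded string with no whitespace.

```python
import base64


def decoded_length(encoded: str) -> int:
    """Original byte count, inferred from the length and the '=' count."""
    padding = len(encoded) - len(encoded.rstrip("="))
    # Each 4-character block holds 3 bytes; each '=' means one byte fewer.
    return len(encoded) // 4 * 3 - padding


for raw in (b"ABC", b"A", b"AB"):
    encoded = base64.b64encode(raw).decode("ascii")
    print(raw, "->", encoded, "->", decoded_length(encoded), "byte(s)")
```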

Decoding: The Reverse Process

Decoding Base64 is the exact reverse:

  1. Remove any = padding characters.
  2. Convert each Base64 character back to its 6-bit index value using the alphabet table.
  3. Concatenate all 6-bit groups into a continuous bit stream.
  4. Split the bit stream into 8-bit bytes.
  5. Discard any trailing bits that were added as zero-padding during encoding (a conforming encoder sets these to zero, and they are not part of the original data).

For SGVsbG8=, the decoder strips the =, looks up S=18, G=6, V=21, s=44, b=27, G=6, 8=60, reconstructs the bit stream, regroups into bytes, and recovers the original Hello.
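
The five decoding steps map onto code almost line for line. This is a minimal sketch under simplifying assumptions: the name decode_base64 is invented for the example, the input is assumed to be well-formed, and no validation of characters, padding, or whitespace is attempted.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def decode_base64(encoded: str) -> bytes:
    # Step 1: remove '=' padding.
    stripped = encoded.rstrip("=")
    # Steps 2-3: each character to its 6-bit index, joined into one bit stream.
    bits = "".join(format(ALPHABET.index(c), "06b") for c in stripped)
    # Step 5: ignore leftover bits that do not fill a whole byte.
    usable = len(bits) - len(bits) % 8
    # Step 4: regroup into 8-bit bytes.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


print(decode_base64("SGVsbG8="))
assert decode_base64("SGVsbG8=") == base64.b64decode("SGVsbG8=")
```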

Common Encoding Pitfalls

A few issues that frequently trip up developers:

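  • Character encoding matters. Base64 operates on bytes, not characters. A pure-ASCII string such as "Hello" produces the same bytes in UTF-8 and ISO-8859-1 but different bytes in UTF-16, and a string with a non-ASCII character, such as "café", produces different bytes in all three. Always be explicit about which character encoding you use before applying Base64.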
  • Line breaks in MIME Base64. The MIME variant limits encoded lines to at most 76 characters and separates them with a line break (CRLF). Decoders must be prepared to ignore whitespace. The "plain" Base64 from RFC 4648 does not include line breaks.
  • Whitespace handling. Some systems insert spaces, tabs, or newlines into Base64 strings. A robust decoder should strip whitespace before processing, but not all implementations do so by default.
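
Two of these pitfalls are easy to demonstrate with Python's standard base64 module. In the sketch below, the sample string "café" and the wrapped input are made up for illustration, and stripping whitespace with split/join is one reasonable approach rather than the only one.

```python
import base64

# Pitfall 1: the same text yields different Base64 under different encodings.
text = "café"
encoded_by_codec = {
    codec: base64.b64encode(text.encode(codec)).decode("ascii")
    for codec in ("utf-8", "utf-16", "iso-8859-1")
}
for codec, encoded in encoded_by_codec.items():
    print(f"{codec:>10}: {encoded}")

# Pitfalls 2-3: remove line breaks and other whitespace before decoding.
wrapped = "SGVs\r\nbG8=\n"
cleaned = "".join(wrapped.split())  # split() with no arguments drops all whitespace
decoded = base64.b64decode(cleaned)
print(decoded)
```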
