At the end of Chapter 1, we noted that computers represent all information — not just numbers, but text, images, sound, everything — as binary values. And then we said "more on that later." Later is now.
The short version, which we'll spend this entire chapter unpacking: everything is a number, and every number is binary. Pictures are numbers. Letters are numbers. Colors are numbers. The song playing through your earbuds is a very long sequence of numbers. Once you accept that, the rest of this chapter is just working out the details.
Grouping Bits: The Byte
A single bit — a 0 or a 1 — isn't very useful on its own. It can only represent two states. That's enough for a light switch, but not much else.
Early in computing's history, engineers settled on grouping bits together into fixed-size chunks. Chunk sizes varied across early machines, but the one that stuck is eight bits, and we call it a byte.
A group of 8 bits. The standard unit of digital information. With 8 bits, a byte can represent 2⁸ = 256 unique values (0 through 255).
Why 8? Partly history, partly convenience. Eight is a power of 2, which plays nicely with binary arithmetic. And 256 values — 0 through 255 — comfortably covers the English alphabet in both cases, digits, punctuation, and basic control characters, with room to spare. More on that shortly.
From bytes, we build larger units. You've seen all of these before — now you know what they actually mean:
| Unit | Symbol | Approximate Size | Example |
|---|---|---|---|
| Kilobyte | KB | 1,024 bytes | A short text message |
| Megabyte | MB | 1,024 KB ≈ 1 million bytes | A high-res photo |
| Gigabyte | GB | 1,024 MB ≈ 1 billion bytes | A feature film (compressed) |
| Terabyte | TB | 1,024 GB ≈ 1 trillion bytes | A large hard drive |
| Petabyte | PB | 1,024 TB | A large data center storage pool |
Representing Integers
We've been doing this since Chapter 1, but now we can be precise. With 8 bits you can represent integers from 0 to 255. With 16 bits, 0 to 65,535. With 32 bits, 0 to 4,294,967,295 — roughly 4.3 billion. The formula: n bits → 2ⁿ possible values → 0 through 2ⁿ − 1.
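The formula can be checked directly. A minimal Python sketch (the function name is ours, purely illustrative):

```python
# For n bits there are 2**n distinct patterns, so an unsigned
# integer ranges from 0 to 2**n - 1.
def unsigned_range(n_bits):
    count = 2 ** n_bits
    return 0, count - 1

for n in (8, 16, 32):
    lo, hi = unsigned_range(n)
    print(f"{n:2d} bits: {lo} to {hi:,}")
```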
This matters practically. IPv4 addresses — the numbers that identify every device on the internet — are 32-bit values, so there are exactly 2³² (about 4.3 billion) possible addresses. When the internet was designed in the 1970s, that sounded like plenty. It wasn't. IPv4 address exhaustion is a real, ongoing problem, which is exactly why IPv6 switched to 128-bit addresses (2¹²⁸ is a 39-digit number — we're not running out of those soon).
Integer overflow is a related consequence. If software uses a 32-bit counter and that counter exceeds ~4.3 billion, the value wraps back to zero. This has caused real outages and some famous software failures. The arithmetic is honest — the data type just ran out of room.
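The wraparound is easy to simulate. In this Python sketch, the `%` operator makes explicit the modular arithmetic that fixed-width hardware counters perform implicitly:

```python
# An unsigned 32-bit counter wraps modulo 2**32: incrementing past
# the maximum value rolls back around to zero.
MAX_32 = 2 ** 32

def increment_32bit(counter):
    return (counter + 1) % MAX_32

c = MAX_32 - 1             # 4,294,967,295 — the largest 32-bit value
print(increment_32bit(c))  # wraps to 0
```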
Representing Text: ASCII
Numbers have an obvious binary representation. But what about the letter A? There's no natural encoding for a letter — someone had to make one up.
In 1963, a committee of American engineers did exactly that. They created the American Standard Code for Information Interchange, or ASCII — a table mapping every English character to a number between 0 and 127. The table wasn't derived from anything. It was a decision. Uppercase A is 65. Uppercase B is 66. Lowercase a is 97. Space is 32. And so on.
So when you type the word Hi, your computer stores:
| H | i |
|---|---|
| 72 | 105 |
| 01001000 | 01101001 |
Two characters, two bytes, 16 bits. A text file is nothing more than a sequence of these numbers, one per character.
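Python's built-in `ord` exposes these numbers directly:

```python
# ord() returns a character's code; format(..., "08b") renders the
# same value as an 8-bit binary string.
for ch in "Hi":
    print(ch, ord(ch), format(ord(ch), "08b"))
# H 72 01001000
# i 105 01101001
```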
The Limits of ASCII — and the Rise of Unicode
ASCII covered English just fine. The rest of the world, not so much. With only 128 values, there's no room for accented characters, non-Latin alphabets, or anything outside the American keyboard. Japanese, Arabic, Hindi, Chinese — none of it fits in 7 bits.
The solution is Unicode, a far more ambitious project that assigns a unique number — called a code point — to every character in every writing system on earth. Over 149,000 characters and counting.
And yes: emoji are Unicode characters. Every emoji you've ever sent has a Unicode code point, which means it has a number, which means it has a binary representation. Your 😂 is, at the machine level, just another pattern of bits.
Unicode code points are written in a notation like U+1F602. That 1F602 is a hexadecimal number (more on hex shortly) — in decimal it's 128,514. In binary it's a 17-bit number: 1 1111 0110 0000 0010. Every time you send 😂 in a text message, your phone transmits an agreed-upon encoding of that number across the network. Your friend's phone decodes it, looks up the code point in the Unicode table, and renders the face. No magic — just a number, agreed upon by every device on earth.
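You can verify the arithmetic with Python's built-ins:

```python
face = "\U0001F602"      # the 😂 character, written by code point
cp = ord(face)           # its Unicode code point as an integer
print(hex(cp))           # 0x1f602
print(cp)                # 128514
print(format(cp, "b"))   # the 17-bit binary pattern
```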
UTF-8 is the most common way Unicode is actually stored on disk and transmitted over the network. It's cleverly designed: ASCII characters use exactly one byte each (identical to plain ASCII), while less common characters use two, three, or four bytes as needed. An English text file in UTF-8 is the same size as in ASCII, while still supporting every character and emoji on the planet. It's the dominant encoding on the modern web.
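Python's `str.encode` shows the variable-length behavior directly (the `é` and 😂 examples are ours):

```python
# ASCII characters encode to one byte; other characters take 2-4.
print(len("A".encode("utf-8")))           # 1 byte — same as ASCII
print(len("é".encode("utf-8")))           # 2 bytes
print(len("\U0001F602".encode("utf-8")))  # 4 bytes for 😂
print("A".encode("utf-8"))                # b'A' — identical to plain ASCII
```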
Representing Images: Pixels and Bitmaps
So far the pattern is holding up. Numbers are stored as binary directly. Text is stored as numbers — an arbitrary but agreed-upon assignment of a code point to every character. Both reduce to bits in the end. But what about something that doesn't feel like a number at all — like a photograph? Images seem fundamentally different. They have shape, color, spatial relationships between things. How do you turn a picture into a string of 1s and 0s?
The answer, as you might suspect by now, is more numbers. A lot of them.
Let's start with the simplest possible image: pure black and white — no color, no shades of gray, just on or off for each dot.
This is a monochrome bitmap. Each dot — called a pixel (short for "picture element") — is represented by exactly one bit: 1 for black, 0 for white. A row of 8 pixels is one byte. A grid of 8×8 pixels is 8 bytes — 64 bits of image data.
The smallest unit of a digital image — short for "picture element." In a monochrome image, one pixel = one bit. In a color image, one pixel = three bytes. Every digital image is a grid of pixels.
The widget below is a live 8×8 monochrome bitmap. Each cell is one pixel, one bit. Click to toggle pixels on and off — watch the binary row values update as you paint.
This is literally how early computer fonts worked. The letters on the first personal computers were stored as 8×8 bitmaps — one byte per row, eight rows per character. Simple, effective, and completely understandable now that you know how it works.
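The one-byte-per-row scheme is easy to sketch. The glyph below is a made-up "A"-like shape for illustration, not taken from any real font:

```python
# Each byte encodes one row of an 8x8 glyph: a 1 bit means the
# pixel is on. The shape is illustrative, not a real font glyph.
GLYPH = [0b00011000,
         0b00100100,
         0b01000010,
         0b01111110,
         0b01000010,
         0b01000010,
         0b01000010,
         0b00000000]

def render(rows):
    # Walk each byte from its high bit (leftmost pixel) to its low bit.
    return "\n".join(
        "".join("#" if (byte >> (7 - col)) & 1 else "." for col in range(8))
        for byte in rows)

print(render(GLYPH))
```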
From Black and White to Color: RGB
A one-bit-per-pixel image is limited to exactly two colors. To represent the images we actually use, we need more bits per pixel — and a way to encode color.
Here's the key insight: human eyes have three types of color receptors, sensitive to red, green, and blue light. Every color you can perceive is some mixture of these three signals. Computer displays exploit this directly — each pixel contains three tiny lights, one red, one green, one blue. By varying their brightness independently, the screen can produce any color the human eye can distinguish.
This system is RGB. Each channel — Red, Green, Blue — gets one byte: a value from 0 (off) to 255 (full brightness). Three bytes per pixel, 24 bits total.
Three values — Red, Green, Blue — each from 0 to 255. Three bytes (24 bits) per pixel. 256 × 256 × 256 = 16,777,216 possible colors — more than the human eye can distinguish.
Hexadecimal: A Better Way to Write Bytes
If you're a developer specifying a color, you need to communicate three numbers between 0 and 255. Writing rgb(193, 68, 14) works, but it's verbose. The industry long ago settled on a more compact notation: hexadecimal, or hex.
Let's build up hex from scratch — same approach as Chapter 1's binary introduction. No shortcuts.
You already know two number bases. Decimal is base-10: ten digits (0–9), place values as powers of 10. Binary is base-2: two digits (0–1), place values as powers of 2. Hexadecimal is base-16: sixteen digits, place values as powers of 16.
The question: what are the sixteen digits? Decimal only needs ten symbols. Hex needs sixteen. We only have ten numeric symbols (0–9), so we borrow the first six letters of the alphabet. The complete set of hex digits:
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
A = 10. B = 11. C = 12. D = 13. E = 14. F = 15. After F, you've exhausted all sixteen digits and need a new place — exactly like decimal rolls over after 9, or binary rolls over after 1. In hex, you roll over after F.
So counting in hex goes: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, 10, 11 ... 19, 1A, 1B ... 1F, 20 ...
Place values in hex are powers of 16. The ones place is 16⁰ = 1. The sixteens place is 16¹ = 16. The two-fifty-sixes place is 16² = 256.
Why Hex and Bytes Are a Perfect Match
Here's why the computing world adopted hex so enthusiastically: one byte — 8 bits — maps to exactly two hex digits. Always.
One hex digit represents values 0–15, which is 4 bits (2⁴ = 16). Two hex digits represent values 0–255, which is exactly 8 bits (2⁸ = 256). So every possible byte value maps to exactly one two-character hex value — from 00 to FF.
That's the whole appeal. Instead of writing 11000001 in binary or 193 in decimal, you write C1 in hex. Compact, unambiguous, one-to-one with bytes.
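Python's `format` and `int` perform these conversions directly:

```python
# "02X" means: hexadecimal, uppercase, padded to two digits.
print(format(193, "02X"))   # C1
print(format(193, "08b"))   # 11000001 — the same byte, in binary
print(int("C1", 16))        # 193 — parsing hex back to decimal
```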
Let's translate this textbook's orange (R: 193, G: 68, B: 14) into hex:
| Channel | Decimal | Calculation | Hex |
|---|---|---|---|
| Red | 193 | 193 ÷ 16 = 12 remainder 1 → C=12, 1=1 | C1 |
| Green | 68 | 68 ÷ 16 = 4 remainder 4 → 4, 4 | 44 |
| Blue | 14 | 14 ÷ 16 = 0 remainder 14 → 0, E=14 | 0E |
Concatenate: C1 + 44 + 0E = C1440E. Prefix with a hash: #C1440E. That is this textbook's exact orange, in the notation used by every web browser, design tool, and CSS file on the planet.
To read a hex color code, reverse the process. Take #FF5733:
- FF: F=15, so 15×16 + 15 = 255 → Red at maximum
- 57: 5×16 + 7 = 87 → Green at about a third
- 33: 3×16 + 3 = 51 → Blue at about a fifth
Result: a saturated orange-red. When you see a hex color code, you're not looking at magic — you're looking at three bytes.
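Both directions can be sketched in a few lines of Python (the function names are ours):

```python
# Encode three channel bytes as a #RRGGBB string, and decode back.
def rgb_to_hex(r, g, b):
    return "#" + "".join(format(v, "02X") for v in (r, g, b))

def hex_to_rgb(code):
    s = code.lstrip("#")
    return tuple(int(s[i:i + 2], 16) for i in (0, 2, 4))

print(rgb_to_hex(193, 68, 14))   # #C1440E — this textbook's orange
print(hex_to_rgb("#FF5733"))     # (255, 87, 51)
```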
#C1440E looks like a single six-digit hex number, but it's actually three two-digit hex values concatenated — C1 (Red), 44 (Green), 0E (Blue). The # prefix is a web convention that signals "this is a color code," not a general hex notation. When hex values appear elsewhere in computing — memory addresses, error codes, MAC addresses — they're written with a 0x prefix instead. So 0xC1 is the number 193 in hex; #C1440E is a color made of three such bytes. Same digits, different packaging, different prefix to tell them apart.
Representing Sound
Sound is pressure. When a speaker cone pushes forward, it compresses the air in front of it; when it pulls back, it creates a region of lower pressure. Those compressions and rarefactions travel outward as a wave, and when they reach your eardrum, you hear sound. The wave has a shape — a smooth, continuous curve that varies over time.
The simplest sound wave is a sine wave: a perfectly smooth, repeating oscillation. Real sounds — a voice, a guitar, a recording of rain — are vastly more complex, but they're all built from combinations of sine waves at different frequencies and amplitudes. This is the underlying math of audio, and it's why the sine wave is the right place to start.
Two properties define a sound wave. Amplitude is the height of the wave — how much the pressure varies. We perceive amplitude as loudness. Frequency is how many complete cycles occur per second, measured in Hertz (Hz). We perceive frequency as pitch. The A above middle C on a piano vibrates at 440 Hz. A bass guitar note might be 80 Hz. A dog whistle is above 20,000 Hz — beyond the range of human hearing.
The problem for a digital computer is that both amplitude and frequency are continuous. The wave doesn't snap between values; it flows smoothly. So how does a discrete machine capture it?
Sampling: Slicing the Continuous into Discrete
The answer is sampling: take a measurement of the wave's amplitude at regular intervals, and record each measurement as a number. It's the audio equivalent of the flipbook — a smooth motion becomes a series of still frames. If the frames are close enough together, the eye (or in this case, the ear) perceives continuous motion.
Each sample is just a number — the amplitude of the wave at that instant, stored as a binary integer. Two parameters control how faithfully the digital version captures the original:
Sample rate is how many measurements we take per second, expressed in Hz or kHz. The higher the sample rate, the more precisely the digital version tracks the original wave's shape. A mathematical principle called the Nyquist theorem says you must sample at a rate at least twice the highest frequency you want to capture. Human hearing tops out around 20,000 Hz, so a sample rate of at least 40,000 Hz is needed to capture the full audible range — which is exactly why CD audio uses 44,100 Hz (44.1 kHz). The extra headroom above 40,000 gives engineers room for filtering.
Bit depth is how many bits are used to store each sample — how precisely each amplitude measurement is recorded. CD audio uses 16-bit samples, giving 2¹⁶ = 65,536 possible amplitude values. More bits means a wider dynamic range — the difference between the quietest and loudest sounds that can be faithfully represented. Professional studio audio often uses 24-bit depth (16 million possible values) to preserve detail during recording and mixing.
Put the two together for CD-quality stereo audio:
44,100 samples/sec × 16 bits/sample × 2 channels = 1,411,200 bits/sec ≈ 10 MB per minute
One minute of uncompressed CD audio is about 10 MB. A 3-minute song is roughly 30 MB uncompressed. The same song as an MP3 is typically 3–5 MB. That difference — from 30 MB to 4 MB — is the work of compression, which we'll cover shortly.
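The arithmetic checks out in a few lines:

```python
# CD-quality stereo: 44,100 samples/sec, 16 bits/sample, 2 channels.
SAMPLE_RATE = 44_100
BIT_DEPTH = 16
CHANNELS = 2

bits_per_sec = SAMPLE_RATE * BIT_DEPTH * CHANNELS
bytes_per_min = bits_per_sec * 60 // 8   # 8 bits per byte

print(bits_per_sec)               # 1,411,200 bits per second
print(bytes_per_min / 1_000_000)  # ~10.6 MB per minute
```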
Representing Images and Video
We established earlier that a color image is a grid of pixels, each stored as three bytes (R, G, B). The file size math follows directly: multiply pixels by bytes per pixel.
A smartphone photo at 4,000 × 3,000 pixels:
4,000 × 3,000 pixels × 3 bytes/pixel = 36,000,000 bytes ≈ 36 MB uncompressed
Your phone saves that as a 3–5 MB JPEG. We'll get to exactly how in a moment.
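The same calculation in code:

```python
# Uncompressed size of a 4,000 x 3,000 photo at 3 bytes per pixel.
width, height = 4_000, 3_000
bytes_per_pixel = 3

raw_bytes = width * height * bytes_per_pixel
print(raw_bytes)              # 36,000,000 bytes
print(raw_bytes / 1_000_000)  # 36.0 MB uncompressed
```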
Video is images in sequence. Film traditionally runs at 24 frames per second; broadcast television at 30; modern gaming and high-frame-rate video at 60 or even 120. Each frame is a complete image. Multiply the per-frame size by the frame rate:
36 MB/frame × 30 frames/sec = 1,080 MB/sec ≈ 1 GB every second
A 2-hour movie at that rate would require over 7 terabytes of raw storage. That's clearly impractical — which is why compression isn't optional for video. It's a fundamental requirement.
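And the video math, carried through to a full 2-hour movie:

```python
# Raw data rate for 30 fps video built from 36 MB uncompressed frames,
# then the total for a 2-hour runtime.
frame_bytes = 36_000_000
fps = 30
seconds = 2 * 60 * 60

bytes_per_sec = frame_bytes * fps
total_bytes = bytes_per_sec * seconds
print(bytes_per_sec / 1e9)   # ~1.08 GB every second
print(total_bytes / 1e12)    # ~7.8 TB for the whole movie
```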
Compression: Making Big Data Small
Compression is the art of representing the same information using fewer bits. It's not magic — it exploits redundancy. Real-world data, it turns out, is deeply redundant. A photo of a blue sky contains millions of pixels that are nearly the same color. A song has long stretches of similar waveforms. A video has consecutive frames where most of the image hasn't changed at all. Compression algorithms find and eliminate that redundancy.
There are two fundamental types of compression, and the distinction matters enormously in practice.
Lossless compression reduces file size without discarding any information. Decompress the file and you get back every single original bit — the result is mathematically identical to what you started with. The tradeoff: lossless compression ratios are modest, typically 2:1 to 4:1. Examples include PNG for images, FLAC for audio, and ZIP for general files.
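The simplest lossless scheme is run-length encoding: replace a run of identical values with a single (value, count) pair. This toy sketch illustrates the idea — real formats like PNG and ZIP use far more sophisticated algorithms:

```python
def rle_encode(data):
    """Collapse runs of identical values into [value, count] pairs."""
    out = []
    for v in data:
        if out and out[-1][0] == v:
            out[-1][1] += 1     # extend the current run
        else:
            out.append([v, 1])  # start a new run
    return out

def rle_decode(pairs):
    return [v for v, n in pairs for _ in range(n)]

sky = [200] * 10 + [90] * 3       # ten identical "sky" pixels, then three more
packed = rle_encode(sky)
print(packed)                     # [[200, 10], [90, 3]] — 13 values become 4
assert rle_decode(packed) == sky  # lossless: the original is recovered exactly
```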
Lossy compression achieves much higher compression ratios by permanently discarding information that humans are unlikely to notice. The decompressed file is not identical to the original — it's an approximation. The art is in choosing what to throw away. Examples include JPEG for images, MP3 for audio, and H.264/HEVC for video.
How does lossy compression decide what to throw away? It exploits the limits of human perception. JPEG, for example, converts image data into frequency components (a mathematical technique called the Discrete Cosine Transform) and then discards high-frequency detail that the human visual system is less sensitive to. A smooth blue sky compresses beautifully because the high-frequency information is nearly zero — there's almost nothing to discard. A photo of tree bark compresses poorly because every pixel is genuinely different from its neighbors.
MP3 audio compression works similarly, using a model of human hearing called a psychoacoustic model. It identifies sounds that would be masked by louder nearby sounds — the way a loud bass note makes it harder to hear a quiet high note at the same moment — and discards the masked data. The listener's brain fills in the gaps. Done well, most people cannot distinguish an MP3 from uncompressed audio in normal listening conditions. Done poorly (at very low bitrates), the artifacts become audible — the "underwater" sound of a badly compressed MP3.
Video compression goes even further by exploiting temporal redundancy — the fact that consecutive frames in a video are usually very similar. Rather than storing every frame as a complete image, formats like H.264 store a keyframe (a full image) followed by a series of delta frames that only encode what changed. A person talking against a static background has thousands of nearly-identical pixels per frame — only the mouth moves. There's no need to resend those static pixels 30 times per second. This is why a 2-hour movie fits on a 50 GB Blu-ray disc rather than requiring terabytes of raw storage.
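The keyframe-plus-delta idea can be sketched with plain lists of pixel values (a drastic simplification of what H.264 actually does):

```python
# Toy temporal compression: store a full keyframe, then for each
# later frame store only the (position, new value) pairs that changed.
def delta(prev, curr):
    return [(i, v) for i, (p, v) in enumerate(zip(prev, curr)) if p != v]

def apply_delta(prev, changes):
    frame = list(prev)
    for i, v in changes:
        frame[i] = v
    return frame

keyframe = [10, 10, 10, 10, 10, 10, 10, 10]
frame2   = [10, 10, 10, 99, 10, 10, 10, 10]   # one pixel changed

d = delta(keyframe, frame2)
print(d)                           # [(3, 99)] — one entry instead of eight pixels
assert apply_delta(keyframe, d) == frame2
```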
Data Types: The Full Picture
We've now seen four things a string of bits can represent: an integer, a text character, a color value, and an audio sample. In every case, the bits are just bits. What matters is the agreed-upon rule for interpreting them.
That rule is a data type. When software declares an int, it tells the computer: treat these 32 bits as an integer. A char means: treat this byte as an ASCII character. The computer doesn't inherently know the difference. It stores bits. The meaning — and the responsibility for getting it right — belongs to the software.
Hardware stores bits. Software assigns meaning. The boundary between them is where most of the interesting problems in computing live.
Here's a widget that makes that concrete. Eight bits, three different interpretations — the bits don't change, only what we decide they mean. Use the R / G / B buttons to choose which color channel the byte controls.
The bit pattern 01000001 is 65 as an integer, A as an ASCII character, and a particular intensity as a color channel value. Same bits. Different context, different meaning.
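The widget's three readings can be reproduced in a few lines of Python:

```python
# One bit pattern, three readings. The pattern never changes;
# only the interpretation rule does.
bits = "01000001"

as_int = int(bits, 2)        # read as an unsigned integer
as_char = chr(as_int)        # read as an ASCII character
as_channel = as_int / 255    # read as a color-channel intensity (0.0 to 1.0)

print(as_int)    # 65
print(as_char)   # A
print(round(as_channel, 3))  # about a quarter of full brightness
```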