Chapter 03 · Unit 1: Foundations

Data Representation


You now know that computers store everything as 1s and 0s. But "everything" is a big word. Numbers, sure. But what about the letter A? A photograph? A song? An emoji? This chapter is where that question gets its full answer.

At the end of Chapter 1, we noted that computers represent all information — not just numbers, but text, images, sound, everything — as binary values. And then we said "more on that later." Later is now.

The short version, which we'll spend this entire chapter unpacking: everything is a number, and every number is binary. Pictures are numbers. Letters are numbers. Colors are numbers. The song playing through your earbuds is a very long sequence of numbers. Once you accept that, the rest of this chapter is just working out the details.

Grouping Bits: The Byte

A single bit — a 0 or a 1 — isn't very useful on its own. It can only represent two states. That's enough for a light switch, but not much else.

From the earliest days of computing, engineers settled on grouping bits together into fixed-size chunks. The standard chunk that stuck is eight bits, and we call it a byte.

Byte

A group of 8 bits. The standard unit of digital information. With 8 bits, a byte can represent 2⁸ = 256 unique values (0 through 255).

Why 8? Partly history, partly convenience. Eight is a power of 2, which plays nicely with binary arithmetic. And 256 values — 0 through 255 — turned out to be just enough to cover the English alphabet in both cases, digits, punctuation, and basic control characters. More on that shortly.

From bytes, we build larger units. You've seen all of these before — now you know what they actually mean:

Unit       Symbol   Approximate Size                Example
Kilobyte   KB       1,024 bytes                     A short text message
Megabyte   MB       1,024 KB ≈ 1 million bytes      A high-res photo
Gigabyte   GB       1,024 MB ≈ 1 billion bytes      A feature film (compressed)
Terabyte   TB       1,024 GB ≈ 1 trillion bytes     A large hard drive
Petabyte   PB       1,024 TB                        A large data center storage pool
Bits vs. bytes — don't get burned. Storage is measured in bytes (B, capital). Network speeds are measured in bits per second (b, lowercase). When your internet provider advertises "100 Mbps," that's megabits — not megabytes. A 100 Mbps connection transfers about 12.5 MB per second. This distinction trips up IT professionals regularly — now you won't be one of them.
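A quick sanity check of that arithmetic, sketched in Python (the division by 8 is the whole trick):

```python
# Convert an advertised link speed (megabits/sec) to megabytes/sec.
# There are 8 bits in a byte, so divide by 8.
advertised_mbps = 100
megabytes_per_sec = advertised_mbps / 8

print(f"{advertised_mbps} Mbps ≈ {megabytes_per_sec} MB/s")  # 12.5 MB/s
```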

Representing Integers

We've been doing this since Chapter 1, but now we can be precise. With 8 bits you can represent integers from 0 to 255. With 16 bits, 0 to 65,535. With 32 bits, 0 to 4,294,967,295 — roughly 4.3 billion. The formula: n bits → 2ⁿ possible values → 0 through 2ⁿ − 1.
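If you want to see that formula in action, here's a minimal sketch in Python:

```python
# n bits -> 2**n possible values -> 0 through 2**n - 1
for n in (8, 16, 32):
    values = 2 ** n
    print(f"{n} bits: {values:,} values, 0 through {values - 1:,}")

# 8 bits: 256 values, 0 through 255
# 16 bits: 65,536 values, 0 through 65,535
# 32 bits: 4,294,967,296 values, 0 through 4,294,967,295
```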

This matters practically. IPv4 addresses — the numbers that identify every device on the internet — are 32-bit values. That means there are exactly 2³² ≈ 4.3 billion possible addresses. When the internet was designed in the 1970s, that sounded like plenty. It wasn't. IPv4 address exhaustion is a real, ongoing problem, which is exactly why IPv6 switched to 128-bit addresses (2¹²⁸ has 39 digits — we're not running out of those soon).

Integer overflow is a related consequence. If software uses a 32-bit counter and that counter exceeds ~4.3 billion, the value wraps back to zero. This has caused real outages and some famous software failures. The arithmetic is honest — the data type just ran out of room.
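Python's integers don't overflow on their own, so the sketch below simulates a 32-bit unsigned counter with modular arithmetic; the wraparound it shows is exactly what a fixed-width hardware counter does:

```python
WRAP = 2 ** 32                  # a 32-bit counter has 2^32 possible values

counter = WRAP - 1              # 4,294,967,295: the maximum value
counter = (counter + 1) % WRAP  # one more increment...

print(counter)                  # prints 0: the counter has wrapped back to zero
```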

Representing Text: ASCII

Numbers have an obvious binary representation. But what about the letter A? There's no natural encoding for a letter — someone had to make one up.

In 1963, a committee of American engineers did exactly that. They created the American Standard Code for Information Interchange, or ASCII — a table mapping every English character to a number between 0 and 127. The table wasn't derived from anything. It was a decision. Uppercase A is 65. Uppercase B is 66. Lowercase a is 97. Space is 32. And so on.

So when you type the word Hi, your computer stores:

Char      H          i
Decimal   72         105
Binary    01001000   01101001

Two characters, two bytes, 16 bits. A text file is nothing more than a sequence of these numbers, one per character.
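You can reproduce the lookup yourself. In Python, ord() returns a character's code and chr() goes the other way:

```python
for ch in "Hi":
    code = ord(ch)                        # the ASCII value of the character
    print(ch, code, format(code, "08b"))  # character, decimal, 8-bit binary

# H 72 01001000
# i 105 01101001

print(chr(72), chr(105))                  # H i (the reverse lookup)
```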

ASCII — Selected Values

Char   Decimal   Binary       Char   Decimal   Binary
A      65        01000001     a      97        01100001
B      66        01000010     b      98        01100010
Z      90        01011010     z      122       01111010
0      48        00110000     9      57        00111001
SP     32        00100000     !      33        00100001

Each character maps to a unique number — arbitrarily assigned by the ASCII committee in 1963.

The Limits of ASCII — and the Rise of Unicode

ASCII covered English just fine. The rest of the world, not so much. With only 128 values, there's no room for accented characters, non-Latin alphabets, or anything outside the American keyboard. Japanese, Arabic, Hindi, Chinese — none of it fits in 7 bits.

The solution is Unicode, a far more ambitious project that assigns a unique number — called a code point — to every character in every writing system on earth. Over 149,000 characters and counting.

And yes: emoji are Unicode characters. Every emoji you've ever sent has a Unicode code point, which means it has a number, which means it has a binary representation. Your 😂 is, at the machine level, just another pattern of bits.

Emoji as Unicode Code Points

Every emoji has a unique number. Every number becomes binary. Same system as "A" = 65.

Emoji   Name                     Code Point   Decimal   Binary
😂      Face with Tears of Joy   U+1F602      128514    1 1111 0110 0000 0010
❤️      Red Heart                U+2764       10084     10 0111 0110 0100
🔥      Fire                     U+1F525      128293    1 1111 0101 0010 0101
👍      Thumbs Up                U+1F44D      128077    1 1111 0100 0100 1101

U+ notation means "Unicode code point." Binary shown as raw code point value, grouped in 4-bit nibbles.

Unicode code points are written in a notation like U+1F602. That 1F602 is a hexadecimal number (more on hex shortly) — in decimal it's 128,514. In binary it's a 17-bit number: 1 1111 0110 0000 0010. Every time you send 😂 in a text message, your phone transmits an encoding of exactly that number across the network (as a UTF-8 byte sequence; more on that in a moment). Your friend's phone receives it, looks up the code point in the Unicode table, and renders the face. No magic — just a number, agreed upon by every device on earth.

UTF-8 is the most common way Unicode is actually stored on disk and transmitted over the network. It's cleverly designed: ASCII characters use exactly one byte each (identical to plain ASCII), while less common characters use two, three, or four bytes as needed. An English text file in UTF-8 is the same size as in ASCII, while still supporting every character and emoji on the planet. It's the dominant encoding on the modern web.
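A short Python sketch makes the variable-width design visible: ASCII characters encode to one byte, while 😂 takes four.

```python
for ch in ("A", "é", "€", "😂"):
    data = ch.encode("utf-8")   # the bytes actually stored and transmitted
    print(ch, f"U+{ord(ch):04X}", len(data), data.hex(" "))

# A U+0041 1 41
# é U+00E9 2 c3 a9
# € U+20AC 3 e2 82 ac
# 😂 U+1F602 4 f0 9f 98 82
```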

When encoding goes wrong. You've seen garbled text like â€™ where an apostrophe should be. That's an encoding mismatch — the file was written in UTF-8 but read as Latin-1 (or vice versa). The bits are identical; the interpretation is wrong. Same bit pattern, different data type. Exactly the problem we flagged in Chapter 1.
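You can reproduce the mismatch in two lines. One note on the sketch: it decodes with Windows-1252 (Python's "cp1252"), the superset of Latin-1 that browsers typically apply, because strict Latin-1 maps two of the three bytes to invisible control characters.

```python
s = "It’s"                                    # contains U+2019, the curly apostrophe
garbled = s.encode("utf-8").decode("cp1252")  # written as UTF-8, read as Windows-1252

print(garbled)                                # Itâ€™s (same bits, wrong interpretation)
```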

Representing Images: Pixels and Bitmaps

So far the pattern is holding up. Numbers are stored as binary directly. Text is stored as numbers — an arbitrary but agreed-upon assignment of a code point to every character. Both reduce to bits in the end. But what about something that doesn't feel like a number at all — like a photograph? Images seem fundamentally different. They have shape, color, spatial relationships between things. How do you turn a picture into a string of 1s and 0s?

The answer, as you might suspect by now, is more numbers. A lot of them.

Let's start with the simplest possible image: pure black and white — no color, no shades of gray, just on or off for each dot.

This is a monochrome bitmap. Each dot — called a pixel (short for "picture element") — is represented by exactly one bit: 1 for black, 0 for white. A row of 8 pixels is one byte. A grid of 8×8 pixels is 8 bytes — 64 bits of image data.

Pixel

The smallest unit of a digital image — short for "picture element." In a monochrome image, one pixel = one bit. In a color image, one pixel = three bytes. Every digital image is a grid of pixels.

The widget below is a live 8×8 monochrome bitmap. Each cell is one pixel, one bit. Click to toggle pixels on and off — watch the binary row values update as you paint.

This is literally how early computer fonts worked. The letters on the first personal computers were stored as 8×8 bitmaps — one byte per row, eight rows per character. Simple, effective, and completely understandable now that you know how it works.
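Here's a sketch of that scheme in Python. The glyph bytes below are a hypothetical letter A, invented for illustration rather than taken from any real font ROM:

```python
# An 8x8 glyph: one byte per row, eight rows. Each 1 bit is a lit pixel.
GLYPH_A = [0x18, 0x3C, 0x66, 0x66, 0x7E, 0x66, 0x66, 0x00]

for row_byte in GLYPH_A:
    bits = format(row_byte, "08b")   # e.g. 0x18 -> "00011000"
    print(bits.replace("1", "#").replace("0", "."))

# ...##...
# ..####..
# .##..##.
# .##..##.
# .######.
# .##..##.
# .##..##.
# ........
```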

From Black and White to Color: RGB

A one-bit-per-pixel image is limited to exactly two colors. To represent the images we actually use, we need more bits per pixel — and a way to encode color.

Here's the key insight: human eyes have three types of color receptors, sensitive to red, green, and blue light. Every color you can perceive is some mixture of these three signals. Computer displays exploit this directly — each pixel contains three tiny lights, one red, one green, one blue. By varying their brightness independently, the screen can produce any color the human eye can distinguish.

This system is RGB. Each channel — Red, Green, Blue — gets one byte: a value from 0 (off) to 255 (full brightness). Three bytes per pixel, 24 bits total.

RGB Color Model

Three values — Red, Green, Blue — each from 0 to 255. Three bytes (24 bits) per pixel. 256 × 256 × 256 = 16,777,216 possible colors — more than the human eye can distinguish.
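Since each channel is one byte, a pixel packs neatly into 24 bits. A small sketch, using this textbook's orange (a color we'll meet again in the next section):

```python
r, g, b = 193, 68, 14             # one byte per channel
pixel = (r << 16) | (g << 8) | b  # three bytes packed into 24 bits

print(pixel)       # 12665870, one pixel as a single number
print(hex(pixel))  # 0xc1440e, the same number in hexadecimal (next section)
```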

Hexadecimal: A Better Way to Write Bytes

If you're a developer specifying a color, you need to communicate three numbers between 0 and 255. Writing rgb(193, 68, 14) works, but it's verbose. The industry long ago settled on a more compact notation: hexadecimal, or hex.

Let's build up hex from scratch — same approach as Chapter 1's binary introduction. No shortcuts.

You already know two number bases. Decimal is base-10: ten digits (0–9), place values as powers of 10. Binary is base-2: two digits (0–1), place values as powers of 2. Hexadecimal is base-16: sixteen digits, place values as powers of 16.

The question: what are the sixteen digits? Decimal only needs ten symbols. Hex needs sixteen. We only have ten numeric symbols (0–9), so we borrow the first six letters of the alphabet. The complete set of hex digits:

Hex digit:   0   1   2   3   4   5   6   7   8   9   A    B    C    D    E    F
Value:       0   1   2   3   4   5   6   7   8   9   10   11   12   13   14   15

A = 10. B = 11. C = 12. D = 13. E = 14. F = 15. After F, you've exhausted all sixteen digits and need a new place — exactly like decimal rolls over after 9, or binary rolls over after 1. In hex, you roll over after F.

So counting in hex goes: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, 10, 11 ... 19, 1A, 1B ... 1F, 20 ...

Place values in hex are powers of 16. The ones place is 16⁰ = 1. The sixteens place is 16¹ = 16. The two-fifty-sixes place is 16² = 256.

Why Hex and Bytes Are a Perfect Match

Here's why the computing world adopted hex so enthusiastically: one byte — 8 bits — maps to exactly two hex digits. Always.

One hex digit represents values 0–15, which is 4 bits (2⁴ = 16). Two hex digits represent values 0–255, which is exactly 8 bits (2⁸ = 256). So every possible byte value maps to exactly one two-character hex value — from 00 to FF.

That's the whole appeal. Instead of writing 11000001 in binary or 193 in decimal, you write C1 in hex. Compact, unambiguous, one-to-one with bytes.

Let's translate this textbook's orange (R: 193, G: 68, B: 14) into hex:

Channel   Decimal   Calculation                              Hex
Red       193       193 ÷ 16 = 12 remainder 1 → C=12, 1=1    C1
Green     68        68 ÷ 16 = 4 remainder 4 → 4, 4           44
Blue      14        14 ÷ 16 = 0 remainder 14 → 0, E=14       0E

Concatenate: C1 + 44 + 0E = C1440E. Prefix with a hash: #C1440E. That is this textbook's exact orange, in the notation used by every web browser, design tool, and CSS file on the planet.

To read a hex color code, reverse the process. Take #FF5733:

  • FF: F=15, so 15×16 + 15 = 255 → Red at maximum
  • 57: 5×16 + 7 = 87 → Green at about a third
  • 33: 3×16 + 3 = 51 → Blue at about a fifth

Result: a saturated orange-red. When you see a hex color code, you're not looking at magic — you're looking at three bytes.
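Both directions are mechanical enough to code up. A minimal sketch, with rgb_to_hex and hex_to_rgb as made-up helper names:

```python
def rgb_to_hex(r, g, b):
    """Format three 0-255 channel values as a #RRGGBB color code."""
    return f"#{r:02X}{g:02X}{b:02X}"

def hex_to_rgb(code):
    """Parse a #RRGGBB color code back into three channel values."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

print(rgb_to_hex(193, 68, 14))  # #C1440E, this textbook's orange
print(hex_to_rgb("#FF5733"))    # (255, 87, 51), the saturated orange-red above
```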

#C1440E is not one hex number — it's three. A common point of confusion: a CSS color code like #C1440E looks like a single six-digit hex number, but it's actually three two-digit hex values concatenated — C1 (Red), 44 (Green), 0E (Blue). The # prefix is a web convention that signals "this is a color code," not a general hex notation. When hex values appear elsewhere in computing — memory addresses, error codes, MAC addresses — they're written with a 0x prefix instead. So 0xC1 is the number 193 in hex; #C1440E is a color made of three such bytes. Same digits, different packaging, different prefix to tell them apart.
Hex beyond colors. Hexadecimal appears constantly in IT work — memory addresses, MAC addresses, error codes, cryptographic hashes, network packet dumps. Any time you see a string of 0–9 and A–F values, that's hex. The ability to read it fluently is a genuinely useful professional skill.

Representing Sound

Sound is pressure. When a speaker cone pushes forward, it compresses the air in front of it; when it pulls back, it creates a region of lower pressure. Those compressions and rarefactions travel outward as a wave, and when they reach your eardrum, you hear sound. The wave has a shape — a smooth, continuous curve that varies over time.

The simplest sound wave is a sine wave: a perfectly smooth, repeating oscillation. Real sounds — a voice, a guitar, a recording of rain — are vastly more complex, but they're all built from combinations of sine waves at different frequencies and amplitudes. This is the underlying math of audio, and it's why the sine wave is the right place to start.

[Figure: A sound wave (sine wave) — pressure plotted against time. Labeled: amplitude (loudness), one cycle (the period), and frequency = cycles per second (Hz), perceived as pitch.]

Two properties define a sound wave. Amplitude is the height of the wave — how much the pressure varies. We perceive amplitude as loudness. Frequency is how many complete cycles occur per second, measured in Hertz (Hz). We perceive frequency as pitch. Middle A on a piano vibrates at 440 Hz. A bass guitar note might be 80 Hz. A dog whistle is above 20,000 Hz — beyond the range of human hearing.

The problem for a digital computer is that both amplitude and frequency are continuous. The wave doesn't snap between values; it flows smoothly. So how does a discrete machine capture it?

Sampling: Slicing the Continuous into Discrete

The answer is sampling: take a measurement of the wave's amplitude at regular intervals, and record each measurement as a number. It's the audio equivalent of the flipbook — a smooth motion becomes a series of still frames. If the frames are close enough together, the eye (or in this case, the ear) perceives continuous motion.

[Figure: Analog wave → digital samples. Analog (continuous): smooth, infinite precision. Digital (sampled): each dot = one sample = one number stored.]

Each sample is just a number — the amplitude of the wave at that instant, stored as a binary integer. Two parameters control how faithfully the digital version captures the original:

Sample rate is how many measurements we take per second, expressed in Hz or kHz. The higher the sample rate, the more precisely the digital version tracks the original wave's shape. There's a mathematical principle called the Nyquist theorem that says you need to sample at least twice the highest frequency you want to capture. Human hearing tops out around 20,000 Hz, so a sample rate of at least 40,000 Hz is needed to capture the full audible range — which is exactly why CD audio uses 44,100 Hz (44.1 kHz). The extra headroom above 40,000 gives engineers room for filtering.

Bit depth is how many bits are used to store each sample — how precisely each amplitude measurement is recorded. CD audio uses 16-bit samples, giving 2¹⁶ = 65,536 possible amplitude values. More bits means a wider dynamic range — the difference between the quietest and loudest sounds that can be faithfully represented. Professional studio audio often uses 24-bit depth (16 million possible values) to preserve detail during recording and mixing.
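To make both parameters concrete, here's a sketch that computes the first few 16-bit samples of a 440 Hz sine wave at the CD sample rate. Real audio code would write these to a file; this just prints them:

```python
import math

SAMPLE_RATE = 44_100   # samples per second (CD standard)
BIT_DEPTH = 16         # bits per sample
FREQUENCY = 440.0      # Hz: the A above middle C

max_amplitude = 2 ** (BIT_DEPTH - 1) - 1   # 32,767 for signed 16-bit samples

for n in range(5):
    t = n / SAMPLE_RATE   # time of sample n, in seconds
    sample = round(max_amplitude * math.sin(2 * math.pi * FREQUENCY * t))
    print(n, sample)      # each sample is just one integer
```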

Sample Rate vs. Bit Depth

Sample rate (Hz / kHz) controls time resolution — how closely spaced samples are over time.

  8 kHz       Telephone quality
  44.1 kHz    CD audio (standard)
  96 kHz      High-resolution audio

Higher rate = more faithful wave shape = larger file.

Bit depth (bits per sample) controls amplitude resolution — how precisely each sample's loudness is stored.

  8-bit       256 values (voice, old games)
  16-bit      65,536 values (CD standard)
  24-bit      16.7M values (studio recording)

More bits = wider dynamic range = larger file.

Both parameters affect quality and file size independently.

Put the two together for CD-quality stereo audio:

44,100 samples/sec × 16 bits/sample × 2 channels = 1,411,200 bits/sec ≈ 10 MB per minute

One minute of uncompressed CD audio is about 10 MB. A 3-minute song is roughly 30 MB uncompressed. The same song as an MP3 is typically 3–5 MB. That difference — from 30 MB to 4 MB — is the work of compression, which we'll cover shortly.

Representing Images and Video

We established earlier that a color image is a grid of pixels, each stored as three bytes (R, G, B). The file size math follows directly: multiply pixels by bytes per pixel.

A smartphone photo at 4,000 × 3,000 pixels:

4,000 × 3,000 pixels × 3 bytes/pixel = 36,000,000 bytes ≈ 36 MB uncompressed

Your phone saves that as a 3–5 MB JPEG. We'll get to exactly how in a moment.

Video is images in sequence. Film traditionally runs at 24 frames per second; broadcast television at 30; modern gaming and high-frame-rate video at 60 or even 120. Each frame is a complete image. Multiply the per-frame size by the frame rate:

36 MB/frame × 30 frames/sec = 1,080 MB/sec ≈ 1 GB every second

A 2-hour movie at that rate would require over 7 terabytes of raw storage. That's clearly impractical — which is why compression isn't optional for video. It's a fundamental requirement.
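The arithmetic behind those claims, as a sketch:

```python
def raw_frame_bytes(width, height):
    """Uncompressed RGB frame size: 3 bytes per pixel."""
    return width * height * 3

frame = raw_frame_bytes(4_000, 3_000)  # 36,000,000 bytes, about 36 MB
per_second = frame * 30                # 30 such frames per second
movie = per_second * 2 * 60 * 60       # a 2-hour movie

print(f"{frame / 1e6:.0f} MB per frame")          # 36 MB
print(f"{per_second / 1e9:.2f} GB per second")    # 1.08 GB
print(f"{movie / 1e12:.1f} TB for 2 hours, raw")  # 7.8 TB
```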

Resolution, Color Depth, and File Size

Format                    Resolution            Pixels        × 3 bytes (uncompressed)
HD Video Frame            1,920 × 1,080         2,073,600     6,220,800 ≈ 6 MB
4K Video Frame            3,840 × 2,160         8,294,400     24,883,200 ≈ 24 MB
12 MP Smartphone Photo    4,000 × 3,000         12,000,000    36,000,000 ≈ 36 MB
4K @ 30 fps (1 second)    3,840 × 2,160 × 30    248,832,000   746,496,000 ≈ 720 MB/sec

Without compression, modern video is completely impractical to store or transmit.

Compression: Making Big Data Small

Compression is the art of representing the same information using fewer bits. It's not magic — it exploits redundancy. Real-world data, it turns out, is deeply redundant. A photo of a blue sky contains millions of pixels that are nearly the same color. A song has long stretches of similar waveforms. A video has consecutive frames where most of the image hasn't changed at all. Compression algorithms find and eliminate that redundancy.

There are two fundamental types of compression, and the distinction matters enormously in practice.

Lossless compression reduces file size without discarding any information. Decompress the file and you get back every single original bit — the result is mathematically identical to what you started with. The tradeoff: lossless compression ratios are modest, typically 2:1 to 4:1. Examples include PNG for images, FLAC for audio, and ZIP for general files.
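As an illustration: run-length encoding isn't one of the formats named above, but it's the simplest lossless scheme there is. It stores each run of identical values as a (count, value) pair — exactly the "blue sky" redundancy described earlier. A toy sketch:

```python
def rle_encode(data):
    """Collapse runs of identical characters into (count, value) pairs."""
    runs, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        runs.append((j - i, data[i]))
        i = j
    return runs

def rle_decode(runs):
    """Expand (count, value) pairs back into the original string."""
    return "".join(value * count for count, value in runs)

sky = "B" * 1000 + "W" * 2 + "B" * 998   # a "blue sky" with two white pixels
encoded = rle_encode(sky)

print(encoded)                     # [(1000, 'B'), (2, 'W'), (998, 'B')]
assert rle_decode(encoded) == sky  # lossless: bit-for-bit identical
```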

Lossy compression achieves much higher compression ratios by permanently discarding information that humans are unlikely to notice. The decompressed file is not identical to the original — it's an approximation. The art is in choosing what to throw away. Examples include JPEG for images, MP3 for audio, and H.264/HEVC for video.

Lossless vs. Lossy Compression

LOSSLESS (PNG, FLAC, ZIP):
  ORIGINAL 36 MB → compress → COMPRESSED 18 MB → decompress → RESTORED 36 MB
  ✓ Bit-for-bit identical to the original file

LOSSY (JPEG, MP3, H.264):
  ORIGINAL 36 MB → encode (data lost) → ENCODED 3 MB → decode/render → DISPLAYED ~36 MB
  ✗ Not identical — perceptually similar

Lossy formats permanently discard data during encoding. There is no "decompress" — only decode/render.

How does lossy compression decide what to throw away? It exploits the limits of human perception. JPEG, for example, converts image data into frequency components (a mathematical technique called the Discrete Cosine Transform) and then discards high-frequency detail that the human visual system is less sensitive to. A smooth blue sky compresses beautifully because the high-frequency information is nearly zero — there's almost nothing to discard. A photo of tree bark compresses poorly because every pixel is genuinely different from its neighbors.
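Here's a toy 1-D version of that idea (it requires NumPy and SciPy; real JPEG works on 8×8 pixel blocks in two dimensions and uses quantization tables rather than simply zeroing coefficients):

```python
import numpy as np
from scipy.fft import dct, idct

# A smooth 8-sample signal, like a strip of sky pixels.
signal = np.array([100, 102, 104, 107, 109, 110, 112, 113], dtype=float)

coeffs = dct(signal, norm="ortho")   # transform into frequency components
coeffs[4:] = 0                       # discard the high-frequency half

approx = idct(coeffs, norm="ortho")  # reconstruct from what's left
print(np.round(approx, 1))           # very close to the original signal
```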

MP3 audio compression works similarly, using a model of human hearing called a psychoacoustic model. It identifies sounds that would be masked by louder nearby sounds — the way a loud bass note makes it harder to hear a quiet high note at the same moment — and discards the masked data. The listener's brain fills in the gaps. Done well, most people cannot distinguish an MP3 from uncompressed audio in normal listening conditions. Done poorly (at very low bitrates), the artifacts become audible — the "underwater" sound of a badly compressed MP3.

Video compression goes even further by exploiting temporal redundancy — the fact that consecutive frames in a video are usually very similar. Rather than storing every frame as a complete image, formats like H.264 store a keyframe (a full image) followed by a series of delta frames that only encode what changed. A person talking against a static background has thousands of nearly-identical pixels per frame — only the mouth moves. There's no need to resend those static pixels 30 times per second. This is why a 2-hour movie fits on a 50 GB Blu-ray disc rather than requiring 5,000 GB of raw storage.
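A toy sketch of the keyframe/delta idea, treating a frame as a flat list of pixel values (real codecs like H.264 add motion compensation and much more):

```python
def delta_frame(previous, current):
    """Record only the pixels that changed since the previous frame."""
    return {i: v for i, (p, v) in enumerate(zip(previous, current)) if p != v}

def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame plus the delta."""
    frame = list(previous)
    for i, v in delta.items():
        frame[i] = v
    return frame

keyframe = [10, 10, 10, 10, 10, 10]  # full frame, stored completely
frame2   = [10, 10, 99, 10, 10, 10]  # one pixel changed (the "mouth" moved)

d = delta_frame(keyframe, frame2)
print(d)                             # {2: 99}, tiny compared to a full frame
assert apply_delta(keyframe, d) == frame2
```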

Video Compression: Keyframes and Delta Frames

KEYFRAME (~300 KB, full image stored) → DELTA FRAME (~8 KB, only changes stored) → DELTA FRAME (~12 KB, only changes stored) → KEYFRAME (~300 KB, new reference frame)

Keyframes every few seconds; deltas between.
Why compression settings matter in IT. When you're configuring a video surveillance system, choosing a streaming bitrate for a corporate webcast, setting backup compression on a file server, or specifying storage for a media archive, you're making compression decisions. The lossless/lossy distinction, bitrate, and codec choice all have direct implications for storage cost, bandwidth consumption, and quality. These aren't decisions that only developers make — they're everyday IT infrastructure decisions.
Video: Video Compression as Fast As Possible — Techquickie
A quick visual overview of how video compression works in practice — a good complement to everything covered above.

Data Types: The Full Picture

We've now seen four things a string of bits can represent: an integer, a text character, a color value, and an audio sample. In every case, the bits are just bits. What matters is the agreed-upon rule for interpreting them.

That rule is a data type. When software declares an int, it tells the computer: treat these 32 bits as an integer. A char means: treat this byte as an ASCII character. The computer doesn't inherently know the difference. It stores bits. The meaning — and the responsibility for getting it right — belongs to the software.

Hardware stores bits. Software assigns meaning. The boundary between them is where most of the interesting problems in computing live.

Here's a widget that makes that concrete. Eight bits, three different interpretations — the bits don't change, only what we decide they mean. Use the R / G / B buttons to choose which color channel the byte controls.

The bit pattern 01000001 is 65 as an integer, A as an ASCII character, and a particular intensity as a color channel value. Same bits. Different context, different meaning.
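One last sketch drives the point home: the byte never changes, only our reading of it.

```python
bits = 0b01000001             # the same 8 bits throughout

print(bits)                   # 65, interpreted as an integer
print(chr(bits))              # A, interpreted as an ASCII character
print(f"{bits / 255:.0%}")    # 25%, interpreted as a color-channel brightness
```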

Quiz Chapter 3 Quiz
1. How many unique values can a single byte represent?
2. Your internet provider advertises a 500 Mbps connection. A friend says you can download 500 MB per second. Are they right?
3. What is the decimal ASCII value of the uppercase letter A?
4. Why did ASCII need to be replaced by Unicode?
5. What is the decimal value of the hex digit F?
6. The hex color code #FF0000 represents which color?
7. An uncompressed 4000×3000 RGB photo is ~36 MB. Saved as a JPEG it's 4 MB. What accounts for the difference?
8. The bit pattern 01000001 could mean the integer 65, the letter "A", or a color channel intensity. What determines the correct interpretation?