A file is a name attached to a sequence of bytes on a storage device. That's it. Everything else a filesystem appears to do — folders, permissions, ownership, the whole tree — is software bookkeeping on top of that one simple idea.
What Is a Filesystem?
Chapter 5 described storage hardware — spinning disks and NAND flash — as devices that read and write blocks of data at numeric addresses. A hard drive does not know what a "file" is. It knows sector 0, sector 1, sector 2. The filesystem is the software layer, managed by the OS kernel, that imposes structure on top of those raw blocks.
It does this by providing three things raw block storage cannot. The first is names: "report.txt" is a human-readable label that the filesystem maps to a specific set of blocks on disk — change the filename and the same blocks are now called something else. The second is hierarchy: files are organized into directories (folders), which can be nested to arbitrary depth, giving you the tree structure most people already have as a mental model for file storage. The third is metadata: for each file, the filesystem tracks size, creation and modification timestamps, ownership, and permissions — all stored separately from the file's actual content.
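The third point is easy to demonstrate: the metadata the filesystem tracks can be read separately from the file's content. A minimal sketch using Python's standard library (the filename is invented for the demo):

```python
import os
import stat
import time

# Create a file, then read back the metadata the filesystem tracks
# for it: size, timestamps, and permission bits, all stored in the
# filesystem's own structures rather than in the file's content.
with open("report.txt", "w") as f:
    f.write("quarterly numbers\n")

info = os.stat("report.txt")
print("size (bytes):", info.st_size)           # length of the content
print("modified:", time.ctime(info.st_mtime))  # last-modified timestamp
print("mode:", stat.filemode(info.st_mode))    # e.g. -rw-r--r--

os.remove("report.txt")  # clean up the demo file
```

Note that `os.stat` never opens the file's content at all; it reads only the filesystem's index entry for the file.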
The OS kernel accesses all filesystems through a common interface called the Virtual File System (VFS). The VFS means that an application calling "open this file" does not need to know whether the file lives on an ext4 partition, an NTFS drive, a network share, or even the virtual /proc filesystem — the kernel handles the translation.
The Directory Tree
Both Linux and Windows organize files into a hierarchical tree, but they make a fundamentally different choice about where that tree starts.
Linux has a single root, written as /. Every file on the system — whether it lives on the boot drive, a secondary hard drive, a USB stick, or a network share — appears somewhere under this one unified tree. Attaching a storage device to the tree is called mounting: you mount a drive at a directory (its mount point), and from that moment its contents appear as a subdirectory. Unmount it and those files disappear from the tree.
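Mounting can be sketched as a lookup table: the kernel keeps a mount table mapping mount points to devices, and resolving a path means finding the longest matching mount-point prefix. A toy version, with invented paths and device names:

```python
# Toy mount table: mount point -> device. All entries are made up.
MOUNTS = {
    "/": "sda1",
    "/home": "sdb1",
    "/mnt/usb": "sdc1",
}

def device_for(path: str) -> str:
    # Longest matching prefix wins: /home/alice lives on sdb1, not
    # on the root device sda1, because /home is its own mount.
    best = max(
        (m for m in MOUNTS
         if path == m or path.startswith(m.rstrip("/") + "/")),
        key=len,
    )
    return MOUNTS[best]

print(device_for("/etc/hosts"))    # sda1
print(device_for("/home/alice"))   # sdb1
print(device_for("/mnt/usb/pic"))  # sdc1
```

Unmounting `/mnt/usb` would simply remove that entry, and paths under it would stop resolving, which is exactly the "files disappear from the tree" behavior described above.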
Windows assigns a separate letter to each volume: C:\ for the system drive, D:\ for a secondary drive, and so on. Each drive letter is the root of its own independent tree. There is no single root that contains everything.
For IT professionals, the Linux model's flexibility matters in practice: a database partition can be mounted at /var/lib/mysql, a shared NFS volume at /mnt/fileserver, and a user's home directory at /home/alice — all appearing as one tree, regardless of which physical device or network location the data actually lives on.
Filesystem Formats
A filesystem format defines the specific on-disk data structures used to store file metadata and content. Different formats make different tradeoffs around features, compatibility, and performance.
| Format | Primarily Used On | Max File Size | Permissions | Journaling | Cross-Platform |
|---|---|---|---|---|---|
| ext4 | Linux (default) | 16 TB | Linux rwx | Yes | No native support on Windows or macOS; read/write on both via third-party drivers |
| NTFS | Windows (default) | 16 EB | Windows ACLs | Yes | Read-only on macOS; read/write on Linux via drivers |
| FAT32 | USB drives, legacy devices | 4 GB | None | No | Universal — readable by virtually every OS and device |
| exFAT | Flash drives, SD cards | 16 EB | None | No | Universal on modern systems (Windows, macOS, Linux) |
Journaling is a fault-tolerance feature: the filesystem keeps a running log (journal) of changes it is about to make. If the system crashes mid-write, the journal lets the OS replay or discard the incomplete operation on next boot, preventing filesystem corruption. ext4 and NTFS both journal; FAT32 does not, which is why a sudden USB disconnect used to corrupt files.
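The replay step can be sketched in a few lines. Each journal entry here records an intended block write plus a commit flag; on the next boot, committed entries are replayed and uncommitted ones are discarded. The data structures are invented for illustration:

```python
# Toy journal recovery: block 7's change was committed before the
# crash, block 9's write was interrupted mid-operation.
blocks = {7: b"old"}
journal = [
    [7, b"new", True],    # commit record made it to disk -> replay
    [9, b"half", False],  # crash hit mid-write -> discard safely
]

def recover(blocks, journal):
    for block_no, data, committed in journal:
        if committed:
            blocks[block_no] = data  # redo the completed operation
    journal.clear()                  # journal is empty once replayed
    return blocks

print(recover(blocks, journal))  # block 9 was never touched
```

Either way the filesystem ends up consistent: the committed write is applied in full, and the interrupted one leaves no half-written blocks behind.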
The practical takeaway: format USB drives as exFAT if they need to move between Windows, Mac, and Linux systems. Use ext4 for Linux servers. Use NTFS for Windows volumes that need permissions.
The File Allocation Table
A filesystem's most important job is maintaining a map from filenames to the physical blocks on disk where each file's content lives. Files are not stored in neat contiguous regions — they grow, shrink, and get scattered across many non-adjacent blocks over time. The filesystem needs an index to keep track of it all.
The most literal version of this index is the File Allocation Table — the on-disk structure that gave the FAT12, FAT16, and FAT32 formats their name. The FAT is an array with one entry per block on the disk. Each entry describes the state of that block:
| FAT Entry Value | Meaning |
|---|---|
| 0x000 | Free — this block is available for new data |
| A block number N | This file continues at block N (follow the chain) |
| End-of-chain marker | This is the last block of the file |
| Bad-block marker | This block is damaged and should not be used |
To read a file, the OS finds the file's starting block number (stored in the directory entry), then follows the FAT chain — entry by entry — until it hits the end-of-chain marker. The file's content is the concatenation of all the blocks in that chain, in order.
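The chain-following loop can be written out directly. This sketch uses a toy FAT with invented contents; real FAT entries are fixed-width integers with reserved marker values, but the walk is the same:

```python
# A toy FAT: one entry per block. 0 marks a free block, -1 stands in
# for the end-of-chain marker, any other value is the next block.
FREE, END = 0, -1
fat    = [FREE, 4, END, FREE, 2, FREE]      # chain: 1 -> 4 -> 2 -> end
blocks = ["", "He", "d!", "", "llo worl", ""]

def read_file(start_block):
    # The starting block comes from the directory entry; from there,
    # each FAT entry points at the next block of the file.
    content, block = "", start_block
    while True:
        content += blocks[block]
        if fat[block] == END:
            return content
        block = fat[block]

print(read_file(1))  # Hello world!
```

Notice that the file's blocks (1, 4, 2) are not adjacent and not in numeric order; the chain, not the block layout, defines the file.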
Modern filesystems replace the flat FAT array with more efficient structures, but the core idea — a central index mapping files to blocks — is universal. On Linux, ext4 uses inodes: each file has one inode that stores its metadata (owner, permissions, timestamps) and a list of the block addresses where its data lives, and the directory is just a table of (filename → inode number) mappings. On Windows, NTFS uses the Master File Table (MFT) — a structured database where each record describes one file or directory: its name, timestamps, permissions, and the contiguous runs of blocks containing its data.
In every case, this index is what a "filesystem" fundamentally is. Without it, a storage device is just a flat sequence of numbered blocks with no names, no hierarchy, and no way to find anything.
What Really Happens When You Delete a File
Filesystem operations sound like decisive actions. The underlying mechanics are much simpler — and the gap between what users perceive and what the hardware actually does has real consequences.
Creating a file: The OS allocates a free inode (or MFT record), finds free blocks for the content, writes the data to those blocks, records the block addresses in the inode, and adds a (filename → inode) mapping to the directory. The file exists.
Reading a file: The OS looks up the filename in the directory to get the inode number, reads the inode to find the block addresses, and reads those blocks in sequence.
Updating a file: For changes that fit within existing blocks, the OS reads the block, modifies it in memory, and writes it back. If the file grows, new blocks are allocated and the inode is updated to include them.
Deleting a file: The OS removes the filename from the directory and marks the inode and its blocks as free in the allocation table. That is the entire operation. The bits do not move. The data remains on the storage hardware exactly as it was.
This is why file recovery software works. Tools like Recuva or PhotoRec scan the disk looking for inode or MFT entries that have been marked free but whose blocks have not yet been overwritten. As long as the operating system has not reused those blocks for new data, the original file content is intact and fully recoverable — sometimes hours, days, or even weeks after deletion.
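A toy model makes the delete step concrete. The directory, free list, and block contents here are invented, but the shape of the operation matches the description above:

```python
# "Deleting" removes the directory entry and marks the block free.
# The data itself is never touched -- which is what recovery tools exploit.
blocks    = {12: b"secret report", 13: b"", 14: b""}
free_list = {13, 14}
directory = {"report.txt": 12}  # filename -> starting block

def delete(name):
    block = directory.pop(name)  # drop the name from the directory...
    free_list.add(block)         # ...and mark the block free. Done.

delete("report.txt")
print("report.txt" in directory)  # False -- the file is "gone"...
print(blocks[12])                 # ...but the bytes are still there
```

A recovery tool is, in essence, a scan over `free_list` looking for blocks whose contents have not yet been replaced by new writes.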
Formatting Does Not Delete Your Data
A quick format in Windows rebuilds the filesystem structures from scratch — it creates a fresh, empty MFT and marks all blocks as free — but writes nothing to the data blocks themselves. Files deleted before a quick format are just as recoverable after it. A full format additionally scans every block for bad sectors; in versions before Windows Vista it still left the original data intact, though since Vista a full format also writes zeros to each sector.
The reason is the same engineering logic as deletion: writing zeros to every block would be slow and is unnecessary for normal use — the next write will overwrite them anyway. Performance wins over security as a default.
Implications for Hardware Disposal
Deleting files and reformatting before selling or recycling a computer does not protect the data on it. Anyone with a recovery tool and a few minutes can retrieve the previous contents. The correct approach depends on the storage technology:
- HDDs. Overwrite the entire drive with zeros or random data. One pass is sufficient to defeat software recovery. Tools like DBAN (Darik's Boot and Nuke) automate a full-disk overwrite from a bootable USB drive.
- SSDs. Software overwriting is unreliable on flash storage because of wear leveling — the SSD controller remaps writes to fresh cells to spread wear evenly, which can leave old data intact in cells that were bypassed. Most SSD manufacturers expose a firmware command called ATA Secure Erase that resets the drive at the hardware level and is far more reliable than any software approach.
- Most secure — physical destruction. Degaussing (for HDDs) or physical shredding renders data unrecoverable regardless of technique. Required for classified or regulated data disposal.
The Linux Filesystem Hierarchy
The Filesystem Hierarchy Standard (FHS) defines what belongs in each top-level directory on a Linux system. This is not arbitrary — every major Linux distribution follows it, which means you can sit down at an unfamiliar Linux machine and immediately know where to find configuration files, log files, and installed programs.
IT administrators need to know this map. When something breaks, knowing that logs are in /var/log and configuration is in /etc is the difference between troubleshooting efficiently and hunting blind.
NTFS Permissions
Chapter 8 covered the Linux rwx permission model — nine bits, three groups, clean and simple. Windows NTFS uses a more granular system called Access Control Lists (ACLs).
Instead of three fixed groups (owner, group, other), an NTFS ACL can contain any number of Access Control Entries (ACEs) — one for each specific user or group you want to configure. Each entry specifies a precise combination of permissions:
| NTFS Permission | What it allows |
|---|---|
| Full Control | Read, write, delete, and change permissions and ownership |
| Modify | Read, write, and delete — but not change permissions |
| Read & Execute | Open and run files; list directory contents |
| Read | View file contents and attributes only |
| Write | Create and modify files, but not read existing contents |
NTFS permissions also support inheritance: permissions set on a parent folder automatically flow down to files and subfolders within it. This makes large permission structures manageable — set permissions on a department folder once and every file inside inherits them automatically.
In practice, NTFS permissions in enterprise environments are rarely managed file-by-file. Administrators assign permissions to security groups (e.g., "HR-Staff"), then add users to those groups — the permissions follow. This is exactly the kind of thing Active Directory manages at scale.
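The group-based pattern can be sketched as a small lookup: permissions attach to groups, and users receive them through membership. The share path, group names, and users below are all invented for illustration:

```python
# Toy effective-access check via group membership.
SHARE = r"\\fileserver\HR"
acl = {SHARE: {"HR-Staff": "Modify", "Domain Admins": "Full Control"}}
groups = {"HR-Staff": {"alice", "bob"}, "Domain Admins": {"carol"}}

def effective_access(user, resource):
    # Collect every permission the user holds through any group
    # listed in the resource's ACL.
    return {perm for group, perm in acl.get(resource, {}).items()
            if user in groups.get(group, set())}

print(effective_access("alice", SHARE))  # {'Modify'}
groups["HR-Staff"].discard("alice")      # remove from the group...
print(effective_access("alice", SHARE))  # set() -- ...access is gone
```

The ACL itself never changed; only the group membership did. That is the whole point of managing access through groups.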
Directory Services — The Problem They Solve
Imagine an IT department managing a company with 50 servers. Without a directory service, every server maintains its own local user database. When a new employee joins, an IT administrator creates that person's account on each of the 50 servers individually. When the employee changes their password, they need to update it on each server — or the admin does it for them, 50 times. When someone leaves the company, deprovisioning means visiting every server to remove the account.
This is not hypothetical — it is how Unix systems were managed in the 1980s and 1990s, and why every large organization quickly runs into the same problem: distributed identity management does not scale.
A directory service solves this with a central, network-accessible database of identities. Every server in the organization points to the same directory for authentication. One account, one password, works everywhere. Disable the account in one place and access is revoked on every system immediately.
A directory service stores more than just usernames and passwords. A complete enterprise directory contains:
- User accounts — credentials, contact information, role, department
- Security groups — collections of users that share access rights
- Computer accounts — every managed workstation and server as an object in the directory
- Policies — configuration rules that apply automatically to users or machines
- Resource records — printers, shared drives, application registrations
Active Directory
Microsoft's Active Directory (AD) is the dominant enterprise directory service. Introduced with Windows 2000, it is deployed in the vast majority of mid-to-large organizations running Windows infrastructure. If you work in IT, you will encounter Active Directory.
The key structural concepts:
Domain. A named administrative boundary — typically the organization's domain name, like company.com. Every user account, computer, and policy lives within a domain. A domain is the basic unit of management in AD.
Domain Controller (DC). The server running Active Directory Domain Services (AD DS) — the software that stores and manages the directory database. Organizations run at least two DCs for redundancy. Every time someone logs into a domain-joined machine, their credentials are verified against the DC.
Security Groups. A directory full of individual user accounts is not, by itself, very useful — the reason you want one in the first place is to manage permissions at scale, and you do that with groups. A security group is a collection of user accounts. Rather than assigning permissions on a shared drive to fifty individual users, an administrator assigns permissions to a group (e.g., "HR-Staff") and manages membership. Add someone to the group and they immediately have access; remove them and it disappears. Groups are the answer to "how do I give the right people the right access without losing my mind."
Organizational Units (OUs). Once you have users and groups, you need a way to organize them. OUs are containers within a domain used to do exactly that. You might have OUs for each department (IT, HR, Sales), or for each office location, or both. OUs are also the unit to which Group Policy is applied — which is why they exist as a separate concept from groups in the first place.
Group Policy (GPO). Configuration rules that apply automatically to users and computers in an OU or across the whole domain. GPOs can enforce password complexity requirements, deploy software, configure desktop wallpaper, disable USB ports, map network drives — anything that can be configured on a Windows machine can be managed by a GPO. This is how an IT department configures thousands of machines without touching each one individually.
LDAP
LDAP (Lightweight Directory Access Protocol) is the protocol used to query and manage directory service databases. Active Directory is built on LDAP — when a workstation authenticates a user against the domain, or when an application looks up a user's group memberships, it is making LDAP queries under the hood.
IT professionals encounter LDAP most often when configuring applications to authenticate against Active Directory. Almost any enterprise application — email, VPN, ticketing systems, web apps — can be configured to use "LDAP authentication," meaning the application delegates credential verification to the directory service. Users log in with their AD username and password, and no separate account needs to be created inside each application.
LDAP uses a hierarchical addressing system called a Distinguished Name (DN) to uniquely identify every object in the directory:
cn=alice,ou=IT,dc=company,dc=com
Reading right to left: dc=company,dc=com is the domain (company.com), ou=IT is the IT organizational unit, and cn=alice is the common name of the object (Alice's user account). The DN is the full address of that object in the directory tree.
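Pulling a DN apart is straightforward string handling. This sketch assumes a simple DN with no escaped commas, which the LDAP DN syntax does permit in general:

```python
# Split a Distinguished Name into (attribute, value) components.
def parse_dn(dn):
    pairs = [part.split("=", 1) for part in dn.split(",")]
    return [(key.strip(), value) for key, value in pairs]

dn = "cn=alice,ou=IT,dc=company,dc=com"
for key, value in parse_dn(dn):
    print(key, "=", value)

# The dc components, read in order, spell out the domain name:
domain = ".".join(v for k, v in parse_dn(dn) if k == "dc")
print(domain)  # company.com
```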
LDAP communicates over port 389 by default. Plain LDAP sends credentials over the wire in cleartext — LDAPS (LDAP over TLS) wraps it in TLS and runs on port 636, for the same reason HTTPS matters over HTTP. Active Directory also supports Kerberos for authentication in addition to LDAP, which is more secure and is the default for domain-joined Windows machines.
DNS and Active Directory
Active Directory is tightly coupled to DNS (Domain Name System). When a computer needs to find a domain controller — to log in a user, authenticate a machine, or apply Group Policy — it does not use a hardcoded IP address. It queries DNS for special records called SRV records that advertise the location of domain controller services.
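The SRV names a client asks for follow a fixed pattern: underscore-prefixed service and protocol labels in front of the domain. The sketch below only constructs the query name; an actual lookup would go through a DNS resolver, and domain-joined Windows clients also query names under the `_msdcs` subdomain:

```python
# Build the DNS SRV record name a client queries to locate a domain
# controller offering the LDAP service over TCP.
def dc_srv_name(domain: str) -> str:
    # _service._protocol.domain
    return f"_ldap._tcp.{domain}"

print(dc_srv_name("company.com"))  # _ldap._tcp.company.com
```

The SRV records returned for that name carry the hostname and port of each domain controller, plus priority and weight values the client uses to pick one.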
In practice, Active Directory typically runs its own internal DNS server, and domain-joined machines point to the DC's IP as their DNS server. This means DNS is not just for browsing websites in an enterprise environment — it is the backbone of identity infrastructure.
DNS itself — how it works, how names resolve to IP addresses, and how it scales to the global internet — is the subject of Chapter 10. The key point here is that AD and DNS are inseparable in production environments: if DNS is broken, AD authentication breaks with it.
Chapter 10 asks a different question entirely: now that we know how files and accounts are managed on one machine, what happens when those machines need to talk to each other?