A file is a name attached to a sequence of bytes on a storage device. That's it. Everything else a filesystem appears to do — folders, permissions, ownership, the whole tree — is software bookkeeping on top of that one simple idea.
What Is a Filesystem?
Chapter 5 described storage hardware — spinning disks and NAND flash — as devices that read and write blocks of data at numeric addresses. A hard drive does not know what a "file" is. It knows sector 0, sector 1, sector 2. The filesystem is the software layer, managed by the OS kernel, that imposes structure on top of those raw blocks.
It does this by providing three things raw block storage cannot. The first is names: "report.txt" is a human-readable label that the filesystem maps to a specific set of blocks on disk — change the filename and the same blocks are now called something else. The second is hierarchy: files are organized into directories (folders), which can be nested to arbitrary depth, giving you the tree structure most people already have as a mental model for file storage. The third is metadata: for each file, the filesystem tracks size, creation and modification timestamps, ownership, and permissions — all stored separately from the file's actual content.
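The third point is easy to demonstrate: the metadata the filesystem tracks can be read separately from the file's content. A minimal sketch using Python's standard library (the filename is invented for the demo):

```python
import os
import stat
import time

# Create a file, then read back the metadata the filesystem tracks
# for it: size, timestamps, and permission bits, all stored in the
# filesystem's own structures rather than in the file's content.
with open("report.txt", "w") as f:
    f.write("quarterly numbers\n")

info = os.stat("report.txt")
print("size (bytes):", info.st_size)           # length of the content
print("modified:", time.ctime(info.st_mtime))  # last-modified timestamp
print("mode:", stat.filemode(info.st_mode))    # e.g. -rw-r--r--

os.remove("report.txt")  # clean up the demo file
```

Note that `os.stat` never opens the file's content at all; it reads only the filesystem's index entry for the file.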
The OS kernel accesses all filesystems through a common interface called the Virtual File System (VFS). The VFS means that an application calling "open this file" does not need to know whether the file lives on an ext4 partition, an NTFS drive, a network share, or even the virtual /proc filesystem — the kernel handles the translation.
The Directory Tree
Both Linux and Windows organize files into a hierarchical tree, but they make a fundamentally different choice about where that tree starts.
Linux has a single root, written as /. Every file on the system — whether it lives on the boot drive, a secondary hard drive, a USB stick, or a network share — appears somewhere under this one unified tree. Attaching a storage device to the tree is called mounting: you mount a drive at a directory (its mount point), and from that moment its contents appear as a subdirectory. Unmount it and those files disappear from the tree.
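Mounting can be sketched as a lookup table: the kernel keeps a mount table mapping mount points to devices, and resolving a path means finding the longest matching mount-point prefix. A toy version, with invented paths and device names:

```python
# Toy mount table: mount point -> device. All entries are made up.
MOUNTS = {
    "/": "sda1",
    "/home": "sdb1",
    "/mnt/usb": "sdc1",
}

def device_for(path: str) -> str:
    # Longest matching prefix wins: /home/alice lives on sdb1, not
    # on the root device sda1, because /home is its own mount.
    best = max(
        (m for m in MOUNTS
         if path == m or path.startswith(m.rstrip("/") + "/")),
        key=len,
    )
    return MOUNTS[best]

print(device_for("/etc/hosts"))    # sda1
print(device_for("/home/alice"))   # sdb1
print(device_for("/mnt/usb/pic"))  # sdc1
```

Unmounting `/mnt/usb` would simply remove that entry, and paths under it would stop resolving, which is exactly the "files disappear from the tree" behavior described above.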
Windows assigns a separate letter to each volume: C:\ for the system drive, D:\ for a secondary drive, and so on. Each drive letter is the root of its own independent tree. There is no single root that contains everything.
For IT professionals, the Linux model's flexibility matters in practice: a database partition can be mounted at /var/lib/mysql, a shared NFS volume at /mnt/fileserver, and a user's home directory at /home/alice — all appearing as one tree, regardless of which physical device or network location the data actually lives on.
Filesystem Formats
A filesystem format defines the specific on-disk data structures used to store file metadata and content. Different formats make different tradeoffs around features, compatibility, and performance.
| Format | Primarily Used On | Max File Size | Permissions | Journaling | Cross-Platform |
|---|---|---|---|---|---|
| ext4 | Linux (default) | 16 TB | Linux rwx | Yes | No native support on Windows or macOS; read/write on both via third-party drivers |
| NTFS | Windows (default) | 16 EB | Windows ACLs | Yes | Read-only on macOS; read/write on Linux via drivers |
| FAT32 | USB drives, legacy devices | 4 GB | None | No | Universal — readable by virtually every OS and device |
| exFAT | Flash drives, SD cards | 16 EB | None | No | Universal on modern systems (Windows, macOS, Linux) |
Journaling is a fault-tolerance feature: the filesystem keeps a running log (journal) of changes it is about to make. If the system crashes mid-write, the journal lets the OS replay or discard the incomplete operation on next boot, preventing filesystem corruption. ext4 and NTFS both journal; FAT32 does not, which is why a sudden USB disconnect used to corrupt files.
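The replay step can be sketched in a few lines. Each journal entry here records an intended block write plus a commit flag; on the next boot, committed entries are replayed and uncommitted ones are discarded. The data structures are invented for illustration:

```python
# Toy journal recovery: block 7's change was committed before the
# crash, block 9's write was interrupted mid-operation.
blocks = {7: b"old"}
journal = [
    [7, b"new", True],    # commit record made it to disk -> replay
    [9, b"half", False],  # crash hit mid-write -> discard safely
]

def recover(blocks, journal):
    for block_no, data, committed in journal:
        if committed:
            blocks[block_no] = data  # redo the completed operation
    journal.clear()                  # journal is empty once replayed
    return blocks

print(recover(blocks, journal))  # block 9 was never touched
```

Either way the filesystem ends up consistent: the committed write is applied in full, and the interrupted one leaves no half-written blocks behind.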
The practical takeaway: format USB drives as exFAT if they need to move between Windows, Mac, and Linux systems. Use ext4 for Linux servers. Use NTFS for Windows volumes that need permissions.
The File Allocation Table
A filesystem's most important job is maintaining a map from filenames to the physical blocks on disk where each file's content lives. Files are not stored in neat contiguous regions — they grow, shrink, and get scattered across many non-adjacent blocks over time. The filesystem needs an index to keep track of it all.
The most literal version of this index is the File Allocation Table — the on-disk structure that gave the FAT12, FAT16, and FAT32 formats their name. The FAT is an array with one entry per block on the disk. Each entry describes the state of that block:
| FAT Entry Value | Meaning |
|---|---|
| 0x000 | Free — this block is available for new data |
| A block number N | This file continues at block N (follow the chain) |
| End-of-chain marker | This is the last block of the file |
| Bad-block marker | This block is damaged and should not be used |
To read a file, the OS finds the file's starting block number (stored in the directory entry), then follows the FAT chain — entry by entry — until it hits the end-of-chain marker. The file's content is the concatenation of all the blocks in that chain, in order.
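The chain-following loop can be written out directly. This sketch uses a toy FAT with invented contents; real FAT entries are fixed-width integers with reserved marker values, but the walk is the same:

```python
# A toy FAT: one entry per block. 0 marks a free block, -1 stands in
# for the end-of-chain marker, any other value is the next block.
FREE, END = 0, -1
fat    = [FREE, 4, END, FREE, 2, FREE]      # chain: 1 -> 4 -> 2 -> end
blocks = ["", "He", "d!", "", "llo worl", ""]

def read_file(start_block):
    # The starting block comes from the directory entry; from there,
    # each FAT entry points at the next block of the file.
    content, block = "", start_block
    while True:
        content += blocks[block]
        if fat[block] == END:
            return content
        block = fat[block]

print(read_file(1))  # Hello world!
```

Notice that the file's blocks (1, 4, 2) are not adjacent and not in numeric order; the chain, not the block layout, defines the file.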
Modern filesystems replace the flat FAT array with more efficient structures, but the core idea — a central index mapping files to blocks — is universal. On Linux, ext4 uses inodes: each file has one inode that stores its metadata (owner, permissions, timestamps) and a list of the block addresses where its data lives, and the directory is just a table of (filename → inode number) mappings. On Windows, NTFS uses the Master File Table (MFT) — a structured database where each record describes one file or directory: its name, timestamps, permissions, and the contiguous runs of blocks containing its data.
In every case, this index is what a "filesystem" fundamentally is. Without it, a storage device is just a flat sequence of numbered blocks with no names, no hierarchy, and no way to find anything.
What Really Happens When You Delete a File
Filesystem operations sound like decisive actions. The underlying mechanics are much simpler — and the gap between what users perceive and what the hardware actually does has real consequences.
Creating a file: The OS allocates a free inode (or MFT record), finds free blocks for the content, writes the data to those blocks, records the block addresses in the inode, and adds a (filename → inode) mapping to the directory. The file exists.
Reading a file: The OS looks up the filename in the directory to get the inode number, reads the inode to find the block addresses, and reads those blocks in sequence.
Updating a file: For changes that fit within existing blocks, the OS reads the block, modifies it in memory, and writes it back. If the file grows, new blocks are allocated and the inode is updated to include them.
Deleting a file: The OS removes the filename from the directory and marks the inode and its blocks as free in the allocation table. That is the entire operation. The bits do not move. The data remains on the storage hardware exactly as it was.
This is why file recovery software works. Tools like Recuva or PhotoRec scan the disk looking for inode or MFT entries that have been marked free but whose blocks have not yet been overwritten. As long as the operating system has not reused those blocks for new data, the original file content is intact and fully recoverable — sometimes hours, days, or even weeks after deletion.
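A toy model makes the delete step concrete. The directory, free list, and block contents here are invented, but the shape of the operation matches the description above:

```python
# "Deleting" removes the directory entry and marks the block free.
# The data itself is never touched -- which is what recovery tools exploit.
blocks    = {12: b"secret report", 13: b"", 14: b""}
free_list = {13, 14}
directory = {"report.txt": 12}  # filename -> starting block

def delete(name):
    block = directory.pop(name)  # drop the name from the directory...
    free_list.add(block)         # ...and mark the block free. Done.

delete("report.txt")
print("report.txt" in directory)  # False -- the file is "gone"...
print(blocks[12])                 # ...but the bytes are still there
```

A recovery tool is, in essence, a scan over `free_list` looking for blocks whose contents have not yet been replaced by new writes.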
Formatting Does Not Delete Your Data
A quick format in Windows rebuilds the filesystem structures from scratch — it creates a fresh, empty MFT and marks all blocks as free — but writes nothing to the data blocks themselves. Files deleted before a quick format are just as recoverable after it. A full format additionally scans every block for bad sectors; in versions before Windows Vista it still left the original data intact, though since Vista a full format also writes zeros to each sector.
The reason is the same engineering logic as deletion: writing zeros to every block would be slow and is unnecessary for normal use — the next write will overwrite them anyway. Performance wins over security as a default.
Implications for Hardware Disposal
Deleting files and reformatting before selling or recycling a computer does not protect the data on it. Anyone with a recovery tool and a few minutes can retrieve the previous contents. The correct approach depends on the storage technology:
- HDDs. Overwrite the entire drive with zeros or random data. One pass is sufficient to defeat software recovery. Tools like DBAN (Darik's Boot and Nuke) automate a full-disk overwrite from a bootable USB drive.
- SSDs. Software overwriting is unreliable on flash storage because of wear leveling — the SSD controller remaps writes to fresh cells to spread wear evenly, which can leave old data intact in cells that were bypassed. Most SSD manufacturers expose a firmware command called ATA Secure Erase that resets the drive at the hardware level and is far more reliable than any software approach.
- Most secure — physical destruction. Degaussing (for HDDs) or physical shredding renders data unrecoverable regardless of technique. Required for classified or regulated data disposal.
The Linux Filesystem Hierarchy
The Filesystem Hierarchy Standard (FHS) defines what belongs in each top-level directory on a Linux system. This is not arbitrary — every major Linux distribution follows it, which means you can sit down at an unfamiliar Linux machine and immediately know where to find configuration files, log files, and installed programs.
IT administrators need to know this map. When something breaks, knowing that logs are in /var/log and configuration is in /etc is the difference between troubleshooting efficiently and hunting blind.
NTFS Permissions
Chapter 8 covered the Linux rwx permission model — nine bits, three groups, clean and simple. Windows NTFS uses a more granular system called Access Control Lists (ACLs).
Instead of three fixed groups (owner, group, other), an NTFS ACL can contain any number of Access Control Entries (ACEs) — one for each specific user or group you want to configure. Each entry specifies a precise combination of permissions:
| NTFS Permission | What it allows |
|---|---|
| Full Control | Read, write, delete, and change permissions and ownership |
| Modify | Read, write, and delete — but not change permissions |
| Read & Execute | Open and run files; list directory contents |
| Read | View file contents and attributes only |
| Write | Create and modify files, but not read existing contents |
NTFS permissions also support inheritance: permissions set on a parent folder automatically flow down to files and subfolders within it. This makes large permission structures manageable — set permissions on a department folder once and every file inside inherits them automatically.
In practice, NTFS permissions in enterprise environments are rarely managed file-by-file. Administrators assign permissions to security groups (e.g., "HR-Staff"), then add users to those groups — the permissions follow. This is exactly the kind of thing Active Directory manages at scale.
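The group-based pattern can be sketched as a small lookup: permissions attach to groups, and users receive them through membership. The share path, group names, and users below are all invented for illustration:

```python
# Toy effective-access check via group membership.
SHARE = r"\\fileserver\HR"
acl = {SHARE: {"HR-Staff": "Modify", "Domain Admins": "Full Control"}}
groups = {"HR-Staff": {"alice", "bob"}, "Domain Admins": {"carol"}}

def effective_access(user, resource):
    # Collect every permission the user holds through any group
    # listed in the resource's ACL.
    return {perm for group, perm in acl.get(resource, {}).items()
            if user in groups.get(group, set())}

print(effective_access("alice", SHARE))  # {'Modify'}
groups["HR-Staff"].discard("alice")      # remove from the group...
print(effective_access("alice", SHARE))  # set() -- ...access is gone
```

The ACL itself never changed; only the group membership did. That is the whole point of managing access through groups.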
Directory Services — The Problem They Solve
Imagine an IT department managing a company with 50 servers. Without a directory service, every server maintains its own local user database. When a new employee joins, an IT administrator creates that person's account on each of the 50 servers individually. When the employee changes their password, they need to update it on each server — or the admin does it for them, 50 times. When someone leaves the company, deprovisioning means visiting every server to remove the account.
This is not hypothetical — it is how Unix systems were managed in the 1980s and 1990s, and why every large organization quickly runs into the same problem: distributed identity management does not scale.
A directory service solves this with a central, network-accessible database of identities. Every server in the organization points to the same directory for authentication. One account, one password, works everywhere. Disable the account in one place and access is revoked on every system immediately.
A directory service stores more than just usernames and passwords. A complete enterprise directory contains:
- User accounts — credentials, contact information, role, department
- Security groups — collections of users that share access rights
- Computer accounts — every managed workstation and server as an object in the directory
- Policies — configuration rules that apply automatically to users or machines
- Resource records — printers, shared drives, application registrations
Active Directory
Microsoft's Active Directory (AD) is the dominant enterprise directory service. Introduced with Windows 2000, it is deployed in the vast majority of mid-to-large organizations running Windows infrastructure. If you work in IT, you will encounter Active Directory.
The key structural concepts:
Domain. A named administrative boundary — typically the organization's domain name, like company.com. Every user account, computer, and policy lives within a domain. A domain is the basic unit of management in AD.
Domain Controller (DC). The server running Active Directory Domain Services (AD DS) — the software that stores and manages the directory database. Organizations run at least two DCs for redundancy. Every time someone logs into a domain-joined machine, their credentials are verified against the DC.
Security Groups. A directory full of individual user accounts is not, by itself, very useful — the reason you want one in the first place is to manage permissions at scale, and you do that with groups. A security group is a collection of user accounts. Rather than assigning permissions on a shared drive to fifty individual users, an administrator assigns permissions to a group (e.g., "HR-Staff") and manages membership. Add someone to the group and they immediately have access; remove them and it disappears. Groups are the answer to "how do I give the right people the right access without losing my mind."
Organizational Units (OUs). Once you have users and groups, you need a way to organize them. OUs are containers within a domain used to do exactly that. You might have OUs for each department (IT, HR, Sales), or for each office location, or both. OUs are also the unit to which Group Policy is applied — which is why they exist as a separate concept from groups in the first place.
Group Policy (GPO). Configuration rules that apply automatically to users and computers in an OU or across the whole domain. GPOs can enforce password complexity requirements, deploy software, configure desktop wallpaper, disable USB ports, map network drives — anything that can be configured on a Windows machine can be managed by a GPO. This is how an IT department configures thousands of machines without touching each one individually.
LDAP
LDAP (Lightweight Directory Access Protocol) is the protocol used to query and manage directory service databases. Active Directory is built on LDAP — when a workstation authenticates a user against the domain, or when an application looks up a user's group memberships, it is making LDAP queries under the hood.
IT professionals encounter LDAP most often when configuring applications to authenticate against Active Directory. Almost any enterprise application — email, VPN, ticketing systems, web apps — can be configured to use "LDAP authentication," meaning the application delegates credential verification to the directory service. Users log in with their AD username and password, and no separate account needs to be created inside each application.
LDAP uses a hierarchical addressing system called a Distinguished Name (DN) to uniquely identify every object in the directory:
cn=alice,ou=IT,dc=company,dc=com
Reading right to left: dc=company,dc=com is the domain (company.com), ou=IT is the IT organizational unit, and cn=alice is the common name of the object (Alice's user account). The DN is the full address of that object in the directory tree.
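Pulling a DN apart is straightforward string handling. This sketch assumes a simple DN with no escaped commas, which the LDAP DN syntax does permit in general:

```python
# Split a Distinguished Name into (attribute, value) components.
def parse_dn(dn):
    pairs = [part.split("=", 1) for part in dn.split(",")]
    return [(key.strip(), value) for key, value in pairs]

dn = "cn=alice,ou=IT,dc=company,dc=com"
for key, value in parse_dn(dn):
    print(key, "=", value)

# The dc components, read in order, spell out the domain name:
domain = ".".join(v for k, v in parse_dn(dn) if k == "dc")
print(domain)  # company.com
```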
LDAP communicates over port 389 by default. Plain LDAP sends credentials over the wire in cleartext — LDAPS (LDAP over TLS) wraps it in TLS and runs on port 636, for the same reason HTTPS matters over HTTP. Active Directory also supports Kerberos for authentication in addition to LDAP, which is more secure and is the default for domain-joined Windows machines.
DNS and Active Directory
Active Directory is tightly coupled to DNS (Domain Name System). When a computer needs to find a domain controller — to log in a user, authenticate a machine, or apply Group Policy — it does not use a hardcoded IP address. It queries DNS for special records called SRV records that advertise the location of domain controller services.
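The SRV names a client asks for follow a fixed pattern: underscore-prefixed service and protocol labels in front of the domain. The sketch below only constructs the query name; an actual lookup would go through a DNS resolver, and domain-joined Windows clients also query names under the `_msdcs` subdomain:

```python
# Build the DNS SRV record name a client queries to locate a domain
# controller offering the LDAP service over TCP.
def dc_srv_name(domain: str) -> str:
    # _service._protocol.domain
    return f"_ldap._tcp.{domain}"

print(dc_srv_name("company.com"))  # _ldap._tcp.company.com
```

The SRV records returned for that name carry the hostname and port of each domain controller, plus priority and weight values the client uses to pick one.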
In practice, Active Directory typically runs its own internal DNS server, and domain-joined machines point to the DC's IP as their DNS server. This means DNS is not just for browsing websites in an enterprise environment — it is the backbone of identity infrastructure.
DNS itself — how it works, how names resolve to IP addresses, and how it scales to the global internet — is the subject of Chapter 10. The key point here is that AD and DNS are inseparable in production environments: if DNS is broken, AD authentication breaks with it.
Chapter 10 asks a different question entirely: now that we know how files and accounts are managed on one machine, what happens when those machines need to talk to each other?