Friday, August 22, 2025

C20 How Computers Organize Everything


Files and File Systems

Dr Sudheendra S G provides a comprehensive overview of files and file systems, covering fundamental concepts, practical mechanisms, and common misconceptions. The material is organized as a detailed educational briefing.

I. Core Concepts: What is a File?

At its most basic level, a file is a sequence of numbers, with its meaning derived from its format.

  • "All files are just numbers": "Text, photos, songs, videos—under the hood they’re just sequences of numbers. The format tells software how to interpret those numbers."
  • Binary Data + Format: A file is essentially binary data coupled with a specific format that dictates how software interprets that data. For example, the numbers 72 101 108 108 111 spell 'Hello' in ASCII.
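A quick way to see "numbers → meaning via format" is to hand the same five numbers to Python's built-in ASCII decoder. This is a minimal illustration using only built-ins; the variable names are arbitrary.

```python
# The five numbers from the example above, stored as raw bytes.
data = bytes([72, 101, 108, 108, 111])

# Interpreting those numbers with the ASCII format turns them into text.
text = data.decode("ascii")
print(text)                          # Hello

# Read as plain integers, the very same bytes carry no "text" meaning.
print(list(data))                    # [72, 101, 108, 108, 111]

# Going the other way: encode characters back into their ASCII numbers.
print(list("Hi".encode("ascii")))    # [72, 105]
```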

File Formats and Metadata

Different file types utilize distinct formats and often include metadata (data about the data) within a header.

  • TXT & ASCII: Text files (TXT) use character encodings like ASCII to map numbers to characters.
  • WAV (Audio): WAV files begin with a header containing critical metadata such as "sample rate, channels, and the literal word WAVE." This information is essential for a player to interpret the audio data correctly.
  • BMP (Image): BMP files store pixel data, often using color models like 24-bit RGB (8 bits each for Red, Green, Blue) to represent colors. For example, 255,255,255 is white, and 0,0,0 is black.
  • Reinforcement: "different formats, same foundation—numbers → meaning via format."
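As a sketch of how a player reads WAV metadata, the code below builds a tiny silent WAV file with Python's standard `wave` module and then reads the header bytes back by hand with `struct`. The fixed byte offsets assume the canonical 44-byte PCM header that `wave` writes; real-world WAV files can contain extra chunks, so a robust reader would walk the chunk list rather than rely on fixed offsets. The helper names are hypothetical.

```python
import io
import struct
import wave


def make_tiny_wav(sample_rate=8000, channels=1, frames=100):
    """Build a short, silent, 16-bit PCM WAV file entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)            # 2 bytes per sample = 16-bit audio
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * channels * frames)
    return buf.getvalue()


def read_wav_header(data):
    """Read metadata at the fixed offsets of a canonical 44-byte PCM header."""
    riff, _riff_size, wave_id = struct.unpack_from("<4sI4s", data, 0)
    channels, sample_rate = struct.unpack_from("<HI", data, 22)
    (bits_per_sample,) = struct.unpack_from("<H", data, 34)
    return {
        "riff": riff,
        "wave": wave_id,
        "channels": channels,
        "sample_rate": sample_rate,
        "bits_per_sample": bits_per_sample,
    }


header = read_wav_header(make_tiny_wav())
print(header)
# {'riff': b'RIFF', 'wave': b'WAVE', 'channels': 1, 'sample_rate': 8000, 'bits_per_sample': 16}
```

Without the header, the same sample bytes would be just numbers; the header is what tells the player how many channels there are and how fast to play them.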

II. From Raw Storage to Organized Files

At the hardware level, storage is just a continuous "long line of addressable buckets" or "blocks." It is the file system that imposes structure and meaning on this raw storage.

The Role of the File System

  • Abstraction: "Hardware doesn’t know ‘files’—the file system does." The file system translates abstract concepts like "files" and "folders" into concrete operations on storage blocks.
  • Directory as Table of Contents: The file system maintains a directory, which acts as its "table of contents." This directory maps file names to their physical locations on storage.
      • In a simple flat file system, each directory entry might list Name, Start, Length, which works only when a file occupies contiguous blocks.
      • To allow non-contiguous blocks, each entry instead lists the Name plus a list of the file's Blocks.
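The two directory layouts can be sketched as Python dictionaries sitting on top of a toy "disk." This is a hypothetical model, not any real file system's on-disk format: the 4-byte blocks, file names, and helper functions are all assumptions for illustration.

```python
# Toy "disk": a line of numbered blocks, each holding up to 4 bytes (assumed).
disk = [b"RIFF", b"WAVE", b"data", b"Hell", b"", b"", b"o!", b""]

# Flat directory, first form: name -> (start block, length in blocks).
contiguous_dir = {"song.wav": (0, 3)}

# Flat directory, later form: name -> list of blocks, which need not be adjacent.
block_list_dir = {"song.wav": [0, 1, 2], "notes.txt": [3, 6]}


def read_contiguous(name):
    """Look up the run of blocks in the table, then read them in one sweep."""
    start, length = contiguous_dir[name]
    return b"".join(disk[start:start + length])


def read_block_list(name):
    """Look up the block list in the table, then read each block in order."""
    return b"".join(disk[n] for n in block_list_dir[name])


print(read_contiguous("song.wav"))     # b'RIFFWAVEdata'
print(read_block_list("notes.txt"))    # b'Hello!'
```

The disk itself never "knows" that blocks 3 and 6 belong together; only the directory entry does.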

Blocks, Slack Space, and Fragmentation

To manage storage efficiently and allow files to grow, file systems allocate data in fixed-size units called blocks.

  • Blocks: "fixed-size storage unit used for allocation." Files can occupy multiple, non-adjacent blocks.
  • Slack Space: When a file's data doesn't perfectly fill its last allocated block, the remaining unused bytes in that block constitute slack space.
  • File Growth and Fragmentation:
      • When a file needs to grow and no adjacent free space is available, the file system allocates new, potentially non-adjacent blocks.
      • Fragmentation: "file’s blocks are scattered." This occurs when a file's data is spread across non-contiguous blocks on the storage device.
      • Impact of Fragmentation: On Hard Disk Drives (HDDs), fragmentation significantly "hurts HDD performance" because the read/write head has to perform "extra movement" (seek steps) to reach the scattered blocks.
  • Deleting Files: When a file is deleted, the directory entry is gone, but the data is still present until its blocks are reused. This is why forensic tools can sometimes "undelete" files: the data itself remains until it is overwritten.
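The mechanics above (fixed-size blocks, slack space, growth into scattered blocks, and deletion that leaves data behind) can be sketched with a toy model. Everything here is a hypothetical illustration: the 4096-byte block size is an assumed common value, `ToyDisk` is an invented class, and the "take the first free block" policy is a deliberately simple allocation strategy, not how any particular file system chooses blocks.

```python
import math

BLOCK_SIZE = 4096  # assumed block size in bytes; real file systems vary


def slack_bytes(file_size):
    """Unused bytes left in the last block allocated to a file."""
    blocks = math.ceil(file_size / BLOCK_SIZE)
    return blocks * BLOCK_SIZE - file_size


class ToyDisk:
    """Hypothetical model: a free map, block contents, and a name -> blocks directory."""

    def __init__(self, n_blocks):
        self.free = [True] * n_blocks
        self.data = [None] * n_blocks
        self.directory = {}

    def write(self, name, n_blocks):
        """Create or grow a file by grabbing the first free blocks, wherever they are."""
        blocks = self.directory.setdefault(name, [])
        for _ in range(n_blocks):
            i = self.free.index(True)  # raises ValueError if the disk is full
            self.free[i] = False
            self.data[i] = name        # stand-in for the file's real bytes
            blocks.append(i)

    def delete(self, name):
        """Drop the directory entry and mark blocks free; block contents stay."""
        for i in self.directory.pop(name):
            self.free[i] = True


def is_fragmented(blocks):
    """True if any two consecutive blocks of a file are not adjacent on disk."""
    return any(b - a != 1 for a, b in zip(blocks, blocks[1:]))


disk = ToyDisk(8)
disk.write("a.txt", 2)              # a.txt -> [0, 1]
disk.write("b.txt", 2)              # b.txt -> [2, 3]
disk.write("a.txt", 2)              # a.txt grows -> [0, 1, 4, 5]
print(disk.directory["a.txt"], is_fragmented(disk.directory["a.txt"]))  # [0, 1, 4, 5] True

disk.delete("b.txt")
print("b.txt" in disk.directory, disk.data[2])   # False b.txt  (data still there)

disk.write("c.txt", 1)              # reuses block 2, overwriting the old data
print(disk.data[2])                 # c.txt
print(slack_bytes(10_000))          # 2288
```

A 10,000-byte file needs three 4096-byte blocks (12,288 bytes), so 2,288 bytes of its last block are slack. Between the `delete` and the next `write`, block 2 still holds `b.txt`'s contents, which is exactly the window in which undelete tools can work.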

Defragmentation

Defragmentation is the process of reorganizing file blocks to be contiguous.

  • Purpose: "Defrag copies blocks around so each file’s blocks are in order." This improves performance, especially on HDDs, by reducing seek times.
  • SSD vs. HDD: It's crucial to note that "Modern SSDs don’t benefit from defrag like HDDs do; they use wear-leveling and the OS uses TRIM." Defragging SSDs is unnecessary and can even reduce their lifespan due to added wear.
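A simplified sketch of what defragmentation buys on an HDD: the hypothetical `seek_distance` function below uses total head travel (in blocks) as a stand-in for seek cost, and `defragment` copies each file's blocks into one contiguous run. Both are assumptions for illustration; a real defragmenter works in place with limited free space, and this cost model does not apply to SSDs, which have no moving head.

```python
def seek_distance(blocks):
    """Toy HDD cost: total head travel, in blocks, to read a file's blocks in order."""
    return sum(abs(b - a) for a, b in zip(blocks, blocks[1:]))


def defragment(directory, disk):
    """Copy every file into one contiguous run, packing files from block 0.

    Returns a new directory and a new disk list; the inputs are not modified.
    """
    new_disk = [None] * len(disk)
    new_directory = {}
    next_free = 0
    for name, blocks in directory.items():
        new_blocks = list(range(next_free, next_free + len(blocks)))
        for old, new in zip(blocks, new_blocks):
            new_disk[new] = disk[old]  # copy block contents to their new location
        new_directory[name] = new_blocks
        next_free += len(blocks)
    return new_directory, new_disk


# a.txt is scattered: its 2nd part sits at block 5 and its 3rd at block 2.
disk = ["a0", "b0", "a2", None, None, "a1", None, "b1"]
directory = {"a.txt": [0, 5, 2], "b.txt": [1, 7]}

print(seek_distance(directory["a.txt"]))      # 8
new_dir, new_disk = defragment(directory, disk)
print(new_dir)                                # {'a.txt': [0, 1, 2], 'b.txt': [3, 4]}
print(seek_distance(new_dir["a.txt"]))        # 2
print(new_disk)   # ['a0', 'a1', 'a2', 'b0', 'b1', None, None, None]
```

Note that defrag changes both the data blocks and the directory entries: the contents move, and the table is rewritten to point at their new homes.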

III. Organizing Files: Hierarchical File Systems

Flat directories become unmanageable with many files. Hierarchical file systems introduce the concept of folders (directories) to group related files, allowing for nested structures.

  • Structure: The hierarchy starts with a root directory. Directories can contain files and other directories.
  • Directories as Files: "Each directory is itself a directory file (another card) with its own entries." A folder is stored much like any other file; its contents are simply the list of entries it holds.
  • Paths: A path is a "location string from root to file" (e.g., /music/theme.wav, /photos/2024/trip.jpg) that specifies a file's location within the hierarchy.
  • Moving Files: "Moving files between folders usually edits metadata, not the data blocks." The file's data blocks remain in place; only the directory entries are updated to reflect the new location.
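A minimal sketch of a hierarchy, path lookup, and a move, assuming a hypothetical in-memory model where each directory is a Python dict and each file entry is just the list of block numbers holding its data. The point it illustrates is the one above: `move` edits two directory entries and never touches the block list itself.

```python
# Hypothetical in-memory hierarchy: a directory is a dict of entries;
# a file entry is just the list of block numbers holding its data.
root = {
    "music": {"theme.wav": [10, 11, 12]},
    "photos": {"2024": {"trip.jpg": [20, 21]}},
}


def resolve(path):
    """Follow a path from the root, one directory entry at a time."""
    node = root
    for part in path.split("/"):
        if part:                  # skip empty pieces from a leading or doubled "/"
            node = node[part]     # KeyError if a component does not exist
    return node


def move(src, dst_dir):
    """Move a file by editing directory entries only; its data blocks stay put."""
    parent_path, _, name = src.rpartition("/")
    blocks = resolve(parent_path).pop(name)   # remove the entry from the old folder
    resolve(dst_dir)[name] = blocks           # add an entry in the new folder


before = resolve("/music/theme.wav")
move("/music/theme.wav", "/photos/2024")
after = resolve("/photos/2024/theme.wav")
print(after, after is before)   # [10, 11, 12] True
print(root["music"])            # {}
```

The `is` check confirms that the moved entry points at the very same block list: "a line in a table" moved, the data did not.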

IV. File Identification and Security

Beyond their data, files carry important identification and security metadata.

Names, Extensions, and Magic Numbers

  • Extensions: Extensions like .txt or .bmp help "humans/OS decide which app to open."
  • Magic Numbers: Many file formats also include "magic numbers," specific "signature bytes in a header identifying format" (e.g., RIFF, WAVE, %PDF-). These provide a more reliable way to identify a file's type, since extensions can be misleading or changed.
  • Misconception: "Extensions define the file type." Correction: "Programs also check headers/magic."
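A sketch of how a program can "check headers/magic" instead of trusting the extension. The signature table is a small hypothetical sample (real tools such as the Unix `file` command rely on a much larger database), and `sniff` is an invented helper name. The WAV check reflects the RIFF layout mentioned earlier: "RIFF" at offset 0 and "WAVE" at offset 8.

```python
# A tiny, hypothetical signature table of well-known magic numbers.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"BM", "BMP image"),  # only 2 bytes, so false matches are possible
]


def sniff(header: bytes) -> str:
    """Guess a file's format from its first bytes, ignoring its name."""
    # WAV files are RIFF containers: "RIFF" at offset 0, "WAVE" at offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAV audio"
    for magic, kind in SIGNATURES:
        if header.startswith(magic):
            return kind
    return "unknown"


# A file named "report.txt" whose bytes actually start like a PDF:
print(sniff(b"%PDF-1.7\n"))                       # PDF document
print(sniff(b"RIFF\x24\x00\x00\x00WAVEfmt "))     # WAV audio
print(sniff(b"Hello"))                            # unknown
```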

Permissions

Files also include metadata such as "size, created/modified time, owner, and permissions (read/write/execute)."

  • Access Control: Permissions dictate who (owner, group, other users) can "read/write/execute" a file, providing a layer of security and access control.
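The owner/group/other and read/write/execute scheme can be decoded with Python's standard `stat` module. This sketch assumes Unix-style permission bits, which match the model described above; other systems may use different permission schemes. The `describe` helper is hypothetical.

```python
import stat


def describe(mode: int) -> dict:
    """Split Unix-style permission bits into owner/group/other rwx strings."""
    who = {
        "owner": (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
        "group": (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
        "other": (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
    }
    return {
        name: "".join(flag if mode & bit else "-" for flag, bit in zip("rwx", bits))
        for name, bits in who.items()
    }


mode = stat.S_IFREG | 0o640   # regular file: owner rw-, group r--, other ---
print(stat.filemode(mode))    # -rw-r-----
print(describe(mode))         # {'owner': 'rw-', 'group': 'r--', 'other': '---'}
```

On a real file, the mode value would come from `os.stat(path).st_mode`; like the name and timestamps, it lives in metadata, not in the file's data blocks.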

V. Key Takeaways and Common Misconceptions

Memorable Lines (Teacher Cues)

  • "Formats give numbers meaning."
  • "The directory is the map; blocks are the land."
  • "Files grow → blocks scatter → fragmentation happens."
  • "Moving a file usually moves a line in a table, not the data."

Common Misconceptions to Pre-empt

  • "Changing a filename changes the data." → Usually only directory metadata changes.
  • "Deleting erases immediately." → Often just frees blocks; data persists until overwritten.
  • "Defrag speeds up SSDs." → No; SSDs have no moving head, so access time barely depends on where blocks sit, and defrag adds wear.
  • "Extensions define the file type." → Programs also check headers/magic.

VI. Summative Concepts (Exit Ticket Prompts)

  1. Why fixed-size blocks? File systems use fixed-size blocks instead of storing files strictly back-to-back to simplify management, allow for file growth, and facilitate non-contiguous allocation, which improves flexibility.
  2. Define fragmentation and its impact on HDDs. Fragmentation occurs when a file's data is scattered across non-contiguous blocks. This hurts HDD performance because the read/write head must move more extensively (seek steps) to access all parts of the file, increasing access time.
  3. What changes when moving a file? When moving a file between folders, only the directory entries are typically updated to reflect the new path; the data blocks themselves usually remain in their physical location.
  4. Why can "deleted" files sometimes be recovered? "Deleted" files can sometimes be recovered because deletion often only removes the file's entry from the directory, marking its blocks as free. The actual data on the storage device remains intact until those blocks are subsequently overwritten by new data.

 

