Files and File Systems
This briefing by Dr Sudheendra S G gives an overview of files
and file systems. It covers fundamental concepts, practical mechanisms, and
common misconceptions, and is suitable for a detailed educational briefing.
I. Core Concepts: What is a File?
At its most basic level, a file is a sequence of numbers,
with its meaning derived from its format.
- "All files are just numbers": "Text, photos, songs, videos: under
the hood they’re just sequences of numbers. The format tells software how
to interpret those numbers."
- Binary Data + Format: A file is essentially binary data coupled with a
specific format that dictates how software interprets that data. For
example, the numbers 72 101 108 108 111 spell 'Hello' in ASCII.
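The 'Hello' example can be checked directly. The snippet below is a minimal sketch: it stores the five numbers as raw bytes and then asks Python to interpret them as ASCII. The variable names are illustrative only.

```python
# The five numbers from the example above, stored as raw bytes.
numbers = [72, 101, 108, 108, 111]
raw = bytes(numbers)

# Interpreting the same bytes under the ASCII format yields text.
text = raw.decode("ascii")
print(text)  # Hello

# Going the other way: text encoded as ASCII becomes numbers again.
print(list("Hello".encode("ascii")))  # [72, 101, 108, 108, 111]
```

The bytes never change; only the chosen interpretation (here, ASCII) turns them into something readable.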
File Formats and Metadata
Different file types utilize distinct formats and often
include metadata (data about the data) within a header.
- TXT & ASCII: Text files (TXT) use character encodings like ASCII to
map numbers to characters.
- WAV (Audio): WAV files begin with a header containing critical metadata
such as "sample rate, channels, and the literal word WAVE." A player needs
this information to interpret the audio data correctly.
- BMP (Image): BMP files store pixel data, often using color models like
24-bit RGB (8 bits each for Red, Green, Blue) to represent colors. For
example, 255,255,255 is white, and 0,0,0 is black.
- Reinforcement: "different formats, same foundation: numbers → meaning via
format."
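To make the idea of a header concrete, the sketch below builds a tiny WAV file in memory with Python's standard wave module and then reads the first header fields back with struct. It assumes the common 44-byte PCM header layout that the wave module writes; real WAV files can contain extra chunks, so this is an illustration rather than a general parser.

```python
import io
import struct
import wave

# Build a tiny WAV file in memory: 1 channel, 16-bit samples, 8000 Hz.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)               # bytes per sample
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 4)  # four silent samples

header = buffer.getvalue()[:44]

# Bytes 0-11: "RIFF", total size, then the literal word "WAVE".
riff, _size, wave_id = struct.unpack("<4sI4s", header[:12])

# Bytes 12-27: "fmt ", chunk size, audio format, channels, sample rate.
fmt_id, _fmt_size, audio_format, channels, sample_rate = struct.unpack(
    "<4sIHHI", header[12:28]
)

print(riff, wave_id)          # b'RIFF' b'WAVE'
print(channels, sample_rate)  # 1 8000
```

Without these header fields a player would have no way to know how many channels the samples describe or how fast to play them.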
II. From Raw Storage to Organized Files
Hardware perceives storage as a continuous "long line
of addressable buckets" or "blocks." It is the file system
that imposes structure and meaning on this raw storage.
The Role of the File System
- Abstraction: "Hardware doesn’t know ‘files’; the file system does." The
file system translates abstract concepts like "files" and "folders" into
concrete operations on storage blocks.
- Directory as Table of Contents: The file system maintains a directory,
which acts as its "table of contents." This directory maps file
names to their physical locations on storage.
  - In a simple flat file system, each directory entry might list Name,
  Start, Length, which is enough when a file's blocks are contiguous.
  - Once a file's blocks may be non-contiguous, the entry instead becomes
  Name plus a list of blocks.
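The directory-as-table idea can be sketched as a toy model. Everything here is hypothetical: the 4-byte block size, the storage list, and the directory layout are chosen only so the example is easy to follow, and no real file system stores its directory this way.

```python
BLOCK_SIZE = 4  # hypothetical; tiny so the example is easy to read

# Raw storage: a line of fixed-size blocks. The hardware knows nothing more.
storage = [b"Hell", b"RGB!", b"o, w", b"orld", b"\x00\x00\x00\x00"]

# Directory: file name -> list of block indices plus the length in bytes.
directory = {
    "greeting.txt": {"blocks": [0, 2, 3], "length": 12},
    "pixels.bin": {"blocks": [1], "length": 4},
}

def read_file(name: str) -> bytes:
    """Follow the directory entry and stitch the file's blocks together."""
    entry = directory[name]
    data = b"".join(storage[i] for i in entry["blocks"])
    return data[: entry["length"]]  # drop any unused bytes in the last block

print(read_file("greeting.txt"))  # b'Hello, world'
```

Note that greeting.txt occupies blocks 0, 2, and 3: the blocks list is what lets a file live in non-adjacent blocks.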
Blocks, Slack Space, and Fragmentation
To manage storage efficiently and allow files to grow, file
systems allocate data in fixed-size units called blocks.
- Blocks: A block is a "fixed-size storage unit used for allocation." Files
can occupy multiple, non-adjacent blocks.
- Slack Space: When a file's data doesn't perfectly fill its last allocated
block, the remaining unused bytes in that block constitute slack space.
- File Growth and Fragmentation:
  - When a file needs to grow and no contiguous space is available, the
  file system allocates new, potentially non-adjacent blocks.
  - Fragmentation: the "file’s blocks are scattered," meaning its data is
  spread across non-contiguous blocks on the storage device.
- Impact of Fragmentation: On Hard Disk Drives (HDDs), fragmentation
"hurts HDD performance" because the read/write head has to perform
"extra movement" (seek steps) to reach scattered blocks.
- Deleting Files: Deleting a file typically leaves "the directory entry
gone, data still present until reused." This is why forensic tools can
sometimes "undelete" files: the data itself remains until it is overwritten.
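The ideas in this list reduce to a little arithmetic and bookkeeping. The sketch below is a toy model under stated assumptions: the 4096-byte block size is just a common example value, and the volume, entries, and free_blocks structures are hypothetical stand-ins for what a real file system tracks.

```python
def blocks_needed(size: int, block_size: int) -> int:
    """Number of fixed-size blocks required to hold `size` bytes."""
    return -(-size // block_size)  # ceiling division

def slack_bytes(size: int, block_size: int) -> int:
    """Unused bytes left over in the file's last allocated block."""
    return blocks_needed(size, block_size) * block_size - size

def is_fragmented(blocks: list[int]) -> bool:
    """True if the block numbers are not one consecutive run."""
    return any(b != a + 1 for a, b in zip(blocks, blocks[1:]))

# Toy volume: three blocks, one file, two free blocks.
volume = [b"old data", b"", b""]
entries = {"notes.txt": [0]}
free_blocks = {1, 2}

def delete(name: str) -> None:
    """Remove the directory entry and mark its blocks free; data is untouched."""
    free_blocks.update(entries.pop(name))

print(slack_bytes(10_000, 4096))                            # 2288
print(is_fragmented([7, 8, 9]), is_fragmented([7, 8, 20]))  # False True
delete("notes.txt")
print("notes.txt" in entries, volume[0])                    # False b'old data'
```

After delete, the name is gone and block 0 is marked free, yet its bytes are still there. That leftover data is what an undelete tool looks for.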
Defragmentation
Defragmentation is the process of reorganizing file
blocks to be contiguous.
- Purpose: "Defrag copies blocks around so each file’s blocks are in
order." This improves performance, especially on HDDs, by reducing
seek times.
- SSD vs. HDD: "Modern SSDs don’t benefit from defrag like HDDs do; they
use wear-leveling and the OS uses TRIM." Defragmenting an SSD is
unnecessary and can even shorten its lifespan by adding write wear.
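A toy version of defragmentation might look like the following. It is a simplified sketch: it builds a fresh layout instead of shuffling blocks in place as a real defragmenter would, and the before and layout structures are hypothetical.

```python
def defragment(storage, directory):
    """Return (new_storage, new_directory) with each file's blocks
    contiguous and in order. Free blocks end up at the end as None."""
    new_storage = []
    new_directory = {}
    for name, blocks in directory.items():
        start = len(new_storage)
        new_storage.extend(storage[i] for i in blocks)  # copy blocks in file order
        new_directory[name] = list(range(start, start + len(blocks)))
    # Pad with free blocks so the volume keeps its original size.
    new_storage.extend([None] * (len(storage) - len(new_storage)))
    return new_storage, new_directory

# Two files whose blocks are interleaved; None marks a free block.
before = ["A1", "B1", "A2", None, "B2", "A3"]
layout = {"a": [0, 2, 5], "b": [1, 4]}

after, new_layout = defragment(before, layout)
print(after)       # ['A1', 'A2', 'A3', 'B1', 'B2', None]
print(new_layout)  # {'a': [0, 1, 2], 'b': [3, 4]}
```

Every block of data gets copied at least once, which is why the same operation adds wear on an SSD without any seek time to win back.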
III. Organizing Files: Hierarchical File Systems
Flat directories become unmanageable with many files. Hierarchical
file systems introduce the concept of folders (directories) to group
related files, allowing for nested structures.
- Structure: The hierarchy starts with a root directory. Directories can
contain files and other directories.
- Directories as Files: "Each directory is itself a directory file (another
card) with its own entries."
- Paths: A path is a "location string from root to file" (e.g.,
/music/theme.wav, /photos/2024/trip.jpg) that specifies a file's location
within the hierarchy.
- Moving Files: "Moving files between folders usually edits metadata, not
the data blocks." When source and destination are on the same file system,
the file's data blocks remain in place; only the directory entries are
updated to reflect the new location.
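The claim about moving files can be observed on a real file system. The sketch below assumes a Unix-like system where os.stat reports an inode number (st_ino) that identifies the underlying file; the folder and file names are made up for the demonstration, and both folders sit in the same temporary directory, so they share one file system.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "inbox"))
    os.mkdir(os.path.join(root, "archive"))
    src = os.path.join(root, "inbox", "theme.wav")
    dst = os.path.join(root, "archive", "theme.wav")

    with open(src, "wb") as f:
        f.write(b"RIFF....WAVE")

    inode_before = os.stat(src).st_ino
    os.rename(src, dst)  # move between folders on the same file system
    inode_after = os.stat(dst).st_ino

    # Same inode: the directory entries changed, the file itself did not.
    same_file = inode_before == inode_after
    with open(dst, "rb") as f:
        moved_data = f.read()

print(same_file)
```

A move across two different file systems cannot work this way and falls back to copy-then-delete, which is one reason the source says "usually."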
IV. File Identification and Security
Beyond their data, files carry important identification and
security metadata.
Names, Extensions, and Magic Numbers
- Extensions: Extensions like .txt or .bmp help "humans/OS decide which app
to open."
- Magic Numbers: Many file formats also include "magic numbers": specific
"signature bytes in a header identifying format" (e.g., WAVE, RIFF, %PDF-).
These identify a file's type more reliably than the extension, which can be
misleading or changed.
- Misconception: "Extensions define the file type." Correction:
"Programs also check headers/magic."
Permissions
Files also include metadata such as "size,
created/modified time, owner, and permissions (read/write/execute)."
- Access Control: Permissions dictate who (owner, group, other users) can
"read/write/execute" a file, providing a layer of security and
access control.
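This metadata can be read back from a real file. The sketch below assumes a POSIX-like system, where owner/group/other permission bits behave as described; the file name and the 0o640 mode are arbitrary choices for the example.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "notes.txt")
    with open(path, "w") as f:
        f.write("hello")

    # Owner: read+write, group: read, others: nothing.
    os.chmod(path, 0o640)

    info = os.stat(path)
    size = info.st_size                        # size in bytes
    modified = info.st_mtime                   # last-modified time (seconds since epoch)
    mode_string = stat.filemode(info.st_mode)  # human-readable permission string
    owner_can_write = bool(info.st_mode & stat.S_IWUSR)
    others_can_read = bool(info.st_mode & stat.S_IROTH)

print(size, mode_string)                 # 5 -rw-r-----
print(owner_can_write, others_can_read)  # True False
```

None of these values live inside the file's five bytes of content; the file system keeps them alongside the directory information.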
V. Key Takeaways and Common Misconceptions
Memorable Lines (Teacher Cues)
- "Formats give numbers meaning."
- "The directory is the map; blocks are the land."
- "Files grow → blocks scatter → fragmentation happens."
- "Moving a file usually moves a line in a table, not the data."
Common Misconceptions to Pre-empt
- "Changing a filename changes the data." → Usually only directory metadata
changes.
- "Deleting erases immediately." → Often just frees blocks; data persists
until overwritten.
- "Defrag speeds up SSDs." → No; SSD access time does not depend on where a
block is stored, and defrag adds wear.
- "Extensions define the file type." → Programs also check headers/magic.
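The last correction is easy to demonstrate. The sketch below is a deliberately small signature check: the sniff function is hypothetical, it knows only three formats (the RIFF/WAVE and %PDF- signatures mentioned earlier, plus the "BM" that opens a BMP file), and real tools check far more signatures.

```python
import os

def sniff(data: bytes) -> str:
    """Guess a format from the leading signature bytes, ignoring the file name."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"BM"):
        return "bmp"
    return "unknown"

# A file whose name claims one thing and whose bytes say another.
name = "holiday.txt"
data = b"%PDF-1.7\n..."

extension = os.path.splitext(name)[1]
print(extension, sniff(data))  # .txt pdf
```

Renaming the file changed what the extension suggests but not a single byte of the content, so the header still identifies it as a PDF.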
VI. Summative Concepts (Exit Ticket Prompts)
- Why fixed-size blocks? File systems use fixed-size blocks instead of
storing files strictly back-to-back to simplify management, allow for file
growth, and facilitate non-contiguous allocation, which improves
flexibility.
- Define fragmentation and its impact on HDDs. Fragmentation occurs when a
file's data is scattered across non-contiguous blocks. This hurts HDD
performance because the read/write head must perform extra seeks to reach
all parts of the file, increasing access time.
- What changes when moving a file? When moving a file between folders, only
the directory entries are typically updated to reflect the new
path; the data blocks themselves usually remain in their physical
location.
- Why can "deleted" files sometimes be recovered? Deletion often only
removes the file's entry from the directory and marks its blocks
as free. The actual data on the storage device remains intact until those
blocks are overwritten by new data.