The World Wide Web: Core Concepts and Mechanisms
The World Wide Web (the Web) is an application built on the Internet's
infrastructure. It consists of documents (web pages) connected by hyperlinks,
forming a vast web of information.
1. The Internet vs. the Web
A fundamental concept is understanding the difference
between the Internet and the Web. As the source states: “The Internet is the
network of wires, radios, routers, and protocols. The Web is an app running on
top—millions of servers + your browser.”
Internet: The physical and logical infrastructure (wires,
routers, IP addresses, protocols) that allows devices to connect and exchange
data. Other applications like email, online gaming, and messaging also utilize
the Internet.
Web: An application layer built on the Internet, consisting
of web servers hosting pages and web browsers retrieving and displaying them.
2. Pages & Hyperlinks: The Foundation of Connectivity
Web pages are documents that contain content and,
critically, hyperlinks. These hyperlinks allow users to navigate between
different pages by simply clicking on them. The source describes this as links
forming a "giant web," where "clicking follows edges" in a
"mini web graph."
3. URLs & Addressing: Locating Resources
Every web page and resource has a unique address called a
URL (Uniform Resource Locator). A URL provides a structured way to specify
where a resource is located and how to access it. Key components of a URL
include:
Scheme: http or https (defines the protocol).
Host: The domain name (e.g., example.com).
Port (optional): Specifies the port number (e.g., 80 for
HTTP, 443 for HTTPS).
Path: The specific location of the resource on the server
(e.g., /courses).
Query (optional): Parameters passed to the server (e.g.,
?q=cats).
Fragment (optional): Points to a specific section within a
page (#section1), handled client-side.
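The components listed above can be pulled apart with Python's standard urllib.parse module. This is a small sketch; the URL is a made-up example that combines the sample values from the list.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL assembled from the example values in the list above.
url = "https://example.com:443/courses?q=cats#section1"
parts = urlsplit(url)

print(parts.scheme)            # https
print(parts.hostname)          # example.com
print(parts.port)              # 443
print(parts.path)              # /courses
print(parse_qs(parts.query))   # {'q': ['cats']}
print(parts.fragment)          # section1
```

The fragment is parsed here only for illustration; a browser keeps it on the client side and does not send it to the server.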
4. How a Browser Retrieves a Page: The Request-Response
Cycle
Retrieving a web page involves a series of sequential steps:
URL Input: The user types a URL into their browser.
DNS Lookup: The browser needs the IP address of the host specified in the URL.
As the source puts it: “Browser asks a DNS resolver: ‘What IP is
sathvick.com?’ DNS returns an IP so we can connect.” DNS (Domain Name System)
translates human-readable domain names into machine-readable IP addresses.
TCP Connection: Once the IP address is known, the browser
establishes a TCP (Transmission Control Protocol) connection to the web server
on the specified port (usually 80 for HTTP or 443 for HTTPS).
HTTP Request: The browser sends an HTTP (Hypertext Transfer
Protocol) request to the web server. This request specifies what resource is
desired. A typical GET request includes:
GET /courses HTTP/1.1 (request line: method, path, protocol version)
Host: sathvick.com (essential for virtual hosting)
User-Agent: ExampleBrowser/1.0 (identifies the browser)
Accept: text/html (preferred content type)
HTTP Response: The web server processes the request and
sends an HTTP response back to the browser. A successful response (200 OK)
includes:
HTTP/1.1 200 OK (status line: protocol, status code, reason phrase)
Content-Type: text/html; charset=UTF-8 (type of content)
Content-Length: 428 (size of the body)
<html> ... </html> (the body: the actual HTML content, sent after a blank
line that ends the headers)
Common Status Codes:
200 OK: Request successful.
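301/302: Redirect (301 is permanent, 302 is temporary).
403 Forbidden: The server understood the request but refuses to fulfil it.
404 Not Found: "It means 'resource not found' on that
server." (A common misconception is that it means the internet is down.)
500 Internal Server Error: The server failed while handling the request.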
HTML Rendering: The browser receives the HTML content and
renders it into the visual web page that the user sees.
HTTPS: For privacy and integrity, modern web communication
often uses HTTPS (HTTP over TLS), which encrypts the HTTP traffic.
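The request-response steps above can be tried without touching the real Web. The sketch below starts a throwaway server on the local machine and fetches from it with Python's standard http.client. The handler class, the fetch helper, and the page content are all illustrative assumptions. Because it connects straight to 127.0.0.1, the DNS step is skipped.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body><h1>Courses</h1></body></html>"  # made-up page body

class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical server: knows one path, answers 404 for everything else."""

    def do_GET(self):
        if self.path == "/courses":
            status, body = 200, PAGE
        else:
            status, body = 404, b"<html><body>Not found</body></html>"
        self.send_response(status)  # status line
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()          # blank line that ends the headers
        self.wfile.write(body)      # response body

    def log_message(self, format, *args):
        pass  # keep the demo output quiet

def fetch(port, path):
    """Open a TCP connection, send a GET request, return (status, reason, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", path, headers={"Accept": "text/html"})
    response = conn.getresponse()
    body = response.read()
    conn.close()
    return response.status, response.reason, body

server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

ok = fetch(port, "/courses")
missing = fetch(port, "/nope")
server.shutdown()
server.server_close()

print(ok[0], ok[1])            # 200 OK
print(missing[0], missing[1])  # 404 Not Found
```

The 404 case shows the point made in the status code list: the connection and the server both work, and only the requested resource is missing.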
5. HTML: Structuring Web Content
HTML (HyperText Markup Language) is the core language for
creating web pages. “Browsers render HTML—text annotated with tags that
describe structure and links.” HTML uses tags to define elements like headings,
paragraphs, lists, and, crucially, links.
Example of basic HTML structure:
<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Klingon Gear</title>
</head>
<body>
  <h1>Klingon Starter Kit</h1>
  <p>Learn more about <a href="https://www.kli.org/">Klingons</a>.</p>
  <h2>Top 3 Items</h2>
  <ol>
    <li>Bat'leth (<a href="https://www.kli.org/">what is this?</a>)</li>
    <li>Uniform</li>
    <li>Dictionary</li>
  </ol>
</body>
</html>
head: Contains meta-information about the page (e.g., title,
character set).
body: Contains the visible content of the page.
Tags: <p> for paragraphs, <h1> and <h2> for headings, <a> for hyperlinks
(with the href attribute giving the link destination), <ol> for ordered
lists, <li> for list items.
CSS (Cascading Style Sheets) handles the visual styling, and
JavaScript adds interactive behavior, but HTML provides the fundamental
structure.
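To see how a program (rather than a person) reads this structure, here is a minimal sketch that extracts the hyperlinks from a shortened copy of the example page using Python's standard html.parser. The LinkCollector class is an illustrative helper, not part of any library.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag we are currently inside
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

# Shortened version of the example page above.
sample_html = """<!doctype html>
<html lang="en">
<body>
<p>Learn more about <a href="https://www.kli.org/">Klingons</a>.</p>
<ol>
<li>Bat'leth (<a href="https://www.kli.org/">what is this?</a>)</li>
</ol>
</body>
</html>"""

collector = LinkCollector()
collector.feed(sample_html)
print(collector.links)
```

Each pair holds the link destination and the clickable text, which is the information a crawler needs in order to follow links from one page to the next.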
6. How Search Engines Work
Search engines automate the process of finding information
on the Web, a task too vast for human-curated directories. They operate through
a three-stage pipeline:
Crawler: Programs that traverse the Web by following
hyperlinks, discovering new pages.
Index: A vast database that stores information about web pages, mapping
keywords to the pages where they appear. Contrary to the misconception that
“Search engines ‘search the live web’ instantly,” they search their index (a
snapshot, updated often).
Query/Rank: When a user submits a query, the search engine looks up matching
pages in its index. Ranking algorithms then order the results using signals
such as keyword relevance, backlinks, and page authority (as in early Google
PageRank, which scored pages by the links pointing to them), so that the most
useful results come first. As the source notes, “Modern engines use hundreds
of signals; idea stands.”
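The three stages can be sketched over a tiny in-memory "web". Everything here is an illustrative assumption: the page names, their text and links, and especially the ranking rule, which simply counts word occurrences and is far cruder than PageRank or any real engine's signals.

```python
from collections import defaultdict, deque

# Hypothetical mini web: URL -> (page text, outgoing links).
PAGES = {
    "a.example/": ("cats and dogs", ["b.example/", "c.example/"]),
    "b.example/": ("cats cats cats", ["c.example/"]),
    "c.example/": ("dogs only", []),
    "d.example/": ("cats nobody links here", []),
}

def crawl(seed):
    """Stage 1: discover pages by following links outward from a seed URL."""
    seen, queue = {seed}, deque([seed])
    while queue:
        url = queue.popleft()
        for link in PAGES[url][1]:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

def build_index(urls):
    """Stage 2: map each word to {url: number of occurrences}."""
    index = defaultdict(dict)
    for url in urls:
        for word in PAGES[url][0].split():
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, word):
    """Stage 3: look the word up in the index, most occurrences first."""
    hits = index.get(word, {})
    return sorted(hits, key=lambda url: (-hits[url], url))

index = build_index(crawl("a.example/"))
print(search(index, "cats"))  # ['b.example/', 'a.example/']
```

The page d.example/ contains "cats" but never shows up in the results, because no crawled page links to it. The query is answered from the index snapshot, not from the live pages.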
7. Net Neutrality: Fair Access to Information
Net neutrality is a principle asserting that “packets should
be treated equally—no throttling/prioritizing based on source or content.” This
means Internet Service Providers (ISPs) should not block, slow down, or charge
more for certain content, applications, or websites.
Equal-priority lanes: All data treated equally.
Paid-priority fast lanes: ISPs could prioritize traffic from
services that pay more, potentially slowing down others.
Debate: Raises questions about who decides on prioritization
(e.g., time-sensitive video calls vs. email) and what safeguards are necessary
to ensure fair access and prevent anti-competitive practices.
Common Misconceptions Addressed:
"The Web is the Internet." → The Web uses the
Internet, similar to other applications.
"IP address = website." → Multiple websites can
share one IP address through virtual hosting (requiring the Host header in HTTP
requests).
"Search engines 'search the live web' instantly."
→ They search their pre-built index, which is regularly updated.
"404 means internet is down." → It signifies that
the requested resource was not found on the specific server.
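The virtual hosting point above can be made concrete with a short sketch: one server, reachable at one IP address, decides which site to serve by reading the Host header. The site names and helper functions are made up for illustration, and the header parsing is deliberately simplified (it does not handle IPv6 literal hosts, for instance).

```python
# Hypothetical sites that all live on one server (one IP address).
SITES = {
    "cats.example": "<h1>Cats</h1>",
    "dogs.example": "<h1>Dogs</h1>",
}

def host_from_request(raw_request):
    """Return the Host header value (lowercased, port removed), or None."""
    for line in raw_request.split("\r\n")[1:]:  # skip the request line
        if not line:
            break  # a blank line ends the headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip().lower().split(":")[0]
    return None

def pick_site(raw_request):
    """Choose which site's page to serve for this request."""
    return SITES.get(host_from_request(raw_request), "<h1>Unknown site</h1>")

request = "GET / HTTP/1.1\r\nHost: dogs.example\r\nAccept: text/html\r\n\r\n"
print(pick_site(request))  # <h1>Dogs</h1>
```

Both sites share the same address, so the IP alone cannot tell the server which one the browser wants. The Host header carries that information.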
This briefing covers the essential components, processes,
and policy considerations for understanding the World Wide Web.