Saturday, August 23, 2025

C30 The World Wide Web


The World Wide Web: Core Concepts and Mechanisms

The World Wide Web (Web) is an application built on the Internet's infrastructure. It consists of documents (web pages) connected to one another by hyperlinks, forming a vast web of information.

 

1. Distinguishing the Internet from the Web

A fundamental concept is understanding the difference between the Internet and the Web. As the source states: “The Internet is the network of wires, radios, routers, and protocols. The Web is an app running on top—millions of servers + your browser.”

 

Internet: The physical and logical infrastructure (wires, routers, IP addresses, protocols) that allows devices to connect and exchange data. Other applications like email, online gaming, and messaging also utilize the Internet.

Web: An application layer built on the Internet, consisting of web servers hosting pages and web browsers retrieving and displaying them.

2. Pages & Hyperlinks: The Foundation of Connectivity

Web pages are documents that contain content and, critically, hyperlinks. These hyperlinks allow users to navigate between different pages by simply clicking on them. The source describes this as links forming a "giant web," where "clicking follows edges" in a "mini web graph."
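The "mini web graph" idea can be sketched in a few lines of Python. This is only an illustration: the page names and the follow_clicks helper are hypothetical, not taken from the source.

```python
# Hypothetical mini web graph: each page maps to the pages it links to.
web_graph = {
    "home": ["courses", "about"],
    "courses": ["home", "syllabus"],
    "about": ["home"],
    "syllabus": [],
}


def follow_clicks(graph, start, clicks):
    """Return the pages visited when starting at `start` and clicking each link in turn."""
    page = start
    visited = [page]
    for target in clicks:
        # A click is only possible if the current page actually links to the target.
        if target not in graph[page]:
            raise ValueError(f"{page!r} has no link to {target!r}")
        page = target
        visited.append(page)
    return visited


print(follow_clicks(web_graph, "home", ["courses", "syllabus"]))
# ['home', 'courses', 'syllabus']
```

Each click follows one edge of the graph; a page with no outgoing links (here "syllabus") is a dead end.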

 

3. URLs & Addressing: Locating Resources

Every web page and resource has a unique address called a URL (Uniform Resource Locator). A URL provides a structured way to specify where a resource is located and how to access it. Key components of a URL include:

 

Scheme: http or https (defines the protocol).

Host: The domain name (e.g., example.com).

Port (optional): Specifies the port number (e.g., 80 for HTTP, 443 for HTTPS).

Path: The specific location of the resource on the server (e.g., /courses).

Query (optional): Parameters passed to the server (e.g., ?q=cats).

Fragment (optional): Points to a specific section within a page (#section1), handled client-side.
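As a rough illustration, Python's standard urllib.parse module splits a URL into these same components. The URL below is an assumed example assembled from the sample values in the list above.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed example URL combining the sample values listed above.
url = "https://example.com:443/courses?q=cats#section1"
parts = urlsplit(url)

print(parts.scheme)            # https
print(parts.hostname)          # example.com
print(parts.port)              # 443
print(parts.path)              # /courses
print(parse_qs(parts.query))   # {'q': ['cats']}
print(parts.fragment)          # section1
```

The fragment is parsed here on the client, which matches the point above: browsers do not send it to the server.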

4. How a Browser Retrieves a Page: The Request-Response Cycle

Retrieving a web page involves the following steps, in order:

 

URL Input: The user types a URL into their browser.

DNS Lookup: The browser needs the IP address of the host named in the URL. In the source's words: “Browser asks a DNS resolver: ‘What IP is sathvick.com?’ DNS returns an IP so we can connect.” DNS (Domain Name System) translates human-readable domain names into machine-readable IP addresses.

TCP Connection: Once the IP address is known, the browser establishes a TCP (Transmission Control Protocol) connection to the web server on the specified port (usually 80 for HTTP or 443 for HTTPS).

HTTP Request: The browser sends an HTTP (Hypertext Transfer Protocol) request to the web server. This request specifies what resource is desired. A typical GET request includes:

GET /courses HTTP/1.1 (request line: method, path, protocol version)

Host: sathvick.com (essential for virtual hosting)

User-Agent: ExampleBrowser/1.0 (identifies the browser)

Accept: text/html (preferred content type)

HTTP Response: The web server processes the request and sends an HTTP response back to the browser. A successful response (200 OK) includes:

HTTP/1.1 200 OK (status line: protocol, status code, reason phrase)

Content-Type: text/html; charset=UTF-8 (type of content)

Content-Length: 428 (size of the body)

<html> ... </html> (the actual HTML content)

Common Status Codes:

200 OK: Request successful.

301/302: Redirect (301 is permanent, 302 is temporary); the new address is given in the Location header.

403: Forbidden.

404 Not Found: "It means 'resource not found' on that server." (A common misconception is that it means the internet is down).

500: Server error.

HTML Rendering: The browser receives the HTML content and renders it into the visual web page that the user sees.

HTTPS: For privacy and integrity, modern web communication often uses HTTPS (HTTP over TLS), which encrypts the HTTP traffic.
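The request and response messages described above are plain text, so their shape can be sketched without any network access. The helper names below (build_get_request, parse_response, describe_status) are hypothetical, and the sample response is hand-written for illustration; a real browser does far more than this.

```python
def build_get_request(host, path):
    """Build the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",            # request line: method, path, version
        f"Host: {host}",                   # lets one server host many sites
        "User-Agent: ExampleBrowser/1.0",
        "Accept: text/html",
        "",                                # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split a raw HTTP response into (status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


def describe_status(code):
    """Map a status code to its general class."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"


# Hand-written sample response, for illustration only.
body = b"<html> ... </html>"
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    + f"Content-Length: {len(body)}\r\n".encode("ascii")
    + b"\r\n"
    + body
)

print(build_get_request("sathvick.com", "/courses").decode("ascii"))
code, reason, headers, page = parse_response(raw_response)
print(code, reason, describe_status(code), headers["content-type"])
```

The first digit of the status code identifies its class, which is why 404 (a 4xx code) describes a problem with the request for that particular resource rather than with the network.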

5. HTML: Structuring Web Content

HTML (HyperText Markup Language) is the core language for creating web pages. “Browsers render HTML—text annotated with tags that describe structure and links.” HTML uses tags to define elements like headings, paragraphs, lists, and, crucially, links.

 

Example of basic HTML structure:

 

<!doctype html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title>Klingon Gear</title>
</head>
<body>
    <h1>Klingon Starter Kit</h1>
    <p>Learn more about <a href="https://www.kli.org/">Klingons</a>.</p>
    <h2>Top 3 Items</h2>
    <ol>
        <li>Bat'leth (<a href="https://www.kli.org/">what is this?</a>)</li>
        <li>Uniform</li>
        <li>Dictionary</li>
    </ol>
</body>
</html>

 

head: Contains meta-information about the page (e.g., title, character set).

body: Contains the visible content of the page.

Tags: <p> for paragraphs, <h1> for main headings, <a> for hyperlinks (with href attribute for the link destination), <ol> for ordered lists, <li> for list items.

CSS (Cascading Style Sheets) handles the visual styling, and JavaScript adds interactive behavior, but HTML provides the fundamental structure.
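To see how a program can read the structure that the tags describe, here is a minimal sketch using Python's standard html.parser module to collect the links from a shortened copy of the example page. The LinkCollector class is a hypothetical helper written for this illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a> with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


# Shortened copy of the example page above.
page = """<!doctype html>
<html lang="en">
<body>
    <h1>Klingon Starter Kit</h1>
    <p>Learn more about <a href="https://www.kli.org/">Klingons</a>.</p>
    <ol>
        <li>Bat'leth (<a href="https://www.kli.org/">what is this?</a>)</li>
    </ol>
</body>
</html>"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)
# [('https://www.kli.org/', 'Klingons'), ('https://www.kli.org/', 'what is this?')]
```

The href attribute carries the destination, while the text between the opening and closing tags is what the user sees and clicks.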

6. How Search Engines Work

Search engines automate the process of finding information on the Web, a task too vast for human-curated directories. They operate through a three-stage pipeline:

 

Crawler: Programs that traverse the Web by following hyperlinks, discovering new pages.

Index: A vast database that stores information about web pages, mapping keywords to the pages where they appear. The source flags “Search engines ‘search the live web’ instantly” as a misconception: they actually search their index (a snapshot, updated often).

Query/Rank: When a user submits a query, the search engine searches its index for relevant pages. Ranking algorithms then order these results based on various factors, such as keyword relevance, backlinks, and page authority (like early Google PageRank), to present the most useful results first. “Modern engines use hundreds of signals; idea stands.”
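The three stages can be sketched on a tiny in-memory "web". Everything here is an assumption made for illustration: the PAGES data, the function names, and especially the ranking, which simply counts inbound links as a crude stand-in for backlink-based signals. It is not PageRank and not how any real engine ranks results.

```python
from collections import defaultdict, deque

# Hypothetical in-memory web: URL -> (page text, outgoing links).
PAGES = {
    "a.example/": ("cats and dogs", ["a.example/cats", "b.example/"]),
    "a.example/cats": ("cats purr", ["a.example/"]),
    "b.example/": ("dogs bark", ["a.example/cats"]),
    "c.example/": ("unlinked cats page", []),  # nothing links here
}


def crawl(seed):
    """Stage 1: breadth-first traversal that follows links from `seed`."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in PAGES[url][1]:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def build_index(urls):
    """Stage 2: inverted index mapping each word to the pages containing it."""
    index = defaultdict(set)
    for url in urls:
        for word in PAGES[url][0].lower().split():
            index[word].add(url)
    return index


def inbound_counts(urls):
    """Count links pointing at each crawled page (a crude backlink signal)."""
    counts = {url: 0 for url in urls}
    for url in urls:
        for link in PAGES[url][1]:
            if link in counts:
                counts[link] += 1
    return counts


def search(index, counts, query):
    """Stage 3: find pages containing every query word, most-linked first."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches, key=lambda url: (-counts[url], url))


crawled = crawl("a.example/")
index = build_index(crawled)
counts = inbound_counts(crawled)
print(crawled)                         # ['a.example/', 'a.example/cats', 'b.example/']
print(search(index, counts, "cats"))   # ['a.example/cats', 'a.example/']
```

Note that c.example/ mentions cats but never appears in the results: no crawled page links to it, so it never reaches the index. Queries are answered from that index, not from the live pages.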

7. Net Neutrality: Fair Access to Information

Net neutrality is a principle asserting that “packets should be treated equally—no throttling/prioritizing based on source or content.” This means Internet Service Providers (ISPs) should not block, slow down, or charge more for certain content, applications, or websites.

 

Equal-priority lanes: All data treated equally.

Paid-priority fast lanes: ISPs could prioritize traffic from services that pay more, potentially slowing down others.

Debate: Raises questions about who decides on prioritization (e.g., time-sensitive video calls vs. email) and what safeguards are necessary to ensure fair access and prevent anti-competitive practices.
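The difference between the two lane models can be shown with a toy scheduler. This is a deliberately simplified sketch: the packet list, the "paid" flag, and both functions are hypothetical, and real network equipment schedules traffic in far more complex ways.

```python
# Hypothetical packets in arrival order: (source, whether the source pays for priority).
packets = [
    ("video-startup", False),
    ("big-streamer", True),
    ("email", False),
    ("big-streamer", True),
]


def equal_priority(queue):
    """Deliver packets strictly in arrival order."""
    return [source for source, _paid in queue]


def paid_priority(queue):
    """Deliver paying sources first; arrival order is kept within each group."""
    # sorted() is stable, and False sorts before True, so paid packets come first.
    return [source for source, paid in sorted(queue, key=lambda p: not p[1])]


print(equal_priority(packets))
# ['video-startup', 'big-streamer', 'email', 'big-streamer']
print(paid_priority(packets))
# ['big-streamer', 'big-streamer', 'video-startup', 'email']
```

Under the paid model the non-paying packets are delivered later even though they arrived earlier, which is the effect the debate is about.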

Common Misconceptions Addressed:

"The Web is the Internet." → The Web uses the Internet, similar to other applications.

"IP address = website." → Multiple websites can share one IP address through virtual hosting (requiring the Host header in HTTP requests).

"Search engines 'search the live web' instantly." → They search their pre-built index, which is regularly updated.

"404 means internet is down." → It signifies that the requested resource was not found on the specific server.

This briefing covers the essential components, processes, and policy considerations for understanding the World Wide Web.

