SY110: Intro to the World-Wide-Web and HTML


Learning Outcomes

After completing these activities you should be able to:

Describe the World-Wide-Web ("web"), its origins, and recent developments.
Describe the differences between the generations of web technologies (i.e. Web 1.0, Web 2.0, Web 3.0).
Explain the components of a URI.
Identify URI file schemes in which browsers are able to process local data.
Describe the DNS hierarchy and how it's used in a URL.
Identify terms related to HTML code - tags, attributes, elements.
Given basic HTML code, describe how it will be rendered by a browser.
Create an HTML file containing basic components to be hosted on a web server.

**Computer Architecture - Applications.** The programs and applications that are accessed and run by users.

This is the first in a series of classes that shift focus from the Operating System (OS) to applications. Keep in mind that all of the other components of Computer Architecture are still relevant and depend on the vast number of interconnected systems that make up the World-Wide-Web (WWW). While earlier classes focused on a single computer, our journey now takes us to a web of computers that share information and communication standards and make up part of the Internet.

Introduction to the Web

**Architectural diagram of the ARPANET in 1971.** (Salus, 1995)

ARPANET. As mentioned in the Intro to the Cyberspace Domain class, the Advanced Research Projects Agency Network (ARPANET) was the first network to interconnect computers at the University of California, Los Angeles (UCLA), the University of California, Santa Barbara (UCSB), the Stanford Research Institute (SRI), and the University of Utah. This precursor to the Internet began in September 1969, when its first node was installed at UCLA, and it quickly expanded to a handful of other sites. The first transmission was so unreliable that the system crashed after only two letters, "LO", had been transmitted. Thus, history was made with "LO", although the actual attempt was to log in to the remote node by sending LOGIN (ICANN, 2019).

Web 1.0: Information-based. The WWW (W3) was conceptualized in a proposal by Tim Berners-Lee at CERN in 1989 (yes, the same European nuclear research organization that runs the Large Hadron Collider, or LHC). The proposal addressed the problem of managing and sharing information at scale by developing "a distributed hypertext system." That hypertext system gave rise to the HyperText Markup Language (HTML), the foundation for sharing information through websites, as well as the Uniform Resource Identifier (URI) and the HyperText Transfer Protocol (HTTP) standard for communications.
Early web pages were similar to books: sites were intended only to provide one-way, read-only communication, and there was no way to engage or interact with the author through the website. This Massachusetts Institute of Technology (MIT) page on Internet growth, published in 1996, is a good example of Web 1.0: https://www.mit.edu/people/mkgray/net/internet-growth-summary.html.

Search engines helped organize all of this information through the use of metadata. Popular Internet Service Providers (ISPs) in the early 1990s, like America Online (AOL), had their own application-based search programs that used keywords to reach websites, so users did not have to memorize a website's address. Other popular search engines in the late 1990s included AltaVista, Ask Jeeves, Lycos, Netscape, and WebCrawler. Take a look at millions of Web 1.0 websites through the Internet Archive's Wayback Machine.

What Happens on the Internet Stays on the Internet
With more than 800 billion web pages archived, the Wayback Machine is maintained by the Internet Archive. Back in the early 2000s, it was unimaginable that the entire Internet could be recorded because of the amount of storage that would be needed to hold all of that information. Moore's Law may have applied to transistors in chips, but when it comes to storage capacity, this article states that storage growth exceeded Moore's Law by eight times! Even with technologies to "hide" activities on the Internet, what is hidden today will likely be revealed by future capabilities.

Web 2.0: Interaction-based. Many of today's popular applications allow interaction and engagement with authors, publishers, famous movie and sports stars, high-ranking military officials, and all levels of government. Think about the role online interaction plays in the apps you use most, whether social media, like X (formerly Twitter), Meta's Facebook and Instagram, and Snapchat, or streaming media services, such as Netflix, YouTube, Hulu, and Plex. Web 2.0 is interaction-based: dynamic, user-generated content that allows collaborative engagement. Financial technology (FinTech) companies have been pushing through regulatory hurdles with near-real-time (NRT) payment services for friends, family, and business transactions, across services like PayPal/Venmo, Cash App, and Zelle, which let their users interact and engage while moving money. All of the data and statistics collected from media, apps, and mobile platforms are used to curate content, marketing, and preferences customized to each cyber-persona.

Much of the data exchanged between an app and its server passes through an Application Programming Interface (API), a concept introduced in the OS Shells and Permissions class. Although the Netflix app is not a typical browser-based web application, it still relies on port 443, the same port used to browse secure (HTTPS) web pages. Cloud-based services are also part of this ecosystem: Netflix leverages Content Delivery Networks (CDNs) and cloud infrastructure to provide the fastest streaming across geographically dispersed areas. What is another cloud-based service used at the Naval Academy?

Why are mobile apps dangerous?

Malicious software, also referred to as malware, has been found in apps from Chinese-based companies, including the 'fast fashion' e-commerce apps Temu and Pinduoduo, as well as TikTok. Security analyses have found that these apps can bypass user security permissions, with the ability to modify settings, access private messages and biometrics, view data from other apps, collect Bluetooth and Wi-Fi network information, and prevent uninstallation. Take a look at this news article from 2023, a U.S. report on economic and security impacts, and analysis from the Center for Internet Security (CIS).

Web 3.0: Decentralization-based. Since Satoshi Nakamoto released the whitepaper "Bitcoin: A Peer-to-Peer Electronic Cash System" in 2008, the Internet has built extensively on its concepts through decentralized Distributed Ledger Technologies (DLTs) that allow economic transactions. This includes the ability to authenticate digital property back to its original source, whether a photo, document, artwork, music recording, tweet, or other digital asset that can be controlled by an originator. Sure, digital media can be copied, but the same argument applies to the deed to a property, a diploma from a well-recognized university, or a DD214 for military service. Recorded ledgers for real-world financial transactions have countless times been undermined by human error and corruption. The same ledger concept has now proven itself in digital form: since the Genesis block (block 0) of Bitcoin was created in 2009, no fraudulent transaction has ever been found retained in its ledger.

There are many challenges when considering the various consensus methods used to build DLTs, such as Proof-of-Work (PoW) and Proof-of-Stake (PoS). Both have significant vulnerabilities, and these technologies need significant time to mature. As of July 2025 (Tangem, 2025), there are over 37 million unique cryptocurrencies, and that number is expected to surpass 100 million by the end of the year. This exponential growth is unfortunately fraught with fraud and scandals, deepening the loss of faith in "crypto" among the general population, governments, and regulators across the world. After the November 2022 collapse of FTX, then one of the world's largest crypto exchanges, the U.S. Commodity Futures Trading Commission estimated that customers lost more than $8B because of theft and fraud by then-CEO Sam Bankman-Fried, who used customers' funds to buy real estate and to finance his companies' business ventures (CNBC, 2023). In March 2024, he was sentenced to 25 years in prison and ordered to forfeit $11B (DOJ, 2024).

Number of Cryptocurrencies hits 36 million in July 2025

Advancing Web 3.0-based technologies for Decentralized Finance (DeFi), Decentralized Apps (dApps), Artificial Intelligence (AI), virtual assets in the metaverse that deliver Virtual Reality (VR) and Augmented Reality (AR), digital content creation (including Non-Fungible Tokens, or NFTs), and digital identities could be extremely beneficial, improving every facet of life across industries like healthcare, financial services, consumer goods, agriculture, education, information technology, space, and others that require provenance. In cybersecurity especially, DLTs have the potential to solve complex cyberspace supply chain challenges. In May 2023, the White House released the U.S. Government National Standards Strategy for Critical and Emerging Technologies, which encompasses DLTs.

Web Terms and Concepts

The "World Wide Web". The World Wide Web is the vast global collection of servers and clients (aka browsers) communicating over the Internet using the HTTP or HTTPS protocols. The Web is an example of a client-server system.

A Server is a computer (and associated programs) that provides a service.
A Client is a computer (or program) that uses a service provided by a server.

A single physical computer CAN play the role of both a server and a client at the same time.
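To see one machine play both roles, here is a minimal Python sketch that uses only the standard library: it starts a tiny web server on the loopback address 127.0.0.1 and then acts as a client by requesting a page from that same server. The handler name `HelloHandler` and the page content are made up for illustration; this is not how production web servers such as Apache are configured.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET request with a tiny HTML page."""

    def do_GET(self):
        body = b"<h1>Hello from the server</h1>"
        self.send_response(200)                      # status line: 200 OK
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass                                         # keep the console quiet


def client_and_server_on_one_machine():
    # Server role: listen on the loopback address; port 0 lets the OS pick a free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Client role: the same program now requests "/" from that server.
        conn = HTTPConnection("127.0.0.1", port, timeout=5)
        try:
            conn.request("GET", "/")
            response = conn.getresponse()
            return response.status, response.read().decode()
        finally:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(client_and_server_on_one_machine())
```

Both the server thread and the client connection live in one process on one computer, which is exactly the situation described above.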

Protocols. Computers and networked (interconnected) systems are very complex. Only by employing very careful, highly structured designs can we humans deal with that complexity, and protocols are at the heart of that design. A protocol is an agreement about communication between two communicating parties: a complete specification of what things can be said, what responses can and must be made, and what these things mean.
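As a concrete illustration of a protocol as an agreement about what can be said, the sketch below builds the exact text a minimal client sends to ask a web server for a page using the HTTP/1.1 message format, and splits the status line that begins a server's reply. The function names and the sample reply are hypothetical; real browsers send many more header lines.

```python
def build_get_request(host: str, path: str = "/") -> str:
    """Return the text of a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # HTTP/1.1 requires a Host header
        "Connection: close",     # ask the server to close the connection afterward
    ]
    # Each line ends with CRLF, and an empty line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_status_line(response_text: str) -> tuple[str, int, str]:
    """Split the first line of a reply, such as 'HTTP/1.1 200 OK'."""
    status_line = response_text.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


print(build_get_request("www.usna.edu", "/CyberDept/index.php"))
print(parse_status_line("HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>..."))
```

Both sides must agree on every detail (the order of the request line, the CRLF line endings, the empty line after the headers) or the conversation fails, which is what makes HTTP a protocol.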

A website is a collection of one or more files that contain the content you see displayed when you visit a website with your browser. These files include not only the main webpage, but also image files, script files (that make the webpage do things), and other files.
Web Server. Web servers are computers (and the software running on them) that exist to supply website content, using the HTTP or HTTPS protocol, on demand. What this means is that files for the website are stored on one or more web servers. Some popular web server software includes IIS (from Microsoft), Apache (from the Apache Software Foundation), and NGINX (pronounced "engine-X"). If you're curious, the course website is hosted on an Apache web server.
"Web Client" = Browser. For a user, and for this course, a web client is just a browser, such as Chrome, Firefox, Edge, or Safari. Note that these are applications running on top of a computer's operating system.
"Web" vs. "Internet". The "Web" and the "Internet" are not the same thing. The Internet includes all the infrastructure (clients, servers, routers, switches, cables, radio links, etc.) and protocols through which browsers and web servers communicate, but it encompasses much more. Many other kinds of communication run over the Internet: e-mail, voice-over-IP telephone calls, remote logins to computers (recall ssh), file sharing, etc.
- Web: web servers, web clients. HTTP and HTTPS protocols.
- Internet: includes the web, but also e-mail, streaming media, VoIP, chat, internet gaming, etc., plus all the communications hardware and protocols that connect them.
- "The Internet" >> "The Web"

Browsers and URIs

**Basic Anatomy of a Browser.** In order to view web content, browsers provide basic functionality for users to access and navigate web code hosted on servers.

Browsers are applications (apps) used to read and process web information on users' systems. Web information may include HTML, Cascading Style Sheets (CSS), JavaScript, and more; we'll explore these in the next few lessons. Chrome, Firefox, Edge, and Safari are some of the most common browsers on computers and mobile devices, and they all serve the same purpose: to process information provided by a web server and display it for users on their own devices. The anatomy of a browser consists of basic functionality that lets users navigate to any website that's online. Here are some main points about browsers:

A browser's primary control (input) is the address bar. You enter a Uniform Resource Identifier (URI) into the address bar to tell the browser where to find a web resource.
URI. A URI specifies three required components and two optional components:
1. The scheme - what protocol is used to communicate.
2. The authority - the name of the web server to contact.
3. The path - the file path on the web server to the data to be processed.
4. An optional query - begins with the delimiter (?) and carries attribute-value pairs.
5. An optional fragment - begins with the identifier (#) and points to a secondary resource, such as a section of the page.

  |------Uniform-Resource-Identifier------|
  https://www.usna.edu/CyberDept/index.php
  \(1)/   \----(2)----|-------(3)--------/

Below is the URI component diagram specified in RFC 3986:

  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
   |           |            |            |        |
scheme     authority       path        query   fragment
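Python's standard library can split a URI into the same five components. The sketch below uses `urllib.parse.urlsplit` (which calls the authority `netloc`) on the RFC example and on the USNA address; an empty string means that optional component is absent. The helper name `uri_components` is made up for this example.

```python
from urllib.parse import urlsplit


def uri_components(uri: str) -> dict:
    """Break a URI into the five RFC 3986 components (hypothetical helper)."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,      # protocol, e.g. https
        "authority": parts.netloc,   # server name, plus an optional :port
        "path": parts.path,          # location of the resource on the server
        "query": parts.query,        # text after '?', or '' if absent
        "fragment": parts.fragment,  # text after '#', or '' if absent
    }


print(uri_components("foo://example.com:8042/over/there?name=ferret#nose"))
# {'scheme': 'foo', 'authority': 'example.com:8042', 'path': '/over/there',
#  'query': 'name=ferret', 'fragment': 'nose'}
print(uri_components("https://www.usna.edu/CyberDept/index.php"))
# {'scheme': 'https', 'authority': 'www.usna.edu', 'path': '/CyberDept/index.php',
#  'query': '', 'fragment': ''}
```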

Scheme. Internet communication standards are not imposed by any government. Instead, they are developed by the Internet Engineering Task Force (IETF) and published as Requests for Comments (RFCs). Individuals and companies refer to these RFCs and choose to implement the protocols so that their software will be compatible with the critical mass of other users on the Internet. RFC 2616 specified HTTP communications, while the secure HTTPS is specified in RFC 2818 (both have since been superseded by newer RFCs, such as RFC 9110). These "comments" lay out requirements for standard communications but are not always adopted, and protocols can become obsolete through lack of use or lack of security. Additional schemes supported by web browsers include the file URI [RFC 8089], which, like the mailto scheme [RFC 6068], doesn't necessarily have an established communications protocol, as well as other well-established protocols like the File Transfer Protocol (FTP) [RFC 959], although current versions of Chrome and Firefox no longer support FTP.
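As a quick comparison of these schemes, this sketch runs `urllib.parse.urlsplit` over a handful of example URIs. The addresses built on the `m9999` account, `photo.jpg`, and `ftp.example.com` are made up for illustration. The `file` and `mailto` examples come back with an empty authority (server) component, while the `http`, `https`, and `ftp` examples name a server to contact.

```python
from urllib.parse import urlsplit

# Hypothetical example URIs, one per scheme discussed above.
EXAMPLES = [
    "https://www.usna.edu/",
    "http://www.usna.edu/",
    "file:///C:/Users/m9999/Pictures/photo.jpg",
    "mailto:m9999@usna.edu",
    "ftp://ftp.example.com/pub/readme.txt",
]


def describe(uri: str) -> tuple:
    parts = urlsplit(uri)
    # netloc is '' when the URI has no authority (server) component
    return parts.scheme, parts.netloc or "(no authority)", parts.path


for uri in EXAMPLES:
    print(describe(uri))
```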

Would you like to know more?

Protocols are typically associated with communication ports that will be covered in the TCP/IP Stack classes. These ports are managed by the Internet Assigned Numbers Authority (IANA) in their Protocol Registries.

**DNS hierarchy.** Mapping of the DNS hierarchy for the server component of this website's URI, courses.cyber.usna.edu.

Authority. The authority component follows the :// and contains the server's domain name (optionally followed by a port number, as in example.com:8042 in the RFC diagram). For the purposes of this course, the authority is a required component of the URI and may also be referred to as the server component. The Domain Name System (DNS) is a hierarchical system that lets us use human-readable names instead of Internet Protocol (IP) addresses to reach servers. Consider the URI ssh://faculty.cyber.usna.edu: the root domain is always . (it is implied, so it is not written), the Top-Level Domain (TLD) is .edu, the domain is .usna, the subdomain is .cyber, and the sub-subdomain is faculty. What are a few other common TLDs that you may know of?
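To make the hierarchy concrete, the short sketch below splits a host name on its dots and reads the labels from right to left, which is roughly the order in which a DNS resolver walks the tree: root, then TLD, then each lower level. The `dns_hierarchy` helper is hypothetical and does not perform any DNS lookup.

```python
def dns_hierarchy(hostname: str) -> list[str]:
    """List DNS levels from the root down (string splitting only, no lookup)."""
    labels = hostname.rstrip(".").split(".")  # drop an explicit trailing root dot
    levels = ["."]                            # the root domain is always implied
    levels.extend(reversed(labels))           # TLD first, then each lower level
    return levels


for depth, label in enumerate(dns_hierarchy("faculty.cyber.usna.edu")):
    print("  " * depth + label)
# .
#   edu
#     usna
#       cyber
#         faculty
```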

Path. The path locates the specific data on the server. It is relative to a directory the web server is configured to serve (not the root of the file system), so it differs from the absolute path on the file system itself. On this course's web server, the web service will point to the public_html directory in your home directory to load your website for the upcoming Lab 3 assignment. The URI accessed will be https://midn.cyber.usna.edu/~m9999/index.html, whereas the absolute file path on the file system will be /home/mids/m9999/public_html/index.html.
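The translation between those two paths can be sketched in Python. This is a minimal sketch assuming the layout described above (a URL path of the form /~username/... served from /home/mids/username/public_html/...). The function name is hypothetical; on a real server this mapping is set in the web server's configuration (for example, Apache's UserDir feature), not in code like this.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def url_to_server_file(url: str, home_base: str = "/home/mids") -> str:
    """Translate a ~username URL into a file path on the server (assumed layout)."""
    path = urlsplit(url).path                          # e.g. /~m9999/index.html
    user_part, _, rest = path.lstrip("/").partition("/")
    if not user_part.startswith("~"):
        raise ValueError("expected a path that starts with /~username/")
    username = user_part[1:]                           # strip the leading '~'
    # Each user's website lives in public_html inside their home directory.
    return str(PurePosixPath(home_base, username, "public_html", rest))


print(url_to_server_file("https://midn.cyber.usna.edu/~m9999/index.html"))
# /home/mids/m9999/public_html/index.html
```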

Query. The query will be relevant in the Server-Side Scripting class, so be sure to pay attention to how it is applied. In fact, if you look at the address bar in the browser used to view this web page, there's a clear delimiter (?) followed by two attribute-value pairs, type and event.
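The attribute-value pairs in a query can be pulled apart with `urllib.parse.parse_qs`. The URL and its values below are hypothetical stand-ins; check the address bar for the real type and event values on this page.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL: the page name and the values are placeholders.
url = "https://example.com/page.php?type=class&event=5"

query = urlsplit(url).query   # 'type=class&event=5' (the text after '?')
pairs = parse_qs(query)       # '&' separates pairs; '=' separates attribute and value
print(pairs)                  # {'type': ['class'], 'event': ['5']}
```

Each value comes back in a list because the same attribute may legally appear more than once in a query.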

Fragment. Identifiers (#) can be considered markers that allow browsers to jump to a different part of a page: the element whose HTML tag has a matching id attribute.
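A minimal sketch of what a browser does with a fragment, assuming a small made-up page: take the text after # and look for an element whose id attribute matches. The page content, the `review` id, and the `IdCollector` class are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class IdCollector(HTMLParser):
    """Record the tag name of every element that has an id attribute."""

    def __init__(self):
        super().__init__()
        self.ids = {}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id":
                self.ids[value] = tag


def find_fragment_target(url: str, page_html: str):
    fragment = urlsplit(url).fragment   # the text after '#'
    collector = IdCollector()
    collector.feed(page_html)
    # The tag the browser would jump to, or None if no id matches.
    return collector.ids.get(fragment)


page = '<h1 id="top">Intro</h1><p>...</p><h2 id="review">Review Questions</h2>'
print(find_fragment_target("https://example.com/lesson.html#review", page))  # h2
```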

The URI convention was designed to let users easily distinguish and recall these components, rather than using protocol numbers, IP addresses, and port numbers, resulting in the web address format you see today, such as https://www.usna.edu. A Uniform Resource Locator (URL) is a subset of URIs that identify a resource by its location, such as a website address; as such, https://www.usna.edu may be referred to as a URL.

Knowledge Check: What is the server (or authority) component of the URI used to view this webpage?

Activity: URI File Scheme

Take a look at how browsers process URIs within the address bar by opening local files.

Check to see whether you have both Google Chrome and Mozilla Firefox installed on your computer:

Click on Windows Start and type chrome.
Check whether the Google Chrome app appears in the results. If it appears, then the browser is installed.
Repeat the process to see if Firefox is installed.
If you are missing any of the apps, click on Windows Start and type software center followed by the ↩ Enter key.
When the Software Center window appears, search for the missing application.
Click on the Install button to install the app on the computer.

Open File Explorer by clicking on Windows Start and typing file explorer followed by the ↩ Enter key.
Navigate to your home directory and look for photos you may have on your computer.
Right-click on a photo and select Open with. If Chrome or Firefox is not listed, go down to Choose another app, then click on Choose an app on your PC.
Below are the default absolute file paths where Chrome and Firefox should be installed:
Chrome: C:\Program Files\Google\Chrome\Application\chrome.exe
Firefox: C:\Program Files\Mozilla Firefox\firefox.exe
View the address bar and note the URI format in which the file scheme is displayed.
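The address you see is the photo's Windows path rewritten as a file URI. A minimal sketch of that conversion, assuming a made-up path under a user m9999: backslashes become forward slashes, the file:// scheme is added with an empty authority (hence three slashes in a row), and characters such as spaces are percent-encoded. Browsers do this conversion themselves and may display the result slightly differently; the function here is hypothetical.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def windows_path_to_file_uri(path: str) -> str:
    """Convert an absolute Windows path into a file:/// URI (illustrative only)."""
    p = PureWindowsPath(path)
    if not p.is_absolute():
        raise ValueError("a file URI needs an absolute path such as C:\\...")
    # as_posix() swaps backslashes for forward slashes: C:/Users/...
    # quote() percent-encodes unsafe characters (a space becomes %20) but keeps '/' and ':'.
    return "file:///" + quote(p.as_posix(), safe="/:")


print(windows_path_to_file_uri(r"C:\Users\m9999\Pictures\beach day.jpg"))
# file:///C:/Users/m9999/Pictures/beach%20day.jpg
```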

HTML (Hyper-Text Markup Language)

When you type a website URL into a browser, the browser sends a request to the server, which returns code for the browser to process. That code looks very different from what is displayed: it uses a markup language to identify the structure of the web page and, unlike programming languages, does not require compilers or other build tools. These web pages are plaintext files written in HTML. HyperText refers to the links between documents (i.e. the links on a web page), and the markup (i.e. the "<thing> stuff </thing>") in HTML specifies how the content should be interpreted and rendered by a web browser, as with this web page (and all the web pages on this website). The browser doesn't show you the HTML it receives; instead, it interprets the HTML code and displays the web page. Following the HTML to organize the text, pictures, and other content that users see is referred to as rendering the HTML. To understand how websites work, and certainly to create your own, you need to know the basics of HTML.

HTML5, the latest version of HTML, includes improved support for graphics, video, and audio, better accessibility on mobile devices, additional elements, and new Application Programming Interfaces (APIs). The websites you'll build for this course require only basic HTML elements, but more advanced Cascading Style Sheets (CSS) and JavaScript are an option for those brave enough to hone their web ninja skills. Below is a basic HTML skeleton; more can be added to build out a website:

<!DOCTYPE HTML>
<HTML>
  <HEAD>
    <TITLE>My Website</TITLE>
  </HEAD>

  <BODY>
  </BODY>
</HTML>


Nothing will be displayed in the browser window by this code (apart from the title "My Website" on the browser tab) because the BODY contains no content. Notice how the code is organized: (1) the DOCTYPE declaration identifies the code as HTML, and (2) each section has a start tag and an end tag.
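One way to see the start-and-end structure is to let Python's built-in `html.parser` walk the skeleton and report what it finds, in order. This sketch assumes the skeleton is stored in a string; the `SkeletonLister` class is hypothetical. Notice that the only text it finds is the title, which browsers show on the tab rather than in the page body.

```python
from html.parser import HTMLParser

SKELETON = """<!DOCTYPE HTML>
<HTML>
  <HEAD>
    <TITLE>My Website</TITLE>
  </HEAD>

  <BODY>
  </BODY>
</HTML>"""


class SkeletonLister(HTMLParser):
    """Collect the declaration, start tags, end tags, and text in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_decl(self, decl):
        self.events.append(("declaration", decl))

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))   # tag names arrive lowercased

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        if data.strip():                     # skip whitespace between tags
            self.events.append(("text", data.strip()))


lister = SkeletonLister()
lister.feed(SKELETON)
lister.close()
for event in lister.events:
    print(event)
```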

Here are the key points pertaining to HTML syntax:

As a markup language, HTML is simply a text file, so the code can be edited with text editing applications.
- Notepad++, Pulsar, and VS Code are all examples that are approved for use in this course.
Tags identify elements that organize content on a webpage and are written within angle brackets (< >). Some elements change font characteristics, such as the italic tag <I>..</I> and the bold tag <B>..</B>; others create headings with <H1>..</H1> tags, or link to other websites through an attribute on the anchor tag <A href="https://www.usna.edu">..</A>. Each of these requires a start tag and an end tag, with a / identifying the end tag.
Not all tags require end tags: empty elements are self-contained, like line breaks <BR>, horizontal rules <HR>, and images displayed with <IMG src="photo.jpg">.
Container tags such as paragraphs <P>..</P>, divisions <DIV>..</DIV>, and structural tags like <HTML>..</HTML> are wrapped around other, nested code.
There will be ample opportunity to build your own webpage and explore the different tags and attributes throughout HTML!
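The start/end rules above can be checked mechanically with a stack: each start tag is pushed, and each end tag must match the most recently opened tag. Below is a minimal sketch assuming Python's `html.parser` and a hand-picked, partial list of empty elements; the `TagBalanceChecker` and `check` names are hypothetical, and real browsers are far more forgiving than this strict check.

```python
from html.parser import HTMLParser

# Empty elements never get an end tag (partial list, chosen for this example).
VOID_ELEMENTS = {"br", "hr", "img", "meta", "link", "input"}


class TagBalanceChecker(HTMLParser):
    """Verify that every start tag is closed by a matching end tag, innermost first."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of tags still waiting for their end tag
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing syntax such as <br/> opens and closes in one step

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()   # correctly nested: close the innermost tag
        else:
            self.problems.append(f"unexpected </{tag}>")

    def report(self):
        return self.problems + [f"missing </{tag}>" for tag in reversed(self.open_tags)]


def check(html_text: str) -> list[str]:
    checker = TagBalanceChecker()
    checker.feed(html_text)
    checker.close()
    return checker.report()


print(check("<P>Some <B>bold</B> text<BR></P>"))   # []
print(check("<P>Some <B>bold</P></B>"))            # ['unexpected </p>', 'missing </p>']
```

The second example closes the paragraph before the bold text, so the tags overlap instead of nesting, which is what the checker reports.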

Below is an example of code that has been built out with some of the elements discussed:

HTML Code:

<HTML>
  <HEAD>
  </HEAD>
  <BODY>

    <H1>A Simple Web Page</H1>

    <p>This page has <b>two</b> paragraphs.
      The first has an image
      <img src="SleepyFace.JPG"> and
      <a href="https://www.usna.edu">a link</a>.
    </p>

    <p>The second has
      <span style="color:#ff0000;">
        different colors
      </span>, which is cool.<br>
      It also has some funky characters:
      &#0931; &#8680; &#9650;
    </p>

  </BODY>
</HTML>

As Rendered in the Browser:

A Simple Web Page

This page has two paragraphs. The first has an image and a link.

The second has different colors, which is cool.
It also has some funky characters: Σ ⇨ ▲
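The "funky characters" come from numeric character references: &# followed by a character's decimal Unicode code point and a semicolon (leading zeros, as in &#0931;, are allowed). Python's `html.unescape` performs the same translation a browser does, which makes it a quick way to check a reference before putting it in a page.

```python
import html

refs = "&#0931; &#8680; &#9650;"
print(html.unescape(refs))   # Σ ⇨ ▲

# Going the other way: the decimal code point behind each character.
for ch in "Σ⇨▲":
    print(ch, ord(ch), f"&#{ord(ch)};")
```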

HTML is designed to organize and present information and content within a browser. Use this next example to locate tags, attributes, and elements within HTML. Then expand your knowledge by learning additional tags and attributes that can be used throughout HTML code.

HTML Format:

<tag attribute="value"> Element </tag>

HTML Example:
<a href="https://www.usna.edu"> USNA Website </a> <br>
The <b>quick</b> brown fox jumps over the lazy dog.<br>
<img src="img/fox_dog.jpg" height="250">


HTML Output:

USNA Website
The quick brown fox jumps over the lazy dog.
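To connect the three terms to this example, the sketch below uses Python's `html.parser` to list each tag, its attributes as name-value pairs, and the text content that sits between start and end tags. The `PartsLister` class is hypothetical.

```python
from html.parser import HTMLParser

EXAMPLE = """<a href="https://www.usna.edu"> USNA Website </a> <br>
The <b>quick</b> brown fox jumps over the lazy dog.<br>
<img src="img/fox_dog.jpg" height="250">"""


class PartsLister(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []    # (tag, attributes) for every start tag
        self.texts = []   # text content found between tags

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))

    def handle_data(self, data):
        if data.strip():  # ignore whitespace-only gaps between tags
            self.texts.append(data.strip())


parts = PartsLister()
parts.feed(EXAMPLE)
parts.close()
for tag, attributes in parts.tags:
    print(tag, attributes)
print(parts.texts)
```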

Web Security Considerations

Top 10 Web Application Security Risks.
There's an entire industry focused on web security, from dedicated computer hardware to distributed software security. The Internet has a plethora of web applications and services that continue to be exposed by poor security configurations and practices, leading to vulnerabilities and compromises that result in the loss of significant amounts of proprietary information and customer data. Take a look at the Open Worldwide Application Security Project (OWASP) Top 10 Web Application Security Risks, which is updated every few years to keep companies and IT organizations up to speed on the latest trends.

Image courtesy of OWASP.

As a client-based application, your browser has the ability to view the code that is provided by a server once a connection has been established. In the browser used to view this webpage (Chrome), click on the menu, select More Tools, followed by Developer Tools to see the code under the Elements tab. Now change tabs to Network and refresh the browser to look at all of the local and remote connections made. Take a look at other websites, such as mail.google.com or www.amazon.com to see the hundreds of requests and connections made by your computer in seconds!
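Many of the entries in the Network tab come from HTML that points at other files. As a rough sketch (using Python's `html.parser` and `urllib.parse.urljoin`; the sample page, its file names, and cdn.example.com are made up), the code below collects the addresses that would make a browser go back to a server automatically. Links in <a> tags are skipped because they are fetched only when clicked, and requests triggered by CSS or JavaScript are ignored.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags whose attribute makes the browser fetch another file automatically
# (a partial list chosen for this sketch).
RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href", "iframe": "src"}


class ResourceFinder(HTMLParser):
    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.requests = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Relative paths are resolved against the page's own URL.
                self.requests.append(urljoin(self.page_url, value))


page = """<link rel="stylesheet" href="style.css">
<img src="img/fox_dog.jpg">
<script src="https://cdn.example.com/app.js"></script>
<a href="https://www.usna.edu">USNA</a>"""

finder = ResourceFinder("https://midn.cyber.usna.edu/~m9999/index.html")
finder.feed(page)
finder.close()
for url in finder.requests:
    print(url)
```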

Supplemental Media:

The Internet's First Message

W3Schools

The W3Schools website allows you to build out HTML and visually see what a page would look like in a browser. Try adding the following code into the page and run it:

<IMG src="https://www.usna.edu/CyberDept/_files/images/cys_logo_sm.jpg">
<H1>USNA SY110 Class!!</H1>

HTML in 100 Seconds

Review Questions:

What is the main difference between the web and the Internet?
What are the three required components of a URI?
What are some of the common schemes used to view a website?
Who oversees standards and protocols for network communications, such as web-based HTTPS?
What are the different levels of the DNS hierarchy and associated terms?
Given a basic set of HTML code, how is the format structured and also rendered on a browser?

References

Q. Norton, "Anonymous Tricks Bystanders Into Attacking the Justice Department," Wired, Jun. 3, 2017. [Online]. Available: https://www.wired.com/2012/01/anons-rickroll-botnet/
P. H. Salus, "Casting the Net: From ARPANET to Internet and Beyond," Addison-Wesley Publishing Company, 1995, pp. 63-64.
Statista. (2025). "Number of internet and social media users worldwide as of February 2025." [Online]. Available: https://www.statista.com/statistics/617136/digital-population-worldwide/
ICANN. (2019). "The First Message Transmission." [Online]. Available: https://www.icann.org/en/blogs/details/the-first-message-transmission-29-10-2019-en

Tangem. (2025). "How Many Cryptocurrencies Are There in July 2025?" [Online]. Available: https://tangem.com/en/blog/post/how-many-cryptocurrencies-exist/

CNBC. (2023). "The $8 billion Sam Bankman-Fried criminal trial starts today — here’s what’s at stake and how we got here." [Online]. Available: https://www.cnbc.com/2023/10/03/sam-bankman-fried-criminal-trial-starts-today-heres-whats-at-stake.html

Department of Justice (DOJ). (2024). "Samuel Bankman-Fried Sentenced to 25 Years for His Orchestration of Multiple Fraudulent Schemes." [Online]. Available: https://www.justice.gov/archives/opa/pr/samuel-bankman-fried-sentenced-25-years-his-orchestration-multiple-fraudulent-schemes