Web Architecture
The Internet and World Wide Web (or just Web for short) form the
backbone of much of modern communication. The Web is just
one of many types of services that are available on the Internet (some
others are: e-mail, instant messaging, newsgroups, file transfer,
secure shell, remote desktop and many more). The Web consists of
pages maintained by various private and corporate organizations. Some
of these, like http://google.com, http://facebook.com, and
http://wikipedia.org, contain text, images, audio, videos, and
animations; in addition, they can support sending and receiving data
from users of their pages. Users "browse" pages using a web
browser such as Firefox.
Structured Documents, Links
Web pages are written in a markup language called HTML, or
HyperText Markup Language, along with some other associated
technologies. The HTML document that the user gets for each page
contains the text of the page, as well as links to other material
needed to display the page. This other material can include images,
sound, video, style sheets, scripts and more.
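As a rough illustration of how a page refers to the other material it needs, the sketch below scans a small HTML document and lists the files a browser would have to fetch to display it. The sample page, the file names in it, and the helper names (ResourceCollector, find_resources) are made up for this example, and only a handful of common tags are checked.

```python
from html.parser import HTMLParser

# Tags whose attribute points at extra material the browser must fetch.
# This is a deliberately short list, not a complete one.
RESOURCE_ATTRS = {
    "img": "src",
    "script": "src",
    "audio": "src",
    "video": "src",
    "link": "href",  # e.g. style sheets
}


class ResourceCollector(HTMLParser):
    """Collects the URLs of material a page needs in order to display."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)  # None for tags we ignore
        for name, value in attrs:
            if name == wanted and value:
                self.resources.append(value)


def find_resources(html_text):
    """Return the referenced resource URLs in document order."""
    collector = ResourceCollector()
    collector.feed(html_text)
    return collector.resources


# A hypothetical page, invented for this example.
SAMPLE_PAGE = """
<html>
  <head>
    <link rel="stylesheet" href="style.css">
    <script src="menu.js"></script>
  </head>
  <body>
    <p>Hello, Web!</p>
    <img src="logo.png" alt="Logo">
  </body>
</html>
"""

print(find_resources(SAMPLE_PAGE))  # ['style.css', 'menu.js', 'logo.png']
```

Each name in the printed list is a separate file that is not part of the HTML document itself.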
In addition to all the stuff that makes the page look and act
interesting, the page may also contain links to other web pages. These
links are what make the pages into a "web". Each page can link to
many other pages, which in turn can link to many other pages and so
on. Some of these pages can even have links back to the original
page. If you were to draw out a map of all these links between
pages, the map could look very much like a spider web.
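The "web" of links can be pictured as a map from each page to the pages it links to. The sketch below uses four invented page names and follows the links to see which pages can be reached from a starting page, and whether any chain of links leads back to it. The page names and both function names are assumptions made for the example.

```python
# Hypothetical pages and the pages each one links to.
LINKS = {
    "home.html": ["about.html", "news.html"],
    "about.html": ["contact.html"],
    "news.html": ["home.html", "about.html"],
    "contact.html": [],
}


def reachable_from(start, links):
    """Return every page that can be reached by following links from start."""
    seen = set()
    to_visit = [start]
    while to_visit:
        page = to_visit.pop()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                to_visit.append(target)
    return seen


def links_back(start, links):
    """True if some chain of links leads from start back to start."""
    return start in reachable_from(start, links)


print(sorted(reachable_from("home.html", LINKS)))
print(links_back("home.html", LINKS))     # True: home -> news -> home
print(links_back("contact.html", LINKS))  # False: contact links nowhere
```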
When a user browses to a web page, such as http://www.google.com, the
user's web browser converts the server name in the URL (Uniform
Resource Locator) into an address for a server computer. The
browser then sends a message to this server computer asking for the
page to display. Each of the other items referenced by
this HTML document is then downloaded as well, independently of the
original HTML document. Because of this, downloading and displaying a
single web page can require dozens or even hundreds of separate
requests to the server. Additionally, there is nothing to stop the creator of the
HTML document from referencing an image or other object on a server
separate from the server that has the HTML document. One example
of this is Google's Image Search: when you find images, you are on a
Google page, but inside that page's HTML there is a reference to the
image on the result's site, so you can see it before you go
to that site.
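The steps a browser takes with a URL can be sketched roughly as follows: split the URL into a server name and a path, look up the server's address, and build the message that asks for the page. This sketch assumes a plain HTTP/1.1 GET request. The function names describe_request and lookup_addresses are invented for the example, and the address lookup is defined but not called here because it needs a working network connection.

```python
import socket
from urllib.parse import urlsplit


def describe_request(url):
    """Split a URL and build the text of a simple HTTP/1.1 GET request."""
    parts = urlsplit(url)
    host = parts.hostname
    # Use the port from the URL if given, otherwise the usual default.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    # No path at all means the top of the site, written as "/".
    path = parts.path or "/"
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return host, port, path, request


def lookup_addresses(host, port):
    """Ask the system resolver for the server's addresses (needs a network)."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


host, port, path, request = describe_request("http://www.google.com")
print(host)  # www.google.com
print(port)  # 80
print(path)  # /
print(request)
```

A real browser does considerably more (encrypted connections, caching, redirects), so this shows only the general shape of the exchange.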
Web Servers
A basic web server doesn't know anything about the HTML files that it
sends out. Rather, it just has a directory full of files that make up the
web site; when a user's web browser asks for a file, it simply sends
a copy of that file back. There is also a special file called
index.html (some servers use a different name, but this one usually
works) that is returned by default when no file is named. When your web
browser asks a website for "http://www.google.com/FileName.html", the
browser will go to the server at the address "www.google.com" and ask
for the file called "FileName.html" in the top directory, which is
symbolized by the single slash (/). However, if the user just types in
"http://www.google.com/" the server will automatically send back the
index.html file (just as if the user had entered
"http://www.google.com/index.html"), since there is no file
specified after the slash. (If the user doesn't enter the slash,
the web browser is smart enough to add it for them.)
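The default-file rule can be sketched as a small function that turns the path part of a URL into a file name on the server. The root directory /var/www/site and the function name file_for_path are made up for this example, and the default name index.html is only the common choice described above.

```python
from pathlib import PurePosixPath

DEFAULT_FILE = "index.html"  # some servers use a different name


def file_for_path(url_path, root="/var/www/site"):
    """Map the path part of a URL to a file under a hypothetical root."""
    # A path ending in a slash names a directory, so use the default file.
    if url_path.endswith("/"):
        url_path += DEFAULT_FILE
    # NOTE: a real server must also reject paths containing ".." so that
    # requests cannot escape the root directory. This sketch does not.
    return str(PurePosixPath(root) / url_path.lstrip("/"))


print(file_for_path("/FileName.html"))  # /var/www/site/FileName.html
print(file_for_path("/"))               # /var/www/site/index.html
print(file_for_path("/index.html"))     # /var/www/site/index.html
```

The last two calls give the same file, which is why both URLs show the same page.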
A web server doesn't do anything different whether the user asks for
"http://servername.com/file.html" or
"http://servername.com/image.jpg". In both cases the server
(located at the address "servername.com") just goes and grabs a file
off its hard disk and sends it back to the user's web browser. This
makes it easy for the person designing the web pages, because they can
simply put all the files they need for the site in a directory on the
server, and then tell the server that this is the "root" directory, which
is used when a user types in a slash (/) after the server's
address.
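To see the "directory of files" idea in action, the sketch below uses Python's built-in http.server module to serve a temporary directory as the root, then requests a few paths from it. This is only a local experiment, not how a production web server is set up. The file contents and the helper names (start_server, fetch, demo) are invented for the example.

```python
import functools
import http.client
import http.server
import pathlib
import tempfile
import threading


def start_server(root_dir):
    """Serve the files in root_dir on a free local port, in the background."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(root_dir)
    )
    # Port 0 asks the operating system to pick any unused port.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server, path):
    """Request one path from the server and return the bytes sent back."""
    host, port = server.server_address[:2]
    connection = http.client.HTTPConnection(host, port, timeout=5)
    try:
        connection.request("GET", path)
        return connection.getresponse().read()
    finally:
        connection.close()


def demo():
    """Create a tiny site in a temporary root directory and fetch from it."""
    with tempfile.TemporaryDirectory() as root:
        pathlib.Path(root, "index.html").write_bytes(b"<h1>Home</h1>")
        pathlib.Path(root, "file.html").write_bytes(b"<p>Another page</p>")
        server = start_server(root)
        try:
            return (
                fetch(server, "/"),
                fetch(server, "/index.html"),
                fetch(server, "/file.html"),
            )
        finally:
            server.shutdown()
            server.server_close()


slash, index, other = demo()
print(slash == index)  # True: "/" returns the index.html file
print(other)           # b'<p>Another page</p>'
```

The server treats file.html exactly as it would an image or any other file in the directory: it reads the bytes and sends them back.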