Web Overview

There are three items we need for a web site.

The HyperText Transfer Protocol (HTTP) standard, which lets the web server understand and respond to the web browser.

An HTTP Web Server, which is a computer that runs a Web site. Using HTTP, the Web server delivers Web pages to browsers, as well as other data files to Web-based applications.

A Web Browser or User Agent that knows how to use HTTP. It also must know how to handle the files it receives from a web server. In particular, a web browser must handle errors. This is very important, because most web pages include errors.

HyperText Transfer Protocol

The HyperText Transfer Protocol (HTTP) lets the web server understand and respond to the User Agent, which is typically a web browser.

Originally HTTP had only a single method, GET, which let a web browser request an HTML page from a server. The server's response was simply the page itself, with no status line or header fields. HTTP/1.0 introduced the status line and header fields, and added the POST and HEAD methods alongside GET. HTTP/1.1 added five more methods: OPTIONS, PUT, DELETE, TRACE and CONNECT.
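To make the shape of a request and a response concrete, here is a minimal Python sketch. It only builds the text of an HTTP/1.1 GET request and pulls a status line apart. It does not open a network connection. The function names and the host www.example.com are made up for illustration.

```python
def build_get_request(host, path="/"):
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # HTTP/1.1 requires a Host header
        "Connection: close",
        "",                      # a blank line ends the header section
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 404 Not Found' into its parts."""
    parts = line.strip().split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""  # the reason phrase may be absent
    return version, code, reason
```

For example, build_get_request("www.example.com", "/index.xhtml") gives the text a browser would send, and parse_status_line("HTTP/1.1 404 Not Found") separates the version, the numeric code and the reason phrase.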

A web server returns a three-digit HTTP status code to the web browser with every response. You may have seen 404, which means a web page could not be found. Some servers are configured to return custom pages for particular status codes, to provide additional information. You can really have some fun with that facility, if you have time.

HTTP response codes for dummies. 50x: We stuffed up. 40x: You stuffed up. 30x: Ask someone else. 20x: Cool! Any response starting with 2 means things are fine.
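The rule of thumb above depends only on the first digit of the code. A small Python sketch of that idea follows. The function name and the wording of the class labels are my own choices for illustration.

```python
def classify_status(code):
    """Map a three-digit HTTP status code to its broad class."""
    classes = {
        1: "informational",
        2: "success",        # things are fine
        3: "redirection",    # ask someone else
        4: "client error",   # you stuffed up
        5: "server error",   # we stuffed up
    }
    # Integer division by 100 keeps only the first digit of the code.
    return classes.get(code // 100, "unknown")
```

So classify_status(404) reports a client error, and classify_status(200) reports success.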

The HTTP header fields in a response from a web server let browsers determine which language a file is in, what type of file it is (originally only HTML was acceptable), and much more. In turn, the web browser indicates in its Accept header what it can accept, so ideally, a web browser only receives files it can display. Unfortunately, Internet Explorer 6, 7 and 8 claimed to accept anything, which rather destroyed the system.

The media type of the initial file of a web page is typically text/html, which a few years ago accounted for 70% of all web files. This indicates a text file, with HTML markup. However, if your file is actually written in XHTML, according to the standard it should be served as application/xhtml+xml. About 30% of files were XHTML, and this is gradually increasing. This website is written in XHTML 1.1.

Although Internet Explorer 6 responds that it will accept all files, when it receives an application/xhtml+xml file, it simply asks if you would like to save the file. It does not display it. I should add that IE6 is perfectly capable of displaying it. Microsoft are very experienced with XML. So the people who had carefully written XHTML files started to serve them as text/html. This guarantees that every such file is now invalid.
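One way a server-side script might cope with this is sketched below in Python. It reads the browser's Accept header and serves application/xhtml+xml only when the browser names that type explicitly. A browser that merely claims to accept everything with a wildcard gets text/html. This is a simplified illustration: the function names are hypothetical, and ignoring the wildcard is a policy I have assumed, not something the HTTP standard requires.

```python
def parse_accept(header):
    """Return {media_type: q} from an Accept header; q defaults to 1.0."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unreadable q value as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_xhtml_type(accept_header):
    """Pick the media type to send for an XHTML file."""
    prefs = parse_accept(accept_header)
    # Only an explicit mention counts; a bare */* wildcard is ignored.
    if prefs.get("application/xhtml+xml", 0.0) > 0.0:
        return "application/xhtml+xml"
    return "text/html"
```

With an Accept header of "text/html,application/xhtml+xml,*/*;q=0.8" the sketch picks application/xhtml+xml, while a header of just "*/*" falls back to text/html.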

You may need to configure your Apache web server to serve XHTML correctly. Dreamhost get XHTML right, based on the file type, except that index.xhtml is not included in the DirectoryIndex directive (which belongs to Apache's mod_dir module, not mod_mime). The directive should look like this:

DirectoryIndex index.xhtml index.html

I expect Dreamhost will be willing to correct this when I bring it to their attention. In the meanwhile, it is handy for demonstrating how web sites are put together.
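The effect of the DirectoryIndex line can be modelled in a few lines of Python. When a browser asks for a directory, the server tries each listed name in order and serves the first one that exists. This is a simplified model of the behaviour, not Apache's actual code, and the function name is hypothetical.

```python
def pick_index(files_in_directory, directory_index=("index.xhtml", "index.html")):
    """Return the first DirectoryIndex candidate present in the directory, or None."""
    available = set(files_in_directory)
    for candidate in directory_index:  # order matters: earlier names win
        if candidate in available:
            return candidate
    return None
```

With both files present, index.xhtml wins because it is listed first. If index.xhtml is left out of the list, a directory holding only index.xhtml has no index page at all.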

An HTTP cookie may also be sent, as a header field alongside the file. This can be used to track what you are doing. There are privacy implications to accepting cookies; however, many web sites will not operate if you refuse their cookies. A true cookie monster. Our site does not currently use cookies for anything.
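For the curious, a cookie is just a name and a value carried in header fields. The Python sketch below uses the standard library's http.cookies module to build the value of a Set-Cookie header (server to browser) and to read the value of a Cookie header (browser to server). The cookie names and values are invented for the example, and the two helper function names are my own.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value, path="/"):
    """Build the value of a Set-Cookie response header."""
    jar = SimpleCookie()
    jar[name] = value
    jar[name]["path"] = path  # limit the cookie to this part of the site
    return jar[name].OutputString()


def read_cookies(cookie_header):
    """Parse the value of a Cookie request header into a plain dict."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    return {name: morsel.value for name, morsel in jar.items()}
```

Because the browser sends the same cookie back with each later request, the server can recognise a returning visitor, which is where the tracking concern comes from.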

A Web Server

A Web server is a computer that runs a Web site. Using HTTP, the Web server delivers Web pages to browsers, as well as other data files to Web-based applications.

Web servers are not always used for serving the World Wide Web. They can also be found embedded in devices such as printers, routers and webcams, serving only a local network. I may show an example of a web server designed for a local network.
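As a rough idea of how small such a server can be, here is a sketch using Python's standard http.server module. It answers every GET request with one fixed page. I have assumed port 8000 and the address 127.0.0.1, which makes it reachable only from the same computer. A device on a local network would listen on its own network address instead. The names HelloHandler and make_local_server are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body><p>Hello from a local web server.</p></body></html>"


class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Send the status line, the header fields, then the page itself.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the example quiet instead of logging each request


def make_local_server(port=8000):
    """Create a server that listens only on this machine (127.0.0.1)."""
    return HTTPServer(("127.0.0.1", port), HelloHandler)
```

Calling make_local_server().serve_forever() starts it, after which a browser on the same computer can visit http://127.0.0.1:8000/ to see the page.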

Where do I get a web server?

Three easy choices.

Your own computer, if you have sufficient upload speed (we in Carlyle Gardens generally do not). First install a web server program such as Apache HTTP Server. Installing a web server may be tricky (though it is easy on a Macintosh). Configuring it correctly is definitely tricky. I strongly advise against doing this.

Your Internet Service Provider. Many ISPs that connect you to the internet also operate a web server, sometimes free. They let you install small web sites on it. You may not be able to use your own domain name, or that may cost extra. For example, your own BigPond web site will cost between $10 and $20 a month, as will one from iiNet or Internode.

A specialist web server provider. Use your own domain name as standard, for web sites, email and much else. Carlyle Gardens Computer Club's web site is one of millions hosted on Dreamhost. I have a number of other web sites there. Not that I have time to update them either.

A Web Browser or User Agent

A web browser is an application such as Microsoft's Internet Explorer, Mozilla Firefox, Opera or Google's Chrome. Apple supply Safari.

Each web browser is built on a web browser engine or layout engine, and most use a different one. Microsoft use Trident, Mozilla use Gecko. Apple use WebKit (a fork based on KHTML). Google used WebKit, but have very recently forked WebKit, and are moving to the new version, called Blink. Opera use Presto, but are expected to move to Blink. The web browser engine or layout engine component is used to paint the contents of a web page on a virtual window.

The window can be a display, or a printer, or even a voice synthesiser. Because access to a web page may not be via a browser, we often call it a User Agent. For example, a blind person may not use a web browser.

Calling the web browser a User Agent also helps remind us the most important visitor to many web pages is blind, deaf and dumb. It is a search engine spider or web crawler, which lets others find our web pages (unless the site owner makes exclusions by using a robots.txt file). Our robots.txt file allows all searches. The spider would come from, say, Bing, DogPile metasearch, Duck Duck Go, Google or Yahoo's AltaVista.
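To show how a well-behaved crawler reads robots.txt, here is a sketch using Python's standard urllib.robotparser module. The two robots.txt texts are typical examples I have made up. The first allows everything, and the second excludes a hypothetical /private/ directory. Neither is copied from this site's actual file, and the crawler name ExampleBot is invented.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow line means nothing is excluded.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
# Here every crawler is asked to stay out of /private/.
BLOCK_PRIVATE = "User-agent: *\nDisallow: /private/\n"


def crawler_may_fetch(robots_txt, user_agent, url):
    """Return True if the robots.txt text lets this crawler fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Note that robots.txt is only a request. A polite spider honours it, but nothing forces a badly behaved one to do so.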

A lot of the work a web browser does is attempt to correct errors in web pages. In the previous decade, this got so difficult that the authors of web browsers started to rebel. Not only did they need to fix errors, they needed to imitate how other web browsers fixed errors. Enough was enough. Error fixing has basically stalled. Not removed, but not improved either. So old web pages will probably continue to work, for a long while. New pages need to actually start following standards.

This makes it increasingly important that web pages be written to conform to the standards. If a web page does not perform, the first thing anyone correcting it will ask is, does it conform to the standard? The first thing they will say to you is, fix it so it is valid.
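A very first check you can run yourself is whether an XHTML page is at least well-formed XML, meaning every tag that opens is properly closed and nested. The Python sketch below does only that, using the standard xml.etree.ElementTree module. It is not a full validation against the XHTML standard, and because it does not load the XHTML definitions, a page using named entities such as &nbsp; would be rejected. The function name is my own.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xhtml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xhtml_text)
    except ET.ParseError:
        return False  # for example, a tag was left unclosed or badly nested
    return True
```

A page that fails this check cannot be valid XHTML, so it is worth fixing before looking for subtler problems.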

Because the web browser engine is simply a software component, it can also be used by many different applications. For example, fancy graphic email generated as a web page is now usually displayed within your email application by the web browser engine.

ePub ebooks, which are really just a specially packaged and zipped web site, can be displayed by a web browser engine (it takes some tricks). ePub 2, current from 2007 to 2010, is actually produced using XHTML 1.1 (exactly the same as used in this web site) and CSS 2. ePub 3 uses HTML5 in its XHTML form, CSS 2.1 with some CSS 3, and SVG graphics, and can include audio and video. These are not your standard ebooks.
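The "zipped web site" idea can be seen with Python's standard zipfile module. The sketch below packs a few XHTML pages into a zip archive in memory, with the mimetype entry first and stored uncompressed, as ePub readers expect. It is deliberately incomplete: a real ePub also needs a META-INF/container.xml file and a package file listing the contents, which are omitted here. The OEBPS folder name is a common convention I have assumed, and the function names are hypothetical.

```python
import io
import zipfile


def build_epub_like_zip(pages):
    """Pack {filename: xhtml_text} into an in-memory zip laid out like an ePub."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry goes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        for name, text in pages.items():
            archive.writestr(f"OEBPS/{name}", text,
                             compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def list_entries(data):
    """Return the file names inside zip data, in archive order."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()
```

Listing the entries of the result shows the mimetype marker followed by ordinary web pages, which is why a web browser engine can display the contents once they are unpacked.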

It is my intention to move this web site from XHTML 1.1 to (X)HTML5 soon after the new standard is released, which is expected by July 2014.