The Internet and the World Wide Web

 

Introduction.  Why does the Internet have so many users?  Overall, it provides a multiplicity of options for communication.  These can be used for all kinds of reasons, including business and pleasure.  For example, a user can do any of the following.
  • Surf the World Wide Web
  • Create web pages
  • Send and receive e-mail in a fraction of the time required for snail mail
  • Join things such as mailing lists and exchange messages with people who share some common interests
  • Participate in newsgroups where users can read and post messages
  • Transfer files such as  documents, sound and graphics from one computer to another
    • FTP
  • Use a terminal emulation program such as Telnet to connect to a remote computer and make use of it
  • Listen to radio broadcasts
  • View streaming video
  • Chat in real time with one or more other users
  • Participate in a virtual meeting making use of things like shared documents, whiteboard drawings and communicating via text, audio and video
  • Make long distance telephone calls
  • Establish a secure VPN - Virtual Private Network by tunneling through the Internet to a private server or LAN

The World Wide Web.  To many people, the Internet is the same as the World Wide Web.  This is not the case, however.  The World Wide Web runs over the Internet and is probably its major draw, but it doesn't encompass all of the Internet's uses.  The following sections present these web components.

  • HTTP - Hypertext Transfer Protocol
  • HTML - Hypertext Markup Language
  • Web Servers
  • DNS - Domain Name System
  • Web browsers

Hypertext Transfer Protocol.  The acronym HTTP - Hypertext Transfer Protocol appears at the beginning of almost all URLs.  It is a set of rules that governs how files, such as text, graphics, sound and video, are exchanged on the web.  The HTTP standards were developed by the IETF - Internet Engineering Task Force.

As implied by its name, HTTP is used to exchange hypertext files.  These files can include links to other files.  A web server runs an HTTP service or HTTP daemon, which is a program that services HTTP requests.  These requests are transmitted by web browsers, which are essentially HTTP client software programs.

In a very brief and oversimplified summary, when a user types a web address into a browser's address box or clicks a hyperlink, the web browser sends a request to the web server at that address.  The web server processes the request and, if all goes well, returns the requested resource, which might be an HTML page, a graphic, or something like a sound file.
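
The exchange described above can be sketched in a few lines of Python.  This is a minimal sketch, not how any particular browser works: the function names are hypothetical, www.example.com is a placeholder host, and the request shown is about the simplest HTTP/1.1 request possible.

```python
def build_get_request(host, path="/"):
    """Return the text a browser might send to ask a server for one resource."""
    lines = [
        f"GET {path} HTTP/1.1",   # method, resource, protocol version
        f"Host: {host}",          # which site on the server is wanted
        "Connection: close",      # ask the server to close the connection when done
        "",                       # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(line):
    """Split a response status line such as 'HTTP/1.1 200 OK' into its parts."""
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason


request = build_get_request("www.example.com", "/index.html")
print(request)
print(parse_status_line("HTTP/1.1 200 OK"))
```

The first line of the server's reply, the status line, tells the browser whether the resource follows or an error occurred.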

If the requested resource is not on the web server, or the user lacks the appropriate permissions for accessing the resource, the web server returns an error message.  A few of the more common error messages are contained in the following list.

  • 401/Unauthorized - the request lacked valid authentication credentials
  • 403/Forbidden - the server understood the request but refuses to allow access
  • 404/Not Found - the resource is not on this server
  • 500/Internal Server Error - the server had a problem that prevented it from processing the request
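
Python's standard library carries a table of these status codes, which gives a quick way to look up the official reason phrase for a number.  The helper names below are hypothetical; only HTTPStatus comes from the library.

```python
from http import HTTPStatus


def describe(code):
    """Return 'code/Reason Phrase' using Python's standard table of HTTP statuses."""
    status = HTTPStatus(code)
    return f"{status.value}/{status.phrase}"


def is_client_error(code):
    """4xx codes signal a problem with the request; 5xx codes a problem on the server."""
    return 400 <= code < 500


for code in (401, 403, 404, 500):
    kind = "client error" if is_client_error(code) else "server error"
    print(describe(code), "-", kind)
```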

Hypertext Markup Language.  HTML - Hypertext Markup Language is the most common language used to develop web pages.  It is used to indicate to the receiving web browser how the received file should be formatted for viewing, among other things.  HTML contains tags to indicate things such as line breaks, paragraphs, hyperlinks, tables and so on.

For example, in order to get this following phrase to be bold, centered and red

The following phrase

the following HTML statement was used.

<p align="center"><b><font size="4" color="#FF0000">The following phrase</font></b></p>
 

  • the first tag <p align="center"> creates a new paragraph and centers it

  • the next tag <b> causes the text that follows to be displayed in bold

  • the next tag <font size="4" color="#FF0000"> determines the font's size and color

  • the trailing tags that contain slashes </font></b></p> end the effect of these tags
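
To see how a program reads these tags, the statement above can be fed to the HTML parser in Python's standard library.  This is only a sketch of the idea: the TagLogger class is a hypothetical name, and a real browser does far more than log tags.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record each start tag, end tag, and run of text in document order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


markup = '<p align="center"><b><font size="4" color="#FF0000">The following phrase</font></b></p>'
logger = TagLogger()
logger.feed(markup)
logger.close()
for event in logger.events:
    print(event)
```

The log shows the tags opening in the order p, b, font and closing in the reverse order, which is what keeps the tags properly nested.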

HTML code is often developed using web authoring software such as FrontPage, PageMill or Dreamweaver.  Other web designers prefer to write the code manually, and some use a mixture of both approaches.  Web pages are stored as text documents, usually with .html or .htm extensions.

Web Servers.  Once web pages are created they are uploaded to a web server.  This web server needs to be connected to the Internet and runs web server software such as the following.

  • Apache - usually for Linux/UNIX

  • IIS - Internet Information Server - for Microsoft platforms

  • Domino - for Lotus servers

  • Suitespot - for Netscape servers

The web server also needs a public IP address by which Internet users can reach it.  It is usually given a domain name as well.  Remember, such names are used in place of a particular numeric IP address.

Domain Name System.  We have talked about DNS - Domain Name System elsewhere, but we will give a brief summary here.

When a web user types an address such as http://www.nytimes.com/ into their browser's address box and causes it to send a request for this page, the host name in the address, here www.nytimes.com, is first sent to a DNS server that translates it into an IP address.  This IP address is then used to locate the web server on the Internet.  From there the request proceeds as described earlier: the server returns either the requested resource or an error message.
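
A small Python sketch makes the two steps concrete: pulling the host name out of the address, then asking DNS for its IP address.  The function names are hypothetical.  The lookup itself needs a working network connection, so it is defined but not called here.

```python
import socket
from urllib.parse import urlsplit


def host_from_url(url):
    """Extract the host name, the part DNS actually resolves, from a URL."""
    return urlsplit(url).hostname


def resolve(url):
    """Look up the IP address for a URL's host name (needs a working network)."""
    return socket.gethostbyname(host_from_url(url))


print(host_from_url("http://www.nytimes.com/"))
# resolve("http://www.nytimes.com/") would return a dotted address string;
# the value depends on the DNS answer at the time, so none is shown here.
```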

Web Browsers.  The following list gives the most popular web browsers.  The first two are by far the most widely used.

  • Microsoft's Internet Explorer

  • Netscape Communicator

  • Lynx - text based that can be used on a variety of platforms

  • Amaya - distributed by the W3C - World Wide Web Consortium

  • Emacs/W3 - a web browser for UNIX, Windows, AmigaDOS, OS/2 and VMS

  • QNX Voyager - a web browser for the QNX operating system

  • Opera - a small and fast web browser that works in Windows, Linux and BeOS

Some web browsers are freeware and cost nothing.  Others are shareware, where you can try them before buying them.  Some web browsers are available for download on the web.  Many online services have their own proprietary browsers or they may customize versions of Internet Explorer or Netscape before distributing them to their customers.

Web Services

Search Engines.  Due to the hugely fragmented nature of the web, it can be quite difficult to find what you're actually looking for.  To help with this, a number of search engines have cropped up on the web.  Search engines can operate in a number of different ways, though they all seem to rely on databases that have accumulated information about different pages on the web.

One of the most widely used approaches to identifying web sites and pages with particular characteristics is to make use of meta tag keywords designated by the web developers.  These are specified within the HTML associated with a page.

Unfortunately, according to www.wwwmetrics.com, less than 20% of the public web is indexed by search engines.  At the same time, it seems that 85% of all users try to use search engines to locate pages.  This may help explain why using search engines can be frustrating.  Even so, at least to me, they really do seem to have gotten better at finding desired and appropriate pages.

Several of the major search related web sites are listed below.

  • Yahoo

  • Lycos

  • DirectHit

  • Excite

  • AltaVista

  • Northern Light

  • Google

Search engines all seem to have three basic components.

  • A spider which travels from one link to another on the web gathering indexing information.

  • An index implemented with a database that stores important information about each web page the spider collects.

  • A search/retrieval mechanism that provides the interface for users to enter their queries and receive the results.
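
The three components can be illustrated with a toy model in Python.  Everything here is an assumption made for illustration: the four "pages" are invented, the function names are hypothetical, and a real search engine fetches pages over the network and stores its index in a database rather than a dictionary.

```python
from collections import deque

# A made-up miniature "web": page name -> (text, links to other pages).
PAGES = {
    "home": ("welcome to the campus network pages", ["courses", "email"]),
    "courses": ("network courses and lecture notes", ["home"]),
    "email": ("how to set up campus email", []),
    "orphan": ("nobody links to this page", []),
}


def spider(start):
    """Follow links breadth-first from a start page, returning pages in visit order."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in PAGES[page][1]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def build_index(pages):
    """Map each word to the set of visited pages whose text contains it."""
    index = {}
    for page in pages:
        for word in PAGES[page][0].split():
            index.setdefault(word, set()).add(page)
    return index


def search(index, query):
    """Return the pages that contain every word of the query, sorted by name."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


index = build_index(spider("home"))
print(search(index, "campus"))
print(search(index, "network courses"))
print(search(index, "nobody"))
```

The last query finds nothing because no link leads to the "orphan" page, so the spider never visits it.  A page that no crawled page links to stays out of the index in the same way.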

Some search engines actually search the entire text of a document rather than only the keywords.  This has both advantages and disadvantages.

Some search engines allow the user to use compound search criteria developed using Boolean operators such as AND, OR or NOT.
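
Boolean operators map directly onto set operations, as this short Python sketch shows.  The index contents and page names are invented for illustration.

```python
# Hypothetical index: word -> set of page names containing that word.
INDEX = {
    "jazz": {"page1", "page2", "page4"},
    "radio": {"page2", "page3"},
    "video": {"page3", "page4"},
}
ALL_PAGES = {"page1", "page2", "page3", "page4"}


def pages_with(word):
    """Return the set of pages containing the word (empty if the word is unknown)."""
    return INDEX.get(word, set())


both = pages_with("jazz") & pages_with("radio")      # jazz AND radio
either = pages_with("jazz") | pages_with("radio")    # jazz OR radio
without = pages_with("jazz") - pages_with("video")   # jazz AND NOT video
not_jazz = ALL_PAGES - pages_with("jazz")            # NOT jazz

print(sorted(both), sorted(either), sorted(without), sorted(not_jazz))
```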

Metasearch engines compile the results from several search engines.  Three of the best known are in the following list.

  • MetaCrawler

  • SavvySearch

  • Ask Jeeves

These search engines don't actually maintain their own indexes.  They make use of other search engines and compile and collate the information received from these.
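
One simple way to collate results, shown here only as an assumption about how it might be done, is to take results from each engine in turn and drop duplicates.  The function name and the example addresses are hypothetical, and real metasearch engines may rank merged results quite differently.

```python
def merge_results(*result_lists):
    """Interleave ranked result lists round-robin, keeping each address only once."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged


engine_a = ["a.example/1", "b.example/2", "c.example/3"]
engine_b = ["b.example/2", "d.example/4"]
print(merge_results(engine_a, engine_b))
```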

Many search engines provide web portals as start pages for users.  These typically provide additional information and services such as news, frequently visited links, maps, web mail, games, forums and phone directories.  Users can also customize these portals to better meet their needs.

E-Mail.  One of the most widespread uses of the Internet is e-mail.

It has many advantages over other forms of communication such as the following.

  • potential for quick and informal interactions

  • messages can be read at the recipient's discretion

  • messages can be printed

  • messages don't need to be plain text

    • attached files

    • images

    • video

    • sound

    • formatted text

    • color

  • less expensive than postal mail

  • less expensive than long distance phone calls

E-mail also has some significant disadvantages such as the following.

  • spam can be too easily distributed

  • potential for misinterpretation

  • might be sent impulsively

  • might not actually be read

  • privacy and security

Some sort of software is required to handle e-mail on both clients and servers.  Most e-mail transfers are based on SMTP - Simple Mail Transfer Protocol, POP3 - Post Office Protocol version 3 or IMAP4 - Internet Message Access Protocol version 4.  SMTP, the Internet standard based on TCP/IP protocols, is used to send and relay messages, while POP3 and IMAP4 are used by clients to retrieve messages from a mail server.
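
As an illustration of the client side, the following Python sketch builds a message with an attachment and shows how it could be handed to an SMTP server.  The addresses, the server name mail.example.com and the function names are all hypothetical, and the send function is defined but not called because it needs a real mail server.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body, attachment=None):
    """Create a message; attachment is an optional (filename, bytes) pair."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    if attachment is not None:
        filename, data = attachment
        msg.add_attachment(data, maintype="application",
                           subtype="octet-stream", filename=filename)
    return msg


def send(msg, server="mail.example.com", port=25):
    """Hand the message to an SMTP server (hypothetical host; not called here)."""
    with smtplib.SMTP(server, port) as smtp:
        smtp.send_message(msg)


msg = build_message("alice@example.com", "bob@example.com",
                    "Meeting notes", "Notes attached.",
                    attachment=("notes.bin", b"\x00\x01"))
print(msg["Subject"], msg.is_multipart())
```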

Some typical e-mail programs that have been developed for the client are

  • Eudora

  • Pegasus

  • Microsoft Outlook Express

  • Netscape Mail

  • Lotus Notes

Many of these programs also include personal organizer components.

It is also possible and often useful to create mailing lists that group several different e-mail addresses under one heading.  These are usually used to distribute one message to a group of users fairly frequently.  E-mail programs allow the user to create such lists and store them.  The user can also add and delete e-mail addresses in order to keep the lists up to date.

There can also be reasons to have huge mailing lists such as those for some businesses.  In these instances it is likely the firm will make use of some specialized software such as the following.

  • Majordomo

  • ListServ

  • other web based list servers

Newsgroups.  Newsgroups are similar to mailing lists except that you have more discretion about what information to review.  With mailing lists you are automatically sent materials and need to unsubscribe from the list in order to stop receiving them.

On the other hand, newsgroups require you only to configure your news reading software.  If you don't want to read something, you just don't connect.

The standard protocol used by most news servers is NNTP - Network News Transfer Protocol.

Most of the news reader software is available in conjunction with the most well known web browsers such as Internet Explorer/Outlook Express, Netscape Communicator and Opera.

File Transfer.  FTP - File Transfer Protocol programs allow a user to download files from one computer on a network to the user's computer.  They also enable a user to upload files from the user's computer to other computers on a network.

In practice, these files can be anything from MP3 music files to installation programs for things such as upgrades and drivers.  These days, most web browsers include some FTP functionality.  For more direct control over transfers, many users still prefer dedicated software such as CuteFTP or WS_FTP.  Almost all FTP programs have some sort of GUI - Graphical User Interface.
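
Downloads and uploads can also be scripted.  The sketch below uses Python's standard ftplib module.  The host ftp.example.com and the file names are hypothetical, and example_session is defined but not called because it needs a reachable FTP server.

```python
from ftplib import FTP


def download(ftp, remote_name, local_path):
    """Copy one file from the server to the local disk in binary mode."""
    with open(local_path, "wb") as local_file:
        ftp.retrbinary(f"RETR {remote_name}", local_file.write)


def upload(ftp, local_path, remote_name):
    """Copy one local file to the server in binary mode."""
    with open(local_path, "rb") as local_file:
        ftp.storbinary(f"STOR {remote_name}", local_file)


def example_session():
    """Not called here: needs a reachable server and a local driver.zip."""
    with FTP("ftp.example.com") as ftp:   # hypothetical host
        ftp.login()                        # no arguments means anonymous login
        download(ftp, "readme.txt", "readme.txt")
        upload(ftp, "driver.zip", "driver.zip")
```

RETR and STOR are the FTP commands for retrieving and storing a file.  Graphical FTP programs issue the same commands behind their buttons.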

Telnet.  Older mainframe oriented networks used to require some sort of terminal emulation programs to run on desktop computers in order for them to be used as "dumb" terminals.  Users could then make use of computing power on the mainframe from a variety of locations.  They could also make use of their desktop computer in stand alone mode.

Telnet is an enhancement of this in that it can allow users to connect to any host on the Internet where they have appropriate permissions.  This requires the client computer to run particular telnet software.  It also requires the host to be running telnet software.  Telnet doesn't allow the transfer of files from one computer to another.  It allows the user to do things like run programs or edit files on the server.

Streaming Media.  As more and more options for high speed access to the Internet have become available, it has become more practical to transmit and receive radio and video signals.  Streaming essentially involves maintaining an adequately steady flow of data from some source on the Internet to the client's computer so that the client experiences continuous playback of a particular video or radio segment.

This process usually involves some buffering on the client's computer to adjust for irregularities in the stream.
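
A small simulation shows why buffering helps.  It is a deliberately simplified model with made-up numbers: time is divided into ticks, the player consumes one chunk of data per tick, and the function name and parameters are hypothetical.

```python
def play(arrivals, prebuffer):
    """Simulate playback, one chunk per tick, once `prebuffer` chunks have arrived.

    arrivals[i] is the number of chunks received during tick i.
    Returns the number of ticks on which playback stalled after it had started.
    """
    buffered, started, stalls = 0, False, 0
    for received in arrivals:
        buffered += received
        if not started and buffered >= prebuffer:
            started = True
        if started:
            if buffered > 0:
                buffered -= 1      # play one chunk this tick
            else:
                stalls += 1        # nothing to play: the stream pauses
    return stalls


# An uneven stream: data arrives in bursts with gaps in between.
arrivals = [1, 1, 0, 0, 2, 1, 0, 2, 1]
for prebuffer in (1, 2, 3):
    print(prebuffer, play(arrivals, prebuffer))
```

In this model a larger buffer removes the pauses, at the cost of a longer wait before playback begins.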

The most popular applications typically used for radio and/or video streaming are

  • Apple QuickTime

  • Real Video

  • Real Audio

  • Windows Media Player

Live Chat.  Real time communication over the Internet is possible using the IRC - Internet Relay Chat system.  IRC software on clients and servers enables users to connect to particular channels or forums hosted by the server.  These channels or forums are usually dedicated to particular topics of interest to the users.
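
Underneath, an IRC client and server exchange short lines of text.  The Python sketch below formats the lines a client could send to register, join a channel and post a message.  NICK, USER, JOIN and PRIVMSG are commands from the IRC protocol, while the nickname, the channel name and the function names are hypothetical.  Nothing is actually sent over a network here.

```python
def irc_line(command, *params, trailing=None):
    """Format one IRC protocol line: COMMAND params [:trailing text] CRLF."""
    parts = [command, *params]
    if trailing is not None:
        parts.append(":" + trailing)   # only the last parameter may contain spaces
    return " ".join(parts) + "\r\n"


def join_and_greet(nick, channel, text):
    """Lines a client could send to register, join a channel, and post a message."""
    return [
        irc_line("NICK", nick),
        irc_line("USER", nick, "0", "*", trailing=nick),
        irc_line("JOIN", channel),
        irc_line("PRIVMSG", channel, trailing=text),
    ]


for line in join_and_greet("student1", "#networking", "Hello, everyone"):
    print(line, end="")
```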

Popular client programs include the following.

  • mIRC

  • Visual IRC

  • PIRCH

It is also possible to develop web based chat rooms located at particular URLs where web surfers can enter common chat communities and send messages to each other.  These are usually developed in Java or a scripting language.

Instant messaging programs allow users to send one-to-one messages to other people who are online.  It is typical for a dialog box to pop up on a recipient's monitor containing the text message as it is typed in by the sender.  Some systems also include the capabilities for voice and/or video transmission.

Some popular instant messaging systems include the following.

  • Mirabilis ICQ

  • AOL/Netscape Instant Messenger

  • Microsoft MSN Messenger

  • Yahoo Messenger

  • Tribal Voice Powwow

  • Elf Communications WinTalk

Audio/Video Conferencing.  Audio/Video Conferencing software allows users to conduct meetings online using audio and video technology.  Each user in the conference must be running the appropriate software and have a soundcard, microphone and digital camera attached to their computer.

Popular conferencing software includes the following.

  • CU-SeeMe

  • Microsoft NetMeeting

The ITU - International Telecommunication Union has developed standards for videoconferencing.  H.323 is the standard that governs multimedia communication over packet networks, such as IP and IPX, that do not provide guaranteed QoS - Quality of Service.  Conferencing applications that comply with this standard should be able to interoperate with each other.

There are two main categories of video conferencing software.

  • Point-to-Point

    • A one-to-one connection that works somewhat like a video telephone

  • Multipoint

    • Allows for more than two participants to interact simultaneously

Internet Telephony.  Internet Telephony allows you to place long distance calls over the  Internet using Voice over IP technology.  This approach allows the users to avoid telephone company charges for such calls.

These systems used to have fairly low quality and required both ends of the connection to be using computers running the same software.

Think about the difficulties involved in such a connection where packets are being sent over the Internet.  You need to be certain there is an adequately fast flow of information so that the conversations continue reasonably.  This situation becomes even more complicated if packets get out of order.  Having a packet network rather than a circuit switched network means there are likely to be some difficulties in attaining desired performance levels.
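
The out-of-order problem can be sketched in Python.  This is a minimal illustration that assumes each packet carries a sequence number.  The function name and the sample data are hypothetical.

```python
def reorder(packets):
    """Yield payloads in sequence order from packets that may arrive out of order.

    Each packet is a (sequence_number, payload) pair; numbering starts at 0.
    A packet that arrives early waits in `pending` until the gap before it is filled.
    """
    pending, expected = {}, 0
    for seq, payload in packets:
        pending[seq] = payload
        while expected in pending:
            yield pending.pop(expected)
            expected += 1


arrived = [(0, "Hel"), (2, "wor"), (1, "lo "), (4, "!"), (3, "ld")]
print("".join(reorder(arrived)))
```

Note that this sketch waits indefinitely for a missing packet.  A live conversation cannot do that, so a telephony application would more likely give up on a packet that is too late and carry on without it.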

Newer technologies allow a user to use Internet gateway servers to place calls to regular phone numbers.  The person on the  receiving end doesn't even need to have a computer or an Internet connection.

The following steps are typical.

  1. The caller dials an IP address of an Internet gateway server.

  2. The call is routed over the Internet to the gateway server.

  3. The gateway server routes the call to a PBX in the city containing the call's final destination.

  4. The  call is transferred to an outbound line to the recipient's telephone number.

  5. The call is recognized as a local call by the telephone company because it originates within the PBX in the destination city.

This approach will also work for faxes.

Some of the most popular Internet telephony applications include the  following.

  • Internet Phone for Windows

  • Webphone for Windows

  • NetPhone for Macintoshes

  • Cyberphone for Windows or UNIX

  • PGPhone for Windows and Macintosh

  • Speak Freely for Windows and UNIX

Most telephony applications are proprietary and require the users at each end to be running the same software if they are communicating through computers.  Fortunately, the packages that run on different platforms also work across platforms.

Virtual Private Networking.  VPN - Virtual Private Networking is something we will cover in more depth later in the course.  VPN technology uses the Internet as a conduit.  A secure connection is established between a client and a private LAN by tunneling and by encrypting/decrypting the information that passes through the tunnel.