Files and File Systems

 

Introduction.  Everyone has created a word processing document and stored it on a computer.  The general purpose word used to describe the document is file.  A file consists of data that can be manipulated as one entity.  For example, several things that can be done to files as a whole are in the following list.
  • opened
  • viewed
  • copied
  • deleted
  • moved

Users also need to be able to do other things like send them across networks and print them.
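The whole-file operations listed above can be tried out in a few lines of Python.  This is only a minimal sketch: the file and folder names are hypothetical, and everything happens inside a temporary directory that is thrown away afterwards.

```python
import shutil
import tempfile
from pathlib import Path


def demo_file_operations() -> dict:
    """Open, view, copy, move and delete a file inside a throwaway directory."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "notes.txt"  # hypothetical file name

        # Open (creating the file) and write some data to it.
        with original.open("w", encoding="utf-8") as handle:
            handle.write("A file is data handled as one entity.\n")

        # View: read the contents back.
        results["contents"] = original.read_text(encoding="utf-8")

        # Copy: the original stays where it is.
        copy = root / "notes_copy.txt"
        shutil.copy(original, copy)
        results["copy_exists"] = copy.exists()

        # Move: the copy leaves its old location for a subfolder.
        archive = root / "archive"
        archive.mkdir()
        moved = Path(shutil.move(str(copy), str(archive / "notes_copy.txt")))
        results["moved_to"] = moved.parent.name
        results["old_copy_exists"] = copy.exists()

        # Delete the original.
        original.unlink()
        results["original_exists"] = original.exists()
    return results
```

Calling demo_file_operations() returns a small dictionary recording what each step did.  Note that every operation treats the file as a single unit and never needs to know what is inside it.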

Each file needs to have a unique name, at least relative to the other files that are stored directly alongside it.

There are many different types of files, and a file's type is almost always determined by the application in which it is used.  For example, files created in Microsoft Word have particular formats and extensions that make them incompatible with most other programs.  You couldn't compile them with a C++ compiler or edit them in a photo processor.

File Types.  The following table lists a variety of standard file extensions along with their file types, descriptions and associated programs.

 

File Extension | File Type | Description | Associated Programs
.avi | Audio Video Interleave | Movie file | Media players
.bmp | Bitmap | Graphics file | Image viewers
.doc | Document | Word processing document | Word processing programs
.gif | Graphics Interchange Format picture | Compressed graphics file that can also be used in animations | Image viewers and Web browsers
.hlp | Help | Help data files | Windows Help viewer
.htm/.html | HTML | Web pages including hypertext links | Web browsers
.ini | Initialization | Configuration files | Operating systems and applications read these to set configurations
.jpg/.jpeg | JPEG picture | Compressed graphics file | Image viewers and Web browsers
.mid/.midi | Musical Instrument Digital Interface | Sound files for synthesized music | Media players
.pdf | Portable Document Format | High quality, portable text and graphics documents | Adobe Reader and other PDF viewers
.ra | Real Audio | High quality audio files | RealPlayer
.rtf | Rich Text Format | Formatted text files | Word processors or text editors
.tar | Tape Archive | UNIX archive files | UNIX commands such as tar
.tif/.tiff | Tagged Image File Format | High quality graphics format | Image viewers
.wav | Waveform | Sound file | Media players
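As a rough illustration of how a program might use an extension to guess a file's type, the sketch below looks up a handful of the extensions from the table.  The function name describe and the FILE_TYPES dictionary are made up for this example, and the lookup only inspects the name; it never checks what the file actually contains.

```python
from pathlib import PurePath

# A few rows from the table, keyed by lower-case extension.
FILE_TYPES = {
    ".bmp": "Bitmap",
    ".htm": "HTML",
    ".html": "HTML",
    ".mid": "Musical Instrument Digital Interface",
    ".midi": "Musical Instrument Digital Interface",
    ".pdf": "Portable Document Format",
    ".rtf": "Rich Text Format",
    ".tar": "Tape Archive",
}


def describe(filename: str) -> str:
    """Return the file type for a name, judged only by its extension."""
    extension = PurePath(filename).suffix.lower()  # text after the last period
    return FILE_TYPES.get(extension, "unknown")
```

With this sketch, describe("REPORT.PDF") gives "Portable Document Format", while a name with no extension, or with an extension that is not in the dictionary, gives "unknown".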

 

File Systems.  Files stored on secondary storage devices such as hard drives, floppies and CDs need to be organized so they can be located and worked with.  The overall approach used to place, retrieve and do other manipulations on files is called the file system.

Different OSs implement different file systems.  Some operating systems can make use of more than one file system.  For example, early Windows 3.x could use only the FAT16 (File Allocation Table) file system.  Windows 2000 can use FAT16, FAT32 or NTFS.

The file system determines naming conventions and the format for specifying a path or route to a file's location.  Naming conventions vary from system to system.  The following list contains the things that naming conventions most commonly, though far from universally, govern.

  • maximum number of characters allowed
  • maximum length of extensions or file type identifiers
  • whether spaces are allowed
  • case sensitivity
  • special characters
  • format for specifying the path

Pathnames can function in one of the two ways contained in the following list.

  • Absolute (fully qualified) pathname
    • identifies the complete path from the root of the file system (on Windows, the drive letter) to the file itself
    • for example

C:\Documents and Settings\fox\My Documents\NetworkingUndergrad\filename.ext

  • Relative pathname

    • identifies the path relative to the current working directory, that is, the folder in which the operating system is currently working

Both sorts of paths work the same way across operating systems.  The main difference is whether forward or backward slashes separate the parts of the path.  MS-DOS, Windows and OS/2 use backslashes.  UNIX and Linux use forward slashes.
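The difference between absolute and relative pathnames, and between the two slash conventions, can be explored with Python's pathlib module.  In this sketch the Windows path is the example used earlier, while the UNIX-style path is a hypothetical equivalent invented for comparison.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows-style paths use backslashes.
win_cwd = PureWindowsPath(r"C:\Documents and Settings\fox\My Documents")
win_rel = PureWindowsPath(r"NetworkingUndergrad\filename.ext")
win_abs = win_cwd / win_rel  # relative path resolved against a working folder

# UNIX-style paths use forward slashes (hypothetical location).
unix_cwd = PurePosixPath("/home/fox")
unix_rel = PurePosixPath("NetworkingUndergrad/filename.ext")
unix_abs = unix_cwd / unix_rel

summary = {
    "win_rel_is_absolute": win_rel.is_absolute(),
    "win_abs_is_absolute": win_abs.is_absolute(),
    "win_abs": str(win_abs),
    "unix_rel_is_absolute": unix_rel.is_absolute(),
    "unix_abs": str(unix_abs),
}
```

A relative pathname only becomes meaningful once it is joined to a working folder, which is what the / operator does in both cases.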

Most popular operating systems use trees as the basic structure for the file system.  This is true of MS-DOS, all Windows, Macintosh, OS/2, UNIX and Linux.  The root is at the top and containers called directories or folders are created below the root.  These directories or folders can contain other directories or folders.  Any given folder cannot contain either two files or two folders/directories with the same name.  This guarantees unique naming and storage locations even if you use the same names someplace else within a tree.
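The tree idea can be modelled in a few lines.  This is a toy sketch, not how any real file system stores its directories: folders are plain Python dictionaries, the helper names are invented, and a name may be used only once per folder regardless of whether it belongs to a file or a subfolder.

```python
def make_folder() -> dict:
    """A folder is modelled as a dict mapping names to entries."""
    return {}


def add_entry(folder: dict, name: str, entry) -> None:
    """Add a file (any non-dict value) or a subfolder (a dict) under a unique name."""
    if name in folder:
        raise FileExistsError(f"{name!r} already exists in this folder")
    folder[name] = entry


# Build a small tree: root -> docs, pics.
root = make_folder()
docs = make_folder()
pics = make_folder()
add_entry(root, "docs", docs)
add_entry(root, "pics", pics)

# The same file name is fine in two different folders...
add_entry(docs, "notes.txt", "contents of docs/notes.txt")
add_entry(pics, "notes.txt", "contents of pics/notes.txt")

# ...but not twice in the same folder.
try:
    add_entry(docs, "notes.txt", "a second notes.txt")
    duplicate_rejected = False
except FileExistsError:
    duplicate_rejected = True
```

Because each folder enforces uniqueness only among its own entries, the full path from the root is what identifies a file unambiguously.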

Popular File Systems.  The following list displays what are probably the most widely used file systems.

  • FAT - File Allocation Table

    • FAT12

    • FAT16

    • FAT32

    • VFAT

  • NTFS - NT File System

  • HPFS - High Performance File System

  • HFS - Macintosh Hierarchical File System

  • NFS - Network File System

We will now survey some of the basic features of each of these file systems.

FAT Systems.  These file systems make use of a file allocation table that provides a mapping of where files are stored.  The files are stored in clusters, which are fixed-size units of storage.  A cluster can only contain data from one file, but files can be spread across several clusters.  Any unused space in a file's last cluster is therefore wasted.  Because of this, each successive FAT version could address more clusters, which allowed a disk of a given size to use smaller clusters and waste less space.
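The effect of cluster size on wasted space is simple arithmetic.  The sketch below is illustrative only: the function names are invented, and the 32 KB and 4 KB cluster sizes and the 10,000-byte file are assumed values chosen to make the contrast clear.

```python
def clusters_needed(file_size: int, cluster_size: int) -> int:
    """Whole clusters required to hold file_size bytes (rounding up)."""
    return (file_size + cluster_size - 1) // cluster_size


def wasted_bytes(file_size: int, cluster_size: int) -> int:
    """Unused bytes in the file's last cluster."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size


FILE_SIZE = 10_000          # bytes, assumed for illustration
LARGE_CLUSTER = 32 * 1024   # 32 KB clusters
SMALL_CLUSTER = 4 * 1024    # 4 KB clusters

waste_large = wasted_bytes(FILE_SIZE, LARGE_CLUSTER)  # 1 cluster, 22,768 bytes unused
waste_small = wasted_bytes(FILE_SIZE, SMALL_CLUSTER)  # 3 clusters, 2,288 bytes unused
```

The same file wastes roughly ten times more space with the larger clusters, and the loss is repeated for every file on the disk.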

The original FAT systems were developed for MS-DOS, which limits filenames to eight characters with three-character extensions.  VFAT - Virtual FAT was developed by Microsoft for Windows 95 to allow for things such as longer filenames and disk caching.
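The eight-plus-three rule is easy to express as a pattern.  The check below is a deliberately simplified sketch: it assumes that only letters, digits, underscores and hyphens are allowed, whereas real MS-DOS accepts a somewhat larger set of characters and reserves certain names.  The function name is invented for this example.

```python
import re

# Up to 8 name characters, optionally followed by a period and up to 3 more.
# Simplifying assumption: only letters, digits, "_" and "-" are accepted.
EIGHT_DOT_THREE = re.compile(r"[A-Za-z0-9_-]{1,8}(\.[A-Za-z0-9_-]{1,3})?")


def is_8_3_name(filename: str) -> bool:
    """True if filename fits the eight-character name, three-character extension limit."""
    return EIGHT_DOT_THREE.fullmatch(filename) is not None
```

Under these assumptions AUTOEXEC.BAT and README pass, while longfilename.txt (name too long), notes.html (extension too long) and "my file.txt" (contains a space) do not.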

NTFS.  NTFS was developed by Microsoft for Windows NT on networks.  It was developed to improve on FAT's shortcomings, such as its inability to implement the file level security that is absolutely essential for a NOS (network operating system).  The following list contains several ways NTFS improves on FAT.

  • File level security through NTFS permissions

    • makes use of an ACL - Access Control List to designate accessibility for all users to files regardless of where they are connecting from

  • File level compression

    • NTFS files can be compressed individually as opposed to at a partition level for FAT

  • Sector sparing

    • the file system identifies bad sectors, moves their data to a good sector and marks the bad sector as unusable

    • this process is done on the fly and invisible to the user

  • Unicode support

    • Unicode allows for international languages in file naming

  • Support for extremely large partitions

    • up to 16 exabytes, or about 16 billion gigabytes, of data in theory

  • Long file name support

    • also maintains eight character dot three character names for backward compatibility
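To put the 16-exabyte figure from the list above into concrete terms, the arithmetic can be written out directly.  Whether the limit is meant in decimal or binary units is left open here, so this sketch simply shows both readings.

```python
# Decimal (SI) units: 1 exabyte = 10**18 bytes, 1 gigabyte = 10**9 bytes.
EXABYTE = 10 ** 18
GIGABYTE = 10 ** 9

limit_decimal_bytes = 16 * EXABYTE
limit_in_gigabytes = limit_decimal_bytes // GIGABYTE  # 16,000,000,000 gigabytes

# Binary units: 16 * 2**60 bytes is the same as 2**64 bytes,
# the largest count a 64-bit number can distinguish.
limit_binary_bytes = 16 * 2 ** 60
```

Either way the figure is on the order of ten billion billion bytes, far beyond the capacity of any disk that was available when NTFS was designed.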

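NTFS organizes directory entries using a B-tree scheme.  A B-tree is a balanced tree whose nodes can have many children, so it should not be confused with a binary tree.  B-tree indexing keeps entries in sorted order, which enables them to be listed in order and searched quickly.  This is different from FAT, where directories are unsorted lists and a file's clusters are chained together like a linked list.  The B-tree structure allows for much faster access to files, particularly in large directories.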

FAT stores information about a file's clusters only in the file allocation table.  NTFS stores such information with each file, in that file's record in the Master File Table (MFT).  NTFS is also less vulnerable to file fragmentation, that is, having parts of a file stored discontiguously, or scattered, on the hard drive.
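The way FAT tracks a file's clusters can be sketched with a toy table.  Everything here is simplified for illustration: the table has only ten entries, the end-of-chain marker is an invented stand-in for the reserved values real FAT variants use, and the function name is hypothetical.

```python
END_OF_CHAIN = -1  # stand-in marker; real FAT variants use reserved values


def cluster_chain(fat: list, first_cluster: int) -> list:
    """Follow the linked list of clusters that starts at first_cluster."""
    chain = []
    seen = set()
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        if cluster in seen:
            raise ValueError("corrupt table: the chain loops back on itself")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]  # each entry names the file's next cluster
    return chain


# Toy table: entry i holds the number of the cluster that follows cluster i.
toy_fat = [END_OF_CHAIN] * 10
toy_fat[2] = 5
toy_fat[5] = 7
toy_fat[7] = 3
toy_fat[3] = END_OF_CHAIN

file_clusters = cluster_chain(toy_fat, 2)  # clusters 2, 5, 7, 3 in that order
```

The file in this example is fragmented: its four clusters are scattered rather than adjacent, and reaching the last one means stepping through every entry before it.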

NTFS 5 in Windows 2000 has some additional capabilities beyond those of earlier versions.  Some of these additional capabilities are summarized in the following list.

  • EFS - Encrypting File System capability for files on a hard disk

  • better support for storage of sparse files

  • support for user quotas, which enable administrators to set upper limits on each user's storage

  • distributed link tracking to update links to files when they are moved

  • volume mount points which enable you to mount a disk within a folder

    • in the past you could only mount/map as many disks as there are letters in the alphabet

  • better remote storage support

UNIX File System.  UNIX file systems are organized as hierarchies of directories and subdirectories in which files can be stored.  In UNIX systems, the root directory is denoted with a /.  Directory paths look like

/directory1/subdirectory1a/subdirectory1ab ...

The UNIX system directories are located immediately off of the root.  The standard system directories are listed in the following table.

 

Directory | Description
/bin | contains executable binary files which are commands or utilities for everyday operation
/dev | contains special files that represent physical devices such as printers
/etc | contains commands and system files used for system administration
/lib | contains libraries used by programs
/tmp | contains temporary files which are typically removed when the system reboots
/usr | traditionally contains users' home directories along with other files such as the online manual; many newer systems place home directories under /home instead

 

The kernel is the core of the operating system that functions directly with the machine.  It loads when you boot the system.  The root directory contains the kernel file and the bootstrap loader which is a small program that begins the loading of the operating system.

The UNIX file system has the following characteristics.

  • spaces are allowed in file names but are awkward to use
    • A space has to be quoted or escaped on the command line, so a typical approach is to name directories and files using something like MyFilesForClass or My_Files_For_Class.
  • periods can be used freely within file names, unlike in MS-DOS where a single period separates the name from its extension
  • filenames may have up to 255 characters on most UNIX file systems and may not contain the / character
  • file names and commands are case sensitive
    • this impacts how URLs are entered for UNIX servers regardless of your browser
  • i-node mapping
    • a number called an i-node number is used to identify each file or directory
    • this number is mapped by the OS to the location of the data on the disk
    • each directory maintains a table that maps the names it contains to the corresponding i-node numbers
  • hard links
    • these allow you to associate more than one file name with a particular file
    • done using the ln command
  • flat files
    • there are no elaborate rules for headers for different types of files
    • files are made up of streams of data with no imposed structure
    • the application is responsible for providing formatting
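The relationship between file names, i-nodes and hard links can be observed from Python on a UNIX-like system.  This is a sketch under a few assumptions: the file names are hypothetical, the temporary directory sits on a file system that supports hard links, and os.link is used in place of the ln command.

```python
import os
import tempfile


def hard_link_demo() -> dict:
    """Give one file two names and show that both share a single i-node."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "report.txt")         # hypothetical names
        second = os.path.join(tmp, "report_alias.txt")

        with open(first, "w", encoding="utf-8") as handle:
            handle.write("one file, two names\n")

        # Comparable to running: ln report.txt report_alias.txt
        os.link(first, second)

        info_first = os.stat(first)
        info_second = os.stat(second)
        same_inode = info_first.st_ino == info_second.st_ino
        links_before = info_first.st_nlink  # names pointing at the i-node

        # Removing one name leaves the data reachable through the other.
        os.remove(first)
        with open(second, encoding="utf-8") as handle:
            contents = handle.read()
        links_after = os.stat(second).st_nlink

    return {
        "same_inode": same_inode,
        "links_before": links_before,
        "links_after": links_after,
        "contents": contents,
    }
```

Both names report the same i-node number, and the data is only released once the last name pointing at that i-node has been removed.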

HPFS.  HPFS - High Performance File System was developed by IBM for OS/2 after abandoning a FAT approach.  Some of the major characteristics of HPFS include the following.

  • long file name support for up to 254 characters
  • faster performance than FAT
  • high reliability and recoverability
  • less file fragmentation than FAT

HPFS allows for extended character sets and preserves case, though it isn't case sensitive.

HPFS divides the disk into bands.  Each band is 16 MB across.  The directory structure and allocation information are located in the middle of each band.  Due to this, no information is more than 8 MB, or half a band, away from the control information on the disk.  This is the basis of HPFS's improved performance relative to FAT.

HPFS supports partitions up to 64 GB and allocates disk space based on sectors rather than the clusters used in FAT.  This helps decrease wasted space and fragmentation.  Because a signature is associated with each sector, information can be recovered from fairly badly corrupted hard disks.

HFS.  The Macintosh makes use of HFS - Hierarchical File System.  Mac disks are partitioned into volumes, and each volume keeps track of the following kinds of items.

  • files
  • directories
  • directory threads
    • contain the name of the directory and its parent directory
  • file threads
    • contain the name of the file and its containing directory

HFS makes use of b-tree indexing algorithms to organize and locate contents in directories.  It also supports file forking, in which files are made up of two forks.

  • resource fork - menu items, dialog boxes, etc.
  • data fork - data stream

HFS makes use of catalogs containing catalog records on each file or directory that function somewhat like UNIX i-nodes.  The file and directory threads described above provide increased capability to recover information if the system crashes.

Third-party utilities exist to help make Mac disks more accessible from other OSs.

NFS.  Sun Microsystems' NFS - Network File System has become a standard for file servers.  It makes use of RPC - Remote Procedure Call protocols to communicate.  In order for NFS to interoperate, an NFS client must be installed on the requesting computer and both client and server must be running TCP/IP.  WebNFS, an extension of NFS, was designed as an alternative to HTTP and FTP for accessing files over the Internet.

Data and Disk Partitions.  Hard disks have become the industry standard for data storage on desktops.  The size of these disks can vary greatly depending on the age and cost.  A hard disk is often made up of a stack of disks, called platters.  Data is stored in concentric circles on each platter called tracks.  The disk drive has read/write heads, typically one for each platter surface.  The heads move in and out across the platters to reach different tracks, and the platters rotate to bring different portions of a track under the heads.  Data access speeds therefore depend on the rotation speed and the seek time.
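The dependence of access speed on rotation speed and seek time can be estimated with a short calculation.  This sketch relies on the usual rule of thumb that, on average, half a revolution passes before the wanted data arrives under the head.  The 7,200 RPM speed and 9 ms seek time are assumed figures for illustration, not measurements of any particular drive.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the data to rotate under the head: half a revolution."""
    seconds_per_revolution = 60.0 / rpm
    return seconds_per_revolution / 2 * 1000


def average_access_time_ms(rpm: float, average_seek_ms: float) -> float:
    """Seek time plus average rotational latency, ignoring transfer time."""
    return average_seek_ms + average_rotational_latency_ms(rpm)


latency_7200 = average_rotational_latency_ms(7200)  # about 4.17 ms
access_7200 = average_access_time_ms(7200, 9.0)     # about 13.17 ms with a 9 ms seek
```

A faster-spinning disk shortens only the rotational part of the wait, so seek time often dominates the total.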

Data stored on a disk is accessed according to its sector, track and cylinder.  A cylinder is all of the tracks in the same position on each disk in the stack.  A sector is a division of a track.  All file systems must map the precise location of data on a disk in order to retrieve it again in the future.
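One common way to turn a cylinder, track and sector position into a single block number is the conventional cylinder-head-sector (CHS) to logical block address (LBA) formula, in which the head number selects the track within a cylinder.  The sketch below is illustrative: the function names are invented, and the geometry of 16 heads and 63 sectors per track is an assumed example rather than a property of any particular disk.

```python
def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Convert a cylinder/head/sector address to a linear block number.

    Cylinders and heads count from 0; sectors count from 1 by convention.
    """
    if not 0 <= head < heads_per_cylinder:
        raise ValueError("head out of range")
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector out of range")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba: int, heads_per_cylinder: int, sectors_per_track: int) -> tuple:
    """Inverse of chs_to_lba: recover (cylinder, head, sector)."""
    cylinder, remainder = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1


HEADS = 16     # assumed geometry
SECTORS = 63   # assumed geometry

first_block = chs_to_lba(0, 0, 1, HEADS, SECTORS)   # 0
example_block = chs_to_lba(2, 3, 4, HEADS, SECTORS)  # 2208
```

Numbering blocks this way fills a whole cylinder before moving the heads, which keeps consecutive blocks reachable without a seek.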

Many disks are very large and it can be worthwhile to partition them.  On Windows, each partition appears as a separate drive in the file system with its own letter assigned, as if it were a different physical disk.  This can be useful for keeping things like application programs, data and documents separate from one another.  You can also provide some internal backup on different partitions.

It may also be possible to make use of different file systems, or even different operating systems, on different partitions.

Microsoft, with Windows 2000, allows for the creation of volumes, which are like partitions except they can be dynamically resized.

Because getting data off of a hard disk involves mechanical movement, namely moving the heads and waiting for the disk to rotate, access times are much slower than those for RAM.