Some Discussion of HPM
PP93 - 106
The Entity - Relationship Model

 

The E - R Model.  HPM mentions that there is no particular unified notation or standards for the E- R model.  They present a somewhat hybrid approach.  I think they use Visio's images for E - R Diagrams. 

You can ask me to create you an account on the MSDNAA system and use it to get the key for Visio.  But we need to figure out some way to get you a copy of the installation software, which I have on the computer in my office, a CD and my laptop.  We will not get real deep into E- R Diagrams, but there will be some relatively simple ones for homework.

One of the most important aspects of developing databases is determining what are the entities.  In many instances, this is relatively straight forward.  But I have certainly worked on projects where it was not at all obvious what one should use for entities.  But more will be said about this later, in the course and much more in the next semester after you have even more experience.

HPM has a pretty nice example based on Pine Valley Furniture that has

  • supplier
  • shipment
  • item
  • product
  • order
  • customer

as he entities.  They also somewhat touch on the cardinalities of relationships between the entities, which I will leave for next week when we focus on relationships.

At present think of what we used for the entities for Madam Curry.

  • purchasers
  • email_list

Both of these entities relate to different types of customers.  If we were doing an entity diagram we would also need to include

  • something for a product for the purchasers, Madam Curry's book
    • this would become even more important if she was selling more than one item
  • something to be sent to customers on the email_list, likely called email

The emails would likely be distinguished by their date.

But Madam Curry has no intention of keeping track of what is in particular emails in her database, nor does she intend on keeping track of what emails she sends to each customer on the email_list.  So you can get some sense that doing an E - R diagram is a bit of overkill in this instance.  Notice that Madam Curry is also choosing to not keep track of orders at this point in time, though she probably should.  She is likely to have some purchasers buy more than one book and she will likely want to know who these preferred customers are.  For example, maybe these purchasers are giving her book as gifts to others.  We will update this Madam Curry example to include orders during next week and after.

So we will start getting more involved in E - R diagrams next week after we dig more into relationships.  At this point it will be more worthwhile to diagram our examples.

At present we will focus on identifying or modeling entities and attributes as it starts on page 96.

Modeling Entities.  Again, in relational databases, entities pretty much correspond with tables.  But it is still important to have a somewhat more abstract understanding of what an entity is.

Some typical entities are listed in the following table.

 

Person: EMPLOYEE, STUDENT, PATIENT, DOCTOR, CUSTOMER, VENDOR, SALES_REP,
Place: STORE, WAREHOUSE, STATE, WEBSITE, PLACE_OF_SERVICE, RESTAURANT, GARAGE
Object: MACHINE, BUILDING, AUTOMOBILE, COMPUTER,
Event: SALE, REGISTRATION, RENEWAL, RESERVATION, BOOKING, PRESENTATION, MEETING, PURCHASE
Concept: ACCOUNT, COURSE, WORK_CENTER,

 

Entities will usually have quite a variety of instances.  Think of all the customers a retail outlet like J. C. Penney sees each year.  Think of all the different customers an airline deals with.  Obviously, all the different students a school has is likely to be smaller than the number of customer instances these first two examples have.  But it is still quite large.

But sometimes the number of entity instances (records) can be quite limited.  For example, maybe a particular author has written only three books and the BOOK entity relates to only these three possible instances.  Or maybe a production facility only has four different warehouses, one in the northeastern US another in the southeast, another in the northwest and a final one in the southwestern US.

HPM contains a quite involved discussion of some of the more abstract notions associated with entities.  But I think most of these sorts of things are much better learned by working through a variety of examples over time.

Naming and Defining Entities.  These are some standard practices for naming and defining entities.  I do not think they are as cast in stone as HPM seems to make them.  But, these do seem like intelligent directives.

  • use singular nouns
  • entity types should be specific enough to the organization
    • CUSTOMER might be commonly used
    • VENDOR might be commonly used
    • you might need to be more specific than ORDER by using something like PURCHASE_ORDER or MONTHLY_ORDER
  • the name should be concise, yet appropriate and suggestive of the conents
  • approved abbreviations should be created
  • entity types are much more likely to be named as the RESULT of an event rather than for the process
  • it is important to get the names to match when they are used in different places

The following bulleted list summarizes what is contained in HPM as guidelines for defining entity types.

  • definitions should be clear and up front about what they are defining using phrasing such as, "An X is ..."
  • the unique characteristics of each entity type should be included
  • statements about who or what is included in the entity type need to be present
    • a customer is a person that has placed an order
    • a customer is someone who has browsed the website
    • a customer must be another organization with a charitable tax ID and cannot be an individual
  • make sure to describe when instances are created and/or deleted
    • maybe they are never deleted
    • maybe they are deleted if they have not made a purchase in a year
    • maybe they are created when the customer places their first order
    • maybe they are created when they create a profile even though they have not yet placed an order
  • maybe an instance of an entity has he capabilities to change into an instance of another entity
    • a bid may become a contract
    • a buyer may become someone obtaining service
  • some times the definition must specify what sort of history should be retained
    • service history
    • when a shopper becomes a buyer

Modeling Attributes.  In relational databases, attributes correspond to fields in a table.  But it is important to be able to think somewhat more abstractly.

The following table gives examples of some typical entities with some fairly typical attributes.

 

EMPLOYEE: Last_Name, First_Name, Office_Address, Office_City, Office_State, Office_Zip, Office_Phone, Office_Fax, Office_Email, Home_Address, Home_City, Home_State, Home_Zip, Home_Phone, Employment_Start, Department, Social_Security
STUDENT: Student_Number, Last_Name, First_Name, School_Address, School_City, School_State, School_Zip, School_Phone, Longterm_Address, Longterm_City, Longterm_State, Longterm_Zip, Longterm_Phone, Major, Starting_Date
CUSTOMER: Last_Name, First_Name, Address, City, State, Zip, Home_Phone, Cell_Phone, Fax, Email,
STORE: Address, City, State, Zip, Phone, Fax, Building_Owned, Square_Footage,
WAREHOUSE: Address, City, State, Zip, Phone, Fax, Building_Owned, Square_Footage,
AUTOMOBILE: Vehicle_ID, Manufacturer, Make, Year, Color, Weight, Type, Warranty, Warranty_Limit, Insurance_Company
COURSE: Department, Number, Description, Prerequisite(s)
FLIGHT: Flight_Number, Start_Date, Start_Time, Start_Airport, End_Date, End_Time, End_Airport,

 

Obviously, these attributes can change according to situations.  You also should think about which sorts of attributes really need to be listed in their own separate table.  For example,
  • a STUDENT may have stopped and started taking classes a number of times over the years
  • a COURSE can have several or no prerequisites
  • a FLIGHT might have stopovers on the way

So all of the attributes may not be easy to list directly within the entity tables.

Another important aspect of attributes is whether they are required or optional.  There are some attributes that must be present in order to

  • EMPLOYEE name or contact
  • CUSTOMER contact
  • STUDENT number, contact information

But some things might really be optional,

  • a STUDENT major or financial_aid_amount
  • a CUSTOMER fax

It is my experience that you need to assess and determine what attributes are required and which are optional.  But you need to be careful about specifying they are required using the internal database.  When working with most database software, if you have specified a field or attribute as required at the data dictionary and a user tries to move away from an entry that doesn't have such an attribute filled in, then you can get a wide variety of responses from the system.  Much of the time this causes a lot of uncontrolled problems.  Also, you usually don't get particularly helpful feedback about what is wrong.  I strongly prefer using the software or middleware to assess whether certain entries are filled in.  You can also be using this programming to assess the believability and validity of the entries rather than just focusing on whether something is present.  You are also much likelier to get a controlled situation rather than some sort of crash or complication.

Some attributes are called simple or atomic.  Simple attributes cannot be broken down into smaller components.  Other attributes are considered to be composite.  Some examples of some composite attributes are

  • customer addresses - which include street address, city, state and so on.
  • names - which include first name, last name, maybe some hyphenated names

I almost always prefer making use of simple attributes so that larger composite attributes can be composed from what is known in the database.  Trying to parse a composite attribute such as a name down into its components can be quite difficult.

Just think of even a simple situation where someone has entered names into a database in the following format.

  • Shawn J. McNamara
  • Tom Gillespie
  • Trudy Spahn
  • John J. Pandiro-Smith III

Now think about trying to do something as simple as alphabetizing the records.

The book talks about single versus multi-valued attributes.  Obviously, there are plenty of situations where one wants to accumulate attributes for each record.  Maybe you need to assess the experience and abilities of your IT staff.  So you want to know what DBMS they have worked with and the different development languages they are comfortable with.

But, my experience is that in these sorts of situations it is pretty much always important to develop such things by building from simpler single attributes.  Maybe you need a form that has check boxes for people to check which things they are experienced with.  Then these are kept in separate attributes/fields.

When dealing with products in an e-commerce store the developer encounters similar sorts of things.  Some products have no attributes that affect the customer's selection.  Other products involve customers selecting things like sizes, colors, prints, flavors and on and on.  Keeping track of these attributes is very important and they need to be appropriately displayed for the customer.  But how this is dealt with in tables is beyond the scope of this course.

The book talks about stored and derived attributes.  For example, you will surely have the selling prices of everything you are selling online in your database.  These will be stored.  But putting together the total price of a shopping basket based on items and quantities will be a derived attribute.

Another example might be the age of your car.  You are highly unlikely to put in a field in your table to record the age of a car.  It changes over time.  But you will likely put in the Date_of_Purchase as a stored attribute.  You can then derive the age of the car to most any level of detail you want, though most people will think in terms of years.

Identifier attributes are some of the most important attributes for an entity. 

Other things such as name, or phone number may well not work very well to uniquely identify people.  Obviously, many people share phone numbers within the same house.  Totally unrelated people can have the same name.

People usually think of their social security number, drivers license number, student number or something similar as something that can identify the them.  These sorts of numbers often work in databases to identify customers, if you can legally get them.  But plenty of customers don't have these sorts of numbers.  Maybe they are too young, or maybe they just moved into the state, or maybe they aren't a student.

When companies have large varieties of products, they always have their own unique system for uniquely identifying the AEF-1000 camera or the SFP-3315 sunglass lenses.  It can be difficult to communicate with some corporate reps when they use this sort of jargon which is only familiar to them.

Most situations require some sort of identifying number associated with a record.  For example, you often get assigned a case number, ticket number or order number when making a transaction online.  They can usually find your transaction based on your name, phone number and other issues.  But it is much easier for everyone if you can tell them a more specific reference number.

Naming and Defining Attributes.  HPM goes through a pretty involved list describing good practices for naming and defining attributes.

  • attribute names should be singular nouns or noun phrases
    • minimum_payment
    • date_payment_due
  • names should be appropriately unique
    • some times you will want to suggest associations with other attributes
    • you always want to guarantee no confusion
  • follow standard formats
  • similar attributes should reflect this in appropriate portions of their names

For defining attributes, the following summarizes HPM guidelines

  • definition will often parallel the name
  • make it clear what is included and what is not included
  • aliases must be made explicit
  • it can often be important to state what the source of the values for the attribute is within the definition
    • SIC codes
    • state abbreviations
  • needs to specify whether the value is required or optional
  • it can be important to indicate whether the value of the attribute may change once something has been entered
  • for multi-valued attributes, the definition needs to clarify the minimum and maximum number of values
  • can indicate relationships between attributes
  •