Some Discussion of HPM
pp. 211-225
Normalization

 

Some Definitions.  Normalization is a formal process for deciding which attributes should be grouped together within a table.

The reality is, deciding which attributes should go in which tables is not always easy.

Another definition HPM gives for normalization is the process of successively reducing tables with anomalies to produce smaller, well-structured tables.  I suspect I understand this definition because I already have a lot of experience with normalizing tables.

The definition that I am most used to working with is given on page 220 of HPM, and in my mind it is the most practical way to think about what normalizing tables means: your tables are normalized if the dependent attributes associated with each primary key in each table do not overlap.  Essentially this says (a short sketch right after these bullets makes the idea concrete),

  • you want each record to be uniquely specified by a single primary key in every table
  • you want attributes/fields to appear in only one table
    • this doesn't include foreign keys, which need to appear in other tables to implement relationships
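
To make the definition concrete, here is a minimal sketch using Python's built-in sqlite3 module and two hypothetical tables of my own (this is not an example taken from HPM).  Each table has a single primary key, every non-key attribute depends only on that key, and the only attribute the two tables share is the foreign key.

    import sqlite3

    conn = sqlite3.connect(":memory:")          # throwaway in-memory database
    conn.execute("PRAGMA foreign_keys = ON")    # SQLite leaves foreign key enforcement off by default

    # CUSTOMER: every non-key attribute depends only on customer_id.
    conn.execute("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT,
            city        TEXT
        )""")

    # ORDERS: every non-key attribute depends only on order_id.  The
    # foreign key customer_id is the one attribute deliberately repeated.
    conn.execute("""
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            order_date  TEXT,
            customer_id INTEGER REFERENCES customer(customer_id)
        )""")

The customer's name and city live in exactly one row of one table; an order carries only the customer_id needed to look them up.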

My experience is that you want to be constantly thinking about which attributes go in which tables while trying to ensure your tables are normalized.  The following bulleted list covers what this means and some of the major reasons we want to do it.

  • minimize data redundancy - if information is in one table we don't want to repeat it in another; this helps avoid anomalies and inconsistencies between sources, and it conserves storage space (the short sketch after this list shows the kind of update anomaly redundancy invites)
    • if you repeat data in more than one location you might fail to update some of the copies appropriately because you are unaware of the alternate locations or forget about them
  • simplify the enforcement of referential integrity constraints
    • make sure relationships lead someplace, and make it easier to find the ones that don't
  • make it easier to maintain data
    • insert
    • update
    • delete
  • provide a better overall design, so that you get a more faithful representation of how things really are and a sounder basis for future changes
  • provide table structures that are more adaptable to change and to things like one-to-many relationships
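
As a small illustration of the redundancy point, the following sketch (again Python's sqlite3, with made-up table names and data) repeats a customer's city on every order row and then updates only one of the copies, producing exactly the kind of inconsistency described above.

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # Un-normalized: the customer's city is repeated on every order row.
    conn.execute("CREATE TABLE order_flat (order_id INTEGER PRIMARY KEY, "
                 "customer_name TEXT, customer_city TEXT)")
    conn.executemany("INSERT INTO order_flat VALUES (?, ?, ?)",
                     [(1, "Acme", "Tulsa"), (2, "Acme", "Tulsa"), (3, "Acme", "Tulsa")])

    # Update one copy and forget the others: the table now holds two
    # conflicting "facts" about the same customer.
    conn.execute("UPDATE order_flat SET customer_city = 'Austin' WHERE order_id = 1")

    print(conn.execute("SELECT DISTINCT customer_city FROM order_flat "
                       "WHERE customer_name = 'Acme'").fetchall())
    # prints two different cities for the single customer Acme

In a normalized design the city would be stored once in a customer table, so there would be only one place to update and nothing to get out of sync.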

HPM takes a generally theoretical approach to normalization, and that approach has its validity.  But my experience is that you are better off understanding what normalization means and what its implications are, and then working intelligently to create normalized tables, rather than using the less insightful algorithmic approach developed in HPM.  Still, I will survey HPM so that you can see how this can all be done entirely systematically.  As often happens with such systems, they work to produce the desired effect, but they don't really improve your insight into the overall design.  I much prefer developing my normalized tables based on insight rather than on abstract systematic procedures.

Steps in Normalization.  Following this sequence of steps essentially guarantees that the tables you construct are normalized with respect to each other.

  1. First Normal Form.  Any multi-valued attributes have been removed so that there is a single value at the intersection of each row and each column.
  2. Second Normal Form.  Any partial functional dependencies have been removed.
    1. that is, every non-key attribute is identified by the whole primary key, not just part of it (the sketch after this list shows the decomposition for this step and the next)
  3. Third Normal Form.  Any transitive dependencies have been removed.
    1. that is, non-key attributes are identified by the primary key alone, not by other non-key attributes
  4. Boyce-Codd Normal Form.  Any remaining anomalies that result from functional dependencies have been removed.
    1. for example, anomalies that arise when there is more than one candidate key for the same non-key attributes
  5. Fourth Normal Form.  Any multi-valued dependencies have been removed.
  6. Fifth Normal Form.  Any remaining anomalies have been removed.
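
Here is a sketch of what steps 2 and 3 look like in practice.  The tables are hypothetical order and product tables of my own, not an example taken from HPM; the comments mark the partial dependency removed to reach second normal form and the transitive dependency removed to reach third normal form.  (First normal form is already satisfied because every column holds a single value.)

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Violates 2NF: the key is (order_id, product_id), but product_desc
        -- depends on product_id alone (a partial dependency).
        CREATE TABLE order_line_bad (
            order_id     INTEGER,
            product_id   INTEGER,
            quantity     INTEGER,   -- depends on the whole key: fine
            product_desc TEXT,      -- partial dependency
            PRIMARY KEY (order_id, product_id)
        );

        -- 2NF fix: move the partially dependent attribute to its own table.
        CREATE TABLE product    (product_id INTEGER PRIMARY KEY, product_desc TEXT);
        CREATE TABLE order_line (order_id INTEGER, product_id INTEGER, quantity INTEGER,
                                 PRIMARY KEY (order_id, product_id));

        -- Violates 3NF: customer_city depends on customer_id, which in turn
        -- depends on order_id (a transitive dependency).
        CREATE TABLE order_header_bad (
            order_id      INTEGER PRIMARY KEY,
            customer_id   INTEGER,
            customer_city TEXT
        );

        -- 3NF fix: move the transitively dependent attribute to its own table.
        CREATE TABLE customer     (customer_id INTEGER PRIMARY KEY, customer_city TEXT);
        CREATE TABLE order_header (order_id INTEGER PRIMARY KEY,
                                   customer_id INTEGER REFERENCES customer(customer_id));
    """)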

HPM really only focuses on first through third normal form; these are the forms most typically implemented in practice.