MIS 515: Data Management

Spring Semester 2004
Prof O.Petkova

March 09,2004

Normalization (First, Second and Third Normal Forms)

Normalization is a method for organizing data elements in a database into tables.

 

Normalization Avoids

  • Duplication of Data  – The same data is listed in multiple lines of the database
  • Insert Anomaly  – A record about an entity cannot be inserted into the table without first inserting information about another entity – Cannot enter a customer without a sales order
  • Delete Anomaly – A record cannot be deleted without deleting a record about a related entity.  Cannot delete a sales order without deleting all of the customer’s information.
  • Update Anomaly – Cannot update information without changing information in many places.  To update customer information, it must be updated for each sales order the customer has placed

 

Normalization is a three stage process – After the first stage, the data is said to be in first normal form, after the second, it is in second normal form, after the third, it is in third normal form

Before Normalization

1.        Begin with a list of all of the fields that must appear in the database.  Think of this as one big table.

2.        Do not include computed fields

3.        One place to begin getting this information is from a printed document used by the system. 

4.        Additional attributes besides those for the entities described on the document can be added to the database.

Before Normalization – Example

See Sales Order from below:

 

Fields in the original data table will be as follows:

ORDERS

SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo, ClerkName, ItemNo, Description, Qty, UnitPrice

Think of this as the baseline – one large table

Some definitions:

  • Functional Dependency  The value of one attribute in a table is determined entirely by the value of the primary key
  • Partial Dependency A type of functional dependency where an attribute is functionally dependent on only part of the primary key (primary key must be a composite key).
  • Transitive Dependency  A type of functional dependency where an attribute is functionally dependent on an attribute other than the primary key.  Thus its value is only indirectly determined by the primary key.

More definitions:

First Normal Form:

A table is in a 1NF if the primary key is determined and there are no multi-valued attributes.

Second Normal Form:

A table is in 2NF if it is in 1NF and the partial dependencies are eliminated.

 

Third Normal Form:

A table is in 3NF if it in 2NF and the transitive dependencies are eliminated.

Normalization:  First Normal Form

  • Eliminate fields with multiple values
  • Determine the primary key.
  • The primary key of the new table (repeating group) is always a composite key; Usually document number and a field uniquely describing the repeating line, like an item number.
  • Draw the functional dependencies

Normalization:  Second Normal Form

  • Remove Partial Dependencies. 
  • Create separate table with the functionally dependent data and the part of the key on which it depends.  Tables created at this step will usually contain descriptions of resources.

Normalization:  Third Normal Form

  • Remove transitive dependencies.
  • Create a separate table containing the attribute and the fields that are functionally dependent on it. Tables created at this step will usually contain descriptions of either resources or agents.  Keep a copy of the key attribute in the original file.

    Normalization of the above table:

1NF: (Primary key is determined and multiple values are eliminated); obviously the combination of SalesOrderNo and ItemNo is the primary key.

 

ORDERS

SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo, ClerkName, ItemNo, Description, Qty, UnitPrice

 

From 1NF to 2NF:

From the table ORDERS we have to identify the partial dependencies:

 

There is a partial dependency (attribute depending on another attribute, that is a part of the primary key) between SalesOrderNo and Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo, ClerkName; so we can eliminate it in a separate table: SALES

SALES

SalesOrderNo, Date, CustomerNo, CustomerName, CustomerAdd, ClerkNo, ClerkName (2NF)

 

There is another partial dependency between IntemNo and Desription, UnitPrice; we can eliminate it into a separate table: ITEMS

ITEMS

ItemNo, Description, UnitPrice (2NF)

 

What is left from the original ORDERS table is:

ORDERS

SalesOrderNo, ItemNo, QTY (2NF)

 

We can claim that the three tables SALES, ITEMS and ORDERS are in 2NF, because they are in 1NF and they don’t contain partial dependencies.

 

From 2NF to 3NF:

We explore the three 2NF tables above for transitive dependencies (attribute depending on another attribute that is not a part of the primary key). Obviously, tables ITEMS and ORDERS have no problems, but in table SALES we can identify two transitive dependencies:

CustomerName and Customer Address (CustomerAdd) depend on CusomerNo

and

ClerkName depends on ClerkNo.

 

We shall eliminate these dependences by removing them from SALES and creating two additional tables:

CUSTOMERS

CustomerNo, CustomerName, CustomerAdd  (3NF)

and

CLERKS

ClerkNo, ClerkName  (3NF)

 

The table SALES will be transformed as follows:

 

SALES

SalesOrderNo, Date, CustomerNo, ClerkNo (3NF)

Please pay attention that we didn’t remove all the information about customers and clerks from the SALES table, but kept the two tables primary keys as foreign keys in SALES.

 

ITEMS and ORDERS are also in 3NF, because they don’t have transitive dependencies.

 

To conclude: We transformed SALES into five 3NF tables:

SALES

CLERKS

CUSTOMERS

ITEMS

ORDERS.