Normalization is the process of designing a data model to efficiently store data in a database. The end result is that redundant data is eliminated, and only data related to the attribute is stored within the table.
For example, let's say we store City, State and ZipCode data for Customers in the same table as Other Customer data. With this approach, we keep repeating the City, State and ZipCode data for all Customers in the same area. Instead of storing the same data again and again, we could normalize the data and create a related table called City. The "City" table could then store City, State and ZipCode along with IDs that relate back to the Customer table, and we can eliminate those three columns from the Customer table and add the new ID column.
Normalization rules have been broken down into several forms. People often refer to the third normal form (3NF) when talking about database design. This is what most database designers try to achieve: In the conceptual stages, data is segmented and normalized as much as possible, but for practical purposes those segments are changed during the evolution of the data model. Various normal forms may be introduced for different parts of the data model to handle the unique situations you may face.
Whether you have heard about normalization or not, your database most likely follows some of the rules, unless all of your data is stored in one giant table. We will take a look at the first three normal forms and the rules for determining the different forms here.
Rules for First
Eliminate repeating groups. This table contains repeating groups of data in the Software column.
Computer | Software |
1 | Word |
2 | Access, Word, Excel |
3 | Word, Excel |
To follow the First Normal Form, we store one type of software for each record.
Computer | Software |
1 | Word |
2 | Access |
2 | Word |
3 | Excel |
3 | Word |
3 | Excel |
Rules for second
Eliminate redundant data plus 1NF. This table contains the name of the software which is redundant data.
Computer | Software |
1 | Word |
2 | Access |
2 | Word |
3 | Excel |
3 | Word |
3 | Excel |
To eliminate the redundant storage of data, we create two tables. The first table stores a reference SoftwareID to our new table that has a unique list of software titles.
Computer | SoftwareID |
1 | 1 |
2 | 2 |
2 | 1 |
3 | 3 |
3 | 1 |
3 | 3 |
SoftwareID | Software |
1 | Word |
2 | Access |
3 | Excel |
Rules for Third
Eliminate columns not dependent on key plus 1NF and 2NF. In this table, we have data that contains both data about the computer and the user.
Computer | User Name | User Hire Date | Purchased |
1 | Joe | 4/1/2000 | 5/1/2003 |
2 | Mike | 9/5/2003 | 6/15/2004 |
To eliminate columns not dependent on the key, we would create the following tables. Now the data stored in the computer table is only related to the computer, and the data stored in the user table is only related to the user.
Computer | Purchased |
1 | 5/1/2003 |
2 | 6/15/2004 |
User | User Name | User Hire Date |
1 | Joe | 5/1/2003 |
2 | Mike | 6/15/2004 |
Computer | User |
1 | 1 |
2 | 1 |
0 comments:
Post a Comment