What is database normalization?

Database normalization is one of the most important aspects in DBMS. It is a technique of organizing data in the database. Normalization is a systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics like Insertion, Update and Deletion Anomalies. Being a multi-step process it puts data into tabular form by removing duplicated data from the relation tables.

Normalization itself means to bring or return something to actual or normal form. Hence database normalization means to bring data into the form it should be by eliminating extra redundant link and data.

Database normalization is mainly used for 2 reasons:

  • Elimination of data which is completely useless.
  • Logical restoration of data.

Problems without database normalization:

It would be really difficult to handle and update the database, without facing data loss. Insertion, Updation and Deletion Anomalies are very frequent if the database is not Normalized.

Let us take an example with table student

S_idS_NameS_AddressSubject_opted
401AdamNoidaBio
402AlexPanipatMaths
403StuartJammuMaths
404AdamNoidaPhysics
  • Updation Anamoly: To update the address of a student who occurs twice or more than twice in a table, we will have to update S_Address column in all the rows, else data will become inconsistent.
  • Insertion Anomaly: Suppose for a new admission, we have a Student id(S_id), name and address of a student but if the student has not opted for any subjects yet then we have to insert NULL there, leading to Insertion Anomaly.
  • Deletion Anomaly: If (S_id) 401 has only one subject and temporarily he drops it when we delete that row, entire student record will be deleted along with it.

Normalization rules

Normalization rules are divided into 4 normal forms:

  1. First normal form (1NF)
  2. Second normal form (2NF)
  3. Third normal form (3NF)
  4. Boyce-Codd normal form (BCNF)

First normal form

According to 1NF, no two Rows of data must contain the repeating group of information i.e each set of the column must have a unique value, such that multiple columns cannot be used to fetch the same row.  Each table should be organized into rows and each row should have a unique key or a primary key.

Consider this table:

StudentAgeSubject
Adam15Biology, Maths
Alex14Maths
Stuart17Maths

This table is not in 1NF as in First Normal Form, any row must not have a column in which more than one value is saved like separated with commas. 

Rather than that, we must separate such data into multiple rows. The table will look like this in 1NF:

StudentAgeSubject
Adam15Biology
Adam15Maths
Alex14Maths
Stuart17Maths

Second normal form

According to Second Normal Form, there must not be any partial dependency of any column on the primary key.  A table that has concatenated primary key, each column in the table that is not part of the primary key must depend upon the entire concatenated key for its existence.

If this not happens the table is not in 2NF.

StudentAgeSubject
Adam15Biology
Adam15Maths
Alex14Maths
Stuart17Maths

To achieve second normal form in the above table, it would be helpful to split out the subjects into an independent table and match them up using the student names as foreign keys.

Student table in 2NF

StudentAge
Adam15
Alex14
Stuart17

Here candidate key is Student column as Age is dependent on it.

Subject table in 2NF

StudentSubject
AdamBiology
AdamMaths
AlexMaths
StuartMaths

In Subject Table the candidate key will be {Student, Subject} column.

Third normal form

Third Normal form applies that every non-prime attribute of the table must be dependent on the primary key, or we can say that there should not be the case that a non-prime attribute is determined by another non-prime attribute. The table should be in 2NF and without transitive functional dependency.

Consider the table:

Student_Detail Table:

Student_idStudent_nameDOBStreetcityStateZip

Student_id is Primary key, but street, city, and state depend upon Zip. The dependency between zip and other fields is called transitive dependency. Hence to apply 3NF, we need to move the street, city, and state to the new table, with Zip as the primary key.

New Student_Detail Table:

Student_idStudent_nameDOBZip

Address Table:

ZipStreetcitystate

The advantage of removing the transitive dependencies are:

  • Data duplication is removed.
  • Data integrity is achieved.

Boyce-Codd normal form

Boyce and Codd’s Normal Form is a higher or strict version of the Third Normal form. This normal form deals with the certain type of anomaly that can not be handled by 3NF. A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF.

For a table to be in BCNF following conditions should be satisfied:

  • R must be in 3rd Normal Form
  • and, for each functional dependency ( X -> Y ), X should be a Super Key.

Consider the example:

Suppose there is the following relationship: R(A, B,C,D)

The dependencies are:

A -> BCD

BC -> AD

D -> B

The relationship above is already in 3NF as the keys are A & BC

Therefore in A -> BCD, A is the super key. In BC -> AD, BC is a key but in D -> B, D is not a key.

Therefore we break R into two parts: R1 & R2, where R1(A,D,C) and R2(D,B)

So, this was Database Normalization. Do share the knowledge.

  • Add Your Comment