Normalization in DBMS

By | February 9, 2020

Normalization is a method or technique which is applied to databases, which divides the data into multiple tables so the Data Redundancy of that table could be minimized. Normalization generally applies to the relational database such as SQL, because here data stored in tabular form. The main objective of Normalization is to divide a big table of data into small tables so the repeated data could be reduced up to some extent.

Normalization in DBMS

Before moving further in normalization let’s discuss what is data redundancy and why should we reduce it?

What is data redundancy?

Data Redundancy means repeated data at multiple places in the same table.

For example, let’s take a table from the database:

S.No Name Department College
1 Amber CSE X
2 Bezos CSE X
3 Carry CSE X
4 Deny CSE X
5 Emily CSE X
6 Fizz CSE X

Here the department and college for all the students are the same, and this repeated data is occupying unnecessary space in the memory. So, if we insert any new student data in this table the Department and college for them will also remain the same and this can cause many problems such as extra space, insertion anomaly, deletion anomaly, and update anomaly.

How Normalization can solve this problem?

The normalization will divide this table into such a way so we do not miss any data and we also get an optimal database form this.

S.No Name Department College
1 Amber CSE X
2 Bezos CSE X
3 Carry CSE X
4 Deny CSE X
5 Emily CSE X
6 Fizz CSE X

Normalization

1. <Student_detail>
S.No Name Department
1 Amber CSE
2 Bezos CSE
3 Carry CSE
4 Deny CSE
5 Emily CSE
6 Fizz CSE
2. <Department_Detail>
Department College
CSE X

Here the normalization divides the table into two separate tables and from now we do not have to insert the redundant college data. The Question is even after the normalization we are repeating the department data, this is because there should be a link between the two divided tables which act as a bridge so the data could be fetched. That’s how normalization reduces the data redundancy, here we have used a small example that’s why we can only see the normalization can only reduce the college data but if we apply the normalization a big set of databases where it can reduce the data redundancy at high-level.

Types of Normalization

Normalization can be achieved in many ways and there are 4 types of Normalization that help to perform the normalization process.

  • First Normal Form
  • Second Normal Form
  • Third Normal Form
  • Boyce-Codd Normal Form (BCNF) or 3.5 Normal Form
  • Fourth Normal Form

1. First Normal Form

The first normal form is the first step for the normalization process, and it expects from you to create such a database table so it could be normalized easily. If your table does not follow the condition of 1st Normal Form then it would be considered as a bad database table designed and no other Normal Forms could be applied to it.

Conditions of First Normal Form

  • Each cell of the table must have a single value, which means each cell must have an atomic value.
  • Each column should contain values that are of the same data type.
  • Each Column heading must have a unique name.

Your database table must follow all these conditions to fall under the First Normal form, then only the table would be able to normalize.

Example:

Not in 1st Normal Form In 1st Normal Form
Roll No Name Language
1 Amber Java, Python
2 Bezos Java
3 Carry Python, C++

Here, there is more than one value in Single-cell.

Roll No Name Language
1 Amber Java
1 Amber Python
2 Bezos Java
3 Carry Python
3 Carry C++

Here Every Cell Has an atomic value.

2. Second Normal Form

A table to be in the second Normal Form should satisfy the following conditions.

  • The table should follow the condition of First Normal Form
  • And, there should be no partial dependencies

What is partial Dependency?

A primary key could be generated of a single column or composite of two or more than two columns. If the primary key of the table is formed of the composition of two columns, and a non-key attribute of the table is dependent on the value of only one column of the primary key then it is known as Partial Dependency.

Let’s see it with an example:

Employee_id Employee_name Employee_project_no Project_location
E01 Sam 1244 Outside
E02 Will 1254 Inside
E03 Jack 6223 Inside
E04 Hijo 1244 Outside

Primary key = Employee_id + Employee_project_no

Here we can find the Employee_name using Employe_id, and Employee_project_no has no significant to find the Employee_name, which means the Employee_name is only dependent on Employee_id.

Similarly, the Project_location is only dependent on the Employee_project_no.

So, to be in the second normal form the partial dependency should be eliminated.

To eliminate the partial dependency, we should divide the table into two parts:

1. <Employee Detail>

Employee_id Employee_name Employee_project_no
E01 Sam 1244
E02 Will 1254
E03 Jack 6223
E04 Hijo 1244

Now Primary key= Employee_id

2. <Project Detail>

Employee_project_no Project_location
1244 Outside
1254 Inside
6223 Inside

Now Primary key= Employee_project_no

3. Third Normal Form:

A table is considered in third Normal Form if it follows the following Conditions

  • The table should be in 2nd Normal Form
  • It should not have Transitive Dependency

What is Transitive Dependency?

If any non-prime attribute (not a part of the primary key) of the table depends on another non-prime attribute rather than prime-attribute then it will be termed as Transitive dependency.

Let’s understand it with an example:

Student_Rollno Student_name Marks Subjects Total_Marks
1 Jack 98 Computer 196
1 Jack 99 Math 198
2 Sam 78 Math 500

Here primary key = Student_ID + Student_name

Here Marks represent the marks of One semester and Total_Marks represent the total overall Marks scored by the student throughout the complete year.

Here the Total_marks which is a non-prime attribute depend on the Subjects (non-prime attribute) rather than Student_ID or Student_name.

This is a case Transitive Dependency, so to remove this dependency we dive the table into two parts

1. <Student Detail>

Student_Rollno Student_name Marks Subjects
1 Jack 98 Computer
1 Jack 99 Math
2 Sam 78 Math

Prime-attributes= Student_ID + Student_name

2. <Exam Table>

Subjects Total_Marks
Computer 196
Math 198
Math 500

Prime-attributes= Subjects

4. Boyce-Codd Normal Form (BCNF)

It is an extension of third normal forms that’s why it also is known as 3.5 Normal form. So, to a table to be in BCNF it should satisfy two conditions.

  • The table should be in 3rd Normal form
  • And, for any dependency, A —> B, and A should be a super key.

The main objective of a prime-attributes is, we can access the non-prime attributes, but what if using a non-prime attribute, we can access a prime attribute, if this happens it would violet the second condition of the BCNF.

5. Fourth Normal Form

A table to be in 4th Normal form it should follow two conditions:

  • The table should follow all the condition of BCNF
  • And, it should not have Multiple-Valued Dependency.

What is Multivalued Dependency?

If these three conditions arise in a table then it would be considered as a multi-valued dependency:

  • A —-> B, for a single value of A, if we have more than one value of B, where A is a prime-key attribute and B is a non-prime-key attribute.
  • The table should have at least three columns.
  • If the table has 3 Columns such as A, B, and C, where A—-> B so B and C should be independent.

If all these three conditions are true in a table then there is a Multi-valued dependency in that table and that table does not satisfy the 4th normal form.

Conclusion

So this is all about normalization in DBMS, which removes the data redundancy. These Normalization forms are used to design the structure for the database so we cloud have an Optimal database with minimum data redundancy. Though creating a database that follows all these forms is a tough and challenging task for a Database manager.

You may also interested in:

Leave a Reply

Your email address will not be published. Required fields are marked *