Normalization in DBMS: 1NF, 2NF, 3NF and BCNF in Database

Normalization is a method or technique applied to databases to divide the data into multiple tables and minimize data redundancy. In general, normalization is applicable to relational databases, such as SQL, because such databases store data in a tabular form. The main objective of normalization is to divide a big table of data into small tables so the repeated data could be reduced to some extent.

Normalization in DBMS

Before moving further in normalization, let’s discuss what is data redundancy and why we need to minimize it.

What is data redundancy?

Data redundancy refers to the repetition of data at multiple places in the same table. For example, let’s take a table from a database:

S.No	Name	Department	College
1	Amber	CSE	X
2	Bezos	CSE	X
3	Carry	CSE	X
4	Deny	CSE	X
5	Emily	CSE	X
6	Fizz	CSE	X

Here the department and college for all the students are the same, and this repeated data is occupying unnecessary space in the memory. So, if we insert any new student data in this table the Department and College for them will also remain the same and this can cause many problems such as extra space, insertion anomaly, deletion anomaly, and update anomaly.

How can Normalization solve this problem?

The normalization will divide this table in such a way that we do not miss any data and we also get an optimal database form.

S.No	Name	Department	College
1	Amber	CSE	X
2	Bezos	CSE	X
3	Carry	CSE	X
4	Deny	CSE	X
5	Emily	CSE	X
6	Fizz	CSE	X

Normalization

1. <Student_detail>

S.No	Name	Department
1	Amber	CSE
2	Bezos	CSE
3	Carry	CSE
4	Deny	CSE
5	Emily	CSE
6	Fizz	CSE

2. <Department_Detail>

Department	College
CSE	X

Here the normalization divides the table into two separate tables and from now we do not have to insert the redundant college data. one question that arises here is that even after the normalization why we are repeating the department data? Well, this is because there should be a link between the two divided tables that act as a bridge so the data could be fetched. That’s how normalization reduces data redundancy. Here we have used a small example that’s why we can only see the normalization can only reduce the college data but if we apply the normalization to a big set of databases, it can reduce the data redundancy at a high level.

Types of Normalization

Normalization can be achieved in many ways, and there are 5 types of Normalization that are as follows:

First Normal Form
Second Normal Form
Third Normal Form
Boyce-Codd Normal Form (BCNF) or 3.5 Normal Form
Fourth Normal Form

1. First Normal Form

The first normal form is the first step in the normalization process, and it expects from you to create such a database table so it could be normalized easily. If your table does not follow the condition of the First Normal Form, it would be considered a bad database table design, and no other Normal Forms are applicable to it. Conditions of First Normal Form

Each cell of the table must have a single value, which means each cell must have an atomic value.
Each column should contain values that are of the same data type.
The heading of each column must have a unique name.

Your database table must follow all the above conditions to fall under the First Normal form, and then only the table would be able to normalize. Example:

Not in 1 ^st Normal Form

In 1 ^st Normal Form

Roll No	Name	Language
1	Amber	Java, Python
2	Bezos	Java
3	Carry	Python, C++

There is more than one value in Single-cell.

Roll No	Name	Language
1	Amber	Java
1	Amber	Python
2	Bezos	Java
3	Carry	Python
3	Carry	C++

Every Cell Has an atomic value.

2. Second Normal Form

A table to be in the second Normal Form should satisfy the following conditions:

The table should follow the conditions of the First Normal Form.
There should be no partial dependencies.

What is Partial Dependency?

A primary key could be generated from a single column or a composite of two or more columns. If the primary key of the table is formed from the composition of two columns and a non-key attribute of the table is dependent on the value of only one column of the primary key, it is known as partial dependency. Let’s try to understand it with the following example:

Employee_id	Employee_name	Employee_project_no	Project_location
E01	Sam	1244	Outside
E02	Will	1254	Inside
E03	Jack	6223	Inside
E04	Hijo	1244	Outside

Primary key = Employee_id + Employee_project_no Here, we can find the Employee_name using Employe_id, and Employee_project_no has no significance to find the Employee_name, which means the Employee_name is only dependent on Employee_id. Similarly, the Project_location is only dependent on the Employee_project_no. So, to be in the second normal form, the partial dependency should be eliminated. To eliminate the partial dependency, we should divide the table into two parts:

1. <Employee Detail>

Employee_id	Employee_name	Employee_project_no
E01	Sam	1244
E02	Will	1254
E03	Jack	6223
E04	Hijo	1244

Now Primary key= Employee_id

2. <Project Detail>

Employee_project_no	Project_location
1244	Outside
1254	Inside
6223	Inside

Now Primary key= Employee_project_no

3. Third Normal Form

A table is considered in the Third Normal Form if it meets the following conditions:

The table should be in the Second Normal Form.
It should not have Transitive Dependency.

What is Transitive Dependency?

If any non-prime attribute (not a part of the primary key) of the table depends on another non-prime attribute rather than the prime attribute, it will be termed Transitive Dependency. Let’s understand it with an example:

Student_Rollno	Student_name	Marks	Subjects	Total_Marks
1	Jack	98	Computer	196
1	Jack	99	Math	198
2	Sam	78	Math	500

Here primary key = Student_ID + Student_name Here, Marks represent the marks of one semester and Total_Marks represent the total overall marks scored by the student throughout the year. The Total_marks, which is a non-prime attribute, depends on the Subjects (non-prime attribute) rather than Student_ID or Student_name. This is a case of Transitive Dependency , and to remove this dependency, we need to divide the table into two parts.

1. <Student Detail>

Student_Rollno	Student_name	Marks	Subjects
1	Jack	98	Computer
1	Jack	99	Math
2	Sam	78	Math

Prime-attributes= Student_ID + Student_name

2. <Exam Table>

Subjects	Total_Marks
Computer	196
Math	198
Math	500

Prime-attributes= Subjects

4. Boyce-Codd Normal Form (BCNF)

It is an extension of the Third Normal Form, and that’s why it also is known as the 3.5 Normal form. For a table to be in BCNF, it should satisfy the following two conditions:

The table should be in the Third Normal form.
For any dependency, A ---> B, and A should be a super key.

The main objective of a prime attribute is to let us access the non-prime attributes, but what if using a non-prime attribute, we can access a prime attribute. If this happens, it would violate the second condition of the BCNF.

5. Fourth Normal Form

For a table to be in Fourth Normal Form, it should satisfy the following two conditions:

The table should follow all the conditions of BCNF.
It should not have Multivalued Dependency.

What is Multivalued Dependency?

If these three conditions arise in a table, it would be considered a multi-valued dependency:

A ----> B, for a single value of A. If we have more than one value of B, where A is a prime-key attribute and B is a non-prime-key attribute.
The table should have at least three columns.
If the table has 3 columns such as A, B, and C, where A----> B so B and C should be independent.

If all the above three conditions are true in a table, there is a multivalued dependency in that table and that table does not satisfy the Fourth Normal Form.

Conclusion

So this is all about normalization in DBMS, which helps to eliminate or minimize data redundancy. The Normalization forms that we discussed above are used to design the structure for the database so that we could have an optimal database with minimum data redundancy. Though creating a database that follows all these forms is a challenging task for a Database Manager.

People are also reading:

Normalization in DBMS

Normalization in DBMS

What is data redundancy?

How can Normalization solve this problem?

Normalization

1. <Student_detail>

2. <Department_Detail>

Types of Normalization

1. First Normal Form

2. Second Normal Form

What is Partial Dependency?

1. <Employee Detail>

2. <Project Detail>

3. Third Normal Form

What is Transitive Dependency?

1. <Student Detail>

2. <Exam Table>

4. Boyce-Codd Normal Form (BCNF)

5. Fourth Normal Form

What is Multivalued Dependency?

Conclusion

Related Blogs

History of Database: Evolution of the Database

Data Architecture - A Data Management Framework

DBMS vs RDBMS: A Head-to-Head Comparison

Normalization in DBMS

Table of Content

Normalization in DBMS

What is data redundancy?

How can Normalization solve this problem?

Normalization

1. <Student_detail>

2. <Department_Detail>

Types of Normalization

1. First Normal Form

2. Second Normal Form

What is Partial Dependency?

1. <Employee Detail>

2. <Project Detail>

3. Third Normal Form

What is Transitive Dependency?

1. <Student Detail>

2. <Exam Table>

4. Boyce-Codd Normal Form (BCNF)

5. Fourth Normal Form

What is Multivalued Dependency?

Conclusion

Related Blogs

History of Database: Evolution of the Database

Data Architecture - A Data Management Framework

DBMS vs RDBMS: A Head-to-Head Comparison