Normalization is a method or technique applied to databases to divide the data into multiple tables and minimize data redundancy. In general, normalization is applicable to relational databases, such as SQL, because such databases store data in a tabular form. The main objective of normalization is to divide a big table of data into small tables so the repeated data could be reduced to some extent.
Normalization in DBMS
Before moving further in normalization, let’s discuss what is data redundancy and why we need to minimize it.
What is data redundancy?
Data redundancy refers to the repetition of data at multiple places in the same table. For example, let’s take a table from a database:
S.No  Name  Department  College 
1  Amber  CSE  X 
2  Bezos  CSE  X 
3  Carry  CSE  X 
4  Deny  CSE  X 
5  Emily  CSE  X 
6  Fizz  CSE  X 
Here the department and college for all the students are the same, and this repeated data is occupying unnecessary space in the memory. So, if we insert any new student data in this table the Department and College for them will also remain the same and this can cause many problems such as extra space, insertion anomaly, deletion anomaly, and update anomaly.
How can Normalization solve this problem?
The normalization will divide this table in such a way that we do not miss any data and we also get an optimal database form.
S.No  Name  Department  College 
1  Amber  CSE  X 
2  Bezos  CSE  X 
3  Carry  CSE  X 
4  Deny  CSE  X 
5  Emily  CSE  X 
6  Fizz  CSE  X 
Normalization
1. <Student_detail>
S.No  Name  Department 
1  Amber  CSE 
2  Bezos  CSE 
3  Carry  CSE 
4  Deny  CSE 
5  Emily  CSE 
6  Fizz  CSE 
2. <Department_Detail>
Department  College 
CSE  X 
Here the normalization divides the table into two separate tables and from now we do not have to insert the redundant college data. one question that arises here is that even after the normalization why we are repeating the department data? Well, this is because there should be a link between the two divided tables that act as a bridge so the data could be fetched. That’s how normalization reduces data redundancy. Here we have used a small example that’s why we can only see the normalization can only reduce the college data but if we apply the normalization to a big set of databases, it can reduce the data redundancy at a high level.
Types of Normalization
Normalization can be achieved in many ways, and there are 5 types of Normalization that are as follows:
 First Normal Form
 Second Normal Form
 Third Normal Form
 BoyceCodd Normal Form (BCNF) or 3.5 Normal Form
 Fourth Normal Form
1. First Normal Form
The first normal form is the first step in the normalization process, and it expects from you to create such a database table so it could be normalized easily. If your table does not follow the condition of the First Normal Form, it would be considered a bad database table design, and no other Normal Forms are applicable to it. Conditions of First Normal Form
 Each cell of the table must have a single value, which means each cell must have an atomic value.
 Each column should contain values that are of the same data type.
 The heading of each column must have a unique name.
Your database table must follow all the above conditions to fall under the First Normal form, and then only the table would be able to normalize. Example:
Not in 1 ^{ st } Normal Form  In 1 ^{ st } Normal Form  


2. Second Normal Form
A table to be in the second Normal Form should satisfy the following conditions:
 The table should follow the conditions of the First Normal Form.
 There should be no partial dependencies.
What is Partial Dependency?
A primary key could be generated from a single column or a composite of two or more columns. If the primary key of the table is formed from the composition of two columns and a nonkey attribute of the table is dependent on the value of only one column of the primary key, it is known as partial dependency. Let’s try to understand it with the following example:
Employee_id  Employee_name  Employee_project_no  Project_location 
E01  Sam  1244  Outside 
E02  Will  1254  Inside 
E03  Jack  6223  Inside 
E04  Hijo  1244  Outside 
Primary key = Employee_id + Employee_project_no Here, we can find the Employee_name using Employe_id, and Employee_project_no has no significance to find the Employee_name, which means the Employee_name is only dependent on Employee_id. Similarly, the Project_location is only dependent on the Employee_project_no. So, to be in the second normal form, the partial dependency should be eliminated. To eliminate the partial dependency, we should divide the table into two parts:
1. <Employee Detail>
Employee_id  Employee_name  Employee_project_no 
E01  Sam  1244 
E02  Will  1254 
E03  Jack  6223 
E04  Hijo  1244 
Now Primary key= Employee_id
2. <Project Detail>
Employee_project_no  Project_location 
1244  Outside 
1254  Inside 
6223  Inside 
Now Primary key= Employee_project_no
3. Third Normal Form
A table is considered in the Third Normal Form if it meets the following conditions:
 The table should be in the Second Normal Form.
 It should not have Transitive Dependency.
What is Transitive Dependency?
If any nonprime attribute (not a part of the primary key) of the table depends on another nonprime attribute rather than the prime attribute, it will be termed Transitive Dependency. Let’s understand it with an example:
Student_Rollno  Student_name  Marks  Subjects  Total_Marks 
1  Jack  98  Computer  196 
1  Jack  99  Math  198 
2  Sam  78  Math  500 
Here primary key = Student_ID + Student_name Here, Marks represent the marks of one semester and Total_Marks represent the total overall marks scored by the student throughout the year. The Total_marks, which is a nonprime attribute, depends on the Subjects (nonprime attribute) rather than Student_ID or Student_name. This is a case of Transitive Dependency , and to remove this dependency, we need to divide the table into two parts.
1. <Student Detail>
Student_Rollno  Student_name  Marks  Subjects 
1  Jack  98  Computer 
1  Jack  99  Math 
2  Sam  78  Math 
Primeattributes= Student_ID + Student_name
2. <Exam Table>
Subjects  Total_Marks 
Computer  196 
Math  198 
Math  500 
Primeattributes= Subjects
4. BoyceCodd Normal Form (BCNF)
It is an extension of the Third Normal Form, and that’s why it also is known as the 3.5 Normal form. For a table to be in BCNF, it should satisfy the following two conditions:
 The table should be in the Third Normal form.
 For any dependency, A > B, and A should be a super key.
The main objective of a prime attribute is to let us access the nonprime attributes, but what if using a nonprime attribute, we can access a prime attribute. If this happens, it would violate the second condition of the BCNF.
5. Fourth Normal Form
For a table to be in Fourth Normal Form, it should satisfy the following two conditions:
 The table should follow all the conditions of BCNF.
 It should not have Multivalued Dependency.
What is Multivalued Dependency?
If these three conditions arise in a table, it would be considered a multivalued dependency:
 A > B, for a single value of A. If we have more than one value of B, where A is a primekey attribute and B is a nonprimekey attribute.
 The table should have at least three columns.
 If the table has 3 columns such as A, B, and C, where A> B so B and C should be independent.
If all the above three conditions are true in a table, there is a multivalued dependency in that table and that table does not satisfy the Fourth Normal Form.
Conclusion
So this is all about normalization in DBMS, which helps to eliminate or minimize data redundancy. The Normalization forms that we discussed above are used to design the structure for the database so that we could have an optimal database with minimum data redundancy. Though creating a database that follows all these forms is a challenging task for a Database Manager.
People are also reading:
Leave a Comment on this Post