Data redundancy is a database term meaning that the same piece of data is stored more than once in the database. Redundancy wastes storage and makes the data harder to keep consistent, so database designers try to eliminate it as far as possible by using a technique called data normalization.
Data can appear multiple times in a database for a variety of reasons, depending on the type of organization or business that the database is designed to serve. For example, an online business may store the same customer’s name several times if that customer has bought several different products at different times.
This redundancy gives rise to problems for the IT department responsible for maintaining the database because they must update that customer’s details in numerous different locations. It also means that much storage capacity is wasted, storing the same data multiple times. Worst of all, if one or more instances of the customer’s name are not updated, then the database will contain inconsistent data, and no one will know which set of data is the correct one.
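The inconsistency problem described above can be illustrated with a small sketch. The records, field names, and email addresses below are hypothetical and chosen only for illustration; the point is that when a customer's details are repeated on every order, an update that misses even one copy leaves conflicting values behind.

```python
# Hypothetical redundant order records: the customer's details are
# repeated on every order row (names and addresses are made up).
orders = [
    {"order_id": 1, "customer_name": "Ada Example",
     "customer_email": "ada@old.example", "product": "Keyboard"},
    {"order_id": 2, "customer_name": "Ada Example",
     "customer_email": "ada@old.example", "product": "Monitor"},
    {"order_id": 3, "customer_name": "Ada Example",
     "customer_email": "ada@old.example", "product": "Mouse"},
]


def update_email_on_order(rows, order_id, new_email):
    """Simulate an incomplete update that touches only one copy of the data."""
    for row in rows:
        if row["order_id"] == order_id:
            row["customer_email"] = new_email


def emails_on_file(rows, customer_name):
    """Collect every distinct email stored for the given customer."""
    return {
        row["customer_email"]
        for row in rows
        if row["customer_name"] == customer_name
    }


# Only order 2 is updated; orders 1 and 3 keep the old address.
update_email_on_order(orders, 2, "ada@new.example")

conflicting = emails_on_file(orders, "Ada Example")
print(sorted(conflicting))
```

After the partial update, the same customer has two different email addresses on file, and nothing in the data itself says which one is current.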
The solution to this problem was first proposed in 1970 by Edgar Codd, the inventor of the relational database. Put simply, a relational database is one in which important data, such as customers’ names, is stored only once in a single table, and other tables, such as individual product sales, refer to that one record through a relationship instead of repeating it.
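A minimal sketch of this idea, using Python's built-in sqlite3 module with an in-memory database, is shown here. The table names, column names, and sample values are assumptions made for illustration, not a prescribed schema. Each customer is stored once, and every sale refers to that customer by key.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT NOT NULL
);
CREATE TABLE sales (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    product     TEXT NOT NULL
);
""")

# The customer's details are stored exactly once.
conn.execute(
    "INSERT INTO customers (customer_id, name, email) VALUES (?, ?, ?)",
    (1, "Ada Example", "ada@old.example"),
)

# Each sale stores only the customer's key, not a copy of the details.
conn.executemany(
    "INSERT INTO sales (customer_id, product) VALUES (?, ?)",
    [(1, "Keyboard"), (1, "Monitor"), (1, "Mouse")],
)

# A single UPDATE changes the email for every sale at once.
conn.execute(
    "UPDATE customers SET email = ? WHERE customer_id = ?",
    ("ada@new.example", 1),
)

# Joining the tables reassembles the full picture when it is needed.
rows = conn.execute("""
    SELECT s.product, c.name, c.email
    FROM sales AS s
    JOIN customers AS c ON c.customer_id = s.customer_id
    ORDER BY s.sale_id
""").fetchall()

for product, name, email in rows:
    print(product, name, email)
```

Because the email lives in one row of the customers table, there is no second copy that an update could miss; the join simply looks it up for each sale.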
The process of designing a database so that it is free of duplicated or redundant data, and of the corruption and inconsistency that come with it, is referred to as normalization.
Normalization requires that the database designer follow rules, established by the database community, to ensure that data is organized efficiently. These rules are called normal forms. There are a number of normal forms, each more rigorous than the previous one and each incorporating the requirements of the one before it. The five numbered forms, first normal form (1NF) through fifth normal form (5NF), are the ones most often cited, although further forms such as Boyce-Codd normal form (BCNF) have also been defined.