Peter Lu's Blog: Data integrity, Normalization Process and usage of triggers

. Data Integrity

Data Integrity is one of the cornerstones of the relational model and has been over the years incorporated in every RDBMS (Relational Database Management System) on the market. There are four types of integrity:

· Domain integrity

· Entity integrity

· Referential integrity

· Enterprise integrity

a. Domain integrity

A domain defines the possible values of an attribute. Domain integrity rules govern these values. In a database system, the domain integrity is defined by:

The data type and the length

The NULL value acceptance

The allowable values, through techniques like check constraints or rules

The default value

b. Entity integrity

The entity integrity states that every instance of an entity has to be uniquely identified. The existence of the primary key is the core of the entity integrity. If you defined a primary key for each entity, they follow the entity integrity rule. Some time, UNIQUE constraint is also employed to enforce entity integrity.

c. Referential integrity

The referential integrity rules are enforced by the relationships between entities. As a starting point, the referential integrity rules state that a child instance cannot exist if there is no corresponding parent instance. For example, an order cannot exist without a matching customer, or an order detail cannot exist without the associated order.

d. Enterprise integrity

The last type of integrity is enterprise integrity, also called business rules. These rules, generally implemented through programmatic methods, like stored procedures or triggers on the database server side, define the way the company works. For example, you can state that a customer cannot place a new order if he still owes more than $10,000, or that an order greater than $200,000 has to be approved by the sales manager before being shipped. Enterprise integrity is generally not defined in the data model, but rather in the function model.

In general, there are 2 types of implementation of Data Integrity enforcement. Namely, Declarative Implementation and Procedural Implementation. The following is a table on how they are employed:

Integrity Type	Declarative Implementation	Procedural Implementation
Domain integrity	Data type, nullability DEFAULT constraint CHECK constraint UDTs	Default Rule ( MS recommends not to use Rules, as it will be faced out in the future version of database)
Entity integrity	PRIMARY KEY constraint UNIQUE constraint UNIQUE index	Stored procedure Trigger
Referential integrity	FOREIGN KEY constraint	Stored procedure rigger
Enterprise integrity	NA	Stored procedure rigger

It is generally accepted practices that always use Declarative Implementation whenever possible.

The concept of Data Integrity and guidelines described here are not specific to Microsoft or any specific RDBMS, rather for any RDBMS in general. you can click the following links to find some reference in both MS SQL and Oracle:

From MSDN
http://msdn.microsoft.com/en-us/library/ms184276.aspx

from Oracle
http://download.oracle.com/docs/cd/B28359_01/server.111/b28318/data_int.htm#insertedID1

2. The Relational Model and Normalization

Most of Modern RDBMS engines support up to Third Normal Form. The following are the academy definition of them:

· A relational table is in First Normal Form (1NF) if:

i. It has a primary key.

ii. Each column is atomic.

iii. There is no repeating group of columns.

· A relational table is in Second Normal Form (2NF) if:

i. It is in 1NF.

ii. Every non-key column is fully functionally dependent on the primary key.

· A relational table is in Third Normal Form (3NF) if:

i. It is in 2NF.

ii. Every non-key column is functionally dependent only on the primary key. In other words, a non-key column cannot be dependent on another non-key column.

A normalized database will be a database free of redundant data. However most of Database architects start with a normalized data model and then de-normalize the model for performance reason. A de-normalized database will be a database with some redundant data. It is important to note that application developer (who normally responsible of enforcing enterprise integrity, in another words, they write stored procedures) enjoy the benefit de-normalized database offers, but they do not need to worry about maintaining the redundant data. triggers behind the tables or computed columns on the tables will be responsible for maintaining redundant data.

3. Usage of the triggers

It is true that managing triggers could be challenging for some DBAs. Abuse of triggers could cause poor performance, poor maintainability to the database. However use it for appropriate purpose with well-defined guidelines would improve database performance and database Integrity and development productivity. It is generally accepted practices that triggers are only used for maintaining redundant data and establish audit trail of the transactional data.

Peter Lu's Blog

Data integrity, Normalization Process and usage of triggers

No comments:

Post a Comment

About Me

Blog Archive