Something came to my mind on this topic and gave it some thought
and I would like to share with you.. Please jump in with your take on it, I am
do not think I have the golden solution for it.
Okay let me get it started.
Entity integrity ensure there is no duplicated rows in the
table, in a relational database this is enforced by defining primary key and
unique constraint.
Referential integrity ensure that the reference value are valid
value in the master table. Take an
example, you want to ensure that the state code in the address table is a value
state code in the State table. In a
relational database this is enforced by defining foreign key.
Those are working well.
Wait, they working well before logic delete is implemented. They stop
working as soon as logic delete gets into the picture.
Consider, this case, in our database we have a table call Course
which lists all courses offered by the college.
From business point of view, the course code is the unique of the table.
There is a table call Student, which stores all students if the college. There
is a table call [Course Election] which stored all courses elected by all
students.
Before we logic delete is implemented, the primary key for
Course table is [Course Code], the primary key for Student is [Student Id], the
primary key for [Course Election] is [Student Id] and [Course Code] combine . Then we define the foreign keys in [Couse Election]
for [Course Id] in reference [Course] table and define foreign key in [Course
Election] for [Student Id] in reference [Student] table.
With this design, no row could be added to [Course Election]
without valid [Student Id] or [Course Code], for each [Student Id] and [Course
Code] combine, there is only one row is allowed in [Course Election] Table.
These are all good. But let’s add logic delete flag in Student table and Course
table.
This is how we design it to work, when user update the information
for a student, the original row will be marked as Deleted and a new row is to
be inserted. The similar logic is implemented for Course Table. From the surface, it looks neat. I have one
table storing the current data as well as history data. But if you take a closer look you would find
you created more problems than you plan to solve. First of all, what would be the primary key
for course and Student tables? Just
[Student Id] and [Course Code] would not work, because the same student
may have many rows, with many deleted rows and one active row. How about added the active flag as part of
primary key? That wouldn’t work either. Because there could be many deleted row
for the same student. How about adding a
creation TimeStamp column into the table and make it part of primary key? Well
the table now accept many deleted row for the same person as long the rows are
not created at the same time, but it also accept multiple active row for the
same student as long as there are not created at the same time. This is not
what you want to see.
Enough about the problem on primary key. The situation on
foreign key is not in any way better. Let’s say we have student got deleted
because he or she has transferred to another college, the row is still in the
table for the student, it is just that the Deleted Flag is true. From foreign key point of view, there is no
problem we added an election for a specific course for the transferred
student. But from business point of
view, that is against the business rule.
The conclusion is that as soon as logic delete is implemented,
our hope of using primary key, and foreign key to enforce Entity integrity and
Referential integrity is gone for good.
The question still, the business requirement needs to store the
history information on these important tables, how do we do it? It is a valid question for any data model
designer. Well, this is my solution to
the business problem.
I will define the [Course] Table and [Student] Table as they
are, no logic delete flag, and I will design the [Course Election] Table as pre
normal. This is so that the primary key and foreign keys are working as
expected. Then I will design [Couse Audit]
and [Student Audit] Table. These audit tables have 2 set of columns, one
for current values and one for previous values and with a set of columns
recording, who, when and what kind of information. In these tables, the primary
key will be a guid or an identity column. No foreign keys no unique constraint.
Then I create a trigger for the table to handle AFTER DELETE,
INSERT, UPDATE to insert into the Audit table with the following query
Select *
From inserted I FULL OUTER JOIN Deleted on I.Id = D.Id
With this design, the data history is preserved, the data integrity
is enforced by the database schema, the coding (whether it is .Net coding or store procedure coding ) are
simplified
Historically speaking, logic delete was one of the legacy
carried form Mainframe system in the time there was no relational database yet.
If we carry this legacy item into our modern database design without fully
understanding of the impact of such legacy, we might got ourselves in trouble…
be careful. You want your modern database designed as modern database which data
integrity is enforced by the database engine.
No comments:
Post a Comment