DATABASE VERIFICATION PROJECT
SPONSORED BY NORTEL TO B E COLLEGE (D UNIVERSITY )
I. OBJECTIVE OF THE PROJECT :
1. To understand the verification
problems associated with
the Nortel's legacy database
2. To findout the verification requirements understanding
the table structures and constraints.
Also to collect data values for existing tables
record by record.
3. How to avoid transaction anomalies on existing tables?
4. To understand the nature & impact of new or changed functions
on the existing tables of
the legacy database .
5. Suggesting design guidelines for new database
tables to be created .
II. DELIVERABLES :
AT THE END OF FIRST YEAR OF THE PROJECT (APRIL, 1999) BEC HAS DELIVERED
TO NORTEL AN INTERACTIVE PACKAGE REALIZING THE OBJECTIVES SPECIFIED
IN FIRST THREE POINTS
NOTED UNDER I. THE COMPLETE PACKAGE IS APPENDED TO THIS REPORT ALONG
WITH "README FILE"
AND ILLUSTRATIONS.
DETAILS OF IMPLEMENTATION ISSUES ARE NOTED BELOW.
III. PHASE-I TASKS:
1). BEC has
attempted to achieve first three objectives
in Phase-I.
by developing different interactive software modules described below:
*Collecting/Displaying/Modifying
Table Structures existing in
the legacy Database from field/design office.
*Collecting Data Records against Tables maintaining value-range
and type constraints from actual users.
* The problem arises in legacy data bases
due to incorrect design of table structures,composite
attributes,
etc. as invented by Codd in 70s. In relational theory Codd
also pointed out functional anomalies caused due to the following:
1. Inproper choice of key/primary
key .
2. Not use of normalized table
structures.
3. Random use of composite and
vector
attribute(s)/field(s) in tables.
4. Improper definition of attribute
constraints, functinal dependencies,
null values,etc.
Codd developed Normalization method to overcome
above difficulties.
2). BEC has attempted to tackle the Table Structure Verification
problem
using two separate approaches .
i) Using codd's relational theory for verifying table
structures in
1NF,2NF,3NF normal forms.Therefore
* We have developed table Normalization /Verification
algorithm (upto 3NF)
assuming the following:
1. Any composite attribute can be splitted
into
atomic data-items to ensure 1NF
constraint .
2. Any vector attribute ( delimitted by '$')will
not perticipate in the primary
key determination.
3. Functinal Dependencies between primary key and non-key
attributes can be collected
through users
interactions .
4. Existing table structures can be decomposed into
multiple 3NF tables,of course,
ensuring lossless join(based on
ABU's algorithm).
ii) Since for legacy database
tables functional dependencies and
primary keys may not be known
we have to apply some reverse engineering
method to determine primary keys and
functional dependencies.
The machine learning method applied here is
the'Combined Entropy
approach with uniqueness testing' .To apply
this method data-records
for each will be required.The interactive package has taken care
of that.
ACHIEVEMENTS :
* BEC has been able to develop an interactive software package
which is
capable of perfomming the following activities .
1. Collection/Modification/Deletion of table structures
used in legacy database
of NORTEL .
2. Collection of data-records based on those structures for
applying machine learning
approach .
3. For any given table structure and data set
it is possible by the package to determine
the primary keys using 'Combined
Entropy & Uniqueness
Testing Approach ' .
4. Once the primary key of a table is determined ,
verification of table structure
for 1NF,2Nf,3NF can
be done automatically with the
help of the package
in an interactive manner .
NOTE :
We
could test the s/w package with only 'trkmem'
table structure because dataset of 384 records
were supplied for that table only . Although the structures
of other two tables, namely 'sbsinv' & 'cdmaconf',were
known tests could not be performed for want of sufficient
data-records.
PROPOSED APPROACH FOR SECOND PHASE :
1. With the interactive Software
Package we will be able to collect
a good
number of table structures and data records from users.
These
collections will help us in understanding the system requirements.
Identification
of problems areas will also be easy.
2. For automatic verification it is possible to develope a tool
based on
decission-tree/decision-list methods of machine
learning approach.
The usefulness of proposed method has been already
tested by "trkmem" table.
The beauty of the proposed m/c learning approach is that
for any existing
table structure ,it will create a decission trees/lists
.Once such
decission trees become known for any legacy database tables,they
will act
as verification templates
against any further changes proposed in the
table structures/operations on dataset . The process of verification
can be made automatic. The proposed algorithm will be able to
fulfil
real-time requirements
also.