Data Management & Warehousing - Specialist Consultants in helping you and your Data Warehousing projects
 
 Data Management & Warehousing
 
 • Home • Knowledge Base • Hints & Tips • About Us • Contact Us • Bookmark Us • 
Log In

Username

Password

Remember Me



Main Menu


Quotes

"The most likely way for the world to be destroyed, most experts agree, is by accident. That's where we come in; we're computer professionals. We cause accidents." --Nathaniel Borenstein

Data Warehousing Links


Science & Technolgy News


PC Security


Detecting Changed Data

Description of a method to identify changes from one version of an extract file to the next.

Introduction

When loading data warehouses, it is usually possible to decreases the load time very significantly by processing only changes since the last load, rather than completely refreshing all the data every time. This article describes one approach for detecting changes, which has been used successfully in a number of data warehouse projects.

Background

There are many different ways of capturing changes from an operational source system. This system may itself deliver change information, but this is quite rare. If the source system runs on a modern database, it may be possible to add triggers which capture the changed data. Usually concerns about performance or reliability prevent this approach. In most cases, the data warehouse load process receives a set of extract files which contain data from all relevant records, whether they have changed since the last load or not.

Data Load Approach

Our favoured approach, when dealing with extract files, is to load them into relational database tables with no change to the file structure and minimal validation. All the leading databases have fast and efficient utilities for loading data from files, and once the data is in a database, it is much easier to detect changes, as well as validating, cleaning and transforming the data into a structure suitable for loading into the data warehouse.


Changed Data Capture
Figure 1

The Method

Figure 1 illustrates the technique for detecting changed data, once an extract file has been loaded into a database. The explanation of the diagram refers to Oracle, but the technique is applicable to any modern relational database.

Each box represents a database table, and summarises the process used to produce it.

The PREVIOUS_DATA and CURRENT_DATA tables represent two extracts of the same source data, which needs to be analysed to find new records, modifications and deletions.

Typically, PREVIOUS_DATA and CURRENT_DATA may be of any size, since they are copies of operational data. Normally, all the other tables shown in the diagram are small, since the numbers of new records, modifications and deletions are quite small during the time between extracting the PREVIOUS_DATA and CURRENT_DATA versions.

Each of the steps in the analysis uses set-level operations which work very efficiently in most situations.

The first step is to create the NEW_AND_MOD table, which will contain all the new and modified records. This can be done with the following SQL statement.

create table new_and_mod as
select * from current_data
minus
select * from previous_data;


If the table contains some control columns (such as date loaded, source file record sequence number, etc.) there should be a column list which excludes these, rather than the "*" which means compare all columns.

DEL_AND_MOD will contain all the deleted and modified records and is created as follows:

create table del_and_mod as
select * from previous_data
minus
select * from current_data;


These first two steps access all the data in the CURRENT_DATA and PREVIOUS_DATA tables twice. All further steps use only the changed data which is usually very much smaller.

The next three steps separate the keys of the new, modified and deleted records, as follows:

create table new_key as
select [key_list] from new_and_mod
minus
select [key_list] from del_and_mod;

create table mod_key as
select [key_list] from new_and_mod
intersect
select [key_list] from del_and_mod;

create table del_key as
select [key_list] from del_and_mod
minus
select [key_list] from new_and_mod;


[key_list] is specific for each table. Normally we generate this from metadata which records the definition of each table.

The final stage is to create the whole records for each of the cases: new, modified and deleted. This can be done as follows:

create table new_row as
select a.*
from
new_and_mod a,
new_key b
where
a.[key_list] = b.[key_list];

create table mod_row as
select a.*
from
new_and_mod a,
mod_key b
where
a.[key_list] = b.[key_list];

create table del_row as
select a.*
from
del_and_mod a,
del_key b
where
a.[key_list] = b.[key_list];


There may be no need to create the DEL_ROW table if the content of deleted rows is not interesting. When there is a need to keep full audit information for deleted rows, the DEL_ROW table would be required.

Pre-requisites

For this technique to work, every record must have a unique key. If there are two records with the same key then this process will report that each is a modification of the other. It is very important to identify unique keys in every source extract file. Without unique keys it is not meaningful to update existing records.

In a relational database, primary keys must be unique, and no part of the primary key may be null. The technique described here, does work even if part of the key is null, so long as the entire key is unique. This is because the set arithmetic (intersect and minus) does consider null to be equal to null, whereas the comparison operators used in the SQL where clause exclude null values – one null value is neither equal nor unequal to another value – they are both unknown.

Performance

This technique seems unlikely to be efficient for those of us familiar with online transaction processing systems, where the full table scan is to be avoided at all costs. However, when processing bulk loads, full table scans are often the most efficient method of processing. Running this process immediately after extracting the data, means that from this point onwards we are dealing only with the new, modified and deleted records. This can improve overall extract, transform and load times by several orders of magnitude.

Article Manager module by by George! Software.

Comments

The comments are owned by the poster. We are not responsible for its content.
What about non-Oracle DBs
Posted on: 2006-11-07 12:53:14   By: era_Ser
 
This approach perfectly works on Oracle based systems, but what should we do on, for ex, DB2 or MSSQL databases where MINUS and INTERSECT commands are not implemented??

BR, Sergey


http://ghdhaiir.com
Posted on: 2010-08-28 03:34:11   By: uggugg
 


GHD Straighteners and birkenstock sandals are a famous brand in the world. Our company is focus on supplying the original and the cheapest GHD Straighteners, gucci bags and birkenstock sandals to national people.GHD Straighteners Lead The Fashion, Which Is Loved All Over The World. There Is No Doubt That The New GHD Iv Styler. Is One Of Many Hair Cared Products In The Most Popular Hair Straightener. If you want to know more about our gucci outlet and Birkenstock sandals,you are warmly welcomed to our online shop.




Cheap Christian Louboutin
Posted on: 2010-08-28 15:44:40   By: admin159
 

christian louboutin shoes
christian louboutin
christian louboutin sale
high heel shoes
christian louboutin boots
Christian Louboutin Pumps
Christian Louboutin Sandals
Christian Louboutin boots
Christian Louboutin shoes
Alexander McQueen Shoes
Chanel Ballet Flats
Chanel Slippers
Christain Louboutin shoes new
Christian Louboutin boots
Christian Louboutin Pumps
Christian Louboutin Sandals
Christian Louboutin shoes
Christian Louboutin short boots
Christian Louboutin Wedges
Jimmy Choo shoes
Manolo Blahni Shoes
Tory Burch Ballet Flats
Yves Saint Laurent Shoes



No Subject
Posted on: 2010-08-30 02:51:34   By: tomic
 
Mbt shoes Mbt shoes
GHD Hair Straighteners GHD Hair Straighteners
MLB Shop MLB Shop
buy nfl jerseys buy nfl jerseys
mlb jersey mlb jersey
Shape Ups Shoes Shape Ups Shoes
Vibram five fingers five fingers shoes
cheap mbt shoes on sale cheap mbt shoes on sale
NFL Jerseys Sale NFL Jerseys Sale
mlb jerseys mlb jerseys
mbt shoes on sale mbt shoes on sale
Shape Up Shoes Shape Up Shoes
ed hardy ed hardy wholesale
true religion wholesale true religion
abercrombie fitch wholesale abercrombie fitch
GHD MK4 Pink GHD MK4 Pink
mbt mbt
Vibram five fingers shoes Vibram five fingers shoes
mlb jerseys mlb jerseys
nfl jerseys cheap nfl jerseys
GHD IV Styler GHD IV Styler
discount nfl jerseys cheap nfl jerseys
discount nfl jerseys
Discount NFL Jerseys Discount NFL Jerseys
cheap mlb jerseys cheap mlb jerseys
cheap mbt shoes cheap mbt shoes
Skechers Shape Ups Shoes Skechers Shape Ups Shoes
ghd hair ghd hair
discount vibram five fingers vibram
Wholesale NFL Jerseys Wholesale NFL Jerseys
cheap mbt shoes cheap mbt shoes
ugg boots uk
discount ugg boots
Jordan shoes Jordan Shoes
Nike Air Jordan Shoes Nike Air Jordan Shoes
Air Force One Air Force One
ugg boots on sale
cheap mlb jerseys cheap mlb jerseys
NFL Jerseys Wholesale Wholesale NFL Jerseys
Buy Cheap NFL Jerseys Buy Cheap NFL Jerseys
vibram china five fingers vibram
nba jerseys nba jerseys
discounted mbt shoes discounted mbt shoes
Cheap Shape Up Shoes Cheap Shape Up Shoes
jordan shoes wholesale jordan shoes
cheap ed hardy cheap ed hardy
nfl jerseys nfl jerseys
authentic nba jerseys authentic nba jerseys
NFL Jerseys NFL Jerseys
nike shoes wholesale nike shoe
ugg boots uk
Ghds Ghds
mlb jerseys mlb jerseys
uggs
ghd hair straighteners
Discount Ed Hardy Discount Ed Hardy
ghd hair
Cheap GHD Cheap GHD
vibram vibram finger
uggs discounts
discount mbt shoes discount mbt shoes
MLB Shop MLB Shop
replica handbags wholesale
replica watches wholesale
cheap nba jerseys cheap nba jerseys
Discount NFL Jerseys Discount NFL Jerseys
NFL Jerseys NFL Jerseys
Cheap NFL Jerseys Cheap NFL Jerseys
nfl jersey
GHD Hair Straighteners GHD Hair Straighteners
ugg boots wholesale ugg boots
Cheap Ed Hardy Wholesale Cheap Ed Hardy Wholesale
ugg boots discount ugg boots
discount ugg boots cheap ugg boots
mbt mbt
Discount MLB Jerseys Discount MLB Jerseys
Shape Ups Shoes Shape Ups Shoes
nfl jerseys wholesale nfl jerseys
ed hardy wholesale ed hardy wholesale
nfl jerseys nfl jersey
nba jerseys nba jerseys
GHD Hair Styler GHD Hair Styler
mlb jerseys mlb jersey
the north face mens
the north face women's
the north face denali
ugg boots
discount nfl jerseys discount nfl jerseys
ugg australia
Wholesale Ed Hardy Wholesale Ed Hardy
sheepskin boots
Cheap MLB Jerseys Cheap MLB Jerseys
nhl jerseys nhl jerseys wholesale
mbt sale mbt sale
mlb jersey mlb jersey
nfl jersey nfl jersey
GHD Hair GHD Hair Styler
cheap nfl jerseys cheap nfl jerseys
nhl jerseys
mbt shoes on sale mbt shoes on sale
mbt shoes mbt shoes
nba jersey nba jersey
mbt mbt
MLB Jerseys Wholesale MLB Jerseys Wholesale
discount mbt shoes mbt shoes cheap
cheap car insurance cheap car insurance
home loans home loans
MLB Jersey MLB Jersey
cheap nfl jerseys cheap nfl jerseys




wedding invitation
Posted on: 2010-08-30 08:29:42   By: uggugg
  Edited By: weddinginvitation
On: 2010-08-31 04:42:53
According to proper lace wigs,wedding invitation,cheap bridesmaid dresses,formal hairstyles,hair styles,prom hairstyles, wedding invitation envelopes should be hand-addressed with the guests’names.


fast weight loss
Posted on: 2010-09-02 10:17:26   By: yuejiujin
 
home loans home loans
mbt shoes cheap mbt shoes cheap
mbt shoes cheap mbt shoes cheap
nfl jerseys
cheap nfl jerseys
mbt shoes cheap mbt shoes cheap
discount mbt shoes discount mbt shoes
mbt shoes mbt shoes
mbt mbt
weight Loss programs weight Loss programs
weight loss weight loss
weight loss tips weight loss tips
fast weight loss fast weight loss
cheap car insurance cheap car insurance
car insurance car insurance
ghd hair ghd hair
ghd straighteners ghd straighteners
discount mbt shoes discount mbt shoes
mbt shoes mbt shoes
home equity loans home equity loans


No Subject
Posted on: 2010-09-02 11:51:59   By: chenjing123
 
Links replica watches with silver replica watch jewelry necklace sweetie is none of 4lover.com cheap replica watches you 100% Friendship Bracelets pure copper materials such as sweetie ring mixing into hanliu, each linksukstore.com silver product is pure and beautiful, because in Britain can call pure silver, must strictly followed the British constitution mark quality glashutte replica watches linksgiftstore. Jewelry


womens ralph lauren polo shirts
Posted on: 2010-09-03 10:27:29   By: poloshirtsb2c
 
It is a nice article, thanks for your nice share.
lacoste polo online

[url=http://www.poloshirtsb2c.com] cheap Ralph Lauren Polo shirts [/url]
111




ETIS

We are an associate member of
ETIS

Are you linked in ?

See David Walker's
View David Walker's profile on LinkedIn
Profile

Twitter


Use Skype To Contact Us

Skype Me!

RSS Feeds


googlesvcs from Data Management & Warehousing, the Data Warehousing knowledgebase and consultancy

WWW   Site


Search Search the articles





Who's Online

There are currently
6 guest(s) and
0 member(s) online.
You are an
Anonymous user.
You can register for
free by clicking here

No Members are currently logged in.

Your IP address :
38.107.191.83

Your country is :
Unknown
Unknown

Legal



Valid RSS!
The content of this site is Copyright (©) 1995-2007 Data Management & Warehousing Ltd
Valid XHTML 1.0! Valid CSS!
Registered Office: 138, Finchampstead Road, Wokingham, Berkshire, RG41 2NU, United Kingdom
Registered in England and Wales; Registration Number: 3526504; VAT Registration Number: GB 724 4482 36
This site is powered by phpWebSite © The Web Technology Group, Appalachian State University and licensed under the GNU LGPL