August 03, 2009

Real-time data integration using Change Data Capture

When decisions have to be made quickly, access to real-time data is a key consideration in almost every corporation. Strategic decisions based on out-of-date data can produce painful results. Imagine a stockbroker working with data that is ten minutes out of date: costly mistakes will occur as others react to market events more quickly. Very strict requirements on the acceptable latency of decision-making data, where every second counts, force companies to find new solutions that can meet these expectations.


Data can be extracted from a DBMS in many different ways: using SQL queries, table dumps, or an application that sits on top of the database. These approaches are suitable in many scenarios, but the question to be asked is whether they can really deliver data in near real-time. I doubt it. The high computational cost of processing large amounts of data and the time needed for data transfer make these solutions too slow.

An alternative solution - Change Data Capture

The market is full of vendors, such as Oracle, Attunity, IBM and Informatica, that offer real-time data integration solutions with a wide range of functionality and options. Most of them aim to achieve real-time data delivery using Change Data Capture (CDC). This is a mechanism based on identifying, capturing, and delivering only the changes made to operational/transactional data systems.
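To make the idea of identifying, capturing and delivering only the changes more concrete, here is a minimal Python sketch. It is not how any of the vendors above implement CDC: the `Change` record, the `capture` and `deliver` functions, and the in-memory list standing in for a database log are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One entry in a (hypothetical) change log."""
    seq: int                      # position of the entry in the log
    op: str                       # "insert", "update" or "delete"
    key: str                      # primary key of the affected row
    value: Optional[dict] = None  # new row contents (None for deletes)


def capture(log, last_seq):
    """Identify and capture only the entries recorded after last_seq."""
    return [c for c in log if c.seq > last_seq]


def deliver(target, changes, last_seq):
    """Apply captured changes to the target in log order.

    Returns the sequence number of the last change applied, so the
    next capture can resume from that position.
    """
    for c in sorted(changes, key=lambda c: c.seq):
        if c.op == "delete":
            target.pop(c.key, None)
        else:
            target[c.key] = c.value
        last_seq = c.seq
    return last_seq


# A stand-in for the source system's log of changes.
log = [
    Change(1, "insert", "ACME", {"price": 10.0}),
    Change(2, "insert", "INIT", {"price": 5.0}),
    Change(3, "update", "ACME", {"price": 10.5}),
    Change(4, "delete", "INIT"),
]

replica = {}
position = deliver(replica, capture(log, 0), 0)
```

After the run, `replica` holds only the current state of the source, and `position` records where the next capture should resume, so a later call to `capture(log, position)` transfers nothing unless new changes have been logged.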

A text-book example of data integration using CDC can be found here (“Real-time, Consistent Data” section).

CDC provides users with access to the latest information, allowing proactive measures to be taken based on near real-time data. Other benefits of using CDC are:

  • increased system efficiency - only a small amount of data has to be processed (log files) and transferred (changed data),
  • cost reduction - lower system and storage requirements,
  • high availability - the system does not pause during data transfer,
  • non-invasive solution - changes are captured using database redo log files, so no modifications are required on the source database.

Any other ways?

CDC has some disadvantages. If we want to achieve “real” real-time data integration, changes have to be captured as part of a transaction, which adds overhead to the source database at capture time. Real-time data integration can also be achieved using other methods, such as data federation or middleware technologies that connect applications. In short, CDC appears to be the best fit in many scenarios that require near real-time data. However, CDC is not one-size-fits-all, and the question that needs to be answered is: how does the theoretical performance compare to that seen in deployed infrastructures?
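The transactional capture mentioned above, and the overhead it implies, can be illustrated with a small sketch using Python's built-in `sqlite3` module and a database trigger. This is only one possible way to capture changes inside a transaction; the `quotes` and `quote_changes` tables and the trigger name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quotes (symbol TEXT PRIMARY KEY, price REAL);

-- Hypothetical change table that a CDC consumer would read from.
CREATE TABLE quote_changes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    symbol TEXT,
    old_price REAL,
    new_price REAL
);

-- The trigger runs inside the same transaction as the UPDATE, so
-- every update on the source table pays for one extra insert.
CREATE TRIGGER capture_quote_update AFTER UPDATE ON quotes
BEGIN
    INSERT INTO quote_changes (symbol, old_price, new_price)
    VALUES (OLD.symbol, OLD.price, NEW.price);
END;
""")

conn.execute("INSERT INTO quotes VALUES ('ACME', 10.0)")
conn.commit()

# A committed update: the change record is committed with it.
conn.execute("UPDATE quotes SET price = 10.5 WHERE symbol = 'ACME'")
conn.commit()

# A rolled-back update: the captured change disappears with it.
conn.execute("UPDATE quotes SET price = 99.0 WHERE symbol = 'ACME'")
conn.rollback()

changes = conn.execute(
    "SELECT symbol, old_price, new_price FROM quote_changes ORDER BY id"
).fetchall()
current_price = conn.execute(
    "SELECT price FROM quotes WHERE symbol = 'ACME'"
).fetchone()[0]
```

Because the change record is written in the same transaction as the update, the change table only ever contains committed changes, but each write to the source table now costs an additional insert. That extra work is the capture-time overhead described above.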

Comments

There is also shareplex from quest (www.quest.com) or databasesync from wisdomforce (www.wisdomforce.com). shareplex works in homogeneous environments and wisdomforce in heterogeneous ones.
