« Big Data on Grids or on Clouds? | Main | Scale out your identity management »

December 17, 2009

Oracle and IBM databases: Disk-based vs In-memory

The case for in-memory databases (IMDB) can be made in three simple points (1) performance - data is kept in RAM so no disk I/O limitations (2) HA with built in fail-over (3) support for relational schema and SQL. Current disk based RDBMS can run out of steam when processing large data. Can these problems be solved by migrating from a disk based RDBMS to an IMDB? Any limitations? To find out, I tested one of each from the two leading vendors who together hold 70% of the market share - Oracle's 11g and TimesTen 11g, and IBM's DB2 v9.5 and solidDB 6.3.

Feature overview

The key functionality of IMDB is their support for SQL and stored procedures. This allows developers entrenched in the database SQL world to realize the benefits of in-memory data access. Other technological solutions, such as in-memory caches, require the adoption of a whole new programming paradigm. 

Another thing to highlight is the use of IMDB as a secondary cache to RDBMS. This allows the database administrator to configure commonly used tables and views from the underlying RDBMS to be held in-memory.


Functionality IBM solidDB 6.3
Oracle TimesTen 11g
SQL Support Large subset of SQL92 and selected features of SQL98 and SQL2003 SQL-92

Stored procedures

Subset of DB2 SQL PL PL/SQL

IMDB as a secondary cache to RDBMS

DB2, Microsoft SQL Server, Oracle DB, IDS, Sybase Oracle DB
OS supported Windows, Unix/Linux, Solaris, AIX, HP Windows, Unix/Linux, HP, Solaris, AIX
API JDBC, ODBC, solidDB SA, CLI

JDBC, ODBC, JMS/XLA, CLI

How I tested?

The test environment will not set any performance records, but was sufficient for comparison purposes: Two physical machines with Dual core Pentium 4 CPU 2.6GHz, 1 GB of RAM and Linux CentOS 5.3 x64.

A single instance of each database was installed on each machine (due to scalability limitations - more later) in its default configuration. The same database schema was used, adjusted for small differences in each databases implementation. All tests were run in isolation and averages taken.

The focus was on read operations as the power of IMDB lies in fast data access. ACID operations, such as writes, can involve additional logging overhead leading to reduced performance. The tested dataset was 300,000 records, enough so that all data could be kept in the available RAM.

Results

I am only able to share the trends that I observed, due to license restrictions. Understandable due to the massive differences that can be affected during configuration and tuning. More about this here.

  1. Selects were on average 5x faster for each IMDB compared to its RDBMS brother
  2. Inserts and deletes were found to be (2x and 4x) faster for each IMDB compared to its RDBMS brother

What about scalability?

Horizontal scaling should allow increased data loads to be simply met by adding nodes to the resource pool. This requires a mechanism that transparently routes queries without the application specifying where the data is stored

solidDB only provides transparent queries when architected using two database instances connected in HA HotStandby mode. It doesn't support further scaling or any kind of data partitioning.

TimesTen is more advanced and has Cache Grid technology, which is designed to provide horizontal scalability. A global cache group can be created from a cluster and data partitioned on each node  to enable transparent querying.

Food for thought

IMDB are faster when it comes to selecting, inserting and deleting. However, when RDBMS caching and optimisation algorithms came into play on frequently accessed data, the difference in select speeds are minimal.

To realize the benefits of in-memory databases, they are best deployed as secondary caches or for the processing of datasets that can be accommodated in the available RAM. If selects are ad-hoc and need to run over large datasets then the required data cannot be feasibly kept in RAM without incurring considerable costs.

Scalability limitations of IMDB mean that it can be advisable to use in-memory data-grids, such as Oracle Coherence, GigaSpaces XAP or GemFire. These provide transparent partitioning and data querying out-of-the box. Risks? Yes, all that beautiful SQL logic will have to be refactored into an object oriented language (Java or C#).

Bookmark and Share

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/services/trackback/6a01156f69dc6b970c012875670e06970c

Listed below are links to weblogs that reference Oracle and IBM databases: Disk-based vs In-memory:

Comments

Feed You can follow this conversation by subscribing to the comment feed for this post.

Very interesting post. It opened up my knowledge capital.

why write can be faster?? They should persist at disk also. or you turn off disk persistence at all?

Transaction logs were turned off during the test because I didn't want writing of logs to disk to dim the results. Only writes to RAM were tested without persistence. However transaction logging can be set to relaxed, in which case logs are written asynchronously.

Let's not forget you can use IBM websphere extreme scale which is ibms coherence competitor with all the same caveats as mentioned above

Yes of course, I didn't mention all the solutions out there as it is the subject for a whole new post and research. I only wanted to point out some alternative in-memory data grid solutions and show that there are other vendors on the market. Thanks for your comment, most definitely in-memory data grids require more thorough benchmark and functionality comparison, which I'll try to conduct soon. Of course I'll post the results here.

many modern web applications (especially those designed with the MVC paradigm) make use of a single URL, and serve contents based upon different parameters. In such scenarios, it is irrelevant to report vulnerabilities based on URLs

Verify your Comment

Previewing your Comment

This is only a preview. Your comment has not yet been posted.

Working...
Your comment could not be posted. Error type:
Your comment has been posted. Post another comment

The letters and numbers you entered did not match the image. Please try again.

As a final step before posting your comment, enter the letters and numbers you see in the image below. This prevents automated programs from posting comments.

Having trouble reading this image? View an alternate.

Working...

Post a comment