Oracle and IBM databases: Disk-based vs In-memory
case for in-memory databases (IMDB) can be made in three simple points (1)
performance - data is kept in RAM so no disk I/O limitations (2) HA
with built in fail-over (3) support for relational schema and SQL.
Current disk based RDBMS can run out of steam when processing large
data. Can these problems be solved by migrating from a disk based
RDBMS to an IMDB? Any limitations? To find out, I tested
one of each from the two leading vendors who together hold 70% of
the market share - Oracle's 11g
and TimesTen 11g,
and IBM's DB2
v9.5 and solidDB
The key functionality of IMDB is their support for SQL and stored procedures. This allows developers entrenched in the database SQL world to realize the benefits of in-memory data access. Other technological solutions, such as in-memory caches, require the adoption of a whole new programming paradigm.
|Functionality||IBM solidDB 6.3
||Oracle TimesTen 11g
|SQL Support||Large subset of SQL92 and selected features of SQL98 and SQL2003||SQL-92|
|Subset of DB2 SQL PL||PL/SQL
IMDB as a secondary cache to RDBMS
|DB2, Microsoft SQL Server, Oracle DB, IDS, Sybase||Oracle DB|
|OS supported||Windows, Unix/Linux, Solaris, AIX, HP||Windows, Unix/Linux, HP, Solaris, AIX|
|API||JDBC, ODBC, solidDB SA, CLI||
JDBC, ODBC, JMS/XLA, CLI
How I tested?
The test environment will not set any performance records, but was sufficient for comparison purposes: Two physical machines with Dual core Pentium 4 CPU 2.6GHz, 1 GB of RAM and Linux CentOS 5.3 x64.
A single instance of each database was installed on each machine (due to scalability limitations - more later) in its default configuration. The same database schema was used, adjusted for small differences in each databases implementation. All tests were run in isolation and averages taken.
The focus was on read operations as the power of IMDB lies in fast data access. ACID operations, such as writes, can involve additional logging overhead leading to reduced performance. The tested dataset was 300,000 records, enough so that all data could be kept in the available RAM.
- Selects were on average 5x faster for each IMDB compared to its RDBMS brother
- Inserts and deletes were found to be (2x and 4x) faster for each IMDB compared to its RDBMS brother
What about scalability?
Horizontal scaling should allow increased data loads to be simply met by adding nodes to the resource pool. This requires a mechanism that transparently routes queries without the application specifying where the data is stored
solidDB only provides transparent queries when architected using two database instances connected in HA HotStandby mode. It doesn't support further scaling or any kind of data partitioning.
TimesTen is more advanced and has Cache Grid technology, which is designed to provide horizontal scalability. A global cache group can be created from a cluster and data partitioned on each node to enable transparent querying.
Food for thought
faster when it comes to selecting, inserting and deleting. However,
when RDBMS caching and optimisation algorithms came into play on
frequently accessed data, the difference in select speeds are minimal.
To realize the benefits of in-memory databases, they are best deployed as secondary caches or for the processing of datasets that can be accommodated in the available RAM. If selects are ad-hoc and need to run over large datasets then the required data cannot be feasibly kept in RAM without incurring considerable costs.
Scalability limitations of IMDB mean that it can be advisable to use in-memory data-grids, such as Oracle Coherence, GigaSpaces XAP or GemFire. These provide transparent partitioning and data querying out-of-the box. Risks? Yes, all that beautiful SQL logic will have to be refactored into an object oriented language (Java or C#).