Infinispan narrows the gap between open source and commercial data caches
Recently I attended a lecture presented by Manik Surtani, JBoss Cache & Infinispan project lead. The goal of the talk was to provide a technical overview of both products and outline Infinispan's road-map. Infinispan is the successor to the open-source JBoss Cache. JBoss Cache was originally targeted at simple web page caching; Infinispan builds on it and carries it into the Cloud paradigm.
Why did I attend? Well, over the past few years I have worked on projects that have used commercial distributed caching (aka data grid) technologies such as GemFire, GigaSpaces XAP or Oracle Coherence. These projects required more functionality than is currently provided by open-source solutions such as memcached or EHCache. Looking at the road-map for Infinispan, I was struck by its ambition – will it provide the functionality that I need?
Many of the projects that I have been involved in do not use a cache solely for reading frequently accessed data, although that read-mostly scenario is common and is prevalent in the Web 2.0 world (Facebook, Twitter etc.). The applications that I work with process large amounts of data (up to a million objects per day), performing massive numbers of ad-hoc read, write and update operations.
What must a cache provide?
- Persistence – the cache must be able to persist all changes made to objects to the underlying storage (e.g. database)
- Cache querying – it must be possible to perform ad-hoc queries for objects that match specified criteria (e.g. using SQL)
- High Availability – if one of the cache nodes fails, other nodes take over. No cached information is lost and the user does not experience any down-time
- Scalability – the cache must be horizontally scalable. To reduce the load I can simply add another machine
- Data partitioning – the cache must allow cached data to be distributed across multiple nodes using explicitly written partitioning criteria
- Language bindings – the cache must be accessible for clients developed in various programming languages
- Documentation and support – the cache must have extensive and clear documentation with an active community
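To make the first two requirements (persistence and cache querying) a little more concrete, here is a minimal sketch. It assumes a plain dictionary standing in for the database and uses a hypothetical `WriteThroughCache` class of my own; none of these names come from GigaSpaces, JBoss Cache or Infinispan, and real products typically persist asynchronously rather than on every call.

```python
from typing import Any, Callable, Dict, List


class WriteThroughCache:
    """Toy cache (hypothetical): every write is persisted before it returns."""

    def __init__(self, store: Dict[str, Any]) -> None:
        self._store = store               # stands in for the database
        self._cache: Dict[str, Any] = {}  # in-memory copy of the data

    def put(self, key: str, value: Any) -> None:
        self._store[key] = value          # persist first (write-through)
        self._cache[key] = value

    def get(self, key: str) -> Any:
        # On a cache miss, load the value from the backing store if present.
        if key not in self._cache and key in self._store:
            self._cache[key] = self._store[key]
        return self._cache.get(key)

    def query(self, predicate: Callable[[Any], bool]) -> List[Any]:
        """Ad-hoc query: return the cached values that match the predicate."""
        return [value for value in self._cache.values() if predicate(value)]
```

A call such as `cache.query(lambda trade: trade["qty"] > 100)` plays the role that a SQL `WHERE` clause would play in a product with query support.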
Based on the above criteria, I want to perform a quick comparison between JBoss Cache, Infinispan and the commercial GigaSpaces XAP. Other leading commercial caches, Oracle Coherence and GemFire, have been reviewed internally and will feature in future posts. The table below is not a full comparison of the products; it covers only the features listed above.
|Feature||GigaSpaces XAP 7.0||JBoss Cache Core 3.2.0||Infinispan 4.0.0 Beta 1|
|Persistence||Mirror service that persists to the underlying storage||Passivation saves cached data to the underlying storage||Passivation saves cached data to the underlying storage|
|Cache querying||Subset of SQL supported||No SQL support (nor any other simplified data query language)||Full-text search scheduled (regexp support may be added later)|
|Scalability||Horizontally scalable||Horizontally scalable||Horizontally scalable|
|High Availability||Fully supported using backup spaces||Incomplete HA support (if cache is run as Session EJB)||Fully supported using redundant distribution mode|
|Data partitioning||Fully supported||Unsupported||Fully supported|
|Language bindings||Java, C++, .NET, Ruby||Java||Java; planned (via memcached protocol): PHP, Python, Ruby or C|
|Documentation and developer support||Comprehensive documentation and a highly active user forum||Incomplete and slightly disordered documentation||Incomplete and slightly disordered documentation|
The feature comparison table shows that GigaSpaces XAP provides all the features that I am looking for. JBoss Cache is missing many of them, most notably support for data partitioning. What is of great importance is that Infinispan either already supports, or is scheduled to support, all the features that I require.
Back at the Java Developers' Day Workshop in Krakow, one thing that grabbed my attention during the presentation was the discussion on dynamic routing. Why? Let's suppose we have the following scenario using GigaSpaces XAP:
I configure the application to run 10 spaces on 3 machines and deploy. Later, as the number of users grows, I dynamically add more machines to cope with the increased demand. Suddenly there is an unexpected peak in users and I want to add 2 more machines, but can't.
Why? Because the application was originally configured to use 10 spaces and this number cannot be increased. To solve this, I need to stop the whole infrastructure, reconfigure the number of spaces and restart. This results in down-time that affects users.
The reason that GigaSpaces suffers from this limitation is that its space routing table is fixed at deployment time.
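The effect of a fixed number of spaces can be sketched in a few lines. This is only an illustration under my own assumptions: keys are routed by a stable hash modulo the space count, and spaces are spread over machines round-robin. `NUM_SPACES`, `space_for` and `spaces_per_machine` are hypothetical names, not GigaSpaces APIs.

```python
import zlib
from typing import List

NUM_SPACES = 10  # fixed at deployment time in the scenario above


def space_for(key: str, num_spaces: int = NUM_SPACES) -> int:
    """Route a key to a space using a stable hash modulo the fixed space count."""
    return zlib.crc32(key.encode("utf-8")) % num_spaces


def spaces_per_machine(num_machines: int, num_spaces: int = NUM_SPACES) -> List[int]:
    """Spread the fixed set of spaces over the machines round-robin."""
    counts = [0] * num_machines
    for space in range(num_spaces):
        counts[space % num_machines] += 1
    return counts


print(spaces_per_machine(3))   # [4, 3, 3]
print(spaces_per_machine(12))  # ten machines host one space each, two host none
```

With 12 machines and only 10 spaces, the last two machines have nothing to host, which is why adding them does not relieve the load.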
I described the above scenario to Manik and asked him how it would play out with Infinispan. It turns out that Infinispan does not suffer from this restriction, as it uses dynamic routing tables. Theoretically, Infinispan allows you to add any number of machines without incurring any down-time.
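One common way to build a routing table that can grow at run time is consistent hashing. The sketch below is a generic illustration of that technique, not Infinispan's actual implementation; `HashRing`, `moved_fraction` and the virtual-node count are my own hypothetical choices.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List, Tuple


def _hash(value: str) -> int:
    """Stable 64-bit hash, so results do not vary between runs."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (point, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Add a machine at run time by placing its virtual nodes on the ring."""
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        """Owner is the first ring point at or after the key's hash, wrapping round."""
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


def moved_fraction(keys: List[str], before: Dict[str, str], after: Dict[str, str]) -> float:
    """Share of keys whose owning node differs between two key-to-node mappings."""
    return sum(1 for k in keys if before[k] != after[k]) / len(keys)
```

Starting with ten machines and calling `add_node` twice only reassigns the keys that now fall to the two new machines; every other key stays where it was, so the cluster can keep serving requests while it grows.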
In conclusion, proprietary caches provide added functionality and a level of comfort that are yet to be found in the open source world. Infinispan is feature-rich but still relatively immature, and is currently in Beta. It will take a brave enterprise to use it in its mission-critical infrastructure.
Looking to the future, Infinispan will stabilize and mature over the next few years. Moreover, it is already implementing innovative features such as dynamic routing to solve current limitations in commercial offerings. This will only continue, aided by an active developer community.