
September 15, 2009

Infinispan narrows the gap between open source and commercial data caches

Recently I attended a lecture presented by Manik Surtani, the JBoss Cache and Infinispan project lead. The goal of the talk was to provide a technical overview of both products and outline Infinispan's road-map. Infinispan is the successor to the open-source JBoss Cache: JBoss Cache was originally targeted at simple web page caching, and Infinispan builds on it to take caching into the Cloud paradigm.

Why did I attend? Well, over the past few years I have worked on projects that have used commercial distributed caching (aka data grid) technologies such as GemFire, GigaSpaces XAP and Oracle Coherence. These projects required more functionality than is currently provided by open-source solutions such as memcached or EHCache. Looking at the road-map for Infinispan, I was struck by its ambition – will it provide the functionality that I need?

Many of the projects that I have been involved in do not use a cache solely for reading frequently accessed data. Read-mostly caching is the common scenario for caching technologies and is prevalent in the Web 2.0 world (Facebook, Twitter etc.). The applications that I work with process large amounts of data (up to a million objects per day), performing massive numbers of ad-hoc read, write and update operations.

What must a cache provide?

  • Persistence – the cache must be able to persist all changes made to objects to the underlying storage (e.g. database)
  • Cache querying – it must be possible to perform ad-hoc queries for objects that match specified criteria (e.g. using SQL)
  • High Availability – if one of the cache nodes fails, other nodes take over. No cached information is lost and the user does not experience any down-time
  • Scalability – the cache must be horizontally scalable. To reduce the load I can simply add another machine
  • Data partitioning – the cache must allow cached data to be distributed across multiple nodes using explicitly written partitioning criteria
  • Language bindings – the cache must be accessible for clients developed in various programming languages
  • Documentation and support – the cache must have extensive and clear documentation with an active community
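
To make the persistence requirement concrete, here is a minimal write-through sketch in Java. It is only an illustration of the idea that every change reaches the underlying storage; it is not how any of the products discussed here implement persistence. The names `WriteThroughCache` and `Store` are hypothetical and belong to no real product API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not the API of any cache product.
class WriteThroughCache<K, V> {

    /** Hypothetical stand-in for the underlying storage (e.g. a database). */
    public interface Store<K, V> {
        void save(K key, V value);
        V load(K key);
    }

    private final Map<K, V> memory = new ConcurrentHashMap<>();
    private final Store<K, V> store;

    WriteThroughCache(Store<K, V> store) {
        this.store = store;
    }

    /** Persists the change first, then updates the in-memory copy. */
    public void put(K key, V value) {
        store.save(key, value);
        memory.put(key, value);
    }

    /** Serves from memory; on a miss, falls back to the store and caches the result. */
    public V get(K key) {
        // If load returns null, computeIfAbsent records no mapping and returns null.
        return memory.computeIfAbsent(key, store::load);
    }

    /** Drops the in-memory copy; the persisted value remains in the store. */
    public void evict(K key) {
        memory.remove(key);
    }
}
```

Because `put` writes to the store before touching memory, an evicted or lost in-memory entry can always be reloaded. A real product would typically make this write asynchronous to keep write latency low.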

Feature comparison

Based on the above criteria, I want to perform a quick comparison between JBoss Cache, Infinispan and the commercial GigaSpaces XAP. Other leading commercial caches, Oracle Coherence and GemFire, have been reviewed internally and will feature in future posts. This table is not a full comparison of all products, only a chosen selection.

| Feature | GigaSpaces XAP 7.0 | JBoss Cache Core 3.2.0 | Infinispan 4.0.0 Beta 1 |
|---|---|---|---|
| Persistence | Mirror service that persists to the underlying storage | Passivation saves cached data to the underlying storage | Passivation saves cached data to the underlying storage |
| Cache querying | Subset of SQL supported | No SQL support (nor any other simplified data query language) | Full text search scheduled (regexp support may be added in future) |
| Scalability | Horizontally scalable | Horizontally scalable | Horizontally scalable |
| High Availability | Fully supported using backup spaces | Incomplete HA support (if cache is run as Session EJB) | Fully supported using redundant distribution mode |
| Data partitioning | Fully supported | Unsupported | Fully supported |
| Language bindings | Java, C++, .NET, Ruby | Java | Java. Planned (via memcached protocol): PHP, Python, Ruby or C |
| Documentation and developer support | Comprehensive documentation and a highly active user forum | Incomplete and slightly disordered documentation | Incomplete and slightly disordered documentation |

The feature comparison table shows that GigaSpaces XAP provides all the features that I am looking for. JBoss Cache is missing several features, most notably data partitioning. What is of great importance is that Infinispan currently supports, or is scheduled to support, all the features that I require.

Infinispan innovation

Back to the Java Developers' Day Workshop in Krakow, one key thing that grabbed my attention during the presentation was the discussion on dynamic routing. Why? Let's suppose we have the following scenario using GigaSpaces XAP:

I configure the application to run 10 spaces on 3 machines and deploy. Later, as the number of users grows, we dynamically add more machines to cope with the increased demand. Suddenly we have an unexpected peak in users and want to add 2 more machines, but can't.

Why? Because the application was originally configured to use 10 spaces and this number cannot be increased. To solve this, I need to stop the whole infrastructure, reconfigure the number of spaces and restart. This results in down-time that affects users.

The reason that GigaSpaces suffers from this limitation is that its space routing table is fixed at deployment time.
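
To see why a fixed routing table is hard to change in place, consider a rough sketch that assumes keys are routed by hash modulo the number of partitions. This is a common scheme, but I have not confirmed that GigaSpaces routes exactly this way; `FixedRouting`, `partitionFor` and `countMoved` are hypothetical names made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of routing with a fixed partition count.
class FixedRouting {

    /** Routes a key to one of a fixed number of partitions (spaces). */
    static int partitionFor(long key, int partitionCount) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(Long.hashCode(key), partitionCount);
    }

    /** Counts the keys that land in a different partition when the count changes. */
    static int countMoved(List<Long> keys, int oldCount, int newCount) {
        int moved = 0;
        for (long key : keys) {
            if (partitionFor(key, oldCount) != partitionFor(key, newCount)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<Long> keys = new ArrayList<>();
        for (long id = 0; id < 6000; id++) {
            keys.add(id);
        }
        int moved = countMoved(keys, 10, 12);
        // prints "5000 of 6000 keys change partition when going from 10 to 12 partitions"
        System.out.println(moved + " of " + keys.size()
                + " keys change partition when going from 10 to 12 partitions");
    }
}
```

Under this assumption, changing the partition count from 10 to 12 relocates five out of every six keys, which is why the count is normally frozen at deployment. With the count frozen at 10, an eleventh or twelfth machine has no partition left to host.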

I described the above scenario to Manik and asked him how it would play out with Infinispan. It turns out that Infinispan does not suffer from this restriction as it uses dynamic routing tables. Theoretically, Infinispan allows you to add any number of machines without incurring any down-time.
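
The talk did not go into enough detail for me to reproduce Infinispan's mechanism, so the following is only a sketch of one well-known technique for dynamic routing, consistent hashing, and not a description of Infinispan's implementation. The class `HashRing`, the machine names and the choice of 100 virtual nodes per machine are all assumptions made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of consistent hashing; not Infinispan code.
final class HashRing {
    private static final int VIRTUAL_NODES = 100; // assumed value for the sketch
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    /** Bit mixer that spreads hash codes more evenly around the ring. */
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Places a machine on the ring at many pseudo-random positions. */
    public void addNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(mix((node + "#" + i).hashCode()), node);
        }
    }

    /** A key belongs to the first machine at or after its position, wrapping around. */
    public String ownerOf(long key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Map.Entry<Integer, String> entry = ring.ceilingEntry(mix(Long.hashCode(key)));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing();
        ring.addNode("machine-1");
        ring.addNode("machine-2");
        ring.addNode("machine-3");

        String[] before = new String[10_000];
        for (int key = 0; key < before.length; key++) {
            before[key] = ring.ownerOf(key);
        }

        ring.addNode("machine-4"); // added while the other machines stay in place

        int moved = 0;
        for (int key = 0; key < before.length; key++) {
            if (!before[key].equals(ring.ownerOf(key))) {
                moved++;
            }
        }
        System.out.println(moved + " of " + before.length + " keys moved to the new machine");
    }
}
```

The property that matters here is that adding a machine only takes keys away from existing machines and hands them to the new one; no key moves between two old machines. In a scheme like this the number of machines does not have to be fixed up front.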

Conclusion

In conclusion, proprietary caches provide added functionality and comfort that are yet to be found in the open source world. Infinispan is feature rich, but still a relatively immature solution, currently at the Beta stage. It will take a brave enterprise to use it in one of their mission-critical infrastructures.

Looking to the future, I expect Infinispan to stabilize and mature over the next few years. Moreover, Infinispan is already implementing innovative features such as dynamic routing to solve current limitations in commercial offerings. This will only continue, aided by an active developer community.

Comments

Nice post. Infinispan has a roadmap and very premature source code available; I wonder how long it will take them to have a production-level product. GigaSpaces allows you to specify the number of partitions, and dynamic repartitioning is done by over-provisioning. The difference is that GigaSpaces allows you to control that.
You forgot to mention the GigaSpaces admin API, which gives you full control over the cluster. With GigaSpaces you can provision additional machines on cloud infrastructure such as Amazon and scale the nodes you choose via the admin API. Scaling without such capabilities is very limited.

I do believe that when they say dynamic routing they are referring to node relocation, which is a basic feature in GigaSpaces (their documentation does not contain information about that). It would be interesting to see new ideas, though.

Hi Mickey, good points, but:
1: Yes, dynamic routing is exactly what you described, but as Manik said, Infinispan can add new partitions without any interruption. He described the mechanism briefly. I haven't had time to test it yet, so I have to take his word for it as he is the project lead.
2: Over-provisioning has some limitations. In extreme situations it can be very bad for your system. Suppose you have daily peaks of 10000% of average CPU/memory usage, but for a relatively short time. Wouldn't over-provisioning the number of partitions by a factor of 100 be very harmful for your system outside peak time? I haven't even tried to start 100 spaces in one JVM, but I will.
3: Yes, GigaSpaces has very nice admin tools. I agree that they are very useful and make your life easy. Infinispan is too young to have such tools, so over the coming months we can watch whether and how they develop some.

In my opinion Infinispan is a project worth watching. Right now it's not an option for me, but it may become one in the future.

Regarding Infinispan admin tools, there is one I know. It is called Jopr. It is a general purpose JMX-based administration tool for JBoss tools, including Infinispan.

I should also add that in addition to language bindings using memcached clients, we also now have a REST API which makes it easy to consume from almost any language/platform.

http://infinispan.blogspot.com/2009/09/introducing-infinispan-rest-server.html

Also, regarding JOPR, there is a lot of stuff on our roadmap to enhance the JOPR plugin to go beyond simple GUI display of JMX statistics. Firstly, JOPR already provides aggregate views on raw data (e.g., if Infinispan exposes a transaction-per-second metric, JOPR can poll and reset this metric every N seconds, and provide a graphical timeline view of how any given metric changes over time).

For the other cool stuff we hope to add to the JOPR plugin, see these links:

http://bit.ly/33pJlK
