September 15, 2009

Infinispan narrows the gap between open source and commercial data caches

Recently I attended a lecture by Manik Surtani, the JBoss Cache and Infinispan project lead. The talk aimed to give a technical overview of both products and to outline Infinispan's road-map. Infinispan is the successor to the open-source JBoss Cache: JBoss Cache was originally targeted at simple web page caching, and Infinispan builds on it to take it into the cloud paradigm.

Why did I attend? Over the past few years I have worked on projects that used commercial distributed caching (aka data grid) technologies such as GemFire, GigaSpaces XAP and Oracle Coherence. These projects required more functionality than open-source solutions such as memcached or EHCache currently provide. Looking at Infinispan's road-map, I was struck by its ambition: will it provide the functionality that I need?

Using a cache only to read frequently accessed data is the common scenario, and it is prevalent in the Web 2.0 world (Facebook, Twitter, etc.). Many of the projects I have been involved in go well beyond that. The applications I work with process large amounts of data (up to a million objects per day), performing massive numbers of ad-hoc reads, writes and updates.

What must a cache provide?

  • Persistence – the cache must be able to persist all changes made to objects to the underlying storage (e.g. database)
  • Cache querying – it must be possible to perform ad-hoc queries for objects that match specified criteria (e.g. using SQL)
  • High Availability – if one of the cache nodes fails, the other nodes take over. No cached information is lost and the user does not experience any down-time
  • Scalability – the cache must be horizontally scalable. To handle more load, I can simply add another machine
  • Data partitioning – the cache must allow distributing cached data across multiple nodes using explicitly written partitioning criteria
  • Language bindings – the cache must be accessible to clients developed in various programming languages
  • Documentation and support – the cache must have extensive and clear documentation with an active community
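To make the first five requirements concrete, here is a minimal in-process sketch. It assumes a plain dict stands in for the database and each "node" is just a dict in the same process; the class `ToyDataGrid` and its methods are hypothetical and do not mirror any product's API. It shows write-through persistence, an explicit user-supplied partitioning function, one backup copy per entry for high availability, and an ad-hoc query expressed as a Python predicate in place of SQL.

```python
from typing import Any, Callable, Dict, List


class ToyDataGrid:
    """Hypothetical in-process model of a partitioned, write-through cache."""

    def __init__(self, node_count: int, partition_fn: Callable[[Any, int], int]) -> None:
        self.partition_fn = partition_fn  # explicit, user-written partitioning criterion
        self.primaries: List[Dict[Any, Any]] = [{} for _ in range(node_count)]
        self.backups: List[Dict[Any, Any]] = [{} for _ in range(node_count)]
        self.alive = [True] * node_count
        self.store: Dict[Any, Any] = {}  # stands in for the underlying database

    def _owner(self, key: Any) -> int:
        return self.partition_fn(key, len(self.primaries))

    def _backup_node(self, owner: int) -> int:
        # Each node's backup copy lives on the next node, ring-style.
        return (owner + 1) % len(self.primaries)

    def put(self, key: Any, value: Any) -> None:
        owner = self._owner(key)
        self.primaries[owner][key] = value
        self.backups[self._backup_node(owner)][key] = value
        self.store[key] = value  # persistence: every change reaches the store

    def fail(self, node: int) -> None:
        self.alive[node] = False

    def _live_copy(self, node: int) -> Dict[Any, Any]:
        # A failed node's entries are served from its backup (single-failure model).
        if self.alive[node]:
            return self.primaries[node]
        return self.backups[self._backup_node(node)]

    def get(self, key: Any) -> Any:
        return self._live_copy(self._owner(key)).get(key)

    def query(self, predicate: Callable[[Any], bool]) -> List[Any]:
        # Ad-hoc query: scan every partition's live copy for matching values.
        return [
            value
            for node in range(len(self.primaries))
            for value in self._live_copy(node).values()
            if predicate(value)
        ]


# Usage: partition orders by customer id, then lose node 1.
grid = ToyDataGrid(3, lambda key, n: key[0] % n)
grid.put((1, "a"), {"customer": 1, "amount": 50})
grid.put((2, "b"), {"customer": 2, "amount": 500})
grid.put((4, "c"), {"customer": 4, "amount": 700})
grid.fail(1)  # node 1 owns customers 1 and 4
large_orders = grid.query(lambda order: order["amount"] > 100)
```

A real data grid would hold the backup on a separate machine and rebalance after a failure; the sketch only shows why a backup copy lets reads and queries survive the loss of one node.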

Feature comparison

Based on the above criteria, I want to make a quick comparison between JBoss Cache, Infinispan and the commercial GigaSpaces XAP. Other leading commercial caches, Oracle Coherence and GemFire, have been reviewed internally and will feature in future posts. The table is not a full comparison of the products; it covers only the selected features.

| Feature | GigaSpaces XAP 7.0 | JBoss Cache Core 3.2.0 | Infinispan 4.0.0 Beta 1 |
|---|---|---|---|
| Persistence | Mirror service that persists to the underlying storage | Passivation saves cached data to the underlying storage | Passivation saves cached data to the underlying storage |
| Cache querying | Subset of SQL supported | No SQL support (nor any other simplified query language) | Full-text search scheduled (regexp support may be added later) |
| Scalability | Horizontally scalable | Horizontally scalable | Horizontally scalable |
| High Availability | Fully supported using backup spaces | Incomplete HA support (if the cache is run as a Session EJB) | Fully supported using redundant distribution mode |
| Data partitioning | Fully supported | Unsupported | Fully supported |
| Language bindings | Java, C++, .NET, Ruby | Java | Java; planned via the memcached protocol: PHP, Python, Ruby or C |
| Documentation and developer support | Comprehensive documentation and a highly active user forum | Incomplete and slightly disordered documentation | Incomplete and slightly disordered documentation |
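Passivation, listed for both JBoss Cache and Infinispan, means an entry is written to the store only when it is evicted from memory, and is removed from the store again when it is loaded back (activated), so each entry lives in exactly one place. A minimal sketch of that behaviour, assuming an LRU eviction policy and a dict standing in for the store; `PassivatingCache` is a hypothetical name, not either product's API.

```python
from collections import OrderedDict
from typing import Any, Dict, Optional


class PassivatingCache:
    """Hypothetical LRU cache that passivates evicted entries to a backing store."""

    def __init__(self, capacity: int, store: Dict[Any, Any]) -> None:
        self.capacity = capacity
        self.memory: OrderedDict = OrderedDict()  # ordered from least to most recently used
        self.store = store  # stands in for a database or file store

    def put(self, key: Any, value: Any) -> None:
        self.memory[key] = value
        self.memory.move_to_end(key)
        self.store.pop(key, None)  # the in-memory value supersedes any passivated copy
        self._evict_if_needed()

    def get(self, key: Any) -> Optional[Any]:
        if key in self.memory:
            self.memory.move_to_end(key)  # mark as most recently used
            return self.memory[key]
        if key in self.store:
            # Activation: load the entry back into memory and remove it from the store.
            value = self.store.pop(key)
            self.memory[key] = value
            self._evict_if_needed()
            return value
        return None

    def _evict_if_needed(self) -> None:
        while len(self.memory) > self.capacity:
            key, value = self.memory.popitem(last=False)  # least recently used entry
            self.store[key] = value  # passivation: written to the store only on eviction


store: Dict[Any, Any] = {}
cache = PassivatingCache(capacity=2, store=store)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)  # evicts "a" into the store
after_eviction = dict(store)
value = cache.get("a")  # activates "a", which in turn evicts "b"
```

The trade-off is that data which was never evicted is never persisted, which is why passivation alone does not meet the "persist all changes" requirement listed earlier.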

The feature comparison table shows that GigaSpaces XAP provides all the features I am looking for. JBoss Cache is missing many of them, most notably data partitioning. Most importantly, Infinispan already supports, or is scheduled to support, every feature that I require.

Infinispan innovation

Returning to the Java Developers' Day workshop in Krakow, one thing that grabbed my attention during the presentation was the discussion of dynamic routing. Why? Consider the following scenario using GigaSpaces XAP:

I configure the application to run 10 spaces on 3 machines and deploy it. Later, as the number of users grows, I dynamically add more machines to cope with the increased demand, until each of the 10 spaces runs on its own machine. Then an unexpected peak in users arrives and I want to add 2 more machines, but can't.

Why? Because the application was originally configured to use 10 spaces and this number cannot be increased. To solve this, I need to stop the whole infrastructure, reconfigure the number of spaces and restart. This results in down-time that affects users.

GigaSpaces suffers from this limitation because its space routing table is fixed at deployment time.
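To see why a routing table fixed at deployment time caps the number of useful machines, here is a minimal sketch. It assumes keys are routed to one of a fixed number of partitions (spaces) by a stable hash modulo the partition count, and that partitions are spread round-robin over machines. This is a generic model of fixed partitioning, not GigaSpaces' actual routing code, and all function names are hypothetical.

```python
import zlib
from collections import Counter
from typing import Dict, List

PARTITIONS = 10  # fixed when the application is deployed, as in the scenario above


def partition_for(key: str, partitions: int = PARTITIONS) -> int:
    # Stable hash, so a key always routes to the same partition.
    return zlib.crc32(key.encode("utf-8")) % partitions


def assign_partitions(machines: int, partitions: int = PARTITIONS) -> Dict[int, List[int]]:
    # Spread the fixed set of partitions round-robin over the machines.
    layout: Dict[int, List[int]] = {m: [] for m in range(machines)}
    for p in range(partitions):
        layout[p % machines].append(p)
    return layout


def keys_per_machine(keys: List[str], machines: int, partitions: int = PARTITIONS) -> List[int]:
    # Map each partition to its machine, then count the keys each machine serves.
    owner = {p: m for m, ps in assign_partitions(machines, partitions).items() for p in ps}
    counts = Counter(owner[partition_for(k, partitions)] for k in keys)
    return [counts[m] for m in range(machines)]


keys = [f"user-{i}" for i in range(1000)]
load_3 = keys_per_machine(keys, 3)
load_12 = keys_per_machine(keys, 12)  # machines 10 and 11 host no partition and receive nothing
```

Raising `PARTITIONS` up front (the over-provisioning approach mentioned in the comments) lifts the cap, but the cap is still set at deployment time.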

I described the above scenario to Manik and asked him how it would play out with Infinispan. It turns out that Infinispan does not suffer from this restriction, because it uses dynamic routing tables. In theory, Infinispan allows you to add any number of machines without incurring any down-time.
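One common technique for routing that adapts as machines join is consistent hashing: machines and keys are hashed onto the same ring, and each key belongs to the first machine position clockwise from it, so a new machine takes over only the keys in the arcs it claims. Our understanding is that Infinispan's distribution mode is built along these lines, but the sketch below shows the general technique under that assumption, not Infinispan's code; `HashRing` and the virtual-node count are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List


def ring_hash(value: str) -> int:
    # 64-bit position on the ring, taken from an MD5 digest.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with several virtual nodes per machine."""

    def __init__(self, nodes: List[str], vnodes: int = 50) -> None:
        self.vnodes = vnodes
        self._points: List[int] = []  # sorted ring positions
        self._owner_at: Dict[int, str] = {}
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            if point not in self._owner_at:  # ignore the (unlikely) hash collision
                bisect.insort(self._points, point)
                self._owner_at[point] = node

    def owner(self, key: str) -> str:
        # First ring position clockwise from the key, wrapping past the end.
        idx = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner_at[self._points[idx]]


ring = HashRing([f"machine-{i}" for i in range(10)])
keys = [f"user-{i}" for i in range(10_000)]
before = {k: ring.owner(k) for k in keys}

# Two machines join at run time: no restart and no fixed partition count.
ring.add_node("machine-10")
ring.add_node("machine-11")
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key that moves goes to one of the two new machines, and in expectation only about 2/12 of the keys move. By contrast, routing by `hash % machine_count` would reshuffle most keys whenever the count changed.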

Conclusion

To conclude, proprietary caches still provide added functionality and convenience that are yet to be found in the open-source world. Infinispan is feature-rich but still relatively immature, currently at the Beta stage. It will take a brave enterprise to use it in mission-critical infrastructure.

Looking to the future, Infinispan will stabilize and mature over the next few years. Moreover, it is already implementing innovative features, such as dynamic routing, that address current limitations in commercial offerings. With an active developer community behind it, this should only continue.

Comments

Nice post. Infinispan has a roadmap and very early source code available; I wonder how long it will take them to have a production-level product. GigaSpaces allows you to specify the number of partitions, and dynamic repartitioning is done by over-provisioning. The difference is that GigaSpaces lets you control that.
You forgot to mention the GigaSpaces admin API, which gives you full control over the cluster. With GigaSpaces you can provision additional machines on cloud infrastructure such as Amazon and scale the nodes you choose via the admin API. Scaling without such capabilities is very limited.

I believe that when they say dynamic routing they mean node relocation, which is a basic feature in GigaSpaces (Infinispan's documentation does not contain information about it). It would be interesting to see new ideas, though.

Hi Mickey, good points but:
1: Yes, dynamic routing is exactly what you described, but as Manik said, Infinispan can add new partitions without any interruption. He described the mechanism briefly. I haven't had time to test it so far, so I have to take the project lead's word for it.
2: Over-provisioning has some limitations, and in extreme situations it can be very bad for your system. Suppose you have daily peaks of around 10000% of average CPU/memory usage, but for a relatively short time. Wouldn't over-provisioning the number of partitions 100 times be very harmful to your system outside peak time? I haven't yet tried starting 100 spaces in one JVM, but I will.
3: Yes, GigaSpaces has very nice admin tools. I agree that they are very useful and make your life easy. Infinispan is too young to have such tools, so we will see over the coming months whether and how they develop some.

In my opinion, Infinispan is a project worth watching. Right now it's not an option for me, but it may become one in the future.

Regarding Infinispan admin tools, there is one I know of, called Jopr. It is a general-purpose, JMX-based administration tool for JBoss projects, including Infinispan.

I should also add that in addition to language bindings using memcached clients, we also now have a REST API which makes it easy to consume from almost any language/platform.

http://infinispan.blogspot.com/2009/09/introducing-infinispan-rest-server.html
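To illustrate why a REST API makes the cache easy to consume from almost any platform, here is a minimal client-side sketch using only Python's standard library. The base address and the `<base>/<cache>/<key>` URL layout are assumptions for illustration (check the REST server documentation for your version); the helper names are hypothetical. The code only builds the HTTP requests, so it runs without a server.

```python
import urllib.parse
import urllib.request

# Hypothetical deployment address; the <base>/<cache>/<key> layout is an assumption.
BASE_URL = "http://localhost:8080/infinispan/rest"


def entry_url(cache: str, key: str, base: str = BASE_URL) -> str:
    # Percent-encode both path segments so keys may contain spaces or slashes.
    return f"{base}/{urllib.parse.quote(cache, safe='')}/{urllib.parse.quote(key, safe='')}"


def put_request(cache: str, key: str, body: bytes,
                content_type: str = "application/json") -> urllib.request.Request:
    return urllib.request.Request(entry_url(cache, key), data=body, method="PUT",
                                  headers={"Content-Type": content_type})


def get_request(cache: str, key: str) -> urllib.request.Request:
    return urllib.request.Request(entry_url(cache, key), method="GET")


store_order = put_request("orders", "order 42", b'{"amount": 50}')
fetch_order = get_request("orders", "order 42")
# Against a running server, urllib.request.urlopen(store_order) would send the PUT.
```

Any language with an HTTP client can issue the same two requests, which is the point of the REST interface.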

Also, regarding JOPR, there is a lot of stuff on our roadmap to enhance the JOPR plugin to go beyond simple GUI display of JMX statistics. Firstly, JOPR already provides aggregate views on raw data (e.g., if Infinispan exposes a transaction-per-second metric, JOPR can poll and reset this metric every N seconds, and provide a graphical timeline view of how any given metric changes over time).

For the other cool stuff we hope to add to the JOPR plugin, see these links:

http://bit.ly/33pJlK
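The poll-and-reset aggregation described above (read a transaction counter every N seconds, zero it, and plot the resulting rate over time) can be sketched as follows. It assumes a simple counter that the cache increments and the monitoring tool reads and resets atomically; `TxCounter` and `poll_and_reset` are hypothetical names, not JOPR's or Infinispan's API, and simulated timestamps replace a real timer so the example runs instantly.

```python
import threading
from typing import List, Tuple


class TxCounter:
    """Hypothetical transaction counter a cache could expose, e.g. via JMX."""

    def __init__(self) -> None:
        self._count = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        with self._lock:
            self._count += 1

    def read_and_reset(self) -> int:
        # Read and zero in one locked step so no increments are lost between polls.
        with self._lock:
            value, self._count = self._count, 0
            return value


def poll_and_reset(counter: TxCounter, interval_s: float, now: float,
                   timeline: List[Tuple[float, float]]) -> None:
    # Turn the raw count for the last interval into a transactions-per-second point.
    timeline.append((now, counter.read_and_reset() / interval_s))


counter = TxCounter()
timeline: List[Tuple[float, float]] = []
interval = 5.0
for tick, tx_in_interval in enumerate([50, 120, 30], start=1):
    for _ in range(tx_in_interval):
        counter.increment()
    poll_and_reset(counter, interval, tick * interval, timeline)
```

Each `(time, rate)` pair in `timeline` is one point on the kind of graphical timeline view described above.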

Hi Pawel,

I came across your site when searching for Infinispan and comparable technologies. Some time ago I started off looking at GigaSpaces and then went looking for an open-source alternative.
I have worked on many projects as a software engineer (mostly in the '90s) and as an architect (mostly in the last 10 years). One of those projects was processing extremely large XML datasets (containing 1,000,000+ financial transactions), and I joined a team whose development had been outsourced. The developers had come up with a 'solution' using JAXB. Of course that could not work, at least not if you process all the data into memory and work from there, which was exactly what they had been doing. I estimated the maximum dataset at around 120,000 transactions, depending on available memory. In practice it was much worse (only about 30,000 at most). So I recommended a different solution based on SAX or StAX (in fact I had already built one). As a spin-off of that project, I developed that solution further into an MDE-based approach: from the specification (XSD), the code generator would generate the XML parser, which sends logical events to the listening processor. This worked very well. You can find the information here: http://dijkstra-ict.nl/documents/XMLParserTechnologyForProcessingHugeXMLfiles.pdf.

If this is of interest to anyone here, drop me a note.

Kind regards,
Lolke Dijkstra
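The difference between binding a whole document into memory (the JAXB approach described in the comment above) and streaming it (SAX/StAX) can be sketched with Python's standard-library `xml.etree.ElementTree.iterparse`, which delivers parse events incrementally. The `<transaction>` element and its `amount` attribute are assumptions for illustration, not the commenter's actual schema.

```python
import io
import xml.etree.ElementTree as ET
from decimal import Decimal
from typing import BinaryIO


def total_amount(source: BinaryIO) -> Decimal:
    """Sum transaction amounts without building the whole document tree in memory."""
    total = Decimal("0")
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "transaction":
            total += Decimal(elem.get("amount"))
            root.clear()  # discard processed children so memory stays bounded
    return total


sample = io.BytesIO(
    b"<transactions>"
    b'<transaction id="1" amount="10.50"/>'
    b'<transaction id="2" amount="4.25"/>'
    b'<transaction id="3" amount="100.00"/>'
    b"</transactions>"
)
result = total_amount(sample)
```

Because each processed element is discarded, memory use depends on the size of one transaction rather than on the 1,000,000+ transactions in the file.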

Well-explained article!

You didn't mention NCache, which is also a commercial distributed caching technology. I use it myself, and I find it the most mature caching solution for improving performance and scalability. NCache offers more features than other caching technologies.
Details on NCache: http://www.alachisoft.com/ncache/index.html
