September 25, 2009

Private Data Cloud: 'Do It Yourself' with Eucalyptus

Why are enterprises implementing Private Clouds if the Public Cloud deployment model is gaining in popularity day by day? Guy Rosen summarizes Public Cloud growth within the user base of the Amazon Elastic Compute Cloud (EC2): since its debut in 2006, 8.4 million EC2 instances have been launched. Impressive as these statistics are, many enterprises still consider the Public Cloud a no-go area for now. Reasons include data security and SLA concerns, data compliance/governance regulations and the complexity of migrating legacy applications. This is where Private Clouds step in.

Private Clouds provide many of the benefits of the Public Cloud, namely elastic scalability, faster time-to-market and reduced OpEx, all within the enterprise's own perimeter and in compliance with its governance policies. Leading commercial Private Cloud products include VMware, Univa UD and Unisys. Open source solutions include Globus Nimbus, Enomaly Elastic Computing Platform, RESERVOIR and Eucalyptus.

Yesterday, I attended the webinar “Convergence of Physical, Virtual and Cloud”, during which Dr. Rich Wolski, Chief Technology Officer of Eucalyptus Systems, described Eucalyptus as Private Cloud data storage. This interested me and I set about learning more.

Technology overview

Eucalyptus enables the creation of Private Clouds that can interface with the Amazon Web Services API, which its developers view as the de facto standard. The Enterprise edition, first released in September 2009, is fully compatible with Amazon EC2 and S3, whereas the open-source version supports almost all functions of EC2 and a limited set of S3. Enterprises can create hybrid Clouds with data and virtual machine images that can be seamlessly accessed from Eucalyptus clouds and Amazon's Elastic Compute Cloud and Simple Storage Service.



Eucalyptus is architected from 5 distinct SOA components, as shown in the above diagram. Virtualized computation is provisioned by the Cluster Controller, which schedules VM execution on Node Controllers. Data is stored using the Storage Controller, which implements block-accessed network storage. To share these blocks between VM instances, the virtualized storage provided by Walrus is used. The Cloud Controller exposes and manages the underlying virtualized computation and storage, i.e. VM management, access control policies, accounting and monitoring.

Walrus persistent data storage

The essence of any Cloud-based solution is how to provision the data to be processed. This is the key to unlocking the potential benefits of processing large data in the Cloud. Eucalyptus uses Walrus to persist data into simple bucket storage, essentially a key-value store. It provides rudimentary methods to operate on the data, including put, get and delete, and enables access policies to be set up. Importantly, the Walrus interface is compatible with Amazon S3 and supports the Amazon Machine Image (AMI).
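
To make the bucket model concrete, here is a minimal in-memory sketch of put, get and delete on named buckets. It is only an illustration of key-value bucket semantics: the class name BucketStore, its method names and the sample keys are assumptions made for this example, not the Walrus or Amazon S3 API, and access policies are left out.

```python
# Illustrative in-memory model of bucket storage.
# NOTE: BucketStore and its methods are hypothetical; this is not the Walrus or S3 API.
class BucketStore:
    def __init__(self):
        # bucket name -> {object key -> object bytes}
        self._buckets = {}

    def create_bucket(self, bucket):
        # Creating an existing bucket is a no-op.
        self._buckets.setdefault(bucket, {})

    def put(self, bucket, key, data):
        # Store (or overwrite) an opaque blob under the given key.
        self._buckets[bucket][key] = bytes(data)

    def get(self, bucket, key):
        # Raises KeyError if the bucket or key does not exist.
        return self._buckets[bucket][key]

    def delete(self, bucket, key):
        # Deleting a missing key is silently ignored.
        self._buckets[bucket].pop(key, None)

    def list_keys(self, bucket):
        return sorted(self._buckets[bucket])


store = BucketStore()
store.create_bucket("test-results")
store.put("test-results", "line-1/unit-0001", b"pass")
store.put("test-results", "line-1/unit-0002", b"fail")

fetched = store.get("test-results", "line-1/unit-0002")
store.delete("test-results", "line-1/unit-0001")
remaining = store.list_keys("test-results")
```

The store only ever sees a key and an opaque blob; it knows nothing about what the blob contains or how one object relates to another.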

Walrus is good for simple storage, but it does not address the underlying needs of large data computation in the Cloud. Take the scenario of a manufacturer's production line quality control. Large volumes of test data must be processed according to business logic that defines relationships between the data items. These relationships are highly complex and would be non-trivial to model in the application code if using key-value based storage.

What is required is the introduction of a data provisioning layer (e.g. sharded databases, a data cache etc.) to enable complex querying of the data, with Walrus providing persistence as a service.
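
As a rough sketch of what such a layer might look like, the example below spreads records across several shards and maintains a secondary index on one field, so that a query like "all failed tests in batch B7" does not require scanning every key. Everything here is an assumption made for illustration: the class ShardedStore, the record fields batch and result, and the plain dictionaries standing in for persistent buckets. It is not how Eucalyptus or any particular product implements this.

```python
import json
import zlib
from collections import defaultdict


class ShardedStore:
    """Hypothetical provisioning layer: shards records by key and indexes one field."""

    def __init__(self, num_shards, index_field):
        # Each dict stands in for a persistent key-value bucket.
        self.shards = [dict() for _ in range(num_shards)]
        self.index_field = index_field
        # index value -> set of record keys carrying that value
        self.index = defaultdict(set)

    def _shard_for(self, key):
        # crc32 gives a hash that is stable across runs, unlike built-in hash().
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, key, record):
        shard = self._shard_for(key)
        old = shard.get(key)
        if old is not None:
            # Drop the stale index entry before overwriting the record.
            self.index[json.loads(old)[self.index_field]].discard(key)
        shard[key] = json.dumps(record)  # persisted as an opaque blob
        self.index[record[self.index_field]].add(key)

    def get(self, key):
        return json.loads(self._shard_for(key)[key])

    def find(self, value):
        # Answer "which records have index_field == value" via the index.
        return [self.get(k) for k in sorted(self.index.get(value, ()))]


store = ShardedStore(num_shards=4, index_field="batch")
store.put("unit-0001", {"batch": "B7", "result": "pass"})
store.put("unit-0002", {"batch": "B7", "result": "fail"})
store.put("unit-0003", {"batch": "B8", "result": "pass"})

failed_in_b7 = [r for r in store.find("B7") if r["result"] == "fail"]
```

Even this toy version has to keep the index consistent on every write, which hints at why modelling richer relationships directly in application code on top of a bare key-value store quickly becomes non-trivial.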

Work in progress

  • Scalability - By definition a Cloud must be scalable. Rich Wolski reports that Eucalyptus can theoretically scale up to 5,000 nodes. Interestingly, it remains undefined whether these are physical or virtualized nodes. If physical, this is enough for ~90% of Private Cloud data center needs; if virtualized, it may prove to be a blocker for enterprises that have greater needs.
  • Host OS support - Eucalyptus is packaged for many different Linux distributions (e.g. Ubuntu, Debian, Fedora, CentOS), but currently does not support Windows.
  • Hypervisor support - Citrix Xen and KVM are fully supported. Support for VMware is only available in Eucalyptus Enterprise Edition.
  • Enterprise features - The open-source version lacks enterprise features, such as fail-over support for some of the key components, and contains rather basic built-in management tools.

Eucalyptus in action

Currently the largest Cloud infrastructure based on Eucalyptus is NASA's NEBULA. It aims to provide highly scalable storage in the hundreds of thousands of terabytes. It is interesting to note that NEBULA forms the backbone for NASA's plethora of websites, i.e. the delivery of static data with minimal intensive data processing.

Eucalyptus forms the base of Ubuntu Enterprise Cloud, which enables organizations to build their own clouds that match the interface of Amazon EC2. As Canonical, the commercial sponsor of Ubuntu, says, "their main goal was to create a product compatible with Amazon's Elastic Compute Cloud (EC2), Elastic Block Storage (EBS) and Simple Storage (S3) API".

Eli Lilly, the 10th largest pharmaceutical company in the world, is using Eucalyptus and Amazon cloud computing services to support its scientists with on-demand processing power and storage. New servers are now provisioned in 3 minutes, compared with seven and a half weeks previously. This leads to faster time-to-market for key products.

Final thoughts

Eucalyptus is a technology worth investigating for companies that want to run private clouds which comply with their governance. The $5.5 million raised in April and the subsequent commercial release give a clear indication of the direction of the project. The Eucalyptus Cloud Computing Platform does not provide all Cloud features, but we have to remember that it is not designed as a replacement technology for AWS or any other Public Cloud service.


Nice article about private clouds, and why you would want one. However, it seems to me that the biggest reason for a private cloud is: If you don't use all the processing power, memory, and/or storage space in the instance running on Amazon, who gets those resources? AMAZON! If you're running a private cloud, you have all the benefits of dynamic provisioning, scalability, etc., and you get to keep the left-over CPU cycles and bits. Better yet, you get to apply those cycles to other loads. As your load increases over time, you can increase the number of physical nodes, limited only by the rate of your procurement process.

To: captive insurance companies,
Actually, it is slightly more costly to set up than VMware or the like. However, in terms of hardware cost it is a real bargain. If you look at the Amazon EC2 unit, it is a 2007 1.0 GHz Xeon processor (about 1/4 of an actual processor). I've priced out 4 systems, each running dual Xeon 3620s (about 64 EC2 units) with 32 GB RAM and redundant network and power supplies; that comes to about $25K. Throw in co-location hosting and networking, and you're paying about $1,000-$2,000/mo. This is equivalent to about 16 of Amazon's High CPU Extra Large instances (7 GB RAM, 20 EC2 units), which would cost $7,840/mo. At $2K/mo hosting, you break even after only six months.

So, yeah, a private cloud is cheaper if you have the load. And the good thing about Eucalyptus is that it is completely compatible with EC2; so you can expand into Amazon as your load increases, then back-fill with your own hardware.
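
The break-even arithmetic in the comment above can be checked with a few lines of Python. This is a simplified sketch that uses only the figures quoted in the comment (a one-off hardware cost of about $25K, hosting at $1,000 to $2,000 per month, and $7,840 per month for the Amazon instances) and assumes no other costs such as staff time, power beyond hosting, or hardware depreciation; the variable and function names are made up for the example.

```python
# Figures quoted in the comment above (USD).
hardware_cost = 25_000    # one-off purchase of the 4 systems
amazon_monthly = 7_840    # 16 High CPU Extra Large instances, per month


def break_even_months(hosting_monthly):
    """Months until the hardware purchase is repaid by the monthly savings."""
    monthly_savings = amazon_monthly - hosting_monthly
    return hardware_cost / monthly_savings


low = break_even_months(1_000)    # cheapest hosting estimate
high = break_even_months(2_000)   # most expensive hosting estimate
print(f"Break-even after {low:.1f} to {high:.1f} months")
```

Under these assumptions the purchase pays for itself in roughly four months, so the six months quoted in the comment is a conservative estimate.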

The comments to this entry are closed.