
April 19, 2010

The cost of High Availability (HA) with Oracle

What's the cost of downtime to your business?  $100,000 per hour, $1,000,000 or more? The recent volcanic ash that has grounded European flights is estimated to be costing the airlines $200M a day. In the IT world, High Availability (HA) architectures allow for disaster recovery as well as uninterrupted business continuity during system failure.

This post focuses on a customer’s backend, composed of a business application stack supported by a dozen Oracle databases. The customer wishes to equip this infrastructure with HA features and ensure that outages do not cost it business. How do we address the challenge of pricing the complete solution, with hardware, software, services and annual support?

The options

Active Data Guard can be used if the locations are far apart, while Oracle RAC promises transparent application failover if they are in close proximity. For enterprise-class users with heterogeneous databases, GoldenGate software, the 2009 addition to Oracle’s portfolio, is an attractive solution.

And for those with no stringent real-time I/O requirements, backup might be enough, so it’s worth considering Oracle Secure Backup or the external cloud variants that allow us to save on CapEx. With this option, there’s no need for extra hardware if the data is on the cloud. Home-made solutions based on Streams are also not unheard of.

Others will prefer hardware replication with intelligent disk arrays from EMC, Fujitsu, Sun (Oracle) and any other vendor in conjunction with clustering software such as Red Hat Cluster Suite. Those with cost as the number one priority might consider open-source disk array software replication methods such as DRBD.

Two architectures, three competing solutions

Using our Momentum(tm) methodology (technical and economic analysis), we narrow the field down to two alternative HA offerings. These two solutions are based on introducing an HA management layer through either database clustering or disk array replication.

1) Database clustering ("DB v1"): Oracle RAC allows the databases to run on a server farm for failover and efficiency purposes. When one server instance fails, the other transparently takes over. For storage redundancy, ASM is used to manage data replication between the storage units. HA is achieved through RAC managing redundancy at all levels. (Note: this design assumes geographical proximity of the redundant nodes due to synchronisation issues.)

2) Disk array replication ("Disk array"): Oracle DB is still used, but in this scenario the database is unaware that the underlying architecture provides for business continuity. Instead, intelligent disk arrays transparently perform data replication to the remote location. High Availability is invisible to the upper layers of the software stack. (Note: here too, the physical distance between the connected sites is limited, due to latency and bandwidth characteristics.)

3) And the third solution? "DB v2": Based on exactly the same architecture as defined for "DB v1", and knowing that some associated licensing restrictions would not affect the customer’s operations, we can recalculate the costs using a different licensing model (in this case the restrictions limited the number of system users).

In short, the finance director will see three solutions to choose from, while the IT architect will only see two.

More on licenses

Oracle currently has a sophisticated licensing scheme, and working out the optimum involves exercising the patience of our Oracle sales team. Beyond the alternative licensing models, the usual headache is calculating the license migration from legacy architecture paradigms. Most vendors defend their business interests: if you haven’t purchased support in the past, they will make you purchase it retroactively, otherwise migration won’t be possible. Such aggressive loyalty schemes can annoy and put off customers who have little choice, and so negotiations follow. This is where consultants are used to find a compromise agreeable to both parties.

Pricing

To forecast the total direct capital expenditure of the project, we break it down into four major CapEx categories: (1) software licenses, (2) hardware, (3) services and (4) the year 1 support package, which is normally paid up front.

[Figure: HA cost pricing]

The original “DB v1” option was priced at $611K. After the license tuning exercise, the total for the “DB v2” option came in at $518K, a saving of $93K.

For this type of project, the major cost considerations are hardware and software, while services and support are marginal. For "DB v1", cost breakdown is: 36% hardware, 40% software, 13% services, 11% support.
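As a rough worked example, the percentage split can be turned back into dollar amounts. The sketch below uses only the totals and percentages quoted in this post; the function and variable names are illustrative, and because the percentages are rounded, the per-category amounts are approximations rather than the actual quote.

```python
# Figures quoted in the post, in thousands of US dollars.
DB_V1_TOTAL_K = 611
DB_V2_TOTAL_K = 518

# Percentage split quoted for "DB v1".
DB_V1_SPLIT_PCT = {"hardware": 36, "software": 40, "services": 13, "support": 11}


def breakdown_k(total_k, split_pct):
    """Return the approximate cost per category, in $K, for a given total."""
    if sum(split_pct.values()) != 100:
        raise ValueError("percentage split must add up to 100")
    return {name: total_k * pct / 100 for name, pct in split_pct.items()}


db_v1_costs_k = breakdown_k(DB_V1_TOTAL_K, DB_V1_SPLIT_PCT)
license_saving_k = DB_V1_TOTAL_K - DB_V2_TOTAL_K

for name, cost_k in db_v1_costs_k.items():
    print(f"{name:<9} ~${cost_k:.0f}K")
print(f"saving from license tuning: ${license_saving_k}K")
```

Hardware and software together account for roughly three quarters of the "DB v1" total, which is why the license model is the obvious place to look for savings.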

So, if the hardware-only solution removes the additional investment in software, can we see significant savings? No. Surprisingly, "Disk array" comes in at $561K. Enterprise-grade storage arrays with replication features are not cheap.
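To put the three quoted totals side by side, here is a small sketch that ranks them and expresses each as a premium over the cheapest option. The names are illustrative assumptions. Note that a percentage difference depends on which total is taken as the baseline, so the figures shift slightly if you divide by a different option.

```python
# Totals quoted in the post, in thousands of US dollars.
TOTALS_K = {"DB v1": 611, "DB v2": 518, "Disk array": 561}


def pct_difference(a_k, b_k):
    """Difference between a and b, expressed as a percentage of b."""
    return (a_k - b_k) / b_k * 100


# Rank the options from cheapest to most expensive.
ranked = sorted(TOTALS_K.items(), key=lambda item: item[1])
cheapest_name, cheapest_k = ranked[0]

for name, total_k in ranked:
    premium = pct_difference(total_k, cheapest_k)
    print(f"{name:<10} ${total_k}K  (+{premium:.1f}% vs {cheapest_name})")
```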

Still open to interpretation

The cost of achieving business continuity using software is slightly lower than, although still comparable to, that of the hardware-only solution. So hardware or software?

Frustrated strategist: Hardware is not the only answer

This is just another example of how wrong it is to “kill the problem with hardware”. How often do we see people instinctively decide to purchase extra hardware to overcome scalability challenges? To me, this knee-jerk reaction is based on short-term thinking and is extremely frustrating. Just look. Here, the distributed middleware provides you with the same results, but slightly cheaper. This cost saving will grow dramatically the more you scale out. Why?

Simple: software vendors have greater freedom for volume discounts than hardware vendors. Once you’ve purchased one license, the vendor’s cost of granting another for free is zero. It’s worth remembering that the larger your installation, the greater the cost gap between hardware-centered and software-centered solutions becomes. For scalability there is no choice: software-centered solutions win every time.

Conservative Tactician: Step back and figure out the context

At the moment this is a small project and we’re not scaling out just yet. It’s surprising how similar the costs of the hardware and software approaches are. A 7% cost difference isn’t considerable, especially as, before the license restriction workaround, the hardware variant was actually cheaper! The price difference itself is not significant enough to help us decide which approach is best. The factors affecting our decision will not be financial, but contextual:

  • What’s the envisioned technical context of this environment? Is this a dedicated system, or will it be sharing resources with other applications? It’s worth noting that hardware-based replication will equally protect all the data of the software running on top of it. Here you can fix all your HA needs at once, at the risk of large volumes of unnecessarily replicated data slowing the network. By contrast, RAC will only provide business continuity for the data inside Oracle DB; the lesser the risk, the lesser the reward.
  • What are the Recovery Time Objective and Recovery Point Objective in your Disaster Recovery Plan? At failure, Oracle RAC will “guarantee” to switch over the cluster control seamlessly and transparently with no data loss, while other solutions require a few seconds or minutes to switch over (and so are not completely transparent). Can you really afford this delay without harming your business?
  • How does this fit the generic site policies, such as data center virtualization, the disaster recovery plan or power consumption targets? Now the picture can become complicated, and various arguments lean towards certain solutions. Imagine your environment is virtualized. In case of an incident, the “Disk array” hardware replication will allow all VMs to be automatically restarted on a secondary site. The downside? Service interruption is around 1 minute, while the “DB v1” software solution is seamless. On the other hand, with virtualization covering two replicated sites you are approaching licensing hell; you’ll pay up to twice the price for all your programs unless you are able to consolidate. The costing analysis here is quite complex but definitely worth investigating. In the long run you’ll easily save six or seven digits by picking an HA strategy that matches the site policy.
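
The RTO question above can be sketched as a simple screening step: drop the options whose failover interruption exceeds the RTO, then rank the rest by price. This is a minimal sketch under stated assumptions: "seamless" failover is modelled as 0 seconds, the disk array interruption as the roughly 1 minute mentioned above, "DB v2" is assumed to fail over like "DB v1" because it shares its architecture, and the $100,000 per hour downtime cost is just the example figure from the introduction. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    total_cost_k: int        # quoted project total, in $K
    failover_seconds: float  # assumed service interruption at failover


OPTIONS = [
    Option("DB v1", 611, 0.0),        # "seamless", modelled as 0 s (assumption)
    Option("DB v2", 518, 0.0),        # same architecture as DB v1 (assumption)
    Option("Disk array", 561, 60.0),  # "around 1 minute"
]


def shortlist(options, rto_seconds):
    """Keep the options that meet the RTO, cheapest first."""
    eligible = [o for o in options if o.failover_seconds <= rto_seconds]
    return sorted(eligible, key=lambda o: o.total_cost_k)


def interruption_cost(option, downtime_cost_per_hour):
    """Cost of a single failover interruption at a given hourly downtime cost."""
    return option.failover_seconds / 3600 * downtime_cost_per_hour


# Example: a 30-second RTO and the illustrative $100,000 per hour downtime cost.
for option in shortlist(OPTIONS, rto_seconds=30):
    print(option.name, f"${option.total_cost_k}K")

disk_array = OPTIONS[2]
print(f"one disk array failover: ${interruption_cost(disk_array, 100_000):,.2f}")
```

A screening step like this only covers the first-order numbers; the virtualization and licensing effects described in the last bullet would need their own cost terms.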

Of course these are just the fundamentals and there are many more factors to consider. Any HA project that’s undertaken must take into account the indirect, hidden, and long-term cost implications.

Summary

  • With small installations, achieving HA and business continuity of your database-oriented software stack can be implemented with various strategies, including hardware or software solutions at an almost identical cost.
  • Don’t immediately resort to a hardware-only based solution, at least not before proper consultation. Think strategically and ask yourself whether your architecture will need to scale. Cost implications here will be significant.
  • Don’t rely solely on software until your HA solution covers the complete context. You chose software for agility. You don’t want to close your budget with a business continuity architecture that covers only a fraction of your critical systems.
  • Apply methodical analysis during the decision process.



Comments


> HA is achieved through RAC managing redundancy at all levels.
It doesn't provide redundant data, right? All RAC nodes share the same SAN. If something goes wrong with your SAN all nodes are "down".

Hi Pawel
Great post with lots of interesting data points that we sometimes tend to forget when we’re dealing with disaster recovery.

IMO the main lesson is to design our systems to cope with failure rather than trying to design them to prevent failure from happening.
That requires that the application and the software be designed in a way that makes them aware of failure and able to handle it.

Interestingly enough, I just published a new post on a similar topic, describing how a top Wall Street firm handled that with our Elastic Middleware and the automation that comes with it.

See the reference to my post in that regard here


Dave: but we do have redundancy at all levels in this architecture. The proposed database clustering scenario consists of two servers, two disk arrays, and two redundant sets of network resources to communicate between them. This is what was meant by the mention of storage redundancy managed by ASM. You are correct, however, that this is not the default RAC configuration, and RAC alone does not imply redundancy at all levels.

> clustering software such as Red Hat Cluster Suite. Those with cost as the number one priority might consider open-source disk array software replication methods such as DRBD.

Have you tried Oracle with the Gridlock high availability cluster (http://www.obsidiandynamics.com/gridlock)? It also does off-site replication and integrates with DRBD. Last time I used it, it felt more responsive than Oracle Clusterware. It's also cheaper to run, as it doesn't require quorum.

Oracle is great; I still always use Oracle 6i.

I have heard companies claim that the hardware only option is more cost effective and I have heard others claim the same about software for business continuity. I think the choice that fits a business best all depends on the size of the business and the current market.

Thank you for the comment, disaster recovery software. This is a very true statement and one that many companies need to remember before making the decision.

