Webinar: Why Your DR/HA Systems Will Fail…

May 20, 2009

by Doron Pinhas
VP, Field Operations

Last Thursday I had a great webinar discussion with analyst Christine Taylor of The Taneja Group about one of the greatest threats to recoverability and HA: configuration drift. The event was called "Why Your HA/DR Systems Will Fail…and How to Make Sure They Won’t." If you couldn’t join us live, the webinar is now available on demand. Just go to our website (http://www.continuitysoftware.com) and click on the link under Latest Webinars.

When configuration drift occurs, and it is inevitable, the configuration of your production (primary) infrastructure diverges from that of your recovery (secondary) infrastructure. This creates serious data protection and host configuration gaps that threaten your ability to meet your Recovery Point and Recovery Time Objectives.

Christine and I covered a lot of topics during our conversation, including:

  • Why configuration drift is a process problem, not a technology problem
  • Why disaster recovery and availability testing falls short of addressing the issue
  • How automated testing and monitoring solutions from companies like Symantec and Continuity Software are helping companies bullet-proof their DR/HA strategies

In addition, I provided a detailed look at several common recoverability/availability gaps that are created by configuration drift, why they occur, how they will impact operations, and how you can avoid them.

I hope you get a chance to tune in. I think you’ll find it worthwhile.


Gap Analysis #6: Configuration Drift between Production and HA

May 16, 2009

by Yaniv Valik
SR DR Specialist, DR Assurance Group

Here’s a gap that we frequently see in HA environments.

Gap: Configuration Drift between HA Cluster Nodes

Risk: Downtime; manual intervention needed to recover

How does it happen?
While there are many ways this can occur, let’s look at one example. The currently active node is configured with redundancy at the HBA level and in the DNS configuration, but the passive node is not. A configuration with a single HBA or a single DNS server is a single point of failure, so after a fail-over or switch-over to the currently passive node, the applications running on this cluster will suffer from reduced availability (lower MTBF) and more downtime. In addition, the passive node is configured with a significantly lower limit on the maximum number of open files, which may lead to application failures. Moreover, the passive node has only 1GB of swap, while the active node was configured with an additional 4GB, so after fail-over the applications may not have sufficient memory to run properly. Lastly, differences in installed products may have various impacts, depending on the product type.
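
The kind of comparison described above can be sketched in a few lines of Python. This is only an illustration, not how any particular product detects drift: the setting names, the node snapshots, and the helper functions find_drift and single_points_of_failure are all hypothetical, and the values simply mirror the example (the active node's swap is assumed to be 1GB plus the additional 4GB, and the open-file limits are made up).

```python
# Hypothetical snapshots of each node's settings. Every key and value here is
# an illustrative assumption; a real check would collect them from the hosts.
ACTIVE_NODE = {
    "hba_count": 2,
    "dns_servers": 2,
    "max_open_files": 65536,  # made-up limit
    "swap_gb": 5,  # assumes 1GB base plus the additional 4GB
    "installed_products": frozenset({"product-a", "product-b"}),
}
PASSIVE_NODE = {
    "hba_count": 1,
    "dns_servers": 1,
    "max_open_files": 1024,  # made-up limit
    "swap_gb": 1,
    "installed_products": frozenset({"product-a"}),
}


def find_drift(active, passive):
    """Return {setting: (active_value, passive_value)} for every difference."""
    drift = {}
    for key in sorted(set(active) | set(passive)):
        a, p = active.get(key), passive.get(key)
        if a != p:
            drift[key] = (a, p)
    return drift


def single_points_of_failure(node):
    """List the redundant elements for which the node has fewer than two."""
    return [key for key in ("hba_count", "dns_servers") if node.get(key, 0) < 2]


if __name__ == "__main__":
    for setting, (a, p) in find_drift(ACTIVE_NODE, PASSIVE_NODE).items():
        print(f"{setting}: active={a!r} passive={p!r}")
    print("Passive node SPOFs:", single_points_of_failure(PASSIVE_NODE))
```

Running a comparison like this on a schedule, rather than only before a DR test, is what catches drift introduced by day-to-day changes on one node.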

What is the impact?
This will vary depending upon the specific drift, but can include a failure to fail over or switch over to the other node (causing downtime), or reduced performance after fail-over/switch-over, which will at best create an operations slowdown and at worst leave the node unable to carry the load.

Can it happen to me?
This situation occurs frequently in HA environments. The configuration of a host involves so many details that it is very difficult to ensure an HA server is fully synchronized with its production host at all times.

If you would like to read more gap analyses, go to our website at http://www.continuitysoftware.com/commongaps


Discussion: DR Strategy for VLDB

May 8, 2009

by Yaniv Valik
SR DR Specialist, DR Assurance Group

This discussion shows how important it is to choose the right replication solution…one that can carry the load.

http://sql-server-performance.com/Community/forums/t/30019.aspx

