Disaster Recovery and Data Integrity


Introduction.  A disaster recovery plan attempts to anticipate what disasters could hit an organization and sets out how to respond to them.  The organization needs to develop approaches that will diminish the impact of potential disasters and prepare for quick restoration of services.  Planning starts with a few questions:
  • What disasters could affect your site?
  • What are the chances that each of these disasters might occur?
  • What would be the costs for each of these disasters if they do occur?
  • How quickly do various aspects of the organization need to recover?
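
One rough way to combine the second and third questions is to multiply each disaster's estimated yearly probability by its estimated cost, giving an expected annual loss that can be used to rank the risks.  The sketch below assumes that approach.  The probabilities and costs are invented illustrative figures, not data from any real site, and the name rank_risks is hypothetical.

```python
# Hypothetical risk register: (disaster, estimated yearly probability, estimated cost).
# All figures are invented for illustration only.
RISKS = [
    ("earthquake", 0.01, 5_000_000),
    ("flood", 0.05, 400_000),
    ("massive power outage", 0.5, 20_000),
]

def rank_risks(risks):
    """Return (name, expected annual loss) pairs, largest loss first."""
    scored = [(name, prob * cost) for name, prob, cost in risks]
    return sorted(scored, key=lambda item: item[1], reverse=True)

for name, loss in rank_risks(RISKS):
    print(f"{name}: expected annual loss about {loss:,.0f}")
```

A ranking like this is only as good as the estimates behind it, and it says nothing about how quickly each service must recover, so it complements the fourth question rather than answering it.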

A disaster with respect to computing services is a catastrophic event that causes a massive outage or reduction in performance affecting an important aspect of the computing system.  The following list outlines some potential disasters.

  • Natural disasters
    • earthquake
    • hurricane
    • tornado
    • plague
    • lightning strike
    • fire
    • flood
  • Unnatural disasters
    • bomb
    • massive power outage
    • system intrusions

Risk Analysis.  The first step in developing a viable plan is risk analysis.  Often this is best performed by outside specialists.  Some risks are much more likely than others, depending on where the organization is located.  Consider firms that have their buildings on or near fault lines.  These firms are much more at risk from earthquakes than firms far from known earthquake zones.  Other firms may be close to a coastline that is prone to hurricanes.  In each case it is also important to assess what sorts of damage are most likely to occur.

More typical disasters involve power spikes or outages.  It is usually important to have some form of surge protection so that spikes cannot reach your systems directly.  It is also typical to have some level of backup power, such as batteries or even standalone generators.  How much of this is necessary depends on the situation, so it is important to evaluate these eventualities.

Some firms may also have legal obligations to other firms.  For example, automatic teller machines for a fairly large number of banks went down when flooding occurred in New Jersey.  All of these banks were outsourcing their ATM service to the same vendor.  When unexpected flooding took out the provider, many banks were without ATM service for almost a week.

Internet infrastructure providers can have similar obligations.  One stock exchange had contracted with several firms to ensure a continuous flow of information between San Francisco and New York City.  Their thinking was that having alternative connections would make them much less vulnerable to outages.  One day a backhoe operator in California cut a single major line, and this alone took down the data flow.  The exchange had not known that all of its vendors were contracting their cabling from the same source.
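
The exchange's problem was a hidden shared dependency.  Assuming each vendor can be made to disclose the components its link requires, a check like the following would flag anything that every vendor relies on.  This is only a sketch: the vendor and component names are hypothetical, and it assumes a vendor's link fails if any one of its listed components fails.

```python
# Hypothetical map of each contracted vendor to the components its link requires.
# A vendor's link is assumed to fail if ANY one of its components fails.
VENDOR_COMPONENTS = {
    "vendor_a": {"exchange_sf_1", "trunk_line_1"},
    "vendor_b": {"exchange_sf_2", "trunk_line_1"},
    "vendor_c": {"exchange_sf_3", "trunk_line_1"},
}

def shared_single_points(vendor_components):
    """Return the components that every vendor's link requires.

    If this set is non-empty, losing one of those components takes down
    all vendors at once, so the vendors are not truly independent.
    """
    component_sets = list(vendor_components.values())
    if not component_sets:
        return set()
    return set.intersection(*component_sets)

print(shared_single_points(VENDOR_COMPONENTS))
```

An empty result does not prove independence, since two vendors out of three could still share a component, but a non-empty result is a clear warning.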

Any such legal obligations need to be worked out thoroughly in conjunction with the legal department.  These obligations translate into requirements in the disaster recovery plan.

Preparation.  It is almost always important to develop strategies to reduce the impact and costs of disasters.  Sometimes this can be done at little additional cost, though not always.  For example, the stock exchange could have limited its vulnerability to data flow breakdowns with just a bit more in-depth investigation of its vendors.  The impact on costs would have been fairly minimal.

In the case of potential flooding it may well be enough to make certain that particular buildings are not in flood plains and that critical capabilities are well out of reach of high water.  The impact of minor earthquakes can also be mitigated with little additional effort beyond what everyone everywhere should already be doing.

In spite of designing to diminish the impact of disasters, you will still need to be able to recover from the unforeseen.  It is important to be able to restore essential systems to working condition in a timely manner.

Restoring essential systems can mean rebuilding data and services on new equipment if the old equipment is not operational.  If this might prove to be the case, then you need to prearrange sources for replacement hardware.  This may also mean having agreements in place that give you priority over others in the region in having your needs fulfilled.  It also means having very good and relatively invulnerable backups stored elsewhere.

Power, telephone and network connectivity are often the most basic services you need to have functioning in order to get other services up and running again.

Data Integrity.  While we have touched on this only indirectly, you also need to make sure that data is not altered by external sources.  If it is, then you need to be able to restore it to what it should be.  Safeguards include:

  • Backups
  • Virus checking
  • Firewalls and security
  • Cleverness even in the face of the obvious

Loss of data integrity can arise from all kinds of disasters, including computer viruses and competitive espionage.  The ability to reproduce data can also be essential to proving intellectual property rights in a court of law.

Most of these safeguards fall under the aegis of security development for the overall system and should already be well taken care of.  It is still important to consider them when designing recovery plans.
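
One common way to detect whether data has been altered is to record a cryptographic checksum for each file when a known-good backup is taken and to compare against it later.  The following is a minimal sketch of that idea using SHA-256 from Python's standard library.  The function names and the in-memory manifest layout are assumptions made for illustration, not a prescribed tool.

```python
import hashlib
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def changed_files(old_manifest, root):
    """List files that are missing or differ from the recorded manifest."""
    current = build_manifest(root)
    return sorted(
        name for name, checksum in old_manifest.items()
        if current.get(name) != checksum
    )
```

In use, the manifest from build_manifest would be saved alongside the backup, somewhere an intruder cannot rewrite it, and changed_files would be run later against the live data.  Note that this sketch reports altered and missing files only; it does not report files added since the manifest was taken.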

The Icing.  In some instances it may be necessary to have redundant sites.  These sites may share the load during normal operations.  If one site is lost, the remaining sites should be configured to take on its demands, even if this results in some slowdowns.  Such options can be very important to consider in both the initial and the continuing design.
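
Whether the surviving sites can absorb a lost site's load is simple arithmetic.  The sketch below assumes each site has a known capacity and a normal load measured in the same arbitrary units.  The site names and numbers are hypothetical.

```python
# Hypothetical sites: name -> (capacity, normal load), in the same arbitrary units.
SITES = {
    "east": (100, 70),
    "west": (100, 70),
    "central": (50, 30),
}

def sites_whose_loss_overloads(sites):
    """Return the sites whose failure would exceed the remaining capacity.

    For each site, assume it is lost and its load must be carried by the
    others; the total load is compared with the survivors' combined capacity.
    """
    total_load = sum(load for _, load in sites.values())
    total_capacity = sum(capacity for capacity, _ in sites.values())
    at_risk = []
    for name, (capacity, _) in sites.items():
        remaining_capacity = total_capacity - capacity
        if total_load > remaining_capacity:
            at_risk.append(name)
    return at_risk

print(sites_whose_loss_overloads(SITES))
```

With these made-up figures, losing either of the two larger sites leaves too little capacity, which is the kind of finding that should feed back into the design.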

It is also important to be prepared for dealing with the media.  The media will want to know:

  • What happened?
  • What effect is it having on the organization?
  • When will services and capabilities be restored?

You need to be able to give the media answers to these questions.

It is usually important to have a link to public relations personnel, either internal or external or both.  It is also important to plan ahead for how you will deal with the media.  Developing strategies and putting together press releases at the last minute, in the middle of a disaster, can be nearly impossible.  The plans need to include decision makers and those at the highest levels of the chain of command.