
Entries in disaster recovery (6)


Possible Human Error Causes Rapid Market Decline

I was going to cover human error in a later post, but in light of today's alleged incident in the U.S. markets, which caused a drop of nearly 1,000 points in the Dow, I thought I would at least touch on it now.

Here is what we currently know about the alleged incident, which is still being investigated: a trader at Citi entered a "B" for billion instead of an "M" for million. This in turn caused a nearly $10 dip in the PG stock price {CORRECTION: it was about $20, give or take a few}, and subsequently caused the rest of the market to drop before recovering.

On the good side, these things can be researched and fixed. However, in an already skittish and heavily volatile market, they can easily cause additional panic among average investors.

In addition to other issues, it has already been proposed that someone else who knows what they are doing could easily manipulate the market and cause an intentional crash (I believe the commentators were implying a cyber-terror event).

Already the investigations by the SEC have begun and we will have to wait and see what the outcome will be.

Look for a future post on Human Error and the Impact on Business coming soon.


Business Continuity - An Overview

By Sam Neal

An Introduction to Business Continuity

Business continuity planning, encompassing disaster recovery, minimises the impact of an incident on an organisation by ensuring alternate processes are in place for key operational functions. Business continuity planning looks to preserve assets as well as an organisation's ability to achieve its mission, retain acceptable levels of productivity, customer service, and ultimately to stay in business.

Can an organisation be too small for business continuity planning? Business continuity planning is not consigned to large organisations; any provider of a service or product, whether it is financial, manufacturing, distribution or sales, is equally exposed to the effects of a disaster. Are you prepared if something goes wrong?

Surely a business continuity plan is not needed if adequate insurance is in place?

Quite simply, insurance does not buy back lost business; it only provides money. If this is not received immediately it could adversely affect cash flow, subsequent profits and client goodwill. Studies suggest that typically only 60% of actual losses are covered. Could your organisation survive the loss?

Disaster does not just occur following an incident on a grand scale. A small incident, over a short period, impacting a key process, could severely disrupt an organisation; for example, an incident in the local area that requires evacuation of the premises for hours or even days. Computers still run, phones still work and infrastructure is unharmed, but there is no access to any of it until the incident is resolved.

Interruption threats come from multiple sources, some more likely than others. Premises may be substantially flooded, destroying servers, or an organisation may be the victim of theft. A business continuity plan examines the likelihood of such events and considers a response relative to the risk.
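As a rough illustration of weighing likelihood against impact, the sketch below ranks a few of the threats mentioned above. The 1 to 5 scales, the individual scores and the function names are assumptions made purely for the example; they are not figures from any study.

```python
# Hypothetical risk ranking: score = likelihood x impact, both on a 1-5 scale.
# All scores below are illustrative assumptions, not measured data.

def risk_score(likelihood, impact):
    """Combine likelihood and impact into a single comparable score."""
    return likelihood * impact


def rank_threats(threats):
    """Return the threats sorted from highest to lowest risk score."""
    return sorted(
        threats,
        key=lambda threat: risk_score(threat["likelihood"], threat["impact"]),
        reverse=True,
    )


threats = [
    {"name": "Local evacuation, no access for days", "likelihood": 3, "impact": 3},
    {"name": "Flooded server room", "likelihood": 2, "impact": 5},
    {"name": "Theft of equipment", "likelihood": 3, "impact": 2},
]

for threat in rank_threats(threats):
    score = risk_score(threat["likelihood"], threat["impact"])
    print(f"{score:>2}  {threat['name']}")
```

The threats at the top of the list are the ones a plan would address first; the scoring scheme itself is something each organisation would need to agree for its own circumstances.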

It is vital to determine what would be addressed first following an incident. Who would be contacted first? How would staff be notified? To do this you need to examine your organisation, its people, its critical processes and how these are dependent upon considerations such as IT and infrastructure support, internal dependencies and suppliers.

Incident containment and recovery solutions are numerous and varied. If a flood, for example, prevented access to your premises, could client service levels continue uninterrupted? The chance of maintaining service would be greatly increased if your staff could log in from home until full recovery is achieved. Without plans such as this in place, how can you convey a level of operational confidence to your clients?

There are many factors and aspects of business continuity. It is important to be realistic and think sensibly about how your organisation would cope with a disruptive incident. Business continuity is about mitigating the impact of this incident by minimising financial losses and protecting your organisation's reputation.

The solutions are not just quick fixes but long-term considerations. It is possible to survive an incident, but not necessarily possible to recover from the long term impact.

Where do I start?

Business continuity concerns each and every organisation. Business systems must be resilient. If business continuity planning fails, so does that of an organisation's clients. Not being able to access data, emails, and premises, or even make a phone call, all have the potential to damage a business - and that is only the start. A second reason why business continuity is vital is that organisations expect IT support on demand. A business should commit to investment in failover systems in multiple locations, home working and standby power generation on-site; this way directors can be confident that a robust set of business continuity contingencies will be there.

The following pages highlight some key areas of IT business continuity that an organisation should consider. Business continuity is a huge area and this is by no means a definitive guide. What this section will hopefully do is stimulate thoughts and further questions about how you can implement cost-effective IT business continuity plans.

What options are there?

IT business continuity planning needs to address both the hardware and data contained within the system. This section highlights some of the ways you can build protection around your system. It is essential to ensure comprehensive planning is in place by using highly resilient servers, secondary power supplies, dual Internet connections, redundant storage and uninterruptible power supplies. As well as this, it is recommended that companies use thin client technologies, such as Citrix and Microsoft® Terminal Services, for remote access, and virtual servers to provide both flexibility and resilience.


You can build a lot of resilience into your IT system hardware. The aim when creating a resilient system is to remove any single point of failure. Hard disks used to store your applications and data are a likely point of failure, making them an area of risk and a key place in which to build resilience. You can build storage resilience by using a Redundant Array of Inexpensive Disks (RAID). By using RAID your system can lose a hard disk and still function without interruption, giving you time to replace the failed disk.
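To show why an array can lose a disk and carry on, here is a minimal sketch of the parity idea used by some RAID levels (RAID 5, for example). It is a simplification under stated assumptions: the three "disks" are just short byte strings invented for the example, and real controllers work on fixed-size stripes and spread parity across the disks. Mirrored arrays (RAID 1) achieve resilience differently, by keeping a full copy.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR several equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Compute the parity block that is stored alongside the data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing block from the survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity])


# Three illustrative "disks", each holding one 8-byte block.
disks = [b"ACCOUNTS", b"INVOICES", b"PAYROLL!"]
parity = make_parity(disks)

# Simulate losing the second disk and rebuilding it from the others.
rebuilt = rebuild_missing([disks[0], disks[2]], parity)
print(rebuilt)  # b'INVOICES'
```

Because XOR-ing everything that survives with the parity block reproduces the missing block, the system can keep serving data while the failed disk is replaced. Losing two disks at once is not recoverable with a single parity block.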

Another way to build resilience is to address the potential failure of power supplies. IT systems prefer clean power supplies; power outages or even dirty power can cause serious problems. You can build resilience into your servers by having hot-spare power supplies receiving power from different sources. This way, if one source fails the other continues whilst the failed supply is fixed. As a minimum you should have all your servers on Uninterruptible Power Supplies, or UPSs as they are more commonly referred to. UPSs continually clean and smooth the spikes out of the power that is provided. In the event of a power outage, UPSs keep servers running long enough to safely close them down or switch to an alternative power supply. If you cannot afford to have servers down, then you need to consider alternative power supplies like standby generators that kick in automatically if they detect a power outage.

Using more than one Internet Service Provider (ISP) builds added resilience into your communications infrastructure. If one communication link fails, the other can take over. However, just having different ISPs providing broadband connections is not always enough. A further consideration should be to ensure your links to the Internet do not use the same means of connection. ISPs often use the same cable and exchange, meaning that should there be a problem between your office and the exchange, it is likely you will lose both connections. To avoid this, it is recommended that you implement an alternative method of connecting to the Internet, such as a radio link.
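The failover logic described above can be sketched in a few lines. This is only an illustration: the link names are hypothetical, and the health check is simulated with a dictionary, whereas in practice failover is usually handled by the router or firewall and the check would be a real probe over each link.

```python
def choose_link(links_in_priority_order, is_up):
    """Return the first link, in priority order, that passes the health check.

    `is_up` is any function taking a link name and returning True or False.
    Returns None when every link is down.
    """
    for link in links_in_priority_order:
        if is_up(link):
            return link
    return None


# Simulated link states (assumed for the example): the cable-based broadband
# link has failed at the exchange, while the radio link is still healthy.
link_status = {"broadband-isp-a": False, "radio-link-isp-b": True}

active = choose_link(
    ["broadband-isp-a", "radio-link-isp-b"],
    lambda name: link_status[name],
)
print(active)  # radio-link-isp-b
```

The sketch only helps if the two links fail independently, which is exactly why the second connection should not share the same cable and exchange as the first.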

Virtual Servers

Up until recently servers were built and optimised for the hardware and operating system they were running on. Now, with the availability of more powerful hardware, these physical servers can host multiple operating systems. Each hosted operating system is known as a virtual server. These virtual servers run their own operating systems independently of the host and the other virtual servers. Because they are no longer dependent on the hardware they are running on, it is now very easy to transfer or replicate a virtual server from one physical host to another dissimilar physical host. For business continuity purposes, restoring a physical server onto dissimilar hardware is a long and complicated process, but with virtual servers the process is far easier and takes a lot less time due to their hardware independence.

Another advantage of virtual servers is that it is possible to run more than one virtual server on a physical host server, thus taking advantage of any spare processing capacity on the server. Also, in a business continuity scenario it is possible to have a few powerful physical servers hosting a number of virtual servers at a remote location, be it a branch office or a hosting centre. Virtual servers can be easily replicated or restored onto these hosts at the other location ready to be enabled in the case of a business continuity scenario.

Thin Clients

For a number of years now it has been possible to access systems remotely as if you were sitting at your computer in the office. Typically you would have a Citrix server, or servers, hosting thin client sessions for each of your users. Users might be sitting in the head office, at a branch or even at home, and can access a server via the Internet. Thin clients offer great advantages in business continuity planning; for example if Citrix servers were used at both the office and the branch office or hosting centre and an incident occurred it would be easy to redirect Citrix thin client sessions to the other Citrix server. This would allow the workforce to carry on working unaffected by the incident.


In order to reduce the time it takes to recover a server or data, replication should be considered. There are a number of different ways of replicating servers and data to other storage devices or servers. By using other storage devices data still has to be recovered. However, if data is replicated to other standby servers it is simply a case of enabling the servers, meaning you can be up and running again quickly using a recent copy of your data. Ideally these standby servers, with the replicated data on them, would be housed at a different location, be it a branch office or a hosting centre.

What about my data?

Having considered your hardware, you also need to address the challenge of protecting your data. Both traditional solutions and new emerging technologies play a key role in comprehensive data protection.

To ensure internal data is protected it is desirable to have implemented a series of solutions. In addition to traditional tape backups, many organisations have implemented technology such as Microsoft® System Center Data Protection Manager (DPM). Due to the massive business benefits DPM offers, it is considered a key part of any comprehensive business continuity plan.

Traditional Tape Backup

Tapes have traditionally been the most widely used form of backing up data on an IT system. During off-peak hours, the system is backed up to tape. Tapes should then be checked to see if the process has been successful and then taken off-site. This off-site location ensures protection of the data should an incident such as a fire occur.

Backup tapes are a great form of cost-effective backup, but it is important to be aware of associated limitations. A large amount of data can be backed up onto one tape with the process typically being performed out of hours. This in itself might not suit some companies as off-peak hours are less common due to flexible working. Because of the way data is backed up onto tape, recovery times can be quite lengthy as the data has to be located on the tape before it can be restored. In addition, if an incident occurs at the end of the working day, the recovery point back to the last backup would be the night before, meaning that you could lose an entire day's work.
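The recovery point limitation can be made concrete with a short calculation. The sketch below assumes a single nightly backup at 23:00 and an incident at 17:30 the following day; both times, and the function names, are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def last_backup_before(incident, backup_hour=23):
    """Return the most recent nightly backup time at or before the incident."""
    candidate = incident.replace(hour=backup_hour, minute=0, second=0, microsecond=0)
    if candidate > incident:
        # Tonight's backup has not run yet, so fall back to last night's.
        candidate -= timedelta(days=1)
    return candidate


def work_at_risk(incident, backup_hour=23):
    """Return how much elapsed time separates the incident from the last backup."""
    return incident - last_backup_before(incident, backup_hour)


incident = datetime(2010, 5, 6, 17, 30)  # end of the working day
print(last_backup_before(incident))      # 2010-05-05 23:00:00
print(work_at_risk(incident))            # 18:30:00
```

Everything changed in that window, which here covers the whole working day, would have to be redone after a restore from tape.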

Continuous Data Protection

Continuous data protection is a solution where, as the name suggests, a system's data is continually being backed up. This removes the issues associated with traditional tape backups in that downtime is not necessary as your data is being backed up continuously as changes are made. In order to enable this type of solution, adequate disk storage is required to store the most recent revised data. A snapshot of this data can then be taken periodically; for example daily, and the snapshots can be backed up to tape for longer term storage at your leisure.
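The idea of backing up changes as they are made, rather than recopying everything, can be sketched as follows. This is a generic illustration and not how any particular product works internally: the tiny 4-byte block size, the sample data and the function names are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, so the example is easy to follow


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_blocks(old, new, size=BLOCK_SIZE):
    """Return {block_index: new_bytes} for every block that differs from `old`."""
    old_blocks = split_blocks(old, size)
    changes = {}
    for index, block in enumerate(split_blocks(new, size)):
        is_new = index >= len(old_blocks)
        # Compare digests, as a protected server and a backup server might,
        # instead of shipping whole blocks just to compare them.
        if is_new or hashlib.sha256(block).digest() != hashlib.sha256(old_blocks[index]).digest():
            changes[index] = block
    return changes


def apply_changes(replica, changes, new_length, size=BLOCK_SIZE):
    """Bring the backup copy up to date using only the changed blocks."""
    blocks = split_blocks(replica, size)
    for index, block in changes.items():
        if index < len(blocks):
            blocks[index] = block
        else:
            blocks.append(block)
    return b"".join(blocks)[:new_length]


original = b"2010 sales: 100 units"
updated = b"2010 sales: 250 units"

delta = changed_blocks(original, updated)
print(delta)  # {3: b'250 '}
print(apply_changes(original, delta, len(updated)) == updated)  # True
```

Only one 4-byte block crosses the wire instead of the whole file, which is the reason change-based protection needs so much less bandwidth than a full copy.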

Microsoft® System Center Data Protection Manager (DPM) is a solution based on near continuous data protection.

DPM constantly monitors protected servers and copies only the changes saved on a protected server to a DPM server. A major advantage of only bringing the changes across is the significantly reduced bandwidth required to protect the server. Because of this reduced bandwidth it is possible to protect servers in branch offices across a wide area network. DPM is also Microsoft® application aware, meaning that it is compatible with applications such as Microsoft® Exchange and Microsoft® SQL and can therefore protect these accordingly.

By using snapshots and by being application aware, DPM can restore Exchange or SQL to within the last 15 minutes. It can also provide up to 512 recovery points by creating periodic snapshots. Snapshots can be created as often as every half hour if required, but typically they are created at least once a day. Performing one snapshot a day and capturing changes every 15 minutes means you could have nearly 50,000 recovery points and potentially be able to recover data to any 15-minute point in time over roughly the past year and a half. Realistically, though, you would normally keep two weeks' to a month's data on disk and then offload this to tape for long-term protection.

DPM has been written with ease of use as a priority. Unlike recovering items from traditional tape backups, it is very easy to use the DPM console to find the item you wish to recover, view all its potential recovery points and then recover it to its original location or copy it to a new location. This process takes far less time than it would to recover information from tape. If enabled, it is even possible for users to view previous versions of files and recover them without having to involve their IT departments.
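The recovery point figures quoted above can be checked with some simple arithmetic. The sketch below uses only the numbers given in the article (512 snapshots, one snapshot a day, changes captured every 15 minutes); the function names are illustrative.

```python
MAX_SNAPSHOTS = 512          # figure quoted in the article
SYNC_INTERVAL_MINUTES = 15   # figure quoted in the article


def recovery_points(snapshots=MAX_SNAPSHOTS, sync_minutes=SYNC_INTERVAL_MINUTES):
    """Recovery points available when each daily snapshot covers a full day of syncs."""
    syncs_per_day = (24 * 60) // sync_minutes
    return snapshots * syncs_per_day


def retention_years(snapshots=MAX_SNAPSHOTS, snapshots_per_day=1):
    """How far back the snapshots reach, in years."""
    return snapshots / snapshots_per_day / 365


print(recovery_points())             # 49152
print(round(retention_years(), 2))   # 1.4
```

So 512 daily snapshots with 96 fifteen-minute capture points each give 49,152 recovery points, which is the "nearly 50,000" in the text, reaching back about 1.4 years.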

Another factor DPM addresses is human error. Traditional tape backups require someone to check the previous night's backup and swap the tapes. Quite often it is assumed that last night's backup happened without any problems and the tapes are duly swapped. If for some reason the backup failed and no one noticed, the tape would be useless. DPM can back up from itself to another DPM server in another location, across the Internet or a wide area network. This can happen automatically and does not require human intervention. Using this method an off-site copy of the system is automatically provided each day. Though tape backups are still recommended for longer term storage, this automatic backup reduces the need to rely solely on them.
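Whatever backup method is used, the "nobody checked last night's backup" problem can be reduced by verifying copies automatically. The following is a minimal sketch, not a feature of DPM or of any tape software: it compares SHA-256 checksums of every file in a source folder against a backup folder. The function names and folder layout are assumptions for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Return relative paths that are missing from the backup or differ from the source."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for source_file in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source_file.relative_to(source_dir)
        copy = backup_dir / relative
        if not copy.is_file() or sha256_of(copy) != sha256_of(source_file):
            problems.append(relative.as_posix())
    return problems


# Example usage (the paths are hypothetical):
#   failures = verify_backup("/srv/data", "/mnt/offsite/data")
#   if failures:
#       print("Backup needs attention:", failures)
```

An empty list means every source file has an identical copy in the backup. Scheduling a check like this and alerting on any non-empty result removes the reliance on someone remembering to look.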

In the event of a major incident at your main site, data on your second DPM server can be quickly and easily restored onto alternative servers meaning that you could be up and running quickly. Combine this with virtual servers and thin clients and you have a very cost-effective business continuity plan.

IT support & software reseller JMC is an IT & Communications company based in Manchester, UK. They are a Microsoft Gold Certified Partner & Pegasus Strategic Partner and specialise in business solutions for organisations of any size including some of the biggest sporting organisations in the world. They offer a complete range of tailorable IT products including Microsoft Dynamics GP, Microsoft Dynamics NAV, and Pegasus Opera II.



Plane Crash in Palo Alto Causes Many Businesses and Community to Activate Disaster Plans

As I often say, business disruptions come in many forms, not just from earthquakes, tornadoes and hurricanes.

This point was made clear again by the events of yesterday morning, when a small plane struck a tower that was a single point of failure carrying electricity to the Palo Alto power grid.

The loss of power disrupted roughly 28,000 customers, including 240 startups and high-tech companies such as Facebook's headquarters (the Facebook site was unaffected), VMware, and HP. At least two medical facilities were on back-up generators and had to cancel elective surgeries and re-route emergency patients to other area hospitals.

At least two cell towers were not working, some land-line services were disrupted, and the area's banks had to activate their contingency plans.

Authorities were asking people not to call 911 regarding the outage, and to reduce water usage.

You can see more on the City of Palo Alto site.

Tesla Motors suffered the worst loss. The three people who died in the plane were employees of the company, and my thoughts and prayers go out to them, their business, and family members who are affected by the loss.

If any good might come from such a tragedy I hope that it makes people aware how vulnerable we are to events such as this, both in our personal lives and from the point of view of business.





Free Report: How to Create a Corporate Culture Dedicated to Business Continuity

This is a recent article I created and give to my clients through my business website at Continuity Corporation. Recently people have been doing some relevant searches here on my blog, so I thought I would make a few small changes and share one of them here.

How to Create a Corporate Culture Dedicated to Business Continuity -


If you would like more of these please let me know.


Disaster Tip of the Week: Backup Your Files

I've been telling people to back up their data for almost fifteen years now, and I wrote about it in one of my first articles The Importance of Data Backups back in 2005. However, I still constantly run into people and businesses that fail to make frequent data backups and see stories in the news about data loss all the time.

The Sidekick and Snow Leopard issues that I posted about recently also show how vulnerable we can be when we do not properly make these data backups.

Manually backing up your data can save you a lot of headaches in the long run, including issues with compliance and regulatory requirements.

Back when I wrote the article above, someone sent me an email stating that when they have a data loss, they just have the chance to do it all over again and do it better. This may be a great outlook to have, but not a very practical or cost effective one.

In The Cost of Lost Data, a Pepperdine University report updated in 2003 (pre-Sarbanes-Oxley), Dr. David Smith estimates the average cost of irrecoverably lost data at more than $10,000 per megabyte lost. This does not take into account the value of the lost data, which on average is about $3,400.00 per incident.

In addition to backing up your data, I would also recommend making copies of your vital records and other business documents essential to your operations (electronically if you can) and storing those off-site as well.

You also don't need an expensive solution. Even if you are a small business, an individual, or on a shoestring budget, there are still steps you can take to save and back up your data without breaking the bank.
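As one example of a no-cost step, here is a minimal sketch that zips a folder into a date-stamped archive using nothing but the Python standard library. The function name, the folder names and the archive naming scheme are my own assumptions for illustration; it is a starting point, not a complete backup solution.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_folder(source_dir, destination_dir, now=None):
    """Zip `source_dir` into `destination_dir` and return the archive's path.

    The destination should sit outside the source folder, ideally on a
    different drive, so the archive is not swept into the next backup.
    """
    source_dir = Path(source_dir)
    destination_dir = Path(destination_dir)
    destination_dir.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    base_name = destination_dir / f"{source_dir.name}-{stamp}"
    archive = shutil.make_archive(str(base_name), "zip", root_dir=str(source_dir))
    return Path(archive)


if __name__ == "__main__":
    # Self-contained demonstration using a throwaway temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        documents = Path(tmp) / "documents"
        documents.mkdir()
        (documents / "ledger.txt").write_text("balance: 100")
        print(backup_folder(documents, Path(tmp) / "backups").name)
```

Run on a schedule and pointed at an external drive, a script like this covers the basics; the archive still needs to be copied off-site to protect against fire, flood or theft.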

For more excellent statistics on data loss, see this whitepaper from HP & Score:

Impact on U.S. Small Business of Natural & Man-Made Disasters