By Abe Covello, Structured Senior Systems Engineer
The Playing Field
Before any discussion of backup, archive and disaster recovery, you first need to define what you are talking about in each case. While everyone has a slightly different interpretation of what the terms mean, for me they are:
- Backup – Short-to-medium term storage, fast recovery. Recovers data lost to hardware or software failure or human error.
- Archive – Long-term storage, slow recovery. Primarily for compliance (SOX/HIPAA/PCI).
- Disaster Recovery – Short-term storage, medium-to-fast recovery. Recovers data after a medium-to-large scale event (fire, natural disaster or cyberattack) that affects the ability of the business to operate and generate revenue.
Backup and archive are both similar enough to discuss as a single solution. However, due to the number of stakeholders and moving parts, disaster recovery is best looked at as a separate — although related and intertwined — solution to backup. Both protect your business and your data from loss, often in similar ways. But the goal of each is very different.
For years, the prevailing view of backup was that it was an unpleasant necessity. Backups took many, many hours or days to complete. Data was written off to magnetic tapes, shipped to a warehouse somewhere, and often never thought about again. If a recovery was required, you hoped that the tape didn’t break, or the heads weren’t dirty. The recovery itself could take days.
Now, with the widespread use of high-capacity and inexpensive hard drives and fast deduplication brought on by faster processors, more and more companies are writing their backup data to disk in extremely short times. And with the ever-decreasing cost of high-speed internet, public cloud storage is becoming more and more popular as an option for both off-site backup storage and disaster recovery infrastructure. Backing up directly to tape is quickly becoming relegated to a handful of special use cases.
Data loss or corruption and hardware failure are not the only reasons to have a robust backup strategy. Increasingly prevalent is the infection of corporate computers by ransomware, where a company’s data is encrypted and held hostage until a ransom is paid.
A newer ransomware variant that is gaining prevalence is doxware (sometimes called leakware or extortionware), where the data is leaked to the public, or the attacker threatens to leak it, instead of just being encrypted. This is especially problematic if a company has intellectual property (IP) or patent data on its systems. To beat doxware, a good offense includes teaming a backup strategy with security best practices: deploying robust firewalls and endpoint protection throughout your organization, creating and following strong security and compliance policies, and leveraging today’s advanced security operations center (SOC) management tools (or even outsourcing to a reputable managed SOC service).
When planning a backup strategy, business requirements and technical requirements must be considered. How often are backup restores required? How do data restore requirements affect business operations? The more critical and time sensitive the need, the more robust the solution needs to be.
Tape vs Disk vs Cloud
Tape was used in the past because it was relatively inexpensive and offered high capacity. Its cons include slow speed (both to write and to recover) as well as the medium’s perceived lack of reliability. Disk used to cost more than tape, but disk prices have since fallen while capacities have grown. Disk offers the fastest backup and recovery times. Cloud eliminates the need to own your own physical infrastructure but relies on internet connectivity for restores. Recovering from cloud storage can be time-consuming, depending on the amount of data you are trying to recover and the speed of your connection to the cloud repository. However, it can be a reliable, cost-effective way to keep an air-gapped backup copy safely off premises.
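To put rough numbers on how long a cloud restore might take, here is a minimal Python sketch. The function name, the decimal unit conversion, and the 80 percent link efficiency default are assumptions made for illustration, not measured figures; real restores also depend on the provider, the backup software, and how busy the link is.

```python
def restore_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to pull data_gb gigabytes over a link.

    link_mbps is the nominal link speed in megabits per second.
    efficiency is an ASSUMED fraction of that speed actually usable
    for the restore (protocol overhead, other traffic on the link).
    """
    if data_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("data_gb must be >= 0, link_mbps > 0, 0 < efficiency <= 1")
    megabits = data_gb * 8 * 1000  # 1 GB = 8,000 megabits (decimal units)
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical example: 5 TB of backup data over a 100 Mbps connection.
    print(f"{restore_hours(5000, 100):.1f} hours")
```

Under these assumptions, restoring 5 TB over a 100 Mbps link takes roughly 139 hours, or close to six days, which is why the size of the data set and the speed of the connection both belong in the planning conversation.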
Ensuring that your backup strategy meets all of your relevant compliance requirements is very important as monetary fines for non-compliance can be extremely steep. Because of this, the backup strategy should be reviewed by both management and any relevant compliance or legal officers.
Disaster recovery (DR), if it was even considered at all, used to be an afterthought. As businesses become more connected and more reliant on technology and the associated infrastructure, disaster recovery has moved from something only large enterprises needed to consider to a business requirement for even small and mid-sized companies.
A robust disaster recovery strategy allows your business to minimize interruptions even through significant disrupting events. With a strong and tested DR plan in place, you can continue operations through dire events like fires, floods and crippling cyberattacks. Your customers can still place orders, your warehouse can continue to ship goods. These days, a strong disaster recovery strategy can help keep your doors open, your employees employed, and your customers happy — even if the unthinkable happens.
The reason for disaster recovery is to get your business up and running again ASAP after disruption. Decisions about the DR plan should be driven by the business decision makers and implemented (in part) by the technology department. Technology helps achieve the plan; it should not drive the plan.
The biggest factor in the cost and complexity of a disaster recovery solution is the targeted RTO (Recovery Time Objective) and RPO (Recovery Point Objective). The shorter the time frames for both, the higher the cost and complexity tends to be. Having the shortest possible RTO and RPO is indeed ideal, but if your business can function after losing eight or even 12 hours of data, then you can save significantly compared to the cost and complexity required for an RTO/RPO of a few minutes.
What are RPO and RTO, and how do they affect cost and complexity?
Recovery Point Objective: RPO is determined by the amount of time between data protection events and reflects the amount of data that potentially could be lost during a disaster. Simply put: How much business data can you afford to lose? The shorter the RPO, the higher the cost to meet the requirement. While the lowest possible RPO is ideal, the actual number is based entirely on acceptable risk.
Recovery Time Objective: RTO is the targeted amount of time to recover from a data loss event and return to service. Simply put: How long can your business afford to be down? The shorter the RTO, the higher the cost to meet the requirement. As with RPO, the lowest possible number, while ideal, may not be the right business decision based on allowable risk.
RPO and RTO together will drive the requirements of a disaster recovery plan.
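One way to make these two numbers concrete is to check a proposed schedule against them. The sketch below is a simplified illustration: the class and function names are hypothetical, and it assumes the worst-case data loss equals the full interval between data protection events.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    backup_interval_hours: float    # time between data protection events
    estimated_restore_hours: float  # tested or estimated time to return to service


def meets_objectives(plan: RecoveryPlan, rpo_hours: float, rto_hours: float) -> dict:
    """Compare a plan against the business's RPO and RTO targets.

    Assumption: a disaster striking just before the next backup loses
    everything written since the previous one, so worst-case data loss
    is the whole backup interval.
    """
    worst_case_loss_hours = plan.backup_interval_hours
    return {
        "rpo_met": worst_case_loss_hours <= rpo_hours,
        "rto_met": plan.estimated_restore_hours <= rto_hours,
    }


if __name__ == "__main__":
    # Hypothetical plan: nightly backups, six-hour tested restore.
    nightly = RecoveryPlan(backup_interval_hours=24, estimated_restore_hours=6)
    print(meets_objectives(nightly, rpo_hours=12, rto_hours=8))
```

In this hypothetical case the nightly schedule satisfies an eight-hour RTO but not a 12-hour RPO, so the business would either need more frequent data protection events or a decision to accept the extra risk.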
Company-owned vs Cloud
If your company has a second location, either a company-owned facility or a data center colocation, the decision for replicating your data and having a second infrastructure can be an easy one. A disaster recovery location doesn’t need to be a mirror image of your primary data center. Often, reusing decommissioned servers and storage is a perfectly valid way to improve total cost of ownership (TCO) and still have an adequate disaster recovery location. Keep in mind, however, for the most robust disaster recovery there should be significant geographical separation between your sites.
If you don’t have a second location, or don’t want to rent space in a data center, replicating your data to the public cloud (such as Amazon Web Services, Google Cloud, or Microsoft Azure) and utilizing IaaS (Infrastructure-as-a-Service) for your hardware can achieve business resiliency with low initial costs. Just remember that costs can balloon unexpectedly if your actual usage (especially for IaaS services) is higher than forecasted.
Interconnections & Interdependencies
One thing to remember about disaster recovery is that it is a business-wide plan. It should involve all business units or departments to determine the applications and servers used by each group. Make sure to have the IT or Technology group map how applications affect and connect with one another. Having a replication of your ERP server doesn’t help much if you forget your SQL server. Have each group assign a priority to each application or service so that the most important applications and systems can be identified and given the highest replication priority, then continue down the line.
By following this game plan, disaster recovery is much more efficient because the most important systems are already identified and can be dealt with before less impactful systems (e.g., ERP and SQL before file services before print servers).
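The combination of dependency mapping and business priority can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the application names and priority numbers are hypothetical, and the function simply recovers the highest-priority application whose dependencies are already back online.

```python
import heapq


def recovery_order(priority: dict, depends_on: dict) -> list:
    """Return an order in which to recover applications.

    priority:   application name -> rank (1 = most important)
    depends_on: application name -> names that must be recovered first
    """
    remaining = {app: set(depends_on.get(app, ())) for app in priority}
    dependents = {app: [] for app in priority}
    for app, deps in remaining.items():
        unknown = deps - priority.keys()
        if unknown:
            raise ValueError(f"{app} depends on unlisted systems: {sorted(unknown)}")
        for dep in deps:
            dependents[dep].append(app)

    # Start with everything that has no dependencies, best priority first.
    ready = [(priority[app], app) for app, deps in remaining.items() if not deps]
    heapq.heapify(ready)

    order = []
    while ready:
        _, app = heapq.heappop(ready)
        order.append(app)
        for child in dependents[app]:
            remaining[child].discard(app)
            if not remaining[child]:  # all of its dependencies are recovered
                heapq.heappush(ready, (priority[child], child))

    if len(order) != len(priority):
        raise ValueError("circular dependency between applications")
    return order


if __name__ == "__main__":
    # Hypothetical rankings gathered from each business unit.
    priority = {"erp": 1, "sql": 2, "file_services": 3, "print": 4}
    depends_on = {"erp": ["sql"]}
    print(recovery_order(priority, depends_on))
```

With these hypothetical inputs the SQL server comes back first because the ERP system cannot run without it, followed by ERP, file services, and print. One limitation of this simple approach: a supporting system inherits nothing from the applications that rely on it, so a dependency should be ranked at least as high as whatever it supports.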
Business Cost of Data Loss
Laura DiDio, of Information Technology Intelligence Consulting (ITIC), reported that in ITIC’s 2021 survey on the hourly cost of downtime, 91 percent of mid-sized and large enterprises stated that one hour of server downtime cost $300,000 or more. Almost half of those reported the cost being between $1 million and $5 million.
Meanwhile, only 1 percent of organizations, primarily small businesses with 50 or fewer employees, estimate that hourly downtime costs less than $100,000.
Even at an hourly loss in “only” the tens of thousands of dollars, few small businesses can afford to take that hit.
There are several factors that go into downtime cost calculations, including revenue, operational costs, and customer costs. Perhaps not surprisingly, many businesses have not properly calculated what an hour or a day of downtime really costs. Figuring out the revenue hit might be easy, but what are the costs to operations? And what about the intangible cost associated with the loss of customer confidence?
Revenue: What does it cost your business to be down for a day just from loss of revenue? This is probably the easiest calculation. If you operate Monday-Friday and take orders for an average of $100,000 per week, then you’re looking at losses of $20,000 per day.
Operational cost: What are the operational costs to the business associated with downtime? Wages still need to be paid, as do electricity, insurance, and rent or mortgage, all while your business is not making money. Factor in overtime costs for those working to fix the problem, new equipment costs, fines from regulatory non-compliance, and potentially outsourced labor costs, and it is easy to see how quickly operational costs can eclipse revenue losses.
Customer cost: Lost sales can be easy to calculate. Much harder is the loss of confidence from customers. Will they return or will they temporarily or permanently flee to the competition? Business reputation can be very hard to regain once lost.
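The revenue and operational pieces of this calculation can be roughed out in a short Python sketch. The function names and the fixed-cost and recovery-cost figures in the example are hypothetical placeholders; only the $100,000 per week, Monday-Friday revenue example comes from the text above. Customer cost is deliberately left out because lost confidence is not something a simple formula can capture.

```python
def daily_revenue_loss(weekly_revenue: float, operating_days_per_week: int) -> float:
    """Average revenue lost for each operating day the business is down."""
    if operating_days_per_week <= 0:
        raise ValueError("operating_days_per_week must be positive")
    return weekly_revenue / operating_days_per_week


def downtime_cost(days_down: float,
                  weekly_revenue: float,
                  operating_days_per_week: int,
                  daily_fixed_costs: float,
                  recovery_costs: float = 0.0) -> dict:
    """Rough downtime estimate covering revenue and operational costs only.

    daily_fixed_costs: wages, electricity, insurance, rent and similar
                       costs that continue while the business is down.
    recovery_costs:    one-off items such as overtime, new equipment,
                       fines, or outsourced labor.
    """
    revenue = daily_revenue_loss(weekly_revenue, operating_days_per_week) * days_down
    operational = daily_fixed_costs * days_down + recovery_costs
    return {"revenue": revenue, "operational": operational, "total": revenue + operational}


if __name__ == "__main__":
    # Revenue figures from the example above; the cost figures are hypothetical.
    print(daily_revenue_loss(100_000, 5))
    print(downtime_cost(days_down=2, weekly_revenue=100_000,
                        operating_days_per_week=5,
                        daily_fixed_costs=15_000, recovery_costs=10_000))
```

Even with these made-up cost inputs, a two-day outage shows operational costs matching the lost revenue, and that is before any attempt to price the damage to customer relationships.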
To Win, Don’t Scrimp on Backup & DR
Think of backup and DR as strong coverage for your business. If you rely on technology and data to compete, then you need proper protection for those resources. Don’t cut corners here just to save a couple of dollars — this is short-term gain for potential long-term pain.
Like the best coaches, carefully prepare your backup and DR game plan. Communicate it to your players and make sure to practice the plays. When disaster strikes, you and your team will be in a great position to take on this adversary and win.
About the Author
Abe Covello has 20 years of experience working with various technology solutions and varied customer environments, focusing on data center infrastructure analysis & design, implementation, integration, and management. He possesses extensive experience in the design and implementation of mission-critical infrastructure projects, including wired and wireless networking, enterprise-level storage, compute & virtualization, and disaster-recovery and backup.