Be aware that AWS Backup is _very_ expensive. We recently stopped using it and switched to AWS DataSync, which is an order of magnitude cheaper. If you want to go even cheaper, S3 replication (without replicating delete markers) will do it for even less.
Backup to S3, use the above to copy it elsewhere.
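For anyone wanting to try the S3 replication route, here is a rough sketch of a replication rule that leaves delete markers behind. The bucket names, role ARN, rule ID, and helper function names are all made up for illustration, and it assumes versioning is already enabled on both buckets and that the IAM role has replication permissions.

```python
def build_replication_config(role_arn: str, dest_bucket_arn: str) -> dict:
    """Build an S3 replication configuration that does NOT replicate delete markers.

    All names here are illustrative; nothing in this dict comes from the thread.
    """
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "copy-backups-elsewhere",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: apply to every object
                # Deletes in the source bucket are not propagated to the copy.
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": dest_bucket_arn},
            }
        ],
    }


def apply_replication(source_bucket: str, config: dict) -> None:
    """Apply the configuration with boto3 (imported lazily; needs AWS credentials)."""
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_replication(Bucket=source_bucket, ReplicationConfiguration=config)


if __name__ == "__main__":
    cfg = build_replication_config(
        "arn:aws:iam::111122223333:role/example-replication-role",  # placeholder
        "arn:aws:s3:::example-dr-bucket",  # placeholder
    )
    print(cfg)
```

Keeping delete marker replication disabled means an accidental or malicious delete in the source bucket does not remove the object from the destination copy.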
time0ut
Nice write up. I did something similar at a company recently. The ransomware use case was the primary driver. AWS Backup felt kind of half baked. It also took a lot of work to ensure we could bring the apps up in the recovery account smoothly. Trying to retrofit this into existing stacks was kind of a pain.
There is a YC company called Arpio [0] that does this sort of thing as a service. It can replicate a ton of stuff beyond what Backup does (it also uses Backup for certain things from what I remember). It works as advertised and for most companies is probably worth it vs doing this yourself. I am not affiliated, just worked with it at a customer.
Cross-region backup has never made sense to me. If an entire region goes away - not a temporary outage, but GONE - then the country is probably under attack, and absolutely no one will give a shit that your SaaS product is dead.
tatersolid
Wildfires, hurricanes, tornadoes, blizzards and ice storms, earthquakes… many regional disasters are temporary but it can take very long to bring everything back online. AWS can also always lose a whole region for an extended period due to software and control plane bugs.
Even if all your apps and data stores are active-active multi-region, you can be in a world of risk with no DR for a long time if your DR region fails. If your data size is small, that vulnerability window might be small, but if you’ve got petabytes you’ll be without a lifeboat for days or weeks until you can take another “full” DR copy.
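To put rough numbers on that window, here is a back-of-the-envelope calculation. The 10 Gbps sustained throughput is purely an assumption for illustration; real copy rates depend on the service, object sizes, and throttling.

```python
def reseed_days(data_bytes: float, throughput_gbps: float) -> float:
    """Days needed to copy data_bytes at a sustained throughput in gigabits/second."""
    bits = data_bytes * 8
    seconds = bits / (throughput_gbps * 1e9)
    return seconds / 86_400  # seconds per day


PB = 1e15  # decimal petabyte, in bytes

# Assumed sustained rate; not a measured or documented figure.
ASSUMED_GBPS = 10

for size_pb in (0.01, 1, 5):
    days = reseed_days(size_pb * PB, ASSUMED_GBPS)
    print(f"{size_pb} PB at {ASSUMED_GBPS} Gbps: {days:.1f} days")
```

At that assumed rate a single petabyte takes a bit over nine days to re-copy, which is the gap during which there is no second copy to fall back on.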
Aurornis
There are more failure modes for a region than “working perfectly” and “irreversibly destroyed”. Having cross-region backup leaves open the possibility of restoration of service or at least key data during an extended outage.
> then the country is probably under attack, and absolutely no one will give a shit that your SaaS product is dead.
Or there’s a severe natural disaster, or a flooded data center due to unforeseen conditions, or any number of things.
If your country is attacked, all business does not immediately halt. War is not an instantaneous phenomenon where an entire country becomes destroyed overnight. People continue living their lives as best they can because they still need to put food on the table and life must go on. I have a number of friends and past coworkers in Ukraine who can attest to how you continue doing your best and pick up the pieces and continue moving back toward normalcy.
jdreaver
There are plausible scenarios where a region can go down for days or more at a time, like natural disasters. I'm not terribly worried about a region going away _forever_, but during a regional outage long enough to start losing business, having data in multiple regions is important so you can restore in another region (if you aren't able to fail over quickly).
deepsun
The most common cause of the outages right now is configuration errors. Even when procedurally they must be limited to AZ only, there is always some region-shared infrastructure that can bring down the whole region altogether.
deathanatos
"Configuration errors" (I'm going to include "bugs" in that), IME, tend to be global outages more often than regional. If I recount the outages >AZ that I've seen, I think the most recent ones were:
GCP, IAM (global; just like a week and a half ago!)
GCP, VMs etc. (regional!¹)
Azure, application GW (global)
Cloudflare (global)
Azure, IAM (global)
Azure, IAM (global)
You can tell IAM is a point of weakness. (As it kinda must be.)
¹though I wasn't affected by this one, as it was in Europe.
everfrustrated
Notable you don't have AWS on that list.
AWS's definitions for AZ & Regions are by far the strongest in the industry.
GCP has AZ in the same physical complex. Azure Regions would be AZ's under AWS's definition.
timewizard
AWS had a console login issue a while back due to the default region being us-east-1. There are a handful of other services that are exclusively available in that region as well.
jcims
Cross-region backup isn't here to solve for meteor strikes and nuclear war. Most of the major AWS disruptions have been contained within a region. If you're unlucky enough to depend on one, your service is down and you don't know when it will be back up.
If you document and drill a cross-region recovery, in *most* (not all) cases you will be able to more confidently predict when things are going to be running, you'll know what information is there and what isn't, and you can build processes to communicate expectations to customers and/or regulators.
Spooky23
Telecom infrastructure can and does go out. And degraded performance can impact business significantly.
There are also benefits for many apps to being closer to the customer. If you’re building out infrastructure in a remote region for that purpose, the marginal cost of getting more out of it may be compelling.
jasonthorsness
In practice I’ve seen multiple companies benefit from having a hot standby across West US and East US. The threat is not destruction; the threat is the cloud provider screwing up the platform, and they typically do rolling updates, so only one region would be impacted at a time.
nodesocket
What are the benefits of using AWS Backup? If your infrastructure is already defined using Terraform, then RDS, EBS snapshots, ElastiCache, and S3 already have backup configuration options.
wiether
As the article shows how to do it, with AWS Backup you can do things like cross-account and cross-region backups.
Moreover, AWS Backup is the _Terraform_ of backup in AWS.
You can control all your backups through a single interface, with various policies (scheduling, retention, access...)
For instance, by default, you are limited to 100 Manual RDS Snapshots per account.
With AWS Backup, you can do what you want.
You can define dozens of different rules for the same services/resources.
So you can let teams manage their resources as they want, and have a backup team manage backing up everything from AWS Backup without having to interact with the services/resources themselves.
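As a rough illustration of what such a centrally managed policy can look like, here is a sketch of a backup plan with a daily rule and a copy to a vault in another region or account. The plan name, rule name, schedule, retention periods, vault ARN, and helper function names are all placeholders, not values from the article.

```python
def build_backup_plan(dest_vault_arn: str) -> dict:
    """Build an AWS Backup plan document: one daily rule plus a copy elsewhere.

    Every concrete value below is an illustrative assumption.
    """
    return {
        "BackupPlanName": "example-daily-with-dr-copy",
        "Rules": [
            {
                "RuleName": "daily",
                "TargetBackupVaultName": "Default",
                "ScheduleExpression": "cron(0 5 ? * * *)",  # every day at 05:00 UTC
                "Lifecycle": {"DeleteAfterDays": 35},  # retention in the source vault
                "CopyActions": [
                    {
                        # Vault in another region and/or account.
                        "DestinationBackupVaultArn": dest_vault_arn,
                        "Lifecycle": {"DeleteAfterDays": 90},
                    }
                ],
            }
        ],
    }


def create_plan(plan: dict) -> str:
    """Create the plan with boto3 (imported lazily; needs AWS credentials)."""
    import boto3

    backup = boto3.client("backup")
    return backup.create_backup_plan(BackupPlan=plan)["BackupPlanId"]


if __name__ == "__main__":
    plan = build_backup_plan(
        "arn:aws:backup:us-west-2:111122223333:backup-vault:example-dr-vault"  # placeholder
    )
    print(plan)
```

More rules can be appended to the `Rules` list for different schedules or retention periods; which resources the plan covers is set separately through a backup selection.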
[0] https://arpio.io/