How We Handle AWS Backups: Snapshots, PITR, and Multi-AZ | Braid Technologies, Inc. Help Center

To protect data integrity, availability, and recovery speed, our Amazon RDS setup uses a multi-layered strategy: Snapshots, Point-in-Time Recovery (PITR), and Multi-AZ replication. Together, these let us restore data after accidental deletion, system failures, or AWS outages such as the loss of an Availability Zone.

Backup & Recovery Components

We use three AWS mechanisms, each serving a specific purpose. Snapshots and PITR are backups; Multi-AZ provides high availability and is not a backup:

1. Snapshots (Manual & Automated)

What It Does: Takes a full backup of the database at a specific moment.
Use Case: Used for disaster recovery, long-term backups, and cross-account copies.
How It Works:
- We take daily automated snapshots of our production databases.
- Additional snapshots are created before major changes or deployments.
- Snapshots are stored for 30 days. After a failure, we restore one manually, which creates a new RDS instance from the snapshot.
Limitations: Snapshots capture only the state at the time they are taken. Changes after the snapshot starts are not included.
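
The pre-deployment snapshot step can be sketched in Python. This is a minimal illustration, not our production tooling: the instance name prod-db, the identifier format, and the helper names are hypothetical. The dictionary keys match the parameters of boto3's create_db_snapshot call. RDS does not expire manual snapshots on its own, so the sketch also assumes the 30-day period is applied to them by a cleanup check like the one below.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # retention period described in this article


def snapshot_request(instance_id: str, reason: str, now: datetime) -> dict:
    """Build keyword arguments for boto3's rds.create_db_snapshot."""
    stamp = now.strftime("%Y-%m-%d-%H-%M")
    return {
        "DBInstanceIdentifier": instance_id,
        # Snapshot identifiers may contain only letters, digits, and hyphens.
        "DBSnapshotIdentifier": f"{instance_id}-{reason}-{stamp}",
        "Tags": [{"Key": "reason", "Value": reason}],
    }


def expired_snapshots(snapshots: list[dict], now: datetime) -> list[str]:
    """Return identifiers of snapshots older than RETENTION_DAYS.

    Each dict mirrors an entry of describe_db_snapshots()["DBSnapshots"].
    """
    cutoff = now - timedelta(days=RETENTION_DAYS)
    return [
        s["DBSnapshotIdentifier"]
        for s in snapshots
        if s["SnapshotCreateTime"] < cutoff
    ]


now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
request = snapshot_request("prod-db", "pre-deploy", now)
print(request["DBSnapshotIdentifier"])

# Hypothetical existing manual snapshots, for illustration only.
existing = [
    {"DBSnapshotIdentifier": "prod-db-2024-04-15",
     "SnapshotCreateTime": datetime(2024, 4, 15, tzinfo=timezone.utc)},
    {"DBSnapshotIdentifier": "prod-db-2024-05-20",
     "SnapshotCreateTime": datetime(2024, 5, 20, tzinfo=timezone.utc)},
]
print(expired_snapshots(existing, now))

# With boto3 installed and credentials configured, the snapshot call would be:
#   boto3.client("rds").create_db_snapshot(**request)
```

Keeping the request builder separate from the AWS call makes the naming and retention logic easy to check without touching a real database.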

2. Point-in-Time Recovery (PITR)

What It Does: Allows restoration of an RDS instance to any second within a retention window (up to 35 days).
Use Case: Recovers from human errors like accidental deletions, bad queries, or unwanted schema changes.
How It Works:
- PITR uses continuous transaction logs recorded by AWS.
- If a mistake happens, we can create a new RDS instance from a specific second before the error.
- The recovered instance is then used to extract and restore lost data.
Limitations: By default, deleting an RDS instance also deletes its automated backups, so PITR is no longer available afterward unless the automated backups were explicitly retained at deletion time.
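
As a rough sketch of the restore step, the Python below builds the parameters for boto3's restore_db_instance_to_point_in_time call, targeting one second before a known error time. The helper name, the instance name prod-db, and the target naming scheme are hypothetical. The window bounds are passed in by the caller; RDS reports the upper bound as LatestRestorableTime in describe_db_instances.

```python
from datetime import datetime, timedelta, timezone


def pitr_request(source_id: str, error_time: datetime,
                 earliest: datetime, latest: datetime) -> dict:
    """Build keyword arguments for rds.restore_db_instance_to_point_in_time.

    The restore targets one second before error_time and must fall inside
    the restorable window [earliest, latest].
    """
    restore_time = error_time - timedelta(seconds=1)
    if not earliest <= restore_time <= latest:
        raise ValueError(
            f"{restore_time.isoformat()} is outside the restorable window"
        )
    return {
        "SourceDBInstanceIdentifier": source_id,
        # PITR always creates a new instance, so it needs its own identifier.
        "TargetDBInstanceIdentifier":
            f"{source_id}-pitr-{restore_time:%Y%m%d-%H%M%S}",
        "RestoreTime": restore_time,
    }


pitr = pitr_request(
    "prod-db",
    error_time=datetime(2024, 6, 1, 14, 30, 15, tzinfo=timezone.utc),
    earliest=datetime(2024, 5, 2, 0, 0, tzinfo=timezone.utc),
    latest=datetime(2024, 6, 1, 14, 40, tzinfo=timezone.utc),
)
print(pitr["TargetDBInstanceIdentifier"])

# With boto3 installed and credentials configured, the restore call would be:
#   boto3.client("rds").restore_db_instance_to_point_in_time(**pitr)
```

The recovered instance runs alongside the original, so lost rows can be copied back without overwriting newer data.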

3. Multi-AZ Replication (High Availability, Not a Backup)

What It Does: Creates a standby replica of the database in a separate Availability Zone (AZ) for failover protection.
Use Case: Ensures high availability, minimizing downtime if the primary instance fails.
How It Works:
- AWS synchronously replicates data between the primary and standby instances.
- If the primary instance or its AZ fails, RDS automatically fails over to the standby.
- In Multi-AZ deployments, backups are taken from the standby, which avoids the brief I/O suspension on the primary that single-AZ backups can cause.
Limitations: Multi-AZ does not protect against data corruption or accidental deletions, as changes are replicated instantly.
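
Because Multi-AZ is not a backup, it helps to verify both settings together. The sketch below is an illustrative audit, not an existing tool: the function name and sample instance names are hypothetical. It reads a dictionary shaped like the response of boto3's describe_db_instances, using the MultiAZ and BackupRetentionPeriod fields. A retention period of 0 means automated backups, and therefore PITR, are turned off.

```python
def audit_instances(response: dict, min_retention_days: int = 30) -> list[str]:
    """Return findings for instances that deviate from the setup in this article.

    `response` has the shape returned by boto3's rds.describe_db_instances().
    """
    findings = []
    for db in response.get("DBInstances", []):
        name = db["DBInstanceIdentifier"]
        if not db.get("MultiAZ", False):
            findings.append(f"{name}: Multi-AZ is disabled")
        retention = db.get("BackupRetentionPeriod", 0)
        if retention == 0:
            findings.append(f"{name}: automated backups and PITR are disabled")
        elif retention < min_retention_days:
            findings.append(
                f"{name}: backup retention is {retention} days, "
                f"expected at least {min_retention_days}"
            )
    return findings


# Hypothetical response, for illustration only.
sample = {
    "DBInstances": [
        {"DBInstanceIdentifier": "prod-db", "MultiAZ": True,
         "BackupRetentionPeriod": 30},
        {"DBInstanceIdentifier": "reports-db", "MultiAZ": False,
         "BackupRetentionPeriod": 7},
    ]
}
for finding in audit_instances(sample):
    print(finding)

# With boto3 installed and credentials configured, the input would come from:
#   boto3.client("rds").describe_db_instances()
```

The real API paginates its results, so an account with many instances would need to run the check on every page.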

Why This Strategy Works

By combining Snapshots, PITR, and Multi-AZ, we ensure:

- Protection from accidental data loss (PITR recovers deleted or changed data).
- Disaster recovery readiness (Snapshots provide full backups for major failures).
- High availability (Multi-AZ prevents downtime from hardware failures).
- Minimal performance impact (Snapshots run on the standby instance in Multi-AZ setups).

This multi-layered approach balances speed, reliability, and cost-effectiveness, keeping our systems resilient against human error, instance failure, and the loss of an Availability Zone.