How We Handle AWS Backups: Snapshots, PITR, and Multi-AZ

3 min. read | Last update: 03.17.2025

To protect data integrity and availability and to recover quickly, our AWS RDS setup uses a multi-layered strategy built on Snapshots, Point-in-Time Recovery (PITR), and Multi-AZ replication. Together, these let us restore data after accidental deletion, system failures, or an Availability Zone outage.

Backup & Recovery Components

We use three AWS mechanisms, each serving a specific purpose. Snapshots and PITR are backups; Multi-AZ provides high availability and complements them:

1. Snapshots (Manual & Automated)

  • What It Does: Takes a full backup of the database at a specific moment.
  • Use Case: Disaster recovery, long-term backups, and cross-account copies.
  • How It Works:
    • We take daily automated snapshots of our production databases.
    • Additional snapshots are created before major changes or deployments.
    • Snapshots are kept for 30 days. Restoring one creates a new RDS instance from the snapshot; it does not overwrite the existing instance.
  • Limitations: A snapshot captures only the state of the database at the moment it is taken. Changes made after that moment are not included.
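
The pre-deployment snapshot step could be scripted along these lines. This is a minimal sketch, not our actual tooling: it assumes boto3 with configured AWS credentials, and the function names, the `prod-db` identifier, and the `reason` tag are hypothetical. Only `create_db_snapshot` is a real RDS API call.

```python
from datetime import datetime, timezone


def build_snapshot_request(db_instance_id: str, reason: str, now: datetime) -> dict:
    """Build keyword arguments for the RDS CreateDBSnapshot call.

    `now` must be timezone-aware. The naming scheme and tag are illustrative.
    """
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return {
        "DBInstanceIdentifier": db_instance_id,
        # Snapshot identifiers may contain only letters, digits, and hyphens.
        "DBSnapshotIdentifier": f"{db_instance_id}-{reason}-{stamp}",
        "Tags": [{"Key": "reason", "Value": reason}],
    }


def take_snapshot(db_instance_id: str, reason: str) -> str:
    """Create a manual snapshot and return its identifier."""
    import boto3  # imported lazily so the builder above works without AWS access

    request = build_snapshot_request(db_instance_id, reason, datetime.now(timezone.utc))
    boto3.client("rds").create_db_snapshot(**request)
    return request["DBSnapshotIdentifier"]
```

A deployment script might call `take_snapshot("prod-db", "pre-deploy")` before applying changes. Manual snapshots are not removed by the automated backup retention period, so a script like this would need its own cleanup job to honor a fixed retention window.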

2. Point-in-Time Recovery (PITR)

  • What It Does: Allows restoration of an RDS instance to any second within the backup retention window (up to 35 days), up to the latest restorable time, which is typically within the last five minutes.
  • Use Case: Recovers from human errors like accidental deletions, bad queries, or unwanted schema changes.
  • How It Works:
    • PITR uses continuous transaction logs recorded by AWS.
    • If a mistake happens, we can create a new RDS instance from a specific second before the error.
    • The recovered instance is then used to extract and restore lost data.
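  • Limitations: PITR requires automated backups to be enabled. If the original RDS instance is deleted, its automated backups are deleted with it unless they are explicitly retained at deletion time; without them, a point-in-time restore is no longer possible.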
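
The restore step above can be sketched as follows. This is an illustration under assumptions: it relies on boto3 with configured credentials, and the function names, the target naming scheme, and the one-second margin are hypothetical. `describe_db_instances` and `restore_db_instance_to_point_in_time` are the real RDS API calls.

```python
from datetime import datetime, timedelta, timezone


def build_pitr_request(source_id: str, error_time: datetime, margin_seconds: int = 1) -> dict:
    """Build keyword arguments for restoring to just before `error_time`."""
    if error_time.tzinfo is None:
        raise ValueError("error_time must be timezone-aware")
    restore_time = error_time.astimezone(timezone.utc) - timedelta(seconds=margin_seconds)
    stamp = restore_time.strftime("%Y%m%d-%H%M%S")
    return {
        "SourceDBInstanceIdentifier": source_id,
        # PITR always creates a new instance; the source is left untouched.
        "TargetDBInstanceIdentifier": f"{source_id}-pitr-{stamp}",
        "RestoreTime": restore_time,
    }


def restore_before_error(source_id: str, error_time: datetime) -> str:
    """Start a point-in-time restore and return the new instance identifier."""
    import boto3  # imported lazily so the builder above works without AWS access

    rds = boto3.client("rds")
    request = build_pitr_request(source_id, error_time)
    instance = rds.describe_db_instances(DBInstanceIdentifier=source_id)["DBInstances"][0]
    # AWS can only restore up to the latest restorable time.
    if request["RestoreTime"] > instance["LatestRestorableTime"]:
        raise ValueError("requested time is later than the latest restorable time")
    rds.restore_db_instance_to_point_in_time(**request)
    return request["TargetDBInstanceIdentifier"]
```

Once the restored instance is available, the lost rows can be exported from it and loaded back into production; the temporary instance should then be deleted to avoid ongoing cost.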

3. Multi-AZ Replication (High Availability, Not a Backup)

  • What It Does: Creates a standby replica of the database in a separate Availability Zone (AZ) for failover protection.
  • Use Case: Ensures high availability, minimizing downtime if the primary instance fails.
  • How It Works:
    • AWS synchronously replicates data between the primary and standby instances.
    • If hardware fails in one AZ, AWS automatically promotes the standby.
    • In Multi-AZ deployments, backups are taken from the standby, which for most database engines avoids the brief I/O suspension a backup can cause on the primary.
  • Limitations: Multi-AZ does not protect against data corruption or accidental deletions, as changes are replicated instantly.
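
Because Multi-AZ and backups cover different failure modes, it can be useful to check that an instance has both enabled. The sketch below is a hypothetical audit helper, not part of our tooling: it inspects one entry from the `DBInstances` list returned by boto3's `describe_db_instances`, and the 30-day threshold is an assumption chosen to match the snapshot retention mentioned earlier.

```python
def audit_instance(instance: dict, min_retention_days: int = 30) -> list:
    """Return a list of problems found in one RDS instance description."""
    problems = []
    if not instance.get("MultiAZ", False):
        problems.append("Multi-AZ is disabled: no standby for automatic failover")
    retention = instance.get("BackupRetentionPeriod", 0)
    if retention == 0:
        # A retention period of 0 turns automated backups off entirely.
        problems.append("automated backups are disabled: no automated snapshots or PITR")
    elif retention < min_retention_days:
        problems.append(f"backup retention is {retention} days, below {min_retention_days}")
    return problems


def audit_live_instance(db_instance_id: str) -> list:
    """Fetch an instance description from AWS and audit it."""
    import boto3  # imported lazily so audit_instance works without AWS access

    rds = boto3.client("rds")
    instance = rds.describe_db_instances(DBInstanceIdentifier=db_instance_id)["DBInstances"][0]
    return audit_instance(instance)
```

An empty list means the instance has a standby and automated backups with sufficient retention. It says nothing about whether a restore actually works, which still needs to be tested separately.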

Why This Strategy Works

By combining Snapshots, PITR, and Multi-AZ, we ensure:

  • Protection from accidental data loss (PITR recovers deleted or changed data).
  • Disaster recovery readiness (Snapshots provide full backups for major failures).
  • High availability (Multi-AZ prevents downtime from hardware failures).
  • Minimal performance impact (Snapshots run on the standby instance in Multi-AZ setups).

This multi-layered approach balances speed, reliability, and cost-effectiveness, and keeps our systems resilient to accidental data loss, instance or hardware failure, and Availability Zone outages.
