How Amazon S3 Stores Data
I’m pretty sure many of you have used or at least heard of Amazon S3 — it’s one of those things that just "works" and quietly powers a huge chunk of the internet. Whether you're backing up photos, storing logs, or serving video files, S3 is often the go-to object store.
But have you ever stopped to ask:
“How the heck does S3 store data so reliably and at such insane scale?”
That’s exactly what we’re diving into today.
Wait... Is S3 a Database?
Not quite. S3 is an object storage service, not a relational or NoSQL database. But that doesn’t mean it isn’t doing a ton of “DB-like” things under the hood. In fact, it’s a beautifully engineered distributed system with custom-built storage mechanisms that combine the best ideas from databases, filesystems, and distributed computing.
Think: Dynamo + Erasure Coding + Chunked Blob Store + Metadata DB all stitched together into a storage beast.
🔍 So What Is Actually Stored in S3?
Every object you upload to S3 has two parts:
- Metadata – your object’s key, creation date, version, access control, etc.
- Binary Data – the actual file content (image, zip, video, log, etc.)
These are stored separately.
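To make that separation concrete, here's a tiny sketch in Python. The field names and shapes are made up purely for illustration; AWS doesn't publish S3's internal schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ObjectMetadata:
    key: str                     # e.g. "images/puppy.jpg"
    version_id: str
    size_bytes: int
    created_at: datetime
    acl: dict                    # ownership / access-control info
    chunk_ids: list = field(default_factory=list)  # pointers into the data store

@dataclass
class DataChunk:
    chunk_id: str
    payload: bytes               # the opaque binary content

# The metadata record and the chunks live in different systems,
# linked only by the chunk_ids pointers.
meta = ObjectMetadata(
    key="images/puppy.jpg",
    version_id="v1",
    size_bytes=4,
    created_at=datetime.now(timezone.utc),
    acl={"owner": "alice", "public_read": False},
    chunk_ids=["chunk-001"],
)
data = DataChunk(chunk_id="chunk-001", payload=b"\x89PNG")
```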
🧠 The Metadata Store (Dynamo-ish)
The metadata — all the "what, when, where" info — is stored in an internal, highly available key-value store. From everything we know (based on AWS papers and talks), this is built using Dynamo-style distributed hashing and quorum-based replication.
- Every object key (e.g., images/puppy.jpg) becomes a key in the system.
- The value includes pointers to where the actual object chunks live.
- The metadata DB handles versioning, ownership, ACLs, timestamps, and more.
This layer is optimized for low-latency reads and writes, and is replicated across multiple availability zones (AZs).
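AWS hasn't published the exact design, but a Dynamo-style store typically places each key on a consistent-hash ring and replicates it to the next few distinct nodes. Here's a minimal sketch of that placement idea; the node names and the replication factor of 3 are assumptions for illustration.

```python
import hashlib
from bisect import bisect_right

class HashRing:
    """Toy consistent-hash ring: each key maps to the next `replicas` distinct nodes."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.ring = sorted((self._hash(n), n) for n in nodes)  # node positions on the ring

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def replicas_for(self, key: str) -> list:
        """Walk clockwise from the key's hash and collect distinct nodes."""
        points = [p for p, _ in self.ring]
        i = bisect_right(points, self._hash(key))
        out = []
        while len(out) < self.replicas:
            node = self.ring[i % len(self.ring)][1]
            if node not in out:
                out.append(node)
            i += 1
        return out

ring = HashRing(["az1-node-a", "az1-node-b", "az2-node-c", "az3-node-d"])
print(ring.replicas_for("images/puppy.jpg"))   # the 3 nodes that hold this key's metadata
```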

🧱 The Object Data Store (Chunked & Distributed)
When you upload a file, S3 splits it into chunks behind the scenes. These chunks are then:
- Erasure coded (not just replicated — more on this in a second)
- Distributed across multiple physical disks
- Stored in different availability zones, often even different racks and buildings
Why? So even if an entire AZ goes down, your data is safe.
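Here's a rough sketch of that split-then-spread pipeline. The 8 MiB chunk size and round-robin placement are invented for this example; S3's real chunk sizes and placement policies aren't public.

```python
import itertools

CHUNK_SIZE = 8 * 1024 * 1024   # 8 MiB -- an arbitrary illustrative size, not S3's real one

def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split an uploaded object into fixed-size chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def place_chunks(chunks: list, zones: list) -> dict:
    """Round-robin chunks across availability zones (toy placement policy)."""
    zone_cycle = itertools.cycle(zones)
    return {idx: next(zone_cycle) for idx in range(len(chunks))}

blob = b"x" * (20 * 1024 * 1024)               # a 20 MiB "upload"
chunks = split_into_chunks(blob)               # -> 3 chunks (8 + 8 + 4 MiB)
placement = place_chunks(chunks, ["az-1", "az-2", "az-3"])
print(placement)                               # {0: 'az-1', 1: 'az-2', 2: 'az-3'}
```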
📦 Erasure Coding 101
This is the real magic. Instead of just making 3 full copies of your file (which is expensive), S3 uses erasure coding, which works like this:
- Break your data into k parts
- Generate m extra parity parts
- You can lose up to m chunks and still recover the file
For example:
- File split into 10 data chunks
- 4 parity chunks added
- You can survive 4 failures, while storing only ~1.4x the original size (vs 3x for full replication)
That’s cheaper and more durable.
NOTE: This is a bit too big to explain in detail here and deserves its own dedicated post 🙂
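Still, the core "recover from parity" idea fits in a few lines. Production systems use Reed-Solomon codes with several parity chunks; the toy below uses a single XOR parity (k data chunks, m = 1), which can survive exactly one lost chunk.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data_chunks: list) -> bytes:
    """Produce one parity chunk: the XOR of all (equal-length) data chunks."""
    parity = data_chunks[0]
    for chunk in data_chunks[1:]:
        parity = xor_bytes(parity, chunk)
    return parity

def recover_missing(surviving: dict, parity: bytes) -> bytes:
    """Rebuild the single missing chunk: XOR the parity with every surviving chunk."""
    rebuilt = parity
    for chunk in surviving.values():
        rebuilt = xor_bytes(rebuilt, chunk)
    return rebuilt

chunks = [b"AAAA", b"BBBB", b"CCCC"]           # k = 3 equal-sized data chunks
parity = encode(chunks)                        # m = 1 parity chunk

survivors = {0: chunks[0], 2: chunks[2]}       # pretend chunk 1's disk just died
assert recover_missing(survivors, parity) == b"BBBB"
```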
📈 Performance + Durability
Amazon promises 11 9’s of durability (yes, 99.999999999%). Here’s how they pull that off:
- Redundant storage across 3+ AZs
- Automatic background checks and repairs
- Erasure coding to reduce storage overhead
- No single points of failure
- Anti-entropy services that constantly validate and heal corrupted chunks
You’ll probably never even notice all this happening — and that’s the point.
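For intuition, a background "scrubber" can be as simple as recomputing checksums and healing bad copies from a replica that still verifies. This is only a toy illustration of the anti-entropy idea, not S3's actual repair pipeline.

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def scrub(replicas: dict, expected: str) -> dict:
    """One anti-entropy pass: heal any replica whose checksum no longer matches,
    using a copy that still verifies."""
    healthy = next((d for d in replicas.values() if checksum(d) == expected), None)
    if healthy is None:
        raise RuntimeError("all replicas corrupted -- this is what the durability math avoids")
    return {
        node: (data if checksum(data) == expected else healthy)
        for node, data in replicas.items()
    }

good = b"chunk-payload"
expected = checksum(good)
replicas = {"az1-disk7": good, "az2-disk3": b"bit rot!", "az3-disk9": good}
healed = scrub(replicas, expected)
assert all(checksum(d) == expected for d in healed.values())
```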
🧾 Read-After-Write Consistency? Yep, Now It’s Strong.
Until 2020, S3 was eventually consistent for overwrites and deletes. That’s changed.
Now S3 offers strong read-after-write consistency by default for all operations, including overwrite PUTs and DELETEs, thanks to upgrades in the metadata layer (likely powered by stronger quorum protocols).
This means:
- Once you upload or overwrite an object, any subsequent read returns the latest version.
- No waiting for propagation.
It’s a small change on the surface, but a huge architectural leap under the hood.
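AWS hasn't detailed the protocol, but the textbook way to get read-after-write guarantees in a replicated store is overlapping quorums: with N replicas, require W acks per write and poll R replicas per read such that W + R > N. A toy sketch (the N/W/R values are arbitrary):

```python
from dataclasses import dataclass

@dataclass
class Versioned:
    version: int
    value: bytes

class QuorumStore:
    """Toy replicated register: with W + R > N, every read quorum overlaps the
    latest write quorum, so a read always sees the newest version."""

    def __init__(self, n: int = 3, w: int = 2, r: int = 2):
        assert w + r > n, "quorums must overlap for strong consistency"
        self.n, self.w, self.r = n, w, r
        self.replicas = [Versioned(0, b"") for _ in range(n)]

    def write(self, value: bytes) -> None:
        new_version = max(rep.version for rep in self.replicas) + 1
        for rep in self.replicas[: self.w]:     # acknowledge after W replicas are durable
            rep.version, rep.value = new_version, value

    def read(self) -> bytes:
        polled = self.replicas[-self.r:]        # any R replicas must overlap the write set
        return max(polled, key=lambda rep: rep.version).value

store = QuorumStore()
store.write(b"metadata for images/puppy.jpg, v2")
assert store.read() == b"metadata for images/puppy.jpg, v2"   # read-after-write, no lag
```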
⚠️ So… Is It Using Any Known DB?
Not directly. S3 doesn’t use off-the-shelf databases like MySQL or Cassandra.
Instead:
- Metadata DB → custom key-value store inspired by Dynamo
- Object data → stored in a distributed chunked file system
- Durability → via erasure coding + replication
- Consistency → via quorum-based metadata writes
The system is designed to handle exabytes of data, across millions of disks, while being fault-tolerant, strongly consistent (since 2020), and blazing fast.
💡 What Can We Learn From S3?
S3 is a case study in purpose-built design.
- Separate metadata from data
- Use erasure coding instead of brute-force replication
- Build for failure — expect disks, racks, and entire zones to fail
- Decentralize and shard everything
- Eventually consistent is often good enough — until it isn’t
Whether you're building a file store, a video platform, or a custom backup system — these lessons are gold.
Conclusion
S3 isn’t “just a bucket” — it’s a carefully engineered distributed storage fabric that runs one of the largest infrastructures in the world.
The takeaway: next time you're storing even a few gigabytes of data, think like S3.
Design for failure. Decouple metadata. Keep it simple, but make it solid.
Want to dive deeper into how other hyperscalers do storage? Ping me and I’ll write about GCP, Azure Blob, or even open-source equivalents like MinIO and Ceph.
Until then — happy storing.
The Dev Learnings Team