File Storage (Dropbox)
Overview
Designing a file storage system such as Dropbox or Google Drive tests your understanding of sync, conflict resolution, and efficient storage. The core challenges are keeping files in sync across devices, handling concurrent edits, deduplicating content (at block or file level), and handling large files smoothly. This design matters in interviews because it combines object storage, metadata management, versioning, and real-time sync, and it requires careful thinking about consistency (eventual vs. strong) and conflict resolution. Companies like Dropbox and Google run these systems at exabyte scale, so showing that you understand chunking, delta sync, and how to minimize bandwidth demonstrates that you can design storage systems for the real world.
Requirements
Functional
- Upload, download, delete files
- Sync across multiple devices
- Folder structure and sharing
- Version history and restore
- Conflict resolution (concurrent edits)
- Search files
Non-Functional
- Reliability — no data loss
- Efficiency — dedup, delta sync to save bandwidth
- Scalability — exabytes of storage
- Low latency — sync within seconds
Capacity Estimation
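Assume 500M users at 1 TB average per user: 500M × 1 TB = 500 EB of raw storage. Assume 10M file operations per day, or roughly 116 per second on average. Block dedup can reduce storage by 30-50%. Delta sync can reduce bandwidth by 70% or more.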
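The arithmetic behind these figures can be checked in a few lines. This is a minimal sketch that takes the stated inputs at face value and assumes decimal units (1 TB = 10^12 bytes, 1 EB = 10^18 bytes); the function name `estimate` is illustrative only.

```python
# Back-of-envelope capacity estimate from the figures stated above.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 EB = 10**18 bytes).
USERS = 500_000_000
AVG_BYTES_PER_USER = 10**12        # 1 TB average per user
FILE_OPS_PER_DAY = 10_000_000
SECONDS_PER_DAY = 86_400


def estimate(dedup_saving: float) -> dict:
    """Return raw storage and post-dedup storage (in EB) plus average ops/sec."""
    raw_eb = USERS * AVG_BYTES_PER_USER / 10**18
    return {
        "raw_eb": raw_eb,
        "after_dedup_eb": raw_eb * (1 - dedup_saving),
        "avg_ops_per_sec": FILE_OPS_PER_DAY / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    for saving in (0.30, 0.50):    # the 30-50% block dedup range
        print(saving, estimate(saving))
```

With these inputs the raw total comes to 500 EB, and a 30-50% dedup saving brings it down to between 350 EB and 250 EB. The operation rate is an average; peak load would need a separate multiplier.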
Architecture Diagram
Component Deep Dive
Sync Client
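Monitors local file changes, uploads changed blocks, and downloads updates made on other devices. Delta sync sends only the blocks that changed. The client also detects conflicts between local and remote edits.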
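One way a client could decide which blocks to upload is to hash each block and compare against the hashes of the last synced version. The sketch below assumes fixed-size 4 MiB blocks and SHA-256; both choices, and the function names, are assumptions for illustration rather than anything the design above prescribes.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # assumed block size (4 MiB)


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return the SHA-256 of each."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old_hashes: list[str], new_hashes: list[str]) -> list[int]:
    """Return indices of blocks in the new version that differ from the old
    version at the same position, or that extend past the old version's end."""
    return [
        i for i, h in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != h
    ]
```

Only the blocks at the returned indices need to be uploaded. With fixed-size blocks, an insertion near the start of a file shifts every later block boundary, so all following blocks appear changed; content-defined chunking is the usual way around that, at the cost of more complexity.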
Block Store
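Object store (e.g., S3) holding content-addressable blocks, each keyed by the hash of its content. Identical blocks hash to the same key, so they are stored only once (dedup). Large files are split into chunks and stored as multiple blocks.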
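Content addressing can be illustrated with an in-memory stand-in for the object store. This is a sketch only: the `BlockStore` class and its methods are hypothetical, and a real deployment would write to an object store rather than a dictionary.

```python
import hashlib


class BlockStore:
    """In-memory stand-in for an object store keyed by content hash."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}
        self.bytes_received = 0  # total bytes clients have sent

    def put(self, data: bytes) -> str:
        """Store a block under its SHA-256 and return the key."""
        key = hashlib.sha256(data).hexdigest()
        self.bytes_received += len(data)
        # Identical content maps to the same key, so it is kept only once.
        self._blocks.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blocks[key]

    def has(self, key: str) -> bool:
        return key in self._blocks

    @property
    def bytes_stored(self) -> int:
        return sum(len(block) for block in self._blocks.values())
```

Comparing `bytes_received` with `bytes_stored` gives a rough view of how much dedup is saving. A client could also call `has` before uploading and skip blocks the store already holds.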
Metadata Service
File/folder tree, version, block refs. PostgreSQL or distributed DB. Tracks what blocks each file uses.
Sync Service
Orchestrates sync. Receives client updates, updates metadata, notifies other clients. WebSocket or long poll.
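The "changes since cursor" idea can be sketched as an append-only change log with a monotonically increasing sequence number. The names `Change`, `ChangeLog`, and `exclude_device` are assumptions for this example; a production service would persist the log and push or long-poll rather than keep it in memory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    seq: int        # position in the log, starting at 1
    path: str
    version: int
    device_id: str  # device that made the change


class ChangeLog:
    """Append-only log; a cursor is the seq of the last change a client saw."""

    def __init__(self) -> None:
        self._changes: list[Change] = []

    def append(self, path: str, version: int, device_id: str) -> int:
        seq = len(self._changes) + 1
        self._changes.append(Change(seq, path, version, device_id))
        return seq

    def since(self, cursor: int, exclude_device: Optional[str] = None):
        """Return (changes after cursor, new cursor), optionally skipping
        changes made by the requesting device itself."""
        pending = [
            c for c in self._changes[cursor:]
            if c.device_id != exclude_device
        ]
        return pending, len(self._changes)
```

A client starts with cursor 0, applies the returned changes, and stores the new cursor for its next request.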
Notification Service
Tells clients when files change (from other devices). Enables real-time sync.
Version Service
Stores file versions. Block refs + metadata. Enables restore.
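Because a version is just a list of block references, keeping history is cheap: unchanged blocks are shared between versions. The sketch below assumes that a restore is recorded as a new version pointing at the old block list, so history is never rewritten; that policy and the `VersionHistory` class are assumptions, not something the design above specifies.

```python
class VersionHistory:
    """Per-file history; each version is a tuple of block hashes."""

    def __init__(self) -> None:
        self._versions: list[tuple[str, ...]] = []

    @property
    def latest(self) -> int:
        return len(self._versions)

    def commit(self, block_refs) -> int:
        """Record a new version and return its 1-based version number."""
        self._versions.append(tuple(block_refs))
        return len(self._versions)

    def blocks(self, version: int) -> tuple[str, ...]:
        if not 1 <= version <= len(self._versions):
            raise KeyError(f"no such version: {version}")
        return self._versions[version - 1]

    def restore(self, version: int) -> int:
        """Restore by committing the old block list as a new version."""
        return self.commit(self.blocks(version))
```

Restoring version 1 after two commits produces version 3 with the same block references as version 1, and version 2 remains available.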
Database Design
Metadata: file_id, path, user_id, block_refs[], version, modified_at. Blocks: hash → storage_url. Dedup via content hash. PostgreSQL for metadata; object store for blocks.
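A possible relational layout for these fields is sketched below using SQLite from the Python standard library as a stand-in for PostgreSQL. The `block_refs[]` array is modeled as a separate `file_blocks` table with a `position` column; that normalization, the table names, and the helper functions are assumptions made for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE blocks (
    hash        TEXT PRIMARY KEY,      -- content hash of the block
    storage_url TEXT NOT NULL          -- where the block lives in the object store
);
CREATE TABLE files (
    file_id     INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    path        TEXT NOT NULL,
    version     INTEGER NOT NULL,
    modified_at TEXT NOT NULL,
    UNIQUE (user_id, path)
);
CREATE TABLE file_blocks (             -- stands in for block_refs[]
    file_id  INTEGER NOT NULL REFERENCES files(file_id),
    position INTEGER NOT NULL,         -- order of the block within the file
    hash     TEXT NOT NULL REFERENCES blocks(hash),
    PRIMARY KEY (file_id, position)
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def block_urls(conn: sqlite3.Connection, user_id: int, path: str) -> list[str]:
    """Return the storage URLs of a file's blocks, in file order."""
    rows = conn.execute(
        """
        SELECT b.storage_url
        FROM files f
        JOIN file_blocks fb ON fb.file_id = f.file_id
        JOIN blocks b ON b.hash = fb.hash
        WHERE f.user_id = ? AND f.path = ?
        ORDER BY fb.position
        """,
        (user_id, path),
    ).fetchall()
    return [row[0] for row in rows]
```

Because `blocks` is keyed by hash, two files that share a block reference the same row, which is how dedup shows up at the metadata level.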
API Design
| Method | Path | Description |
|---|---|---|
| POST | /api/files/upload | Upload file blocks. Body: {path, blocks[]}. Returns version. |
| GET | /api/files/download | Download file. Query: path. Returns block URLs. |
| GET | /api/sync | Get changes since cursor. Long poll or WebSocket. |
| POST | /api/files/restore | Restore previous version. |
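On the client side, a download amounts to fetching each block in order and concatenating the results. The sketch below assumes the download response pairs each block URL with its expected SHA-256, which the table above does not state, and takes the fetch function as a parameter so that no particular HTTP library is implied.

```python
import hashlib
from typing import Callable, Iterable, Tuple


def assemble_file(
    manifest: Iterable[Tuple[str, str]],
    fetch: Callable[[str], bytes],
) -> bytes:
    """Rebuild a file from an ordered manifest of (expected_hash, url) pairs.

    Each block is fetched, checked against its expected SHA-256, and
    appended; a mismatch raises ValueError instead of returning bad data.
    """
    parts = []
    for expected_hash, url in manifest:
        data = fetch(url)
        if hashlib.sha256(data).hexdigest() != expected_hash:
            raise ValueError(f"integrity check failed for {url}")
        parts.append(data)
    return b"".join(parts)
```

Since blocks are content-addressed, the hash doubles as an integrity check, and blocks the client already has cached locally could be skipped rather than fetched again.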
Scalability & Trade-offs
- Block vs file dedup: Block dedup saves more space; file dedup is simpler. Block requires chunking.
- Delta sync: Sends only changed blocks; reduces bandwidth. Requires client to track block hashes.
- Conflict resolution: Last-write-wins is simple; operational transform or CRDTs allow collaborative editing.
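The simplest of these options, last-write-wins, can be sketched as follows. The example assumes each edit carries the version it was based on and a client-side timestamp; the `Edit` type, its field names, and the device-ID tie-break are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    device_id: str
    base_version: int   # version the device started editing from
    modified_at: float  # client timestamp, seconds since the epoch
    block_refs: tuple


def is_conflict(edit: Edit, server_version: int) -> bool:
    """An edit conflicts if the file moved on since the device last synced."""
    return edit.base_version != server_version


def last_write_wins(a: Edit, b: Edit) -> Edit:
    """Pick the edit with the later timestamp; ties break on device_id
    so that every replica picks the same winner."""
    return max(a, b, key=lambda e: (e.modified_at, e.device_id))
```

Last-write-wins silently discards the losing edit and depends on client clocks being reasonably accurate, which is the price of its simplicity compared with operational transform or CRDTs.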