NoSQL Database Interview Questions & Answers

FAANG Interview Preparation

Comprehensive NoSQL interview preparation covering MongoDB, Redis, Cassandra, DynamoDB, and distributed database concepts for senior engineering roles.

12 Questions: 3 Easy, 5 Medium, 4 Hard

Topics: Concepts (5), MongoDB (2), Redis (2), Data Modeling (1), DynamoDB (1), Cassandra (1)
Q1 Easy Concepts

SQL vs NoSQL — When should you choose each?

SQL databases excel when you need strong consistency, complex joins, and well-defined schemas. Choose SQL for financial systems, inventory management, or any domain where ACID transactions and referential integrity are critical. The rigid schema enforces data quality and supports complex analytical queries.

NoSQL shines when you need horizontal scaling, schema flexibility, or high write throughput. Use NoSQL for user profiles, product catalogs, real-time analytics, or content management where the data model evolves frequently. NoSQL trades ACID for BASE (Basically Available, Soft state, Eventual consistency) to achieve partition tolerance and availability.

Consider hybrid approaches: use SQL for transactional core data and NoSQL for caching, search, or event streams. The choice often comes down to whether your primary constraint is consistency (SQL) or scalability and flexibility (NoSQL).

Key Takeaways

  • SQL: ACID, rigid schema, complex joins — choose for transactional integrity
  • NoSQL: BASE, flexible schema, horizontal scale — choose for scale and agility
  • Hybrid architectures are common: SQL for core, NoSQL for caching/streams
Q2 Medium Concepts

Explain the CAP Theorem with real-world examples

The CAP theorem states that a distributed system can guarantee only two of three properties: Consistency (every read receives the most recent write), Availability (every request receives a response), and Partition tolerance (the system works despite network failures). In practice, partition tolerance is unavoidable in distributed systems, so you choose between CP and AP.

MongoDB is CP by default: during a partition, it prioritizes consistency over availability. Replica set elections pause writes until a primary is elected. Cassandra and DynamoDB are AP: they favor availability during partitions. Writes succeed to any node, and conflicts are resolved eventually. Cassandra uses tunable consistency per operation.

Real-world mapping: Banking systems choose CP (correct balance over availability). Social feeds choose AP (show something rather than errors). Understanding these tradeoffs is essential for designing distributed systems.
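To make the CP/AP distinction concrete, here is a toy Python sketch (all class and function names are invented for illustration, and last-write-wins is a deliberately naive merge): a CP-style replica refuses writes when it loses quorum, while AP-style replicas keep accepting writes and reconcile after the partition heals.

PYTHON
```python
# Toy model of CP vs AP behavior during a partition -- not any real database's protocol.

class CPReplica:
    """Consistency first: rejects writes unless it can reach a quorum."""
    def __init__(self):
        self.data, self.has_quorum = {}, True

    def write(self, key, value):
        if not self.has_quorum:
            raise RuntimeError("unavailable: no quorum during partition")
        self.data[key] = value

class APReplica:
    """Availability first: always accepts writes; divergence is merged later."""
    def __init__(self):
        self.data = {}

    def write(self, key, value):
        self.data[key] = value  # succeeds even while partitioned

def merge_last_write_wins(a, b):
    # Naive conflict resolution: entries from b win ties.
    return {**a.data, **b.data}

cp = CPReplica()
cp.has_quorum = False          # simulate a network partition
try:
    cp.write("balance", 100)   # CP: fails rather than risk inconsistency
except RuntimeError as e:
    print(e)

ap1, ap2 = APReplica(), APReplica()
ap1.write("feed", "post-A")    # AP: both sides of the partition accept writes
ap2.write("feed", "post-B")
print(merge_last_write_wins(ap1, ap2))  # converges once the partition heals
```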

Key Takeaways

  • CAP: Pick two of C, A, P — partition tolerance is non-negotiable in distributed systems
  • MongoDB: CP — consistency over availability during partitions
  • Cassandra/DynamoDB: AP — availability with eventual consistency
Q3 Easy Concepts

What are the different types of NoSQL databases?

NoSQL databases fall into four main categories. Document stores like MongoDB store data as JSON-like documents. Use them for content management, catalogs, or when your schema varies per record. Key-value stores like Redis and DynamoDB map keys to values. They excel at caching, session storage, and simple lookups with sub-millisecond latency.

Column-family stores like Cassandra and HBase organize data by column families, optimizing for wide rows and high write throughput. Use them for time-series data, event logs, or analytics. Graph databases like Neo4j store nodes and relationships. They shine for social networks, recommendation engines, fraud detection, and any domain where relationships matter more than the data itself.

Choosing the right type depends on your access patterns: document for flexible schemas, key-value for speed, column-family for writes and analytics, graph for relationships.

Key Takeaways

  • Document (MongoDB): Flexible schema, nested data — catalogs, CMS
  • Key-Value (Redis, DynamoDB): Fast lookups — cache, sessions
  • Column-Family (Cassandra): High writes, analytics — time-series, logs
  • Graph (Neo4j): Relationships — social, recommendations, fraud
Q4 Medium Concepts

Explain eventual consistency vs strong consistency

Strong consistency guarantees that a read always returns the most recent write. It simplifies application logic but requires coordination across replicas, which can increase latency and reduce availability during partitions.

Eventual consistency allows replicas to diverge temporarily. All replicas will converge to the same state given enough time and no new writes. Variants include read-your-writes (a client sees its own writes immediately), causal consistency (preserves cause-effect order), and session consistency (guarantees within a session).

Cassandra offers tunable consistency: you can require quorum reads and writes for strong consistency, or ONE for availability. DynamoDB uses eventually consistent reads by default but offers strongly consistent reads as an option. Choose based on your tolerance for stale data versus latency and availability requirements.
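The arithmetic behind tunable consistency is the quorum overlap rule: with N replicas, reading R copies and writing W copies guarantees a read overlaps the latest write whenever R + W > N. A minimal sketch (the function name is illustrative):

PYTHON
```python
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """With n replicas, a read of r and a write of w must overlap iff r + w > n."""
    return r + w > n

N = 3
print(is_strongly_consistent(N, r=2, w=2))  # QUORUM reads + QUORUM writes: True
print(is_strongly_consistent(N, r=1, w=1))  # ONE/ONE: False -- stale reads possible
print(is_strongly_consistent(N, r=1, w=3))  # ONE/ALL: True -- every replica has the write
```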

Key Takeaways

  • Strong consistency: Always latest data — higher latency, lower availability
  • Eventual consistency: Stale reads possible — better latency and availability
  • Cassandra: Tunable per-query; DynamoDB: Strongly consistent read option
Q5 Medium Data Modeling

Explain MongoDB document design patterns — embedding vs referencing

Embedding stores related data inside a single document. Use it when the relationship is one-to-few, the embedded data is always accessed with the parent, and the embedded data rarely grows unbounded. Example: a user with a fixed set of addresses. Embedding reduces round trips and keeps related data co-located.

Referencing stores IDs that point to separate documents. Use it when you have one-to-many or many-to-many relationships, when the child data is accessed independently, or when the child collection can grow large. Example: blog posts with comments — comments can be queried separately and grow indefinitely.

For e-commerce: embed product variants (size, color) in the product document. Reference orders in a separate collection since they're queried independently and grow over time. The key is matching the pattern to your access patterns.

JAVASCRIPT
// Embedding: User with addresses (one-to-few, always accessed together)
const user = {
  _id: 1,
  name: "Alice",
  addresses: [
    { street: "123 Main", city: "NYC", zip: "10001" },
    { street: "456 Oak", city: "LA", zip: "90001" }
  ]
};

// Referencing: Blog post with comments (one-to-many, comments grow)
const post = { _id: 1, title: "MongoDB Design", authorId: 10 };
const comments = [
  { _id: 101, postId: 1, text: "Great post!" },
  { _id: 102, postId: 1, text: "Very helpful" }
];

// Query with $lookup for referenced data
db.posts.aggregate([
  { $match: { _id: 1 } },
  { $lookup: { from: "comments", localField: "_id", foreignField: "postId", as: "comments" } }
]);

Key Takeaways

  • Embed: One-to-few, always together, bounded — fewer round trips
  • Reference: One-to-many, independent access, unbounded — use $lookup
  • Match pattern to access patterns; e-commerce: embed variants, reference orders
Q6 Medium MongoDB

How does the MongoDB Aggregation Pipeline work?

The aggregation pipeline processes documents through a sequence of stages. Each stage transforms the documents and passes results to the next stage. Common stages include $match (filter documents, like WHERE), $group (aggregate by key, like GROUP BY), $project (reshape fields, like SELECT), $lookup (join with another collection), and $unwind (deconstruct arrays into separate documents).

The pipeline executes as a stream: documents flow through stages one at a time, enabling efficient memory use. You can add indexes to support $match and $sort. The order of stages matters: filter early with $match to reduce documents before expensive $group or $lookup operations.

Practical use: analyze sales by region, compute user engagement metrics, or denormalize data for reporting. The pipeline is more expressive than simple find() and supports complex analytics without moving data to a separate system.

JAVASCRIPT
// Sales analysis: total revenue by region, top products
db.orders.aggregate([
  { $match: { status: "completed", createdAt: { $gte: ISODate("2024-01-01") } } },
  { $unwind: "$items" },
  { $group: {
      _id: { region: "$shipping.region", productId: "$items.productId" },
      totalQty: { $sum: "$items.quantity" },
      revenue: { $sum: { $multiply: ["$items.quantity", "$items.price"] } }
  }},
  { $lookup: { from: "products", localField: "_id.productId", foreignField: "_id", as: "product" } },
  { $unwind: "$product" },
  { $project: {
      region: "$_id.region",
      productName: "$product.name",
      revenue: 1,
      totalQty: 1,
      _id: 0
  }},
  { $sort: { revenue: -1 } },
  { $limit: 10 }
]);

Key Takeaways

  • Pipeline: $match → $group → $project → $lookup — stream processing
  • Filter early with $match to reduce workload before $group/$lookup
  • $unwind expands arrays; $lookup performs joins
Q7 Hard MongoDB

How does MongoDB handle sharding and replication?

Replication provides high availability via replica sets. Each set has one primary and multiple secondaries. Writes go to the primary and replicate asynchronously. Read preferences control where reads go: primary (default for strong consistency), primaryPreferred, secondary (for read scaling), secondaryPreferred, or nearest (lowest latency). Write concern (w: 1, w: "majority", w: 0) controls when writes are acknowledged.

Sharding distributes data across shards by a shard key. Choose the shard key carefully: it determines data distribution and query routing. A bad key causes hot shards or scatter-gather queries. Range-based keys can create hotspots; hashed keys distribute evenly but prevent range queries. Chunks are the unit of migration; MongoDB balances chunks across shards.

Best practice: use compound shard keys (e.g., {userId: 1, createdAt: 1}) for common query patterns. Avoid monotonically increasing keys such as timestamps or ObjectIds, which funnel all new inserts into the last chunk and create a write hotspot.

JAVASCRIPT
// Shard key selection - compound key for user-scoped queries
sh.shardCollection("app.orders", { userId: 1, createdAt: 1 });

// Read preference - scale reads to secondaries
db.orders.find({ userId: 123 }).readPref("secondaryPreferred");

// Write concern - wait for majority acknowledgment
db.orders.insertOne(order, { writeConcern: { w: "majority" } });

Key Takeaways

  • Replica sets: Primary + secondaries; read preferences control read routing
  • Shard key determines distribution — avoid monotonic keys, consider compound keys
  • Write concern (w: majority) for durability; chunks rebalance automatically
Q8 Easy Redis

What are Redis data structures and their use cases?

Redis offers six core data structures. Strings store simple values; use for caching, counters, or session IDs. Lists are ordered sequences; use for queues, activity feeds, or recent items. Sets are unordered unique collections; use for tags, unique visitors, or set operations (intersection, union). Sorted Sets rank members by score; use for leaderboards, priority queues, or time-weighted rankings.

Hashes store field-value maps; use for object caching (user profiles, product data) to avoid serializing entire objects. Streams are append-only logs with consumer groups; use for event sourcing, message queues, or activity logs.

Practical mapping: cache API responses (Strings), rate limiting (Strings with INCR + EXPIRE), leaderboard (Sorted Sets with ZADD/ZREVRANGE), session store (Hashes), real-time feeds (Streams).

PYTHON
import json
import redis

r = redis.Redis(decode_responses=True)  # connected client
user_data = {"name": "Alice", "plan": "pro"}

# Strings: caching, rate limiting
r.set("user:123", json.dumps(user_data), ex=3600)
r.incr("rate:user:123")
r.expire("rate:user:123", 60)

# Sorted Sets: leaderboard
r.zadd("leaderboard", {"alice": 1500, "bob": 1200, "carol": 1400})
r.zrevrange("leaderboard", 0, 9, withscores=True)

# Hashes: session store
r.hset("session:abc123", mapping={"user_id": "123", "last_activity": "2024-01-15"})
r.hgetall("session:abc123")

Key Takeaways

  • Strings: Cache, counters, rate limiting
  • Sorted Sets: Leaderboards, priority queues
  • Hashes: Object caching; Streams: Event logs, message queues
Q9 Medium Redis

Explain Redis caching patterns — Cache-Aside, Write-Through, Write-Behind

Cache-Aside (Lazy Loading) is the most common pattern. The application manages the cache: on a read miss, fetch from DB, populate cache, return data. On writes, update the DB and invalidate (delete) the cache. Pros: cache only holds requested data, simple. Cons: cache miss penalty, possible stale reads if invalidation fails.

Write-Through keeps cache and DB in sync on every write. The cache layer writes to both cache and DB atomically. Reads always hit cache. Pros: strong consistency. Cons: every write has DB latency, unused data may be cached.

Write-Behind (Write-Back) writes to cache first, then asynchronously to DB. Pros: very fast writes, absorbs write bursts. Cons: risk of data loss if cache fails before flush, eventual consistency. Use for high-write, low-criticality data like analytics or click streams.
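Write-Through and Write-Behind can be sketched with in-memory dictionaries standing in for the cache and database (all names hypothetical). The write-behind buffer is drained by an explicit flush here; a real system would run the flush in a background worker.

PYTHON
```python
from collections import deque

db, cache = {}, {}   # in-memory stand-ins for the real stores
pending = deque()    # write-behind buffer of queued DB writes

def write_through(key, value):
    """Synchronously update cache and DB together -- strong consistency, DB latency on every write."""
    cache[key] = value
    db[key] = value

def write_behind(key, value):
    """Write to cache immediately; queue the DB write for a later flush -- fast, but lossy if the cache dies."""
    cache[key] = value
    pending.append((key, value))

def flush():
    """Drain the write-behind buffer to the DB (normally a background worker)."""
    while pending:
        key, value = pending.popleft()
        db[key] = value

write_through("a", 1)
print(db["a"])            # DB updated synchronously

write_behind("b", 2)
print("b" in db)          # False -- DB lags the cache
flush()
print(db["b"])            # converges after the flush
```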

PYTHON
# Cache-Aside pattern (assumes a connected `redis` client and a MongoDB `db` handle)
def get_user(user_id: str):
    cache_key = f"user:{user_id}"
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)
    user = db.users.find_one({"_id": user_id})
    if user:
        redis.setex(cache_key, 3600, json.dumps(user))
    return user

def update_user(user_id: str, data: dict):
    db.users.update_one({"_id": user_id}, {"$set": data})
    redis.delete(f"user:{user_id}")  # Invalidate on write

Key Takeaways

  • Cache-Aside: App manages cache; invalidate on write — most common
  • Write-Through: Sync write to cache + DB — strong consistency
  • Write-Behind: Async DB write — fast writes, risk of data loss
Q10 Hard DynamoDB

Explain DynamoDB single-table design

Single-table design stores multiple entity types in one table, using composite keys (PK + SK) to enable multiple access patterns. The key insight: design for access patterns first, not for normalized entities. Use generic attribute names like PK and SK, and overload them with different key schemes (e.g., USER#123, ORDER#456#789).

Partition key determines physical distribution; sort key enables range queries within a partition. GSIs provide alternate access patterns with different key structures. Avoid hot partitions: ensure partition key has high cardinality and distributes load evenly.

Example: USER#123 as PK with PROFILE as SK for user profile; USER#123 with ORDER#timestamp as SK for user's orders; ORDER#456 as PK with ITEM#789 as SK for order line items. One table, many access patterns.
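The key scheme above can be encoded in small helper functions so every access pattern builds keys the same way (helper names are illustrative, not part of the DynamoDB SDK):

PYTHON
```python
def user_key(user_id: str) -> dict:
    """Item key for a user profile: PK=USER#<id>, SK=PROFILE."""
    return {"PK": f"USER#{user_id}", "SK": "PROFILE"}

def user_order_key(user_id: str, timestamp: str) -> dict:
    """An order under a user: sorted by timestamp within the USER# partition."""
    return {"PK": f"USER#{user_id}", "SK": f"ORDER#{timestamp}"}

def order_item_key(order_id: str, item_id: str) -> dict:
    """A line item under an order partition."""
    return {"PK": f"ORDER#{order_id}", "SK": f"ITEM#{item_id}"}

print(user_key("123"))                      # {'PK': 'USER#123', 'SK': 'PROFILE'}
print(user_order_key("123", "2024-01-15"))  # {'PK': 'USER#123', 'SK': 'ORDER#2024-01-15'}
print(order_item_key("456", "789"))         # {'PK': 'ORDER#456', 'SK': 'ITEM#789'}
```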

JSON
{
  "TableName": "AppTable",
  "KeySchema": [
    { "AttributeName": "PK", "KeyType": "HASH" },
    { "AttributeName": "SK", "KeyType": "RANGE" }
  ],
  "AttributeDefinitions": [
    { "AttributeName": "PK", "AttributeType": "S" },
    { "AttributeName": "SK", "AttributeType": "S" },
    { "AttributeName": "GSI1PK", "AttributeType": "S" },
    { "AttributeName": "GSI1SK", "AttributeType": "S" }
  ],
  "GlobalSecondaryIndexes": [{
    "IndexName": "GSI1",
    "KeySchema": [
      { "AttributeName": "GSI1PK", "KeyType": "HASH" },
      { "AttributeName": "GSI1SK", "KeyType": "RANGE" }
    ]
  }]
}

JAVASCRIPT
// Query: Get user profile
{ KeyConditionExpression: "PK = :pk AND SK = :sk", ExpressionAttributeValues: { ":pk": "USER#123", ":sk": "PROFILE" } }

// Query: Get user's orders
{ KeyConditionExpression: "PK = :pk AND begins_with(SK, :prefix)", ExpressionAttributeValues: { ":pk": "USER#123", ":prefix": "ORDER#" } }

Key Takeaways

  • Design for access patterns first; use composite PK+SK for multiple patterns
  • Overload PK/SK with entity prefixes (USER#, ORDER#) for flexibility
  • GSIs for alternate access; avoid hot partitions with high-cardinality keys
Q11 Hard Cassandra

How does Cassandra's partition key design affect performance?

The partition key determines which node stores the data and how it's distributed. All rows with the same partition key live on the same node. A good partition key has high cardinality to distribute data evenly and matches your query patterns.

Hot partitions occur when one partition receives disproportionate read/write load. Example: using only 'date' as partition key for events creates one partition per day — all writes hit one partition. Solution: add high-cardinality component (e.g., user_id, bucket) to the partition key.

Compound keys: (user_id, timestamp) lets you query a user's events by time range within one partition. Avoid very wide partitions (millions of rows) — they slow reads and create memory pressure. Aim for partitions in the 10MB–100MB range. Use bucketing (e.g., month or week) to bound partition size for time-series data.
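The bucketing step can be sketched as a pure function that derives the partition's bucket component from an event timestamp (illustrative Python, not driver code):

PYTHON
```python
from datetime import datetime, date

def day_bucket(ts: datetime) -> date:
    """One bucket per day: tightly bounds partition size for time-series data."""
    return ts.date()

def week_bucket(ts: datetime) -> date:
    """One bucket per ISO week (its Monday): fewer, wider partitions."""
    iso = ts.isocalendar()
    return date.fromisocalendar(iso[0], iso[1], 1)

ts = datetime(2024, 1, 17, 9, 30)  # a Wednesday
print(day_bucket(ts))   # 2024-01-17
print(week_bucket(ts))  # 2024-01-15 -- the Monday starting that ISO week
```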

SQL
-- Bad: Hot partition - all events for a day in one partition
CREATE TABLE events (
  date DATE PRIMARY KEY,
  event_id UUID,
  data TEXT
);

-- Good: Partition by user + time bucket for even distribution
CREATE TABLE events (
  user_id UUID,
  bucket DATE,  -- e.g., day or week
  event_time TIMESTAMP,
  event_id UUID,
  data TEXT,
  PRIMARY KEY ((user_id, bucket), event_time)
);

-- Query: Get user's events for a day
SELECT * FROM events WHERE user_id = ? AND bucket = ? AND event_time >= ? AND event_time <= ?;

Key Takeaways

  • Partition key = distribution + query pattern; high cardinality avoids hotspots
  • Hot partitions: too few unique values (e.g., date alone) — add user_id or bucket
  • Bound partition size (10–100MB); use bucketing for time-series
Q12 Hard Concepts

How do you handle transactions in NoSQL databases?

MongoDB supports multi-document ACID transactions on replica sets since 4.0 and on sharded clusters since 4.2. Use sessions with startTransaction(), perform operations, then commitTransaction() or abortTransaction(). Transactions carry performance overhead, so reserve them for operations that truly need atomicity across documents.

DynamoDB offers TransactWriteItems and TransactGetItems for atomic multi-item operations. Up to 100 items per transaction. Use for conditional updates that must succeed or fail together, e.g., transferring inventory between items.

For distributed transactions across services, the saga pattern replaces 2PC. Each service performs a local transaction and publishes an event. If a later step fails, compensating transactions undo previous steps. Saga can be choreographed (events trigger next steps) or orchestrated (central coordinator). Trade-off: eventual consistency and complex failure handling.

JAVASCRIPT
// MongoDB multi-document transaction
const session = client.startSession();
try {
  session.startTransaction();
  await accounts.updateOne({ _id: "A" }, { $inc: { balance: -100 } }, { session });
  await accounts.updateOne({ _id: "B" }, { $inc: { balance: 100 } }, { session });
  await session.commitTransaction();
} catch (e) {
  await session.abortTransaction();
} finally {
  session.endSession();
}

PYTHON
# DynamoDB TransactWriteItems (boto3 low-level client; values use typed attributes)
transact_items = [
  {"Update": {"TableName": "Inventory", "Key": {"PK": {"S": "ITEM#1"}},
              "UpdateExpression": "ADD qty :d",
              "ExpressionAttributeValues": {":d": {"N": "5"}}}},
  {"Update": {"TableName": "Inventory", "Key": {"PK": {"S": "ITEM#2"}},
              "UpdateExpression": "ADD qty :d",
              "ExpressionAttributeValues": {":d": {"N": "-5"}}}}
]
dynamodb.transact_write_items(TransactItems=transact_items)
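The saga pattern described above can be sketched as an orchestrator that pairs each step with a compensating action (all step names hypothetical):

PYTHON
```python
def run_saga(steps):
    """steps: list of (action, compensation) pairs.
    Run actions in order; on any failure, run compensations for completed steps in reverse."""
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
        return True
    except Exception:
        for compensate in reversed(done):
            compensate()  # compensating transaction undoes an already-committed step
        return False

log = []

def fail_shipping():
    raise RuntimeError("carrier unavailable")

steps = [
    (lambda: log.append("reserve-inventory"), lambda: log.append("release-inventory")),
    (lambda: log.append("charge-card"),       lambda: log.append("refund-card")),
    (fail_shipping,                           lambda: log.append("cancel-shipment")),
]

ok = run_saga(steps)
print(ok)   # False
print(log)  # ['reserve-inventory', 'charge-card', 'refund-card', 'release-inventory']
```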

Key Takeaways

  • MongoDB: Multi-document transactions with sessions — ACID across documents
  • DynamoDB: TransactWriteItems for atomic multi-item ops (up to 100 items)
  • Saga pattern for cross-service: compensating transactions, eventual consistency