Design a URL Shortener: A Level-by-Level Interview Walkthrough

Mockrounds Team·May 14, 2026·29 min read

"Design a URL shortener" is the most-asked opening system design question, and the most misunderstood. Candidates treat it as an encoding puzzle, asking how to turn a long URL into a short one, and spend their time on Base62 math. That is the least interesting decision in the problem and the interviewer knows it.

The real question being probed is whether you can recognise that this is a read-dominated, availability-first system where the short code is a detail and the entire score lives in the read path, the failure modes, and how the design degrades under load. Here is the defensible take this walkthrough will earn: on a URL shortener, the encoding scheme is a tiebreaker. The design is won or lost on the redirect path and on how small you can make the blast radius when something fails. This follows the system design framework end to end, but scored the way an interviewer actually scores it: what a Mid, Senior, and Staff answer must each say at every decision, and the follow-up that separates them.

1. Requirements: the scope you negotiate, not the scope you're given

The prompt is deliberately under-specified. What you choose to pin down is itself scored before you draw a single box.

Functional: create a short URL from a long one. Visiting the short URL issues an HTTP redirect to the original. Likely extensions, in priority order: custom aliases, link expiry, and click analytics.

Non-functional, and this is where the design is actually decided:

The system is read-skewed by one to two orders of magnitude, with far more redirects than creations. Every later decision flows from this one fact.
A redirect must be fast (tens of milliseconds) and highly available. A shortener that is down breaks every link anyone ever shared. The availability requirement is unusually strict because the blast radius of an outage is every link in existence, not just new ones.
The mapping must be durable. Losing short_code → long_url silently breaks links forever, with no way for a user to even know.
Analytics can be eventually consistent and lossy. A click count that is approximate and a few seconds stale costs nothing. Conflating this with the durability requirement above is the most common scoping error.

MidSeparate functional from non-functional and name the read skew

"Before I design anything: functionally it's create-and-redirect, with custom aliases and analytics as likely follow-ons. Non-functionally the thing that matters is that this is overwhelmingly read-heavy and the redirect has to be low-latency and highly available. The mapping has to be durable. Analytics can be lossy. I'll design around the read path."

"Why does read-vs-write skew change your design at all?"

A Mid answer that lists requirements but cannot say what the skew changes (caching, replication, where you spend the latency budget) reads as a memorised checklist and caps here.

SeniorQuantify the skew and convert requirements into a budget

"I'd put a number on it. Assume 100:1 reads to writes and I'll size it in a second. That ratio tells me the redirect path gets a latency and availability budget and the write path can be slower and is allowed to be the part that fails first. Durability is non-negotiable on the mapping, eventual consistency is fine on analytics, so those are two different storage problems and I won't solve them with one system."

"Which requirement would you relax under pressure, and which would you never relax?"

Senior is expected to volunteer the read:write number and to split analytics from the mapping as separate durability problems without being asked. A Mid answer treats them as one.

StaffFrame requirements as the blast radius of being wrong

"The requirement I'd anchor the whole design on is that an outage here doesn't degrade new links, it degrades every link ever created. That reframes availability from a number into a blast-radius problem: I'd rather serve a slightly stale redirect from a read replica than return an error, and I want the create path and the redirect path to fail independently. I'd explicitly de-scope strong consistency on reads now and write that down, so we don't accidentally design a system that's consistent and unavailable when the requirement is the opposite."

"What does 'fail independently' cost you?"

Staff is expected to turn a non-functional requirement into an architectural constraint (independent failure domains) and to consciously trade consistency for availability with the CAP reasoning stated, not implied.

2. Capacity estimation: numbers that decide the architecture

Estimation is not arithmetic for its own sake. Each number must change a later decision or it was wasted time. Assume 100M new URLs/month at a 100:1 read:write ratio.

Writes: 100M ÷ (30 × 86,400 s) ≈ ~40 writes/sec. Trivial. The write path is not a scaling problem. It's a correctness problem.
Reads: 40 × 100 ≈ ~4,000 redirects/sec average, call it ~10,000/sec peak. This is the number that matters.
Storage: 100M/month × ~500 bytes/row ≈ 50 GB/month, ~0.6 TB/year. Five years is a few TB, which is small. Storage is not the bottleneck, which means any answer whose centrepiece is "shard the database for storage" has misread the problem.
Cache working set: redirects follow a heavy power law, so the hot tail is small. Cache the top ~20% of links and you absorb the large majority of the 10k/sec. Caching ~20M hot mappings × ~500 bytes ≈ ~10 GB, which fits in a single Redis node's memory, let alone a small cluster.

The conclusion the numbers force: this is not a storage-scaling problem, it is a read-latency and availability problem solved primarily with a cache. That sentence is the entire point of doing the estimation.

MidProduce order-of-magnitude numbers and one conclusion

"~40 writes/sec, ~4,000 reads/sec, well under a TB a year. So storage is easy and the design problem is serving 4–10k redirects/sec fast. I'd put a cache in front of the store."

"Your storage is tiny. Why isn't a single SQL box the whole answer?"

A Mid who can't connect the cache-working-set size to why caching works here (power-law traffic, ~10 GB hot set) is reciting numbers, not reasoning from them.

SeniorDerive the cache strategy from the distribution, not assert it

"The averages hide the real signal: traffic is power-law, so a small hot set serves most reads. Roughly 20M hot mappings at ~500 bytes is ~10 GB, which fits in memory on one cache node. So I'm not caching to reduce storage load, I'm caching because a ~10 GB in-memory set can serve ~90%+ of a 10k/sec read rate at sub-millisecond latency, and the datastore only sees the cold tail plus writes."

"Quantify 'most reads'. What hit rate are you assuming and what breaks if you're wrong by 10 points?"

Senior is expected to tie the estimate to a named hit rate and reason about sensitivity. Mid stops at "add a cache."

StaffUse the numbers to kill complexity

"The most useful thing these numbers do is tell me what not to build. 0.6 TB/year and 40 writes/sec means no sharding for scale, no exotic write path, no distributed transactions. A single primary with replicas is genuinely enough for years, and proposing more is negative signal. I'd state that explicitly so we spend the interview on the read path and failure modes, which is where the actual difficulty is."

"When does that 'single primary is enough' claim break?"

Staff is expected to use estimation to prune the design space and defend a deliberately simple core, and to know the breakpoint where it stops holding (write rate, not storage).

3. API and the redirect status code: the depth tell

Two endpoints:

POST /shorten   { longUrl, customAlias?, expiresAt? }  ->  201 { shortUrl }
GET  /{shortCode}                                       ->  redirect to longUrl

POST /shorten should be idempotent on (user, longUrl) so a retried creation doesn't mint duplicate codes, a detail most candidates miss and a cheap point to win.

The decision that separates levels here is which redirect status code, and it is the single best depth tell in this entire problem:

301 (Permanent): browsers and proxies cache it aggressively. The first click hits your server. Subsequent clicks from that client may never reach you again. Fastest for the user, but you have just destroyed your click analytics and lost the ability to ever change or disable that link's destination.
302 (Found / Temporary): the client re-requests every time. You keep full analytics and full control over the mapping, at the cost of every click hitting your infrastructure.

This is a genuine product-vs-infrastructure trade-off, not a trivia question, and "I'd use 301 because it's faster" is a trap the interviewer is hoping you step in.

MidDefine both endpoints and pick a code with a reason

"POST /shorten returns the short URL and GET /{code} issues the redirect. I'd use a 302 so we can still count clicks and change the target later."

"302 means every click hits you forever. Defend that cost against the 100:1 read load."

A Mid who picks a code by speed alone, or can't say why 302 despite the extra load, reveals they don't see the analytics/control consequence. That's the boundary.

SeniorReason about idempotency and the 301/302 consequence chain

"302, and deliberately. 301 gets cached by the browser, so after the first hit you stop seeing the click entirely, which kills analytics and means you can never disable an abusive link because clients have it memorised. The extra redirect load from 302 is exactly the load the cache is there to absorb, so it's affordable. I'd also make POST /shorten idempotent on (user, longUrl) so client retries don't create duplicates, and return 201 with the existing code on a repeat."

"Is there ever a case you'd accept 301?"

Senior is expected to trace the full consequence chain of 301 (analytics loss → inability to revoke → security problem) and to volunteer idempotency unprompted.

StaffFrame the status code as a reversibility / safety decision

"I treat the 301-vs-302 choice as a reversibility decision, not a performance one. A 301 is effectively an irreversible publish. Once clients cache it, the link is out of our control, which is unacceptable for a system that has to be able to kill phishing and abusive links on demand. So 302 is a safety requirement, and the cost is bounded because the redirect path is cached and cheap. If a specific customer truly needs 301 for SEO link-equity reasons, I'd make it an explicit, per-link opt-in with abuse review, not the default. The default must be the safe, revocable one."

"How does 302 interact with your CDN/edge caching then?"

Staff is expected to connect the status code to operability and abuse response and to handle the edge-caching follow-up: a short TTL at the edge, not a permanent client cache, rather than treat it as settled.

4. Generating the short code

The naïve schema is short_code → long_url. The real question is where the code comes from, and the options form a clear trade-off tree:

Hash the URL (MD5/SHA) and truncate to 7 chars: stateless and simple, but truncation creates a birthday-bound collision problem, so every write needs a read-before-write to check for collisions, which reintroduces a read on the write path and gets worse as the table grows. Identical URLs also collide by design (sometimes wanted, usually not). Rejected as a default for the collision-check cost.
Base62 of a DB auto-increment ID: zero collisions (the ID is unique), but the code is now sequential and enumerable: a competitor can scrape your entire link space and measure your growth, and users can walk other users' links. Rejected for the enumeration/abuse exposure.
Key Generation Service (KGS): pre-generate random, unique 7-char keys offline into an unused_keys table. App servers lease a block of a few thousand keys into memory and hand them out. No collision check on the hot path, no enumeration, write path is a pure insert. Usually the strongest answer, at the cost of a new stateful service to operate.

7 Base62 chars give 62⁷ ≈ 3.5 trillion codes. At 100M/month that is ~3,000 years of supply, so key exhaustion is a non-issue and 7 chars is the right fixed length.

MidReach a correct, collision-safe scheme

"I'd Base62-encode a unique counter so codes never collide and I don't need a collision check on write. Seven characters is ~3.5 trillion combinations, far more than we need."

"What's wrong with that the moment it's in production?"

A Mid who cannot answer "sequential codes are enumerable, causing volume leakage and link-walking" caps exactly here. Naming that flaw unprompted is the Senior line.

SeniorVolunteer the enumeration flaw and remove the DB from the write path

"A raw counter is enumerable, which leaks volume and lets people walk other links, so I'd decouple the code from the primary key with a Key Generation Service: pre-generate random unique keys into an unused-keys table, and have each app server lease a block of a few thousand into memory. That removes the database from the write hot path entirely and there's no collision check. If a server dies it loses an in-memory block. That is fine. We burn a few thousand keys out of 3.5 trillion."

"Two app servers, one KGS. How do you guarantee no two servers ever lease the same block?"

Senior is expected to reach an atomic claim on the KGS (a transactional UPDATE ... WHERE status='unused' LIMIT n RETURNING, or an atomic counter) without help. Hand-waving "the KGS handles it" is the boundary.

StaffJustify the added stateful tier against the simpler option

"Before adding a KGS I'd challenge whether we even need unguessable codes. For a public shortener we do, because enumerability is a real abuse and scraping vector. Given that, KGS beats hashing specifically because MD5-truncate forces a read-before-write that grows with the table and reintroduces the hot path we're trying to delete. The honest cost of the KGS is that it's a new stateful service with its own availability. I'd make it multi-region with disjoint key ranges per region so a regional KGS outage degrades to 'that region can't mint new links' while redirects, 99% of traffic, are completely unaffected. That failure-domain isolation is the actual design decision. The encoding is a detail."

"Disjoint ranges per region. What happens on a region split or failover?"

Staff is expected to defend a stateful dependency by bounding its blast radius, and to already be reasoning about cross-region range allocation rather than treating the KGS as free.

5. Data model and storage choice

The access pattern is the design input: a single point lookup by short_code, no joins, no range scans, overwhelmingly reads.

Field	Notes
`short_code`	primary key, fixed 7 chars
`long_url`	the destination
`created_at`	for expiry / cleanup jobs
`expires_at`	nullable, for link expiry
`user_id`	nullable, ownership + abuse attribution

Because the pattern is pure key-value, either a partitioned key-value store or a single relational primary with read replicas works. At 0.6 TB/year the relational option is genuinely sufficient. The interesting choice is not SQL-vs-NoSQL in the abstract. It's whether the operational simplicity of one primary plus replicas beats the horizontal-scaling story you don't actually need yet.

MidPick a store that matches the access pattern

"It's a key lookup with no joins, so a key-value store keyed on short_code, or a single indexed SQL table. Either is fine and I'd take whichever the team already runs."

"Defend that against 'we should obviously use NoSQL to scale.'"

A Mid who picks NoSQL by reflex ("it scales") rather than from the access pattern and the tiny data size is pattern-matching, not reasoning. That's the boundary.

SeniorReason from access pattern and data size, not buzzwords

"The pattern is a primary-key point read, so the data model is trivial in either family. The real question is operational. At 0.6 TB/year I don't need horizontal sharding, so a single relational primary with read replicas is the simpler system to run and reason about, and replicas directly serve the read skew. I'd only reach for a partitioned KV store if write rate or working set actually outgrew one node, which the numbers say won't happen for years."

"Your reads outgrow replica capacity one day. What's the first thing you change?"

Senior is expected to choose from data size + ops cost, not the NoSQL reflex, and to know the next scaling step (more replicas / cache before sharding).

StaffTie storage to failure independence

"I'd choose the store that makes the read and write paths fail independently. A single primary with replicas gives me that: writes can be unavailable (KGS/primary issue) while replicas keep serving redirects read-only, which matches the requirement that an outage must never break existing links. Sharding now would add a failure mode (cross-shard ops, rebalancing) to buy scale we don't need, which is negative signal. I'd revisit only when write rate, not storage, forces it, and I'd say so out loud so the simplicity is a stated decision, not an omission."

"Replica lag. Can a freshly created link 404 on a replica?"

Staff is expected to connect storage topology to failure domains and to anticipate the read-your-writes follow-up (route the immediate post-creation read to the primary, or accept brief 404 with a retry).

6. The read path: caching, hot keys, and stampede

This is the product. Optimise it and almost nothing else matters. Get it wrong and nothing else saves you.

Layered: a CDN/edge layer can absorb the hottest links entirely (with a short TTL, never a permanent client cache, see the 301/302 reasoning). Behind it, an in-memory cache (Redis) makes the common case a single memory lookup. The datastore sees only cold misses and writes. Three depth points generic answers skip:

Hit rate is the whole game. At a 95% hit rate the DB sees ~500 reads/sec of 10k. At 85% it sees ~1,500/sec, a 3× swing in DB load from a 10-point hit rate change. This is why "add a cache" is not an answer. The distribution and eviction policy are.
Hot keys can exceed a single cache node's throughput when a link goes viral. Mitigations: replicate hot keys across cache nodes, or push the hot tail to the CDN edge so it never reaches Redis.
Cache stampede: when a hot key expires, thousands of concurrent requests miss simultaneously and hammer the DB for the same key. Mitigations: a per-key lock / request coalescing (one request repopulates, the rest wait), or probabilistic early expiration (refresh slightly before TTL under load).

MidPut a cache in front and explain the hit path

"Redis in front of the datastore. On a redirect, check the cache. On a hit return the 302. On a miss read the DB, populate the cache, then redirect. LRU eviction so hot links stay resident."

"A link goes viral and all the traffic is one key. What happens?"

A Mid who describes the happy-path cache but has no answer for a single hot key or for what happens the instant a hot key expires caps here.

SeniorQuantify hit rate and name stampede / hot-key explicitly

"I'd size it from the power-law: target ~95% hit rate, because the DB load is brutally sensitive to it. 95 vs 85 is a 3× difference in DB reads at our volume. Two failure modes I'd design for now. First, hot keys, where one viral link exceeds a node's throughput, handled by replicating hot keys or pushing them to the edge. Second, cache stampede on expiry, handled with request coalescing so one request repopulates while the rest wait, plus a small random TTL jitter so keys don't expire in lockstep."

"Request coalescing across many app servers. Where does the lock live?"

Senior is expected to name stampede and hot keys unprompted with concrete mitigations and the hit-rate sensitivity number. Mid produces the happy path. Senior produces the failure path.

StaffTreat the cache as an availability tier with its own blast radius

"I treat the cache as part of the availability story, not an optimisation, so I care about how it fails. If Redis goes down, 10k/sec lands on the DB instantly. The cache must fail soft: on cache-layer outage, shed or coalesce aggressively and serve from read replicas rather than fall over, and accept degraded latency over an outage. I'd push the hottest tail to the CDN edge specifically so the most damaging hot-key traffic never depends on Redis being healthy. The design goal is that no single cache node, and not the whole cache tier, can take redirects down. Redirects down means every link is down."

"Edge serves a link you just disabled for abuse. How fast can you purge it?"

Staff is expected to reason about cache-tier failure as a system availability event, design fail-soft behaviour, and handle the edge-purge-vs-revocation tension (short edge TTL + active purge API).

7. Availability, consistency, and the custom-alias write conflict

Reads should favour availability over consistency (AP): a slightly stale or replica-served redirect beats an error, because an error breaks a link permanently from the user's perspective. The mapping is effectively write-once-read-many, so eventual consistency on reads is nearly free. Replication lag only matters in the seconds immediately after creation.

The one place consistency genuinely bites is custom aliases: two users racing for /sale is a uniqueness constraint that must be strongly consistent, even though everything else is relaxed. This split, eventually consistent reads and strongly consistent alias allocation, is the nuanced point here.

MidState the read availability preference

"Favour availability on the read path. Serve from a replica. A stale redirect is better than an error. The mapping rarely changes so eventual consistency is fine."

"Two users both request the alias /sale at the same instant. What happens?"

A Mid who applies "eventual consistency everywhere" and has no special handling for alias uniqueness misses that one sub-problem needs the opposite guarantee. That's the boundary.

SeniorSeparate the consistency model of reads from alias allocation

"Reads are AP, meaning replicas and stale-tolerant. But custom-alias allocation is the exception: it's a uniqueness constraint, so that specific write has to be strongly consistent, using a conditional insert or unique index on the primary so exactly one of the racing requests wins and the other gets a clear conflict. So the system is mostly AP with a small strongly-consistent island around alias creation, and I'd call that out rather than pretend one model fits everything."

"Replica lag means a just-created link 404s for a few seconds. Acceptable?"

Senior is expected to name the mixed consistency model and handle read-your-writes (route the immediate post-create read to the primary, or return a short retry) rather than wave at "eventual."

StaffReason about it as independent failure domains and CAP on purpose

"I'd make the CAP choice explicit and per-path. Redirects choose availability: if the primary is unreachable, replicas keep serving reads read-only and the system stays up for the 99% case. Creation chooses consistency for alias uniqueness and is allowed to be unavailable during a primary outage, because failing to create a new link is recoverable and failing to serve existing ones is not. That asymmetry, where the write path may fail so the read path never has to, is the core availability design, and it's a deliberate CAP decision, not a side effect."

"During a primary failover, what exactly is the user-visible behaviour?"

Staff is expected to assign CAP positions per path with the recoverability argument and describe graceful degradation during failover (reads continue, creates return a clear retryable error).

8. Analytics: the pipeline that must never block a redirect

Click analytics is the classic trap: counting clicks inline on the redirect path couples a lossy, low-value feature to the system's most critical, latency-sensitive path. The rule: the redirect must never wait on analytics.

flowchart LR
  R["Redirect<br/>handler"] -->|302 now| U[Client]
  R -.->|emit event| Q[["Event<br/>queue"]]
  Q --> C["Async<br/>consumers"]
  C --> AGG["Aggregates<br/>(HLL uniques)"]
  C --> DW[("Analytics<br/>store")]

Emit a fire-and-forget event to a queue (Kafka/Kinesis) after issuing the 302 and aggregate asynchronously. For high-cardinality metrics like unique visitors, exact counting is expensive and unnecessary. HyperLogLog gives unique counts within ~2% using a few KB per link, which is the right accuracy for an analytics dashboard and a strong depth signal.

MidDecouple analytics from the redirect

"Don't count synchronously. Issue the redirect first, then emit an event to a queue and let a separate consumer aggregate. If analytics is down, redirects are unaffected."

"Counting unique visitors per link at this scale, exact count is expensive. What do you do?"

A Mid who decouples correctly but assumes exact counting, with no notion that approximate is acceptable here, caps at the boundary.

SeniorChoose approximate aggregation deliberately

"Redirect first, async event to Kafka, consumers aggregate. For unique visitors I'd use HyperLogLog rather than store every visitor id, with ~2% error for a few KB per link, which is the right trade for a dashboard metric. Total clicks can be a simple counter with periodic flush. The pipeline is allowed to lose a small fraction of events. That's consistent with the requirement that analytics is lossy."

"Your consumer is down for an hour. What's the data story?"

Senior is expected to pick approximate structures on purpose with the error/space trade stated, and to be comfortable with bounded loss. Mid decouples but over-specs accuracy.

StaffFrame analytics as strictly subordinate to redirect SLOs

"The design constraint is that analytics has zero claim on the redirect's latency or availability budget. So it's fully asynchronous, the queue absorbs consumer downtime (replay on recovery), and I'd accept approximate and slightly delayed counts as a requirement, not a compromise. HyperLogLog for uniques, idempotent event keys so replay doesn't double-count. If analytics and redirects ever contend for a shared resource, analytics loses by design. The failure I'm engineering out is analytics back-pressure ever touching the redirect path."

"Replay after an outage. How do you not double-count?"

Staff is expected to make analytics explicitly subordinate (zero budget, loses contention by design) and to handle exactly-once-ish via idempotent event ids on replay.

9. Abuse, security, and what changes at 100×

A public shortener is an abuse magnet. This section is where Staff candidates separate hardest and most Mid candidates never go unprompted.

Open redirect / malicious links: shorteners are used to mask phishing and malware. You need link scanning (Safe Browsing-style), the ability to disable a link instantly (which is why 302 over 301, since a 301 you cannot revoke), and abuse-rate controls per account.
Rate limiting: creation must be rate-limited per user/IP or it becomes a spam-link factory. The redirect path needs abuse controls without adding latency to legitimate clicks.
At 100× (10M writes/day, ~1M redirects/sec): the read path scales horizontally on cache, CDN, and replicas. That's the part designed for it. What actually strains is the KGS (multi-region disjoint ranges), the analytics pipeline (partitioned queue), and abuse response latency. Storage still isn't the problem, which is the point.

MidAt least name that abuse exists

"I'd rate-limit creation per user to stop spam, and we need a way to disable a malicious link."

"How do you disable a link that's already been clicked thousands of times?"

A Mid is not expected to design abuse defence, but one who never mentions it even when prompted, or doesn't connect revocation back to the 302 choice, caps here.

SeniorDesign revocation and rate limiting concretely

"Per-account creation rate limits, and a disabled flag checked on the redirect path (cheap, cached) so we can kill an abusive link instantly. This only works because we chose 302. A 301 would be cached client-side and unrevocable. I'd integrate a URL reputation check on creation, async so it doesn't slow creation, quarantining suspicious links."

"Reputation check is async. What serves the link in the gap?"

Senior is expected to connect revocation to the 302 decision and handle the async-scan race (default-deny suspicious patterns, or a brief interstitial until cleared).

StaffFrame abuse as a first-class availability / latency constraint

"Abuse response is an availability feature: the time-to-disable a malicious link is an SLO, because a shortener that can't kill phishing fast is a liability, not just a bug. That SLO is what forces 302 + a cached disabled flag + short edge TTLs with an active purge API, so revocation propagates in seconds even from the edge. At 100× the read path was built to scale. The things I'd actually re-architect are the KGS into per-region ranges and the abuse-propagation path, because those are the parts whose failure has the widest blast radius."

"Edge caching vs instant revocation directly conflict. Resolve it."

Staff is expected to treat time-to-revoke as an SLO and explicitly resolve the edge-TTL-vs-revocation tension rather than leave it implicit.

What this question is really testing

Re-read the boundaries above and the pattern is clear: almost none of the score is in the encoding. It is in recognising the read-skew early, defending a deliberately simple core with numbers, tracing the 301/302 consequence chain, and, at Senior and Staff, naming failure modes (stampede, hot keys, cache-tier outage, unrevocable links) before the interviewer asks. The candidates who struggle are usually the ones who knew all of this but had only ever read it, never said it out loud while someone pushed back with the follow-up digs in each section above.

That gap, knowing it versus performing it under interruption, is the only thing a mock interview trains and solo study cannot. If you can deliver the Senior answers here without notes and survive the "interviewer digs" lines, you are ready for this question. If you can't yet, that is exactly the rep to run. Pricing and bundles.

Loading…