Design a URL Shortener: A Level-by-Level Interview Walkthrough
Loading…
Loading…
"Design a URL shortener" is the most-asked opening system design question, and the most misunderstood. Candidates treat it as an encoding puzzle, asking how to turn a long URL into a short one, and spend their time on Base62 math. That is the least interesting decision in the problem and the interviewer knows it.
The real question being probed is whether you can recognise that this is a read-dominated, availability-first system where the short code is a detail and the entire score lives in the read path, the failure modes, and how the design degrades under load. Here is the defensible take this walkthrough will earn: on a URL shortener, the encoding scheme is a tiebreaker. The design is won or lost on the redirect path and on how small you can make the blast radius when something fails. This follows the system design framework end to end, but scored the way an interviewer actually scores it: what a Mid, Senior, and Staff answer must each say at every decision, and the follow-up that separates them.
The prompt is deliberately under-specified. What you choose to pin down is itself scored before you draw a single box.
Functional: create a short URL from a long one. Visiting the short URL issues an HTTP redirect to the original. Likely extensions, in priority order: custom aliases, link expiry, and click analytics.
Non-functional, and this is where the design is actually decided:
short_code → long_url silently
breaks links forever, with no way for a user to even know."Before I design anything: functionally it's create-and-redirect, with custom aliases and analytics as likely follow-ons. Non-functionally the thing that matters is that this is overwhelmingly read-heavy and the redirect has to be low-latency and highly available. The mapping has to be durable. Analytics can be lossy. I'll design around the read path."
"Why does read-vs-write skew change your design at all?"
A Mid answer that lists requirements but cannot say what the skew changes (caching, replication, where you spend the latency budget) reads as a memorised checklist and caps here.
"I'd put a number on it. Assume 100:1 reads to writes and I'll size it in a second. That ratio tells me the redirect path gets a latency and availability budget and the write path can be slower and is allowed to be the part that fails first. Durability is non-negotiable on the mapping, eventual consistency is fine on analytics, so those are two different storage problems and I won't solve them with one system."
"Which requirement would you relax under pressure, and which would you never relax?"
Senior is expected to volunteer the read:write number and to split analytics from the mapping as separate durability problems without being asked. A Mid answer treats them as one.
"The requirement I'd anchor the whole design on is that an outage here doesn't degrade new links, it degrades every link ever created. That reframes availability from a number into a blast-radius problem: I'd rather serve a slightly stale redirect from a read replica than return an error, and I want the create path and the redirect path to fail independently. I'd explicitly de-scope strong consistency on reads now and write that down, so we don't accidentally design a system that's consistent and unavailable when the requirement is the opposite."
"What does 'fail independently' cost you?"
Staff is expected to turn a non-functional requirement into an architectural constraint (independent failure domains) and to consciously trade consistency for availability with the CAP reasoning stated, not implied.
Estimation is not arithmetic for its own sake. Each number must change a later decision or it was wasted time. Assume 100M new URLs/month at a 100:1 read:write ratio.
The conclusion the numbers force: this is not a storage-scaling problem, it is a read-latency and availability problem solved primarily with a cache. That sentence is the entire point of doing the estimation.
"~40 writes/sec, ~4,000 reads/sec, well under a TB a year. So storage is easy and the design problem is serving 4–10k redirects/sec fast. I'd put a cache in front of the store."
"Your storage is tiny. Why isn't a single SQL box the whole answer?"
A Mid who can't connect the cache-working-set size to why caching works here (power-law traffic, ~10 GB hot set) is reciting numbers, not reasoning from them.
"The averages hide the real signal: traffic is power-law, so a small hot set serves most reads. Roughly 20M hot mappings at ~500 bytes is ~10 GB, which fits in memory on one cache node. So I'm not caching to reduce storage load, I'm caching because a ~10 GB in-memory set can serve ~90%+ of a 10k/sec read rate at sub-millisecond latency, and the datastore only sees the cold tail plus writes."
"Quantify 'most reads'. What hit rate are you assuming and what breaks if you're wrong by 10 points?"
Senior is expected to tie the estimate to a named hit rate and reason about sensitivity. Mid stops at "add a cache."
"The most useful thing these numbers do is tell me what not to build. 0.6 TB/year and 40 writes/sec means no sharding for scale, no exotic write path, no distributed transactions. A single primary with replicas is genuinely enough for years, and proposing more is negative signal. I'd state that explicitly so we spend the interview on the read path and failure modes, which is where the actual difficulty is."
"When does that 'single primary is enough' claim break?"
Staff is expected to use estimation to prune the design space and defend a deliberately simple core, and to know the breakpoint where it stops holding (write rate, not storage).
Two endpoints:
POST /shorten { longUrl, customAlias?, expiresAt? } -> 201 { shortUrl }
GET /{shortCode} -> redirect to longUrl
POST /shorten should be idempotent on (user, longUrl) so a retried
creation doesn't mint duplicate codes, a detail most candidates miss and a
cheap point to win.
The decision that separates levels here is which redirect status code, and it is the single best depth tell in this entire problem:
This is a genuine product-vs-infrastructure trade-off, not a trivia question, and "I'd use 301 because it's faster" is a trap the interviewer is hoping you step in.
"
POST /shortenreturns the short URL andGET /{code}issues the redirect. I'd use a 302 so we can still count clicks and change the target later."
"302 means every click hits you forever. Defend that cost against the 100:1 read load."
A Mid who picks a code by speed alone, or can't say why 302 despite the extra load, reveals they don't see the analytics/control consequence. That's the boundary.
"302, and deliberately. 301 gets cached by the browser, so after the first hit you stop seeing the click entirely, which kills analytics and means you can never disable an abusive link because clients have it memorised. The extra redirect load from 302 is exactly the load the cache is there to absorb, so it's affordable. I'd also make
POST /shortenidempotent on (user, longUrl) so client retries don't create duplicates, and return 201 with the existing code on a repeat."
"Is there ever a case you'd accept 301?"
Senior is expected to trace the full consequence chain of 301 (analytics loss → inability to revoke → security problem) and to volunteer idempotency unprompted.
"I treat the 301-vs-302 choice as a reversibility decision, not a performance one. A 301 is effectively an irreversible publish. Once clients cache it, the link is out of our control, which is unacceptable for a system that has to be able to kill phishing and abusive links on demand. So 302 is a safety requirement, and the cost is bounded because the redirect path is cached and cheap. If a specific customer truly needs 301 for SEO link-equity reasons, I'd make it an explicit, per-link opt-in with abuse review, not the default. The default must be the safe, revocable one."
"How does 302 interact with your CDN/edge caching then?"
Staff is expected to connect the status code to operability and abuse response and to handle the edge-caching follow-up: a short TTL at the edge, not a permanent client cache, rather than treat it as settled.
The naïve schema is short_code → long_url. The real question is where the
code comes from, and the options form a clear trade-off tree:
unused_keys table. App servers lease a block of a few
thousand keys into memory and hand them out. No collision check on the hot
path, no enumeration, write path is a pure insert. Usually the strongest
answer, at the cost of a new stateful service to operate.7 Base62 chars give 62⁷ ≈ 3.5 trillion codes. At 100M/month that is ~3,000 years of supply, so key exhaustion is a non-issue and 7 chars is the right fixed length.
"I'd Base62-encode a unique counter so codes never collide and I don't need a collision check on write. Seven characters is ~3.5 trillion combinations, far more than we need."
"What's wrong with that the moment it's in production?"
A Mid who cannot answer "sequential codes are enumerable, causing volume leakage and link-walking" caps exactly here. Naming that flaw unprompted is the Senior line.
"A raw counter is enumerable, which leaks volume and lets people walk other links, so I'd decouple the code from the primary key with a Key Generation Service: pre-generate random unique keys into an unused-keys table, and have each app server lease a block of a few thousand into memory. That removes the database from the write hot path entirely and there's no collision check. If a server dies it loses an in-memory block. That is fine. We burn a few thousand keys out of 3.5 trillion."
"Two app servers, one KGS. How do you guarantee no two servers ever lease the same block?"
Senior is expected to reach an atomic claim on the KGS (a transactional UPDATE ... WHERE status='unused' LIMIT n RETURNING, or an atomic counter) without help. Hand-waving "the KGS handles it" is the boundary.
"Before adding a KGS I'd challenge whether we even need unguessable codes. For a public shortener we do, because enumerability is a real abuse and scraping vector. Given that, KGS beats hashing specifically because MD5-truncate forces a read-before-write that grows with the table and reintroduces the hot path we're trying to delete. The honest cost of the KGS is that it's a new stateful service with its own availability. I'd make it multi-region with disjoint key ranges per region so a regional KGS outage degrades to 'that region can't mint new links' while redirects, 99% of traffic, are completely unaffected. That failure-domain isolation is the actual design decision. The encoding is a detail."
"Disjoint ranges per region. What happens on a region split or failover?"
Staff is expected to defend a stateful dependency by bounding its blast radius, and to already be reasoning about cross-region range allocation rather than treating the KGS as free.
The access pattern is the design input: a single point lookup by short_code,
no joins, no range scans, overwhelmingly reads.
| Field | Notes |
|---|---|
short_code | primary key, fixed 7 chars |
long_url | the destination |
created_at | for expiry / cleanup jobs |
expires_at | nullable, for link expiry |
user_id | nullable, ownership + abuse attribution |
Because the pattern is pure key-value, either a partitioned key-value store or a single relational primary with read replicas works. At 0.6 TB/year the relational option is genuinely sufficient. The interesting choice is not SQL-vs-NoSQL in the abstract. It's whether the operational simplicity of one primary plus replicas beats the horizontal-scaling story you don't actually need yet.
"It's a key lookup with no joins, so a key-value store keyed on
short_code, or a single indexed SQL table. Either is fine and I'd take whichever the team already runs."
"Defend that against 'we should obviously use NoSQL to scale.'"
A Mid who picks NoSQL by reflex ("it scales") rather than from the access pattern and the tiny data size is pattern-matching, not reasoning. That's the boundary.
"The pattern is a primary-key point read, so the data model is trivial in either family. The real question is operational. At 0.6 TB/year I don't need horizontal sharding, so a single relational primary with read replicas is the simpler system to run and reason about, and replicas directly serve the read skew. I'd only reach for a partitioned KV store if write rate or working set actually outgrew one node, which the numbers say won't happen for years."
"Your reads outgrow replica capacity one day. What's the first thing you change?"
Senior is expected to choose from data size + ops cost, not the NoSQL reflex, and to know the next scaling step (more replicas / cache before sharding).
"I'd choose the store that makes the read and write paths fail independently. A single primary with replicas gives me that: writes can be unavailable (KGS/primary issue) while replicas keep serving redirects read-only, which matches the requirement that an outage must never break existing links. Sharding now would add a failure mode (cross-shard ops, rebalancing) to buy scale we don't need, which is negative signal. I'd revisit only when write rate, not storage, forces it, and I'd say so out loud so the simplicity is a stated decision, not an omission."
"Replica lag. Can a freshly created link 404 on a replica?"
Staff is expected to connect storage topology to failure domains and to anticipate the read-your-writes follow-up (route the immediate post-creation read to the primary, or accept brief 404 with a retry).
This is the product. Optimise it and almost nothing else matters. Get it wrong and nothing else saves you.
Layered: a CDN/edge layer can absorb the hottest links entirely (with a short TTL, never a permanent client cache, see the 301/302 reasoning). Behind it, an in-memory cache (Redis) makes the common case a single memory lookup. The datastore sees only cold misses and writes. Three depth points generic answers skip:
"Redis in front of the datastore. On a redirect, check the cache. On a hit return the 302. On a miss read the DB, populate the cache, then redirect. LRU eviction so hot links stay resident."
"A link goes viral and all the traffic is one key. What happens?"
A Mid who describes the happy-path cache but has no answer for a single hot key or for what happens the instant a hot key expires caps here.
"I'd size it from the power-law: target ~95% hit rate, because the DB load is brutally sensitive to it. 95 vs 85 is a 3× difference in DB reads at our volume. Two failure modes I'd design for now. First, hot keys, where one viral link exceeds a node's throughput, handled by replicating hot keys or pushing them to the edge. Second, cache stampede on expiry, handled with request coalescing so one request repopulates while the rest wait, plus a small random TTL jitter so keys don't expire in lockstep."
"Request coalescing across many app servers. Where does the lock live?"
Senior is expected to name stampede and hot keys unprompted with concrete mitigations and the hit-rate sensitivity number. Mid produces the happy path. Senior produces the failure path.
"I treat the cache as part of the availability story, not an optimisation, so I care about how it fails. If Redis goes down, 10k/sec lands on the DB instantly. The cache must fail soft: on cache-layer outage, shed or coalesce aggressively and serve from read replicas rather than fall over, and accept degraded latency over an outage. I'd push the hottest tail to the CDN edge specifically so the most damaging hot-key traffic never depends on Redis being healthy. The design goal is that no single cache node, and not the whole cache tier, can take redirects down. Redirects down means every link is down."
"Edge serves a link you just disabled for abuse. How fast can you purge it?"
Staff is expected to reason about cache-tier failure as a system availability event, design fail-soft behaviour, and handle the edge-purge-vs-revocation tension (short edge TTL + active purge API).
Reads should favour availability over consistency (AP): a slightly stale or replica-served redirect beats an error, because an error breaks a link permanently from the user's perspective. The mapping is effectively write-once-read-many, so eventual consistency on reads is nearly free. Replication lag only matters in the seconds immediately after creation.
The one place consistency genuinely bites is custom aliases: two users
racing for /sale is a uniqueness constraint that must be strongly
consistent, even though everything else is relaxed. This split, eventually
consistent reads and strongly consistent alias allocation, is the nuanced point
here.
"Favour availability on the read path. Serve from a replica. A stale redirect is better than an error. The mapping rarely changes so eventual consistency is fine."
"Two users both request the alias /sale at the same instant. What happens?"
A Mid who applies "eventual consistency everywhere" and has no special handling for alias uniqueness misses that one sub-problem needs the opposite guarantee. That's the boundary.
"Reads are AP, meaning replicas and stale-tolerant. But custom-alias allocation is the exception: it's a uniqueness constraint, so that specific write has to be strongly consistent, using a conditional insert or unique index on the primary so exactly one of the racing requests wins and the other gets a clear conflict. So the system is mostly AP with a small strongly-consistent island around alias creation, and I'd call that out rather than pretend one model fits everything."
"Replica lag means a just-created link 404s for a few seconds. Acceptable?"
Senior is expected to name the mixed consistency model and handle read-your-writes (route the immediate post-create read to the primary, or return a short retry) rather than wave at "eventual."
"I'd make the CAP choice explicit and per-path. Redirects choose availability: if the primary is unreachable, replicas keep serving reads read-only and the system stays up for the 99% case. Creation chooses consistency for alias uniqueness and is allowed to be unavailable during a primary outage, because failing to create a new link is recoverable and failing to serve existing ones is not. That asymmetry, where the write path may fail so the read path never has to, is the core availability design, and it's a deliberate CAP decision, not a side effect."
"During a primary failover, what exactly is the user-visible behaviour?"
Staff is expected to assign CAP positions per path with the recoverability argument and describe graceful degradation during failover (reads continue, creates return a clear retryable error).
Click analytics is the classic trap: counting clicks inline on the redirect path couples a lossy, low-value feature to the system's most critical, latency-sensitive path. The rule: the redirect must never wait on analytics.
Emit a fire-and-forget event to a queue (Kafka/Kinesis) after issuing the 302 and aggregate asynchronously. For high-cardinality metrics like unique visitors, exact counting is expensive and unnecessary. HyperLogLog gives unique counts within ~2% using a few KB per link, which is the right accuracy for an analytics dashboard and a strong depth signal.
"Don't count synchronously. Issue the redirect first, then emit an event to a queue and let a separate consumer aggregate. If analytics is down, redirects are unaffected."
"Counting unique visitors per link at this scale, exact count is expensive. What do you do?"
A Mid who decouples correctly but assumes exact counting, with no notion that approximate is acceptable here, caps at the boundary.
"Redirect first, async event to Kafka, consumers aggregate. For unique visitors I'd use HyperLogLog rather than store every visitor id, with ~2% error for a few KB per link, which is the right trade for a dashboard metric. Total clicks can be a simple counter with periodic flush. The pipeline is allowed to lose a small fraction of events. That's consistent with the requirement that analytics is lossy."
"Your consumer is down for an hour. What's the data story?"
Senior is expected to pick approximate structures on purpose with the error/space trade stated, and to be comfortable with bounded loss. Mid decouples but over-specs accuracy.
"The design constraint is that analytics has zero claim on the redirect's latency or availability budget. So it's fully asynchronous, the queue absorbs consumer downtime (replay on recovery), and I'd accept approximate and slightly delayed counts as a requirement, not a compromise. HyperLogLog for uniques, idempotent event keys so replay doesn't double-count. If analytics and redirects ever contend for a shared resource, analytics loses by design. The failure I'm engineering out is analytics back-pressure ever touching the redirect path."
"Replay after an outage. How do you not double-count?"
Staff is expected to make analytics explicitly subordinate (zero budget, loses contention by design) and to handle exactly-once-ish via idempotent event ids on replay.
A public shortener is an abuse magnet. This section is where Staff candidates separate hardest and most Mid candidates never go unprompted.
"I'd rate-limit creation per user to stop spam, and we need a way to disable a malicious link."
"How do you disable a link that's already been clicked thousands of times?"
A Mid is not expected to design abuse defence, but one who never mentions it even when prompted, or doesn't connect revocation back to the 302 choice, caps here.
"Per-account creation rate limits, and a disabled flag checked on the redirect path (cheap, cached) so we can kill an abusive link instantly. This only works because we chose 302. A 301 would be cached client-side and unrevocable. I'd integrate a URL reputation check on creation, async so it doesn't slow creation, quarantining suspicious links."
"Reputation check is async. What serves the link in the gap?"
Senior is expected to connect revocation to the 302 decision and handle the async-scan race (default-deny suspicious patterns, or a brief interstitial until cleared).
"Abuse response is an availability feature: the time-to-disable a malicious link is an SLO, because a shortener that can't kill phishing fast is a liability, not just a bug. That SLO is what forces 302 + a cached disabled flag + short edge TTLs with an active purge API, so revocation propagates in seconds even from the edge. At 100× the read path was built to scale. The things I'd actually re-architect are the KGS into per-region ranges and the abuse-propagation path, because those are the parts whose failure has the widest blast radius."
"Edge caching vs instant revocation directly conflict. Resolve it."
Staff is expected to treat time-to-revoke as an SLO and explicitly resolve the edge-TTL-vs-revocation tension rather than leave it implicit.
Re-read the boundaries above and the pattern is clear: almost none of the score is in the encoding. It is in recognising the read-skew early, defending a deliberately simple core with numbers, tracing the 301/302 consequence chain, and, at Senior and Staff, naming failure modes (stampede, hot keys, cache-tier outage, unrevocable links) before the interviewer asks. The candidates who struggle are usually the ones who knew all of this but had only ever read it, never said it out loud while someone pushed back with the follow-up digs in each section above.
That gap, knowing it versus performing it under interruption, is the only thing a mock interview trains and solo study cannot. If you can deliver the Senior answers here without notes and survive the "interviewer digs" lines, you are ready for this question. If you can't yet, that is exactly the rep to run. Pricing and bundles.