Stop Sweating System Design: Demystifying Caching Eviction Policies
If you're interviewing for a mid-to-senior software engineering role, especially at a FAANG company like Amazon, you know system design is the ultimate gatekeeper. While concepts like load balancing and database sharding are standard, interviewers often dive deep into specific components to test your practical knowledge. Caching—and specifically how you manage cache overflow—is a favorite.
At RolePilot, we act as your Candidate Protector. We want to ensure you're ready not just to answer what a cache is, but how it operates under pressure. This breakdown prepares you for the inevitable question: How do LRU and LFU differ, and when would you choose one over the other in a distributed cache like Redis or Memcached?
The Core Challenge: Why Cache Eviction is Essential
Caches (often powered by tools like Redis or Memcached) are lightning-fast storage layers that sit between your application and your persistent database. They are designed to hold frequently accessed data. The problem? Caches have finite memory. When the cache reaches its memory limit and a new item needs to be stored, the system must decide which existing item to evict (kick out) to make room. This decision is governed by an eviction strategy.
Choosing the wrong strategy lowers the cache hit ratio: more requests miss the cache and fall through to the slower database, which is a performance killer.
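To see why the hit ratio matters so much, here is a rough back-of-the-envelope sketch. The function name and the latency numbers (1 ms for a cache read, 50 ms for a database read) are assumptions chosen for illustration, not measurements from any real system.

```python
def average_latency_ms(hit_ratio: float, cache_ms: float, db_ms: float) -> float:
    """Expected latency when a miss costs a cache check plus a database read."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    miss_ratio = 1.0 - hit_ratio
    return hit_ratio * cache_ms + miss_ratio * (cache_ms + db_ms)


# Hypothetical numbers: 1 ms cache read, 50 ms database read.
for ratio in (0.99, 0.90, 0.50):
    print(f"hit ratio {ratio:.0%}: {average_latency_ms(ratio, 1.0, 50.0):.1f} ms")
# hit ratio 99%: 1.5 ms
# hit ratio 90%: 6.0 ms
# hit ratio 50%: 26.0 ms
```

Under these assumed numbers, dropping from a 99% to a 90% hit ratio roughly quadruples average latency, which is why the eviction policy deserves attention.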
Strategy 1: Least Recently Used (LRU)
LRU is arguably the most common and simplest eviction strategy. The philosophy is straightforward: If data hasn't been accessed recently, it probably won't be accessed soon.
How LRU Works:
- Tracking: LRU tracks how recently each item was accessed. This is usually implemented with a doubly linked list plus a hash map: the hash map locates an item's node in O(1), and the list keeps items in recency order. The head of the list holds the most recently used item (MRU), and the tail holds the least recently used item (LRU).
- Eviction: When memory is full, the item at the tail (LRU) is removed.
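The mechanics above can be sketched in a few lines of Python. This is a minimal illustration, not how Redis or Memcached implement it: the `LRUCache` class and its method names are hypothetical, and it leans on `collections.OrderedDict`, which already combines a hash map with a doubly linked list. Note that here the end of the ordering plays the role of the MRU "head" described above.

```python
from collections import OrderedDict


class LRUCache:
    """LRU cache sketch: OrderedDict pairs a hash map with a doubly linked list."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest entry last

    def get(self, key):
        if key not in self._items:
            return None  # cache miss
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry

    def keys(self):
        return list(self._items)  # ordered from LRU to MRU


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" becomes most recently used
cache.put("c", 3)    # cache is full, so "b" (least recently used) is evicted
print(cache.keys())  # ['a', 'c']
```

Both `get` and `put` run in O(1), which is the property interviewers usually want you to call out.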
LRU Pros and Cons
| Pro | Con |
|---|---|
| Simple Implementation: Low computational overhead. | Temporal Spikes: Vulnerable to cache pollution from single, large-scale events (e.g., a massive hourly report) that temporarily push out genuinely popular data. |
| Excellent for Temporal Locality: Highly effective when recently accessed data is likely to be accessed again soon. | Metadata Overhead: Each entry needs list pointers (or a timestamp) to track its position, which slightly increases memory use. |
Strategy 2: Least Frequently Used (LFU)
LFU operates on a different heuristic: If data hasn't been accessed often, regardless of when it was last accessed, it should be evicted. The focus shifts from recency to frequency.
How LFU Works:
- Tracking: LFU keeps a counter for every item and increments it each time the item is accessed.
- Implementation: LFU is more complex to implement efficiently. Common approaches are a min-heap keyed by access count, or a hash map combined with one linked list per frequency (frequency buckets), which keeps every operation at O(1).
- Eviction: When memory is full, the item with the lowest access count is removed.
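As a rough sketch of the steps above, the following Python uses frequency buckets: one ordered bucket of keys per access count, plus a pointer to the lowest non-empty count. The `LFUCache` class, its method names, and the choice to break ties by evicting the oldest key in the bucket are all assumptions made for this illustration; production caches typically use approximate counters instead.

```python
from collections import OrderedDict, defaultdict


class LFUCache:
    """LFU cache sketch using frequency buckets; ties evict the oldest key."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._values = {}                         # key -> value
        self._counts = {}                         # key -> access count
        self._buckets = defaultdict(OrderedDict)  # count -> keys, oldest first
        self._min_count = 0                       # lowest non-empty count

    def _touch(self, key) -> None:
        """Move key from its current frequency bucket to the next one."""
        count = self._counts[key]
        del self._buckets[count][key]
        if not self._buckets[count]:
            del self._buckets[count]
            if self._min_count == count:
                self._min_count = count + 1
        self._counts[key] = count + 1
        self._buckets[count + 1][key] = None

    def get(self, key):
        if key not in self._values:
            return None  # cache miss
        self._touch(key)
        return self._values[key]

    def put(self, key, value) -> None:
        if key in self._values:
            self._values[key] = value
            self._touch(key)
            return
        if len(self._values) >= self.capacity:
            # Evict from the lowest-count bucket, oldest key first.
            victim, _ = self._buckets[self._min_count].popitem(last=False)
            if not self._buckets[self._min_count]:
                del self._buckets[self._min_count]
            del self._values[victim]
            del self._counts[victim]
        self._values[key] = value
        self._counts[key] = 1
        self._buckets[1][key] = None
        self._min_count = 1  # a new key always has the lowest possible count

    def keys(self):
        return sorted(self._values)


cache = LFUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")
cache.get("a")       # "a" now has count 3, "b" has count 1
cache.put("c", 3)    # evicts "b" (lowest count)
cache.put("d", 4)    # evicts "c": new entries start at count 1
print(cache.keys())  # ['a', 'd']
```

The last two `put` calls show both sides of LFU: the popular key "a" survives, while the freshly inserted "c" is evicted before it has a chance to build up a count.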
LFU Pros and Cons
| Pro | Con |
|---|---|
| Robust against Temporal Spikes: A single burst of accesses won't immediately evict genuinely popular data, making it great for steady-state data. | Complexity: Higher implementation overhead and computational cost during update operations. |
| Effective for Highly Popular Data: Better at retaining data that is frequently, but consistently, accessed over a long period. | The 'Startup Problem': Newly cached items start with a low frequency count, making them immediately vulnerable to eviction, even if they become popular later. |
LRU vs. LFU: Which One Wins? (The Amazon Answer)
In an interview setting, the correct answer is always "It depends on the access pattern." But here is the technical breakdown:
- Choose LRU when: Your application exhibits strong temporal locality. This is common in session management or recent activity feeds where data that was just used is highly likely to be used again soon.
- Choose LFU when: Your access pattern is dominated by a stable set of popular items that should stay cached even through short quiet periods. Good examples include configuration settings, static asset links, or immutable product details.
Pro Tip for Interviews: Large-scale systems often use hybrid policies such as Adaptive Replacement Cache (ARC) or TinyLFU. ARC dynamically balances recency and frequency based on the current workload, while TinyLFU uses a compact frequency estimate to decide which items are worth admitting. Mentioning these hybrids shows you know that pure LRU and pure LFU are rarely the whole story in production.
Preparing for System Design Success
Passing system design interviews requires more than just recalling definitions; it demands demonstrating judgment and understanding trade-offs. Knowing how to defend your choice between LRU and LFU shows maturity and operational awareness.
If you're gearing up for a major interview, make sure your resume and cover letter are equally optimized. Don't let the ATS filter you out before you even get to the system design stage. Use our AI tools to ensure your application passes the critical checks.
Ready to stop gambling with your career applications? Check your documents against the systems recruiters use: Run an ATS Reality Check now.
Frequently Asked Questions (FAQ)
Q: Does Redis use LRU or LFU?
Redis supports multiple eviction policies, including volatile-lru, allkeys-lru, volatile-lfu, and allkeys-lfu. The default policy is noeviction: once the maxmemory limit is hit, Redis rejects writes that would need more memory with an error instead of evicting anything. Users commonly switch to an LRU or LFU variant depending on their workload. Note that Redis approximates both LRU and LFU by sampling keys rather than tracking an exact ordering.
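For reference, the eviction policy is controlled by two directives in redis.conf. The fragment below is a minimal sketch; the 256mb limit and the choice of allkeys-lfu are illustrative assumptions, not recommendations.

```conf
# redis.conf fragment (values are illustrative assumptions)
maxmemory 256mb
maxmemory-policy allkeys-lfu
```

The same setting can be changed on a running instance with `CONFIG SET maxmemory-policy allkeys-lru`.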
Q: Why is LFU harder to implement efficiently than LRU?
A straightforward LFU keeps items ordered by frequency count, typically in a min-heap. Every access changes an item's count and forces the heap to reorder, which costs O(log N), whereas LRU simply moves a node to the head of a linked list in O(1). O(1) LFU designs exist, but they need extra bookkeeping, such as a separate linked list per frequency.
Q: What is the 'cache pollution' problem?
Cache pollution occurs when rarely accessed or one-off data fills up the cache, forcing genuinely popular items to be evicted. This is a common issue with basic LRU: during a sudden burst of one-off reads (a temporal spike, such as a batch job scanning many keys), every new item looks "recent" and pushes the popular items out.
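A tiny simulation makes the effect concrete. The sketch below replays the same access trace through an LRU and an LFU policy; the `run_trace` helper, the key names, and the capacity of 3 are all made up for this example, and the LFU victim search is a linear scan for simplicity rather than efficiency.

```python
from collections import Counter, OrderedDict


def run_trace(policy: str, capacity: int, trace: list) -> set:
    """Replay key accesses through a tiny cache; return the keys left at the end."""
    if policy not in ("lru", "lfu"):
        raise ValueError("policy must be 'lru' or 'lfu'")
    cache = OrderedDict()  # keys ordered from least to most recently used
    counts = Counter()     # access counts for keys currently cached
    for key in trace:
        if key in cache:
            cache.move_to_end(key)
            counts[key] += 1
            continue
        if len(cache) >= capacity:
            if policy == "lru":
                victim = next(iter(cache))  # least recently used key
            else:
                # Lowest count wins; min() keeps the oldest key on ties.
                victim = min(cache, key=lambda k: counts[k])
            del cache[victim]
            del counts[victim]
        cache[key] = None
        counts[key] = 1
    return set(cache)


hot = ["home", "cart"] * 5                # two popular keys, 5 accesses each
scan = [f"report-{i}" for i in range(3)]  # one-off keys from a batch job
trace = hot + scan

print(sorted(run_trace("lru", 3, trace)))  # ['report-0', 'report-1', 'report-2']
print(sorted(run_trace("lfu", 3, trace)))  # ['cart', 'home', 'report-2']
```

With LRU, the three one-off report keys push out both popular keys. With LFU, the popular keys survive because their counts are higher, and only the one-off keys compete for the remaining slot.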