We had a critical service that often got overwhelmed, not by one client app but by different apps over time. One week it was app A, the next week app B, each with its own buggy code suddenly spamming the service.
The quick fix suggested was caching, since a lot of requests were for the same query. But after debating, we went with rate limiting instead. Our reasoning: caching would just hide the bad behavior and keep the broken clients alive, only for them to cause failures in other downstream systems later. By rate limiting, we stopped abusive patterns across all apps and forced bugs to surface. In fact, we discovered multiple issues in different apps this way.
Takeaway: caching is good, but it is not a replacement for fixing buggy code or misuse. Sometimes the better fix is to protect the service and let the bugs show up where they belong.
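For anyone who hasn't built one, the per-client limiting we're talking about is conceptually just a token bucket keyed by the calling app. A rough Go sketch of the idea (the rate, burst, and keying here are illustrative placeholders, not our actual setup):

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    // bucket refills at `rate` tokens per second, up to `burst`.
    type bucket struct {
        tokens float64
        last   time.Time
    }

    type limiter struct {
        mu      sync.Mutex
        rate    float64
        burst   float64
        buckets map[string]*bucket // one bucket per client app
    }

    func newLimiter(rate, burst float64) *limiter {
        return &limiter{rate: rate, burst: burst, buckets: make(map[string]*bucket)}
    }

    // allow reports whether the given client may make one more request right now.
    func (l *limiter) allow(client string) bool {
        l.mu.Lock()
        defer l.mu.Unlock()

        now := time.Now()
        b, ok := l.buckets[client]
        if !ok {
            b = &bucket{tokens: l.burst, last: now}
            l.buckets[client] = b
        }
        // Refill based on elapsed time, capped at the burst size.
        b.tokens += now.Sub(b.last).Seconds() * l.rate
        if b.tokens > l.burst {
            b.tokens = l.burst
        }
        b.last = now

        if b.tokens < 1 {
            return false // reject; the handler would answer 429 here
        }
        b.tokens--
        return true
    }

    func main() {
        lim := newLimiter(5, 10) // illustrative: 5 req/s steady, burst of 10, per app
        for i := 0; i < 12; i++ {
            fmt.Println("app-A allowed:", lim.allow("app-A"))
        }
    }

The nice side effect is exactly what we saw: a buggy client blows through its own budget and gets 429s, instead of quietly degrading the service for everyone else.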
It's funny how I encountered a problem that went exactly the opposite way! We initially introduced a rate limiter that was adequate at the time, but as the product scaled up it stopped being adequate, and any 429 failures were either ignored or closed as client bugs. Only after some time did we realize that the request rate was growing roughly in line with the product. The quick fix was to simply remove the limiter, but after a couple of incidents where the DB decided to take a nap after being overwhelmed, we added a caching layer.
Just goes to show that there is no silver bullet - context, experience, and a good amount of gut feeling are paramount.
Something that was drilled into me early in my career was that you cannot expect your cache to be up 100% of the time. The logical extension of that is your main DB needs to be able to handle 100% of your traffic at a moment’s notice. Not only has this kind of thinking saved my ass on several occasions, but it’s also actually kept my code much cleaner. I don’t want to say rate limiters and circuit breakers are the mark of bad engineering, butttt they’re usually just good engineering deferred.
Reminds me of gas plumbing: the indoor lines run only a few psi above ambient, but the pipes themselves have to withstand line pressure up to 300 psi in case the regulator fails. It's good advice!
There are times when a cache is appropriate, but I often find that it's more appropriate for the cache to be on the side of whoever is making all the requests. This isn't applicable when that is e.g. millions of different clients all making their own requests, but rather when we're talking about one internal service putting heavy load on another one.
The team with the demanding service can add a cache that's appropriate for their needs, and will be motivated to do so in order to avoid hitting the rate limit (or reduce costs, which should be attributed to them).
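To make that concrete, here's a rough sketch of what a caller-owned cache might look like in Go (the TTL, key, and fetch function are placeholders, not any specific service):

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    // ttlCache is a tiny read-through cache owned by the *calling* service,
    // so repeated identical queries never reach the downstream team at all.
    type entry struct {
        value   string
        expires time.Time
    }

    type ttlCache struct {
        mu      sync.Mutex
        ttl     time.Duration
        entries map[string]entry
    }

    func newTTLCache(ttl time.Duration) *ttlCache {
        return &ttlCache{ttl: ttl, entries: make(map[string]entry)}
    }

    // get returns a fresh cached value, or calls fetch and stores the result.
    func (c *ttlCache) get(key string, fetch func(string) (string, error)) (string, error) {
        c.mu.Lock()
        if e, ok := c.entries[key]; ok && time.Now().Before(e.expires) {
            c.mu.Unlock()
            return e.value, nil
        }
        c.mu.Unlock()

        val, err := fetch(key) // the actual call to the downstream service
        if err != nil {
            return "", err
        }

        c.mu.Lock()
        c.entries[key] = entry{value: val, expires: time.Now().Add(c.ttl)}
        c.mu.Unlock()
        return val, nil
    }

    func main() {
        cache := newTTLCache(30 * time.Second)
        fetch := func(q string) (string, error) {
            fmt.Println("downstream call for", q) // only printed on a miss
            return "result-for-" + q, nil
        }
        cache.get("same-query", fetch)
        cache.get("same-query", fetch) // served locally; the downstream never sees it
    }

The point is that the team paying the cost of the cache (staleness, memory, invalidation) is the same team that benefits from it.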
You cannot trust your clients. Period. It doesn’t matter if they’re internal or external. If you design (and test!) with this assumption in mind, you’ll never have a bad day. I’ve really never understood why teams and companies have taken this defensive stance that their service is being “abused” despite having nothing even resembling an SLA. It seemed pretty inexcusable to not have a horizontally scaling service back in 2010 when I first started interning at tech companies, and I’m really confused why this is still an issue today.
I fully agree. Rate limits are how you control the behaviour of the clients. Hence my suggestion of leaving caching to the clients, which they may want to do in order to avoid hitting the rate limit.
>why teams and companies have taken this defensive stance that their service is being “abused” despite having nothing even resembling an SLA.
I mean because bad code on a fast client system can cause a load higher than all other users put together. This is why half the internet is behind something like cloudflare these days. Limiting, blocking, and banning has to be baked in.
You can never trust clients to behave. If your goal is to reduce infra cost, sure, rate limiting is an acceptable answer. But is it really that hard to throw on a cache and provision your service to be horizontally scalable?
Scaling matters, but why pay for abusive clients or bots? Adding a cache is easy; the hard part is invalidation, sync, and thundering herd. Use it if the product needs it, not as a band-aid.
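On the thundering-herd part specifically: the usual mitigation is to collapse concurrent misses for the same key into a single backend call (this is what golang.org/x/sync/singleflight does). A hand-rolled sketch of that idea, just to illustrate:

    package main

    import (
        "fmt"
        "sync"
    )

    // call represents one in-flight fetch that other goroutines can wait on.
    type call struct {
        wg  sync.WaitGroup
        val string
        err error
    }

    type group struct {
        mu    sync.Mutex
        calls map[string]*call
    }

    // do collapses concurrent lookups of the same key into one backend fetch.
    func (g *group) do(key string, fetch func() (string, error)) (string, error) {
        g.mu.Lock()
        if g.calls == nil {
            g.calls = make(map[string]*call)
        }
        if c, ok := g.calls[key]; ok {
            // Someone is already fetching this key: wait and reuse their result.
            g.mu.Unlock()
            c.wg.Wait()
            return c.val, c.err
        }
        c := new(call)
        c.wg.Add(1)
        g.calls[key] = c
        g.mu.Unlock()

        c.val, c.err = fetch() // only one goroutine reaches the database
        c.wg.Done()

        g.mu.Lock()
        delete(g.calls, key)
        g.mu.Unlock()
        return c.val, c.err
    }

    func main() {
        var g group
        var wg sync.WaitGroup
        for i := 0; i < 5; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                v, _ := g.do("hot-key", func() (string, error) {
                    fmt.Println("expensive DB query") // typically printed once, not five times
                    return "value", nil
                })
                _ = v
            }()
        }
        wg.Wait()
    }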
Excellent read. It highlights key aspects like health checks, server restarts, warm up, and load shedding, all of which make load balancing an already hard problem even harder.
Willy from HAProxy has a good write-up on this. In their benchmarks, least-connections usually beat P2C, but P2C was never the worst and is arguably a saner default when least-connections isn’t available.
The article link: https://www.haproxy.com/blog/power-of-two-load-balancing
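For anyone unfamiliar, the P2C idea itself is tiny: sample two backends at random and send the request to the one with fewer in-flight connections. A toy Go sketch of the concept (not HAProxy's implementation):

    package main

    import (
        "fmt"
        "math/rand"
        "sync/atomic"
    )

    // backend tracks in-flight requests so the balancer can compare load.
    type backend struct {
        name     string
        inFlight int64
    }

    // pickP2C samples two distinct backends at random and returns the less
    // loaded one, which avoids the herding you get when every proxy chases
    // the single global minimum.
    func pickP2C(backends []*backend) *backend {
        i := rand.Intn(len(backends))
        j := rand.Intn(len(backends) - 1)
        if j >= i {
            j++ // make sure the two candidates are distinct
        }
        a, b := backends[i], backends[j]
        if atomic.LoadInt64(&a.inFlight) <= atomic.LoadInt64(&b.inFlight) {
            return a
        }
        return b
    }

    func main() {
        pool := []*backend{{name: "s1"}, {name: "s2", inFlight: 7}, {name: "s3", inFlight: 2}}
        chosen := pickP2C(pool)
        atomic.AddInt64(&chosen.inFlight, 1) // count the request we are about to send
        fmt.Println("routing to", chosen.name)
        atomic.AddInt64(&chosen.inFlight, -1) // decrement when the response completes
    }

The appeal is that each decision only needs load information for two servers, so it degrades gracefully when that information is slightly stale.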
Author here. Two quick thoughts:
1. As I covered in an earlier part of this series, service discovery is not always easy at scale. High churn, partial failures, and the cost of health checks can make it tricky to get right.
2. Using server-side metrics for load balancing is a great idea. In many setups, feedback is embedded in response headers or health check responses so the LB can make more informed routing decisions. Hodor at LinkedIn is a good example of this in practice:
https://www.linkedin.com/blog/engineering/data-management/ho...
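As a rough illustration of that feedback pattern (the "X-Server-Load" header name and the weighting formula are invented for this example, not Hodor's actual protocol): the proxy reads a load signal from each response and biases future picks away from hot servers.

    package main

    import (
        "fmt"
        "math/rand"
        "net/http"
        "strconv"
        "sync"
    )

    // weights holds a per-backend weight the proxy adjusts from feedback the
    // servers embed in responses (hypothetical "X-Server-Load" header, 0-100,
    // higher meaning busier).
    type weights struct {
        mu sync.Mutex
        w  map[string]float64
    }

    // observe updates a backend's weight from the load it reported.
    func (ws *weights) observe(backend string, resp *http.Response) {
        load, err := strconv.ParseFloat(resp.Header.Get("X-Server-Load"), 64)
        if err != nil {
            return // no feedback in this response; leave the weight alone
        }
        ws.mu.Lock()
        defer ws.mu.Unlock()
        ws.w[backend] = 1.0 / (1.0 + load) // busier servers get a smaller share
    }

    // pick makes a weighted random choice over the current weights.
    func (ws *weights) pick() string {
        ws.mu.Lock()
        defer ws.mu.Unlock()
        var total float64
        for _, v := range ws.w {
            total += v
        }
        r := rand.Float64() * total
        for name, v := range ws.w {
            r -= v
            if r <= 0 {
                return name
            }
        }
        return "" // only reachable if the weight map is empty
    }

    func main() {
        ws := &weights{w: map[string]float64{"s1": 1, "s2": 1}}
        // Simulate s2 reporting that it is heavily loaded.
        resp := &http.Response{Header: http.Header{"X-Server-Load": []string{"90"}}}
        ws.observe("s2", resp)
        fmt.Println("next request goes to", ws.pick()) // usually s1 now
    }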
I was thinking of something along the lines of a "map" of all the backends and their capabilities that would be recomputed every N seconds and atomically swapped with the previous one. The LB would then be able to decide where to send a request and would also have a precomputed backup option in case the first choice becomes unavailable. You could also use those metrics to signal that a node needs to be drained of traffic, for example, so it receives no new connections.
I understand the complexities of having a large set of distributed services behind load balancers; I just think there could be a better way of choosing a backend than relying only on least requests, TTFB, and an OK response from a health check every N seconds.
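Something like this is roughly the shape I had in mind (Go just for illustration; the fields and the health/metric sources are invented for the example):

    package main

    import (
        "fmt"
        "sync/atomic"
        "time"
    )

    // backendInfo is whatever the picker needs to know about one server.
    type backendInfo struct {
        addr     string
        healthy  bool
        draining bool // accept no new connections, let existing ones finish
        load     int  // whatever server-side metric you collect
    }

    // snapshot is an immutable view of the backend set, rebuilt every N
    // seconds and swapped in atomically so request paths never take a lock.
    type snapshot struct {
        backends []backendInfo
        builtAt  time.Time
    }

    var current atomic.Pointer[snapshot]

    // rebuild would normally query service discovery plus health/metric
    // endpoints; here it just fabricates a snapshot for the example.
    func rebuild() *snapshot {
        return &snapshot{
            backends: []backendInfo{
                {addr: "10.0.0.1:8080", healthy: true, load: 3},
                {addr: "10.0.0.2:8080", healthy: true, draining: true},
            },
            builtAt: time.Now(),
        }
    }

    // pick reads the current snapshot lock-free and returns the least-loaded
    // healthy, non-draining backend plus a precomputed backup choice.
    func pick() (primary, backup string) {
        snap := current.Load()
        best, second := -1, -1
        for i, b := range snap.backends {
            if !b.healthy || b.draining {
                continue
            }
            if best == -1 || b.load < snap.backends[best].load {
                second = best
                best = i
            } else if second == -1 || b.load < snap.backends[second].load {
                second = i
            }
        }
        if best >= 0 {
            primary = snap.backends[best].addr
        }
        if second >= 0 {
            backup = snap.backends[second].addr
        }
        return
    }

    func main() {
        current.Store(rebuild())
        go func() {
            for range time.Tick(5 * time.Second) { // recompute and swap atomically
                current.Store(rebuild())
            }
        }()
        p, b := pick()
        fmt.Println("primary:", p, "backup:", b)
    }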
Author here. Absolutely, HAProxy's stick tables are a powerful way to implement advanced routing logic, and they've been around for years. This series focuses on explaining the broader concepts and trade-offs rather than diving deep into any single implementation, and since it also covers other aspects of reverse proxies, the load-balancing section here mostly aims to present the challenges and high-level ideas.
Glad you found it a good effort, and I agree there’s room to go deeper in future posts.
"algorithms" is a pretty hard spelling: it's derived from the name of al-Khwarizmi, the scholar who documented an early notion of them. By the time English has decided to create a word, you can be sure it will be ... painful!
Keep going mate, you have a great writing style and presentation.
Author here. Thanks for sharing these thoughts. You’re right that DSR, ASN-based routing, SRV records, and other lower-layer approaches are important in certain setups.
This post focuses primarily on Layer 7 load balancing (connection and request routing based on application-level information), so it doesn't go into Layer 3/4 techniques like DSR or network-level optimizations. Those are certainly worth covering in a broader series that spans the full stack.
Thanks for sharing this! I’m the author of the blog post.
Happy to answer any questions about the scenarios in the article or dive deeper into specifics like slow start tuning, consistent hashing trade-offs, or how different proxy architectures handle dynamic backends.
Always curious to hear how others have tackled these issues in production.