Cloudflare is Down, Again!
Cloudflare went down again. If you were on the internet that day, you probably noticed. Half the sites you tried to visit threw 500 errors, and the other half loaded like it was 2005.
What gets me about these outages is the blast radius. Cloudflare sits in front of so many services that when it goes down, it takes a chunk of the internet with it. We're talking millions of domains behind one provider's control plane. That's a concentration risk most teams don't think about until it's too late.
From an SRE perspective, this is the classic single-point-of-failure problem at internet scale. Your app can have five nines of uptime internally, but every request goes through your CDN/DNS provider first, so your end-to-end availability can never be higher than theirs. When they have a bad day, your users see a broken page.
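Some rough numbers show why internal uptime doesn't save you. When every component on the request path has to be up, their availabilities multiply, assuming failures are independent. The sketch below is illustrative Python. The 99.9% figure for the CDN/DNS provider is a hypothetical number picked for the example, not Cloudflare's actual SLA or track record.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def serial_availability(*components: float) -> float:
    """Availability of a path where every component must be up.

    Assumes independent failures, so availabilities simply multiply.
    """
    result = 1.0
    for a in components:
        result *= a
    return result


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime per year at a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


app = 0.99999  # "five nines" of internal uptime
cdn = 0.999    # hypothetical CDN/DNS availability, for illustration only

end_to_end = serial_availability(app, cdn)
print(f"app alone:      {downtime_minutes_per_year(app):.1f} min/yr")
print(f"app behind CDN: {end_to_end:.5%}, "
      f"{downtime_minutes_per_year(end_to_end):.1f} min/yr")
```

With those inputs, five nines on its own allows about 5 minutes of downtime a year. Put it behind a 99.9% provider and the path drops to roughly 99.899%, or about 8.8 hours a year. The provider's term dominates the result.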
The uncomfortable question: should you run multi-CDN? It's expensive, operationally complex, and most teams decide the risk doesn't justify the cost. Until it does. There's no clean answer here, just trade-offs.
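The multi-CDN trade-off can be framed the same way. If either of two providers can serve your traffic, the path only fails when both are down, so it's the *unavailabilities* that multiply. This is a minimal sketch that assumes independent failures. It also includes a failover layer (for example DNS-based traffic steering) with its own hypothetical 99.99% availability. In reality, CDN outages can be correlated and failover is rarely instant, so treat these numbers as an optimistic bound.

```python
def serial_availability(*components: float) -> float:
    """Every component must be up: availabilities multiply (independence assumed)."""
    result = 1.0
    for a in components:
        result *= a
    return result


def parallel_availability(*providers: float) -> float:
    """Any one provider being up is enough: the path fails only if all are down.

    Assumes independent failures and perfect, instant failover between providers.
    """
    all_down = 1.0
    for a in providers:
        all_down *= 1.0 - a
    return 1.0 - all_down


cdn_a = 0.999      # hypothetical availability of the first CDN
cdn_b = 0.999      # hypothetical availability of the second CDN
steering = 0.9999  # hypothetical availability of the failover/steering layer

single = cdn_a
dual = parallel_availability(cdn_a, cdn_b)
# The steering layer sits in front of both CDNs, so it is in series with them.
dual_with_steering = serial_availability(steering, dual)

for label, a in [
    ("single CDN", single),
    ("two CDNs, ideal failover", dual),
    ("two CDNs + steering layer", dual_with_steering),
]:
    print(f"{label:28s} {a:.6%}")
```

The ideal dual-CDN case looks like a huge win, at about 99.9999%. Once the steering layer is counted, though, the result lands near 99.99%. The failover machinery becomes the new single point of failure, which is part of why the answer comes down to trade-offs rather than a clean yes.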