
Cloudflare 1.1.1.1 Outage Report (July 14, 2025): Global DNS Disruption Root Cause Analysis

 

Image: Cloudflare logo with colorful '1.1.1.1' text above the slogan 'The free app that makes your Internet safer' on a white background.

Key takeaways

  • Global DNS outage: Cloudflare's 1.1.1.1 resolver failed worldwide for 62 minutes on July 14, 2025, due to a configuration error in their service topology.
  • Root cause: A dormant misconfiguration from June 6 linked 1.1.1.1 to a non-production service. When activated, it withdrew critical IP prefixes globally.
  • Traffic impact: UDP/TCP/DoT queries dropped sharply, but DNS-over-HTTPS (DoH) via cloudflare-dns.com stayed stable thanks to separate IPs.
  • Unrelated hijack: Tata Communications (AS4755) advertised 1.1.1.0/24 during the outage, worsening routing issues for some users.
  • Resolution: Cloudflare restored services by 22:54 UTC after reverting configurations and manually re-announcing routes.

Why 1.1.1.1 matters for the internet

You might not think much about DNS resolvers, but they’re like the phonebooks of the internet. Cloudflare’s 1.1.1.1 launched back in 2018 as a faster, privacy-focused alternative to ISP-provided DNS. It quickly became one of the most used resolvers globally, handling billions of queries daily. The service uses anycast routing to direct traffic to the nearest data center, which usually means quick responses and reliability. But on July 14, that same design amplified a failure across every continent. For users relying solely on 1.1.1.1, the internet basically stopped working—websites wouldn’t load, apps froze, and confusion spread. A lot of folks didn’t realize how dependent they’d become on this single service until it vanished.


Timeline of the outage: When everything went dark

Here’s how the incident unfolded, minute by minute:

  • 21:48 UTC: A config change for Cloudflare’s Data Localization Suite (DLS) triggered a global refresh. This activated the dormant error from June 6.
  • 21:52 UTC: 1.1.1.1 prefixes began withdrawing from BGP tables. DNS traffic plummeted within minutes.
  • 21:54 UTC: Tata Communications (AS4755) started advertising 1.1.1.0/24—an unrelated hijack now visible due to Cloudflare’s withdrawal.
  • 22:01 UTC: Internal alerts fired. Incident declared.
  • 22:20 UTC: Fix deployed after reverting configurations.
  • 22:54 UTC: Full service restoration after routes stabilized.

Table: Affected IP ranges during the outage

Image: table of affected IP prefixes, listing four IPv4 prefixes and their corresponding IPv6 prefixes (one IPv6 entry blank).

This 62-minute disruption showed how a small config error can cascade into global chaos. Engineers initially missed the June 6 mistake because it didn’t cause immediate problems—no alerts, no complaints. But when that second change hit, it all unraveled fast.


Technical breakdown: What actually broke

The core issue was a service topology misconfiguration. Cloudflare uses internal systems to map which IPs should be advertised where, especially for services like their Data Localization Suite (DLS) that restrict traffic to specific regions. On June 6, a config update accidentally tied 1.1.1.1’s prefixes to a non-production DLS service. Since that service wasn’t live yet, no one noticed.

Then, on July 14, an engineer attached a test location to that same DLS service. This triggered a global refresh of routing policies. Because of the earlier error, 1.1.1.1’s topology got reduced to one offline data center. Routers worldwide immediately withdrew announcements for its IP ranges. Traffic couldn’t reach Cloudflare’s DNS servers at all.

The legacy system managing these topologies lacked safeguards like canary deployments or staged rollouts. A peer-reviewed change still went global in one shot—no gradual testing, no kill switches. Cloudflare’s newer topology system avoids hardcoded IP lists, but migrating between systems created fragility. They’ve since acknowledged this “error-prone” approach needs retiring.
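
Cloudflare hasn’t published its rollout tooling, so the sketch below is only an illustration of the canary idea in Python; the region names and the apply_config / health_check helpers are hypothetical placeholders, not their system.

```python
import time

# Hypothetical deployment targets: one canary region, then the rest.
CANARY = "test-region-1"
REGIONS = ["us-east", "eu-west", "ap-southeast"]

def apply_config(region: str, config: dict) -> None:
    """Placeholder: push a routing/topology config to a single region."""
    print(f"applying {config['name']} to {region}")

def health_check(region: str) -> bool:
    """Placeholder: verify the region still answers DNS and announces its prefixes."""
    return True  # in reality: probe resolvers, check BGP announcements, error rates

def staged_rollout(config: dict, soak_seconds: int = 300) -> None:
    # 1. Canary first: a bad topology change breaks one region, not the world.
    apply_config(CANARY, config)
    time.sleep(soak_seconds)  # let metrics accumulate before deciding
    if not health_check(CANARY):
        raise RuntimeError("canary failed; aborting rollout (kill switch)")

    # 2. Only then fan out, region by region, re-checking as we go.
    for region in REGIONS:
        apply_config(region, config)
        if not health_check(region):
            raise RuntimeError(f"{region} unhealthy; halting rollout")

staged_rollout({"name": "dls-topology-update"}, soak_seconds=1)
```

With a gate like this, the July 14 change would have stalled at the first unhealthy check instead of refreshing every data center at once.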


Why detection took 9 minutes: Monitoring gaps

Cloudflare’s internal alerts didn’t fire until 22:01 UTC—9 minutes after traffic nosedived. Why the delay? A few reasons stand out:

  1. No immediate metric drops: The BGP withdrawal caused routing failure, not server crashes. Queries didn’t fail; they never arrived. Monitoring systems tuned for server errors missed this.
  2. Alert thresholds: Teams avoid overly sensitive alerts to prevent false alarms. As one Hacker News comment noted, operators often wait 5+ minutes before escalating to avoid “alert fatigue”.
  3. Legacy dependencies: Health checks relied on systems that themselves needed DNS resolution, creating blind spots during outages.

This lag highlights a tricky balance: catching failures fast without drowning teams in noise. Cloudflare’s post-mortem implies tighter BGP monitoring might help, but they haven’t detailed specific fixes yet.
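
One way to shrink that window is an external probe that exercises the same path users do: send real queries to 1.1.1.1 and page when several consecutive probes fail, regardless of what the servers report about themselves. A rough sketch using the dnspython library (the threshold, probe interval, and alert hook are arbitrary choices, not Cloudflare’s setup):

```python
import time
import dns.resolver  # pip install dnspython

def probe(resolver_ip: str, name: str = "example.com") -> bool:
    """Return True if the resolver answers an A query within the time budget."""
    r = dns.resolver.Resolver(configure=False)
    r.nameservers = [resolver_ip]
    r.lifetime = 2.0  # total seconds allowed per query
    try:
        r.resolve(name, "A")
        return True
    except Exception:
        # Timeouts and SERVFAILs both count as failures from the user's point of view.
        return False

def watch(resolver_ip: str = "1.1.1.1", fail_threshold: int = 3) -> None:
    failures = 0
    while True:
        if probe(resolver_ip):
            failures = 0
        else:
            failures += 1
            if failures >= fail_threshold:
                # Hypothetical alert hook; wire this to a real paging system.
                print(f"ALERT: {resolver_ip} unreachable for {failures} consecutive probes")
        time.sleep(10)

if __name__ == "__main__":
    watch()
```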


The BGP hijack that wasn’t: Tata’s role

As Cloudflare’s routes vanished, something weird happened: Tata Communications (AS4755) started advertising 1.1.1.0/24. ThousandEyes observed this hijack propagating through some networks, worsening connectivity for users whose queries got routed to Tata.

Crucially, this wasn’t malicious. Tata likely advertised 1.1.1.0/24 due to old internal configurations—that prefix was used for testing long before Cloudflare claimed it. Once Cloudflare re-announced their routes, Tata withdrew the hijacked prefix. But for ~25 minutes, it added chaos. This incident underscores how fragile BGP can be when major routes vanish unexpectedly.
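
You can check who is originating a prefix yourself. The sketch below queries RIPE’s public RIPEstat data API (the prefix-overview call; the endpoint and response fields are assumed from its documented interface) to list which ASNs currently announce 1.1.1.0/24. Normally that is Cloudflare’s AS13335; during the incident, some route collectors briefly saw AS4755 as well.

```python
import json
import urllib.request

# RIPEstat public data API: "prefix-overview" reports which ASNs originate a
# prefix as seen by RIPE's route collectors (field names assumed from the docs).
URL = "https://stat.ripe.net/data/prefix-overview/data.json?resource=1.1.1.0/24"

def origin_asns(url: str = URL) -> list:
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = json.load(resp)
    data = payload.get("data", {})
    # Each entry typically looks like {"asn": 13335, "holder": "CLOUDFLARENET, US"}.
    return [(a.get("asn"), a.get("holder")) for a in data.get("asns", [])]

if __name__ == "__main__":
    for asn, holder in origin_asns():
        print(f"1.1.1.0/24 originated by AS{asn} ({holder})")
    # Expected in normal conditions: AS13335 (Cloudflare). An unexpected or
    # additional origin is the kind of signal seen during the July 14 event.
```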


Impact analysis: Who felt the outage?

The outage hit hardest for users and apps relying exclusively on 1.1.1.1. But patterns emerged in the data:

  • Protocol differences:
    • UDP/TCP/DoT traffic dropped ~90% (these use IPs like 1.1.1.1 directly).
    • DoH (DNS-over-HTTPS) via cloudflare-dns.com stayed near normal. Its IPs weren’t tied to the faulty topology.
  • Backup resolver users: People pairing 1.1.1.1 with a third-party resolver (e.g., 8.8.8.8) saw minimal disruption as failovers kicked in; falling back to 1.0.0.1 alone didn’t help, since it shares the same infrastructure.
  • Regional variances: Reports spiked in North America, Europe, and Asia. Cloudflare Radar confirmed global impact.

Table: Traffic recovery post-fix

"Traffic Restoration Timeline table shows three events from 22:20 to 22:54 UTC. Traffic restoration levels progress from 40% to 98% restored."

Ironically, the outage proved Cloudflare’s DoH resilience. By decoupling DNS from raw IPs, it avoided single points of failure. As one user noted, “DoH was working” when traditional DNS failed.
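
The difference is easy to demonstrate with a few lines of dnspython (installed with its optional DoH dependency, e.g. pip install "dnspython[doh]"). The same query is sent two ways: over plain UDP straight to the IP 1.1.1.1, which is the path that went dark, and over DoH to the cloudflare-dns.com endpoint, which kept answering:

```python
import dns.message
import dns.query  # pip install "dnspython[doh]"

query = dns.message.make_query("example.com", "A")

# Path 1: classic DNS over UDP, addressed to the raw anycast IP.
# This is the path that failed when 1.1.1.0/24 was withdrawn.
try:
    resp = dns.query.udp(query, "1.1.1.1", timeout=3)
    print("UDP to 1.1.1.1:", resp.answer)
except Exception as exc:
    print("UDP to 1.1.1.1 failed:", exc)

# Path 2: DNS over HTTPS, addressed by hostname. cloudflare-dns.com resolves
# to IPs outside the withdrawn ranges, which is why DoH stayed up.
try:
    resp = dns.query.https(query, "https://cloudflare-dns.com/dns-query", timeout=3)
    print("DoH via cloudflare-dns.com:", resp.answer)
except Exception as exc:
    print("DoH failed:", exc)
```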


Lessons for the internet’s infrastructure

This outage wasn’t a cyberattack or hardware failure—it came down to process and system design flaws. Key takeaways for engineers:

  1. Staged rollouts save lives: Had Cloudflare used canary deployments for config changes, they’d have caught the error in one region first. Their new topology system supports this, but legacy tech didn’t.
  2. Validate dormant configs: "No impact" isn’t "safe." Systems must flag unused configurations that could activate later.
  3. Enforce resolver redundancy: Clients should always use multiple DNS resolvers (e.g., 1.1.1.1 + 8.8.8.8); a minimal fallback sketch follows below. Single-provider setups risk total outages.
  4. Monitor routing layer: Services need BGP/advertisement visibility, not just server health.

Cloudflare has pledged to accelerate retiring its legacy systems. But as they noted, “This was a humbling event.” For the rest of us, it’s a reminder: even giants stumble, and backups matter.
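
As a concrete take on lesson 3 above, here is a minimal client-side fallback that tries resolvers from independent providers in order. The resolver list and timeout values are illustrative, not a recommendation of specific operators.

```python
import dns.resolver  # pip install dnspython

# Resolvers run by independent operators, so one provider's outage
# doesn't take out name resolution entirely.
RESOLVERS = ["1.1.1.1", "8.8.8.8", "9.9.9.9"]

def resolve_with_fallback(name: str, rtype: str = "A"):
    last_error = None
    for ip in RESOLVERS:
        r = dns.resolver.Resolver(configure=False)
        r.nameservers = [ip]
        r.lifetime = 2.0  # fail over quickly instead of hanging
        try:
            return [str(rr) for rr in r.resolve(name, rtype)]
        except Exception as exc:
            last_error = exc  # try the next provider
    raise RuntimeError(f"all resolvers failed for {name}") from last_error

print(resolve_with_fallback("example.com"))
```

A stub resolver or application doing something like this would have ridden out the July 14 outage with only a short extra delay on the first lookup.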


FAQs about the Cloudflare 1.1.1.1 outage

Q: Could using 1.0.0.1 as a backup have helped?
A: Not much. 1.0.0.1 shares infrastructure with 1.1.1.1, so both failed. Ideal backups use unrelated resolvers like Google’s 8.8.8.8 or Quad9.

Q: Why did DNS-over-HTTPS (DoH) keep working?
A: DoH uses domain names (e.g., cloudflare-dns.com), not raw IPs. Those domains resolved via unaffected infrastructure. Always prefer DoH/DoT domains over IPs for resilience.

Q: Was this a BGP hijack?
A: Partially, but not by Cloudflare. Tata’s route advertisement was a side effect of Cloudflare’s withdrawal—not the cause. It amplified issues for some users though.

Q: How often does Cloudflare go down?
A: Rarely. In the last 30 days, 1.1.1.1 had 99.09% uptime vs. 99.99% for Google’s 8.8.8.8. This was an exception, not routine.

Q: Did the outage affect other Cloudflare services?
A: Mostly no. Core CDN, security, and dashboard services use different IPs and weren’t withdrawn. The 1.1.1.1 resolver was the primary casualty.
