www.thousandeyes.com Open in urlscan Pro
95.100.153.97  Public Scan

Submitted URL: http://www.thousandeyes.com/blog/facebook-outage-analysis?utm_source=marketo&utm_medium=email&utm_campaign=na_q4fy22_all_all...
Effective URL: https://www.thousandeyes.com/blog/facebook-outage-analysis?utm_source=marketo&utm_medium=email&utm_campaign=na_q4fy22_all_all...
Submission: On October 06 via api from SE — Scanned from DE



OUTAGE ANALYSES


FACEBOOK OUTAGE ANALYSIS

By Angelique Medina

| October 4, 2021 | 20 min read




SUMMARY

On October 4, 2021, Facebook experienced a prolonged outage preventing users
from around the globe from reaching its services. The following is an ongoing
analysis of the outage, updated periodically as we have more information to
share.

--------------------------------------------------------------------------------

[Oct 5, 10:45 am PT]

On October 4th, between approximately 15:40 UTC and 22:45 UTC, Facebook suffered
one of the largest outages on record for a major application provider in terms
of breadth and duration, as Facebook, Instagram, and WhatsApp were offline and
unavailable globally for more than seven hours. While the DNS failures alone
could have caused the apps to go offline, the large-scale BGP route withdrawals
that precipitated the incident, along with other signals, point to issues that
impacted Facebook more broadly.

At a minimum, the unprecedented length of the outage should be seen as an
indication that the issue went beyond simply a DNS service outage. Something
significant occurred that not only took down their internal DNS service, but
also prevented a highly sophisticated network operations team supporting the
most highly trafficked site on the Internet from resolving the issue in short
order.

Facebook Engineering has published a blog sharing some details about the events
that unfolded, which you can read here. In this post, we’ll attempt to answer
some of the most common questions we’re getting by unpacking the outage from
multiple angles. Later today, we’ll be publishing an episode of the Internet
Report where we’ll cover not only what happened and its impact, but also the
precipitating event, takeaways, and lessons to be learned. 


FIRST OF ALL, WHY IS DNS IMPORTANT AND WHAT HAPPENED TO FACEBOOK’S INTERNAL DNS
SERVICE?

DNS is the first step in reaching any site on the Internet. Its failure would
prevent the reachability of a site, even if the site itself and the
infrastructure it was hosted on were available. In the case of Facebook, they
internally host their DNS nameservers, which store the authoritative records for
their domains. Facebook maintains four nameservers (each served by many physical
servers) — a, b, c, and d — as seen in figure 1.

Figure 1. Facebook’s four nameservers with IPv4 and IPv6 addresses

Each of those nameservers is covered by a different IP prefix, or Internet
“route” (more on that later), covering a range of IP addresses.

At approximately 15:40 UTC, Facebook’s service started to go offline, as users
were unable to resolve its domains to IP addresses through the DNS.

Figure 2. Access to facebook.com failing due to DNS errors

As this was happening, we could also see that queries through the DNS hierarchy
for the facebook.com A record were failing due to Facebook’s nameservers
becoming unreachable (see figure 3).

Figure 3. DNS trace test failing due to unresponsive nameservers
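
The failure mode in figures 2 and 3 can be sketched as a toy model. This is only an illustration: the `trace` helper, the nameserver labels, and the documentation-range address `192.0.2.10` are assumptions made for the example, not Facebook's actual configuration or the test that produced the figures.

```python
# Toy model of a DNS trace. All names and addresses here are illustrative.
NS_REFERRAL = {"facebook.com.": ["a", "b", "c", "d"]}  # referral handed out by the TLD
A_RECORDS = {"facebook.com.": "192.0.2.10"}            # record held by the nameservers


def trace(name, reachable_nameservers):
    """Return the A record if at least one authoritative nameserver answers."""
    # Root and TLD servers still answer with a referral to the nameservers.
    nameservers = NS_REFERRAL[name]
    # The final step asks each authoritative nameserver in turn.
    for ns in nameservers:
        if ns in reachable_nameservers:
            return A_RECORDS[name]
    # The site may be healthy, but nobody can learn its address.
    raise LookupError(f"no nameserver for {name} responded")


print(trace("facebook.com.", {"a", "b", "c", "d"}))  # 192.0.2.10
```

Calling `trace("facebook.com.", set())` raises `LookupError`, which mirrors the point above: the record still exists, but no reachable server can hand it out.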

Now, a word on the DNS. The DNS is so critical to the reachability of sites and
web applications that most major service providers don’t mess about with it.
For example, Amazon stores the authoritative DNS records for amazon.com not on
its own infrastructure (which, as one of the top public cloud providers, is
amongst the most heavily used in the world), but on two separate external DNS
services, Dyn (Oracle) and UltraDNS (Neustar) — even though Amazon AWS offers
its own DNS service. 

Figure 4. Amazon.com domain records hosted by third-party DNS services

Not only does Amazon use external services to host its records, it notably uses
two providers. Why is this notable? As critical Internet infrastructure, DNS
has notoriously been targeted for attack by malicious actors, as in the case
of the massive DDoS attack on Dyn in 2016 or the route hijacking of Amazon’s DNS
service Route 53 in 2018. By hosting with two different providers, Amazon can
ensure that its site is reachable even if one of its providers is unavailable
for whatever reason.


WHY DIDN’T FACEBOOK MOVE THEIR DNS RECORDS TO AN EXTERNAL DNS SERVICE PROVIDER
AND GET THEIR SERVICES BACK ONLINE? 

Nameserver records, which are served by top-level domain (TLD) servers (in this
case, the com. TLD), can be long-lived. That makes sense, given that app and
site operators rarely move their records around. A and AAAA records, by
contrast, often change very frequently for major sites, as the DNS can be used
to balance traffic across application infrastructure and point users to the
optimal server for their best experience. In the case of Facebook, their
nameserver records have a two-day shelf life (see figure 5), meaning that even
if they were to move their records to an external service, it could take up to
two days for some users to reach Facebook, as the original nameserver records
would persist in the wilds of the Internet until they expire.


Figure 5. Facebook’s nameserver records have a 172800 second (48 hour) expiry

So moving to a secondary provider after the incident began wasn’t a practical
option for Facebook to resolve the issue. Better to focus on getting the service
back up. 
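
As a quick check on the arithmetic, the sketch below converts the 172800-second expiry into hours and computes the latest moment a cached nameserver record could survive. It assumes, purely for illustration, that a resolver cached the record at the very start of the outage; `latest_cache_expiry` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

NS_TTL_SECONDS = 172800  # expiry shown in figure 5


def latest_cache_expiry(cached_at, ttl_seconds=NS_TTL_SECONDS):
    """Return the time at which a record cached at `cached_at` expires."""
    return cached_at + timedelta(seconds=ttl_seconds)


# Assumption: the record was cached just as the outage began.
outage_start = datetime(2021, 10, 4, 15, 40, tzinfo=timezone.utc)

print(NS_TTL_SECONDS / 3600)                          # 48.0
print(latest_cache_expiry(outage_start).isoformat())  # 2021-10-06T15:40:00+00:00
```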


WHY DID FACEBOOK’S INTERNAL DNS SERVICE GO DOWN IN THE FIRST PLACE?

Like DNS, BGP is one of those scary acronyms that frequently comes up when any
major event goes down on the Internet. And like DNS, it is essential vocabulary
for Internet literacy. BGP is the way that traffic gets routed across the
Internet. You can think of it as a telephone chain. I tell Sally (my peer) where
to reach me. She in turn calls her friends and neighbors (her peers) and tells
them to call her if they want to reach me. They in turn call their contacts
(their peers) to tell them the same, and the chain continues until, in theory,
anyone who wants to reach me has some “path” to me through a chain of
connections — some may be long, some short.
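
The telephone chain can be sketched as a breadth-first walk over a graph of peers. This is a deliberately simplified model: the names in `PEERS` are made up, and real BGP selects paths using policy and many attributes rather than simply keeping the first path heard.

```python
from collections import deque

# Hypothetical peering graph: each entry lists who talks to whom.
PEERS = {
    "me": ["sally"],
    "sally": ["me", "ann", "bob"],
    "ann": ["sally", "carol"],
    "bob": ["sally", "carol"],
    "carol": ["ann", "bob"],
}


def propagate(origin, peers):
    """Return the path each participant learns back to the origin."""
    paths = {origin: [origin]}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for neighbor in peers[current]:
            if neighbor not in paths:  # keep only the first path heard
                paths[neighbor] = [neighbor] + paths[current]
                queue.append(neighbor)
    return paths


print(propagate("me", PEERS)["carol"])  # ['carol', 'ann', 'sally', 'me']
```

If "me" stops telling Sally how to reach it, nobody further along the chain has a path either, which is what a route withdrawal amounts to in this analogy.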

Moments before the outage, at approximately 15:39 UTC, Facebook issued a series
of BGP route withdrawals covering hundreds of its prefixes — almost all
immediately reversed — that effectively removed its DNS nameservers from the
Internet. Depending on where Internet Service Providers sat on the Internet,
they would have seen these route changes almost immediately or up to ten minutes
later. 

While most of the withdrawn routes were readvertised, those covering its DNS
nameservers were not (with one exception). Prior to the outage, seven (IPv4)
prefixes covering its internal DNS service were actively advertised (see below):

129.134.0.0/17
129.134.30.0/23
129.134.30.0/24
129.134.31.0/24
185.89.218.0/23
185.89.218.0/24
185.89.219.0/24

The key routes above are the /24 ones, since those are more specific and would
have been preferred. The /23 prefixes are umbrella or “covering” prefixes for
the /24 prefixes. Finally, the /17 covers the 129.134.30.0/23, 129.134.30.0/24, and
129.134.31.0/24 prefixes, making it a covering route for Facebook’s nameservers ‘a’
and ‘b’. All vanished from global routing tables on or about 15:39 UTC, with the
exception of the /17 (more on that later).
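
Why the more specific routes are preferred can be shown with a minimal longest-prefix-match sketch using Python's standard `ipaddress` module. The prefixes are the seven listed above; the two host addresses and the `best_route` helper are assumptions chosen for illustration, picked only because they fall inside those prefixes.

```python
import ipaddress

DNS_PREFIXES = [
    "129.134.0.0/17", "129.134.30.0/23", "129.134.30.0/24", "129.134.31.0/24",
    "185.89.218.0/23", "185.89.218.0/24", "185.89.219.0/24",
]


def best_route(address, prefixes):
    """Return the most specific prefix containing `address`, or None."""
    ip = ipaddress.ip_address(address)
    networks = [ipaddress.ip_network(p) for p in prefixes]
    matches = [n for n in networks if ip in n]
    # Longest prefix length wins; no match means no route at all.
    return max(matches, key=lambda n: n.prefixlen, default=None)


before = DNS_PREFIXES          # everything advertised
during = ["129.134.0.0/17"]    # only the /17 remained

print(best_route("129.134.30.12", before))  # 129.134.30.0/24
print(best_route("129.134.30.12", during))  # 129.134.0.0/17
print(best_route("185.89.218.12", during))  # None
```

In this model, an address under 129.134.30.0/24 falls back to the /17 once the more specific routes are gone, while an address under 185.89.218.0/24 is left with no route at all.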

To illustrate how this outage was experienced from the standpoint of ISPs and
transit providers, who route user traffic to Facebook, we took a snapshot of
Cogent’s routing table as it was before the outage at 12:00 UTC on October 4th
and during the outage at 16:00 UTC. Facebook had 309 prefixes advertised at
12:00 UTC and 259 prefixes at 16:00 UTC. Only the following prefixes were
“missing”:

129.134.25.0/24
129.134.26.0/24
129.134.27.0/24
129.134.28.0/24
129.134.29.0/24
129.134.30.0/23
129.134.30.0/24
129.134.31.0/24
129.134.65.0/24
129.134.66.0/24
129.134.67.0/24
129.134.68.0/24
129.134.69.0/24
129.134.70.0/24
129.134.71.0/24
129.134.72.0/24
129.134.73.0/24
129.134.74.0/24
129.134.75.0/24
129.134.76.0/24
129.134.79.0/24
157.240.207.0/24
185.89.218.0/23
185.89.218.0/24
185.89.219.0/24
2a03:2880:f0fc::/47
2a03:2880:f0fc::/48
2a03:2880:f0fd::/48
2a03:2880:f0ff::/48
2a03:2880:f1fc::/47
2a03:2880:f1fc::/48
2a03:2880:f1fd::/48
2a03:2880:f1ff::/48
2a03:2880:f2ff::/48
2a03:2880:ff08::/48
2a03:2880:ff09::/48
2a03:2880:ff0a::/48
2a03:2880:ff0b::/48
2a03:2880:ff0c::/48
2a03:2881:4000::/48
2a03:2881:4001::/48
2a03:2881:4002::/48
2a03:2881:4004::/48
2a03:2881:4006::/48
2a03:2881:4007::/48
2a03:2881:4009::/48
69.171.250.0/24

All of these prefixes covered Facebook nameservers, with the exception of the
last one. Figure 6 shows traffic destined for nameserver ‘c’ getting dropped by
the first Internet hop, as the service provider had no route in its routing
table to get the traffic to its destination. 

Figure 6. Traffic to Facebook nameserver dropped at first Internet hop
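
Comparing two routing table snapshots, as described above, boils down to a set difference. The sketch below uses a tiny sample of the prefixes named in this post rather than the full Cogent tables, which are not reproduced here; `missing_prefixes` is a hypothetical helper.

```python
import ipaddress


def sort_key(prefix):
    """Order IPv4 before IPv6, then by network address."""
    net = ipaddress.ip_network(prefix)
    return (net.version, net)


def missing_prefixes(before, during):
    """Prefixes present in the earlier snapshot but absent from the later one."""
    return sorted(set(before) - set(during), key=sort_key)


# Small sample only; the real snapshots held hundreds of prefixes.
snapshot_1200 = [
    "129.134.0.0/17", "129.134.30.0/24",
    "185.89.218.0/24", "2a03:2880:f0fc::/48",
]
snapshot_1600 = ["129.134.0.0/17"]

print(missing_prefixes(snapshot_1200, snapshot_1600))
# ['129.134.30.0/24', '185.89.218.0/24', '2a03:2880:f0fc::/48']
```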

The /17 prefix, which covers 50 percent of Facebook’s DNS, was still advertised
and in service provider routing tables, but as seen in figure 7, all traffic
destined to Facebook nameserver ‘a’ via that route was dropped at Facebook’s
edge.

Figure 7. DNS traffic dropped at Facebook network edge router

This advertised route could have failed because it wasn’t set up to handle
traffic to the DNS service (since a /23 and, more importantly, /24s were
actively used before the outage), or the failure could indicate that there was
an issue in Facebook’s network, perhaps preventing traffic from routing internally.
Similar behavior was seen during a major outage within Google’s network in 2019.
In that incident, BGP advertisements continued to route traffic to their
network, but the traffic dropped at Google’s network edge because their internal
network was disabled and the border routers had no internal routes to send
traffic to destination servers.

Figure 8. All traffic dropped at Google’s network edge in 2019 incident

You can read our analysis of the Google outage here.

Finally, to provide a fuller picture of the state of Facebook’s network, let’s
look at the final prefix on the list of withdrawn routes, the 69.171.250.0/24,
which is one of the many prefixes for facebook.com. This route wasn’t withdrawn
in the same way that the DNS prefixes were. Figure 9 shows the impact of the
significant and continuous route flapping for that prefix throughout the outage,
effectively rendering that route unusable.

Figure 9. Continuous route flapping observed for 69.171.250.0/24 prefix 
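
One simple way to quantify flapping like this is to count how often a prefix is
withdrawn and then announced again. The sketch below assumes a list of
(timestamp, event) pairs such as might be derived from a BGP update feed. The
function name, the event labels, and the sample timeline are all invented for
illustration and are not the data behind figure 9.

```python
def count_flaps(events):
    """Count withdraw -> announce transitions in a list of (time, kind) events."""
    flaps = 0
    withdrawn = False
    for _, kind in sorted(events, key=lambda event: event[0]):
        if kind == "withdraw":
            withdrawn = True
        elif kind == "announce" and withdrawn:
            # The route came back after being withdrawn: one flap.
            flaps += 1
            withdrawn = False
    return flaps


# Invented timeline (seconds since an arbitrary start), for illustration only.
sample_events = [
    (0, "announce"),
    (60, "withdraw"),
    (90, "announce"),
    (150, "withdraw"),
    (200, "announce"),
    (260, "withdraw"),
]

print(count_flaps(sample_events))  # 2
```

The trailing withdrawal is not counted because the route has not yet returned.
A prefix that accumulates many such transitions in a short window is effectively
unusable even though it is never withdrawn for good.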

The fact that this route instability was left in place for so long is perhaps an
indication that something beyond the DNS service was amiss. But before we get to
that, let’s take a detour down BGP lane.


SO WHY DID FACEBOOK WITHDRAW ROUTES TO ITS SERVICE IN THE FIRST PLACE?

While we don’t know the specific reason for the configuration update that
sparked this incident, route withdrawals and changes are not uncommon.

BGP isn’t just the way traffic gets routed across the Internet. It’s also a
powerful tool for network operators to shape the flow of traffic to their
services. BGP changes are a normal part of operations in the running of a highly
trafficked network. Reasons range from making changes to a service (for example,
routing traffic to a different prefix to perform maintenance on some part of the
service), traffic engineering to optimize performance for users, changing peers,
changing the nature of a peering relationship, and other operational activities.
Routes can also accidentally get withdrawn due to network configuration updates
gone wrong, router bugs, or changes meant for a single peer getting pushed out
broadly. 


WHY WAS FACEBOOK UNABLE TO RESTORE ROUTES TO ITS SERVICE FOR MORE THAN SEVEN
HOURS?

Go ahead, blame the network. Even if DNS was the domino that toppled it all, and
even if a rogue set of BGP withdrawals was the source of that toppling, a BGP
route change, like any other, can be changed again. Or can it? History tells us
that the longest-lived and most damaging outages can most often be laid at the
feet of some issue with the control plane. Whether through human error or a bug,
if the mechanism network operators use to control the network and make changes
to it is damaged or severed, that’s when things can go very, very wrong. Take the
aforementioned Google outage. In that incident, which lasted about four hours, a
maintenance operation inadvertently took down all of the network controllers for
a region of Google’s network. Without the controllers, the network
infrastructure was effectively headless and unable to route traffic. Google
network engineers were unable to quickly bring the network back online because
their access to the network controllers depended on the very network that was
down. 

Lack of access to the network management system would certainly have prevented
Facebook from rolling back any faulty changes. The loss of access could have
been due to some network change that was part of the original route withdrawals
that
precipitated the outage, or it could have been due to a service dependency (for
example, if their internal DNS was a dependency for access to an authentication
service or other key system).
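
The dependency scenario described above can be checked mechanically: walk the
chain of things a recovery path relies on and see whether it reaches the
component that has failed. The sketch below is purely illustrative. Every
component name in the graph is hypothetical and says nothing about how
Facebook’s systems are actually built.

```python
def depends_on(graph, start, target):
    """Return True if `start` directly or transitively depends on `target`."""
    stack = [start]
    seen = set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False


# Hypothetical dependency graph (assumption): each key relies on its listed values.
dependencies = {
    "rollback_tool": ["management_access"],
    "management_access": ["authentication"],
    "authentication": ["internal_dns"],
    "internal_dns": ["network_routes"],
    "out_of_band_console": [],
}

print(depends_on(dependencies, "rollback_tool", "internal_dns"))        # True
print(depends_on(dependencies, "out_of_band_console", "internal_dns"))  # False
```

A recovery path for which this check returns True cannot be relied on while the
target component is down, which is the trap described in the Google incident.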

Regardless, even after DNS was restored (and shortly before it failed), we
observed connection issues to facebook.com. Connection issues after the
incident could be due to Facebook servers getting overwhelmed as they worked to
build back to full capacity (see figure 10), or they could point to broad issues
within Facebook’s network. 

Figure 10. Facebook access issues persist beyond restoration of DNS service

Notably, connection issues were observed immediately before DNS went down (see
figure 11).

Figure 11. Receive errors observed globally just prior to DNS outage




WHY WERE THERE SO MANY OTHER NETWORK ISSUES REPORTED YESTERDAY, TOO? 

Even apart from the millions of users impacted, reports of issues with service
providers were rife during the outage. ISPs and transit providers would have
been impacted in a couple of ways. First, Facebook accounts for a significant
share of Internet traffic, and all the queries to its DNS servers would have
been dropped by providers, since they had no routes to that service. At the
same time, greater volumes of DNS queries (and, thus, network traffic) would
have been hitting both DNS providers and ISPs: the DNS is inherently resilient,
so when queries to one nameserver failed, DNS resolvers would have tried the
other nameservers, to no avail. What would ordinarily be a single query would
have been quadrupled during the outage. Not to mention all those browser
refreshes generated by anxious users trying to reach the site.
Facebook’s CTO also reportedly alluded to the stress on its network
post-incident in an email to its employees.
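
The quadrupling effect can be shown with a toy model of a resolver that tries
each authoritative nameserver in turn until one answers. This is a deliberate
simplification: real resolvers also apply timeouts, retries and server
selection heuristics, none of which are modeled here. The function name and the
single-letter server labels are assumptions for illustration.

```python
def resolve(nameservers, reachable):
    """Try each nameserver in order; return (answering server or None, queries sent)."""
    queries_sent = 0
    for server in nameservers:
        queries_sent += 1
        if server in reachable:
            return server, queries_sent
    return None, queries_sent


nameservers = ["a", "b", "c", "d"]  # four authoritative nameservers, as in this post

# Normal operation: the first server answers, so one query is enough.
print(resolve(nameservers, reachable={"a", "b", "c", "d"}))  # ('a', 1)

# During the outage: no server answers, so every server is tried.
print(resolve(nameservers, reachable=set()))  # (None, 4)
```

Each failed lookup therefore costs four upstream queries instead of one, before
counting any retries by the resolver or reloads by the user.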


WHEN DID THE INCIDENT END?

The DNS service started to come back online around 22:20 UTC, and by
approximately 22:45 UTC the incident was effectively over, with most users able
to reach Facebook, as seen in figure 12.



Figure 12. Most global users are able to reach Facebook service by 22:45 UTC


LESSONS LEARNED

Be sure to check back later today for more on this front. We’ll be releasing a
new episode of the Internet Report, where we’ll walk through what we’ve
discussed in this post, but also discuss some of the key takeaways and lessons
learned.

--------------------------------------------------------------------------------

[Oct 4, 3:15 pm PT] Facebook’s DNS service appeared to be fully restored by
approximately 21:30 UTC and Facebook.com is now reachable for most users.

Facebook.com coming back online.

[Oct 4, 12:15 pm PT] Facebook made BGP withdrawals near the time of the
incident; however, 2 prefixes covering two of their 4 DNS nameservers (a and b)
are still being advertised across the Internet. They are reachable on the
Internet but traffic is dropping at Facebook’s network edge.

The 2 DNS nameservers (a and b) are reachable because covering prefix
129.134.0.0/17 is still being advertised, but this advertisement may not have
been designed to support the nameserver service.

The 3 specific prefixes covering a and b nameservers before the incident were
129.134.30.0/23, 129.134.30.0/24, 129.134.31.0/24. The specific routes covering
all 4 nameservers (a-d) were withdrawn from the Internet at approximately 15:39
UTC.

Internet routes to Facebook nameservers ‘a’ and ‘b’ are active, but traffic is
dropped at Facebook edge.



No Internet routes exist for Facebook nameservers ‘c’ and ‘d’ so traffic is
dropped at first ISP router.

[Oct 4, 10:15 am PT] ThousandEyes tests can confirm that at 15:40 UTC on October
4, the Facebook application became unreachable due to DNS failure. Facebook’s
authoritative DNS nameservers became unreachable at that time. The issue is
still ongoing as of 17:02 UTC.

Facebook’s application globally unreachable due to DNS resolution failure.

Published On October 4, 2021

Angelique Medina

Director, Product Marketing

--------------------------------------------------------------------------------

Categories:
Outage Analyses
Tags:
dns dns outage domain name service (dns)

