www.thousandeyes.com
95.100.153.97
Public Scan
Submitted URL: http://www.thousandeyes.com/blog/facebook-outage-analysis?utm_source=marketo&utm_medium=email&utm_campaign=na_q4fy22_all_all...
Effective URL: https://www.thousandeyes.com/blog/facebook-outage-analysis?utm_source=marketo&utm_medium=email&utm_campaign=na_q4fy22_all_all...
Submission: On October 06 via api from SE — Scanned from DE
FACEBOOK OUTAGE ANALYSIS
By Angelique Medina | October 4, 2021 | 20 min read

SUMMARY
On October 4, 2021, Facebook experienced a prolonged outage that prevented users around the globe from reaching its services. The following is an ongoing analysis of the outage, updated periodically as we have more information to share.

--------------------------------------------------------------------------------

[Oct 5, 10:45 am PT] On October 4th, between approximately 15:40 UTC and 22:45 UTC, Facebook suffered one of the largest outages on record for a major application provider, in terms of both breadth and duration: Facebook, Instagram, and WhatsApp were offline and unavailable globally for more than seven hours.
While the DNS failures could have caused the apps to go offline, Facebook’s large-scale BGP route withdrawals precipitating the incident, along with other signals, point to issues that impacted Facebook more broadly. At a minimum, the unprecedented length of the outage should be seen as an indication that the issue went beyond a simple DNS service outage. Something significant occurred that not only took down Facebook's internal DNS service, but also prevented a highly sophisticated network operations team supporting the most highly trafficked site on the Internet from resolving the issue in short order.

Facebook Engineering has published a blog sharing some details about the events that unfolded, which you can read here. In this post, we’ll attempt to answer some of the most common questions we’re getting by unpacking the outage from multiple angles. Later today, we’ll be publishing an episode of the Internet Report where we’ll cover not only what happened and its impact, but also the precipitating event, takeaways, and lessons to be learned.

FIRST OF ALL, WHY IS DNS IMPORTANT AND WHAT HAPPENED TO FACEBOOK’S INTERNAL DNS SERVICE?

DNS is the first step in reaching any site on the Internet. If DNS fails, a site becomes unreachable even if the site itself and the infrastructure hosting it are available. In the case of Facebook, they internally host their DNS nameservers, which store the authoritative records for their domains. Facebook maintains four nameservers, a, b, c, and d (each served by many physical servers), as seen in figure 1.

Figure 1. Facebook’s four nameservers with IPv4 and IPv6 addresses

Each of those nameservers is covered by a different IP prefix, or Internet “route” (more on that later), which spans a range of IP addresses. At approximately 15:40 UTC, Facebook’s service started to go offline, as users were unable to resolve its domains to IP addresses through the DNS.

Figure 2. Access to facebook.com failing due to DNS errors

As this was happening, we could also see that queries through the DNS hierarchy for the facebook.com A record were failing because Facebook’s nameservers had become unreachable (see figure 3).

Figure 3. DNS trace test failing due to unresponsive nameservers

Now, a word on the DNS. The DNS is so critical to the reachability of sites and web applications that most major service providers don’t mess about with it. For example, Amazon stores the authoritative DNS records for amazon.com not on its own infrastructure (which, as one of the top public cloud providers, is amongst the most heavily used in the world), but on two separate external DNS services, Dyn (Oracle) and UltraDNS (Neustar), even though Amazon AWS offers its own DNS service.

Figure 4. Amazon.com domain records hosted by third-party DNS services

Not only does Amazon use external services to host its records, it notably uses two providers. Why is this notable? As critical Internet infrastructure, DNS has notoriously been targeted for attack by malicious actors, as in the case of the massive DDoS attack on Dyn in 2016 or the route hijacking of Amazon’s DNS service Route 53 in 2018. By hosting with two different providers, Amazon can ensure that its site is reachable even if one of its providers were to be unavailable for whatever reason.

WHY DIDN’T FACEBOOK MOVE THEIR DNS RECORDS TO AN EXTERNAL DNS SERVICE PROVIDER AND GET THEIR SERVICES BACK ONLINE?

Nameserver records, which are served by top-level domain (TLD) servers (in this case, the com. TLD), can be long-lived records. That makes sense, given that app and site operators are not frequently moving their records around. A and AAAA records, by contrast, often change very frequently for major sites, as the DNS can be used to balance traffic across application infrastructure and point users to the optimal server for their best experience.
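The contrast between long-lived NS records and frequently changing A records comes down to how resolvers cache answers: each record is reused until its TTL (time to live) runs out. Below is a minimal, hypothetical resolver cache that illustrates this behavior; it is not how any particular resolver is implemented. The 172800-second NS TTL is Facebook's value from figure 5, and the nameserver hostnames follow Facebook's a to d naming. The 60-second A-record TTL and the 192.0.2.10 address (a documentation-only IP) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

NS_TTL = 172_800  # seconds: Facebook's NS record TTL (48 hours), per figure 5
A_TTL = 60        # seconds: hypothetical short TTL for an A record

@dataclass
class CacheEntry:
    values: Tuple[str, ...]
    expires_at: float  # absolute time (in seconds) at which the entry goes stale

class TtlCache:
    """Toy resolver cache: a cached answer is reused until its TTL has elapsed."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], CacheEntry] = {}

    def put(self, name: str, rtype: str, values: Iterable[str], ttl: int, now: float) -> None:
        self._entries[(name, rtype)] = CacheEntry(tuple(values), now + ttl)

    def get(self, name: str, rtype: str, now: float) -> Optional[Tuple[str, ...]]:
        entry = self._entries.get((name, rtype))
        if entry is None or now >= entry.expires_at:
            return None  # missing or expired: a real resolver would query upstream again
        return entry.values

cache = TtlCache()
cache.put("facebook.com", "NS",
          ["a.ns.facebook.com", "b.ns.facebook.com", "c.ns.facebook.com", "d.ns.facebook.com"],
          NS_TTL, now=0)
cache.put("facebook.com", "A", ["192.0.2.10"], A_TTL, now=0)  # 192.0.2.10: documentation address

print(cache.get("facebook.com", "A", now=3600))     # None: the A record expired after 60 s
print(cache.get("facebook.com", "NS", now=3600))    # still the cached nameserver set
print(cache.get("facebook.com", "NS", now=NS_TTL))  # None: the NS set expires after 48 hours
```

In this model, a resolver that cached the NS set just before a change keeps handing out the old nameservers until `now` reaches `expires_at`. That can be up to 48 hours later, which is the delay Facebook would have faced if it had redelegated its domains mid-incident.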
In the case of Facebook, their nameserver records have a two-day shelf life (see figure 5). This means that even if they were to move their records to an external service, it could take up to two days for some users to reach Facebook, because the original nameserver records would continue to persist in the wilds of the Internet until they expired.

Figure 5. Facebook’s nameserver records have a 172800-second (48-hour) expiry

So moving to a secondary provider after the incident began wasn’t a practical option for Facebook to resolve the issue. Better to focus on getting the service back up.

WHY DID FACEBOOK’S INTERNAL DNS SERVICE GO DOWN IN THE FIRST PLACE?

Like DNS, BGP is one of those scary acronyms that frequently comes up whenever a major event goes down on the Internet. And like DNS, it is essential vocabulary for Internet literacy. BGP is the way that traffic gets routed across the Internet. You can think of it as a telephone chain. I tell Sally (my peer) where to reach me. She in turn calls her friends and neighbors (her peers) and tells them to call her if they want to reach me. They in turn call their contacts (their peers) to tell them the same, and the chain continues until, in theory, anyone who wants to reach me has some “path” to me through a chain of connections. Some of those paths may be long, some short.

Moments before the outage, at approximately 15:39 UTC, Facebook issued a series of BGP route withdrawals covering hundreds of its prefixes. Almost all were immediately reversed, but the withdrawals effectively removed Facebook's DNS nameservers from the Internet. Depending on where Internet Service Providers sat on the Internet, they would have seen these route changes almost immediately or up to ten minutes later. While most of the withdrawn routes were readvertised, those covering its DNS nameservers were not (with one exception).
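The telephone-chain analogy maps onto a very simplified model of BGP propagation. An origin announces a prefix to its peers, each peer passes it on, and every participant remembers the chain of hops back to the origin. The sketch below assumes breadth-first flooding that keeps the shortest path and applies no routing policy. Real BGP does not work this way: it selects routes using policy and many other attributes. The participant names come from the analogy, and "my-prefix" is a placeholder label. Withdrawing the announcement removes the path everywhere, which is what happened to the routes covering Facebook's nameservers.

```python
from collections import deque
from typing import Dict, List

# Who calls whom in the telephone-chain analogy (links are bidirectional).
PEERS: Dict[str, List[str]] = {
    "me": ["sally"],
    "sally": ["me", "alice", "bob"],
    "alice": ["sally", "carol"],
    "bob": ["sally", "carol"],
    "carol": ["alice", "bob"],
}

def propagate(peers: Dict[str, List[str]], origin: str) -> Dict[str, List[str]]:
    """Flood an announcement from `origin`; map each participant to its hop chain back to origin."""
    paths = {origin: [origin]}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for neighbor in peers.get(current, []):
            if neighbor not in paths:  # BFS: the first path heard is a shortest one
                paths[neighbor] = [neighbor] + paths[current]
                queue.append(neighbor)
    return paths

def build_tables(peers: Dict[str, List[str]],
                 announcements: Dict[str, str]) -> Dict[str, Dict[str, List[str]]]:
    """Routing state per prefix, given {prefix: origin} for every active announcement."""
    return {prefix: propagate(peers, origin) for prefix, origin in announcements.items()}

def has_route(tables: Dict[str, Dict[str, List[str]]], participant: str, prefix: str) -> bool:
    return participant in tables.get(prefix, {})

announcements = {"my-prefix": "me"}  # placeholder prefix label
tables = build_tables(PEERS, announcements)
print(tables["my-prefix"]["carol"])  # ['carol', 'alice', 'sally', 'me']

# Withdrawal: the origin stops announcing, so no participant retains a path.
del announcements["my-prefix"]
tables = build_tables(PEERS, announcements)
print(has_route(tables, "carol", "my-prefix"))  # False
```

Because the tables are rebuilt only from active announcements, a withdrawal that is never readvertised leaves every participant without a path. This holds no matter how many alternative links exist between them.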
Prior to the outage, seven (IPv4) prefixes covering its internal DNS service were actively advertised:

- 129.134.0.0/17
- 129.134.30.0/23
- 129.134.30.0/24
- 129.134.31.0/24
- 185.89.218.0/23
- 185.89.218.0/24
- 185.89.219.0/24

The key routes above are the /24 ones, since those are more specific and would have been preferred. The /23 prefixes are umbrella or “covering” prefixes for the /24 prefixes. Finally, a /17 covers the 129.134.30.0/23, 129.134.30.0/24, and 129.134.31.0/24 prefixes, acting as a covering route for Facebook’s nameservers ‘a’ and ‘b’. All vanished from global routing tables on or about 15:39 UTC, with the exception of the /17 (more on that later).

To illustrate how ISPs and transit providers, who route user traffic to Facebook, experienced this outage, we took a snapshot of Cogent’s routing table as it was before the outage (12:00 UTC on October 4th) and during the outage (16:00 UTC). Facebook had 309 prefixes advertised at 12:00 UTC and 259 prefixes at 16:00 UTC. Only the following prefixes were “missing”:

- 129.134.25.0/24
- 129.134.26.0/24
- 129.134.27.0/24
- 129.134.28.0/24
- 129.134.29.0/24
- 129.134.30.0/23
- 129.134.30.0/24
- 129.134.31.0/24
- 129.134.65.0/24
- 129.134.66.0/24
- 129.134.67.0/24
- 129.134.68.0/24
- 129.134.69.0/24
- 129.134.70.0/24
- 129.134.71.0/24
- 129.134.72.0/24
- 129.134.73.0/24
- 129.134.74.0/24
- 129.134.75.0/24
- 129.134.76.0/24
- 129.134.79.0/24
- 157.240.207.0/24
- 185.89.218.0/23
- 185.89.218.0/24
- 185.89.219.0/24
- 2a03:2880:f0fc::/47
- 2a03:2880:f0fc::/48
- 2a03:2880:f0fd::/48
- 2a03:2880:f0ff::/48
- 2a03:2880:f1fc::/47
- 2a03:2880:f1fc::/48
- 2a03:2880:f1fd::/48
- 2a03:2880:f1ff::/48
- 2a03:2880:f2ff::/48
- 2a03:2880:ff08::/48
- 2a03:2880:ff09::/48
- 2a03:2880:ff0a::/48
- 2a03:2880:ff0b::/48
- 2a03:2880:ff0c::/48
- 2a03:2881:4000::/48
- 2a03:2881:4001::/48
- 2a03:2881:4002::/48
- 2a03:2881:4004::/48
- 2a03:2881:4006::/48
- 2a03:2881:4007::/48
- 2a03:2881:4009::/48
- 69.171.250.0/24

All of these prefixes covered Facebook nameservers, with the exception of the last one.
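Comparing two routing-table snapshots, as with the Cogent data above, is essentially a set difference. Python's standard `ipaddress` module can then check which missing prefixes still sit inside a surviving covering route. The sketch below does not use the full Cogent snapshot. Its "before" table holds only the seven DNS-related IPv4 prefixes listed in this section, and it assumes, as described above, that only the /17 remained afterwards.

```python
import ipaddress
from typing import Dict, Iterable, List

def parse(prefixes: Iterable[str]) -> set:
    """Turn prefix strings into ipaddress network objects."""
    return {ipaddress.ip_network(p) for p in prefixes}

# The seven IPv4 prefixes covering Facebook's DNS service before the outage.
before = parse([
    "129.134.0.0/17", "129.134.30.0/23", "129.134.30.0/24", "129.134.31.0/24",
    "185.89.218.0/23", "185.89.218.0/24", "185.89.219.0/24",
])
# After ~15:39 UTC, only the /17 remained in global routing tables.
after = parse(["129.134.0.0/17"])

def missing_with_cover(before: set, after: set) -> Dict[str, List[str]]:
    """Map each prefix that disappeared to the surviving prefixes that still cover it."""
    report = {}
    for missing in sorted(before - after):  # sorted by network address, then prefix length
        covers = sorted(str(c) for c in after if missing.subnet_of(c))
        report[str(missing)] = covers
    return report

for prefix, covers in missing_with_cover(before, after).items():
    print(prefix, "->", covers or "no covering route")
```

Running this shows that the three withdrawn prefixes for nameservers ‘a’ and ‘b’ are still covered by 129.134.0.0/17, while the 185.89.218.0/23 block for ‘c’ and ‘d’ has nothing left covering it. (`subnet_of` requires Python 3.7 or later and both networks to be the same IP version.)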
Figure 6 shows traffic destined for nameserver ‘c’ getting dropped by the first Internet hop, as the service provider had no route in its routing table to get the traffic to its destination.

Figure 6. Traffic to Facebook nameserver dropped at first Internet hop

The /17 prefix, which covers 50 percent of Facebook’s DNS nameservers (‘a’ and ‘b’), was still advertised and present in service provider routing tables. However, as seen in figure 7, all traffic destined to Facebook nameserver ‘a’ via that route was dropped at Facebook’s edge.

Figure 7. DNS traffic dropped at Facebook network edge router

This advertised route may have failed because it wasn’t set up to handle traffic to the DNS service (since a /23 and, more importantly, /24s were actively used before the outage). Alternatively, it could indicate an issue in Facebook’s network, perhaps one preventing traffic from routing internally.

Similar behavior was seen during a major outage within Google’s network in 2019. In that incident, BGP advertisements continued to route traffic to Google's network, but the traffic dropped at Google’s network edge. Their internal network was disabled, and the border routers had no internal routes to send traffic to destination servers.

Figure 8. All traffic dropped at Google’s network edge in 2019 incident

You can read our analysis of the Google outage here.

Finally, to provide a fuller picture of the state of Facebook’s network, let’s look at the final prefix on the list of withdrawn routes, 69.171.250.0/24, which is one of the many prefixes for facebook.com. This route wasn’t withdrawn in the same way that the DNS prefixes were. Figure 9 shows the impact of significant and continuous route flapping for that prefix throughout the outage, which effectively rendered that route unusable.

Figure 9. Continuous route flapping observed for 69.171.250.0/24 prefix

The fact that this route instability was left in place for so long is perhaps an indication that something beyond the DNS service was amiss.
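The two failure modes in figures 6 and 7 both follow from longest-prefix matching. A provider router forwards a packet using the most specific route it holds. With no matching route, the packet dies at the first hop, as happened with nameservers ‘c’ and ‘d’. With only the /17 left, the packet reaches Facebook's edge, where it is dropped if the edge has no usable internal route. The sketch below models this as a two-stage lookup. The host addresses are example addresses inside the nameserver prefixes. The empty internal table at the edge is an assumption standing in for whatever actually caused the drop, which the analysis above leaves open (an unprovisioned route or an internal network problem).

```python
import ipaddress
from typing import Iterable, Optional

def longest_prefix_match(table: Iterable[str], address: str) -> Optional[ipaddress.IPv4Network]:
    """Return the most specific prefix in `table` that contains `address`, or None."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in map(ipaddress.ip_network, table) if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)

def trace(address: str, isp_table: Iterable[str], edge_internal_table: Iterable[str]) -> str:
    """Report where a packet to `address` ends up in this simplified two-stage model."""
    if longest_prefix_match(isp_table, address) is None:
        return "dropped at first ISP hop (no route)"
    if longest_prefix_match(edge_internal_table, address) is None:
        return "dropped at Facebook edge (no internal route)"
    return "delivered"

# Provider view during the outage: only the /17 covering nameservers 'a' and 'b' remains.
isp_during_outage = ["129.134.0.0/17"]
# Assumption: the edge router had no internal route toward the DNS hosts.
edge_during_outage: list = []

# Example host addresses inside the 'a'/'b' and 'c'/'d' prefixes (illustrative only).
for host in ("129.134.30.12", "185.89.218.12"):
    print(host, "->", trace(host, isp_during_outage, edge_during_outage))
```

Before the outage, the more specific /24s would have won the first lookup. That is why the /17, although present, may never have been set up to carry DNS traffic.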
But before we get to that, let’s take a detour down BGP lane.

SO WHY DID FACEBOOK WITHDRAW ROUTES TO ITS SERVICE IN THE FIRST PLACE?

While we don’t know the specific reason for the configuration update that sparked this incident, route withdrawals and changes are not uncommon. BGP isn’t just the way traffic gets routed across the Internet. It’s also a powerful tool for network operators to shape the flow of traffic to their services.

BGP changes are a normal part of operations in the running of a highly trafficked network. Reasons include making changes to a service (for example, routing traffic to a different prefix to perform maintenance on some part of the service), traffic engineering to optimize performance for users, changing peers, changing the nature of a peering relationship, and other operational activities. Routes can also get withdrawn accidentally due to network configuration updates gone wrong, router bugs, or changes meant for a single peer getting pushed out broadly.

WHY WAS FACEBOOK UNABLE TO RESTORE ROUTES TO ITS SERVICE FOR MORE THAN SEVEN HOURS?

Go ahead, blame the network. Even if DNS was the domino that toppled it all, and even if a rogue set of BGP withdrawals set it in motion, those withdrawals, like any BGP route change, could be changed again. Or could they?

History tells us that the longest-lived and most damaging outages can most often be laid at the feet of some issue with the control plane. Whether through human error or a bug, if the mechanism network operators use to control the network (that is, to make changes to it) is damaged or severed, that’s when things can go very, very wrong.

Take the aforementioned Google outage. In that incident, which lasted about four hours, a maintenance operation inadvertently took down all of the network controllers for a region of Google’s network. Without the controllers, the network infrastructure was effectively headless and unable to route traffic.
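The control-plane failure described here is essentially a dependency problem. When a component fails, everything that depends on it, directly or indirectly, fails too, and recovery stalls if the operators' own access path is in that set. The sketch below computes that transitive failure set over a small dependency graph. The component names and edges are hypothetical, loosely modeled on the Google incident as this post describes it: controllers down, regional routing down, and operator access riding on that same routing. They are not a description of Google's or Facebook's actual architecture.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

# Hypothetical graph: component -> components it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "network_controllers": [],
    "regional_routing": ["network_controllers"],
    "operator_remote_access": ["regional_routing"],
    "user_traffic": ["regional_routing"],
}

def failure_closure(depends_on: Dict[str, List[str]], initial: Iterable[str]) -> Set[str]:
    """Return every component that fails, directly or transitively, once `initial` fail."""
    # Invert the graph so we can walk from a failed component to its dependents.
    dependents: Dict[str, List[str]] = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(component)
    failed = set(initial)
    queue = deque(failed)
    while queue:
        component = queue.popleft()
        for dependent in dependents.get(component, []):
            if dependent not in failed:
                failed.add(dependent)
                queue.append(dependent)
    return failed

down = failure_closure(DEPENDS_ON, ["network_controllers"])
print(sorted(down))
print("operator access lost:", "operator_remote_access" in down)
```

In this toy graph, losing the controllers takes out the very access path operators would need to restore them. That is the circular dependency described in the next paragraph.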
Google network engineers were unable to quickly bring the network back online because their access to the network controllers depended on the very network that was down.

Lack of access to the network management system would certainly have prevented Facebook from rolling back any faulty changes. That loss of access could have been caused by some network change that was part of the original route withdrawals that precipitated the outage, or by a service dependency (for example, if their internal DNS was a dependency for access to an authentication service or other key system).

Regardless, even after DNS was restored (and shortly before it failed), we observed connection issues to facebook.com. Post-incident connection issues could be due to Facebook servers getting overwhelmed as they worked to build back up to full capacity (see figure 10), or they could point to broader issues within Facebook’s network.

Figure 10. Facebook access issues persist beyond restoration of DNS service

Notably, connection issues were observed immediately before DNS went down (see figure 11).

Figure 11. Receive errors observed globally just prior to DNS outage

WHY WERE THERE SO MANY OTHER NETWORK ISSUES REPORTED YESTERDAY, TOO?

Even apart from the millions of users impacted, reports of issues with service providers were rife during the outage. ISPs and transit providers would have been impacted in a couple of ways. First, Facebook accounts for a significant share of Internet traffic, and all the queries to its DNS servers would have been dropped by providers, since they had no routes to that service. Second, greater volumes of DNS queries (and, thus, network traffic) would have been hitting both DNS providers and ISPs. Because DNS is inherently resilient, when queries to one nameserver failed, DNS resolvers would have tried the other nameservers, to no avail. What would ordinarily be a single query would have become four during the outage, one per nameserver.
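The quadrupling follows from how a resolver falls back across a zone's nameservers. Here is a minimal sketch, assuming a resolver that tries each of Facebook's four nameservers ('a' through 'd') in order, a fixed number of times each, with no caching of failures. Real resolvers differ in retry counts, timeouts, and server selection, and `query_fn`, `healthy`, and `outage` are hypothetical stand-ins for sending an actual DNS query.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical query function: returns an answer, or None if the query is lost.
QueryFn = Callable[[str], Optional[str]]


def resolve_with_fallback(
    nameservers: Sequence[str],
    query_fn: QueryFn,
    attempts_per_server: int = 1,
) -> Tuple[Optional[str], int]:
    """Try each nameserver in order until one answers.

    Returns (answer or None, number of queries sent).
    """
    sent = 0
    for ns in nameservers:
        for _ in range(attempts_per_server):
            sent += 1
            answer = query_fn(ns)
            if answer is not None:
                return answer, sent
    return None, sent


def healthy(ns: str) -> Optional[str]:
    # Hypothetical: every nameserver answers.
    return "answer-from-" + ns


def outage(ns: str) -> Optional[str]:
    # Hypothetical: no nameserver answers (dropped at the ISP or at Facebook's edge).
    return None


if __name__ == "__main__":
    nameservers = ["a", "b", "c", "d"]
    print(resolve_with_fallback(nameservers, healthy))  # ('answer-from-a', 1)
    print(resolve_with_fallback(nameservers, outage))   # (None, 4)
    print(resolve_with_fallback(nameservers, outage, attempts_per_server=2))  # (None, 8)
```

Under these assumptions a failed lookup costs four queries instead of one, and any per-server retries multiply that further.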
Browser refreshes generated by anxious users trying to reach the site would have added still more traffic. Facebook’s CTO also reportedly alluded to the stress on its network post-incident in an email to employees.

WHEN DID THE INCIDENT END?

The DNS service started to come back online around 22:20, and by approximately 22:45 the incident was effectively over, with most users able to reach Facebook, as seen in figure 12.

Figure 12. Most global users are able to reach Facebook service by 22:45 UTC

LESSONS LEARNED

Be sure to check back later today for more on this front. We’ll be releasing a new episode of the Internet Report, where we’ll walk through what we’ve discussed in this post and also discuss some of the key takeaways and lessons learned.

--------------------------------------------------------------------------------

[Oct 4, 3:15 pm PT] Facebook’s DNS service appeared to be fully restored by approximately 21:30 UTC, and Facebook.com is now reachable for most users.

Facebook.com coming back online.

[Oct 4, 12:15 pm PT] Facebook made BGP withdrawals near the time of the incident; however, two prefixes covering two of its four DNS nameservers (a and b) are still being advertised across the Internet. These nameservers are reachable on the Internet, but traffic is dropping at Facebook’s network edge. The two DNS nameservers (a and b) are reachable because the covering prefix 129.134.0.0/17 is still being advertised, but this advertisement may not have been designed to support the nameserver service. The three specific prefixes covering the a and b nameservers before the incident were 129.134.30.0/23, 129.134.30.0/24, and 129.134.31.0/24. The specific routes covering all four nameservers (a-d) were withdrawn from the Internet at approximately 15:39 UTC.

Internet routes to Facebook nameservers ‘a’ and ‘b’ are active, but traffic is dropped at Facebook edge.

No Internet routes exist for Facebook nameservers ‘c’ and ‘d’, so traffic is dropped at the first ISP router.
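Internet routers forward on the longest (most specific) matching prefix, which is why the withdrawals played out differently for the two pairs of nameservers. The sketch below uses Python's standard `ipaddress` module and assumes a single simplified routing table. The 129.134.x prefixes come from the 12:15 pm update; 198.51.100.0/24 (a documentation range) is a hypothetical stand-in for the withdrawn routes to nameservers 'c' and 'd', and the host addresses are illustrative.

```python
import ipaddress
from typing import Iterable, Optional


def longest_prefix_match(
    table: Iterable[ipaddress.IPv4Network], address: str
) -> Optional[ipaddress.IPv4Network]:
    """Return the most specific prefix in `table` covering `address`, or None if no route exists."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in table if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)


# Routes before the incident: the a/b prefixes from the update, plus a hypothetical c/d stand-in.
BEFORE = [ipaddress.ip_network(p) for p in (
    "129.134.0.0/17", "129.134.30.0/23", "129.134.30.0/24",
    "129.134.31.0/24", "198.51.100.0/24",
)]
# Everything except the covering /17 was withdrawn.
WITHDRAWN = {ipaddress.ip_network(p) for p in (
    "129.134.30.0/23", "129.134.30.0/24", "129.134.31.0/24", "198.51.100.0/24",
)}
AFTER = [net for net in BEFORE if net not in WITHDRAWN]

# Illustrative host addresses inside each prefix (assumed for this sketch).
NAMESERVERS = {"a": "129.134.30.12", "b": "129.134.31.12", "c": "198.51.100.12"}

if __name__ == "__main__":
    for name, ip in NAMESERVERS.items():
        before = longest_prefix_match(BEFORE, ip)
        after = longest_prefix_match(AFTER, ip)
        # A matching route only means packets are forwarded toward Facebook;
        # it says nothing about whether Facebook's edge delivers them.
        status = "no route, dropped at first ISP hop" if after is None else f"routed via {after}"
        print(f"{name}: before={before} after={after} ({status})")
```

Nameservers 'a' and 'b' fall back from their /24s to the /17, so packets still head toward Facebook (where, per the update, they were dropped at the edge), while 'c' has no matching route at all.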
[Oct 4, 10:15 am PT] ThousandEyes tests can confirm that at 15:40 UTC on October 4, the Facebook application became unreachable due to DNS failure. Facebook’s authoritative DNS nameservers became unreachable at that time. The issue is still ongoing as of 17:02 UTC.

Facebook’s application globally unreachable due to DNS resolution failure.

Published On October 4, 2021
Angelique Medina
Director, Product Marketing
© 2021 ThousandEyes, Inc. All rights reserved.