MALICIOUS DOMAIN
Related terms: Classification (Machine Learning), Malware, Botnets, Domain Name System

DNS NETWORK SECURITY
Allan Liska, Geoffrey Stowe, in DNS Security, 2016

FAST-FLUX DOMAINS
One of the reasons that malicious domains tend to have lower TTLs is the widespread use of fast-flux domains. Fast-flux domains are used by attackers as a means of obscuring and protecting their real infrastructure. In a fast-flux attack, the attacker compromises a number of easy targets, such as unprotected computers or insecure home routers. These compromised hosts are then used as tunnels to redirect command-and-control messages and exfiltrated data to and from the real infrastructure. Using a combination of DNS round robin and low TTLs, the attacker constantly updates the A records for the subdomains in the domain, so every time the malware on the host makes a new request, the DNS query response returns a new IP address. The captured data or command response is sent to the compromised host and forwarded on to the real infrastructure, which also sends out commands redirected through the same set of compromised hosts.
In addition to fast-flux domains, there are also double-flux domains. Double-flux domains apply the same fast-flux technique to the authoritative name servers for the domain. In a double-flux scenario, the name servers for the domain are also compromised hosts. When a query comes into the name server, it is forwarded through the compromised hosts to the real authoritative name server. Again, this allows the attacker to protect her authoritative DNS infrastructure and continue to manage her fast-flux hosts without interruption. If the IP addresses for the fake authoritative name servers are blocked, she simply changes to new compromised hosts.
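The rotating A records and very low TTLs described above can be observed directly from a resolver. Below is a minimal sketch of such a check, assuming the third-party dnspython package (2.x, which provides dns.resolver.resolve) is installed; the domain name, round count, and polling interval are illustrative placeholders rather than values from the chapter, and many distinct IPs with short TTLs is only a hint of fast flux, not proof.

import time
import dns.resolver  # third-party package: dnspython (>= 2.0)

def observe_resolution(domain, rounds=3, pause=30):
    """Resolve `domain` repeatedly; return the distinct A records and TTLs seen."""
    seen_ips, ttls = set(), []
    for _ in range(rounds):
        answer = dns.resolver.resolve(domain, "A")
        ttls.append(answer.rrset.ttl)
        seen_ips.update(rdata.address for rdata in answer)
        time.sleep(pause)
    return seen_ips, ttls

if __name__ == "__main__":
    ips, ttls = observe_resolution("example.com")  # placeholder domain
    print(f"distinct A records: {len(ips)}, lowest TTL observed: {min(ttls)}s")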
Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780128033067000061

SYSTEM EXPLOITATION
Aditya K Sood, Richard Enbody, in Targeted Cyber Attacks, 2014

4.6.2 INFECTING A WEB SITE
An infected web site contains malicious code in the form of HTML that manipulates the browser into performing illegitimate actions. This malicious code is usually placed in interactive frames known as iframes. An iframe is an inline frame used by browsers to embed one HTML document within another. For example, the ads you see on a web page are often embedded in iframes: a web page provides an iframe to an advertiser, who fetches content from elsewhere to display. From an attacker's viewpoint, an iframe is particularly attractive because it can execute JavaScript; that is, it is a powerful and flexible HTML element. In addition, an iframe can be sized to 0×0 so that it is effectively invisible while doing nefarious things. In the context of drive-by downloads, its primary use is to stealthily direct a user from the current page to a malicious page hosting a Browser Exploit Pack (BEP). A basic representation of an iframe is shown in Listing 4.2.

Listing 4.2. Example of a normal and obfuscated iframe.

The "I-1" and "I-2" representations of iframe code are basic. "I-3" represents obfuscated iframe code, meaning the code is scrambled so that it is not easy to interpret; attackers typically use obfuscated iframes to deploy malicious content. The "I-3" representation is the outcome of running Dean Edwards' packer on "I-2". The packer applies additional JavaScript code with eval functions to scramble the iframe source by following simple compression rules. However, when "I-2" and "I-3" are placed in an HTML web page, they execute the same behavior: the packer adds JavaScript functions and performs string manipulation while keeping the execution path intact. Once a web site is infected, an iframe can perform the following operations:
• Redirect: The attacker injects code into the target web site to redirect users to a malicious domain. A hidden iframe is popular because it can execute code. One approach is for the iframe to simply load malware from a malicious domain and execute it in the user's browser. If that isn't feasible or is blocked, an iframe can be used to redirect the browser to a malicious domain hosting a BEP. The iframe may be obfuscated to hide its intent.
• Exploit: The attacker deploys an automated exploit framework such as a BEP on the malicious domain. A malicious iframe can load a specific exploit directly from the BEP. The attacker can also perform server-side or client-side redirects [36,37] to coerce a browser into connecting to a malicious domain.
Generally, iframes used in targeted attacks are obfuscated, so that code interpretation becomes hard and web site scanning services fail to detect the malicious activity.
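To make the hidden-iframe pattern concrete, the sketch below flags iframes that are sized 0×0 or hidden with an inline display:none style, using only Python's standard html.parser. The sample HTML and the attribute checks are illustrative assumptions; heavily obfuscated iframes like the packed "I-3" example would only surface after the surrounding JavaScript is unpacked or rendered.

from html.parser import HTMLParser

class IframeAuditor(HTMLParser):
    """Collect iframe src values that look hidden (0x0 size or display:none)."""

    def __init__(self):
        super().__init__()
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        attr = dict(attrs)
        zero_sized = attr.get("width") == "0" and attr.get("height") == "0"
        hidden = "display:none" in (attr.get("style") or "").replace(" ", "").lower()
        if zero_sized or hidden:
            self.suspicious.append(attr.get("src", "<no src>"))

# Placeholder page containing one hidden iframe pointing at a made-up host.
page = '<html><body><iframe src="http://bep.invalid/landing" width="0" height="0"></iframe></body></html>'
auditor = IframeAuditor()
auditor.feed(page)
print("suspicious iframes:", auditor.suspicious)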
Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780128006047000048

INFECTING THE TARGET
Aditya K Sood, Richard Enbody, in Targeted Cyber Attacks, 2014

3.3 MODEL B: SPEAR PHISHING ATTACK: EMBEDDED MALICIOUS LINKS
In the model discussed above, the attacker can alter the attack vector. Instead of sending malicious attachments, the attacker embeds malicious links in the spear phishing e-mails distributed to the target audience. On clicking the link, the user's browser is directed to a malicious domain running a Browser Exploit Pack (BEP) [5]. The BEP then fingerprints the browser, including components such as plugins, to detect any vulnerability that can be exploited to download malware. This attack is known as a drive-by download attack, in which target users are coerced into visiting malicious domains through social engineering [6]. The attacker can create custom malicious domains, thereby avoiding the exploitation of legitimate web sites to host malware. Custom malicious domains are domains registered by attackers that are not well known and remain active only for a short period of time to avoid detection. This design is mostly used for broadly distributed infections rather than targeted ones; however, modifications to the attack patterns used in drive-by downloads can make the attack targeted in nature. The context of malware infection stays the same, but the modus operandi varies. Table 3.1 shows the different types of spear phishing e-mails with attachments that have been used in the last few years to conduct targeted cyber attacks. The "Targeted E-Mail Theme" column shows the type of content used by attackers in the body of the e-mail. The themes span various spheres, including political, social, economic, and nuclear topics.

Table 3.1. An Overview of the Structure of E-mails Used in Targeted Attacks in Recent Years
Targeted E-Mail Theme | Date | Subject | Filename | CVE
Job / Socio-political ground | 07/25/2012 | Application; Japanese manufacturing; A Japanese document; Human rights activists in China | New Microsoft excel table.xls (password: 8861); q }(24.7.1).xls; 240727.xls; 8D823C0A3DADE8334B6C1974E2D6604F.xls; Seminiar.xls | 2012-0158
Socio-political ground | 03/12/2012–06/12/2012 | TWA's speech in the meeting of United States Commission for human rights; German chancellor again comments on Lhasa protects; Tibetan environmental situations for the past 10 years; Public Talk by the Dalai Lama_Conference du Dalai Lama Ottawa, Saturday, 28th April 2012; An Urgent Appeal Co-signed by Three Tibetans; Open Letter To President Hu | The Speech.doc; German Chancellor Again Comments on Lhasa Protects.doc; Tibetan environmental statistics.xls; Public Talk by the Dalai Lama.doc; Appeal to Tibetans To Cease Self-Immolation.doc; Letter.doc | 2010-0333
Socio-political ground | 01/06/2011 | Three big risks to China's economy in 2011 | Three big risks to China's economy in 2011.doc | 2010-3333
Socio-political ground | 01/24/2011 | Variety Liao taking – taking political atlas Liao | AT363777.7z; 44.doc | 2010-3970
Economic situation | 03/02/2012 | Iran's oil and nuclear situation | Iran's oil and nuclear situation.xls | 2012-0754
Nuclear operations | 03/17/2011 | Japan nuclear radiation leakage and vulnerability analysis | Nuclear Radiation Exposure and Vulnerability Matrix.xls | 2011-0609
Nuclear weapon program | 04/12/2011 | Japan's nuclear reactor secret: not for energy but nuclear weapons | Japan Nuclear Weapons Program.doc | 2011-0611
Anti-trust policy | 04/08/2011 | Disentangling Industrial Policy and Competition Policy in China | Disentangling Industrial Policy and Competition Policy in China.doc | 2011-0611
Organization meeting details | 06/20/2010 | Meeting agenda | Agenda.pdf | 2010-1297
Nuclear security summit and research posture | 04/01/2010 | Research paper on nuclear posture review 2010 and upcoming Nuclear security summit | Research paper on nuclear posture review 2010.pdf | 2010-0188
Military balance in Asia | 05/04/2010 | Asian-pacific security stuff if you are interested | Assessing the Asian balance.pdf | 2010-0188
Disaster relief | 05/09/2010 | ASEM cooperation relief on Capacity Building of disaster relief | Concept paper.pdf | 2010-0188
US-Taiwan relationship | 02/24/2009 | US-Taiwan exchange program enhancement | A_Chronology_of_Milestone_events.xls; US_Taiwan_Exchange_in-depth_Rev.pdf | 2009-0328
National defense law mobilization | 03/30/2010 | China and foreign military modernization | WebMemo.pdf | 2009-4324
Water contamination in Gulf | 07/06/2010 | EPA's water sampling report | Water_update_part1.pdf; Water_update_part2.pdf | 2010-1297
Rumours about currency reforms | 03/24/2010 | Rumours in N Korea March 2010 | Rumours in N Korea March 2010.pdf | 2010-0188
Chinese currency | 03/23/2010 | Talking points on Chinese currency | EAIBB No. 512.pdf | 2009-4324
Trade policy | 03/23/2010 | 2010 Trade Policy Agenda | The_full_Text_of_Trade_Policy_Agenda.pdf | 2010-0188
Chinese annual plenary session | 03/18/2010 | Report on NPC 2010 | NPC Report.pdf | 2009-4324
Unmanned aircraft systems | 01/03/2010 | 2009 DOD UAS ATC Procedures | DOD_UAS_Class_D_Procedures[signed].pdf | 2008-0655
Human rights | 02/26/2009 | FW: Wolf letter to secretary Clinton regarding China human rights | 2.23.09 Sec. of State Letter.pdf | 2009-0658
NBC interview | 09/08/2009 | Asking for an interview from NBC journalist | Interview Topics.doc | Unknown
Chinese defense | 01/28/2010 | Peer-Review: Assessing Chinese military transparency | Peer-Review - Assessing Chinese military transparency.pdf | 2009-4324
Asian terrorism report | 10/13/2009 | Terrorism in Asia | RL34149.pdf | Unknown
Country threats | 01/07/2010 | Top risks of 2010 | Unknown | Unknown
Counter terrorism | 05/06/2008 | RSIS commentary 54/2009 ending the LTTE | RSIS.zip | Unknown
Anti-piracy mission | 01/13/2010 | The China's navy budding overseas presence | Wm_2752.pdf | Unknown
National security | 01/20/2010 | Road Map for Asian-Pacific Security | Road-map for Asian-Pacific Security.pdf | 2009-4324
US president secrets | 11/23/2009 | The three undisclosed secret of president Obama Tour | ObamaandAsia.pdf | 2009-1862

The model of the waterholing attack discussed in the following section is a variant of the drive-by download attack.

Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780128006047000036

GRAPH THEORY
Leigh Metcalf, William Casey, in Cybersecurity and Applied Mathematics, 2016

5.1 AN INTRODUCTION TO GRAPH THEORY
A graph in mathematics consists of a set of vertices and a pairing of distinct vertices. This pairing creates an edge. In visualizing the graph, the vertices are points while the edges are lines connecting two points. The graph is generally written in the form G = (V,E), where V represents the set of vertices and E represents the set of edges. If we let v1 and v2 represent vertices, then an edge is written as the pair (v1,v2), and we say that v1 and v2 are connected by that edge.
Practically speaking, a graph is a way of modeling relationships, where the vertices can be considered a collection of entities. Entities can be people, systems, routers, DNS names, IP addresses, or malware samples. The edges are then the relationships between two entities. Suppose we have a collection of malicious domains and the IP addresses to which they resolve. A domain name has a relationship to an IP address if the domain name resolves to that IP address. In this case, we have an edge between the two, thus creating a graph that models the domain names and IP addresses of malware. Examining this graph can tell us more about the malware network. Does it use the same IP addresses over and over? Are they scattered and unrelated? Does one IP address serve most of the domains, or is there one domain that uses a multitude of IP addresses? Analyzing these graphs enables us to answer these questions and more about malicious domains and their IP addresses.
The point of using a graph is that we do not need to know much about the malicious software or the domains; we only need to consider the properties of the graph. We could also draw the graph, as we do for examples in this chapter. For a graph with 10 vertices and 20 edges, drawing it out lets us see the important vertices and potentially the interesting formations in it. However, drawing becomes increasingly uninformative and complex once the graph (i.e., the number of vertices) gets large; drawing a graph with a million edges is nearly impossible by hand. Using math and graph theory allows us to skip the drawing process and summarize the information modeled by the graph. Also, we do not need to know what our graph is modeling in order to find its properties; we just need the graph.
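As a toy illustration of this kind of modeling, the sketch below builds the domain-to-IP resolution graph as adjacency sets and uses vertex degree to ask one of the questions above (does one IP address serve most of the domains?). The domain names and IP addresses are invented placeholders; real data would come from passive DNS or resolver logs.

from collections import defaultdict

# Each edge (v1, v2) links a domain-name vertex to an IP-address vertex.
# These values are placeholders standing in for observed resolutions.
edges = [
    ("evil-a.invalid", "198.51.100.10"),
    ("evil-b.invalid", "198.51.100.10"),
    ("evil-c.invalid", "198.51.100.10"),
    ("evil-c.invalid", "203.0.113.5"),
]

domain_neighbors = defaultdict(set)  # domain -> IPs it resolves to
ip_neighbors = defaultdict(set)      # IP -> domains resolving to it
for domain, ip in edges:
    domain_neighbors[domain].add(ip)
    ip_neighbors[ip].add(domain)

# The degree of a vertex is the number of edges touching it; a high-degree
# IP vertex is shared infrastructure serving many domains.
busiest_ip = max(ip_neighbors, key=lambda ip: len(ip_neighbors[ip]))
print(busiest_ip, "serves", len(ip_neighbors[busiest_ip]), "of", len(domain_neighbors), "domains")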
This chapter will cover graphs, their properties, and modeling data with them.

Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780128044520000051

REPUTATION-BASED DETECTION
Chris Sanders, Jason Smith, in Applied Network Security Monitoring, 2014

MALWARE DOMAIN LIST
Regardless of the global concerns related to targeted attacks by sophisticated adversaries, the majority of an analyst's day will be spent investigating incidents related to malware infections on their systems. Because of this, it becomes pertinent to be able to detect malware at both the host and network level. One of the easiest ways to detect malware at the network level is to use public reputation lists that contain IP addresses and domain names known to be associated with malware-related communication.
Malware Domain List (MDL) is a non-commercial community project that maintains lists of malicious domains and IP addresses. The project is supported by an open community of volunteers and relies upon those volunteers both to populate the list and to vet it, ensuring that items are added and removed as necessary. MDL allows you to query its list on an individual basis or download the list in a variety of formats, including a CSV file, an RSS feed, and a hosts.txt-formatted list. It also provides lists that include only new daily entries, as well as lists of sites that were once on the list but have since been cleaned or taken offline.
MDL is one of the largest and most widely used reputation lists available. I've seen many organizations have a great deal of success detecting malware infections and botnet command and control (C2) by using MDL as an input for reputation-based detection. The vastness of MDL can sometimes result in false positives, so an alert generated from a friendly host visiting an entry found on MDL isn't enough by itself to automatically declare an incident. When one of these alerts is generated, you should investigate other data sources and a wider range of communication from the friendly host to determine whether there are other signs of an infection or compromise. You can learn more about MDL at http://www.malwaredomainlist.com.
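A sketch of how such a list can feed simple detection is below. It assumes you have already exported a reputation list to a local file with one domain per line ("mdl_domains.txt" is a placeholder name; MDL's actual CSV and hosts-file formats would need their own parsing), and, as the text notes, a hit should start an investigation rather than automatically declare an incident.

def load_reputation_list(path):
    """Read one domain per line, ignoring blank lines and '#' comments."""
    with open(path) as handle:
        return {line.strip().lower() for line in handle
                if line.strip() and not line.startswith("#")}

def reputation_hits(queried_domains, bad_domains):
    """Yield observed DNS queries whose domain appears on the list."""
    for name in queried_domains:
        if name.lower().rstrip(".") in bad_domains:
            yield name

if __name__ == "__main__":
    bad = load_reputation_list("mdl_domains.txt")         # placeholder path
    observed = ["intranet.example", "bad-host.invalid."]   # placeholder queries
    for hit in reputation_hits(observed, bad):
        print("reputation hit, investigate further:", hit)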
Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780124172081000088

DETECTION MECHANISMS, INDICATORS OF COMPROMISE, AND SIGNATURES
Chris Sanders, Jason Smith, in Applied Network Security Monitoring, 2014

VARIABLE INDICATORS
If the detection mechanisms used in your network were only configured to detect attacks where known indicators were used, then you would eventually miss something bad. At some point, we have to account for variable indicators, which are indicators whose values are not known. These are usually derived by creating a sequence of events through which an attack might occur (forming a behavioral indicator) and identifying where variables exist. Essentially, this examines a theoretical attack rather than one that has already occurred. This root-cause type of analysis is performed on specific attack techniques, rather than on instances of attacks executed by an individual adversary. I like to think of variable indicators as resembling a movie script, where you know what will happen, but not who will play each particular role. Also, just like a movie script, there is always the potential for improvisation with a skilled actor.
Variable indicators are not especially useful for deployment to signature-based detection mechanisms, but they find a great deal of use with solutions like Bro. We can see an example of developing variable indicators by revisiting the scenario we looked at in the last section. Instead of basing the attack scenario on an attack that has actually occurred, we will base it on a theoretical attack. Restated, the attack scenario would broadly play out as follows:
1. A user received an e-mail message with a malicious attachment.
2. The user opens the attachment, triggering the download of a file from a malicious domain.
3. The file was used to overwrite a system file with the malicious version of that file.
4. Code within the malicious file was executed, triggering an encrypted connection to a malicious server.
5. Once the connection was established, a large amount of data was exfiltrated from the system.
These steps represent behavioral indicators that contain multiple variable atomic and computed indicators. We can enumerate some of these indicators here:
• VB-1: A user received an e-mail message with a malicious attachment.
  • VA-1: E-Mail Address
  • VA-2: E-Mail Subject
  • VA-3: Malicious E-Mail Source Domain
  • VA-4: Malicious E-Mail Source IP Address
  • VA-5: Malicious Attachment File Name
  • VC-1: Malicious Attachment MD5 Hash
• VB-2: The user opens the attachment, triggering the download of a file from a malicious domain.
  • VA-6: Malicious Redirection Domain/IP
  • VA-7: Malicious Downloaded File Name
  • VC-2: Malicious Downloaded File MD5 Hash
• VB-3: The file was used to overwrite a system file with the malicious version of that file.
• VB-4: Code within the malicious file was executed, triggering an encrypted connection to a malicious server on a non-standard port.
  • VA-8: External C2 IP Address
  • VA-9: External C2 Port
  • VA-10: External C2 Protocol
• VB-5: Once the connection was established, a large amount of data was exfiltrated from the system.
In this example, the V in the indicator names denotes a variable component of the indicator. As we've laid it out, there are potentially ten variable atomic indicators, two variable computed indicators, and five variable behavioral indicators. Now we can hypothesize methods by which these indicators can be built into signatures and paired with detection mechanisms. Variable indicators will commonly be reused and combined in order to derive detection for broad attack scenarios:
• VB-1 (VA-3/VA-4), VB-2 (VA-6), VB-4 (VA-8), VB-5 (VA-8): Snort/Suricata rule to detect communication with known bad reputation IP addresses and domains
• VB-1 (VA-5/VC-1), VB-2 (VA-7/VC-2): Bro script to pull files off the wire and compare their names and MD5 hashes with a list of known bad reputation file names and MD5 hashes
• VB-1 (VA-5/VC-1), VB-2 (VA-7/VC-2): Bro script to pull files off the wire and place them into a sandbox that performs rudimentary malware analysis
• VB-2 (VA-6/VA-7/VC-2): HIDS signature to detect the browser being launched from a document
• VB-3: HIDS signature to detect a system file being overwritten
• VB-4 (VA-9/VA-10), VB-5: Bro script to detect encrypted traffic occurring on a non-standard port
• VB-4 (VA-9/VA-10), VB-5: Snort/Suricata rule to detect encrypted traffic occurring on a non-standard port
• VB-5: Custom script that uses session data statistics to detect large volumes of outbound traffic from workstations
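One way to make this scenario machine-usable is to hold each behavioral indicator as a small record whose variable atomic and computed indicators are named slots, filled in as an investigation progresses. The sketch below is not from the book; the slot names mirror the VB/VA/VC labels above, and the filled-in values are invented placeholders.

from dataclasses import dataclass, field

@dataclass
class BehavioralIndicator:
    """A behavioral step with named variable slots (atomic/computed indicators)."""
    name: str
    description: str
    variables: dict = field(default_factory=dict)  # slot -> value (None until observed)

scenario = [
    BehavioralIndicator("VB-1", "User received e-mail with malicious attachment",
                        {"VA-3": None, "VA-5": None, "VC-1": None}),
    BehavioralIndicator("VB-2", "Attachment triggers download from malicious domain",
                        {"VA-6": None, "VA-7": None, "VC-2": None}),
    BehavioralIndicator("VB-4", "Encrypted connection to C2 on a non-standard port",
                        {"VA-8": None, "VA-9": None}),
]

# Filling slots turns variable indicators into concrete, deployable ones.
scenario[2].variables.update({"VA-8": "203.0.113.7", "VA-9": 8443})  # placeholder values

for step in scenario:
    unresolved = [k for k, v in step.variables.items() if v is None]
    print(f"{step.name}: {step.description}; unresolved slots: {unresolved or 'none'}")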
SOC analysts commonly monitor information security news sources such as conference proceedings and the blogs and Twitter feeds of industry experts. This allows the SOC to stay abreast of new and emerging attack techniques so that the organization's defensive posture can be modeled around them. When this happens, it becomes incredibly useful to break the attack down into variable indicators. When platform-specific signatures are provided, those can be reverse engineered into individual indicators so that they can be used in conjunction with the detection mechanisms in place on your network. These are incredibly useful exercises for NSM analysts: they help the analyst better understand how attacks work and how detection mechanisms can be used to detect the different phases of an attack effectively. The components of variable indicators can be used for all varieties of detection, and they are most useful in determining how to detect things with unknown entities.

Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780124172081000076

DOMAIN NAME SYSTEM SECURITY AND PRIVACY: A CONTEMPORARY SURVEY
Aminollah Khormali, ... David Mohaisen, in Computer Networks, 2021

4.2.2 ASSOCIATION ANALYSIS
Strong associations of domains with known malicious domains can be utilized to detect malicious domain names. For example, Khalil et al. [55] have designed an association-based scheme for the detection of malicious domains with high accuracy and coverage. Furthermore, Gao et al. [52] have utilized temporal correlation analysis of DNS queries to identify a wide range of correlated malicious domain groups, e.g., phishing, spam, and DGA-generated domains, based on related known malicious anchor domains. Yadav et al. [60] have utilized statistical measures such as the Kullback–Leibler divergence, the Jaccard index, and the Levenshtein distance for domain-flux botnet detection. Gomez et al. [139] have studied the application of visualization for understanding DNS-based network threat analysis.
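A heavily simplified, toy version of this "guilt by association" idea is sketched below: domains that share resolution infrastructure with known malicious anchor domains are flagged for review. This is not the algorithm from any of the cited papers; the resolution pairs and the anchor set are invented placeholders, and real schemes weigh the evidence far more carefully.

from collections import defaultdict

# Passive-DNS-style (domain, ip) observations; all values are placeholders.
resolutions = [
    ("anchor-bad.invalid", "198.51.100.10"),
    ("unknown-a.invalid", "198.51.100.10"),
    ("unknown-b.invalid", "203.0.113.25"),
    ("benign-site.invalid", "192.0.2.1"),
]
known_malicious = {"anchor-bad.invalid"}

ip_to_domains = defaultdict(set)
for domain, ip in resolutions:
    ip_to_domains[ip].add(domain)

# Flag every domain that shares at least one resolution IP with an anchor domain.
suspects = set()
for domains in ip_to_domains.values():
    if domains & known_malicious:
        suspects |= domains - known_malicious

print("domains associated with known malicious infrastructure:", suspects)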
Discussion and Open Directions. Despite the numerous advantages of machine learning approaches, there are still risks and limitations in using them in operation. The foremost challenge is the acquisition and labeling of relevant data from representative vantage points to maximize insight. Even if the data is collected correctly, capturing DNS traffic results in a very large amount of data to analyze, which is expensive in terms of computation and storage. In addition, the performance of machine learning algorithms is contingent upon their structure and learning algorithms; selecting an improper structure or learning algorithm might produce poor results, so it is necessary to try different algorithms for each problem. Furthermore, the training phase of the algorithm can be time-consuming even when the dataset is small, requiring training heuristics. Based on our exploration of the literature, we believe there is a significant need for further automation with machine learning, not only for labeling, but also for the appropriate representation of features fed into machine learning algorithms through abstract structures (e.g., dependency representations such as graphs) and deep neural networks (e.g., serving as high-quality data extractors).

Table 12. List of common query types and their descriptions.
Query type | Definition
A | IPv4 address
AAAA | IPv6 address
MX | Mail exchanger record
NS | Authoritative name server
TXT | Arbitrary text strings
PTR | Pointer (IP address/hostname)
SRV | Service (service/hostname)
SOA | Start of Authority
CNAME | Canonical Name (alias/canonical)
DS | Delegation Signer
DNSKEY | DNSSEC public key
NSEC | Next Secure (no record/two points)

Table 13. Summary of the machine learning methods used in the literature. Here AR is Accuracy Rate and FPR is False Positive Rate.
Work | Application | AR | FPR
[132] | Malicious network | 97.1% | 1.6%
[23] | Cache poisoning | 88.0% | 1.0%
[55] | Malicious domain | 99.0% | 1.0%
[18] | Cache poisoning | 91.9% | 0.6%
[77] | Parked domains | 98.7% | 0.5%
[159] | Phishing | 95.5% | 3.5%

Table 14. Topical classification of the DNS research methods addressed in the literature, with sample work.

Read full article: https://www.sciencedirect.com/science/article/pii/S1389128620313001

A RECENT REVIEW OF CONVENTIONAL VS. AUTOMATED CYBERSECURITY ANTI-PHISHING TECHNIQUES
Issa Qabajeh, ... Francisco Chiclana, in Computer Science Review, 2018

4.1 DATABASES (BLACKLIST AND WHITELIST)
A database-driven approach to fighting phishing, called a blacklist, was developed by several research projects [2,50,51]. This approach is based on using a predefined list containing domain names or URLs of websites that have been recognised as harmful. A blacklisted website may lose up to 95% of its usual traffic, which hinders the website's revenue capacity and eventually its profit [23]. This is the primary reason that webmasters and web administrators give great attention to the problem of blacklisting. According to Mohammad et al. [11,12], there are two types of blacklists in computer security:
• Domain/URL based: real-time URL lists that contain malicious domain names and are normally used to look for spam URLs within the body of emails.
• Internet Protocol based: real-time domain server blacklists that contain IP addresses whose status changes in real time. Mailbox providers such as Yahoo often check domain server blacklists to evaluate whether the sending server (source) is run by someone who allows other users to send from their own source.
Users, businesses, or computer software enterprises can create blacklists. Whenever a website is about to be browsed, the browser checks its URL against the blacklist. If the URL exists in the blacklist, a certain action is taken to warn the user of the possibility of a security breach. Otherwise, no action is taken, as the website's URL is not recognised as harmful.
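The browser-side check described above amounts to a set lookup on the URL's host and its parent domains. Below is a minimal sketch of that lookup, with an invented two-entry blacklist standing in for the real feeds named in the next paragraph.

from urllib.parse import urlparse

# A tiny illustrative blacklist; real deployments load thousands of entries.
blacklist = {"phish-login.invalid", "malware-drop.invalid"}

def is_blacklisted(url, blacklist):
    """True if the URL's host, or any parent domain of it, is on the blacklist."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Check the host and every parent suffix, e.g. a.b.tld -> b.tld -> tld.
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))

print(is_blacklisted("http://secure.phish-login.invalid/verify", blacklist))  # True
print(is_blacklisted("https://www.example.com/", blacklist))                  # False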
Currently, there are a few hundred publicly available blacklists, among which we can mention the ATLAS blacklist from Arbor Networks, BLADE Malicious URL Analysis, the DGA list, the CYMRU Bogon list, the Scumware.org list, the OpenPhish list, the Google blacklist, and the Microsoft blacklist [52]. Since any user or any small to large organisation can create a blacklist, the currently available public blacklists have different levels of security effectiveness, particularly with respect to two factors:
1. How often the blacklist gets updated and its consistent availability.
2. The quality of its results with respect to accurate phishing detection rate.
Marketers, users, and businesses tend to use the Google and Microsoft blacklists over other publicly available blacklists because of their lower false positive rates. A study by [2] analysing blacklists concluded that they contain on average 47% to 83% of phishing websites. Blacklists are often stored on servers, but can also be held locally on a user's machine [25]. Thus, the process of checking whether a URL is part of the blacklist is executed whenever a website is about to be visited by the user, in which case the server or local machine uses a particular search method to perform the check and derive an action.
The blacklist usually gets updated periodically. For example, the Microsoft blacklist is normally updated every nine hours to six days, whereas the Google blacklist gets updated every twenty hours to twelve days [11,12]. Hence, the time window needed to amend the blacklist by including new malicious URLs, or excluding possible false positive URLs, may allow phishers to launch and complete their phishing attacks. In other words, phishers have significant time to initiate a phishing attack before their websites get blocked. This is an obvious limitation of using the blacklist approach to track false websites [18]. Another study, by the APWG, revealed that over 75% of phishing domains also genuinely serve legitimate websites; blocking them means that several trustworthy websites are added to the blacklist, which causes a drastic reduction in the website's revenue and hinders its reputation [9].
After the creation of blacklists, many automated anti-phishing tools, normally used by software companies such as McAfee, Google, and Microsoft, were proposed. For instance, the Anti-Phishing Explorer 9, McAfee Site Advisor, and Google Safe Base are three common anti-phishing tools based on the blacklist approach. Moreover, companies such as VeriSign developed anti-phishing internet crawlers that gather massive numbers of websites to identify clones, in order to assist in differentiating between legitimate and phishing websites.
There have been some attempts to look into creating whitelists, i.e., legitimate URL databases, in contrast to blacklists [53]. Unfortunately, since the majority of newly created websites are initially identified as "suspicious", this places a burden on the whitelist approach. To overcome this issue, the websites expected to be visited by the user should exist in the whitelist. This is often problematic in practice because of the large number of possible websites that a user might browse. The whitelist approach is simply impractical, since the websites users are "known" in advance to browse may differ from those actually visited during browsing. Human decision-making is a dynamic process, and users often change their minds and start browsing new websites that they initially never intended to visit.
One of the earliest whitelists was proposed by Chen and Guo [53] and was based on users browsing trusted websites. The method monitors the user's login attempts, and if a repeated login is successfully executed, it prompts the user to add that website to the whitelist. One clear limitation of Chen and Guo's method is that it assumes that users are dealing with trustworthy websites, which unfortunately is not always the case. PhishZoo is another whitelist technique, developed by Afroz and Greenstadt [5]. It constructs a website profile using a fuzzy hashing approach in which the website is represented by several criteria that differentiate one website from another, including images, HTML source code, URL, and SSL certificate. PhishZoo works as follows:
1. When the user browses a new website, PhishZoo makes a specific profile for that website.
2. The new website's profile is compared with the existing profiles in the PhishZoo whitelist:
  • If a full match is found, the newly browsed website is marked as trustworthy.
  • If it partly matches, the website will not be added, since it is suspicious.
  • If no match is found but the SSL certificate matches, PhishZoo instantly amends the existing profile in the whitelist.
  • If no match is found at all, a new profile is created for the website in the whitelist.
Recently, Lee et al. [31] investigated the personal security image whitelist approach and its impact on the security of internet banking users. The authors recruited 482 users to conduct a pilot study on a simulated bank website. The results revealed that over 70% of the users in the simulated experiments gave up their login credentials even though their personal security image test was not performed. The results also revealed that novice users do not pay much attention to the use of personal images in e-banking, which can be seen as a possible shortcoming of this anti-phishing approach.

Read full article: https://www.sciencedirect.com/science/article/pii/S1574013717302010

ISSUES AND CHALLENGES IN DNS BASED BOTNET DETECTION: A SURVEY
Manmeet Singh, ... Sanmeet Kaur, in Computers & Security, 2019

4.5.1 STATE OF THE ART
Antonakakis and Perdisci (2012) presented a technique for DGA-based botnet detection named Pleiades. Pleiades inspects DNS queries that result in Non-Existent Domain (NXDomain) responses. The system consists of two main components, the DGA Discovery component and the DGA Classification and C&C Detection component, as shown in Fig. 18. In DGA Discovery, all NXDomains are clustered based on statistical similarity (e.g., length, character frequency). The idea is to discover domain clusters that belong to the same DGA-based botnet. In DGA Classification and C&C Detection, two models are used: a statistical multi-class classifier labels each cluster (e.g., DGA-Conficker-A), and a Hidden Markov Model detects candidate C&C domains by finding the single queried domain for a given host in the cluster. One of the prime limitations of this technique is its treatment of the domain as a single character sequence. The study also noted its limitation in providing an exact count of infected hosts.

Fig. 18. Overview of Pleiades.

Zhou et al. (2013) presented a system for DGA-based botnet detection using DNS traffic analysis. The system consists of two modules. In the pre-handle module, a whitelist consisting of the top 10k Alexa domains is applied to the captured traffic to significantly reduce benign traffic.
In the DGA detection module, the remaining domains are clustered based on similar live time spans and similar visit patterns. The main idea behind the detection system is that the visit patterns and live time spans of domains generated by a DGA differ from those of normal domains, which have long live time spans and dissimilar visit patterns. Unconfirmed domains need more time for further investigation, and as such the system is not effective for real-time detection.
Bilge et al. (2014) presented a system for spotting malicious domains named EXPOSURE. Four sets of features, namely time-based, DNS answer-based, TTL value-based, and domain name-based features, are collected as part of the feature attribution phase. A change-point detection algorithm and a similar daily detection algorithm are used to classify a domain as malicious or benign, and the J48 decision tree algorithm was used in the training phase of the classifier. The detection scheme reported a very high detection rate of 99.5%, with an area under the curve (AUC) of 0.999 and a low false positive rate of 0.3%. EXPOSURE can be evaded by setting a larger TTL value or by decreasing the number of DNS queries.
Bottazzi and Italiano (2015) presented a data mining approach for the detection of algorithmically generated domains. Proxy logs from a large Italian organization consisting of 60,000 workstations and 100,000 users were mined for one month. The approach consists of extracting the second-level domain (SLD) from the logs and constructing a knowledge base. For each day, a list of SLDs is identified consisting of never-seen domains and non-RFC-1035-compliant domains, and lexical analysis is done on the SLDs to count the vowels, consonants, and digits. Finally, clustering is done on SLD length and digit count. Results show that 5 of the top 8 SLD clusters indicate that the domains are algorithmically generated. Pronounceable domains generated using a DGA, however, could not be detected using this technique.
Sharifnya and Abadi (2015) presented a botnet detection technique based on distinguishing algorithmically or randomly generated domain names from legitimate ones. To detect botnets, a negative reputation system is used. The proposed model differs from other models in that it associates a negative score with each host in the network; this score is then used to classify bots with J48 decision trees. An initial filtering based on a whitelist is used to separate trusted domains. Domain group analysis is done to find hosts that query the same domains, and domain labels are then analyzed with a correlation coefficient to determine whether they are algorithmically generated. A score between 0 and 1 is finally calculated, indicating whether the host is involved in bot activities. Unlike other techniques, it considers the history of malicious domain activity for each host in the network.
Nguyen et al. (2015) developed a method for detecting botnets employing Domain Generation Algorithms using collaborative filtering and density-based clustering. Whitelist filtering is applied first, followed by K-means clustering with K = 2 using 2-gram frequency values as input. Finally, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is used to group bots that correspond to the same botnet. One of the prime limitations of the technique is its inability to detect peer-to-peer botnets.
Erquiaga et al. (2016) presented a technique based on behavioral analysis to detect DGA malware traffic. In the technique, a flow is represented as a four-tuple: source and destination Internet Protocol address, port, and protocol. Flows are aggregated to form connections, and a behavioral model is then applied to each connection using well-defined steps. The detection algorithm consists of three phases. In the first, a Markov chain model is applied to each connection, producing a transition matrix and an initialization vector. In the second phase, the traffic to be evaluated is converted into a series of letters. In the final stage, the resulting string is evaluated against all detection models, and an alert is generated if the probability under a behavioral model exceeds a certain threshold. The paper distinctly labels DNS traffic into five groups: normal traffic, non-DGA traffic (used for spam), DGA type-1, DGA type-2, and fast flux. The study concluded that more realistic results can be obtained if the dataset consists of both normal and botnet traffic.
Tong and Nguyen (2016) presented a technique for DGA-based botnet detection using semantic and cluster analysis. The proposed system comprises three stages: domain filtering, DGA filtering, and DGA clustering. In domain filtering, the top 1 million Alexa domains are used as a whitelist, and semantic features such as N-gram score, N-gram frequency, entropy, and meaningful character ratio are used to filter out domains. In DGA filtering, the correlation matrix and Mahalanobis distance of the domains filtered in the first phase are calculated to filter out benign domains. In DGA clustering, K-means is used to cluster the domains filtered in the second stage into groups such that domains generated by the same domain generation algorithm fall in the same group. The technique considers only linguistic features and as such is bound to be less accurate; the study concluded that improvement can be made by using IP- and domain-based features.
Kwon et al. (2016) presented a scalable system named PsyBoG for botnet detection in large volumes of DNS traffic. Due to increased Internet penetration, the volume of network traffic keeps growing, making real-time analysis difficult; as such, there is a need for a scalable system for botnet detection, and PsyBoG was designed with this increasing volume in mind. Periodic behavior patterns along with simultaneous behavior patterns are evaluated using DNS traffic to find botnet groups. Power spectral density, a signal processing technique, is used to find periodic behavior patterns, and power distance is used to find simultaneous behavior patterns. The system is designed for high accuracy, robustness, and applicability in addition to high scalability. PsyBoG can be evaded by using randomized and slow query patterns.
Wang et al. (2017) developed a technique for the detection of DGA-based botnets named DBod. The technique works on the analysis of failed DNS requests and clusters infected and clean machines in a network. It was evaluated in an academic environment for 26 months and showed effective detection results. A score function was introduced that assigns a unique score to each cluster, classifying it as malicious or benign. DBod is unable to detect dormant bots. Increasing the time epoch leads to a greater possibility of detection but also to higher computation; the study recommends 1 h as an acceptable epoch for the proposed technique.
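Many of the surveyed techniques start from simple lexical statistics over the domain label. The sketch below computes three generic features of that kind (length, Shannon entropy, digit ratio); it is only in the spirit of the semantic features mentioned above, not the exact feature set or thresholds of any cited paper, and the sample names are invented.

import math
from collections import Counter

def lexical_features(label):
    """Length, Shannon entropy (bits/char), and digit ratio of a domain label."""
    counts = Counter(label)
    n = len(label)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    digit_ratio = sum(ch.isdigit() for ch in label) / n
    return {"length": n, "entropy": round(entropy, 3), "digit_ratio": round(digit_ratio, 3)}

# Invented examples: a common word, a DGA-looking string, and a typosquat.
for name in ["google", "x3k9q1vz0pl2m8", "paypa1-login"]:
    print(name, lexical_features(name))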
Table 8 presents a listing of the key components of the DGA-based botnet detection techniques discussed in our review. Validation results of various DGA-based botnet detection techniques are presented in Fig. 19.

Table 8. DGA-based detection techniques.
Reference | Technique | Whitelist/Blacklist Criteria | Targeted Botnet | Dataset | Pros and Cons
Antonakakis and Perdisci (2012) | Alternating decision tree learning algorithm and Hidden Markov Model | – | Zeus V3, BankPatch, Bobax, Conficker, Murofet & Sinowal | 15 months of DNS traffic from a large North American ISP with 2 M hosts per day | High false positives and false negatives; dependent on NXDomain traffic
Zhou et al. (2013) | Domain set active time distribution | Top 10 K websites from Alexa | Conficker, Kraken, Torpig, Srizbi & Bobax | Pilot DNS server in China with 150 K packets per second | Delay in detecting domains not present in blacklists
Bilge et al. (2014) | J48 & genetic algorithms | Zeus blacklist, Anubis, Wepawet, Phishtank & Alexa top 100 Global | Conficker, Kraken, Bobax & Srizbi bits | 2.5 months of traffic consisting of 100 billion queries | High detection rate; real-time deployment
Bottazzi and Italiano (2015) | Knowledge-base construction and clustering on length and number of hits | – | – | One month's traffic of a corporate network having 60 K hosts | Easy evasion by varying the length of the domain
Sharifnya and Abadi (2015) | Suspicious group analysis detector, Kullback-Leibler divergence, Spearman's rank correlation coefficient | Whitelist consisting of Alexa top 100; blacklist from Murofet | Conficker.C | Dataset of benign and malicious DNS queries | Low false alarm rate; considers the history of hosts' suspicious DNS activities
Nguyen et al. (2015) | Collaborative filtering, density-based clustering & cosine similarity | Alexa 1 million domain whitelist | Necurs, Vawtrak & Palevo | Two weeks of DNS traffic logs of about 18,000 systems | Cannot detect P2P botnets
Erquiaga et al. (2016) | Markov chain detection algorithm | – | DGA malware | Dataset from the Malware Capture Facility Project (Annon, 2019f) | High rate of true negative values
Tong and Nguyen (2016) | Modified Mahalanobis distance and K-means | – | Conficker, Tinba, Bebloh, Tovar-Goz & Kraken | Top 1 million websites from Alexa (Alexa, 2017) and the DNS-BH Malware Domain Blocklist | Dependent on botnet family
Kwon et al. (2016) | Signal processing technique focusing on simultaneous and periodic behavior patterns | Less than 13 queries in one hour | – | 20 DNS traffic traces from malware dumps and real-world DNS servers | Scalable
Wang et al. (2017) | Numerical analysis of request time and request count distribution and the Chinese Whispers algorithm | Spamhaus, BRBL, SpamCop, and AHBL | Kraken, Conficker, Cycbot & Murofet | Traffic from 10,000 users of the Education Network of Tainan City (May 2013 to June 2015) | Low false alarm rate and online detection

Fig. 19. Validation results of DGA-based botnet detection techniques.

Read full article: https://www.sciencedirect.com/science/article/pii/S0167404819301117