MALICIOUS DOMAIN
Related terms: Classification (Machine Learning), Malware, Botnets, Domain Name System

DNS NETWORK SECURITY
Allan Liska, Geoffrey Stowe, in DNS Security, 2016

FAST-FLUX DOMAINS
One of the reasons that malicious domains tend to have lower TTLs is the widespread use of fast-flux domains. Fast-flux domains are used by attackers as a means of obscuring and protecting their real infrastructure. In a fast-flux attack, the attacker compromises a number of easy targets, such as unprotected computers or insecure home routers. These compromised hosts are then used as tunnels to redirect command-and-control messages and exfiltrated data to and from the real infrastructure. Using a combination of DNS round robin and low TTLs, the attacker constantly updates the A records for the subdomains in the domain, so every time the malware on the host makes a new request, the DNS query response returns a new IP address. The captured data or command response is sent to the compromised host and forwarded on to the real infrastructure, which also sends out commands redirected through the same set of compromised hosts.
In addition to fast-flux domains, there are also double-flux domains. Double-flux domains apply the same fast-flux technique to the authoritative name servers for the domain. In a double-flux scenario, the name servers for the domain are also compromised hosts. When a query comes into the name server, it is forwarded through the compromised hosts to the real authoritative name server. Again, this allows the attacker to protect her authoritative DNS infrastructure and continue to manage her fast-flux hosts without interruption. If the IP addresses for the fake authoritative name servers are blocked, she simply changes to new compromised hosts.
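The rotating A records and very low TTLs described above can be observed directly from a resolver. Below is a minimal sketch of such a check, assuming the third-party dnspython package (2.x, which provides dns.resolver.resolve) is installed; the domain name, round count, and polling interval are illustrative placeholders rather than values from the chapter, and many distinct IPs with short TTLs is only a hint of fast flux, not proof.

import time
import dns.resolver  # third-party package: dnspython (>= 2.0)

def observe_resolution(domain, rounds=3, pause=30):
    """Resolve `domain` repeatedly; return the distinct A records and TTLs seen."""
    seen_ips, ttls = set(), []
    for _ in range(rounds):
        answer = dns.resolver.resolve(domain, "A")
        ttls.append(answer.rrset.ttl)
        seen_ips.update(rdata.address for rdata in answer)
        time.sleep(pause)
    return seen_ips, ttls

if __name__ == "__main__":
    ips, ttls = observe_resolution("example.com")  # placeholder domain
    print(f"distinct A records: {len(ips)}, lowest TTL observed: {min(ttls)}s")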
Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780128033067000061

SYSTEM EXPLOITATION
Aditya K Sood, Richard Enbody, in Targeted Cyber Attacks, 2014

4.6.2 INFECTING A WEB SITE
An infected web site contains malicious code in the form of HTML that manipulates the browser into performing illegitimate actions. This malicious code is usually placed in interactive frames known as iframes. An iframe is an inline frame used by browsers to embed one HTML document within another. For example, the ads you see on a web page are often embedded in iframes: a web page provides an iframe to an advertiser, who fetches content from elsewhere to display. From an attacker's viewpoint, an iframe is particularly attractive because it can execute JavaScript; that is, it is a powerful and flexible HTML element. In addition, an iframe can be sized to 0×0 so that it is effectively invisible while doing nefarious things. In the context of drive-by downloads, its primary use is to stealthily direct a user from the current page to a malicious page hosting a Browser Exploit Pack (BEP). A basic representation of an iframe is shown in Listing 4.2.

Listing 4.2. Example of a normal and obfuscated iframe.

The "I-1" and "I-2" representations of iframe code are basic. "I-3" represents obfuscated iframe code, meaning the code is scrambled so that it is not easy to interpret; attackers typically use obfuscated iframes to deploy malicious content. The "I-3" representation is the outcome of running Dean Edwards' packer on "I-2". The packer applies additional JavaScript code with eval functions to scramble the iframe source by following simple compression rules. However, when "I-2" and "I-3" are placed in an HTML web page, they execute the same behavior: the packer adds JavaScript functions and performs string manipulation while keeping the execution path intact. Once a web site is infected, an iframe can perform the following operations:
• Redirect: The attacker injects code into the target web site to redirect users to a malicious domain. A hidden iframe is popular because it can execute code. One approach is for the iframe to simply load malware from a malicious domain and execute it in the user's browser. If that isn't feasible or is blocked, an iframe can be used to redirect the browser to a malicious domain hosting a BEP. The iframe may be obfuscated to hide its intent.
• Exploit: The attacker deploys an automated exploit framework such as a BEP on the malicious domain. A malicious iframe can load a specific exploit directly from the BEP. The attacker can also perform server-side or client-side redirects [36,37] to coerce a browser into connecting to a malicious domain.
Generally, iframes used in targeted attacks are obfuscated, so that code interpretation becomes hard and web site scanning services fail to detect the malicious activity.
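To make the hidden-iframe pattern concrete, the sketch below flags iframes that are sized 0×0 or hidden with an inline display:none style, using only Python's standard html.parser. The sample HTML and the attribute checks are illustrative assumptions; heavily obfuscated iframes like the packed "I-3" example would only surface after the surrounding JavaScript is unpacked or rendered.

from html.parser import HTMLParser

class IframeAuditor(HTMLParser):
    """Collect iframe src values that look hidden (0x0 size or display:none)."""

    def __init__(self):
        super().__init__()
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        attr = dict(attrs)
        zero_sized = attr.get("width") == "0" and attr.get("height") == "0"
        hidden = "display:none" in (attr.get("style") or "").replace(" ", "").lower()
        if zero_sized or hidden:
            self.suspicious.append(attr.get("src", "<no src>"))

# Placeholder page containing one hidden iframe pointing at a made-up host.
page = '<html><body><iframe src="http://bep.invalid/landing" width="0" height="0"></iframe></body></html>'
auditor = IframeAuditor()
auditor.feed(page)
print("suspicious iframes:", auditor.suspicious)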
Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780128006047000048

INFECTING THE TARGET
Aditya K Sood, Richard Enbody, in Targeted Cyber Attacks, 2014

3.3 MODEL B: SPEAR PHISHING ATTACK: EMBEDDED MALICIOUS LINKS
In the model discussed above, the attacker can alter the attack vector. Instead of sending malicious attachments, the attacker embeds malicious links in the spear phishing e-mails distributed to the target audience. On clicking the link, the user's browser is directed to a malicious domain running a Browser Exploit Pack (BEP) [5]. The BEP then fingerprints the browser, including components such as plugins, to detect any vulnerability that can be exploited to download malware. This attack is known as a drive-by download attack, in which target users are coerced into visiting malicious domains through social engineering [6]. The attacker can create custom malicious domains, thereby avoiding the exploitation of legitimate web sites to host malware. Custom malicious domains are domains registered by attackers that are not well known and remain active only for a short period of time to avoid detection. This design is mostly used for broadly distributed infections rather than targeted ones; however, modifications to the attack patterns used in drive-by downloads can make the attack targeted in nature. The context of malware infection stays the same, but the modus operandi varies. Table 3.1 shows the different types of spear phishing e-mails with attachments that have been used in the last few years to conduct targeted cyber attacks. The "Targeted E-Mail Theme" column shows the type of content used by attackers in the body of the e-mail. The themes span various spheres, including political, social, economic, and nuclear topics.

Table 3.1. An Overview of the Structure of E-mails Used in Targeted Attacks in Recent Years
Targeted E-Mail Theme | Date | Subject | Filename | CVE
Job / Socio-political ground | 07/25/2012 | Application; Japanese manufacturing; A Japanese document; Human rights activists in China | New Microsoft excel table.xls (password: 8861); q }(24.7.1).xls; 240727.xls; 8D823C0A3DADE8334B6C1974E2D6604F.xls; Seminiar.xls | 2012-0158
Socio-political ground | 03/12/2012–06/12/2012 | TWA's speech in the meeting of United States Commission for human rights; German chancellor again comments on Lhasa protects; Tibetan environmental situations for the past 10 years; Public Talk by the Dalai Lama_Conference du Dalai Lama Ottawa, Saturday, 28th April 2012; An Urgent Appeal Co-signed by Three Tibetans; Open Letter To President Hu | The Speech.doc; German Chancellor Again Comments on Lhasa Protects.doc; Tibetan environmental statistics.xls; Public Talk by the Dalai Lama.doc; Appeal to Tibetans To Cease Self-Immolation.doc; Letter.doc | 2010-0333
Socio-political ground | 01/06/2011 | Three big risks to China's economy in 2011 | Three big risks to China's economy in 2011.doc | 2010-3333
Socio-political ground | 01/24/2011 | Variety Liao taking – taking political atlas Liao | AT363777.7z; 44.doc | 2010-3970
Economic situation | 03/02/2012 | Iran's oil and nuclear situation | Iran's oil and nuclear situation.xls | 2012-0754
Nuclear operations | 03/17/2011 | Japan nuclear radiation leakage and vulnerability analysis | Nuclear Radiation Exposure and Vulnerability Matrix.xls | 2011-0609
Nuclear weapon program | 04/12/2011 | Japan's nuclear reactor secret: not for energy but nuclear weapons | Japan Nuclear Weapons Program.doc | 2011-0611
Anti-trust policy | 04/08/2011 | Disentangling Industrial Policy and Competition Policy in China | Disentangling Industrial Policy and Competition Policy in China.doc | 2011-0611
Organization meeting details | 06/20/2010 | Meeting agenda | Agenda.pdf | 2010-1297
Nuclear security summit and research posture | 04/01/2010 | Research paper on nuclear posture review 2010 and upcoming Nuclear security summit | Research paper on nuclear posture review 2010.pdf | 2010-0188
Military balance in Asia | 05/04/2010 | Asian-pacific security stuff if you are interested | Assessing the Asian balance.pdf | 2010-0188
Disaster relief | 05/09/2010 | ASEM cooperation relief on Capacity Building of disaster relief | Concept paper.pdf | 2010-0188
US-Taiwan relationship | 02/24/2009 | US-Taiwan exchange program enhancement | A_Chronology_of_Milestone_events.xls; US_Taiwan_Exchange_in-depth_Rev.pdf | 2009-0328
National defense law mobilization | 03/30/2010 | China and foreign military modernization | WebMemo.pdf | 2009-4324
Water contamination in Gulf | 07/06/2010 | EPA's water sampling report | Water_update_part1.pdf; Water_update_part2.pdf | 2010-1297
Rumours about currency reforms | 03/24/2010 | Rumours in N Korea March 2010 | Rumours in N Korea March 2010.pdf | 2010-0188
Chinese currency | 03/23/2010 | Talking points on Chinese currency | EAIBB No. 512.pdf | 2009-4324
Trade policy | 03/23/2010 | 2010 Trade Policy Agenda | The_full_Text_of_Trade_Policy_Agenda.pdf | 2010-0188
Chinese annual plenary session | 03/18/2010 | Report on NPC 2010 | NPC Report.pdf | 2009-4324
Unmanned aircraft systems | 01/03/2010 | 2009 DOD UAS ATC Procedures | DOD_UAS_Class_D_Procedures[signed].pdf | 2008-0655
Human rights | 02/26/2009 | FW: Wolf letter to secretary Clinton regarding China human rights | 2.23.09 Sec. of State Letter.pdf | 2009-0658
NBC interview | 09/08/2009 | Asking for an interview from NBC journalist | Interview Topics.doc | Unknown
Chinese defense | 01/28/2010 | Peer-Review: Assessing Chinese military transparency | Peer-Review - Assessing Chinese military transparency.pdf | 2009-4324
Asian terrorism report | 10/13/2009 | Terrorism in Asia | RL34149.pdf | Unknown
Country threats | 01/07/2010 | Top risks of 2010 | Unknown | Unknown
Counter terrorism | 05/06/2008 | RSIS commentary 54/2009 ending the LTTE | RSIS.zip | Unknown
Anti-piracy mission | 01/13/2010 | The China's navy budding overseas presence | Wm_2752.pdf | Unknown
National security | 01/20/2010 | Road Map for Asian-Pacific Security | Road-map for Asian-Pacific Security.pdf | 2009-4324
US president secrets | 11/23/2009 | The three undisclosed secret of president Obama Tour | ObamaandAsia.pdf | 2009-1862

The model of the waterholing attack discussed in the following section is a variant of the drive-by download attack.

Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780128006047000036

GRAPH THEORY
Leigh Metcalf, William Casey, in Cybersecurity and Applied Mathematics, 2016

5.1 AN INTRODUCTION TO GRAPH THEORY
A graph in mathematics consists of a set of vertices and a pairing of distinct vertices. This pairing creates an edge. In visualizing the graph, the vertices are points while the edges are lines connecting two points. The graph is generally written in the form G = (V,E), where V represents the set of vertices and E represents the set of edges. If we let v1 and v2 represent vertices, then an edge is written as the pair (v1,v2), and we say that v1 and v2 are connected by that edge.
Practically speaking, a graph is a way of modeling relationships, where the vertices can be considered a collection of entities. Entities can be people, systems, routers, DNS names, IP addresses, or malware samples. The edges are then the relationships between two entities. Suppose we have a collection of malicious domains and the IP addresses to which they resolve. A domain name has a relationship to an IP address if the domain name resolves to that IP address. In this case, we have an edge between the two, thus creating a graph that models the domain names and IP addresses of malware. Examining this graph can tell us more about the malware network. Does it use the same IP addresses over and over? Are they scattered and unrelated? Does one IP address serve most of the domains, or is there one domain that uses a multitude of IP addresses? Analyzing these graphs enables us to answer these questions and more about malicious domains and their IP addresses.
The point of using a graph is that we do not need to know much about the malicious software or the domains; we only need to consider the properties of the graph. We could also draw the graph, as we do for examples in this chapter. For a graph with 10 vertices and 20 edges, drawing it out lets us see the important vertices and potentially the interesting formations in it. However, drawing becomes increasingly uninformative and complex once the graph (i.e., the number of vertices) gets large; drawing a graph with a million edges is nearly impossible by hand. Using math and graph theory allows us to skip the drawing process and summarize the information modeled by the graph. Also, we do not need to know what our graph is modeling in order to find its properties; we just need the graph.
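As a toy illustration of this kind of modeling, the sketch below builds the domain-to-IP resolution graph as adjacency sets and uses vertex degree to ask one of the questions above (does one IP address serve most of the domains?). The domain names and IP addresses are invented placeholders; real data would come from passive DNS or resolver logs.

from collections import defaultdict

# Each edge (v1, v2) links a domain-name vertex to an IP-address vertex.
# These values are placeholders standing in for observed resolutions.
edges = [
    ("evil-a.invalid", "198.51.100.10"),
    ("evil-b.invalid", "198.51.100.10"),
    ("evil-c.invalid", "198.51.100.10"),
    ("evil-c.invalid", "203.0.113.5"),
]

domain_neighbors = defaultdict(set)  # domain -> IPs it resolves to
ip_neighbors = defaultdict(set)      # IP -> domains resolving to it
for domain, ip in edges:
    domain_neighbors[domain].add(ip)
    ip_neighbors[ip].add(domain)

# The degree of a vertex is the number of edges touching it; a high-degree
# IP vertex is shared infrastructure serving many domains.
busiest_ip = max(ip_neighbors, key=lambda ip: len(ip_neighbors[ip]))
print(busiest_ip, "serves", len(ip_neighbors[busiest_ip]), "of", len(domain_neighbors), "domains")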
This chapter will cover graphs, their properties, and modeling data with them.

Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780128044520000051

REPUTATION-BASED DETECTION
Chris Sanders, Jason Smith, in Applied Network Security Monitoring, 2014

MALWARE DOMAIN LIST
Regardless of the global concerns related to targeted attacks by sophisticated adversaries, the majority of an analyst's day will be spent investigating incidents related to malware infections on their systems. Because of this, it becomes pertinent to be able to detect malware at both the host and network level. One of the easiest ways to detect malware at the network level is to use public reputation lists that contain IP addresses and domain names known to be associated with malware-related communication.
Malware Domain List (MDL) is a non-commercial community project that maintains lists of malicious domains and IP addresses. The project is supported by an open community of volunteers and relies upon those volunteers both to populate the list and to vet it, ensuring that items are added and removed as necessary. MDL allows you to query its list on an individual basis or download the list in a variety of formats, including a CSV file, an RSS feed, and a hosts.txt-formatted list. It also provides lists that include only new daily entries, as well as lists of sites that were once on the list but have since been cleaned or taken offline.
MDL is one of the largest and most widely used reputation lists available. I've seen many organizations have a great deal of success detecting malware infections and botnet command and control (C2) by using MDL as an input for reputation-based detection. The vastness of MDL can sometimes result in false positives, so an alert generated from a friendly host visiting an entry found on MDL isn't enough by itself to automatically declare an incident. When one of these alerts is generated, you should investigate other data sources and a wider range of communication from the friendly host to determine whether there are other signs of an infection or compromise. You can learn more about MDL at http://www.malwaredomainlist.com.
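A sketch of how such a list can feed simple detection is below. It assumes you have already exported a reputation list to a local file with one domain per line ("mdl_domains.txt" is a placeholder name; MDL's actual CSV and hosts-file formats would need their own parsing), and, as the text notes, a hit should start an investigation rather than automatically declare an incident.

def load_reputation_list(path):
    """Read one domain per line, ignoring blank lines and '#' comments."""
    with open(path) as handle:
        return {line.strip().lower() for line in handle
                if line.strip() and not line.startswith("#")}

def reputation_hits(queried_domains, bad_domains):
    """Yield observed DNS queries whose domain appears on the list."""
    for name in queried_domains:
        if name.lower().rstrip(".") in bad_domains:
            yield name

if __name__ == "__main__":
    bad = load_reputation_list("mdl_domains.txt")         # placeholder path
    observed = ["intranet.example", "bad-host.invalid."]   # placeholder queries
    for hit in reputation_hits(observed, bad):
        print("reputation hit, investigate further:", hit)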
Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780124172081000088

DETECTION MECHANISMS, INDICATORS OF COMPROMISE, AND SIGNATURES
Chris Sanders, Jason Smith, in Applied Network Security Monitoring, 2014

VARIABLE INDICATORS
If the detection mechanisms used in your network were only configured to detect attacks where known indicators were used, then you would eventually miss something bad. At some point, we have to account for variable indicators, which are indicators whose values are not known. These are usually derived by creating a sequence of events through which an attack might occur (forming a behavioral indicator) and identifying where variables exist. Essentially, this examines a theoretical attack rather than one that has already occurred. This root-cause type of analysis is performed on specific attack techniques, rather than on instances of attacks executed by an individual adversary. I like to think of variable indicators as resembling a movie script, where you know what will happen, but not who will play each particular role. Also, just like a movie script, there is always the potential for improvisation with a skilled actor.
Variable indicators are not especially useful for deployment to signature-based detection mechanisms, but they find a great deal of use with solutions like Bro. We can see an example of developing variable indicators by revisiting the scenario we looked at in the last section. Instead of basing the attack scenario on an attack that has actually occurred, we will base it on a theoretical attack. Restated, the attack scenario would broadly play out as follows:
1. A user received an e-mail message with a malicious attachment.
2. The user opens the attachment, triggering the download of a file from a malicious domain.
3. The file was used to overwrite a system file with the malicious version of that file.
4. Code within the malicious file was executed, triggering an encrypted connection to a malicious server.
5. Once the connection was established, a large amount of data was exfiltrated from the system.
These steps represent behavioral indicators that contain multiple variable atomic and computed indicators. We can enumerate some of these indicators here:
• VB-1: A user received an e-mail message with a malicious attachment.
  • VA-1: E-Mail Address
  • VA-2: E-Mail Subject
  • VA-3: Malicious E-Mail Source Domain
  • VA-4: Malicious E-Mail Source IP Address
  • VA-5: Malicious Attachment File Name
  • VC-1: Malicious Attachment MD5 Hash
• VB-2: The user opens the attachment, triggering the download of a file from a malicious domain.
  • VA-6: Malicious Redirection Domain/IP
  • VA-7: Malicious Downloaded File Name
  • VC-2: Malicious Downloaded File MD5 Hash
• VB-3: The file was used to overwrite a system file with the malicious version of that file.
• VB-4: Code within the malicious file was executed, triggering an encrypted connection to a malicious server on a non-standard port.
  • VA-8: External C2 IP Address
  • VA-9: External C2 Port
  • VA-10: External C2 Protocol
• VB-5: Once the connection was established, a large amount of data was exfiltrated from the system.
In this example, the V in the indicator names denotes a variable component of the indicator. As we've laid it out, there are potentially ten variable atomic indicators, two variable computed indicators, and five variable behavioral indicators. Now we can hypothesize methods by which these indicators can be built into signatures and paired with detection mechanisms. Variable indicators will commonly be reused and combined in order to derive detection for broad attack scenarios:
• VB-1 (VA-3/VA-4), VB-2 (VA-6), VB-4 (VA-8), VB-5 (VA-8): Snort/Suricata rule to detect communication with known bad reputation IP addresses and domains
• VB-1 (VA-5/VC-1), VB-2 (VA-7/VC-2): Bro script to pull files off the wire and compare their names and MD5 hashes with a list of known bad reputation file names and MD5 hashes
• VB-1 (VA-5/VC-1), VB-2 (VA-7/VC-2): Bro script to pull files off the wire and place them into a sandbox that performs rudimentary malware analysis
• VB-2 (VA-6/VA-7/VC-2): HIDS signature to detect the browser being launched from a document
• VB-3: HIDS signature to detect a system file being overwritten
• VB-4 (VA-9/VA-10), VB-5: Bro script to detect encrypted traffic occurring on a non-standard port
• VB-4 (VA-9/VA-10), VB-5: Snort/Suricata rule to detect encrypted traffic occurring on a non-standard port
• VB-5: Custom script that uses session data statistics to detect large volumes of outbound traffic from workstations
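One way to make this scenario machine-usable is to hold each behavioral indicator as a small record whose variable atomic and computed indicators are named slots, filled in as an investigation progresses. The sketch below is not from the book; the slot names mirror the VB/VA/VC labels above, and the filled-in values are invented placeholders.

from dataclasses import dataclass, field

@dataclass
class BehavioralIndicator:
    """A behavioral step with named variable slots (atomic/computed indicators)."""
    name: str
    description: str
    variables: dict = field(default_factory=dict)  # slot -> value (None until observed)

scenario = [
    BehavioralIndicator("VB-1", "User received e-mail with malicious attachment",
                        {"VA-3": None, "VA-5": None, "VC-1": None}),
    BehavioralIndicator("VB-2", "Attachment triggers download from malicious domain",
                        {"VA-6": None, "VA-7": None, "VC-2": None}),
    BehavioralIndicator("VB-4", "Encrypted connection to C2 on a non-standard port",
                        {"VA-8": None, "VA-9": None}),
]

# Filling slots turns variable indicators into concrete, deployable ones.
scenario[2].variables.update({"VA-8": "203.0.113.7", "VA-9": 8443})  # placeholder values

for step in scenario:
    unresolved = [k for k, v in step.variables.items() if v is None]
    print(f"{step.name}: {step.description}; unresolved slots: {unresolved or 'none'}")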
SOC analysts commonly monitor information security news sources such as conference proceedings and the blogs and Twitter feeds of industry experts. This allows the SOC to stay abreast of new and emerging attack techniques so that the organization's defensive posture can be modeled around them. When this happens, it becomes incredibly useful to break the attack down into variable indicators. When platform-specific signatures are provided, those can be reverse engineered into individual indicators so that they can be used in conjunction with the detection mechanisms in place on your network. These are incredibly useful exercises for NSM analysts: they help the analyst better understand how attacks work and how detection mechanisms can be used to detect the different phases of an attack effectively. The components of variable indicators can be used for all varieties of detection, and they are most useful in determining how to detect things with unknown entities.

Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780124172081000076

DOMAIN NAME SYSTEM SECURITY AND PRIVACY: A CONTEMPORARY SURVEY
Aminollah Khormali, ... David Mohaisen, in Computer Networks, 2021

4.2.2 ASSOCIATION ANALYSIS
Strong associations of domains with known malicious domains can be utilized to detect malicious domain names. For example, Khalil et al. [55] have designed an association-based scheme for the detection of malicious domains with high accuracy and coverage. Furthermore, Gao et al. [52] have utilized temporal correlation analysis of DNS queries to identify a wide range of correlated malicious domain groups, e.g., phishing, spam, and DGA-generated domains, based on related known malicious anchor domains. Yadav et al. [60] have utilized statistical measures such as the Kullback–Leibler divergence, the Jaccard index, and the Levenshtein distance for domain-flux botnet detection. Gomez et al. [139] have studied the application of visualization for understanding DNS-based network threat analysis.
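A heavily simplified, toy version of this "guilt by association" idea is sketched below: domains that share resolution infrastructure with known malicious anchor domains are flagged for review. This is not the algorithm from any of the cited papers; the resolution pairs and the anchor set are invented placeholders, and real schemes weigh the evidence far more carefully.

from collections import defaultdict

# Passive-DNS-style (domain, ip) observations; all values are placeholders.
resolutions = [
    ("anchor-bad.invalid", "198.51.100.10"),
    ("unknown-a.invalid", "198.51.100.10"),
    ("unknown-b.invalid", "203.0.113.25"),
    ("benign-site.invalid", "192.0.2.1"),
]
known_malicious = {"anchor-bad.invalid"}

ip_to_domains = defaultdict(set)
for domain, ip in resolutions:
    ip_to_domains[ip].add(domain)

# Flag every domain that shares at least one resolution IP with an anchor domain.
suspects = set()
for domains in ip_to_domains.values():
    if domains & known_malicious:
        suspects |= domains - known_malicious

print("domains associated with known malicious infrastructure:", suspects)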
Discussion and Open Directions. Despite the numerous advantages of machine learning approaches, there are still risks and limitations in using them in operation. The foremost challenge is the acquisition and labeling of relevant data from representative vantage points to maximize insight. Even if the data is collected correctly, capturing DNS traffic results in a very large amount of data to analyze, which is expensive in terms of computation and storage. In addition, the performance of machine learning algorithms is contingent upon their structure and learning algorithms; selecting an improper structure or learning algorithm might produce poor results, so it is necessary to try different algorithms for each problem. Furthermore, the training phase of the algorithm can be time-consuming even when the dataset is small, requiring training heuristics. Based on our exploration of the literature, we believe there is a significant need for further automation with machine learning, not only for labeling, but also for the appropriate representation of features fed into machine learning algorithms through abstract structures (e.g., dependency representations such as graphs) and deep neural networks (e.g., serving as high-quality data extractors).

Table 12. List of common query types and their descriptions.
Query type | Definition
A | IPv4 address
AAAA | IPv6 address
MX | Mail exchanger record
NS | Authoritative name server
TXT | Arbitrary text strings
PTR | Pointer (IP address/hostname)
SRV | Service (service/hostname)
SOA | Start of Authority
CNAME | Canonical Name (alias/canonical)
DS | Delegation Signer
DNSKEY | DNSSEC public key
NSEC | Next Secure (no record/two points)

Table 13. Summary of the machine learning methods used in the literature. Here AR is Accuracy Rate and FPR is False Positive Rate.
Work | Application | AR | FPR
[132] | Malicious network | 97.1% | 1.6%
[23] | Cache poisoning | 88.0% | 1.0%
[55] | Malicious domain | 99.0% | 1.0%
[18] | Cache poisoning | 91.9% | 0.6%
[77] | Parked domains | 98.7% | 0.5%
[159] | Phishing | 95.5% | 3.5%

Table 14. Topical classification of the DNS research methods addressed in the literature, with sample work.

Read full article: https://www.sciencedirect.com/science/article/pii/S1389128620313001

A RECENT REVIEW OF CONVENTIONAL VS. AUTOMATED CYBERSECURITY ANTI-PHISHING TECHNIQUES
Issa Qabajeh, ... Francisco Chiclana, in Computer Science Review, 2018

4.1 DATABASES (BLACKLIST AND WHITELIST)
A database-driven approach to fighting phishing, called a blacklist, was developed by several research projects [2,50,51]. This approach is based on using a predefined list containing domain names or URLs of websites that have been recognised as harmful. A blacklisted website may lose up to 95% of its usual traffic, which hinders the website's revenue capacity and eventually its profit [23]. This is the primary reason that webmasters and web administrators give great attention to the problem of blacklisting. According to Mohammad et al. [11,12], there are two types of blacklists in computer security:
• Domain/URL based: real-time URL lists that contain malicious domain names and are normally used to look for spam URLs within the body of emails.
• Internet Protocol based: real-time domain server blacklists that contain IP addresses whose status changes in real time. Mailbox providers such as Yahoo often check domain server blacklists to evaluate whether the sending server (source) is run by someone who allows other users to send from their own source.
Users, businesses, or computer software enterprises can create blacklists. Whenever a website is about to be browsed, the browser checks its URL against the blacklist. If the URL exists in the blacklist, a certain action is taken to warn the user of the possibility of a security breach. Otherwise, no action is taken, as the website's URL is not recognised as harmful.
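The browser-side check described above amounts to a set lookup on the URL's host and its parent domains. Below is a minimal sketch of that lookup, with an invented two-entry blacklist standing in for the real feeds named in the next paragraph.

from urllib.parse import urlparse

# A tiny illustrative blacklist; real deployments load thousands of entries.
blacklist = {"phish-login.invalid", "malware-drop.invalid"}

def is_blacklisted(url, blacklist):
    """True if the URL's host, or any parent domain of it, is on the blacklist."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Check the host and every parent suffix, e.g. a.b.tld -> b.tld -> tld.
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))

print(is_blacklisted("http://secure.phish-login.invalid/verify", blacklist))  # True
print(is_blacklisted("https://www.example.com/", blacklist))                  # False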
Currently, there are a few hundred publicly available blacklists, among which we can mention the ATLAS blacklist from Arbor Networks, BLADE Malicious URL Analysis, the DGA list, the CYMRU Bogon list, the Scumware.org list, the OpenPhish list, the Google blacklist, and the Microsoft blacklist [52]. Since any user or any small to large organisation can create a blacklist, the currently available public blacklists have different levels of security effectiveness, particularly with respect to two factors:
1. How often the blacklist gets updated and its consistent availability.
2. The quality of its results with respect to accurate phishing detection rate.
Marketers, users, and businesses tend to use the Google and Microsoft blacklists over other publicly available blacklists because of their lower false positive rates. A study by [2] analysing blacklists concluded that they contain on average 47% to 83% of phishing websites. Blacklists are often stored on servers, but can also be held locally on a user's machine [25]. Thus, the process of checking whether a URL is part of the blacklist is executed whenever a website is about to be visited by the user, in which case the server or local machine uses a particular search method to perform the check and derive an action.
The blacklist usually gets updated periodically. For example, the Microsoft blacklist is normally updated every nine hours to six days, whereas the Google blacklist gets updated every twenty hours to twelve days [11,12]. Hence, the time window needed to amend the blacklist by including new malicious URLs, or excluding possible false positive URLs, may allow phishers to launch and complete their phishing attacks. In other words, phishers have significant time to initiate a phishing attack before their websites get blocked. This is an obvious limitation of using the blacklist approach to track false websites [18]. Another study, by the APWG, revealed that over 75% of phishing domains also genuinely serve legitimate websites; blocking them means that several trustworthy websites are added to the blacklist, which causes a drastic reduction in the website's revenue and hinders its reputation [9].
After the creation of blacklists, many automated anti-phishing tools, normally used by software companies such as McAfee, Google, and Microsoft, were proposed. For instance, the Anti-Phishing Explorer 9, McAfee Site Advisor, and Google Safe Base are three common anti-phishing tools based on the blacklist approach. Moreover, companies such as VeriSign developed anti-phishing internet crawlers that gather massive numbers of websites to identify clones, in order to assist in differentiating between legitimate and phishing websites.
There have been some attempts to look into creating whitelists, i.e., legitimate URL databases, in contrast to blacklists [53]. Unfortunately, since the majority of newly created websites are initially identified as "suspicious", this places a burden on the whitelist approach. To overcome this issue, the websites expected to be visited by the user should exist in the whitelist. This is often problematic in practice because of the large number of possible websites that a user might browse. The whitelist approach is simply impractical, since the websites users are "known" in advance to browse may differ from those actually visited during browsing. Human decision-making is a dynamic process, and users often change their minds and start browsing new websites that they initially never intended to visit.
One of the earliest whitelists was proposed by Chen and Guo [53] and was based on users browsing trusted websites. The method monitors the user's login attempts, and if a repeated login is successfully executed, it prompts the user to add that website to the whitelist. One clear limitation of Chen and Guo's method is that it assumes that users are dealing with trustworthy websites, which unfortunately is not always the case. PhishZoo is another whitelist technique, developed by Afroz and Greenstadt [5]. It constructs a website profile using a fuzzy hashing approach in which the website is represented by several criteria that differentiate one website from another, including images, HTML source code, URL, and SSL certificate. PhishZoo works as follows:
1. When the user browses a new website, PhishZoo makes a specific profile for that website.
2. The new website's profile is compared with the existing profiles in the PhishZoo whitelist:
  • If a full match is found, the newly browsed website is marked as trustworthy.
  • If it partly matches, the website will not be added, since it is suspicious.
  • If no match is found but the SSL certificate matches, PhishZoo instantly amends the existing profile in the whitelist.
  • If no match is found at all, a new profile is created for the website in the whitelist.
Recently, Lee et al. [31] investigated the personal security image whitelist approach and its impact on the security of internet banking users. The authors recruited 482 users to conduct a pilot study on a simulated bank website. The results revealed that over 70% of the users in the simulated experiments gave up their login credentials even though their personal security image test was not performed. The results also revealed that novice users do not pay much attention to the use of personal images in e-banking, which can be seen as a possible shortcoming of this anti-phishing approach.

Read full article: https://www.sciencedirect.com/science/article/pii/S1574013717302010

ISSUES AND CHALLENGES IN DNS BASED BOTNET DETECTION: A SURVEY
Manmeet Singh, ... Sanmeet Kaur, in Computers & Security, 2019

4.5.1 STATE OF THE ART
Antonakakis and Perdisci (2012) presented a technique for DGA-based botnet detection named Pleiades. Pleiades inspects DNS queries that result in Non-Existent Domain (NXDomain) responses. The system consists of two main components, the DGA Discovery component and the DGA Classification and C&C Detection component, as shown in Fig. 18. In DGA Discovery, all NXDomains are clustered based on statistical similarity (e.g., length, character frequency). The idea is to discover domain clusters that belong to the same DGA-based botnet. In DGA Classification and C&C Detection, two models are used: a statistical multi-class classifier labels each cluster (e.g., DGA-Conficker-A), and a Hidden Markov Model detects candidate C&C domains by finding the single queried domain for a given host in the cluster. One of the prime limitations of this technique is its treatment of the domain as a single character sequence. The study also noted its limitation in providing an exact count of infected hosts.

Fig. 18. Overview of Pleiades.

Zhou et al. (2013) presented a system for DGA-based botnet detection using DNS traffic analysis. The system consists of two modules. In the pre-handle module, a whitelist consisting of the top 10k Alexa domains is applied to the captured traffic to significantly reduce benign traffic.
In the DGA detection module, the remaining domains are clustered based on similar live time spans and similar visit patterns. The main idea behind the detection system is that the visit patterns and live time spans of domains generated by a DGA differ from those of normal domains, which have long live time spans and dissimilar visit patterns. Unconfirmed domains need more time for further investigation, and as such the system is not effective for real-time detection.
Bilge et al. (2014) presented a system for spotting malicious domains named EXPOSURE. Four sets of features, namely time-based, DNS answer-based, TTL value-based, and domain name-based features, are collected as part of the feature attribution phase. A change-point detection algorithm and a similar daily detection algorithm are used to classify a domain as malicious or benign, and the J48 decision tree algorithm was used in the training phase of the classifier. The detection scheme reported a very high detection rate of 99.5%, with an area under the curve (AUC) of 0.999 and a low false positive rate of 0.3%. EXPOSURE can be evaded by setting a larger TTL value or by decreasing the number of DNS queries.
Bottazzi and Italiano (2015) presented a data mining approach for the detection of algorithmically generated domains. Proxy logs from a large Italian organization consisting of 60,000 workstations and 100,000 users were mined for one month. The approach consists of extracting the second-level domain (SLD) from the logs and constructing a knowledge base. For each day, a list of SLDs is identified consisting of never-seen domains and non-RFC-1035-compliant domains, and lexical analysis is done on the SLDs to count the vowels, consonants, and digits. Finally, clustering is done on SLD length and digit count. Results show that 5 of the top 8 SLD clusters indicate that the domains are algorithmically generated. Pronounceable domains generated using a DGA, however, could not be detected using this technique.
Sharifnya and Abadi (2015) presented a botnet detection technique based on distinguishing algorithmically or randomly generated domain names from legitimate ones. To detect botnets, a negative reputation system is used. The proposed model differs from other models in that it associates a negative score with each host in the network; this score is then used to classify bots with J48 decision trees. An initial filtering based on a whitelist is used to separate trusted domains. Domain group analysis is done to find hosts that query the same domains, and domain labels are then analyzed with a correlation coefficient to determine whether they are algorithmically generated. A score between 0 and 1 is finally calculated, indicating whether the host is involved in bot activities. Unlike other techniques, it considers the history of malicious domain activity for each host in the network.
Nguyen et al. (2015) developed a method for detecting botnets employing Domain Generation Algorithms using collaborative filtering and density-based clustering. Whitelist filtering is applied first, followed by K-means clustering with K = 2 using 2-gram frequency values as input. Finally, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is used to group bots that correspond to the same botnet. One of the prime limitations of the technique is its inability to detect peer-to-peer botnets.
Erquiaga et al. (2016) presented a technique based on behavioral analysis to detect DGA malware traffic. In the technique, a flow is represented as a four-tuple: source and destination Internet Protocol address, port, and protocol. Flows are aggregated to form connections, and a behavioral model is then applied to each connection using well-defined steps. The detection algorithm consists of three phases. In the first, a Markov chain model is applied to each connection, producing a transition matrix and an initialization vector. In the second phase, the traffic to be evaluated is converted into a series of letters. In the final stage, the resulting string is evaluated against all detection models, and an alert is generated if the probability under a behavioral model exceeds a certain threshold. The paper distinctly labels DNS traffic into five groups: normal traffic, non-DGA traffic (used for spam), DGA type-1, DGA type-2, and fast flux. The study concluded that more realistic results can be obtained if the dataset consists of both normal and botnet traffic.
Tong and Nguyen (2016) presented a technique for DGA-based botnet detection using semantic and cluster analysis. The proposed system comprises three stages: domain filtering, DGA filtering, and DGA clustering. In domain filtering, the top 1 million Alexa domains are used as a whitelist, and semantic features such as N-gram score, N-gram frequency, entropy, and meaningful character ratio are used to filter out domains. In DGA filtering, the correlation matrix and Mahalanobis distance of the domains filtered in the first phase are calculated to filter out benign domains. In DGA clustering, K-means is used to cluster the domains filtered in the second stage into groups such that domains generated by the same domain generation algorithm fall in the same group. The technique considers only linguistic features and as such is bound to be less accurate; the study concluded that improvement can be made by using IP- and domain-based features.
Kwon et al. (2016) presented a scalable system named PsyBoG for botnet detection in large volumes of DNS traffic. Due to increased Internet penetration, the volume of network traffic keeps growing, making real-time analysis difficult; as such, there is a need for a scalable system for botnet detection, and PsyBoG was designed with this increasing volume in mind. Periodic behavior patterns along with simultaneous behavior patterns are evaluated using DNS traffic to find botnet groups. Power spectral density, a signal processing technique, is used to find periodic behavior patterns, and power distance is used to find simultaneous behavior patterns. The system is designed for high accuracy, robustness, and applicability in addition to high scalability. PsyBoG can be evaded by using randomized and slow query patterns.
Wang et al. (2017) developed a technique for the detection of DGA-based botnets named DBod. The technique works on the analysis of failed DNS requests and clusters infected and clean machines in a network. It was evaluated in an academic environment for 26 months and showed effective detection results. A score function was introduced that assigns a unique score to each cluster, classifying it as malicious or benign. DBod is unable to detect dormant bots. Increasing the time epoch leads to a greater possibility of detection but also to higher computation; the study recommends 1 h as an acceptable epoch for the proposed technique.
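Many of the surveyed techniques start from simple lexical statistics over the domain label. The sketch below computes three generic features of that kind (length, Shannon entropy, digit ratio); it is only in the spirit of the semantic features mentioned above, not the exact feature set or thresholds of any cited paper, and the sample names are invented.

import math
from collections import Counter

def lexical_features(label):
    """Length, Shannon entropy (bits/char), and digit ratio of a domain label."""
    counts = Counter(label)
    n = len(label)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    digit_ratio = sum(ch.isdigit() for ch in label) / n
    return {"length": n, "entropy": round(entropy, 3), "digit_ratio": round(digit_ratio, 3)}

# Invented examples: a common word, a DGA-looking string, and a typosquat.
for name in ["google", "x3k9q1vz0pl2m8", "paypa1-login"]:
    print(name, lexical_features(name))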
Table 8 presents a listing of the key components of the DGA-based botnet detection techniques discussed in our review. Validation results of various DGA-based botnet detection techniques are presented in Fig. 19.

Table 8. DGA-based detection techniques.
Reference | Technique | Whitelist/Blacklist Criteria | Targeted Botnet | Dataset | Pros and Cons
Antonakakis and Perdisci (2012) | Alternating decision tree learning algorithm and Hidden Markov Model | – | Zeus V3, BankPatch, Bobax, Conficker, Murofet & Sinowal | 15 months of DNS traffic from a large North American ISP with 2 M hosts per day | High false positives and false negatives; dependent on NXDomain traffic
Zhou et al. (2013) | Domain set active time distribution | Top 10 K websites from Alexa | Conficker, Kraken, Torpig, Srizbi & Bobax | Pilot DNS server in China with 150 K packets per second | Delay in detecting domains not present in blacklists
Bilge et al. (2014) | J48 & genetic algorithms | Zeus blacklist, Anubis, Wepawet, Phishtank & Alexa top 100 Global | Conficker, Kraken, Bobax & Srizbi bits | 2.5 months of traffic consisting of 100 billion queries | High detection rate; real-time deployment
Bottazzi and Italiano (2015) | Knowledge-base construction and clustering on length and number of hits | – | – | One month's traffic of a corporate network having 60 K hosts | Easy evasion by varying the length of the domain
Sharifnya and Abadi (2015) | Suspicious group analysis detector, Kullback-Leibler divergence, Spearman's rank correlation coefficient | Whitelist consisting of Alexa top 100; blacklist from Murofet | Conficker.C | Dataset of benign and malicious DNS queries | Low false alarm rate; considers the history of hosts' suspicious DNS activities
Nguyen et al. (2015) | Collaborative filtering, density-based clustering & cosine similarity | Alexa 1 million domain whitelist | Necurs, Vawtrak & Palevo | Two weeks of DNS traffic logs of about 18,000 systems | Cannot detect P2P botnets
Erquiaga et al. (2016) | Markov chain detection algorithm | – | DGA malware | Dataset from the Malware Capture Facility Project (Annon, 2019f) | High rate of true negative values
Tong and Nguyen (2016) | Modified Mahalanobis distance and K-means | – | Conficker, Tinba, Bebloh, Tovar-Goz & Kraken | Top 1 million websites from Alexa (Alexa, 2017) and the DNS-BH Malware Domain Blocklist | Dependent on botnet family
Kwon et al. (2016) | Signal processing technique focusing on simultaneous and periodic behavior patterns | Less than 13 queries in one hour | – | 20 DNS traffic traces from malware dumps and real-world DNS servers | Scalable
Wang et al. (2017) | Numerical analysis of request time and request count distribution and the Chinese Whispers algorithm | Spamhaus, BRBL, SpamCop, and AHBL | Kraken, Conficker, Cycbot & Murofet | Traffic from 10,000 users of the Education Network of Tainan City (May 2013 to June 2015) | Low false alarm rate and online detection

Fig. 19. Validation results of DGA-based botnet detection techniques.

Read full article: https://www.sciencedirect.com/science/article/pii/S0167404819301117