Submitted URL: https://www.bsk-consulting.de/2015/02/16/write-simple-sound-yara-rules/
Effective URL: https://www.nextron-systems.com/2015/02/16/write-simple-sound-yara-rules/

HOW TO WRITE SIMPLE BUT SOUND YARA RULES

Feb 16, 2015 | LOKI, THOR, Tool, Tutorial, YARA

Over the last two years I have written approximately 2000 Yara rules based on samples found during our incident response investigations. Many security professionals have noticed that Yara provides an easy and effective way to write custom rules based on strings or byte sequences found in their samples. As end users, they can use it to create their own detection tools.
However, it saddens me that the rules published by researchers mainly fall into two types:

 1. rules that generate many false positives and
 2. rules that match only the specific sample and are not much better than a
    hash value.

I therefore decided to write an article on how to build optimal Yara rules. These rules can be used to scan both single samples uploaded to a sandbox and whole file systems, with a minimal chance of false positives.
The rules are based on the strings contained in a sample and are easy to comprehend. You do not need to know how to reverse engineer executables. I also decided to avoid the new Yara modules like “pe”, which I still consider “testing” features that may lead to memory leaks or other errors when used in practice.


AUTOMATIC RULE GENERATION

At first I believed that automatically generated rules could never be as good as manually created ones. During my work on our IOC scanners THOR and LOKI, I had to create hundreds of Yara rules manually, and an obvious disadvantage of this approach became clear. What I used to do was extract the UNICODE and ASCII strings from my samples with the following commands:

strings -el sample.exe
strings -a sample.exe
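If GNU strings is not at hand, the two commands can be approximated in a few lines of Python. This is a minimal sketch, not a reimplementation. It assumes the GNU default minimum length of 4 and treats tab plus bytes 0x20-0x7e as printable. The function names `ascii_strings` and `wide_strings` are made up for this example.

```python
import re

# Sketch approximating GNU `strings -a` (ASCII) and `strings -el`
# (16-bit little-endian). Assumption: minimum length 4 (the GNU default),
# "printable" = tab or 0x20-0x7e, which is close to but not identical to GNU.
MIN_LEN = 4
ASCII_RUN = re.compile(rb"[\t\x20-\x7e]{%d,}" % MIN_LEN)
WIDE_RUN = re.compile(rb"(?:[\t\x20-\x7e]\x00){%d,}" % MIN_LEN)


def ascii_strings(data: bytes) -> list[str]:
    """Printable ASCII runs, roughly what `strings -a sample.exe` prints."""
    return [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]


def wide_strings(data: bytes) -> list[str]:
    """UTF-16LE runs of ASCII characters, roughly `strings -el sample.exe`."""
    return [m.group().decode("utf-16-le") for m in WIDE_RUN.finditer(data)]
```

`wide_strings(open("sample.exe", "rb").read())` returns roughly the list that `strings -el sample.exe` prints. Like `strings`, it can glue a stray character onto the front of a wide string when the byte pair before it happens to look like a UTF-16 character.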


I prefer the UNICODE strings, as they are often overlooked and less frequently changed within a malware or tool family. Make sure that your rules use the “wide” keyword for UNICODE strings and the “ascii” keyword for ASCII strings. Use “fullword” if there is a word boundary before and after the string. The problem with this method is that you cannot tell whether a string returned by these commands is unique to this malware or is also common in goodware samples.
Look at the extracted strings in the following example:


NTLMSSP
%d.%d.%d.%d
%s\IPC$
\\%s
NT LM 0.12
%s%s%s
%s.exe %s
%s\Admin$\%s.exe
RtlUpcaseUnicodeStringToOemString
LoadLibrary( NTDLL.DLL ) Error:%d


Can you be sure that the string “NT LM 0.12” is unique and not used by legitimate software?
To answer this question for me, I developed “yarGen”, a Yara rule generator that ships with a huge string database of common and benign software. To generate the database, I used:

 * the Windows system folder files of Windows 2003, Windows 7 and Windows 2008 R2 Server
 * typical software like Microsoft Office, 7zip, Firefox, Chrome and Cygwin
 * the program folders of various antivirus solutions

yarGen allows you to generate your own database or add folders with more goodware to the existing database.
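A goodware string database of this kind can be sketched as a counter of how many files contain each string. The sketch below:

* walks a directory tree;
* extracts ASCII and UTF-16LE strings of at least five characters;
* adds them to a `collections.Counter`.

Passing an existing counter mirrors adding more goodware folders to an existing database. The minimum length and the function names are assumptions, and this is not the format yarGen actually stores on disk.

```python
import os
import re
from collections import Counter
from typing import Optional

# Assumption: minimum length of 5 characters, printable = 0x20-0x7e.
ASCII_RUN = re.compile(rb"[\x20-\x7e]{5,}")
WIDE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){5,}")


def file_strings(data: bytes) -> set[str]:
    """Distinct ASCII and UTF-16LE strings found in one file."""
    found = {m.group().decode("ascii") for m in ASCII_RUN.finditer(data)}
    found |= {m.group().decode("utf-16-le") for m in WIDE_RUN.finditer(data)}
    return found


def build_goodware_db(root: str, db: Optional[Counter] = None) -> Counter:
    """Count in how many goodware files each string occurs.

    Passing an existing Counter adds the new folder to that database.
    """
    db = Counter() if db is None else db
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    db.update(file_strings(f.read()))
            except OSError:
                continue  # locked or unreadable files are skipped
    return db
```

Counting files rather than occurrences keeps a string that appears a hundred times in one DLL from looking more "common" than it really is.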
yarGen extracts all ASCII and UNICODE strings from a sample and removes all strings that also appear in the goodware string database. It then scores every remaining string using fuzzy regular expressions and the “Gibberish Detector”. The detector lets yarGen prefer real language over meaningless character chains. The top 20 strings are included in the resulting rule.
Let’s look at two examples from my work: a sample of the Enfal Trojan and an SMB worm sample.
yarGen generates the following rule for the Enfal Trojan samples:

rule Enfal_Generic {
    meta:
        description = "Auto-generated rule - from 3 different files"
        author = "YarGen Rule Generator"
        reference = "not set"
        date = "2015/02/15"
        super_rule = 1
        hash0 = "6d484daba3927fc0744b1bbd7981a56ebef95790"
        hash1 = "d4071272cc1bf944e3867db299b3f5dce126f82b"
        hash2 = "6c7c8b804cc76e2c208c6e3b6453cb134d01fa41"
    strings:
        $s0 = "urlmon" fullword
        $s1 = "Registered trademarks and service marks are the property of their respec" wide
        $s2 = "Micorsoft Corportation" fullword wide
        $s3 = "IM Monnitor Service" fullword wide
        $s4 = "imemonsvc.dll" fullword wide
        $s5 = "iphlpsvc.tmp" fullword
        $s6 = "XpsUnregisterServer" fullword
        $s7 = "XpsRegisterServer" fullword
        $s8 = "{53A4988C-F91F-4054-9076-220AC5EC03F3}" fullword
        $s9 = "tEHt;HuD" fullword
        $s10 = "6.0.4.1624" fullword wide
        $s11 = "#*8;->)" fullword
        $s12 = "%/>#?#*8" fullword
        $s13 = "\\%04x%04x\\" fullword
        $s14 = "3,8,18" fullword
        $s15 = "3,4,15" fullword
        $s16 = "3,7,12" fullword
        $s17 = "3,4,13" fullword
        $s18 = "3,8,12" fullword
        $s19 = "3,8,15" fullword
        $s20 = "3,6,12" fullword
    condition:
        all of them
}


The resulting string set contains many useful strings. It also contains random-looking ASCII character sequences ($s9, $s11, $s12) that match the given samples but are unlikely to match other samples of the same family.
yarGen generates the following rule for the SMB Worm sample:

rule sig_smb {
    meta:
        description = "Auto-generated rule - file smb.exe"
        author = "YarGen Rule Generator"
        reference = "not set"
        date = "2015/02/15"
        hash = "db6cae5734e433b195d8fc3252cbe58469e42bf3"
    strings:
        $s0 = "LoadLibrary( NTDLL.DLL ) Error:%d" fullword ascii
        $s1 = "SetServiceStatus failed, error code = %d" fullword ascii
        $s2 = "%s\\Admin$\\%s.exe" fullword ascii
        $s3 = "%s.exe %s" fullword ascii
        $s4 = "iloveyou" fullword ascii
        $s5 = "Microsoft@ Windows@ Operating System" fullword wide
        $s6 = "\\svchost.exe" fullword ascii
        $s7 = "secret" fullword ascii
        $s8 = "SVCH0ST.EXE" fullword wide
        $s9 = "msvcrt.bat" fullword ascii
        $s10 = "Hello123" fullword ascii
        $s11 = "princess" fullword ascii
        $s12 = "Password123" fullword ascii
        $s13 = "Password1" fullword ascii
        $s14 = "config.dat" fullword ascii
        $s15 = "sunshine" fullword ascii
        $s16 = "password <=14" fullword ascii
        $s17 = "del /a %1" fullword ascii
        $s18 = "del /a %0" fullword ascii
        $s19 = "result.dat" fullword ascii
        $s20 = "training" fullword ascii
    condition:
        all of them
}


The resulting rules are good enough to be used as they are, but they are far from optimal. Their condition “all of them” requires every single string to be present. Still, it is good that so many strings were found that do not appear in the analyzed goodware samples.
If you don’t want to use or download yarGen, you could also use the online tool
Yara Rule Generator provided by Joe Security, which was inspired by/based on
yarGen.
You do not need a generator if you have a trained and experienced eye. In that case, just read the next section and select strings that meet the requirements of what I call sufficiently generic Yara rules.


SUFFICIENTLY GENERIC YARA RULES

As I said in the introduction, rules that generate false positives are pretty annoying. However, the real tragedy is that most rules are far too specific to match more than one sample. They are therefore not much more useful than a file hash.
What I tend to do is check all the strings and sort them into at least two categories:

 * Very specific strings = hard indicators for a malicious sample
 * Rare strings = likely that they do not appear in goodware samples, but
   possible
 * Strings that look common = (Optional) e.g. yarGen output strings that do not
   seem to be specific but didn’t appear in the goodware string database

Look at the modified rules to understand this split. Ignore the definition named $mz for now (I’ll explain it later) and focus on the string definitions below.
The definitions starting with $s contain the very specific strings, which I consider so special that they would not appear in legitimate software. Note the typos in both strings: “Micorsoft Corportation” instead of “Microsoft Corporation” and “Monnitor” instead of “Monitor”.
The strings starting with $x seem to be special (I tend to google such strings), but I cannot say whether they also appear in legitimate software. The definitions starting with $z look ordinary. However, they were not part of the goodware string database, so they must be special in some way.
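The prefixes make the tiers usable in a condition, because YARA wildcards such as ($s*) address every string whose identifier starts with $s. If you produce rules from a script, a small helper can emit the strings section tier by tier. The following is a hypothetical Python helper, not part of yarGen. It escapes backslashes and double quotes, the two characters that must be escaped in a YARA text string.

```python
def yara_escape(text: str) -> str:
    """Escape backslashes first, then double quotes, for a YARA text string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def strings_section(tiers: dict) -> str:
    """Emit a YARA strings: section from {prefix: [(text, modifiers), ...]}.

    Hypothetical convention: "s" = very specific, "x" = rare,
    "z" = common-looking but absent from the goodware database.
    """
    lines = ["    strings:"]
    for prefix, entries in tiers.items():
        for number, (text, modifiers) in enumerate(entries, start=1):
            # rstrip() drops the trailing space when there are no modifiers
            lines.append(
                f'        ${prefix}{number} = "{yara_escape(text)}" {modifiers}'.rstrip()
            )
    return "\n".join(lines)
```

For example, `strings_section({"s": [("Micorsoft Corportation", "fullword wide")]})` produces a `$s1` line identical to the one in the rule below.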

rule Enfal_Malware_Backdoor {
    meta:
        description = "Generic Rule to detect the Enfal Malware"
        author = "Florian Roth"
        date = "2015/02/10"
        super_rule = 1
        hash0 = "6d484daba3927fc0744b1bbd7981a56ebef95790"
        hash1 = "d4071272cc1bf944e3867db299b3f5dce126f82b"
        hash2 = "6c7c8b804cc76e2c208c6e3b6453cb134d01fa41"
    strings:
        $mz = { 4d 5a }
        $s1 = "Micorsoft Corportation" fullword wide
        $s2 = "IM Monnitor Service" fullword wide
        $x1 = "imemonsvc.dll" fullword wide
        $x2 = "iphlpsvc.tmp" fullword
        $x3 = "{53A4988C-F91F-4054-9076-220AC5EC03F3}" fullword
        $z1 = "urlmon" fullword
        $z2 = "Registered trademarks and service marks are the property of their" wide
        $z3 = "XpsUnregisterServer" fullword
        $z4 = "XpsRegisterServer" fullword
    condition:
        ( $mz at 0 ) and
        (
            ( 1 of ($s*) ) or
            ( 2 of ($x*) and all of ($z*) )
        )
        and filesize < 40000
}


Now look at the condition. I combine the string conditions with the magic header of an executable, defined by $mz, and with a file size limit. Together they exclude typical false positives like antivirus signature files, browser cache or dictionary files. Set a generous file size limit to avoid false negatives (e.g. for samples between 100K and 200K, set filesize < 300K).
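The two guards in the condition are cheap to express outside YARA as well. The sketch below mirrors `$mz at 0 and filesize < N`. It also adds a hypothetical rule of thumb for picking a generous limit: take the largest sample size, multiply by 1.5 and round up to the next 100,000 bytes. For samples between 100K and 200K, that gives a limit of 300,000 bytes. YARA's `KB` suffix means 1024 bytes, so `filesize < 300KB` would be slightly larger.

```python
MZ = b"\x4d\x5a"  # "MZ", the DOS header magic matched by $mz = { 4d 5a }


def passes_prefilter(data: bytes, max_size: int) -> bool:
    """Mirror `$mz at 0 and filesize < max_size` for an in-memory file."""
    return data.startswith(MZ) and len(data) < max_size


def ample_limit(sample_sizes, factor: float = 1.5, step: int = 100_000) -> int:
    """Hypothetical rule of thumb: largest sample * factor, rounded up to step."""
    target = int(max(sample_sizes) * factor)
    return -(-target // step) * step  # ceiling division on integers
```

`ample_limit([100_000, 200_000])` returns 300000, matching the example in the text.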
You can see that a single occurrence of any of the very specific strings triggers the rule: ( 1 of ($s*) ).
Then I combine a number of less unique strings with most or all of the ordinary-looking strings: ( 2 of ($x*) and all of ($z*) ).
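Because `and` binds more tightly than `or` in YARA, just as in Python, the Enfal condition translates almost one to one into a Python function. In this sketch, `matched` is a hypothetical set of identifiers (without the `$`) that were found in a file. The `mz_at_0` flag stands in for the `at 0` position check.

```python
# Identifiers defined in the Enfal_Malware_Backdoor rule (without "$").
ENFAL_IDS = {"mz", "s1", "s2", "x1", "x2", "x3", "z1", "z2", "z3", "z4"}


def group(prefix: str) -> set[str]:
    """All identifiers covered by a wildcard such as ($s*)."""
    return {i for i in ENFAL_IDS if i.startswith(prefix)}


def enfal_condition(matched: set[str], mz_at_0: bool, filesize: int) -> bool:
    one_specific = len(group("s") & matched) >= 1      # 1 of ($s*)
    rare_and_common = (len(group("x") & matched) >= 2  # 2 of ($x*)
                       and group("z") <= matched)      # all of ($z*)
    return mz_at_0 and (one_specific or rare_and_common) and filesize < 40000
```

A single typo string like “Micorsoft Corportation” is enough. Without it, two $x hits still need every $z string.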
Let’s look at the second example below.
$s1 is a very special string that combines the string formatting placeholder “%s” with an Admin$ share. $s2 looks like the typical “svchost.exe” but contains the number “0” instead of the letter “O”. This is very uncommon and a clear indicator of something malicious.
All the definitions starting with $a are special, but I cannot say for sure that they won’t appear in legitimate software. The strings defined by $x seem ordinary but were produced by yarGen, which means that they did not appear in the goodware string database.
This example is special in that it contains a list of typical passwords, defined by $z1 to $z8.

rule SMB_Worm_Tool_Generic {
    meta:
        description = "Generic SMB Worm/Malware Signature"
        author = "Florian Roth"
        reference = "http://goo.gl/N3zx1m"
        date = "2015/02/08"
        hash = "db6cae5734e433b195d8fc3252cbe58469e42bf3"
    strings:
        $mz = { 4d 5a }
        $s1 = "%s\\Admin$\\%s.exe" fullword ascii
        $s2 = "SVCH0ST.EXE" fullword wide
        $a1 = "LoadLibrary( NTDLL.DLL ) Error:%d" fullword ascii
        $a2 = "\\svchost.exe" fullword ascii
        $a3 = "msvcrt.bat" fullword ascii
        $a4 = "Microsoft@ Windows@ Operating System" fullword wide
        $x1 = "%s.exe %s" fullword ascii
        $x2 = "password <=14" fullword ascii
        $x3 = "del /a %1" fullword ascii
        $x4 = "del /a %0" fullword ascii
        $x5 = "SetServiceStatus failed, error code = %d" fullword ascii
        $z1 = "secret" fullword ascii
        $z2 = "Hello123" fullword ascii
        $z3 = "princess" fullword ascii
        $z4 = "Password123" fullword ascii
        $z5 = "Password1" fullword ascii
        $z6 = "sunshine" fullword ascii
        $z7 = "training" fullword ascii
        $z8 = "iloveyou" fullword ascii
    condition:
        // "and" binds tighter than "or": the alternatives need their own
        // parentheses so that the header and file size checks cover all of them
        $mz at 0 and
        (
            ( 1 of ($s*) and 1 of ($x*) ) or
            ( all of ($a*) and 2 of ($x*) ) or
            ( 5 of ($z*) and 2 of ($x*) )
        )
        and filesize < 200000
}
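The SMB condition chains three alternatives with `or` between the `$mz at 0` check and the `filesize` check. Since `and` binds more tightly than `or` in YARA, an ungrouped chain `$mz at 0 and A or B or C and filesize < N` is read as `($mz at 0 and A) or B or (C and filesize < N)`. In that reading, the header and size checks do not apply to alternative B at all. Python uses the same precedence, which makes the difference easy to demonstrate with booleans standing in for the sub-conditions. The parameters `a`, `b`, `c` and `small` are placeholders.

```python
def ungrouped(mz: bool, a: bool, b: bool, c: bool, small: bool) -> bool:
    # Parsed as: (mz and a) or b or (c and small)
    return mz and a or b or c and small


def grouped(mz: bool, a: bool, b: bool, c: bool, small: bool) -> bool:
    # Header and size checks apply to every alternative
    return mz and (a or b or c) and small
```

A large file without an MZ header that satisfies only alternative B matches the ungrouped form but not the grouped one.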


As you can see, I combined the string definitions in a similar way as before. Together with the magic header and the file size limit, this method should be a good starting point for the final stage: testing.


TESTING

Testing the rules is very important. Most authors seem to consider their rules good enough as soon as they match the given samples.
You should definitely do the following checks:

 1. Scan the malware samples
 2. Scan a big goodware archive

To carry out the tests, download the Yara scanner and run it from the command line. The goodware directory should include:

 * system files from various Windows versions
 * typical software
 * likely sources of false positives (e.g. typical CMS software if you wrote Yara rules that match malicious web shells)



Yara Rule Testing on Samples and Goodware



If the rule matches the malicious samples and does not produce a single match on the goodware archive, it is good enough to be tested in practice.
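The two checks boil down to two lists that should both be empty: samples the rule misses, and goodware files it hits. The sketch below computes them for any match function. `iter_files` and `evaluate` are hypothetical helpers. `matches` is a placeholder for whatever runs the rule, for example a wrapper around the Yara command line tool, which scans a directory recursively with `yara -r rules.yar <dir>`.

```python
import os
from typing import Callable, Iterable, Iterator, List, Tuple

Matcher = Callable[[bytes], bool]  # placeholder for "does the rule match?"


def iter_files(root: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, content) for every readable file below root."""
    for dirpath, _dirs, names in os.walk(root):
        for name in sorted(names):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue  # skip locked or unreadable files
            yield path, data


def evaluate(samples: Iterable[Tuple[str, bytes]],
             goodware: Iterable[Tuple[str, bytes]],
             matches: Matcher) -> Tuple[List[str], List[str]]:
    """Return (missed samples, goodware false positives); both should be empty."""
    missed = [path for path, data in samples if not matches(data)]
    false_positives = [path for path, data in goodware if matches(data)]
    return missed, false_positives
```

A typical call would be `evaluate(iter_files("samples"), iter_files("goodware"), my_matcher)`. Because `iter_files` is a generator, a big goodware archive is read one file at a time instead of being loaded into memory at once.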


UPDATE

Make sure to check Part 2 of “How to Write Simple and Sound YARA Rules”.



Nextron Systems GmbH © 2022
All Rights Reserved
