MY SOC METHODOLOGY

This is my process for triaging and investigating alerts in a SOC. It's fairly
high level and assumes you already know some cybersecurity basics, how your SIEM
works, how to read logs, and so on. It's derived from my own experience using
SIEMs, combined with theoretical frameworks commonly taught in academic
settings. It aims to efficiently produce high-value deliverables for the
recipient of the SOC's output, whether that's a third-party client, an internal
security team, or a hobbyist self-hoster who just wants to monitor their very
own SIEM.

This work also aims to be platform agnostic, so it is not tied to any specific
SIEM product, company, or methodology. It is purely my own generic process for
examining alerts in a SIEM/SOC context.


MY THOUGHT PROCESS SUMMARIZED:

Basically, when I see an alert, I step through the following questions (sketched
as code after the list):

 1. Do I know what the alert means? What in the rule logic caused it to trigger?
    Is this a false positive?
 2. Does the alert provide indicators that detect something from the cyber
    killchain?
 3. Considering the context, is there a defense in depth based remediation to be
    made?
 4. If no, can the rule be tuned to not trigger on useless conditions like this?
 5. Once I do confirm that the alert needs escalation, what do I need to
    escalate?
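
For whatever it's worth, here's that flow as a tiny Python sketch. Every helper
in it is a hypothetical placeholder for analyst judgment or platform-specific
lookups, not a real API; it's only here to show the order of the decisions.

    # A rough sketch of the flow above. Every helper here stands in for analyst
    # judgment or platform-specific lookups; none of these are real APIs.

    def triage(alert, rule):
        if not understand_rule(rule):                # 1. do I know what the rule means?
            return "research the rule logic before doing anything else"
        if is_false_positive(alert, rule):           # 1. obvious false positive?
            return "close as false positive"
        if not maps_to_killchain(alert):             # 2. does it map to a killchain stage?
            return "consider tuning the rule so it stops firing on this"   # 4.
        if has_defense_in_depth_remediation(alert):  # 3. is there something concrete to fix?
            return "escalate: indicators, what they imply, what to do"     # 5.
        return "suppress, or tune out the non-actionable condition"        # 4.

    # Stand-in predicates so the sketch runs; in reality these are the judgment
    # calls discussed in the rest of this article.
    def understand_rule(rule): return True
    def is_false_positive(alert, rule): return False
    def maps_to_killchain(alert): return True
    def has_defense_in_depth_remediation(alert): return True

    print(triage({"name": "example alert"}, {"name": "example rule"}))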


DO I KNOW WHAT THE RULE MEANS?

This is a key aspect of using any SIEM. How to discover this varies by platform,
and generally comes with experience and exposure. Once you've internalized what
the rule is detecting for the majority of rules in your environment, though,
triage and investigation become much faster and more effective.

I tend to ask and answer the following (the "sufficiently parsed" question is
sketched in code after the list):

 * What is the log source of the rule?
   * SIEMs work by ingesting logs from endpoints on a network.
   * Could the device the log is from actually have useful artifacts?
 * What do the logs and fields returned by the rule mean?
   * Simple as: do you know what the data you're looking at actually means?
   * This may require looking at vendor documentation if you're unsure.
 * Is the data sufficiently parsed?
   * SIEMs can only make good detections from well-parsed data.
   * If the data isn't parsed well, what do I need to consult to get the data I
     need to spot an indicator?
 * What is this rule supposed to detect?
   * Does that activity fall on the cyber killchain or into MITRE?
   * Does the rule's pattern actually retrieve indicators of that activity?
 * What does the rule actually detect?
   * Does the detection as written target strong indicators?
   * Does it go for weak indicators that have a high FP chance?
   * If the rule is broken and doesn't actually detect what it says on the tin,
     does it still detect any useful indicators of potentially malicious
     activity?
   * What does the logic powering the rule actually do? Is it broken?
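
To make the "sufficiently parsed" point a bit more concrete, here's a tiny,
deliberately generic sketch. The field names are made up for the example; the
point is just that a rule can only key off fields the parser actually produced.

    # Minimal illustration of the "is the data sufficiently parsed?" question:
    # does this event actually carry the fields the rule's logic needs?
    # The field names are invented for the example; real ones depend on your parser.

    REQUIRED_FIELDS = {"src_ip", "dest_ip", "user", "process_name", "command_line"}

    def missing_fields(event):
        """Return the rule-relevant fields that were not parsed out of this event."""
        return {f for f in REQUIRED_FIELDS if not event.get(f)}

    event = {
        "src_ip": "10.0.0.5",
        "dest_ip": "203.0.113.7",
        "user": "jsmith",
        "process_name": None,   # parser failed here
        "command_line": "",     # and here
    }

    gaps = missing_fields(event)
    if gaps:
        print(f"Parsing gaps -- consult raw logs or vendor docs for: {sorted(gaps)}")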

If the rule isn't capable of detecting what it's actually named for, it may be
worth tuning immediately. That said, that doesn't automatically excuse you from
escalating the activity -- while the rule may be broken and not detect what it's
supposed to, it may still have caught some genuinely suspicious activity by
chance.

If the logic, log sources and other issues make sense, we need to determine if
the detected activity falls onto the cyber killchain. If it does, then we want
to start hunting for indicators that prove this is actual malicious activity.
That's what most of the rest of the article is about!


DOES IT FIT IN THE KILLCHAIN AND DO I HAVE IOCS?

This is where the real "security" work begins. Once we can see that the rule is
detecting relevant information, we can dive in and make a determination as to
whether there's something worth investigating here.

CYBER KILLCHAIN

The cyber killchain is a basic framework for understanding the "steps" of an
attack. There are many out there, some more comprehensive than others, but I
personally like Lockheed's cyber killchain best. This is mainly because it is
short and succinct, and so can be easily memorized. More extensive killchains
may more accurately reflect real-world cases, but because they're longer, they
cannot be memorized easily, and thus cannot be used to make fast decisions, such
as when performing first-line triage or when trying to rapidly sort massive
amounts of information during an investigation.

The links of the killchain are as follows (a quick lookup-table sketch of the
stages and example indicators follows the list):

 * Reconnaissance
   * This will be where a threat actor attempts to explore vulnerabilities in a
     system. Examples:
     * Nmap/port scans
     * Web application fuzzing
     * Service enumeration techniques
     * Username enumeration
 * Weaponization
   * This is the process of turning a vulnerability into an actually deliverable
     payload. You won't see the process of weaponization itself, but you may see
     weaponized code as a host or network indicator. Examples:
     * Shellcode
     * Malicious scripts
     * .lnk droppers
     * Malicious Office macros
     * Wordlist building for password attacks
 * Delivery
   * This is how the attacker actually gets his malicious object into the
     environment.
     * Phishing emails
     * Malicious websites
     * Crafted packets coming directly to an exposed service
     * Credentialed access (after a password attack)
 * Exploitation/Execution
   * This is when a vulnerability is leveraged to allow for malicious software
     to execute.
     * Application crashes (could be an indicator of a buffer overflow)
     * LOLBin execution under unusual circumstances
     * Script engines running unusual code
     * Defense evasion TTPs (process injection, obfuscation, AMSI bypass, etc)
 * Installation
   * This is when malware installs itself after initial execution
     * Dropper retrieving stages from C2 infrastructure
     * Registry persistence
     * Modification of DLLs, services, or system files
 * Command and Control
   * This is communication between external assets held by the threat actor and
     the malware within the environment
     * Traffic to unusual hosts
     * Encrypted traffic
     * Traffic on unusual ports
     * Signs of beaconing activity
 * Impact on Objectives
   * This is the threat actor doing what he intends to do within the environment
     * Signs of the killchain restarting internally
     * Pivoting
     * Lateral movement
     * Data Exfiltration
     * Mass resource access (could be an indication of ransomware, datatheft,
       etc)
     * Stealers (e.g. Mimikatz, Lazagne), cryptominers, crypters, in the
       environment
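
And here's the lookup-table sketch I mentioned: the list above condensed into
something machine-usable for quickly tagging an alert during triage. It's
illustrative only, not an exhaustive taxonomy, and the indicator strings are
just the examples from the list.

    # Rough, illustrative mapping of the killchain stages to example indicator
    # types drawn from the list above. Not exhaustive, just a triage aid.

    KILLCHAIN = {
        "reconnaissance":       ["port scan", "web app fuzzing", "username enumeration"],
        "weaponization":        ["shellcode", "malicious script", ".lnk dropper", "office macro"],
        "delivery":             ["phishing email", "malicious website", "crafted packets", "credentialed access"],
        "exploitation":         ["application crash", "lolbin execution", "unusual script engine activity"],
        "installation":         ["dropper stage retrieval", "registry persistence", "modified service/DLL"],
        "command_and_control":  ["beaconing", "unusual destination host", "unusual port", "encrypted channel"],
        "impact_on_objectives": ["lateral movement", "data exfiltration", "mass resource access", "stealer/cryptominer"],
    }

    def stages_for(observed):
        """Return the stages whose example indicators overlap with what was observed."""
        return [stage for stage, examples in KILLCHAIN.items()
                if observed & set(examples)]

    print(stages_for({"beaconing", "registry persistence"}))
    # ['installation', 'command_and_control']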

Note that because the Lockheed killchain is short, in order to fully describe an
attack, we may need to think of linking several killchains together. E.g. for a
first-stage dropper, "impact on objectives" may simply be to establish a
beachhead presence and then start doing enumeration for privilege escalation.
Once that's achieved, the impact may shift to attempting to enumerate for
opportunities to pivot or move laterally through the environment. It might not
be until you've reached the end of this third killchain that you finally start
seeing the threat actor's end-game activities like cryptomining, infostealing,
ransomware, etc.

Rather than making up contrived examples to explain what this looks like, I'd
direct you to The DFIR Report. That site presents a collection of case studies
on real-world attacks that lay out the timeline of events from initial access to
final impact on objectives.

In any case, if the activity in an alert could plausibly represent one of the
specific stages on your killchain, then we need to, at minimum, investigate
further in order to hunt for indicators of compromise.

If it does not indicate any sort of activity on a killchain, the rule may
warrant tuning or even removal depending on circumstances. Basically, if it's
not detecting either an availability problem or a killchain item, it sorta
raises the question of what the rule's point is.

PROBING FOR IOCS/PYRAMID OF PAIN

What we want to do once we confirm that the rule is likely detecting activity
related to a link in the killchain is to manually confirm that we have
indicators of compromise. These are artifacts we can use to verify that the
activity described by the rule actually occurred, and whether it appears to be
legitimately malicious.

A good starting point for identifying these would be the "Pyramid of Pain"
model, which gives us a list of indicator types ranked by how difficult they are
for an adversary to change (a small scoring sketch follows the list). These are
(from strongest to weakest):

 * TTPs
   * The actual behaviours used to accomplish a task (though they can be hard to
     pin down from SIEM logs alone).
   * You'll usually only see this indirectly, e.g. through examining several
     host/network artifacts that indicate that a certain TTP was attempted.
 * Tools
   * The software used to perform TTPs
   * Again, often detected indirectly through host and network artifacts, or
     hashes.
 * Network and Host artifacts
   * This could be anything from protocols, file paths, registry keys, command
     lines, etc. If you look up a report on a type of malware, you'll likely
     find lists of this sort of thing.
   * These are often your bread and butter in the SIEM, as they're the sort of
     thing that will show up in ingested logs.
   * Unfortunately, you may lack the visibility to see them, depending on how
     your monitored environment is setup.
   * You'll often find documentation for these in advisories, whitepapers, etc.
   * It may also be worth checking GitHub for publicly known tools and malware
     to see if you can work out what their artifacts would look like from the
     code.
 * Domain names
   * Domains associated with delivery and C2 activity.
   * Found on threat intel platforms like AbuseIPDB, AlienVault, etc.
 * IPs
   * IP addresses used for delivery and C2.
   * Found on threat intel platforms like AbuseIPDB, AlienVault, etc.
   * Fairly weak indicator
     * Try to confirm recent malicious activity reports.
     * Also, ensure multiple sources confirm malicious activity.
 * File hashes
   * The specific file hash of a given file artifact.
   * VirusTotal is a good reference here, but not perfect.
   * Very weak indicator; hashes can be changed trivially.
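
Here's the scoring sketch I mentioned above. The numeric weights are arbitrary;
they only encode the ordering of the pyramid, so that given whatever indicators
an alert hands you, you can quickly see the strongest one you're working from.
The indicator values are placeholders.

    # Sketch: rank an alert's indicators by where they sit on the Pyramid of Pain.
    # The weights are arbitrary; only the ordering matters.

    PYRAMID = {
        "ttp": 6,
        "tool": 5,
        "host_or_network_artifact": 4,
        "domain": 3,
        "ip": 2,
        "hash": 1,
    }

    def strongest(indicators):
        """Given (type, value) pairs, return the highest-pyramid indicator, if any."""
        ranked = [i for i in indicators if i[0] in PYRAMID]
        return max(ranked, key=lambda i: PYRAMID[i[0]], default=None)

    alert_indicators = [
        ("ip", "198.51.100.23"),                      # placeholder example values
        ("hash", "d41d8cd98f00b204e9800998ecf8427e"),
        ("host_or_network_artifact", "schtasks /create /tn Updater"),
    ]

    print(strongest(alert_indicators))
    # ('host_or_network_artifact', 'schtasks /create /tn Updater')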

Usually, I'll check threat intelligence sources like AbuseIPDB, AlienVault and
VirusTotal for information about any of these indicator types, especially IPs,
hashes, and domains. The Sputnik browser extension is useful for this, but plain
googling works too. For host and network indicators, I may simply google blindly
to see if there are any matches for whitepapers that list the indicator I've
found as being related to known malware.
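
For the IP case specifically, the lookup can be scripted. Below is a sketch
against AbuseIPDB's v2 "check" endpoint as I understand it; it needs an API key
and the third-party requests package, and you should verify the endpoint and
response field names against AbuseIPDB's current documentation rather than
trusting my memory. Remember that a single platform's score is still only one
witness, which brings me to the next point.

    # Sketch of an IP reputation lookup against AbuseIPDB's v2 "check" endpoint.
    # Needs the third-party "requests" package and an API key. Verify the endpoint
    # and response fields against AbuseIPDB's current docs before relying on this.

    import requests

    def check_ip(ip, api_key, max_age_days=90):
        resp = requests.get(
            "https://api.abuseipdb.com/api/v2/check",
            headers={"Key": api_key, "Accept": "application/json"},
            params={"ipAddress": ip, "maxAgeInDays": max_age_days},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()["data"]

    # Example usage (placeholder key):
    # data = check_ip("198.51.100.23", "YOUR_API_KEY")
    # print(data["abuseConfidenceScore"], data["totalReports"])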

If you're using crowdsourced threat intelligence, bear strongly in mind the rule
of "two witnesses" for indicators. Publicly crowdsourced threat intel can be as
useless as YouTube comments at worst, so you want to ensure that malicious
activity is attested by multiple sources, preferably on multiple platforms,
before you determine that an indicator is truly malicious. I.e. if I see only
one guy on one threat intel platform saying that an IP is bad, I'm probably
going to ignore it, because there's very little attestation to the malicious
behaviour.

On the other hand, tailored indicators presented by a professional source in a
recent whitepaper are pretty "strong", particularly for things like host and
network artifacts.

Overall, I want to confirm that the rule picked up on a strong malicious
indicator. If it did, I may have enough to start considering remediations and
how I want to escalate things. If not, or if the indicators are weak, I may need
to dig deeper by pivoting to other logs with the data I have.

FINDING "PIVOTABLES"

In a case where I cannot immediately find a strong IOC with the initial logs
associated with an alert, I may need to "pivot" my investigation into other logs
to see if anything can be discovered through alternative means. For example, if
malicious network traffic is detected and I have IPs, but these end up being
only weak indicators, I might pivot off of the victim IP to see if I can
discover the host name or an associated user account. Once this is found, I may
try something like pulling host logs to see if there are any host indicators
that can be used to corroborate the idea that I've detected malicious activity.
Or, if I see a strange process, I may pivot on the process GUID to see if I can
find an indicator somewhere else in the family of parent and child processes of
a given piece of activity.

As a sort of short list of things I try to pivot on in different cases:

 * GUIDs
   * These tie together objects under a unique identifier that can often relate
     different things such as processes, threads, etc.
 * Parent Process
   * Similar to above. If a process calls child processes, you'll find them all
     lumped under one parent.
 * Hostname
   * If all you have is an IP from a network log, you may be able to pivot into
     host logs this way.
 * Username
   * You may be able to pivot from network to host with this.

The list could really be endless. Just creatively think about what sorts of
information could be directly linked to information that exists in other types
of logs.
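
To make the pivot idea concrete, here's a sketch over generic parsed events. The
field names and the in-memory lists are stand-ins for whatever your SIEM
actually calls these things; in practice each pivot() call below would be a
fresh query keyed on the pivot value, not a filter over a Python list.

    # Sketch of the "pivot" idea over generic parsed events. Field names and
    # sample data are invented stand-ins for real SIEM queries.

    network_events = [
        {"src_ip": "10.0.0.5", "dest_ip": "198.51.100.23", "hostname": "WS-042", "user": "jsmith"},
    ]

    host_events = [
        {"hostname": "WS-042", "user": "jsmith", "process_guid": "abc-123",
         "process_name": "powershell.exe", "command_line": "powershell -enc <snip>"},
        {"hostname": "WS-042", "user": "jsmith", "process_guid": "abc-123",
         "process_name": "rundll32.exe", "command_line": "rundll32 payload.dll,Start"},
    ]

    def pivot(events, **criteria):
        """Return events matching all of the given field/value pairs."""
        return [e for e in events if all(e.get(k) == v for k, v in criteria.items())]

    # Weak network indicator -> find the host behind the victim IP...
    victim = pivot(network_events, src_ip="10.0.0.5")[0]

    # ...then pull host events for that hostname, and widen by process GUID.
    host_hits = pivot(host_events, hostname=victim["hostname"])
    related = pivot(host_events, process_guid=host_hits[0]["process_guid"])
    print([e["process_name"] for e in related])   # ['powershell.exe', 'rundll32.exe']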

If I cannot find any pivotables, I'm in a blindspot, and have to make an
executive decision:

 1. If my initial indicators from the alert were "strong", but I cannot find
    confirmation that the attack succeeded or failed, it may still be worth
    escalating and requesting confirmation of the activity.
 2. If my initial indicators were "weak", and there is no further evidence of
    plausibly related killchain activity, it is likely safe to suppress.
    * For establishing "plausible relation" consider:
      * Timeline
      * Originating hosts
      * Originating users

Don't be afraid to suppress if all you have is a weak indicator. You can end up
in a very paranoid place trying to correlate events that are 100% unrelated.
This can be a tremendous waste of time, and the longer you sink time into the
investigation, the more likely you are to experience sunk-cost bias and try to
force your logs to point to a true positive that just isn't there. Escalating
this kind of activity can be a major loss of time and money for the recipient
and can contribute to a feeling that you have a tendency to "cry wolf", which
ultimately leads to less trust in the SOC and thus a riskier security posture.

DISCONFIRMING INDICATORS

If at any time during my investigation I find one of these, I'll consider it a
disconfirmation of malicious activity and pretty much drop my
investigation/consider the activity benign:

 * WTFBins or management software that is confirmed to be expected in the
   environment.
 * Vulnerability scans/malicious traffic coming from confirmed vulnerability
   scanners
 * Malicious activity from confirmed and documented penetration testing sources

Obviously, this stuff could be dangerous if it is not confirmed by the client,
so I'll need to check prior documentation to ensure that stuff from the above is
indeed expected. If it is not confirmed by the client, it needs escalation;
tools like this can very much be used by threat actors to do bad stuff, and
using off the shelf tools could be a way of trying to blend in with expected
activity.

Note that I don't necessarily consider "invulnerability" to be a total
disconfirmation. While it can be something to note to the client to reassure
them that they're not compromised, there still may be actionables related to the
organization's overall security posture that wind up being exposed by the event.
E.g. when a threat actor probes your IIS server for an Apache vuln and it hits
the server, sure, it's not likely to have been a successful compromise...but
your IDS is still misconfigured and let signatured malicious traffic cross your
DMZ without blasting that nonsense out of the sky, and we need to talk about
that.

Same with external scans from non-malicious sources (e.g. Shodan, Censys, etc).
While there is likely no hostile intent from such activity, it can still expose
defense in depth issues that need remediation.


IS THERE A DEFENSE IN DEPTH BASED REMEDIATION?

Defense in depth is a 7-layer model for how overlapping security controls should
be applied to an environment for maximum efficacy.

The model looks like this (I've condensed it into a quick checklist sketch after
the list):

 * Policy
   * This is stuff like password length rules, acceptable use policies,
     separation of duties, etc.
   * You may make suggestions re: password policy from time to time.
 * Physical
   * This is related to the physical locks and hardware that secures physical,
     on-prem assets.
   * In MSSP land or in a situation where you're monitoring off-prem assets, you
     rarely have much to say here, and physical detections in a SIEM generally
     tend to be limited to things like USB inserts.
   * This is mostly handled by facilities.
 * External Network
   * This is going to be stuff related to border firewalls that divide the
     internal network from the public internet.
     * Are they configured appropriately to drop incoming malicious traffic?
     * Do they drop outbound C2 traffic?
     * Are your external management ports properly hidden behind VPNs and locked
       down to trusted hosts? Etc.
   * This is a very bread and butter place that you'll find major
     misconfigurations.
 * Internal Network
   * This is more or less related to firewall policies between internal hosts.
     * Are they properly configured to be able to deny pivots and lateral
       movement through the network?
     * Are VLANs and subnets properly segregated from one another?
   * This is a hard spot to offer good advice in because lots of stuff inside of
     a network can trigger false positives. E.g. RMM tools that are used
     internally can strongly resemble RATs being used for pivoting.
 * Host
   * This is stuff on the host itself.
   * HIDS, host firewalls, EDR, etc.
     * Is it up and running?
     * Is it properly configured?
     * Are the server applications themselves properly hardened?
 * Application
   * From a SOC perspective, this is mostly going to involve versioning.
     * Are applications on the server known-vulnerable versions?
   * This gets more detailed when dealing with appsec itself as a discipline,
     though, and at that level will involve things like code auditing for best
     practices and detecting and correcting vulnerable coding practices.
 * Data
   * This is going to involve whether or not sensitive information (either at
     rest or in transit) is properly secured.
   * Usually, you'll look for failures to use encryption when it's appropriate
     to do so here.
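
And here's the checklist sketch promised above: the same layers condensed into
something you can literally iterate over per event. The questions are just my
shorthand for the points in the list; treat it as an aide-memoire, not a
complete control framework.

    # Sketch: the defense in depth layers as a checklist to walk per event.
    # Questions are condensed from the list above; purely an analyst aide-memoire.

    DEFENSE_IN_DEPTH = {
        "policy":           "Do password/acceptable-use/separation-of-duties policies need a suggestion?",
        "physical":         "Any physical angle (usually limited to things like USB inserts in a SIEM)?",
        "external_network": "Should the border firewall have dropped this inbound/outbound traffic?",
        "internal_network": "Would segmentation/VLAN rules have stopped a pivot or lateral movement?",
        "host":             "Is EDR/HIDS/host firewall present, running, and properly configured?",
        "application":      "Is the application a known-vulnerable version?",
        "data":             "Was sensitive data at rest or in transit left unencrypted?",
    }

    def remediation_prompts(layers_touched):
        """Return the checklist questions for the layers an event plausibly touched."""
        return {layer: DEFENSE_IN_DEPTH[layer] for layer in layers_touched}

    for layer, question in remediation_prompts(["external_network", "host"]).items():
        print(f"{layer}: {question}")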

If an event exposes that there's something that needs to be tightened up in
order to have perfect defense in depth, then I should have clear remediation
advice to give to the client upon escalation.

If I cannot think of anything wrong with defense in depth for the
organization...then it's debatable whether there's an issue. If defense in depth
is so effective in an environment that no improvements can be suggested, then it
is simply unlikely that the attack was successful, and similar future attacks
are likely sufficiently guarded against. There is nothing to tell the client,
and so the issue can be suppressed. I.e. a million exploits bouncing off the
border firewall may represent real indicators of attempted compromise...but the
fact that they're being denied means nothing happened and there's nothing to
tell the network team to change.

Obviously, don't get arrogant, though. Just because you can't think of a
remediation doesn't mean something isn't wrong. Be sure to consult others if
that's a possibility and you feel dubious about something.


CAN THE RULE BE TUNED?

In order to escalate, I want to see:

 * Definitive killchain related activity
 * Strong indicators
 * A defense in depth remediation

If I don't have these, I need to consider whether the rule can be tuned.

Some thoughts I usually have for tuning rules (the organizational exclusion idea
at the end is sketched in code after the list):

 * Does the rule have any logical errors?
   * Poorly formatted booleans, incorrect field parameters, etc.
   * This stuff can totally break a rule and make it fire on garbage.
 * Is the activity the rule is intended to detect actually killchain activity?
   * I.e. is the rule designed to detect actually dangerous activity?
   * If not, why do we have the rule?
 * Do the indicators detected by the rule's logic actually detect the killchain
   activity it's designed to detect?
   * If not, do we have the parsed data to actually make that detection?
 * How high up the pyramid of pain am I making detections?
   * Rules based on lower tier indicators like IP addresses, hashes, or even
     specific file names/paths (a common host indicator) are probably going to
     be garbage compared to rules that target TTPs.
   * Conversely, TTPs are hard to write rules for.
   * Consult MITRE ATT&CK for detection advice for different TTPs.
 * Do I have enough parsed data to actually make the detection I need?
   * If not, could I get it with a better integration or host agent? Consult the
     client if this is the case.
   * If not, the "bad" rule may simply need to stay in order to be a starting
     point for a human analyst to make an investigation from.
   * Whether to remove or keep will depend on whether the rule triggers on false
     positives so often as to make it a fatigue-source for the analysts, and
     whether the true positive case is actually critical.
 * Does the rule consider whether the activity was remediated or not?
   * E.g. does the rule consider whether the firewall dropped the traffic or
     not?
   * Does the rule integrate with other software to detect whether execution was
     halted, e.g. by EDR or other filters?
   * If not, do we have the parsed data to make this determination?
 * Can an organizational level exclusion be built for the rule?
   * I.e. to not trigger for a given IP, host or user due to it being a known
     benign source of activity? This will need consultation with the client.
   * With permission, we can generally exclude lower pyramid of pain items
     without compromising the rule's ability to make good detections.
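
As promised, here's the organization-level exclusion idea sketched in plain
Python rather than any particular SIEM's rule language. The detection logic and
field names are made up; the point is only the shape of layering a
client-approved exclusion list over the rule.

    # Sketch of an organization-level exclusion layered onto rule logic.
    # Plain Python for illustration; in a real SIEM this would be the platform's
    # own suppression/exclusion mechanism, with client sign-off documented.

    APPROVED_EXCLUSIONS = {
        "src_ip":   {"203.0.113.10"},     # e.g. the client's vulnerability scanner
        "hostname": {"PATCH-MGMT-01"},    # e.g. a documented management server
    }

    def rule_logic(event):
        """Stand-in for the detection itself (here: a crude 'scanning' heuristic)."""
        return event.get("dest_port_count", 0) > 100

    def should_alert(event):
        excluded = any(event.get(field) in values
                       for field, values in APPROVED_EXCLUSIONS.items())
        return rule_logic(event) and not excluded

    print(should_alert({"src_ip": "203.0.113.10", "dest_port_count": 500}))  # False: excluded
    print(should_alert({"src_ip": "192.0.2.44", "dest_port_count": 500}))    # True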

In a best case scenario, we should be able to tune a rule to the point that it
almost never produces a false positive, or at least always produces output that
warrants a deep, host level investigation. At this point, the rule can be added
to whatever automation your SIEM is capable of for automatically escalating to
the client. This will usually only be possible if we have very good parsed data
available that nearly always gives us a definitive TTP.

The meh-case scenario would be to simply tune the rule to be less "noisy" for
the human analysts. This will usually be the case if we have middle of the
pyramid data that demands investigation and human triage to make a
determination. Being middle of the pyramid, an equal volume of false and true
positives can occur, which is why the human analyst is needed.

The worst case scenario is when an alert has bottom of the pyramid indicators,
but represents a critical issue that would absolutely need to be reported if
detected. These sorts of alerts tend to cause alert fatigue and have a tendency
to slip through the cracks due to having very high false positive rates. If
parsed data is really this low quality, even human investigations may not be
capable of positively identifying true positive activity. This is the abstract
sort of hell that causes breaches!


ACTUALLY ESCALATING ACTIVITY

If I do have the three factors I want to see in order to escalate, then I will.
On a professional SOC, you're probably going to use a template to build out your
escalation. In my opinion, a good template will allow you to express three
things (a bare-bones example follows the list):

 * What are my indicators?
   * What is the affected host?
   * The source and destination IPs?
   * Host indicators like process IDs, file paths, etc?
   * What does threat intelligence or other researcher output say?
 * What do the indicators imply?
   * Why is this a dangerous situation?
 * What needs to be done?
   * I.e. how do I improve defense in depth?
   * If there's evidence of an actual compromise, what's good incident response
     advice?
   * If I lack the data necessary to definitively identify a compromise, what
     does the client need to do to facilitate the investigation?
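
Here's the bare-bones example I mentioned: the three-part structure rendered as
a small Python dataclass. The field names are my own shorthand, not any
particular platform's ticket schema, and the sample content is invented.

    # Bare-bones escalation template covering the three things above:
    # indicators, what they imply, and what needs to be done.
    # Field names are my own shorthand, not any specific ticketing schema.

    from dataclasses import dataclass, field

    @dataclass
    class Escalation:
        title: str
        indicators: list = field(default_factory=list)    # hosts, IPs, paths, TI references
        implication: str = ""                              # why this is dangerous
        remediation: list = field(default_factory=list)    # defense in depth / IR / data requests

        def render(self):
            lines = [self.title, "", "Indicators:"]
            lines += [f"  - {i}" for i in self.indicators]
            lines += ["", "What this implies:", f"  {self.implication}", "", "Recommended actions:"]
            lines += [f"  - {r}" for r in self.remediation]
            return "\n".join(lines)

    print(Escalation(
        title="Possible C2 beaconing from WS-042 (invented example)",
        indicators=["WS-042 (10.0.0.5) -> 198.51.100.23:443 every 60s",
                    "AbuseIPDB: multiple recent reports for 198.51.100.23"],
        implication="Periodic outbound traffic to a reported IP is consistent with an implant checking in.",
        remediation=["Block the destination at the border firewall",
                     "Isolate WS-042 and collect host logs/EDR telemetry",
                     "Confirm whether the user recognizes the activity"],
    ).render())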

If your template does not allow you to express these three things, then your
escalations likely are not providing much value to whoever has to actually read
them.

Conversely, if your template contains much more than these three fields, it
likely contains superfluous information that makes generating the escalation by
the analyst inefficient, and makes reading the escalation confusing and less
effective for the recipient.

Remember: your escalations are not just things you make to show your boss that
you're not asleep! They are the deliverable that the client or your internal
teams rely on for advice on how to fix their security issues.

