Submitted URL: https://payatu.com/blog/Nikhil-Joshi/machine-learning-effective-fuzzing
Effective URL: https://payatu.com/machine-learning-effective-fuzzing
MACHINE LEARNING FOR EFFECTIVE FUZZING – CLOUDFUZZ

    Nikhil-Joshi
    05/02/2018



In this blog post we look at machine learning techniques that can be used to
fuzz a software system effectively. This system will be integrated with
CloudFuzz, an integrated software framework for security-focused fuzzing. The
end goal is a workflow that allows continuous fuzzing and generates reports of
security vulnerabilities by analysing crashes in a given piece of software. In
CloudFuzz we provide crafted data to a software system and analyse the system
for crashes. The ultimate aim of fuzzing is to discover bugs and security
vulnerabilities in the target software. The probability of discovering a bug
increases with the amount of code covered by the input provided to the target
software. Generating inputs with high code coverage is a tricky task. Here is
one attempt to solve this problem using machine learning.

 


PROBLEM:

Fuzzing software with random data may or may not discover new bugs. Such random
attempts also do not guarantee that the complete code is covered.

Hence there should be a system which learns the type and format of the input
files and generates similar files to attain higher code coverage.

Since there could be countless file formats, our system should be highly generic
and should work for every type of file format. It should not be bound to a
certain type of input. E.g. if the system works for .doc files, then it should
also work for JPEGs, PDFs, etc.

 


APPROACH TO SOLVE THE PROBLEM:

The following diagram shows how the system works.



Preprocessor: The preprocessor churns through drcov logs (which contain code
coverage information) and sample input files and generates a dataset in CSV
format. The dataset contains predefined features extracted from the sample files
and drcov logs. It is necessary to select features which contribute to code
coverage.
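
As a rough illustration of the preprocessing step, here is a minimal Python
sketch. The features (size, byte entropy, distinct byte count, ratio of zero
bytes) and the function names are our own assumptions for illustration; the
post does not list the features actually used. The sketch also assumes the
basic block count of each sample has already been extracted from its drcov log
and is passed in as a plain dictionary.

```python
import csv
import io
import math
from collections import Counter

def extract_features(data: bytes) -> dict:
    """Hypothetical, format-agnostic features of one sample file."""
    counts = Counter(data)
    n = len(data)
    # Shannon entropy of the byte distribution, in bits per byte.
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values()) if n else 0.0
    return {
        "size": n,
        "entropy": round(entropy, 4),
        "distinct_bytes": len(counts),
        "zero_ratio": round(counts.get(0, 0) / n, 4) if n else 0.0,
    }

def build_dataset(samples: dict, bb_counts: dict) -> str:
    """Join per-file features with basic block counts and return CSV text."""
    out = io.StringIO()
    fields = ["name", "size", "entropy", "distinct_bytes", "zero_ratio", "bb_count"]
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for name, data in sorted(samples.items()):
        row = {"name": name, **extract_features(data), "bb_count": bb_counts[name]}
        writer.writerow(row)
    return out.getvalue()

# Toy samples and made-up basic block counts, for illustration only.
samples = {"a.bin": bytes([0, 0, 1, 1]), "b.bin": bytes(range(8))}
dataset_csv = build_dataset(samples, {"a.bin": 120, "b.bin": 340})
print(dataset_csv)
```

Each row pairs the features of one file with the coverage it achieved, which is
the shape of data the learner needs.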

Learner: With the help of the .csv generated by the preprocessor, the learner
learns the relationship between the input file format, the data associated with
the file and code coverage. After satisfactory training, the learner is ready to
predict the code coverage of new files. Different classification algorithms,
such as Artificial Neural Networks, Support Vector Machines, etc., can be used
for learning.
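
To make the learner's role concrete, here is a small sketch of a coverage
predictor. It uses a k-nearest-neighbours average as a dependency-free stand-in
for the neural networks or SVMs mentioned above, so it is not the model used in
CloudFuzz. The class name, the feature vectors and the basic block counts are
all made up for illustration.

```python
import math

class KNNCoveragePredictor:
    """Predicts basic block count as the mean of the k nearest training rows."""

    def __init__(self, k: int = 3):
        self.k = k
        self.rows = []  # list of (feature_vector, bb_count) pairs

    def fit(self, features, bb_counts):
        self.rows = list(zip(features, bb_counts))
        return self

    def predict(self, x):
        # Euclidean distance on raw features; real features should be scaled first.
        nearest = sorted(self.rows, key=lambda row: math.dist(row[0], x))[: self.k]
        return sum(bb for _, bb in nearest) / len(nearest)

# Made-up (size, entropy) feature vectors and basic block counts.
train_x = [(4, 1.0), (8, 3.0), (16, 3.5), (64, 6.0)]
train_y = [120, 340, 400, 900]
model = KNNCoveragePredictor(k=2).fit(train_x, train_y)
predicted = model.predict((10, 3.1))
print(predicted)
```

Whatever model is chosen, the file generator only needs a `predict` call that
maps the features of a file to an estimated coverage value.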

File generator: The file generator runs a metaheuristic evolutionary algorithm
to generate and evolve files to attain higher code coverage. The file generator
takes the following steps.

 * Crossover: Consider two files as parents and generate two new files as a
   result of crossover. We can randomly select a crossover point and reproduce
   to generate new files, or we can randomly interchange a block of data
   between the two files. A basic crossover operation is shown in the diagram
   below.
 * Mutation: Mutate random data in the file. The degree of mutation should be
   kept very low and must be tuned experimentally. This step is very important
   as it introduces new files to the system which can help to cover different
   parts of the code.
 * Selection: In selection, the fitness of every file in the current generation
   is calculated using the predictor. Only the fittest files (files with higher
   code coverage) are kept and the others are discarded.
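
The crossover and mutation operators described above can be sketched on raw
bytes as follows. This is a minimal sketch, not the CloudFuzz implementation:
the function names, the block size of 4 and the mutation rates are assumptions
chosen for illustration, and the parents are assumed to be at least as long as
the block size.

```python
import random

def single_point_crossover(a: bytes, b: bytes, rng: random.Random):
    """Cut both parents at one random point and swap the tails."""
    point = rng.randint(1, min(len(a), len(b)) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def block_swap_crossover(a: bytes, b: bytes, rng: random.Random, size: int = 4):
    """Exchange one block of `size` bytes at the same offset in both parents."""
    start = rng.randint(0, min(len(a), len(b)) - size)
    end = start + size
    return a[:start] + b[start:end] + a[end:], b[:start] + a[start:end] + b[end:]

def mutate(data: bytes, rng: random.Random, rate: float = 0.01) -> bytes:
    """Replace each byte with a random one with probability `rate`."""
    return bytes(rng.randrange(256) if rng.random() < rate else byte for byte in data)

rng = random.Random(0)  # fixed seed so the run is repeatable
parent_a, parent_b = bytes(16), bytes([0xFF]) * 16
child_a, child_b = single_point_crossover(parent_a, parent_b, rng)
swapped_a, swapped_b = block_swap_crossover(parent_a, parent_b, rng)
mutant = mutate(parent_a, rng, rate=0.1)
print(child_a.hex(), child_b.hex(), swapped_a.hex(), mutant.hex())
```

Keeping `rate` small preserves most of the structure the parents already have,
which matches the advice to keep the degree of mutation low.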

Instead of using the predictor, we could calculate the code coverage of each
file at run time, but this would significantly increase the execution time of
the algorithm.

Multiple iterations of the above steps should sufficiently increase the code
coverage of the files. Finally, the evolved files can be used for fuzzing.
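
Putting the steps together, the generational loop might look like the sketch
below. The `evolve`, `breed` and `toy_predictor` names are hypothetical, and the
toy predictor (number of distinct byte values) merely stands in for the trained
learner so that the example runs on its own. Parents compete with their
children during selection, which is one possible design and not necessarily the
one used in CloudFuzz.

```python
import random

def evolve(population, predict, breed, generations=10, keep=8, rng=None):
    """Breed children, score everyone with `predict`, keep the `keep` fittest."""
    rng = rng or random.Random()
    population = list(population)
    for _ in range(generations):
        children = []
        for _ in range(len(population)):
            mother, father = rng.sample(population, 2)
            children.append(breed(mother, father, rng))
        # Selection: drop duplicates, then rank by predicted coverage.
        candidates = list(dict.fromkeys(population + children))
        population = sorted(candidates, key=predict, reverse=True)[:keep]
    return population

def breed(mother: bytes, father: bytes, rng: random.Random) -> bytes:
    """Single-point crossover followed by one random byte mutation."""
    point = rng.randint(1, len(mother) - 1)
    child = bytearray(mother[:point] + father[point:])
    child[rng.randrange(len(child))] = rng.randrange(256)
    return bytes(child)

def toy_predictor(data: bytes) -> int:
    """Stand-in fitness: number of distinct byte values in the file."""
    return len(set(data))

rng = random.Random(1)
seeds = [bytes([i]) * 8 for i in range(4)]
evolved = evolve(seeds, toy_predictor, breed, generations=20, keep=4, rng=rng)
best_before = max(map(toy_predictor, seeds))
best_after = max(map(toy_predictor, evolved))
print(best_before, best_after)
```

Because the fitness comes from `predict` and not from running the target, each
generation costs only as much as the model evaluation.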

 


RESULTS:

The system explained above was tested by generating JPEGs and PDFs against
their parser software. For testing JPEGs we used the Linux convert utility as
the target software. Convert parses the input file and converts it to the
specified format. For PDFs, pdfium was used as the target software. It parses a
PDF file and can perform operations like extracting the text from the document,
writing the pages to images, etc.
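
For readers who want to reproduce the coverage measurement, the sketch below
builds a command line that runs a target under drcov. It assumes coverage is
collected with DynamoRIO's `drrun` launcher and its `drcov` tool, and that
`drrun` is on the PATH; the post does not show the exact invocation used, and
the helper names and file names are hypothetical.

```python
import subprocess

def coverage_command(target_argv, drrun="drrun"):
    """Wrap a target command line so it runs under DynamoRIO's drcov tool."""
    return [drrun, "-t", "drcov", "--", *target_argv]

def run_with_coverage(target_argv, timeout=30):
    """Run the target under drcov; drcov writes a coverage log for the run."""
    return subprocess.run(coverage_command(target_argv), timeout=timeout, check=False)

# Only build and print the command here; sample.jpg and out.png are placeholders.
jpeg_cmd = coverage_command(["convert", "sample.jpg", "out.png"])
print(" ".join(jpeg_cmd))
```

Calling `run_with_coverage` for each sample produces one drcov log per file,
which the preprocessor can then consume.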

The results are shown in the graphs below. Each plot shows the code coverage,
i.e. the basic block count (on the y-axis), for every output file. Red dashed
lines represent the input files and green triangles represent the output files.



The graph above shows the effectiveness of the proposed approach in generating
JPEG files with higher code coverage. We can see that almost 50% of the output
files have code coverage greater than all the input files. Mutation and
prediction worked really well, increasing the code coverage of the files in
every generation.
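
The "greater than all the input files" figure can be computed from the basic
block counts as sketched here. The numbers below are invented for illustration
and are not the measurements behind the graph; the function name is ours.

```python
def fraction_above_inputs(input_bb, output_bb):
    """Share of output files whose basic block count beats every input file."""
    best_input = max(input_bb)
    winners = sum(1 for count in output_bb if count > best_input)
    return winners / len(output_bb)

# Made-up basic block counts, not the measurements from the post.
input_bb = [5100, 5300, 5250]
output_bb = [5000, 5400, 5600, 5200, 5350, 5150]
share = fraction_above_inputs(input_bb, output_bb)
print(f"{share:.0%} of output files exceed all inputs")
```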



In the case of PDFs, the algorithm did not work as well as it did for JPEGs.

We can see that around 80% of the files cover very little code. A possible
reason is that the PDF parser in the target software rejects the generated
PDFs. Experimenting with factors like the degree of mutation, reproduction
strategies and the number of generations could lead to better results.

But there are still a few files in the output generation with a basic block
count significantly higher than the input files.

 


SCOPE FOR IMPROVEMENT:

The above system works well with binary file formats. But fuzzing systems with
highly syntactical inputs, such as XML, JSON or any programming language
parser, will be a problem: a slight change in the input will make the target
software reject it. The learner in the above system does not learn the syntax
of the input; it only looks for a few patterns and predicts the code coverage.
Hence some grammatical inference mechanism could be used to learn the input
grammar and generate corresponding output.

Also, the time required to generate files can be reduced. Experimentation is
needed to optimise the parameters of the file generator.
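
The fragility of syntactical inputs is easy to demonstrate. The sketch below
replaces one character at a time in a small JSON document and counts how many
variants Python's `json` parser rejects. The sample document and the helper
name are our own, and the standard library parser only stands in for a real
fuzzing target.

```python
import json

def rejection_count(document: str, replacement: str = "x"):
    """Replace one character at a time and count how many variants fail to parse."""
    rejected = 0
    for i in range(len(document)):
        variant = document[:i] + replacement + document[i + 1:]
        try:
            json.loads(variant)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            rejected += 1
    return rejected, len(document)

rejected, total = rejection_count('{"a": [1, 2], "b": "hi"}')
print(f"{rejected} of {total} single-character edits are rejected")
```

Only edits that land inside a string literal survive parsing, so most blind
mutations never reach the code behind the parser.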

That’s all for this post. Feel free to use the comment section for suggestions
and queries.



