payatu.com
Submitted URL: https://payatu.com/blog/Nikhil-Joshi/machine-learning-effective-fuzzing
Effective URL: https://payatu.com/machine-learning-effective-fuzzing
Submission: On July 25 via api from US — Scanned from NL
MACHINE LEARNING FOR EFFECTIVE FUZZING – CLOUDFUZZ

Nikhil Joshi, 05/02/2018

In this blog we look at machine learning techniques that can make fuzzing of a software system more effective. The system described here will be integrated with CloudFuzz, an integrated software framework for security-based fuzzing.
The end goal is to provide a workflow that allows continuous fuzzing and generates reports of software security vulnerabilities by analysing crashes in a given piece of software. In CloudFuzz we provide crafted data to a software system and analyse the system for crashes. The ultimate aim of fuzzing is to discover bugs and security vulnerabilities in the target software. The probability of discovering a bug increases with the amount of code covered by the input provided to the target software. Generating inputs with high code coverage is a tricky task. Here is one attempt to solve this problem using machine learning.

PROBLEM:
Fuzzing software with random data may or may not discover new bugs, and such random attempts do not guarantee that the complete code is covered. Hence there should be a system that learns the type and format of the input files and generates similar files to attain higher code coverage. Since there are countless file formats, the system should be highly generic and work for every type of file format rather than being bound to a certain type of input. E.g., if the system works for .doc files, then it should also work for JPEGs, PDFs, etc.

APPROACH TO SOLVE THE PROBLEM:
The following diagram shows how the system works.

Preprocessor: The preprocessor processes drcov logs (which contain code coverage information) and sample input files, and generates a dataset in CSV format. The dataset contains predefined features extracted from the sample files and drcov logs. It is necessary to select features that contribute to code coverage.

Learner: Using the .csv file generated by the preprocessor, the learner learns the relationship between the input file format, the data associated with the file and the code coverage. After satisfactory training, the learner is ready to predict the code coverage of new files. Different classification algorithms, such as artificial neural networks and support vector machines, can be used for learning.
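The preprocessor and learner described above can be illustrated with a minimal sketch. The post does not say which features are extracted, so the three features here (log-scaled size, byte entropy, fraction of printable bytes) are hypothetical, and the basic block counts are assumed to have already been read out of the drcov logs. A k-nearest-neighbour average stands in for the neural network or SVM the post mentions, only so that the example runs on the standard library.

```python
import math
from collections import Counter


def extract_features(data: bytes) -> list:
    """Hypothetical per-file features: log2 size, byte entropy, printable ratio."""
    if not data:
        return [0.0, 0.0, 0.0]
    n = len(data)
    counts = Counter(data)
    # Shannon entropy of the byte distribution, in bits per byte.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    printable = sum(1 for b in data if 32 <= b < 127) / n
    return [math.log2(n + 1), entropy, printable]


def build_dataset(samples) -> list:
    """Turn (file_bytes, basic_block_count) pairs into (features, coverage) rows.

    Assumption: the basic block count was already extracted from a drcov log.
    """
    return [(extract_features(data), float(bb)) for data, bb in samples]


def predict_coverage(dataset, data: bytes, k: int = 3) -> float:
    """Predict coverage as the mean coverage of the k nearest training rows."""
    if not dataset:
        raise ValueError("dataset must contain at least one row")
    target = extract_features(data)
    nearest = sorted(dataset, key=lambda row: math.dist(row[0], target))[:k]
    return sum(bb for _, bb in nearest) / len(nearest)


# Toy data: the coverage numbers are made up for illustration only.
toy_samples = [
    (b"\x00" * 64, 10),
    (bytes(range(256)), 200),
    (b"hello world", 50),
]
toy_dataset = build_dataset(toy_samples)
estimate = predict_coverage(toy_dataset, b"some new input file", k=2)
```

In a real setup the rows would be written to the CSV file the post describes and fed to a proper learner; the point of the sketch is only the shape of the data flow from sample files to a coverage prediction.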
File generator: The file generator runs a metaheuristic evolutionary algorithm to generate and evolve files that attain higher code coverage. It takes the following steps.

* Crossover: Take two files as parents and generate two new files as the result of crossover. We can randomly select a crossover point and recombine the parents around it, or randomly interchange a block of data between the two files. A basic crossover operation is shown in the diagram below.
* Mutation: Mutate random data in the file. The degree of mutation should be kept very low and must be experimented with. This step is very important, as it introduces new files into the system that can help cover different parts of the code.
* Selection: The fitness of every file in the current generation is calculated using the predictor. Only the fittest files (those with higher code coverage) are kept; the others are discarded. Instead of using the predictor we could measure the code coverage of each file at run time, but this would significantly increase the execution time of the algorithm.

Multiple iterations of the above steps should sufficiently increase the code coverage of the files. Finally, the evolved files can be used for fuzzing.

RESULTS:
The system explained above was tested by generating JPEGs and PDFs against their parser software. For JPEG we used the convert utility on Linux as the target software; convert parses the input file and converts it to the specified format. For PDF, pdfium was used as the target software; it parses a PDF file and can perform operations such as extracting text from the document and writing pages to images.

The results are shown in the graphs below. Each plot shows code coverage, i.e. the basic block count (on the y-axis), for every output file. Red dashed lines represent input files and green triangles represent output files. The above graph shows the effectiveness of the proposed approach in generating JPEG files with higher code coverage.
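The crossover, mutation and selection steps of the file generator described above can be sketched as follows. This is a minimal illustration, not the CloudFuzz implementation: the function names, the default mutation rate, the population sizes and the choice to let parents compete with their children during selection are all assumptions. The `fitness` argument is where the trained predictor would be plugged in; the toy fitness at the bottom is only a stand-in.

```python
import random


def crossover(a: bytes, b: bytes, rng: random.Random):
    """Single-point crossover: swap the tails of two parents at a random cut."""
    cut = rng.randint(0, min(len(a), len(b)))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(data: bytes, rng: random.Random, rate: float = 0.01) -> bytes:
    """Overwrite each byte with a random value with probability `rate`."""
    return bytes(
        rng.randrange(256) if rng.random() < rate else byte for byte in data
    )


def evolve(population, fitness, generations=10, keep=4, rate=0.01, seed=0):
    """Evolve files towards higher fitness (e.g. predicted code coverage)."""
    population = list(population)
    if len(population) < 2 or keep < 2:
        raise ValueError("need at least two files to perform crossover")
    rng = random.Random(seed)
    for _ in range(generations):
        children = []
        for _ in range(len(population)):
            parent1, parent2 = rng.sample(population, 2)
            child1, child2 = crossover(parent1, parent2, rng)
            children += [mutate(child1, rng, rate), mutate(child2, rng, rate)]
        # Selection (assumed variant): parents compete with children,
        # so the best score in the population never decreases.
        ranked = sorted(population + children, key=fitness, reverse=True)
        population = ranked[:keep]
    return population


# Toy stand-in for the predictor: reward files with many distinct byte values.
def toy_fitness(data: bytes) -> int:
    return len(set(data))


seeds = [b"AAAAAAAA", b"BBBBBBBB", b"ABABABAB", b"12345678"]
evolved = evolve(seeds, toy_fitness, generations=20, rate=0.05)
```

Keeping the mutation rate as an explicit parameter reflects the post's remark that the degree of mutation should stay low and needs experimentation.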
In the JPEG graph, almost 50% of the output files have code coverage greater than all of the input files. The mutation and prediction worked really well, increasing the code coverage of files in every generation.

For PDFs, the algorithm did not work as well as it did for JPEGs. Around 80% of the files cover very little code. A possible reason is that the PDF parser in the target software rejects the generated PDFs. Experimenting with factors like the degree of mutation, reproduction strategies and the number of generations could lead to better results. Still, a few files in the output generation have a basic block count significantly higher than the input files.

SCOPE FOR IMPROVEMENT:
The above system works well with binary file formats, but fuzzing systems with highly syntactical inputs, such as XML, JSON or programming language parsers, will be a problem: a slight change in the input will make the target software reject it. The learner in the above system does not learn the syntax of the input; it only looks for a few patterns and predicts the code coverage. Hence some grammatical inference mechanism could be used to learn the input grammar and generate corresponding output. The time required to generate files can also be reduced. Experimentation is needed to optimise the parameters of the file generator.

That's all for this post. Feel free to use the comment section for suggestions and queries.