
TROUBLESHOOTING APACHE SPARK APPLICATIONS WITH OVEROPS

Chris Caspanello  ● 2nd Sep 2021

6 min read



Chris Caspanello, an avid Spark developer, demonstrates how you can use OverOps
to find errors in your Spark application. As Chris states, “configuration
issues, format issues, data issues, and outdated code can all wreak havoc on a
Spark job. In addition, operational challenges, be it the size of the cluster
or access to it, make it hard to debug production issues. OverOps’ ability to
detect precisely why something broke and to see variable state is invaluable in
a distributed compute environment. It makes detecting and resolving critical
exceptions quick and easy.”

If you are a Spark developer and have encountered the above or similar issues,
OverOps can be a game changer. Try OverOps free for 14 days now.

GitHub Files: https://github.com/ccaspanello/overops-spark-blog

Data path not configured properly

When developing transformations, most customers read files from HDFS using a
URL like `hdfs://mycluster/path/to/data`. However, some customers reference
files local to the nodes with a URL like `file://path/to/data`. Unfortunately,
this is incorrect: the format of a URL is [scheme]://[host]/[path], so if you
drop the host you need `file:///path/to/data`, with three forward slashes. When
a job with the two-slash form is submitted to the cluster, it dies a horrible
death with little to no indication of what happened. This was eventually fixed
with upfront path validation, but finding the root cause was difficult and very
time-consuming (more on that later). If I had had OverOps, I could have quickly
and easily understood why it broke: the continuous reliability console shows
where the error occurred and what the error was, along with the variable state
coming into the function.
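The upfront validation mentioned above can be as simple as parsing the URL
before the job is ever submitted. Here is a minimal sketch in plain Python
using the standard library's `urlparse`; the accepted scheme list and the
error messages are illustrative assumptions, not OverOps or Spark APIs:

```python
from urllib.parse import urlparse

def validate_data_path(url: str) -> str:
    """Fail fast on malformed data URLs before submitting the Spark job."""
    parsed = urlparse(url)
    if parsed.scheme not in ("hdfs", "file"):
        raise ValueError(f"Unsupported scheme in {url!r}")
    # `file://path/to/data` parses its first segment as a *host*,
    # which is almost never what the author intended.
    if parsed.scheme == "file" and parsed.netloc:
        raise ValueError(
            f"{url!r} treats {parsed.netloc!r} as a host; did you mean "
            f"'file:///{parsed.netloc}{parsed.path}' with three slashes?"
        )
    return url

validate_data_path("hdfs://mycluster/path/to/data")  # OK
validate_data_path("file:///path/to/data")           # OK
# validate_data_path("file://path/to/data")          # raises ValueError
```

A check like this turns the "horrible death with no indication" into an
immediate, readable error at submit time.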



The Spark UI is great . . . when it is running

In the previous example I mentioned that finding the root cause of a Spark job
failure is difficult and time-consuming. The reason has to do with how the
Spark UI works. The Spark UI consists of several parts: the Master, Worker, and
Job screens. The Master and Worker screens are up the entire time and contain
stats for each service. While a Spark job is running, its Job screens are
available and look like this:





Here you can see which stages are being run and get logs for running or failed
stages. These logs can be useful for finding failures. Unfortunately, when the
job finishes or fails, the service dies and you can no longer access the logs
through the web UI. Since OverOps detected the event, I was able to see it
there, along with the entire variable state.

Missing headers on some files

In this example, I wrote a Spark application and tested it locally on a sample
file. Everything worked fine. However, I then ran the job in my cluster against
a real dataset and it failed with the following error:

IllegalArgumentException: ‘marketplace does not exist. Available: US, 16132861,
. . .’

As you can see, the exception message included row data, which helps. But that
alone is not enough to tell us what was going on. Since OverOps captures
variable state at the time the exception happens, I was able to see that the
schema was essentially empty. The root cause was that not every part file had a
header row.
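A pre-flight check along these lines can catch the missing header before the
cluster does. Here is a sketch using Python's stdlib `csv` module; the expected
header, field names, and in-memory "part files" are hypothetical stand-ins for
the real dataset:

```python
import csv
import io

# Hypothetical schema the application expects; not taken from the real dataset.
EXPECTED_HEADER = ["marketplace", "customer_id", "review_id"]

def first_row(text: str):
    """Return the first CSV row of a part file's contents."""
    return next(csv.reader(io.StringIO(text)))

def missing_header_parts(parts: dict) -> list:
    """Report part files whose first row is data rather than the header."""
    return [name for name, text in parts.items()
            if first_row(text) != EXPECTED_HEADER]

parts = {
    "part-00000": "marketplace,customer_id,review_id\nUS,16132861,R1\n",
    "part-00001": "US,16132861,R2\n",  # header row missing: schema breaks here
}
print(missing_header_parts(parts))  # ['part-00001']
```

When the header is absent, Spark takes the first data row as the schema, which
is exactly why "marketplace" appeared to not exist while "US, 16132861, ..."
did.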



Invalid Delimiter

In this example, there is an error similar to the one above.

IllegalArgumentException: “id does not exist. Available:
id,first_name,last_name,email,gender,ip_address”

But in this case I did have a column header on my files. What's going on then?
Looking at the variable state in OverOps, I can see that my schema has a single
column named `id,first_name,last_name,email,gender,ip_address`. This tells me
that my delimiter is wrong.
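The collapse into one column is easy to reproduce, and the delimiter can be
guessed from a sample line before the job runs. A sketch with Python's stdlib
`csv` module (the header line is the one from the error above):

```python
import csv
import io

header_line = "id,first_name,last_name,email,gender,ip_address"

# Parsing with the wrong delimiter collapses the header into one column name,
# reproducing the single mega-column seen in the variable state.
wrong = next(csv.reader(io.StringIO(header_line), delimiter=";"))
right = next(csv.reader(io.StringIO(header_line), delimiter=","))
print(len(wrong), len(right))  # 1 6

# csv.Sniffer can guess the delimiter from a sample ahead of time.
dialect = csv.Sniffer().sniff(header_line, delimiters=",;|\t")
print(dialect.delimiter)  # ,
```

Sniffing a sample and comparing it to the configured delimiter is a cheap
guard to add to job setup.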



Scaling up with unknown data

Oftentimes big data developers will test on a small subset of data
(`.limit(200)`). But what happens when unexpected data comes into the system?
Do you crash the application, or do you swallow the error and move on? That is
always a hot topic, but either way OverOps can find the exact place where the
data could not be parsed.

In this example, the original application was coded to accept a gender of
MALE/FEMALE. Now a new, valid gender value appears. In our scenario, we should
update our application to accept POLYGENDER as well.
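Either policy, crash or skip, can be made explicit in code rather than left to
an unhandled exception. Here is a minimal sketch of the skip-and-report
approach; the row shape and field names are illustrative, not the blog's actual
dataset:

```python
# The original application's assumption about valid values.
KNOWN_GENDERS = {"MALE", "FEMALE"}

def partition_rows(rows):
    """Split rows into parsed and rejected instead of crashing the whole job."""
    ok, rejected = [], []
    for row in rows:
        (ok if row["gender"] in KNOWN_GENDERS else rejected).append(row)
    return ok, rejected

rows = [
    {"id": 1, "gender": "MALE"},
    {"id": 2, "gender": "POLYGENDER"},  # new valid value the old code rejects
]
ok, rejected = partition_rows(rows)
print([r["id"] for r in rejected])  # [2]
```

Keeping the rejects around (rather than swallowing them silently) is what lets
you notice that POLYGENDER is legitimate data and update the allowed set,
instead of quietly dropping rows.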



Operational Challenges

Aside from coding issues there are also operational challenges:

 * On large Spark clusters with hundreds of nodes, finding the right worker in
   order to find the right log is a very tough task. A Spark History Server
   helps here, but cluster admins sometimes lock it down and the developer may
   not have permission to use it.
   * OverOps gives us a central place to go for any error that occurs.
 * Running a job on massive datasets may take hours (which is why records are
   sometimes ignored or redirected).
   * OverOps can detect anomalies as they occur, so we can kill the job sooner
     and adjust the code.
   * This is a double cost-saving measure: reduced developer time and reduced
     cloud resources spent running a bad job.
 * Sometimes logs are turned off to increase speed or conserve resources.
   * Even if logs are turned off in the application, log events and exceptions
     can still be captured.

Summary

As you can see, configuration issues, format issues, data issues, and outdated
code can all wreak havoc on a Spark job. In addition, operational challenges,
be it the size of the cluster or access to it, make it hard to debug production
issues. OverOps’ ability to detect precisely why something broke and to see
variable state is invaluable in a distributed compute environment. It makes
detecting and resolving critical exceptions quick and easy. So if you are a
Spark developer and have encountered the above or similar issues, you might
want to give OverOps a try.


TRY OVEROPS WITH A 14-DAY FREE TRIAL

Get started for free now.
