TROUBLESHOOTING APACHE SPARK APPLICATIONS WITH OVEROPS

Chris Caspanello ● 02nd Sep 2021 ● 6 min read

Chris Caspanello, an avid Spark developer, demonstrates how you can use OverOps to find errors in your Spark applications. As Chris states, “configuration issues, format issues, data issues, and outdated code can all wreak havoc on a Spark job. In addition, operational challenges, be it the size of the cluster or access to it, make it hard to debug production issues. OverOps’ ability to detect precisely why something broke and to see variable state is invaluable in a distributed compute environment. It makes detecting and resolving critical exceptions quick and easy.”

If you are a Spark developer and have encountered these or similar issues, OverOps can be a game changer. Try OverOps free for 14 days now.

GitHub files: https://github.com/ccaspanello/overops-spark-blog

Data path not configured properly

When developing transformations, most customers read files from HDFS, using a URL like `hdfs://mycluster/path/to/data`. Some customers, however, reference files local to the nodes with a URL like `file://path/to/data`. Unfortunately, this is incorrect. The format for a URL is [scheme]://[host]/[path]; if you drop the host, you need `file:///path/to/data`, with three forward slashes. When a job with the two-slash form is submitted to the cluster, it dies a horrible death with little to no indication of what happened. This was eventually fixed with upfront path validation, but finding the root cause was not easy and was very time-consuming (more on that later).

If only I had had OverOps, I could have quickly and easily understood why it broke. I could look at the continuous reliability console and see where the error occurred and what the error was, along with the variable state coming into the function.
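To make the failure mode concrete, here is a minimal sketch of the kind of upfront path validation described above. The `PathCheck` object and its messages are hypothetical; only the parsing behavior of `java.net.URI` is assumed:

```scala
import java.net.URI

// Minimal sketch of upfront path validation (names are hypothetical).
// "file://path/to/data" parses "path" as the URI host and "/to/data" as the
// path, silently mangling the intended location. With no host, the scheme
// separator needs three slashes: file:///path/to/data
object PathCheck {

  def validate(url: String): String = {
    val uri = new URI(url)
    require(uri.getScheme != null, s"missing scheme in: $url")
    if (uri.getScheme == "file") {
      require(uri.getHost == null,
        s"unexpected host '${uri.getHost}' in $url; use file:///absolute/path")
    }
    url
  }

  def main(args: Array[String]): Unit = {
    println(validate("file:///path/to/data")) // OK: no host, absolute path
    println(validate("file://path/to/data"))  // fails fast, before the cluster does
  }
}
```

Failing fast in the driver, before any work is distributed, turns a cryptic cluster death into an immediate, readable error.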
The Spark UI is great . . . when it is running

In the previous example I mentioned that finding the root cause of a Spark job failure is not easy and is time-consuming. The reason for this has to do with how the Spark UI works. The Spark UI consists of several parts: the Master, Worker, and Job screens. The Master and Worker screens are up for the entire time and show statistics for each service. The Job screens, however, are available only while the Spark job is running. There you can see which stages are being run and get logs for running and failed stages. These logs can be useful for finding failures. Unfortunately, when the job finishes or fails, the service dies and you can no longer access the logs through the web UI. Since OverOps detected the event, I was able to see it there, along with the entire variable state.

Missing headers on some files

In this example, I wrote a Spark application and tested it locally on a sample file. Everything worked fine. However, when I ran the job in my cluster against a real dataset, it failed with the following error:

IllegalArgumentException: ‘marketplace does not exist. Available: US, 16132861, . . .’

As you can see, the exception includes row data, which is good, but that alone is not enough to tell us what was going on. Since OverOps captures variable state at the time the exception happens, I was able to see that the schema was essentially empty. The root cause was that I did not have a header row in every part file.

Invalid Delimiter

In this example, there is an error similar to the one above:

IllegalArgumentException: “id does not exist. Available: id,first_name,last_name,email,gender,ip_address”

But in this case I did have a column header in my files. What’s going on, then? Looking at the variable state in OverOps, I can see that my schema has a single column named `id,first_name,last_name,email,gender,ip_address`. This tells me that my delimiter is wrong, as the sketch below illustrates.
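Both of the last two failures come down to how the reader is configured. Here is a minimal sketch of reading delimited text with an explicit header and delimiter; the path and the "|" separator are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of a CSV read with explicit header and delimiter options;
// the path and the "|" separator are hypothetical. With the wrong "sep"
// value, Spark parses each row as a single column whose name is the entire
// header line -- exactly the schema seen in the OverOps variable state.
object ReadDelimited {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("read-delimited").getOrCreate()

    val df = spark.read
      .option("header", "true") // the first line of every part file must be a header
      .option("sep", "|")       // must match the file's actual delimiter
      .csv("hdfs://mycluster/path/to/data")

    df.printSchema() // expect one column per field, not one giant column
  }
}
```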
Scaling up with unknown data

Oftentimes big data developers will test on a small subset of data, e.g. `.limit(200)`. But what happens when unexpected data comes into the system? Do you crash the application, or do you swallow the error and move on? That is always a hot topic, but either way OverOps can find the exact place where the data could not be parsed. In this example, the original application was coded to accept a Gender of MALE/FEMALE. Now a new valid gender value is seen, and we should update our application to accept POLYGENDER as well.

Operational Challenges

Aside from coding issues, there are also operational challenges:

* On large Spark clusters with hundreds of nodes, finding the right worker in order to find the right log is a very tough task. A Spark History Server helps here, but sometimes cluster admins lock it down and the developer may not even have permission to use it. OverOps gives us a central place to go for any errors that occur.
* Running a job on massive datasets may take hours (hence why records are sometimes ignored or redirected). OverOps can detect anomalies as they occur, so we can kill a bad job sooner and adjust the code. This is a double cost saving: reduced developer time and reduced cloud resources spent running a bad job.
* Sometimes logs are turned off to increase speed or conserve resources. Even if logs are turned off in the application, log events and exceptions can still be captured.

Summary

As you can see, configuration issues, format issues, data issues, and outdated code can all wreak havoc on a Spark job. In addition, operational challenges, be it cluster size or access, make it hard to debug production issues. OverOps’ ability to detect precisely why something broke and to see variable state is invaluable in a distributed compute environment. It makes detecting and resolving critical exceptions quick and easy.

So if you are a Spark developer and have encountered the above or similar issues, you might want to give OverOps a try.

TRY OVEROPS WITH A 14-DAY FREE TRIAL

Get started for free now.