URL: https://phoenixnap.com/kb/install-spark-on-windows-10



HOW TO INSTALL APACHE SPARK ON WINDOWS 10

May 28, 2020
Big Data, Windows




Contents
 1. Install Apache Spark on Windows
    1. Step 1: Install Java 8
    2. Step 2: Install Python
    3. Step 3: Download Apache Spark
    4. Step 4: Verify Spark Software File
    5. Step 5: Install Apache Spark
    6. Step 6: Add winutils.exe File
    7. Step 7: Configure Environment Variables
    8. Step 8: Launch Spark
 2. Test Spark


Introduction

Apache Spark is an open-source framework that processes large volumes of stream
data from multiple sources. Spark is used in distributed computing with machine
learning applications, data analytics, and graph-parallel processing.

This guide will show you how to install Apache Spark on Windows 10 and test the
installation.



Prerequisites

 * A system running Windows 10
 * A user account with administrator privileges (required to install software,
   modify file permissions, and modify system PATH)
 * Command Prompt or PowerShell
 * A tool to extract .tar files, such as 7-Zip


INSTALL APACHE SPARK ON WINDOWS

Installing Apache Spark on Windows 10 may seem complicated to novice users, but
this simple tutorial will have you up and running. If you already have Java 8
and Python 3 installed, you can skip the first two steps.


STEP 1: INSTALL JAVA 8

Apache Spark requires Java 8. You can check to see if Java is installed using
the command prompt.

Open the command line by clicking Start > type cmd > click Command Prompt.

Type the following command in the command prompt:

java -version

If Java is installed, it will respond with the following output:



Your version may be different. The second digit is the Java version – in this
case, Java 8.
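The response typically has the following shape (the exact build numbers will differ on your machine):

```text
java version "1.8.0_251"
Java(TM) SE Runtime Environment (build 1.8.0_251-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.251-b08, mixed mode)
```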

If you don’t have Java installed:

1. Open a browser window, and navigate to https://java.com/en/download/.



2. Click the Java Download button and save the file to a location of your
choice.

3. Once the download finishes double-click the file to install Java.

Note: At the time this article was written, the latest Java version is
1.8.0_251. Installing a later version will still work. This process only needs
the Java Runtime Environment (JRE) – the full Development Kit (JDK) is not
required. The download link to JDK is
https://www.oracle.com/java/technologies/javase-downloads.html.


STEP 2: INSTALL PYTHON

1. To install the Python package manager, navigate to https://www.python.org/ in
your web browser.

2. Mouse over the Download menu option and click Python 3.8.3, the latest
version at the time of writing.

3. Once the download finishes, run the file.



4. Near the bottom of the first setup dialog box, check the Add Python 3.8 to
PATH box. Leave the other box checked.

5. Next, click Customize installation.



6. You can leave all boxes checked at this step, or you can uncheck the options
you do not want.

7. Click Next.

8. Select the box Install for all users and leave other boxes as they are.

9. Under Customize install location, click Browse and navigate to the C drive.
Add a new folder and name it Python.

10. Select that folder and click OK.



11. Click Install, and let the installation complete.

12. When the installation completes, click the Disable path length limit option
at the bottom and then click Close.

13. If you have a command prompt open, restart it. Verify the installation by
checking the version of Python:

python --version

The output should print Python 3.8.3.

Note: For detailed instructions on how to install Python 3 on Windows or how to
troubleshoot potential issues, refer to our Install Python 3 on Windows guide.


STEP 3: DOWNLOAD APACHE SPARK

1. Open a browser and navigate to https://spark.apache.org/downloads.html.

2. Under the Download Apache Spark heading, there are two drop-down menus. Use
the current non-preview version.

 * In our case, in Choose a Spark release drop-down menu select 2.4.5 (Feb 05
   2020).
 * In the second drop-down Choose a package type, leave the selection Pre-built
   for Apache Hadoop 2.7.

3. Click the spark-2.4.5-bin-hadoop2.7.tgz link.



4. A page with a list of mirrors loads where you can see different servers to
download from. Pick any from the list and save the file to your Downloads
folder.


STEP 4: VERIFY SPARK SOFTWARE FILE

1. Verify the integrity of your download by checking the checksum of the file.
This ensures you are working with unaltered, uncorrupted software.

2. Navigate back to the Spark Download page and open the Checksum link,
preferably in a new tab.

3. Next, open a command line and enter the following command:

certutil -hashfile c:\users\username\Downloads\spark-2.4.5-bin-hadoop2.7.tgz SHA512

4. Replace username with your Windows username. The system displays a long
alphanumeric code, along with the message Certutil: -hashfile completed successfully.



5. Compare the code to the one you opened in a new browser tab. If they match,
your download file is uncorrupted.
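If you prefer to script this check, the same SHA-512 digest can be computed with Python's standard hashlib module. This is a sketch: the path and the published digest in the commented usage are placeholders to replace with your own values.

```python
import hashlib

def sha512_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading in streaming chunks
    so even a large Spark archive does not need to fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical usage -- substitute your own path and the digest from the
# Checksum link on the Spark download page:
# published = "..."  # value copied from the Checksum page
# local = sha512_of_file(r"C:\Users\username\Downloads\spark-2.4.5-bin-hadoop2.7.tgz")
# print("OK" if local.lower() == published.lower() else "MISMATCH")
```

Comparing digests case-insensitively avoids false mismatches, since some tools print hex in uppercase.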


STEP 5: INSTALL APACHE SPARK

Installing Apache Spark involves extracting the downloaded file to the desired
location.

1. Create a new folder named Spark in the root of your C: drive. From a command
line, enter the following:

cd \

mkdir Spark

2. In Explorer, locate the Spark file you downloaded.

3. Right-click the file and extract it to C:\Spark using the tool you have on
your system (e.g., 7-Zip).

4. Now, your C:\Spark folder has a new folder spark-2.4.5-bin-hadoop2.7 with the
necessary files inside.
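If you would rather not use a GUI extraction tool, Python's standard tarfile module can unpack the .tgz archive as well. A sketch, with the paths in the commented usage being examples to adjust for your system:

```python
import tarfile

def extract_spark(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive (such as the Spark download)
    into dest_dir, preserving the archive's internal folder layout."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(path=dest_dir)

# Hypothetical usage -- adjust the paths to match your system:
# extract_spark(r"C:\Users\username\Downloads\spark-2.4.5-bin-hadoop2.7.tgz",
#               r"C:\Spark")
```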


STEP 6: ADD WINUTILS.EXE FILE

Download the winutils.exe file that matches the underlying Hadoop version of
the Spark package you downloaded (in this example, Hadoop 2.7).

1. Navigate to https://github.com/cdarlint/winutils and, inside the bin folder
for your Hadoop version, locate winutils.exe and click it.



2. Find the Download button on the right side to download the file.

3. Now, create the folder structure C:\hadoop\bin using Windows Explorer or
the Command Prompt (for example, mkdir C:\hadoop\bin).

4. Copy the winutils.exe file from the Downloads folder to C:\hadoop\bin.


STEP 7: CONFIGURE ENVIRONMENT VARIABLES

Configuring environment variables in Windows adds the Spark and Hadoop locations
to your system PATH. It allows you to run the Spark shell directly from a
command prompt window.

1. Click Start and type environment.

2. Select the result labeled Edit the system environment variables.

3. A System Properties dialog box appears. In the lower-right corner, click
Environment Variables and then click New in the next window.



4. For Variable Name type SPARK_HOME.

5. For Variable Value type C:\Spark\spark-2.4.5-bin-hadoop2.7 and click OK. If
you changed the folder path, use that one instead.



6. In the top box, click the Path entry, then click Edit. Be careful with
editing the system path. Avoid deleting any entries already on the list.



7. You should see a box with entries on the left. On the right, click New.

8. The system highlights a new line. Enter the path to the Spark folder
C:\Spark\spark-2.4.5-bin-hadoop2.7\bin. We recommend using %SPARK_HOME%\bin to
avoid possible issues with the path.



9. Repeat this process for Hadoop and Java.

 * For Hadoop, the variable name is HADOOP_HOME and the value is the path of
   the folder you created earlier: C:\hadoop. In the Path variable, add
   C:\hadoop\bin (or, preferably, %HADOOP_HOME%\bin).
 * For Java, the variable name is JAVA_HOME and for the value use the path to
   your Java JDK directory (in our case it’s C:\Program
   Files\Java\jdk1.8.0_251).

10. Click OK to close all open windows.

Note: Start by restarting the Command Prompt to apply the changes. If that
doesn't work, reboot the system.
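To confirm the variables took effect in a fresh shell, a small Python check can report anything still missing. This is a sketch: the expected values mirror the paths used in this guide and will differ if you chose other locations.

```python
import os

# Paths as used in this guide -- adjust if you installed elsewhere.
REQUIRED = {
    "SPARK_HOME": r"C:\Spark\spark-2.4.5-bin-hadoop2.7",
    "HADOOP_HOME": r"C:\hadoop",
    "JAVA_HOME": r"C:\Program Files\Java\jdk1.8.0_251",
}

def missing_vars(env):
    """Return the names of required variables that are absent or empty
    in the given environment mapping."""
    return [name for name in REQUIRED if not env.get(name)]

# Hypothetical usage in a fresh Command Prompt:
# print(missing_vars(os.environ) or "All Spark-related variables are set.")
```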


STEP 8: LAUNCH SPARK

1. Open a new Command Prompt window by right-clicking it and selecting Run as
administrator.

2. To start Spark, enter:

C:\Spark\spark-2.4.5-bin-hadoop2.7\bin\spark-shell

If you set the environment path correctly, you can type spark-shell to launch
Spark.

3. The system should display several lines indicating the status of the
application. You may get a Java pop-up. Select Allow access to continue.

Finally, the Spark logo appears, and the prompt displays the Scala shell.



4. Open a web browser and navigate to http://localhost:4040/.

5. You can replace localhost with the name of your system.

6. You should see an Apache Spark shell Web UI. The example below shows the
Executors page.



7. To exit Spark and close the Scala shell, press ctrl-d in the command-prompt
window.

Note: If you installed Python, you can run Spark using Python with this command:


pyspark


Exit using quit().


TEST SPARK

In this example, we will launch the Spark shell and use Scala to read the
contents of a file. You can use an existing file, such as the README file in the
Spark directory, or you can create your own. We created pnaptest with some text.

1. Open a command-prompt window and navigate to the folder with the file you
want to use and launch the Spark shell.

2. First, declare a variable to use in the Spark context, holding the name of
the file. Remember to include the file extension if there is one.

val x = sc.textFile("pnaptest")

3. The output shows that an RDD is created. You can then view the file
contents by calling an action:

x.take(11).foreach(println)



This command instructs Spark to print 11 lines from the file you specified. To
transform the contents of x, create another value y by applying a map
transformation.

4. For example, you can print the characters in reverse with this command:

val y = x.map(_.reverse)

5. The system creates a child RDD in relation to the first one. Then, specify
how many lines you want to print from the value y:

y.take(11).foreach(println)



The output prints the first 11 lines of the pnaptest file with each line's
characters reversed.

When done, exit the shell using ctrl-d.
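The Scala snippet above maps _.reverse over every line and then takes 11 results. For intuition, here is the same logic in plain Python, without Spark (a sketch for comparison, not PySpark syntax):

```python
def reverse_lines(path, n=11):
    """Reverse the characters of each line (the map step), then return
    the first n results (the take step)."""
    with open(path) as f:
        reversed_lines = [line.rstrip("\n")[::-1] for line in f]
    return reversed_lines[:n]

# Hypothetical usage with the file created for this tutorial:
# reverse_lines("pnaptest")
```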

Conclusion

You should now have a working installation of Apache Spark on Windows 10 with
all dependencies installed. Get started running an instance of Spark in your
Windows environment.

Our suggestion is to also learn more about what Spark DataFrame is, the
features, and how to use Spark DataFrame when collecting data.

Goran Jevtic
Goran combines his leadership skills and passion for research, writing, and
technology as a Technical Writing Team Lead at phoenixNAP. Working with multiple
departments and on various projects, he has developed an extraordinary
understanding of cloud and virtualization technology trends and best practices.

© 2022 Copyright phoenixNAP | Global IT Services. All Rights Reserved.
