URL: https://docs.aws.amazon.com/emr/latest/ManagementGuide/whats-new-in-console.html

WHAT'S NEW WITH THE CONSOLE?

Amazon EMR has migrated to a new experience. The new console offers an updated
interface that provides you with an intuitive way to manage your Amazon EMR
environment and gives you convenient access to documentation, product
information, and other resources. This page describes important differences
between the old console experience and the new AWS Management Console for Amazon
EMR.


WHAT CONSOLE AM I IN?

To determine the Amazon EMR console that you currently use, view the URL for the
console page in your browser:

 * New console URL – https://console.aws.amazon.com/emr

 * Old console URL – https://console.aws.amazon.com/elasticmapreduce
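The URL check above can be sketched in a few lines of Python. This is only an illustration: the function name `which_emr_console` is hypothetical, and the sketch assumes the first path segment alone distinguishes the two consoles, as the two URLs listed above suggest. Real console URLs may carry extra host prefixes, paths, or fragments that this sketch does not validate.

```python
from urllib.parse import urlparse


def which_emr_console(url: str) -> str:
    """Classify a console URL as "new", "old", or "unknown".

    Assumption: only the first path segment matters ("emr" for the new
    console, "elasticmapreduce" for the old one). The host is not checked.
    """
    path = urlparse(url).path
    # Drop empty segments produced by leading or trailing slashes.
    segments = [segment for segment in path.split("/") if segment]
    first = segments[0] if segments else ""
    if first == "emr":
        return "new"
    if first == "elasticmapreduce":
        return "old"
    return "unknown"
```

For example, `which_emr_console("https://console.aws.amazon.com/emr")` yields `"new"`, while a URL whose path starts with `/elasticmapreduce` yields `"old"`.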

NOTE

The Amazon EMR console defaults to the new Amazon EMR console experience. To use
the old console, select Switch to the old console from the banner at the top of
the console or from the side navigation. Amazon EMR will remember your
preference to use the old console for 8 hours. After the time limit expires, the
experience will once again default to the new console. You can continue to
access and use the old console with the in-app Switch to the old console link
until we deprecate the old console starting September 30, 2023.

The Amazon EMR console functionality is migrating to the new experience in
phases. The following table lists the main Amazon EMR console components and
their console migration status.

Amazon EMR console component                                   New console   Old console
EMR Studio [1]                                                 ✔             ✔
Create and manage clusters                                     ✔             ✔
Block public access                                            ✔             ✔
Monitor Amazon CloudWatch Events                               ✔             ✔
Security configurations                                        ✔             ✔
Virtual clusters (Amazon EMR on EKS)                           ✔             ✔
View and manage your Amazon Virtual Private Cloud subnets [2]  ✔             ✔
Notebooks [3]                                                  ✔             ✔

[1] EMR Studio uses the new interface experience in both the new and old
consoles.

[2] In the new console, you can view and manage your Amazon VPC subnets within
the Networking section when you create a cluster. In the old console, use the
link in the left-hand navigation bar to access the list of Amazon VPC subnets.

[3] EMR Notebooks are available as EMR Studio Workspaces in the new console. You
can still use your existing notebooks in the old console, but you can't create
new notebooks in the old console. The Create Workspace button in the new console
replaces this functionality. To access or create Workspaces, EMR Notebooks users
need additional IAM role permissions. For more information, see "Amazon EMR
Notebooks are Amazon EMR Studio Workspaces in new console" and "What's new in
the console?"



SUMMARY OF DIFFERENCES

This section outlines the differences between the old Amazon EMR console and the
new Amazon EMR console experiences. The differences fall into the following
categories:

 * Cluster compatibility between old and new console

 * Differences when you create clusters

 * Differences when you view or edit cluster details

 * Differences when you list and search for clusters

 * Differences when you work with security configurations


CLUSTER COMPATIBILITY BETWEEN OLD AND NEW CONSOLE

In some cases, a cluster that you created in the old Amazon EMR console might
not be compatible with the new console. The following list describes
compatibility requirements for the new Amazon EMR console.

 * The new console supports clusters created in Amazon EMR releases 5.20.1 and
   later.

 * You can clone clusters that use automatic scaling in the new console, but
   new clusters that you create in the new console can use only manual scaling
   or managed scaling.

To create and work with clusters that are not compatible with the new console,
you can use the AWS Command Line Interface (AWS CLI), the AWS SDK, or the old
console.
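The release requirement above can be checked programmatically before you decide which tool to use. The following is a minimal sketch, assuming release labels look like `emr-5.20.1` (an optional `emr-` prefix followed by three numeric parts); the function names are hypothetical and not part of any AWS SDK.

```python
import re

# Minimum release that the new console supports, per the list above.
MIN_NEW_CONSOLE_RELEASE = (5, 20, 1)


def parse_release_label(label: str) -> tuple:
    """Turn a label such as "emr-5.20.1" into a comparable tuple (5, 20, 1)."""
    match = re.fullmatch(r"(?:emr-)?(\d+)\.(\d+)\.(\d+)", label.strip())
    if match is None:
        raise ValueError(f"Unrecognized release label: {label!r}")
    return tuple(int(part) for part in match.groups())


def new_console_supports(label: str) -> bool:
    """Return True if the release is 5.20.1 or later."""
    # Tuples compare element by element, so (5, 9, 0) < (5, 20, 1).
    return parse_release_label(label) >= MIN_NEW_CONSOLE_RELEASE
```

Comparing numeric tuples rather than strings avoids the pitfall where `"5.9.0"` sorts after `"5.20.1"` lexically. A cluster on a release for which this returns `False` would need the AWS CLI, the AWS SDK, or the old console.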


DIFFERENCES WHEN YOU CREATE CLUSTERS

The following table highlights the differences that you can expect when you
create clusters with the new Amazon EMR console as opposed to the old Amazon EMR
console.

Terminology: Amazon EMR cluster node types
 * New console: Primary, core, task
 * Old console: Master, core, task

Amazon EMR supported releases [1]
 * New console: Amazon EMR release 5.20.1 and later
 * Old console: All Amazon EMR releases

Quickly launching a cluster
 * New console: Use the Create cluster button under the Summary panel
 * Old console: Use the Create cluster - Quick Options page

Configuring a Spot provisioning timeout
 * New console: Define a timeout period for provisioning instances for each
   fleet in your cluster.
 * Old console: You can't customize a provisioning timeout when you create a
   cluster.

Service roles and Amazon EC2 instance profile role
 * New console: The new console does not create default roles; you must create
   roles with the IAM console or select an already-created IAM role
 * Old console: Supports default role creation with v1 and v2 policies, or you
   can select an already-created IAM role

Cluster visibility
 * New console: From within the Amazon EMR console, you can't make a cluster
   visible to all users; your IAM policy determines cluster access
 * Old console: From within the Amazon EMR console, you can make a cluster
   visible to all users if you use the deprecated v1 role-creation policies

Networking - configure private subnets
 * New console: You must configure Amazon S3 endpoints and NAT gateways from
   their respective Amazon S3 and Amazon VPC consoles
 * Old console: You can configure Amazon S3 endpoints and NAT gateways directly
   from the Create cluster workflow in the old console

EMR File System consistent view (EMRFS CV)
 * New console: With the release of Amazon S3 strong read-after-write
   consistency on December 1, 2020, you don't need to use EMRFS CV with your
   EMR clusters
 * Old console: EMRFS CV is enabled, but you can turn off EMRFS CV and delete
   the Amazon DynamoDB database that it uses; see Consistent view for more
   information

Debugging
 * New console: You can debug jobs using the Application UI interface on the
   cluster details page
 * Old console: You can use a debugger tool (step 3 in advanced options) to
   debug jobs for clusters that run on Amazon EMR releases 4.1.0 through 5.27.0

[1] You can't create or edit clusters using releases earlier than Amazon EMR
5.20.1 in the new console, but any existing clusters created using releases
earlier than 5.20.1 will continue to work. To create and edit clusters with
Amazon EMR releases earlier than 5.20.1, use the API or CLI, or switch back to
the old console.


DIFFERENCES WHEN YOU LIST AND SEARCH FOR CLUSTERS

The following table highlights the differences that you can expect when you view
and search for clusters in the list view with the new Amazon EMR console as
opposed to the old Amazon EMR console.

NOTE

For both the old and new consoles, when you apply a data filter to the cluster
list, it queries the entire database. But when you enter a text string into the
search box, the search only applies to the results that the list has loaded
client side.
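The distinction in the note above can be illustrated with a small, self-contained model. This is a toy sketch and not how the console is implemented: the `Cluster` type, the sample cluster IDs and names, and both function names are hypothetical. It only shows why a text search can miss a cluster that a data filter would find.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    cluster_id: str
    name: str
    status: str


def apply_data_filter(all_clusters, status):
    """Models a data filter: evaluated against every stored cluster."""
    return [c for c in all_clusters if c.status == status]


def text_search(loaded_clusters, text):
    """Models the search box: sees only rows already loaded in the list."""
    needle = text.lower()
    return [
        c for c in loaded_clusters
        if needle in c.name.lower() or needle in c.cluster_id.lower()
    ]


# Hypothetical data: three clusters exist, but only two rows are loaded.
ALL_CLUSTERS = [
    Cluster("j-1", "nightly-etl", "TERMINATED"),
    Cluster("j-2", "adhoc-etl", "WAITING"),
    Cluster("j-3", "reporting", "WAITING"),
]
LOADED_ROWS = ALL_CLUSTERS[:2]
```

Here `apply_data_filter(ALL_CLUSTERS, "WAITING")` returns both waiting clusters, while `text_search(LOADED_ROWS, "reporting")` returns nothing because the matching row has not been loaded yet.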

Viewing cluster details
 * New console: You can select the Cluster ID to view exhaustive cluster
   details like configuration options, persistent application UIs, and logs.
 * Old console: You can expand and collapse each cluster row to view
   information like configuration details and to access links for cluster
   monitoring and logs.

Searching for clusters
 * New console: Use a single search field to enter text search queries and to
   create and apply data filters like "Status = Any active status".
 * Old console: Use a dropdown to refine the state of the clusters (Active,
   Terminated, Failed) and a separate field to enter a text search query.

Finding failed clusters
 * New console: To search for failed clusters, apply the filter Status =
   Terminated with errors.
 * Old console: To search for failed clusters, apply the filter Failed
   clusters.
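Outside the console, a similar lookup can be done through the Amazon EMR API. The sketch below assumes that the console filter "Terminated with errors" corresponds to the `TERMINATED_WITH_ERRORS` cluster state accepted by the `ListClusters` operation; that mapping is an assumption, not something this page states. The helper name `list_failed_cluster_ids` is hypothetical, and the client is passed in so the function itself does not create AWS connections.

```python
def list_failed_cluster_ids(emr_client):
    """Collect the IDs of clusters in the TERMINATED_WITH_ERRORS state.

    emr_client is expected to behave like boto3.client("emr"): it must
    provide get_paginator("list_clusters").
    """
    paginator = emr_client.get_paginator("list_clusters")
    cluster_ids = []
    # Paginate so that results beyond the first page are not missed.
    for page in paginator.paginate(ClusterStates=["TERMINATED_WITH_ERRORS"]):
        for cluster in page.get("Clusters", []):
            cluster_ids.append(cluster["Id"])
    return cluster_ids
```

With credentials and a Region configured, a call might look like `list_failed_cluster_ids(boto3.client("emr"))` after `import boto3`.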


DIFFERENCES WHEN YOU VIEW OR EDIT CLUSTER DETAILS

The following table highlights the differences that you can expect when you view
or edit the details for an existing cluster with the new Amazon EMR console as
opposed to the old Amazon EMR console.

Viewing the instances in your instance groups and instance fleets, along with
scaling, provisioning, resizing, and termination options
 * New console: View instance options and details in the Instances tab. View
   termination options in the Properties tab.
 * Old console: View instance configuration and termination options in the
   Hardware tab.

Viewing app UIs, logs, and configurations (Apache Spark UI, Spark History
service, Apache Tez UI, YARN timeline server)
 * New console: View cluster configurations in the Configurations tab. Launch a
   live, persistent, application UI to see the logs for an application from the
   Applications tab.
 * Old console: View cluster configurations in the Configurations tab. Launch a
   live, persistent, application UI to see the logs for an application from the
   Applications user interfaces tab. As of January 2023, high-level application
   history is no longer available.

Exporting a cluster to CLI
 * New console: Option available from cluster detail and list view Actions
   menus as "View command for cloning cluster"
 * Old console: Option available from cluster list view Actions menus as "AWS
   CLI Export"


DIFFERENCES WHEN YOU WORK WITH SECURITY CONFIGURATIONS

The following table highlights the differences that you can expect when you
configure security options with the new Amazon EMR console as opposed to the old
Amazon EMR console.

Capability                                              New console   Old console
Cloning security configurations                         ✔
Federated governance using Trino and Apache Ranger      ✔
Using a runtime role to submit work to a cluster [1]    ✔



Authorizing access to EMR File System (EMRFS) data

Amazon S3 access points

AWS Identity and Access Management (IAM) roles

AWS Lake Formation access controls

Runtime roles

SAML federation

[1] To pass a role during step submission, your cluster must use a security
configuration with an IAM permissions policy attached so that a user can pass
only the approved roles and your jobs can access Amazon EMR resources. For more
information, see Runtime roles for Amazon EMR steps.
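As a rough illustration of passing a runtime role at step submission through the API rather than the console, the sketch below builds the keyword arguments for the `AddJobFlowSteps` operation, which accepts an `ExecutionRoleArn` parameter. The helper name, the step name, and all argument values are hypothetical placeholders; the step layout (a `spark-submit` run through `command-runner.jar`) is one common pattern and is assumed here, not taken from this page.

```python
def build_runtime_role_step_request(cluster_id, execution_role_arn, script_uri):
    """Build keyword arguments for an AddJobFlowSteps call with a runtime role.

    All three inputs are caller-supplied placeholders; nothing is sent to AWS.
    """
    return {
        "JobFlowId": cluster_id,
        # The runtime role that the step runs as.
        "ExecutionRoleArn": execution_role_arn,
        "Steps": [
            {
                "Name": "example-spark-step",  # hypothetical step name
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", script_uri],
                },
            }
        ],
    }
```

The resulting dictionary could be passed to a boto3 EMR client as `client.add_job_flow_steps(**request)`. The call is expected to succeed only when the cluster's security configuration and the caller's IAM policy allow that role, as the footnote above describes.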

© 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.

