WHAT'S NEW WITH THE CONSOLE?

Amazon EMR has migrated to a new console experience. The new console offers an updated interface that gives you an intuitive way to manage your Amazon EMR environment, along with convenient access to documentation, product information, and other resources. This page describes the important differences between the old console experience and the new AWS Management Console for Amazon EMR.

WHAT CONSOLE AM I IN?
To determine which Amazon EMR console you are using, check the URL of the console page in your browser:

* New console URL: https://console.aws.amazon.com/emr
* Old console URL: https://console.aws.amazon.com/elasticmapreduce

The Amazon EMR console functionality is migrating to the new experience in phases. The following table lists the main Amazon EMR console components and their migration status.

Amazon EMR console component | New console | Old console
EMR Studio [1] | ✔ | ✔
Create and manage clusters | ✔ | ✔
Block public access | ✔ | ✔
Monitor Amazon CloudWatch Events | ✔ | ✔
Security configurations | ✔ | ✔
Virtual clusters (Amazon EMR on EKS) | ✔ | ✔
View and manage your Amazon Virtual Private Cloud subnets [2] | ✔ | ✔
Notebooks [3] | ✔ | ✔

[1] EMR Studio uses the new interface experience in both the new and old consoles.
[2] In the new console, you can view and manage your Amazon VPC subnets in the Networking section when you create a cluster. In the old console, use the link in the left-hand navigation bar to access the list of Amazon VPC subnets.
[3] EMR Notebooks are available as EMR Studio Workspaces in the new console. You can still use your existing notebooks in the old console, but you can't create new notebooks there; the Create Workspace button in the new console replaces this functionality. To access or create Workspaces, EMR Notebooks users need additional IAM role permissions.
For more information, see Amazon EMR Notebooks are Amazon EMR Studio Workspaces in new console and What's new in the console?

USING THE OLD CONSOLE

The Amazon EMR console defaults to the new console experience. To use the old console, select Switch to the old console from the banner at the top of the console or from the side navigation. Amazon EMR remembers your preference to use the old console for 8 hours; after that, the experience defaults to the new console again. You can continue to access and use the old console through the in-app Switch to the old console link until the old console's deprecation begins on September 30, 2023.

SUMMARY OF DIFFERENCES

This section outlines the differences between the old and new Amazon EMR console experiences. The differences fall into the following categories:

* Cluster compatibility between old and new console
* Differences when you create clusters
* Differences when you view or edit cluster details
* Differences when you list and search for clusters
* Differences when you work with security configurations

CLUSTER COMPATIBILITY BETWEEN OLD AND NEW CONSOLE

In some cases, a cluster that you created in the old Amazon EMR console might not be compatible with the new console. The new console has the following compatibility requirements:

* The new console supports clusters created with Amazon EMR releases 5.20.1 and later.
* In the new console, you can clone clusters that use automatic scaling, but any new cluster you create must use either manual scaling or managed scaling.

To create and work with clusters that are not compatible with the new console, use the AWS Command Line Interface (AWS CLI), the AWS SDK, or the old console.

DIFFERENCES WHEN YOU CREATE CLUSTERS

The following table highlights the differences that you can expect when you create clusters with the new Amazon EMR console as opposed to the old Amazon EMR console.
Capability | New console | Old console
Terminology: Amazon EMR cluster node types | Primary, core, task | Master, core, task
Amazon EMR supported releases [1] | Amazon EMR release 5.20.1 and later | All Amazon EMR releases
Quickly launching a cluster | Use the Create cluster button under the Summary panel | Use the Create cluster - Quick Options page
Configuring a Spot provisioning timeout | Define a timeout period for provisioning instances for each fleet in your cluster | You can't customize a provisioning timeout when you create a cluster
Service roles and Amazon EC2 instance profile role | The new console does not create default roles; you must create roles with the IAM console or select an existing IAM role | Supports default role creation with v1 and v2 policies, or you can select an existing IAM role
Cluster visibility | You can't make a cluster visible to all users from within the Amazon EMR console; your IAM policy determines cluster access | From within the Amazon EMR console, you can make a cluster visible to all users if you use the deprecated v1 role-creation policies
Networking: configure private subnets | You must configure Amazon S3 endpoints and NAT gateways from their respective Amazon S3 and Amazon VPC consoles | You can configure Amazon S3 endpoints and NAT gateways directly from the Create cluster workflow
EMR File System consistent view (EMRFS CV) | Since Amazon S3 introduced strong read-after-write consistency on December 1, 2020, you don't need to use EMRFS CV with your EMR clusters | EMRFS CV is enabled, but you can turn it off and delete the Amazon DynamoDB table that it uses; see Consistent view for more information
Debugging | You can debug jobs using the Application UI on the cluster details page | You can use a debugger tool (step 3 in advanced options) to debug jobs for clusters that run on Amazon EMR releases 4.1.0 through 5.27.0

[1] You can't create or edit clusters that use releases earlier than Amazon EMR 5.20.1 in the new console, but existing clusters created with earlier releases continue to work. To create and edit clusters with Amazon EMR releases earlier than 5.20.1, use the API or AWS CLI, or switch back to the old console.

DIFFERENCES WHEN YOU LIST AND SEARCH FOR CLUSTERS

The following table highlights the differences that you can expect when you view and search for clusters in the list view with the new Amazon EMR console as opposed to the old Amazon EMR console.

NOTE: In both the old and new consoles, a data filter applied to the cluster list queries the entire database. A text string entered in the search box, however, searches only the results that the list has already loaded client side.

Capability | New console | Old console
Viewing cluster details | Select the Cluster ID to view exhaustive cluster details such as configuration options, persistent application UIs, and logs | Expand and collapse each cluster row to view information such as configuration details and to access links for cluster monitoring and logs
Searching for clusters | Use a single search field to enter text search queries and to create and apply data filters such as "Status = Any active status" | Use a dropdown to filter clusters by state (Active, Terminated, Failed) and a separate field to enter a text search query
Finding failed clusters | Apply the filter Status = Terminated with errors | Apply the filter Failed clusters

DIFFERENCES WHEN YOU VIEW OR EDIT CLUSTER DETAILS

The following table highlights the differences that you can expect when you view or edit the details of an existing cluster with the new Amazon EMR console as opposed to the old Amazon EMR console.

Capability | New console | Old console
Viewing the instances in your instance groups and instance fleets, along with scaling, provisioning, resizing, and termination options | View instance options and details in the Instances tab. View termination options in the Properties tab. | View instance configuration and termination options in the Hardware tab.
Viewing app UIs, logs, and configurations (Apache Spark UI, Spark History Server, Apache Tez UI, YARN timeline server) | View cluster configurations in the Configurations tab. Launch a live, persistent application UI to see the logs for an application from the Applications tab. | View cluster configurations in the Configurations tab. Launch a live, persistent application UI to see the logs for an application from the Applications user interfaces tab. As of January 2023, high-level application history is no longer available.
Exporting a cluster to CLI | Available from the Actions menus in the cluster detail and list views as "View command for cloning cluster" | Available from the Actions menus in the cluster list view as "AWS CLI Export"

DIFFERENCES WHEN YOU WORK WITH SECURITY CONFIGURATIONS

The following table highlights the differences that you can expect when you configure security options with the new Amazon EMR console as opposed to the old Amazon EMR console.

Capability | New console | Old console
Cloning security configurations | ✔ |
Federated governance using Trino and Apache Ranger | ✔ |
Using a runtime role to submit work to a cluster [1] | ✔ |

Authorizing access to EMR File System (EMRFS) data: Amazon S3 access points; AWS Identity and Access Management (IAM) roles; AWS Lake Formation access controls; Runtime roles; SAML federation

[1] To pass a role during step submission, your cluster must use a security configuration with an IAM permissions policy attached, so that users can pass only approved roles and your jobs can access Amazon EMR resources. For more information, see Runtime roles for Amazon EMR steps.
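The two checks this page describes, which console a URL points to and whether a release works with the new console or the old console's debugger tool, can be sketched in a few lines of Python. This is a minimal, illustrative sketch: the function names are hypothetical, it assumes release labels in the `emr-X.Y.Z` form (for example `emr-6.10.0`), it treats regional console hostnames such as `us-east-1.console.aws.amazon.com` as equivalent (an assumption), and it does not call any AWS API.

```python
import re
from urllib.parse import urlparse

# Release thresholds stated on this page.
NEW_CONSOLE_MIN_RELEASE = (5, 20, 1)              # new console: 5.20.1 and later
OLD_DEBUGGER_RELEASES = ((4, 1, 0), (5, 27, 0))   # old-console debugger: 4.1.0 through 5.27.0


def which_emr_console(url: str) -> str:
    """Classify a console URL as 'new' (/emr), 'old' (/elasticmapreduce), or 'unknown'."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Assumption: regional subdomains of console.aws.amazon.com also count.
    if host != "console.aws.amazon.com" and not host.endswith(".console.aws.amazon.com"):
        return "unknown"
    segments = [part for part in parsed.path.split("/") if part]
    first = segments[0] if segments else ""
    return {"emr": "new", "elasticmapreduce": "old"}.get(first, "unknown")


def parse_release_label(label: str) -> tuple:
    """Turn a release label such as 'emr-6.10.0' into a comparable tuple (6, 10, 0)."""
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized release label: {label!r}")
    return tuple(int(part) for part in match.groups())


def new_console_supports(label: str) -> bool:
    """True if the new console can create or edit clusters on this release."""
    return parse_release_label(label) >= NEW_CONSOLE_MIN_RELEASE


def old_debugger_available(label: str) -> bool:
    """True if the old console's debugger tool applies to this release."""
    low, high = OLD_DEBUGGER_RELEASES
    # Tuples compare element by element, so this is an inclusive version range check.
    return low <= parse_release_label(label) <= high


if __name__ == "__main__":
    print(which_emr_console("https://console.aws.amazon.com/emr/home?region=us-east-1"))  # new
    print(which_emr_console("https://console.aws.amazon.com/elasticmapreduce"))           # old
    print(new_console_supports("emr-5.20.0"))    # False
    print(old_debugger_available("emr-5.27.0"))  # True
```

Comparing versions as integer tuples rather than strings avoids ordering mistakes such as `"5.9.0" > "5.20.1"`, which is true for string comparison but wrong for release ordering.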
© 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.