117.122.208.134
Public Scan
URL:
http://117.122.208.134:7180/static/help/topics/cdh_ports.html
Submission Tags: falconsandbox
Submission: On September 04 via api from US — Scanned from DE
Cloudera Enterprise 6.0.x
Page generated September 13, 2018. ©2016 Cloudera, Inc. All rights reserved.

Ports Used by CDH Components

All ports listed are TCP. In the following tables, Internal means that the port is used only for communication among the components; External means that the port can be used for either internal or external communication.

Table 1. External Ports

Component | Service | Port | Configuration | Comment
Apache Hadoop HDFS | DataNode | 9866 | dfs.datanode.address | DataNode data transfer port
Apache Hadoop HDFS | DataNode | 1004 | dfs.datanode.address | Secure DataNode (privileged port)
Apache Hadoop HDFS | DataNode | 9864 | dfs.datanode.http.address | DataNode HTTP server port
Apache Hadoop HDFS | DataNode | 9865 | dfs.datanode.https.address |
Apache Hadoop HDFS | DataNode | 1006 | dfs.datanode.http.address | Secure DataNode (privileged port)
Apache Hadoop HDFS | DataNode | 9867 | dfs.datanode.ipc.address |
Apache Hadoop HDFS | NameNode | 8020 | fs.default.name or fs.defaultFS | fs.default.name is deprecated (but still works)
Apache Hadoop HDFS | NameNode | 8022 | dfs.namenode.servicerpc-address | Optional port used by HDFS daemons to avoid sharing the RPC port used by clients (8020). Cloudera recommends using port 8022.
Apache Hadoop HDFS | NameNode | 9870 | dfs.http.address or dfs.namenode.http-address | dfs.http.address is deprecated (but still works)
Apache Hadoop HDFS | NameNode | 9871 | dfs.https.address or dfs.namenode.https-address | dfs.https.address is deprecated (but still works)
Apache Hadoop HDFS | NFS gateway | 2049 | nfs port (nfs3.server.port) |
Apache Hadoop HDFS | NFS gateway | 4242 | mountd port (nfs3.mountd.port) |
Apache Hadoop HDFS | NFS gateway | 111 | portmapper or rpcbind port |
Apache Hadoop HDFS | NFS gateway | 50079 | nfs.http.port | The NFS gateway daemon uses this port to serve metrics. The port is configurable in versions 5.10 and higher.
Apache Hadoop HDFS | NFS gateway | 50579 | nfs.https.port | The NFS gateway daemon uses this port to serve metrics. The port is configurable in versions 5.10 and higher.
Apache Hadoop HDFS | HttpFS | 14000 | |
Apache Hadoop HDFS | HttpFS | 14001 | |
Apache Hadoop YARN (MRv2) | ResourceManager | 8032 | yarn.resourcemanager.address |
Apache Hadoop YARN (MRv2) | ResourceManager | 8033 | yarn.resourcemanager.admin.address |
Apache Hadoop YARN (MRv2) | ResourceManager | 8088 | yarn.resourcemanager.webapp.address |
Apache Hadoop YARN (MRv2) | ResourceManager | 8090 | yarn.resourcemanager.webapp.https.address |
Apache Hadoop YARN (MRv2) | NodeManager | 8042 | yarn.nodemanager.webapp.address |
Apache Hadoop YARN (MRv2) | NodeManager | 8044 | yarn.nodemanager.webapp.https.address |
Apache Hadoop YARN (MRv2) | JobHistory Server | 19888 | mapreduce.jobhistory.webapp.address |
Apache Hadoop YARN (MRv2) | JobHistory Server | 19890 | mapreduce.jobhistory.webapp.https.address |
Apache Hadoop YARN (MRv2) | ApplicationMaster | Ephemeral | | The ApplicationMaster serves an HTTP service using an ephemeral port that cannot be restricted. Clients never access this port directly from outside the cluster; all requests to the ApplicationMaster web server are routed through the YARN ResourceManager proxy service. Locking down access to ephemeral port ranges within the cluster's network might restrict your access to the ApplicationMaster UI and its logs, along with the ability to look at running applications.
Apache Flume | Flume Agent | 41414 | |
Apache Hadoop KMS | Key Management Server | 16000 | kms_http_port | Applies to both Java KeyStore KMS and Key Trustee KMS.
Apache HBase | Master | 16000 | hbase.master.port | IPC
Apache HBase | Master | 16010 | hbase.master.info.port | HTTP
Apache HBase | RegionServer | 16020 | hbase.regionserver.port | IPC
Apache HBase | RegionServer | 16030 | hbase.regionserver.info.port | HTTP
Apache HBase | REST | 20550 | hbase.rest.port | The default REST port in HBase is 8080. Because this is a commonly used port, Cloudera Manager sets the default to 20550 instead.
Apache HBase | REST UI | 8085 | |
Apache HBase | Thrift Server | 9090 | | Pass -p <port> on CLI
Apache HBase | Thrift Server | 9095 | |
Apache HBase | | 9090 | | Pass --port <port> on CLI
Apache HBase | Lily HBase Indexer | 11060 | |
Apache Hive | Metastore | 9083 | |
Apache Hive | HiveServer2 | 10000 | hive.server2.thrift.port | The Beeline command interpreter requires that you specify this port on the command line. If you use an Oracle database, you must manually reserve this port. For more information, see Reserving Ports for HiveServer 2.
Apache Hive | HiveServer2 Web User Interface (UI) | 10002 | hive.server2.webui.port in hive-site.xml |
Apache Hive | WebHCat Server | 50111 | templeton.port |
Apache Hue | Server | 8888 | |
Apache Impala | Impala Daemon | 21000 | | Used by impala-shell and version 1.2 of the Cloudera ODBC driver to transmit commands and receive results.
Apache Impala | Impala Daemon | 21050 | | Used by applications such as Business Intelligence tools using JDBC, the Beeswax query editor in Hue, and version 2.0 or higher of the Cloudera ODBC driver to transmit commands and receive results.
Apache Impala | Impala Daemon | 25000 | | Impala web interface for administrators to monitor and troubleshoot.
Apache Impala | StateStore Daemon | 25010 | | StateStore web interface for administrators to monitor and troubleshoot.
Apache Impala | Catalog Daemon | 25020 | | Catalog service web interface for administrators to monitor and troubleshoot.
Apache Kafka | Broker | 9092 | port | The primary communication port used by producers and consumers; also used for inter-broker communication.
Apache Kafka | Broker | 9093 | ssl_port | A secured communication port used by producers and consumers; also used for inter-broker communication.
Apache Kudu | Master | 7051 | | Kudu Master RPC port
Apache Kudu | Master | 8051 | | Kudu Master HTTP server port
Apache Kudu | TabletServer | 7050 | | Kudu TabletServer RPC port
Apache Kudu | TabletServer | 8050 | | Kudu TabletServer HTTP server port
Apache Oozie | Oozie Server | 11000 | OOZIE_HTTP_PORT in oozie-env.sh | HTTP
Apache Oozie | Oozie Server | 11443 | | HTTPS
Apache Sentry | Sentry Server | 8038 | sentry.service.server.rpc-port |
Apache Sentry | Sentry Server | 51000 | sentry.service.web.port |
Apache Solr | Solr Server | 8983 | | All Solr-specific actions, update/query.
Apache Spark | Default Master RPC port | 7077 | |
Apache Spark | Default Worker RPC port | 7078 | |
Apache Spark | Default Master web UI port | 18080 | |
Apache Spark | Default Worker web UI port | 18081 | |
Apache Spark | History Server | 18088 | history.port |
Apache Sqoop | Metastore | 16000 | sqoop.metastore.server.port |
Apache ZooKeeper | Server (with CDH or Cloudera Manager) | 2181 | clientPort | Client port

Table 2. Internal Ports

Component | Service | Port | Configuration | Comment
Apache Hadoop HDFS | Secondary NameNode | 9868 | dfs.secondary.http.address or dfs.namenode.secondary.http-address | dfs.secondary.http.address is deprecated (but still works)
Apache Hadoop HDFS | Secondary NameNode | 9869 | dfs.secondary.https.address |
Apache Hadoop HDFS | JournalNode | 8485 | dfs.namenode.shared.edits.dir |
Apache Hadoop HDFS | JournalNode | 8480 | dfs.journalnode.http-address |
Apache Hadoop HDFS | JournalNode | 8481 | dfs.journalnode.https-address |
Apache Hadoop HDFS | Failover Controller | 8019 | | Used for NameNode HA
Apache Hadoop YARN (MRv2) | ResourceManager | 8030 | yarn.resourcemanager.scheduler.address |
Apache Hadoop YARN (MRv2) | ResourceManager | 8031 | yarn.resourcemanager.resource-tracker.address |
Apache Hadoop YARN (MRv2) | NodeManager | 8040 | yarn.nodemanager.localizer.address |
Apache Hadoop YARN (MRv2) | NodeManager | 8041 | yarn.nodemanager.address |
Apache Hadoop YARN (MRv2) | JobHistory Server | 10020 | mapreduce.jobhistory.address |
Apache Hadoop YARN (MRv2) | JobHistory Server | 10033 | mapreduce.jobhistory.admin.address |
Apache Hadoop YARN (MRv2) | Shuffle HTTP | 13562 | mapreduce.shuffle.port |
Apache Hadoop KMS | Key Management Server | 16001 | kms_admin_port | Applies to both Java KeyStore KMS and Key Trustee KMS.
Apache HBase | HQuorumPeer | 2181 | hbase.zookeeper.property.clientPort | HBase-managed ZooKeeper mode
Apache HBase | HQuorumPeer | 2888 | hbase.zookeeper.peerport | HBase-managed ZooKeeper mode
Apache HBase | HQuorumPeer | 3888 | hbase.zookeeper.leaderport | HBase-managed ZooKeeper mode
Apache Impala | Impala Daemon | 22000 | | Internal use only. Impala daemons use this port to communicate with each other.
Apache Impala | Impala Daemon | 23000 | | Internal use only. Impala daemons listen on this port for updates from the statestore daemon.
Apache Impala | StateStore Daemon | 24000 | | Internal use only. The statestore daemon listens on this port for registration/unregistration requests.
Apache Impala | Catalog Daemon | 23020 | | Internal use only. The catalog daemon listens on this port for updates from the statestore daemon.
Apache Impala | Catalog Daemon | 26000 | | Internal use only. The catalog service uses this port to communicate with the Impala daemons.
Apache Kafka | Broker | 9092 | port | The primary communication port used by producers and consumers; also used for inter-broker communication.
Apache Kafka | Broker | 9093 | ssl_port | A secured communication port used by producers and consumers; also used for inter-broker communication.
Apache Kafka | Broker | 9393 | jmx_port | Internal use only. Used for administration via JMX.
Apache Kafka | Broker | 9394 | kafka.http.metrics.port | Internal use only. The HTTP metric reporter listens on this port; it is used to retrieve metrics through HTTP instead of JMX.
Apache Kafka | MirrorMaker | 24042 | jmx_port | Internal use only. Used to administer the producer and consumer of the MirrorMaker.
Apache Solr | Solr Server | 8984 | | Solr administrative use.
Apache Spark | Shuffle service | 7337 | |
Apache ZooKeeper | Server (with CDH only) | 2888 | X in server.N=host:X:Y | Peer
Apache ZooKeeper | Server (with CDH only) | 3888 | Y in server.N=host:X:Y | Leader election
Apache ZooKeeper | Server (with CDH and Cloudera Manager) | 3181 | X in server.N=host:X:Y | Peer
Apache ZooKeeper | Server (with CDH and Cloudera Manager) | 4181 | Y in server.N=host:X:Y | Leader election
Apache ZooKeeper | ZooKeeper JMX port | 9010 | | ZooKeeper also uses another, randomly selected port for RMI (see the note that follows this table).

Because ZooKeeper uses a randomly selected RMI port in addition to JMX port 9010, you must do one of the following to allow Cloudera Manager to monitor ZooKeeper:
* Open up all ports when the connection originates from the Cloudera Manager Server.
* Do the following:
  1. Open a non-ephemeral port (such as 9011) in the firewall.
  2. Install Oracle Java 7u4 JDK or higher.
  3. Add the port configuration to the advanced configuration snippet, for example: -Dcom.sun.management.jmxremote.rmi.port=9011
  4. Restart ZooKeeper.
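The RMI workaround in the ZooKeeper JMX steps comes down to a single JVM flag. The following Python sketch builds that flag and rejects ports that fall in an ephemeral range. The helper name `zookeeper_rmi_opt` is hypothetical, and the range 32768 to 60999 is an assumption: it is the common Linux default for `net.ipv4.ip_local_port_range`, which your hosts may override.

```python
# Assumption: the common Linux default ephemeral port range (32768-60999),
# taken from net.ipv4.ip_local_port_range. Verify the value on your hosts.
EPHEMERAL_PORTS = range(32768, 61000)


def zookeeper_rmi_opt(rmi_port: int = 9011) -> str:
    """Return the JVM flag that pins ZooKeeper's JMX RMI port.

    Hypothetical helper: the result goes into the ZooKeeper advanced
    configuration snippet, and the same port must be opened in the firewall.
    """
    if not 1 <= rmi_port <= 65535:
        raise ValueError(f"invalid TCP port: {rmi_port}")
    if rmi_port in EPHEMERAL_PORTS:
        # A port the OS may hand out to outgoing connections is not a stable target.
        raise ValueError(f"port {rmi_port} is in the ephemeral range; choose a fixed port")
    return f"-Dcom.sun.management.jmxremote.rmi.port={rmi_port}"
```

With the default argument, `zookeeper_rmi_opt()` returns `-Dcom.sun.management.jmxremote.rmi.port=9011`, the same value as the example in step 3.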
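The ZooKeeper peer rows in Table 2 refer to the `server.N=host:X:Y` entries of a ZooKeeper ensemble definition. In ZooKeeper's configuration format, X is the port followers use to connect to the leader and Y is the port used for leader election. The sketch below renders those lines. The helper `zk_server_lines` and the host names are hypothetical; the port pairs are the Table 2 values (2888/3888 with CDH only, 3181/4181 with CDH and Cloudera Manager).

```python
def zk_server_lines(hosts, peer_port=2888, election_port=3888):
    """Render zoo.cfg-style server.N=host:X:Y lines, numbering servers from 1.

    X (peer_port) is used by followers to connect to the leader;
    Y (election_port) is used for leader election.
    """
    return [
        f"server.{n}={host}:{peer_port}:{election_port}"
        for n, host in enumerate(hosts, start=1)
    ]


# Hypothetical host names; the CDH-with-Cloudera-Manager pair from Table 2.
for line in zk_server_lines(
    ["zk1.example.com", "zk2.example.com", "zk3.example.com"],
    peer_port=3181,
    election_port=4181,
):
    print(line)  # e.g. server.1=zk1.example.com:3181:4181
```

Every server in an ensemble needs the same list, so generating it from one host list avoids numbering mismatches between hosts.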
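Because every port on this page is TCP, a plain TCP connect is a quick way to confirm that a port is reachable through any firewalls in between. The sketch below uses only Python's standard `socket` module. The helpers `tcp_port_open` and `check_ports` and the sample port selection are illustrative, and a successful connect shows only that something is listening on the port, not that it is the expected service.

```python
import socket

# A few External ports from Table 1 (illustrative, not exhaustive).
SAMPLE_EXTERNAL_PORTS = {
    "HDFS NameNode RPC": 8020,
    "HDFS NameNode web UI": 9870,
    "YARN ResourceManager web UI": 8088,
    "HiveServer2": 10000,
    "Impala Daemon (JDBC)": 21050,
}


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeouts, and DNS failures are all OSError subclasses.
        return False


def check_ports(host: str, ports: dict, timeout: float = 3.0) -> dict:
    """Map each service name to whether its port accepted a TCP connection."""
    return {name: tcp_port_open(host, port, timeout) for name, port in ports.items()}
```

For example, `check_ports("cdh-master.example.com", SAMPLE_EXTERNAL_PORTS)` (a hypothetical host) reports which of the sampled services accept connections from the machine running the check.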
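The Internal/External distinction at the top of this page maps onto firewall policy: External ports may be opened to client networks, while Internal ports only need to be reachable from other cluster hosts. The following minimal sketch turns two port lists into iptables rule strings. It assumes a default-deny INPUT policy configured elsewhere and a hypothetical cluster subnet of 10.0.0.0/24; the port lists are small samples from Tables 1 and 2, not the full set.

```python
# Assumption: replace with your cluster's actual network.
CLUSTER_CIDR = "10.0.0.0/24"

SAMPLE_EXTERNAL = [8020, 9870, 8088, 10000, 21050, 9092]  # from Table 1
SAMPLE_INTERNAL = [8485, 8019, 22000, 24000, 7337, 9092]  # from Table 2


def iptables_rules(external, internal, cluster_cidr):
    """Return iptables commands: External ports open to any source,
    Internal ports open only to cluster_cidr."""
    rules = []
    for port in sorted(set(external)):
        rules.append(f"iptables -A INPUT -p tcp --dport {port} -j ACCEPT")
    # Ports listed in both tables (for example Kafka 9092) are treated as External.
    for port in sorted(set(internal) - set(external)):
        rules.append(
            f"iptables -A INPUT -p tcp -s {cluster_cidr} --dport {port} -j ACCEPT"
        )
    return rules


for rule in iptables_rules(SAMPLE_EXTERNAL, SAMPLE_INTERNAL, CLUSTER_CIDR):
    print(rule)
```

As the ApplicationMaster row in Table 1 warns, rules like these should not block the ephemeral port range between cluster hosts, or the ApplicationMaster UI and its logs may become unreachable.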
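Several HDFS rows pair a deprecated property name that still works with its current replacement. When auditing older configuration files, a small rename table can help. The sketch below covers only the four deprecated keys named on this page; the helper `normalize_keys` is hypothetical, and its rule that an explicitly set current key wins over its deprecated alias is an assumption of this sketch, not documented Hadoop behavior.

```python
# Deprecated HDFS keys named in Tables 1 and 2, mapped to their current names.
DEPRECATED_KEYS = {
    "fs.default.name": "fs.defaultFS",
    "dfs.http.address": "dfs.namenode.http-address",
    "dfs.https.address": "dfs.namenode.https-address",
    "dfs.secondary.http.address": "dfs.namenode.secondary.http-address",
}


def normalize_keys(conf: dict) -> dict:
    """Return a copy of conf with deprecated keys renamed.

    Assumption for this sketch: if both the deprecated and the current key
    are present, the value of the current key is kept.
    """
    out = {}
    for key, value in conf.items():
        new_key = DEPRECATED_KEYS.get(key, key)
        if new_key != key and new_key in conf:
            continue  # current key is set explicitly; drop the deprecated alias
        out[new_key] = value
    return out
```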
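Table 1 notes that clients never reach the ApplicationMaster's ephemeral port directly; requests go through the ResourceManager's proxy service instead. Assuming the standard YARN web proxy path `/proxy/<application ID>/` served on the ResourceManager web port (8088, or 8090 for HTTPS), a proxied URL can be built as follows. The helper `am_proxy_url`, the application ID pattern, and the host name in the usage line are assumptions for illustration.

```python
import re
from typing import Optional

# Assumption: YARN application IDs look like application_<clusterTimestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")


def am_proxy_url(rm_host: str, app_id: str, https: bool = False,
                 rm_port: Optional[int] = None) -> str:
    """Build the ResourceManager proxy URL for an ApplicationMaster web UI."""
    if not APP_ID_PATTERN.fullmatch(app_id):
        raise ValueError(f"not a YARN application ID: {app_id!r}")
    if rm_port is None:
        # Defaults from Table 1: webapp.address 8088, webapp.https.address 8090.
        rm_port = 8090 if https else 8088
    scheme = "https" if https else "http"
    return f"{scheme}://{rm_host}:{rm_port}/proxy/{app_id}/"
```

For example, `am_proxy_url("rm.example.com", "application_1536800000000_0001")` returns `http://rm.example.com:8088/proxy/application_1536800000000_0001/`, so only the ResourceManager web port needs to be reachable from the client.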
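The HiveServer2 row states that Beeline needs this port (10000 by default) on the command line. Beeline takes it as part of a `jdbc:hive2://` connection URL passed with `-u`. The sketch below assembles that command as an argument list. The helper `beeline_command` and the host name are hypothetical, and clusters using Kerberos or TLS need extra URL parameters that this sketch does not add.

```python
import shlex


def beeline_command(host: str, port: int = 10000, database: str = "default") -> list:
    """Return argv for connecting Beeline to HiveServer2 (hive.server2.thrift.port)."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    return ["beeline", "-u", url]


# shlex.join (Python 3.8+) renders the argv as a copy-pasteable shell command.
print(shlex.join(beeline_command("hs2.example.com")))
# beeline -u jdbc:hive2://hs2.example.com:10000/default
```

Returning an argument list rather than a single string lets the same value be passed to `subprocess.run` without shell quoting issues.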
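The Kafka Broker rows distinguish the plaintext port (9092, `port`) from the TLS port (9093, `ssl_port`). On the client side, the choice shows up in the standard `bootstrap.servers` and `security.protocol` client properties. A minimal sketch, assuming hypothetical broker host names and no SASL/Kerberos (a Kerberized cluster would use `SASL_SSL` or `SASL_PLAINTEXT` instead); the helper `kafka_client_config` is illustrative.

```python
def kafka_client_config(brokers, use_ssl: bool = False) -> dict:
    """Build producer/consumer properties that target the broker port from Table 1."""
    port = 9093 if use_ssl else 9092  # ssl_port vs. port
    config = {"bootstrap.servers": ",".join(f"{b}:{port}" for b in brokers)}
    if use_ssl:
        # Truststore settings are also required in practice; omitted in this sketch.
        config["security.protocol"] = "SSL"
    return config
```

For example, `kafka_client_config(["kafka1.example.com", "kafka2.example.com"], use_ssl=True)` points clients at port 9093 on both brokers and selects the `SSL` protocol.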
to Enable Sensitive Data Redaction * How to Log a Security Support Case * How To Obtain and Deploy Keys and Certificates for TLS/SSL * How To Renew and Redistribute Certificates * How to Set Up a Gateway Host to Restrict Access to the Cluster * How To Set Up Access to Cloudera EDH or Cloudera Director (Microsoft Azure Marketplace) * How to Use Self-Signed Certificates for TLS * Troubleshooting Security Issues * Error Messages and Various Failures * Authentication and Kerberos Issues * HDFS Encryption Issues * Key Trustee KMS Encryption Issues * Troubleshooting TLS/SSL Issues in Cloudera Manager * YARN, MRv1, and Linux OS Security * TaskController Error Codes (MRv1) * ContainerExecutor Error Codes (YARN) * Cloudera Navigator Data Management * Cloudera Navigator Overview * Finding Specific Entities by Searching Metadata * Performing Actions on Entities * Cloudera Navigator Auditing * Using Audit Events to Understand Cluster Activity * Exploring Audit Data * Cloudera Navigator Audit Event Reports * Downloading HDFS Directory Access Permission Reports * Analytics: Data Stewardship Dashboard * Using Policies to Automate Metadata Tagging * Lineage * Using the Lineage View * Using Lineage to Display Table Schema * Generating Lineage Diagrams * Cloudera Navigator Business Metadata * Defining Managed Properties * Adding and Editing Metadata * Administration (Navigator Console) * Managing Metadata Storage with Purge * Administering Navigator User Roles * Navigator Configuration and Management * Accessing Navigator Data Management Logs * Backing Up Cloudera Navigator Data * Authentication and Authorization * Configuring Cloudera Navigator to work with Hue HA * Encryption (TLS/SSL) and Cloudera Navigator * Limiting Sensitive Data in Navigator Logs * Navigator Audit Server Management * Setting Up Navigator Audit Server * Enabling Audit and Log Collection for Services * Configuring Audit Server Properties * Adding Audit Filters * Monitoring Navigator Audit Service Health * 
Publishing Audit Events * Navigator Metadata Server Management * Setting Up Navigator Metadata Server * Navigator Metadata Server Tuning * Configuring and Managing Extraction * Hive and Impala Lineage Configuration * Configuring the Server for Policy Messages * Cloudera Navigator and the Cloud * Using Cloudera Navigator with Altus Clusters * Configuring Extraction for Altus Clusters on AWS * Using Cloudera Navigator with Amazon S3 * Configuring Extraction for Amazon S3 * Cloudera Navigator APIs * Navigator APIs Overview * Applying Metadata to HDFS and Hive Entities using the API * Using the Purge APIs for Metadata Maintenance Tasks * Cloudera Navigator Reference * Lineage Diagram Icons * Search Syntax and Properties * Service Audit Events * Service Metadata Entity Types * Metadata Policy Expressions * User Roles and Privileges Reference * Troubleshooting Navigator Data Management * CDH Component Guides * Apache Crunch Guide * Apache Flume Guide * Configuring Apache Flume * Configuring the Flume Properties File * Files Installed by the Flume RPM and Debian Packages * Configuring Flume Security with Kafka * Managing Flume * Running Flume * Supported Sources, Sinks, and Channels * Viewing the Flume Documentation * Apache HBase Guide * Configuration Settings for HBase * Accessing HBase by using the HBase Shell * HBase Online Merge * Using MapReduce with HBase * Configuring HBase Garbage Collection * Configuring the HBase Canary * Configuring the Blocksize for HBase * Configuring the HBase BlockCache * Configuring the HBase Scanner Heartbeat * Limiting the Speed of Compactions * Configuring and Using the HBase REST API * Configuring HBase MultiWAL Support * Storing Medium Objects (MOBs) in HBase * Configuring the Storage Policy for the Write-Ahead Log (WAL) * Using Azure Data Lake Store with HBase * Managing HBase * Starting and Stopping HBase * Accessing HBase by using the HBase Shell * Using HBase Command-Line Utilities * Checking and Repairing HBase Tables * Hedged 
Reads * Reading Data from HBase * HBase Filtering * Writing Data to HBase * Importing Data Into HBase * Exposing HBase Metrics to a Ganglia Server * Managing HBase Security * Troubleshooting HBase * Best Practices for Using Apache Hive in CDH * Apache Hive Changes in CDH 6.0 * Apache Hive Components Changes in CDH 6.0 * Apache Hive Components New Features in CDH 6.0 * Apache Hive Components Incompatible Changes in CDH 6.0 * Hive on Spark Changes in CDH 6.0 * Hive Unsupported Features in CDH 6.0 * Overview of Apache Hive Installation and Upgrade in CDH * Configuring Apache Hive in CDH * Configuring the Hive Metastore for CDH * Configuring HiveServer2 for CDH * Starting the Hive Metastore in CDH * Apache Hive File System Permissions in CDH * Starting, Stopping, and Using HiveServer2 in CDH * Using Apache Hive with HBase in CDH * Using the Hive Schema Tool in CDH * Installing Cloudera JDBC and ODBC Drivers on Clients in CDH * Setting HADOOP_MAPRED_HOME for Apache Hive in CDH * Configuring the Hive Metastore to Use HDFS High Availability in CDH * Using & Managing Apache Hive in CDH * Managing Hive Using Cloudera Manager * Overview of Ingesting and Querying Data with Apache Hive in CDH * Apache Parquet Tables with Hive in CDH * Running Apache Hive on Spark in CDH * Using HiveServer2 Web UI in CDH * Accessing Apache Hive Table Statistics in CDH * Managing Apache Hive User-Defined Functions (UDFs) in CDH * Configuring Transient Apache Hive ETL Jobs to Use the Amazon S3 Filesystem in CDH * How To Set Up a Shared Amazon RDS as Your Hive Metastore for CDH * Using Microsoft Azure Data Lake Store with Apache Hive in CDH * Tuning Apache Hive in CDH * Tuning Apache Hive on Spark in CDH * Tuning Apache Hive Performance on the Amazon S3 Filesystem in CDH * Configuring Apache Hive Metastore High Availability in CDH * Configuring HiveServer2 High Availability in CDH * Query Vectorization for Apache Hive in CDH * Overview of Apache Hive Data Replication in CDH * Overview of Apache 
Hive Security in CDH * Troubleshooting Apache Hive in CDH * Get Started with Hue * Hue Versions * Hue Installation & Upgrade * Hue Custom Databases * Connect Hue to MySQL or MariaDB * Connect Hue to PostgreSQL * Connect Hue to Oracle with Client Parcel * Connect Hue to Oracle with Client Package * Migrate Hue Database * Hue Custom Database Tutorial * How to Populate the Hue Database * Hue Administration * Hue Configuration Files and Safety-valves * Hue Logs and Paths * Hue User Permissions * Secure Hue Passwords with Scripts * Customize the Hue Web UI * Hue Security * Configure Hue for High Availability * Authenticate Hue Users with LDAP * Synchronize Hue with LDAP Server * Authenticate Hue Users with SAML * Authorize Hue User Groups with Sentry * Hue How-tos * How to Add a Hue Load Balancer * How to Enable SQL Editor Autocompleter in Hue * How to Enable and Use Governance-Based Data Discovery * How to Enable Usage-Based Query Assistance for Hue * How to Enable S3 Cloud Storage in Hue * How to Use S3 as Source or Sink in Hue * How to Run Hue Shell Commands * Hue Troubleshooting * Potential Misconfiguration Detected * Apache Impala - Interactive SQL * Impala Concepts and Architecture * Components of the Impala Server * Developing Impala Applications * How Impala Fits Into the Hadoop Ecosystem * Planning for Impala Deployment * Impala Requirements * Guidelines for Designing Impala Schemas * Impala Tutorials * Impala Administration * How to Configure Resource Management for Impala * Setting Timeout Periods for Daemons, Queries, and Sessions * Using Impala through a Proxy for High Availability * Managing Disk Space for Impala Data * Auditing Impala Operations * Viewing Lineage Information for Impala Data * Impala SQL Language Reference * Comments * Data Types * ARRAY Complex Type (CDH 5.5 or higher only) * BIGINT Data Type * BOOLEAN Data Type * CHAR Data Type (CDH 5.2 or higher only) * DECIMAL Data Type (CDH 6.0 / Impala 3.0 or higher only) * DOUBLE Data Type * FLOAT 
Data Type * INT Data Type * MAP Complex Type (CDH 5.5 or higher only) * REAL Data Type * SMALLINT Data Type * STRING Data Type * STRUCT Complex Type (CDH 5.5 or higher only) * TIMESTAMP Data Type * TINYINT Data Type * VARCHAR Data Type (CDH 5.2 or higher only) * Complex Types (CDH 5.5 or higher only) * Literals * SQL Operators * Impala Schema Objects and Object Names * Overview of Impala Aliases * Overview of Impala Databases * Overview of Impala Functions * Overview of Impala Identifiers * Overview of Impala Tables * Overview of Impala Views * Impala SQL Statements * DDL Statements * DML Statements * ALTER TABLE Statement * ALTER VIEW Statement * COMPUTE STATS Statement * CREATE DATABASE Statement * CREATE FUNCTION Statement * CREATE ROLE Statement (CDH 5.2 or higher only) * CREATE TABLE Statement * CREATE VIEW Statement * DELETE Statement (CDH 5.10 or higher only) * DESCRIBE Statement * DROP DATABASE Statement * DROP FUNCTION Statement * DROP ROLE Statement (CDH 5.2 or higher only) * DROP STATS Statement * DROP TABLE Statement * DROP VIEW Statement * EXPLAIN Statement * GRANT Statement (CDH 5.2 or higher only) * INSERT Statement * INVALIDATE METADATA Statement * LOAD DATA Statement * REFRESH Statement * REFRESH FUNCTIONS Statement * REVOKE Statement (CDH 5.2 or higher only) * SELECT Statement * Joins in Impala SELECT Statements * ORDER BY Clause * GROUP BY Clause * HAVING Clause * LIMIT Clause * OFFSET Clause * UNION Clause * Subqueries in Impala SELECT Statements * TABLESAMPLE Clause * WITH Clause * DISTINCT Operator * SET Statement * Query Options for the SET Statement * ABORT_ON_ERROR Query Option * ALLOW_UNSUPPORTED_FORMATS Query Option * APPX_COUNT_DISTINCT Query Option (CDH 5.2 or higher only) * BATCH_SIZE Query Option * BUFFER_POOL_LIMIT Query Option * COMPRESSION_CODEC Query Option (CDH 5.2 or higher only) * COMPUTE_STATS_MIN_SAMPLE_SIZE Query Option * DEBUG_ACTION Query Option * DECIMAL_V2 Query Option * DEFAULT_JOIN_DISTRIBUTION_MODE Query Option * 
DEFAULT_SPILLABLE_BUFFER_SIZE Query Option * DISABLE_CODEGEN Query Option * DISABLE_CODEGEN_ROWS_THRESHOLD Query Option (CDH 5.13 / Impala 2.10 or higher only) * DISABLE_ROW_RUNTIME_FILTERING Query Option (CDH 5.7 or higher only) * DISABLE_STREAMING_PREAGGREGATIONS Query Option (CDH 5.7 or higher only) * DISABLE_UNSAFE_SPILLS Query Option (CDH 5.2 or higher only) * ENABLE_EXPR_REWRITES Query Option * EXEC_SINGLE_NODE_ROWS_THRESHOLD Query Option (CDH 5.3 or higher only) * EXEC_TIME_LIMIT_S Query Option (CDH 5.15 / Impala 2.12 or higher only) * EXPLAIN_LEVEL Query Option * HBASE_CACHE_BLOCKS Query Option * HBASE_CACHING Query Option * IDLE_SESSION_TIMEOUT Query Option (CDH 5.15 / Impala 2.12 or higher only) * LIVE_PROGRESS Query Option (CDH 5.5 or higher only) * LIVE_SUMMARY Query Option (CDH 5.5 or higher only) * MAX_ERRORS Query Option * MAX_NUM_RUNTIME_FILTERS Query Option (CDH 5.7 or higher only) * MAX_ROW_SIZE Query Option * MAX_SCAN_RANGE_LENGTH Query Option * MEM_LIMIT Query Option * MIN_SPILLABLE_BUFFER_SIZE Query Option * MT_DOP Query Option * NUM_NODES Query Option * NUM_SCANNER_THREADS Query Option * OPTIMIZE_PARTITION_KEY_SCANS Query Option (CDH 5.7 or higher only) * PARQUET_COMPRESSION_CODEC Query Option * PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (CDH 5.8 or higher only) * PARQUET_ARRAY_RESOLUTION Query Option (CDH 5.12 or higher only) * PARQUET_DICTIONARY_FILTERING Query Option (CDH 5.12 or higher only) * PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (CDH 5.8 or higher only) * PARQUET_FILE_SIZE Query Option * PARQUET_READ_STATISTICS Query Option (CDH 5.12 or higher only) * PREFETCH_MODE Query Option (CDH 5.8 or higher only) * QUERY_TIMEOUT_S Query Option (CDH 5.2 or higher only) * REQUEST_POOL Query Option * REPLICA_PREFERENCE Query Option (CDH 5.9 or higher only) * RUNTIME_BLOOM_FILTER_SIZE Query Option (CDH 5.7 or higher only) * RUNTIME_FILTER_MAX_SIZE Query Option (CDH 5.8 or higher only) * RUNTIME_FILTER_MIN_SIZE Query Option (CDH 5.8 or higher 
only) * RUNTIME_FILTER_MODE Query Option (CDH 5.7 or higher only) * RUNTIME_FILTER_WAIT_TIME_MS Query Option (CDH 5.7 or higher only) * S3_SKIP_INSERT_STAGING Query Option (CDH 5.8 or higher only) * SCHEDULE_RANDOM_REPLICA Query Option (CDH 5.7 or higher only) * SCRATCH_LIMIT Query Option * SHUFFLE_DISTINCT_EXPRS Query Option * SUPPORT_START_OVER Query Option * SYNC_DDL Query Option * SHOW Statement * TRUNCATE TABLE Statement (CDH 5.5 or higher only) * UPDATE Statement (CDH 5.10 or higher only) * UPSERT Statement (CDH 5.10 or higher only) * USE Statement * Optimizer Hints in Impala * Impala Built-In Functions * Impala Mathematical Functions * Impala Bit Functions * Impala Type Conversion Functions * Impala Date and Time Functions * Impala Conditional Functions * Impala String Functions * Impala Miscellaneous Functions * Impala Aggregate Functions * APPX_MEDIAN Function * AVG Function * COUNT Function * GROUP_CONCAT Function * MAX Function * MIN Function * NDV Function * STDDEV, STDDEV_SAMP, STDDEV_POP Functions * SUM Function * VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP Functions * Impala Analytic Functions * Impala User-Defined Functions (UDFs) * SQL Differences Between Impala and Hive * Porting SQL from Other Database Systems to Impala * Using the Impala Shell (impala-shell Command) * impala-shell Configuration Options * Connecting to impalad through impala-shell * Running Commands and SQL Statements in impala-shell * impala-shell Command Reference * Tuning Impala for Performance * Impala Performance Guidelines and Best Practices * Performance Considerations for Join Queries * Table and Column Statistics * Benchmarking Impala Queries * Controlling Impala Resource Usage * Runtime Filtering for Impala Queries (CDH 5.7 or higher only) * Using HDFS Caching with Impala (CDH 5.3 or higher only) * Testing Impala Performance * Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles * Detecting and Correcting HDFS Block Skew Conditions * 
Scalability Considerations for Impala * Partitioning for Impala Tables * How Impala Works with Hadoop File Formats * Using Text Data Files with Impala Tables * Using the Parquet File Format with Impala Tables * Using the Avro File Format with Impala Tables * Using the RCFile File Format with Impala Tables * Using the SequenceFile File Format with Impala Tables * Using Impala to Query Kudu Tables * Using Impala to Query HBase Tables * Using Impala with the Amazon S3 Filesystem * Specifying Impala Credentials to Access Data in S3 with Cloudera Manager * Specifying Impala Credentials to Access Data in S3 * Using Impala with the Azure Data Lake Store (ADLS) * Using Impala with Isilon Storage * Using Impala Logging * Troubleshooting Impala * Impala Web User Interface for Debugging * Breakpad Minidumps for Impala (CDH 5.8 or higher only) * Ports Used by Impala * Impala Reserved Words * Impala Frequently Asked Questions * Apache Kafka Guide * Kafka Setup * Kafka in Cloudera Manager * Kafka Clients * Kafka Brokers * Kafka Integration * Kafka Security * Managing Multiple Kafka Versions * Managing Topics across Multiple Kafka Clusters * Setting up an End-to-End Data Streaming Pipeline * Developing Kafka Clients * Kafka Metrics * Kafka Administration * Kafka Performance Tuning * Kafka Tuning: Handling Large Messages * Kafka Cluster Sizing * Kafka Performance Broker Configuration * Kafka Performance: System-Level Broker Tuning * Kafka-ZooKeeper Performance Tuning * Kafka Reference * Metrics Reference * Useful Shell Command Reference * Kafka Public APIs * Kafka Frequently Asked Questions * Apache Kudu Guide * Apache Kudu Concepts and Architecture * Apache Kudu Usage Limitations * Apache Kudu Installation and Upgrade * Apache Kudu Configuration * Apache Kudu Administration * Developing Applications With Apache Kudu * Using Apache Impala with Kudu * Apache Kudu Schema Design * Apache Kudu Transaction Semantics * Apache Kudu Background Maintenance Tasks * Troubleshooting Apache 
Kudu * More Resources for Apache Kudu * Oozie Guide * Configuration * Configuring an External Database for Oozie * Oozie High Availability * Configuring Other CDH Components to Use HDFS HA * Oozie Authentication * Using Sqoop Actions with Oozie * Configuring Oozie to Enable MapReduce Jobs To Read/Write from Amazon S3 * Configuring Oozie to Enable MapReduce Jobs To Read/Write from Microsoft Azure (ADLS) * Managing Oozie * Starting, Stopping, and Accessing the Oozie Server * Adding the Oozie Service Using Cloudera Manager * Redeploying the Oozie ShareLib * Configuring Oozie Data Purge Settings Using Cloudera Manager * Dumping and Loading an Oozie Database Using Cloudera Manager * Adding Schema to Oozie Using Cloudera Manager * Enabling the Oozie Web Console on Managed Clusters * Enabling Oozie SLA with Cloudera Manager * Setting the Oozie Database Timezone * Scheduling in Oozie Using Cron-like Syntax * Search Guide * Cloudera Search Overview * Understanding Cloudera Search * Cloudera Search and Other Cloudera Components * Cloudera Search Architecture * Cloudera Search Tasks and Processes * Cloudera Search Tutorial * Validating the Cloudera Search Deployment * Preparing to Index Sample Tweets with Cloudera Search * Using MapReduce Batch Indexing to Index Sample Tweets * Near Real Time (NRT) Indexing Tweets Using Flume * Using Hue with Cloudera Search * Deployment Planning for Cloudera Search * Schemaless Mode Overview and Best Practices * Deploying Cloudera Search * Using Search through a Proxy for High Availability * Using Custom JAR Files with Search * Cloudera Search Security * Managing Cloudera Search * Managing Cloudera Search Configuration * Managing Collections in Cloudera Search * solrctl Reference * Example solrctl Usage * Migrating Solr Replicas * Backing Up and Restoring Cloudera Search * Extracting, Transforming, and Loading Data With Cloudera Morphlines * Example Morphline Usage * Indexing Data Using Cloudera Search * Near Real Time Indexing Using 
Cloudera Search * Near Real Time Indexing Using Flume * Flume MorphlineSolrSink Configuration Options * Flume MorphlineInterceptor Configuration Options * Flume Solr UUIDInterceptor Configuration Options * Flume Solr BlobHandler Configuration Options * Flume Solr BlobDeserializer Configuration Options * Lily HBase Near Real Time Indexing for Cloudera Search * Using the Lily HBase NRT Indexer Service * Configuring Lily HBase Indexer Security * Batch Indexing Using Cloudera Search * Spark Indexing * MapReduce Indexing * MapReduceIndexerTool * Lily HBase Batch Indexing for Cloudera Search * Cloudera Search Frequently Asked Questions * Troubleshooting Cloudera Search * Cloudera Search Configuration and Log Files * Identifying Problems in Your Cloudera Search Deployment * Apache Sentry Guide * Before You Install Sentry * Installing and Upgrading the Sentry Service * Configuring the Sentry Service * Sentry High Availability * Enabling Sentry Authorization for Impala * Configuring Sentry Authorization for Cloudera Search * Managing the Sentry Service * Synchronizing HDFS ACLs and Sentry Permissions * Authorization Privilege Model for Hive and Impala * Authorization Privilege Model for Cloudera Search * Hive SQL Syntax for Use with Sentry * Using the Sentry Web Server * Sentry Debugging and Failure Scenarios * Troubleshooting Sentry * Sentry How-To Guides * How To Enable Sentry High Availability * How to Verify that HDFS ACLs are Synching with Sentry * Using Sentry to Manage Table Access in Hue * Spark Guide * Running Your First Spark Application * Troubleshooting for Spark * Frequently Asked Questions about Apache Spark in CDH * Spark Application Overview * Developing Spark Applications * Developing and Running a Spark WordCount Application * Using Spark Streaming * Using Spark SQL * Using Spark MLlib * Accessing External Storage from Spark * Accessing Data Stored in Amazon S3 through Spark * Accessing Data Stored in Azure Data Lake Store (ADLS) through Spark * Accessing 
Avro Data Files From Spark SQL Applications * Accessing Parquet Files From Spark SQL Applications * Building Spark Applications * Configuring Spark Applications * Running Spark Applications * Running Spark Applications on YARN * Using PySpark * Running Spark Python Applications * Spark and IPython and Jupyter Notebooks * Tuning Apache Spark Applications * Spark and Hadoop Integration * Building and Running a Crunch Application with Spark * File Formats and Compression * Using Apache Parquet Data Files with CDH * Using Apache Avro Data Files with CDH * Data Compression * Snappy Compression * Cloudera Glossary No search has been performed. Search