Effective URL: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.BestPractices.FastFailover.html
Submission: On November 04 via api from DE — Scanned from DE
FAST FAILOVER
WITH AMAZON AURORA POSTGRESQL

Following, you can learn how to make sure that failover occurs as fast as possible. To recover quickly after failover, you can use cluster cache management for your Aurora PostgreSQL DB cluster. For more information, see Fast recovery after failover with cluster cache management for Aurora PostgreSQL.

Some of the steps that you can take to make failover fast include the following:

* Set Transmission Control Protocol (TCP) keepalives with short time frames, to stop longer-running queries before the read timeout expires if there's a failure.
* Set the Java Domain Name System (DNS) cache timeouts aggressively. Doing this helps ensure that the Aurora read-only endpoint can properly cycle through read-only nodes on later connection attempts.
* Set the timeout variables used in the JDBC connection string as low as possible. Use separate connection objects for short- and long-running queries.
* Use the reader and writer Aurora endpoints that are provided to connect to the cluster.
* Use RDS API operations to test application response to server-side failures. Also, use a packet dropping tool to test application response to client-side failures.
* Use the AWS JDBC Driver for PostgreSQL (preview) to take full advantage of the failover capabilities of Aurora PostgreSQL. For more information about the AWS JDBC Driver for PostgreSQL and complete instructions for using it, see the AWS JDBC Driver for PostgreSQL GitHub repository.

These are covered in more detail following.

Topics

* Setting TCP keepalives parameters
* Configuring your application for fast failover
* Testing failover
* Fast failover example in Java

SETTING TCP KEEPALIVES PARAMETERS

When you set up a TCP connection, a set of timers is associated with the connection. When the keepalive timer reaches zero, a keepalive probe packet is sent to the connection endpoint. If the probe receives a reply, you can assume that the connection is still up and running.
Turning on TCP keepalive parameters and setting them aggressively ensures that if your client can't connect to the database, any active connections are quickly closed. The application can then connect to a new endpoint.

Make sure to set the following TCP keepalive parameters:

* tcp_keepalive_time controls the time, in seconds, after which a keepalive packet is sent when no data has been sent by the socket. ACKs aren't considered data. We recommend the following setting:

  tcp_keepalive_time = 1

* tcp_keepalive_intvl controls the time, in seconds, between sending subsequent keepalive packets after the initial packet is sent. (The initial packet is sent after the time set by the tcp_keepalive_time parameter.) We recommend the following setting:

  tcp_keepalive_intvl = 1

* tcp_keepalive_probes is the number of unacknowledged keepalive probes that occur before the application is notified. We recommend the following setting:

  tcp_keepalive_probes = 5

These settings should notify the application within five seconds when the database stops responding. If keepalive packets are often dropped within the application's network, you can set a higher tcp_keepalive_probes value. Doing this allows for more buffer in less reliable networks, although it increases the time that it takes to detect an actual failure.

To set TCP keepalive parameters on Linux

1. Test how to configure your TCP keepalive parameters. We recommend doing so from the command line with the following commands. This suggested configuration is system-wide. In other words, it also affects all other applications that create sockets with the SO_KEEPALIVE option on.

   sudo sysctl net.ipv4.tcp_keepalive_time=1
   sudo sysctl net.ipv4.tcp_keepalive_intvl=1
   sudo sysctl net.ipv4.tcp_keepalive_probes=5

2.
After you've found a configuration that works for your application, persist these settings by adding the following lines to /etc/sysctl.conf, including any changes you made:

   tcp_keepalive_time = 1
   tcp_keepalive_intvl = 1
   tcp_keepalive_probes = 5

CONFIGURING YOUR APPLICATION FOR FAST FAILOVER

Following, you can find a discussion of several configuration changes for Aurora PostgreSQL that you can make for fast failover. To learn more about PostgreSQL JDBC driver setup and configuration, see the PostgreSQL JDBC Driver documentation.

Topics

* Reducing DNS cache timeouts
* Setting an Aurora PostgreSQL connection string for fast failover
* Other options for obtaining the host string

REDUCING DNS CACHE TIMEOUTS

When your application tries to establish a connection after a failover, the new Aurora PostgreSQL writer will be a previous reader. You can find it by using the Aurora read-only endpoint before DNS updates have fully propagated. Setting the Java DNS time to live (TTL) to a low value, such as under 30 seconds, helps the application cycle between reader nodes on later connection attempts.

// Sets the internal TTL to match the Aurora read-only endpoint TTL
java.security.Security.setProperty("networkaddress.cache.ttl", "1");
// If the lookup fails, cache the failure only briefly so that it's retried soon
java.security.Security.setProperty("networkaddress.cache.negative.ttl", "3");

SETTING AN AURORA POSTGRESQL CONNECTION STRING FOR FAST FAILOVER

To use Aurora PostgreSQL fast failover, make sure that your application's connection string has a list of hosts instead of just a single host. Following is an example connection string that you can use to connect to an Aurora PostgreSQL cluster. In this example, the host list contains both the cluster writer endpoint and the cluster reader endpoint.
jdbc:postgresql://myauroracluster.cluster-c9bfei4hjlrd.us-east-1-beta.rds.amazonaws.com:5432,
myauroracluster.cluster-ro-c9bfei4hjlrd.us-east-1-beta.rds.amazonaws.com:5432
/postgres?user=<primaryuser>&password=<primarypw>&loginTimeout=2
&connectTimeout=2&cancelSignalTimeout=2&socketTimeout=60
&tcpKeepAlive=true&targetServerType=primary

For best availability and to avoid a dependency on the RDS API, we recommend that you maintain a file to connect with. This file contains a host string that your application reads from when you establish a connection to the database. This host string has all the Aurora endpoints available for the cluster. For more information about Aurora endpoints, see Amazon Aurora connection management.

For example, you might store your endpoints in a local file as shown following.

myauroracluster.cluster-c9bfei4hjlrd.us-east-1-beta.rds.amazonaws.com:5432,
myauroracluster.cluster-ro-c9bfei4hjlrd.us-east-1-beta.rds.amazonaws.com:5432

Your application reads from this file to populate the host section of the JDBC connection string. Renaming the DB cluster causes these endpoints to change. Make sure that your application handles this event if it occurs.

Another option is to use a list of DB instance nodes, as follows.

my-node1.cksc6xlmwcyw.us-east-1-beta.rds.amazonaws.com:5432,
my-node2.cksc6xlmwcyw.us-east-1-beta.rds.amazonaws.com:5432,
my-node3.cksc6xlmwcyw.us-east-1-beta.rds.amazonaws.com:5432,
my-node4.cksc6xlmwcyw.us-east-1-beta.rds.amazonaws.com:5432

The benefit of this approach is that the PostgreSQL JDBC connection driver loops through all nodes on this list to find a valid connection. In contrast, when you use the Aurora endpoints, only two nodes are tried in each connection attempt. However, there's a downside to using DB instance nodes. If you add or remove nodes from your cluster and the list of instance endpoints becomes stale, the connection driver might never find the correct host to connect to.
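The file-based approach can be sketched in plain Java. This is a minimal illustration, not part of the AWS documentation: the class name and file path are hypothetical, and the query parameters reuse the fast-failover values from the example connection string.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class JdbcUrlBuilder {
    // Builds a multi-host JDBC URL from a list of host:port entries.
    // The timeout parameters mirror the fast-failover settings shown above.
    public static String buildUrl(List<String> hosts, String database) {
        String hostList = hosts.stream()
                .map(String::trim)
                .filter(h -> !h.isEmpty())
                .collect(Collectors.joining(","));
        return "jdbc:postgresql://" + hostList + "/" + database
                + "?loginTimeout=2&connectTimeout=2&cancelSignalTimeout=2"
                + "&socketTimeout=60&tcpKeepAlive=true&targetServerType=primary";
    }

    // Reads one host:port entry per line from a local file, for example
    // /etc/myapp/aurora-endpoints (a hypothetical path).
    public static String buildUrlFromFile(Path hostFile, String database) throws IOException {
        return buildUrl(Files.readAllLines(hostFile), database);
    }
}
```

Because the host list is read at connection time, replacing the file contents is enough to pick up a renamed cluster or a changed node list without redeploying the application.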
To help ensure that your application doesn't wait too long to connect to any one host, set the following parameters aggressively:

* targetServerType – Controls whether the driver connects to a write or read node. To ensure that your applications reconnect only to a write node, set the targetServerType value to primary. Values for the targetServerType parameter include primary, secondary, any, and preferSecondary. The preferSecondary value attempts to establish a connection to a reader first. It connects to the writer if no reader connection can be established.

* loginTimeout – Controls how long your application waits to log in to the database after a socket connection has been established.

* connectTimeout – Controls how long the socket waits to establish a connection to the database.

You can modify other application parameters to speed up the connection process, depending on how aggressive you want your application to be:

* cancelSignalTimeout – In some applications, you might want to send a "best effort" cancel signal on a query that has timed out. If this cancel signal is in your failover path, consider setting it aggressively to avoid sending this signal to a dead host.

* socketTimeout – Controls how long the socket waits for read operations. You can use this parameter as a global "query timeout" to ensure no query waits longer than this value. A good practice is to have two connection handlers. One connection handler runs short-lived queries and sets this value lower. Another connection handler, for long-running queries, has this value set much higher. With this approach, you can rely on TCP keepalive parameters to stop long-running queries if the server goes down.

* tcpKeepAlive – Turn on this parameter to ensure the TCP keepalive parameters that you set are respected.

* loadBalanceHosts – When set to true, this parameter has the application connect to a random host chosen from a list of candidate hosts.
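These settings can also be supplied as driver properties instead of URL query parameters. The following is a sketch only (the class name is illustrative; the values are the aggressive settings discussed above, and the keys follow the PostgreSQL JDBC driver's documented property names):

```java
import java.util.Properties;

public class FastFailoverProperties {
    // Returns connection properties tuned for fast failover: short
    // login/connect/cancel timeouts, a longer socket timeout for queries,
    // TCP keepalive turned on, and writer-only targeting.
    public static Properties forShortQueries() {
        Properties props = new Properties();
        props.setProperty("targetServerType", "primary");
        props.setProperty("loginTimeout", "2");
        props.setProperty("connectTimeout", "2");
        props.setProperty("cancelSignalTimeout", "2");
        props.setProperty("socketTimeout", "60");
        props.setProperty("tcpKeepAlive", "true");
        return props;
    }
}
```

You might pass such a Properties object to DriverManager.getConnection(url, props), keeping a second variant with a much higher socketTimeout for the long-running-query connection handler.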
OTHER OPTIONS FOR OBTAINING THE HOST STRING

You can get the host string from several sources, including the aurora_replica_status function and the Amazon RDS API.

In many cases, you need to determine who the writer of the cluster is, or to find other reader nodes in the cluster. To do this, your application can connect to any DB instance in the DB cluster and query the aurora_replica_status function. You can use this function to reduce the amount of time it takes to find a host to connect to. However, in certain network failure scenarios the aurora_replica_status function might show out-of-date or incomplete information.

A good way to ensure that your application can find a node to connect to is to try to connect to the cluster writer endpoint and then the cluster reader endpoint, until you can establish a readable connection. These endpoints don't change unless you rename your DB cluster. Thus, you can generally leave them as static members of your application or store them in a resource file that your application reads from.

After you establish a connection using one of these endpoints, you can get information about the rest of the cluster. To do this, call the aurora_replica_status function. For example, the following command retrieves information with aurora_replica_status.
postgres=> SELECT server_id, session_id, highest_lsn_rcvd,
           cur_replay_latency_in_usec, now(), last_update_timestamp
           FROM aurora_replica_status();

 server_id |              session_id              | highest_lsn_rcvd | cur_replay_latency_in_usec |              now              | last_update_timestamp
-----------+--------------------------------------+------------------+----------------------------+-------------------------------+------------------------
 mynode-1  | 3e3c5044-02e2-11e7-b70d-95172646d6ca |        594221001 |                     201421 | 2017-03-07 19:50:24.695322+00 | 2017-03-07 19:50:23+00
 mynode-2  | 1efdd188-02e4-11e7-becd-f12d7c88a28a |        594221001 |                     201350 | 2017-03-07 19:50:24.695322+00 | 2017-03-07 19:50:23+00
 mynode-3  | MASTER_SESSION_ID                    |                  |                            | 2017-03-07 19:50:24.695322+00 | 2017-03-07 19:50:23+00
(3 rows)

For example, the hosts section of your connection string might start with both the writer and reader cluster endpoints, as shown following.

myauroracluster.cluster-c9bfei4hjlrd.us-east-1-beta.rds.amazonaws.com:5432,
myauroracluster.cluster-ro-c9bfei4hjlrd.us-east-1-beta.rds.amazonaws.com:5432

In this scenario, your application attempts to establish a connection to any node type, primary or secondary. When your application is connected, a good practice is to first examine the read/write status of the node. To do this, query for the result of the command SHOW transaction_read_only.

If the return value of the query is OFF, then you successfully connected to the primary node. However, suppose that the return value is ON and your application requires a read/write connection. In this case, you can call the aurora_replica_status function to determine the server_id that has session_id='MASTER_SESSION_ID'. This function gives you the name of the primary node. You can use it with the endpointPostfix described following.

Be aware that you might connect to a replica that has stale data. When this happens, the aurora_replica_status function might show out-of-date information.
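The MASTER_SESSION_ID lookup described above is simple enough to sketch as plain Java. This is an illustration only: the ReplicaStatus class is a hypothetical stand-in for rows that a real application would read from a JDBC ResultSet over aurora_replica_status().

```java
import java.util.List;
import java.util.Optional;

public class WriterFinder {
    public static final String MASTER_SESSION_ID = "MASTER_SESSION_ID";

    // One row of aurora_replica_status() output, reduced to the two
    // columns the writer lookup needs.
    public static final class ReplicaStatus {
        public final String serverId;
        public final String sessionId;
        public ReplicaStatus(String serverId, String sessionId) {
            this.serverId = serverId;
            this.sessionId = sessionId;
        }
    }

    // Returns the server_id of the row whose session_id is
    // MASTER_SESSION_ID -- that is, the current primary node.
    public static Optional<String> findWriter(List<ReplicaStatus> rows) {
        return rows.stream()
                .filter(r -> MASTER_SESSION_ID.equals(r.sessionId))
                .map(r -> r.serverId)
                .findFirst();
    }
}
```

In the sample output above, this lookup would return mynode-3, whose name can then be combined with the endpointPostfix to form a connectable host string.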
You can set a threshold for staleness at the application level. To check this, you can look at the difference between the server time and the last_update_timestamp value. In general, your application should avoid flipping between two hosts due to conflicting information returned by the aurora_replica_status function. That is, your application should try all known hosts first instead of following the data returned by aurora_replica_status.

LISTING INSTANCES USING THE DESCRIBEDBCLUSTERS API OPERATION, EXAMPLE IN JAVA

You can programmatically find the list of instances by using the AWS SDK for Java, specifically the DescribeDBClusters API operation. Following is a small example of how you might do this in Java 8.

AmazonRDS client = AmazonRDSClientBuilder.defaultClient();

DescribeDBClustersRequest request = new DescribeDBClustersRequest()
   .withDBClusterIdentifier(clusterName);
DescribeDBClustersResult result = client.describeDBClusters(request);

DBCluster singleClusterResult = result.getDBClusters().get(0);

String pgJDBCEndpointStr =
   singleClusterResult.getDBClusterMembers().stream()
      .sorted(Comparator.comparing(DBClusterMember::getIsClusterWriter)
         .reversed()) // This puts the writer at the front of the list
      .map(m -> m.getDBInstanceIdentifier() + endpointPostfix + ":" + singleClusterResult.getPort())
      .collect(Collectors.joining(","));

Here, pgJDBCEndpointStr contains a formatted list of endpoints, as shown following.

my-node1.cksc6xlmwcyw.us-east-1-beta.rds.amazonaws.com:5432,
my-node2.cksc6xlmwcyw.us-east-1-beta.rds.amazonaws.com:5432

The variable endpointPostfix can be a constant that your application sets. Or your application can get it by querying the DescribeDBInstances API operation for a single instance in your cluster. This value remains constant within an AWS Region and for an individual customer. So it saves an API call to simply keep this constant in a resource file that your application reads from. In the example preceding, it's set to the following.
.cksc6xlmwcyw.us-east-1-beta.rds.amazonaws.com

For availability purposes, a good practice is to default to using the Aurora endpoints of your DB cluster if the API isn't responding or takes too long to respond. The endpoints are guaranteed to be up to date within the time it takes to update the DNS record, which typically takes less than 30 seconds. You can store the endpoint in a resource file that your application consumes.

TESTING FAILOVER

In all cases you must have a DB cluster with two or more DB instances in it.

From the server side, certain API operations can cause an outage that you can use to test how your application responds:

* FailoverDBCluster – This operation attempts to promote a new DB instance in your DB cluster to writer. The following code example shows how you can use failoverDBCluster to cause an outage. For more details about setting up an Amazon RDS client, see Using the AWS SDK for Java.

  public void causeFailover() {
      final AmazonRDS rdsClient = AmazonRDSClientBuilder.defaultClient();

      FailoverDBClusterRequest request = new FailoverDBClusterRequest();
      request.setDBClusterIdentifier("cluster-identifier");
      rdsClient.failoverDBCluster(request);
  }

* RebootDBInstance – Failover isn't guaranteed with this API operation. It shuts down the database on the writer, however, so you can use it to test how your application responds to connections dropping. The ForceFailover parameter doesn't apply for Aurora engines. Instead, use the FailoverDBCluster API operation.

* ModifyDBCluster – Modifying the Port parameter causes an outage when the nodes in the cluster begin listening on a new port. In general, your application can respond to this failure first by ensuring that only your application controls port changes. Also, ensure that it can appropriately update the endpoints it depends on. You can do this by having someone manually update the port when they make modifications at the API level.
Or you can do this by using the RDS API in your application to determine whether the port has changed.

* ModifyDBInstance – Modifying the DBInstanceClass parameter causes an outage.

* DeleteDBInstance – Deleting the primary (writer) causes a new DB instance to be promoted to writer in your DB cluster.

From the application or client side, if you use Linux, you can test how the application responds to sudden packet drops. You can do this by using the iptables command to drop or block packets based on port or host, or to prevent TCP keepalive packets from being sent or received.

FAST FAILOVER EXAMPLE IN JAVA

The following code example shows how an application might set up an Aurora PostgreSQL driver manager. The application calls the getConnection function when it needs a connection. A call to getConnection can fail to find a valid host. An example is when no writer is found but the targetServerType parameter is set to primary. In this case, the calling application should simply retry calling the function.

To avoid pushing the retry behavior onto the application, you can wrap this retry call into a connection pooler. With most connection poolers, you can specify a JDBC connection string. So your application can call into getJdbcConnectionString and pass that to the connection pooler. Doing this means you can use faster failover with Aurora PostgreSQL.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.joda.time.Duration;

public class FastFailoverDriverManager {
   private static Duration LOGIN_TIMEOUT = Duration.standardSeconds(2);
   private static Duration CONNECT_TIMEOUT = Duration.standardSeconds(2);
   private static Duration CANCEL_SIGNAL_TIMEOUT = Duration.standardSeconds(1);
   private static Duration DEFAULT_SOCKET_TIMEOUT = Duration.standardSeconds(5);

   public FastFailoverDriverManager() {
      try {
         Class.forName("org.postgresql.Driver");
      } catch (ClassNotFoundException e) {
         e.printStackTrace();
      }

      /*
       * The RO endpoint has a TTL of 1s. We should honor that here. Setting this aggressively makes sure that when
       * the PG JDBC driver creates a new connection, it will resolve a new, different RO endpoint on subsequent attempts
       * (assuming there is more than one reader node in your cluster).
       */
      java.security.Security.setProperty("networkaddress.cache.ttl", "1");
      // If the lookup fails, cache the negative result only briefly so that we retry soon
      java.security.Security.setProperty("networkaddress.cache.negative.ttl", "3");
   }

   public Connection getConnection(String targetServerType) throws SQLException {
      return getConnection(targetServerType, DEFAULT_SOCKET_TIMEOUT);
   }

   public Connection getConnection(String targetServerType, Duration queryTimeout) throws SQLException {
      Connection conn = DriverManager.getConnection(getJdbcConnectionString(targetServerType, queryTimeout));

      /*
       * A good practice is to set the socket timeout and statement timeout to the same value, since both
       * the client AND the server will stop the query at the same time, leaving no running queries
       * on the backend.
       */
      Statement st = conn.createStatement();
      st.execute("set statement_timeout to " + queryTimeout.getMillis());
      st.close();

      return conn;
   }

   private static String urlFormat = "jdbc:postgresql://%s"
         + "/postgres"
"?user=%s" + "&password=%s" + "&loginTimeout=%d" + "&connectTimeout=%d" + "&cancelSignalTimeout=%d" + "&socketTimeout=%d" + "&targetServerType=%s" + "&tcpKeepAlive=true" + "&ssl=true" + "&loadBalanceHosts=true"; public String getJdbcConnectionString(String targetServerType, Duration queryTimeout) { return String.format(urlFormat, getFormattedEndpointList(getLocalEndpointList()), CredentialManager.getUsername(), CredentialManager.getPassword(), LOGIN_TIMEOUT.getStandardSeconds(), CONNECT_TIMEOUT.getStandardSeconds(), CANCEL_SIGNAL_TIMEOUT.getStandardSeconds(), queryTimeout.getStandardSeconds(), targetServerType ); } private List<String> getLocalEndpointList() { /* * As mentioned in the best practices doc, a good idea is to read a local resource file and parse the cluster endpoints. * For illustration purposes, the endpoint list is hardcoded here */ List<String> newEndpointList = new ArrayList<>(); newEndpointList.add("myauroracluster.cluster-c9bfei4hjlrd.us-east-1-beta.rds.amazonaws.com:5432"); newEndpointList.add("myauroracluster.cluster-ro-c9bfei4hjlrd.us-east-1-beta.rds.amazonaws.com:5432"); return newEndpointList; } private static String getFormattedEndpointList(List<String> endpoints) { return IntStream.range(0, endpoints.size()) .mapToObj(i -> endpoints.get(i).toString()) .collect(Collectors.joining(",")); } } Javascript is disabled or is unavailable in your browser. To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions. Document Conventions Best practices with Aurora PostgreSQL Fast recovery after failover Did this page help you? - Yes Thanks for letting us know we're doing a good job! If you've got a moment, please tell us what we did right so we can do more of it. Did this page help you? - No Thanks for letting us know this page needs work. We're sorry we let you down. If you've got a moment, please tell us how we can make the documentation better. Did this page help you? 