
Contact Us
English


Create an AWS Account
 1. AWS
 2. ...
    
    
 3. Documentation
 4. Amazon Relational Database Service (RDS)
 5. User Guide for Aurora

Feedback
Preferences
Amazon Aurora
User Guide for Aurora
 * What is Aurora?
    * Aurora DB clusters
    * Aurora versions
    * Regions and Availability Zones
    * Supported Aurora features by Region and engine
       * Database activity streams in Aurora
       * Exporting cluster data to Amazon S3
       * Exporting snapshot data to Amazon S3
       * Aurora global databases
       * IAM database authentication in Aurora
       * Kerberos authentication with Aurora
       * Aurora machine learning
       * Performance Insights with Aurora
       * Amazon RDS Proxy
       * Aurora Serverless v2
       * Aurora Serverless v1
       * Data API for Aurora Serverless v1
       * Engine-native features
   
    * Aurora connection management
    * DB instance classes
    * Aurora storage and reliability
    * Aurora security
    * High availability for Amazon Aurora
    * Replication with Aurora
    * DB instance billing for Aurora
       * On-Demand DB instances
       * Reserved DB instances

 * Setting up your environment
 * Getting started
    * Creating an Aurora MySQL DB cluster and connecting to it
    * Creating an Aurora PostgreSQL DB cluster and connecting to it
    * Tutorial: Create a web server and an Amazon Aurora DB cluster
       * Launch an EC2 instance
       * Create a DB cluster
       * Install a web server

 * Tutorials and sample code
 * Configuring your Aurora DB cluster
    * Creating a DB cluster
    * Creating resources with AWS CloudFormation
    * Using Aurora global databases
       * Getting started with Aurora global databases
       * Managing an Aurora global database
       * Connecting to an Aurora global database
       * Using write forwarding in an Aurora global database
       * Using failover in an Aurora global database
       * Monitoring an Aurora global database
       * Using Aurora global databases with other AWS services
       * Upgrading an Amazon Aurora global database
   
    * Connecting to a DB cluster
    * Working with parameter groups
       * Working with DB cluster parameter groups
       * Working with DB parameter groups
       * Comparing parameter groups
       * Specifying DB parameters
   
    * Migrating data to a DB cluster

 * Managing an Aurora DB cluster
    * Stopping and starting a cluster
    * Connecting an EC2 instance
    * Modifying an Aurora DB cluster
    * Adding Aurora Replicas
    * Managing performance and scaling
    * Cloning a volume for an Aurora DB cluster
    * Integrating with AWS services
       * Using Auto Scaling with Aurora replicas
       * Using machine learning with Aurora
   
    * Maintaining an Aurora DB cluster
    * Rebooting an Aurora DB cluster or instance
    * Deleting Aurora clusters and instances
    * Tagging RDS resources
    * Working with ARNs
    * Aurora updates

 * Backing up and restoring an Aurora DB cluster
    * Overview of backing up and restoring
    * Backup storage
    * Creating a DB cluster snapshot
    * Restoring from a DB cluster snapshot
    * Copying a DB cluster snapshot
    * Sharing a DB cluster snapshot
    * Exporting DB cluster data to Amazon S3
    * Exporting DB cluster snapshot data to Amazon S3
    * Point-in-time recovery
    * Deleting a DB cluster snapshot
    * Tutorial: Restore a DB cluster from a snapshot

 * Monitoring metrics in an Aurora DB cluster
    * Overview of monitoring
    * Viewing cluster status and recommendations
    * Viewing metrics in the Amazon RDS console
    * Monitoring Aurora with CloudWatch
       * Overview of Amazon Aurora and Amazon CloudWatch
       * Viewing CloudWatch metrics
       * Creating CloudWatch alarms
   
    * Monitoring DB load with Performance Insights
       * Overview of Performance Insights
          * Database load
          * Maximum CPU
          * Amazon Aurora DB engine, Region, and instance class support for
            Performance Insights
          * Pricing and data retention for Performance Insights
      
       * Turning Performance Insights on and off
       * Turning on the Performance Schema for Aurora MySQL
       * Performance Insights policies
       * Analyzing metrics with the Performance Insights dashboard
          * Overview of the dashboard
          * Accessing the dashboard
          * Analyzing DB load
          * Analyzing queries
             * Overview of the Top SQL tab
             * Accessing more SQL text
             * Viewing SQL statistics
      
       * Retrieving metrics with the Performance Insights API
       * Logging Performance Insights calls using AWS CloudTrail
   
    * Analyzing performance with DevOps Guru for RDS
    * Monitoring the OS with Enhanced Monitoring
       * Overview of Enhanced Monitoring
       * Setting up and enabling Enhanced Monitoring
       * Viewing OS metrics in the RDS console
       * Viewing OS metrics using CloudWatch Logs
   
    * Aurora metrics reference
       * CloudWatch metrics for Aurora
       * CloudWatch dimensions for Aurora
       * Availability of Aurora metrics in the Amazon RDS console
       * CloudWatch metrics for Performance Insights
       * Counter metrics for Performance Insights
       * SQL statistics for Performance Insights
          * SQL statistics for Aurora MySQL
          * SQL statistics for Aurora PostgreSQL
      
       * OS metrics in Enhanced Monitoring

 * Monitoring events, logs, and database activity streams
    * Viewing logs, events, and streams in the Amazon RDS console
    * Monitoring Aurora events
       * Overview of events for Aurora
       * Viewing Amazon RDS events
       * Working with Amazon RDS event notification
          * Overview of Amazon RDS event notification
          * Granting permissions
          * Subscribing to Amazon RDS event notification
          * Listing Amazon RDS event notification subscriptions
          * Modifying an Amazon RDS event notification subscription
          * Adding a source identifier to an Amazon RDS event notification
            subscription
          * Removing a source identifier from an Amazon RDS event notification
            subscription
          * Listing the Amazon RDS event notification categories
          * Deleting an Amazon RDS event notification subscription
      
       * Creating a rule that triggers on an Amazon Aurora event
       * Amazon RDS event categories and event messages
   
    * Monitoring Aurora logs
       * Viewing and listing database log files
       * Downloading a database log file
       * Watching a database log file
       * Publishing to CloudWatch Logs
       * Reading log file contents using REST
       * MySQL database log files
          * Overview of Aurora MySQL database logs
          * Publishing Aurora MySQL logs to Amazon CloudWatch Logs
          * Managing table-based Aurora MySQL logs
          * Configuring Aurora MySQL binary logging
          * Accessing MySQL binary logs
      
       * PostgreSQL database log files
   
    * Monitoring Aurora API calls in CloudTrail
    * Monitoring Aurora with Database Activity Streams
       * Overview
       * Aurora MySQL network prerequisites
       * Starting a database activity stream
       * Getting the activity stream status
       * Stopping a database activity stream
       * Monitoring activity streams
       * Managing access to activity streams

 * Working with Aurora MySQL
    * Overview of Aurora MySQL
       * Aurora MySQL version 3 compatible with MySQL 8.0
          * New temporary table behavior in Aurora MySQL version 3
          * Comparison of Aurora MySQL version 2 and Aurora MySQL version 3
          * Comparison of Aurora MySQL version 3 and MySQL 8.0 Community Edition
          * Upgrading to Aurora MySQL version 3
      
       * Aurora MySQL version 2 compatible with MySQL 5.7
   
    * Security with Aurora MySQL
    * Updating applications for new SSL/TLS certificates
    * Migrating data to Aurora MySQL
       * Migrating from an external MySQL database to Aurora MySQL
       * Migrating from a MySQL DB instance to Aurora MySQL
          * Migrating an RDS for MySQL snapshot to Aurora
          * Migrating from a MySQL DB instance to Aurora MySQL using a read
            replica
   
    * Managing Aurora MySQL
       * Managing performance and scaling for Amazon Aurora MySQL
       * Backtracking a DB cluster
       * Testing Amazon Aurora using fault injection queries
       * Altering tables in Amazon Aurora using fast DDL
       * Displaying volume status for an Aurora DB cluster
   
    * Tuning Aurora MySQL with wait events and thread states
       * Essential concepts for Aurora MySQL tuning
       * Tuning Aurora MySQL with wait events
          * cpu
          * io/aurora_redo_log_flush
          * io/aurora_respond_to_client
          * io/file/innodb/innodb_data_file
          * io/socket/sql/client_connection
          * io/table/sql/handler
          * synch/cond/mysys/my_thread_var::suspend
          * synch/cond/sql/MDL_context::COND_wait_status
          * synch/mutex/innodb/aurora_lock_thread_slot_futex
          * synch/mutex/innodb/buf_pool_mutex
          * synch/mutex/innodb/fil_system_mutex
          * synch/mutex/innodb/trx_sys_mutex
          * synch/rwlock/innodb/hash_table_locks
          * synch/sxlock/innodb/hash_table_locks
      
       * Tuning Aurora MySQL with thread states
          * creating sort index
          * sending data
   
    * Parallel query for Aurora MySQL
    * Advanced Auditing with Aurora MySQL
    * Replication with Aurora MySQL
       * Cross-Region replication
       * Using binary log (binlog) replication
       * Using GTID-based replication
   
    * Working with multi-master clusters
    * Integrating Aurora MySQL with AWS services
       * Authorizing Aurora MySQL to access AWS services
          * Setting up IAM roles to access AWS services
             * Creating an IAM policy to access Amazon S3
             * Creating an IAM policy to access Lambda
             * Creating an IAM policy to access CloudWatch Logs
             * Creating an IAM policy to access AWS KMS
             * Creating an IAM role to access AWS services
             * Associating an IAM role with a DB cluster
         
          * Enabling network communication to AWS services
      
       * Loading data from text files in Amazon S3
       * Saving data into text files in Amazon S3
       * Invoking a Lambda function from Aurora MySQL
       * Publishing Aurora MySQL logs to CloudWatch Logs
       * Using Aurora machine learning with Aurora MySQL
   
    * Aurora MySQL lab mode
    * Best practices with Amazon Aurora MySQL
    * Aurora MySQL reference
    * Aurora MySQL updates
       * Version Numbers and Special Versions
       * Preparing for Aurora MySQL version 1 end of life
       * Upgrading Amazon Aurora MySQL DB clusters
          * Upgrading the minor version or patch level of an Aurora MySQL DB
            cluster
          * Upgrading the Aurora MySQL major version of a DB cluster
      
       * Database engine updates for Amazon Aurora MySQL version 3
       * Database engine updates for Amazon Aurora MySQL version 2
       * Database engine updates for Amazon Aurora MySQL version 1
       * Database engine updates for Aurora MySQL Serverless clusters
       * MySQL bugs fixed by Aurora MySQL database engine updates
       * Security vulnerabilities fixed in Amazon Aurora MySQL

 * Working with Aurora PostgreSQL
    * Security with Aurora PostgreSQL
       * Understanding PostgreSQL roles and permissions
   
    * Updating applications for new SSL/TLS certificates
    * Using Kerberos authentication
       * Setting up
       * Managing a DB cluster in a Domain
       * Connecting with Kerberos authentication
   
    * Migrating data to Aurora PostgreSQL
    * Using Babelfish for Aurora PostgreSQL
       * Babelfish limitations
       * Understanding Babelfish architecture and configuration
          * Babelfish architecture
          * DB cluster parameter group settings for Babelfish
          * Collations supported by Babelfish
             * Managing collations
             * Collation limitations and differences
         
          * Managing Babelfish error handling
      
       * Creating a Babelfish for Aurora PostgreSQL DB cluster
       * Migrating a SQL Server database to Babelfish
       * Connecting to a Babelfish DB cluster
          * Creating C# or JDBC client connections to Babelfish
          * Using a SQL Server client to connect to your DB cluster
          * Using a PostgreSQL client to connect to your DB cluster
      
       * Working with Babelfish
          * Getting information from the Babelfish system catalog
          * Differences between Babelfish for Aurora PostgreSQL and SQL Server
             * T-SQL differences in Babelfish
         
          * Using Babelfish features with limited implementation
          * Using explain plan to improve query performance
          * Using Aurora PostgreSQL extensions with Babelfish
      
       * Troubleshooting Babelfish
       * Turning off Babelfish
       * Babelfish versions
       * Babelfish reference
          * Unsupported functionality
          * Supported functionality by Babelfish version
   
    * Managing Aurora PostgreSQL
       * Testing Amazon Aurora PostgreSQL by using fault injection queries
       * Displaying volume status for an Aurora DB cluster
       * Specifying the RAM disk for the stats_temp_directory
   
    * Tuning with wait events for Aurora PostgreSQL
       * Essential concepts for Aurora PostgreSQL tuning
       * Aurora PostgreSQL wait events
       * Client:ClientRead
       * Client:ClientWrite
       * CPU
       * IO:BufFileRead and IO:BufFileWrite
       * IO:DataFileRead
       * IO:XactSync
       * ipc:damrecordtxack
       * Lock:advisory
       * Lock:extend
       * Lock:Relation
       * Lock:transactionid
       * Lock:tuple
       * lwlock:buffer_content (BufferContent)
       * LWLock:buffer_mapping
       * LWLock:BufferIO
       * LWLock:lock_manager
       * Timeout:PgSleep
   
    * Best practices with Aurora PostgreSQL
       * Fast failover
       * Fast recovery after failover
       * Managing connection churn
       * Using logical replication for blue-green upgrade
       * Tuning memory parameters for Aurora PostgreSQL
   
    * Replication with Aurora PostgreSQL
       * Using logical replication
   
    * Integrating Aurora PostgreSQL with AWS services
       * Importing data from Amazon S3 into Aurora PostgreSQL
       * Exporting PostgreSQL data to Amazon S3
       * Invoking a Lambda function from Aurora PostgreSQL
          * Lambda function reference
      
       * Publishing Aurora PostgreSQL logs to CloudWatch Logs
       * Using Aurora machine learning with Aurora PostgreSQL
   
    * Managing query execution plans for Aurora PostgreSQL
       * Overview of Aurora PostgreSQL query plan management
       * Best practices for Aurora PostgreSQL query plan management
       * Understanding query plan management
       * Capturing Aurora PostgreSQL execution plans
       * Using Aurora PostgreSQL managed plans
       * Examining Aurora PostgreSQL query plans in the dba_plans view
       * Maintaining Aurora PostgreSQL execution plans
       * Reference
          * Parameter reference for Aurora PostgreSQL query plan management
          * Function reference for Aurora PostgreSQL query plan management
          * Reference for the apg_plan_mgmt.dba_plans view
   
    * Working with extensions and foreign data wrappers
       * Managing large objects more efficiently with the lo module
       * Managing spatial data with PostGIS
       * Managing partitions with the pg_partman extension
       * Scheduling maintenance with the pg_cron extension
       * Supported foreign data wrappers
   
    * Aurora PostgreSQL reference
       * Aurora PostgreSQL functions reference
          * aurora_db_instance_identifier
          * aurora_ccm_status
          * aurora_global_db_instance_status
          * aurora_global_db_status
          * aurora_list_builtins
          * aurora_replica_status
          * aurora_stat_backend_waits
          * aurora_stat_dml_activity
          * aurora_stat_get_db_commit_latency
          * aurora_stat_system_waits
          * aurora_stat_wait_event
          * aurora_stat_wait_type
          * aurora_version
          * aurora_wait_report
      
       * Aurora PostgreSQL parameters
       * Aurora PostgreSQL wait events
   
    * Aurora PostgreSQL updates
       * Identifying versions of Amazon Aurora PostgreSQL
       * Aurora PostgreSQL releases
       * Extension versions for Aurora PostgreSQL
       * Upgrading the PostgreSQL DB engine
       * Using a long-term support (LTS) release

 * Using RDS Proxy
    * Planning where to use RDS Proxy
    * RDS Proxy concepts and terminology
    * Getting started with RDS Proxy
    * Managing an RDS Proxy
    * Working with RDS Proxy endpoints
    * Monitoring RDS Proxy with CloudWatch
    * Working with RDS Proxy events
    * RDS Proxy examples
    * Troubleshooting RDS Proxy
    * Using RDS Proxy with AWS CloudFormation

 * Using Aurora Serverless v2
    * How Aurora Serverless v2 works
    * Requirements for Aurora Serverless v2
    * Getting started with Aurora Serverless v2
    * Creating a cluster for Aurora Serverless v2
    * Managing Aurora Serverless v2
    * Performance and scaling for Aurora Serverless v2

 * Using Aurora Serverless v1
    * How Aurora Serverless v1 works
    * Creating an Aurora Serverless v1 DB cluster
    * Restoring an Aurora Serverless v1 DB cluster
    * Modifying an Aurora Serverless v1 DB cluster
    * Scaling Aurora Serverless v1 DB cluster capacity manually
    * Viewing Aurora Serverless v1 DB clusters
    * Deleting an Aurora Serverless v1 DB cluster
    * Aurora Serverless v1 and Aurora database engine versions

 * Using the Data API
    * Logging Data API calls with AWS CloudTrail

 * Using the query editor
    * DBQMS API reference

 * Best practices with Aurora
 * Performing an Aurora proof of concept
 * Security
    * Database authentication
    * Data protection
       * Data encryption
          * Encrypting Amazon Aurora resources
          * AWS KMS key management
          * Using SSL/TLS to encrypt a connection
          * Rotating your SSL/TLS certificate
      
       * Internetwork traffic privacy
   
    * Identity and access management
       * How Amazon Aurora works with IAM
       * Identity-based policy examples
       * AWS managed policies
       * Policy updates
       * Cross-service confused deputy prevention
       * IAM database authentication
          * Enabling and disabling
          * Creating and using an IAM policy for IAM database access
          * Creating a database account using IAM authentication
          * Connecting to your DB cluster using IAM authentication
             * Connecting using IAM: AWS CLI and mysql client
             * Connecting using IAM authentication from the command line: AWS
               CLI and psql client
             * Connecting using IAM authentication and the AWS SDK for .NET
             * Connecting using IAM authentication and the AWS SDK for Go
             * Connecting using IAM authentication and the AWS SDK for Java
             * Connecting using IAM authentication and the AWS SDK for Python
               (Boto3)
      
       * Troubleshooting
   
    * Logging and monitoring
    * Compliance validation
    * Resilience
    * Infrastructure security
    * VPC endpoints (AWS PrivateLink)
    * Security best practices
    * Controlling access with security groups
    * Master user account privileges
    * Service-linked roles
    * Using Amazon Aurora with Amazon VPC
       * Working with a DB cluster in a VPC
       * Scenarios for accessing a DB cluster in a VPC
       * Tutorial: Create a VPC for use with a DB cluster (IPv4 only)
       * Tutorial: Create a VPC for use with a DB cluster (dual-stack mode)

 * Quotas and constraints
 * Troubleshooting
 * Amazon RDS API reference
    * Using the Query API
    * Troubleshooting applications

 * Document history
 * AWS glossary

AWS Documentation: Amazon Relational Database Service (RDS), User Guide for Aurora

FAST FAILOVER WITH AMAZON AURORA POSTGRESQL


Following, you can learn how to make sure that failover occurs as fast as
possible. To recover quickly after failover, you can use cluster cache
management for your Aurora PostgreSQL DB cluster. For more information, see Fast
recovery after failover with cluster cache management for Aurora PostgreSQL.

Some of the steps that you can take to make failover perform fast include the
following:

 * Set Transmission Control Protocol (TCP) keepalives with short time frames
   so that longer running queries are stopped before the read timeout expires
   if there's a failure.

 * Set timeouts for Java Domain Name System (DNS) caching aggressively. Doing
   this helps ensure the Aurora read-only endpoint can properly cycle through
   read-only nodes on later connection attempts.

 * Set the timeout variables used in the JDBC connection string as low as
   possible. Use separate connection objects for short- and long-running
   queries.

 * Use the read and write Aurora endpoints that are provided to connect to the
   cluster.

 * Use RDS API operations to test application response on server-side failures.
   Also, use a packet dropping tool to test application response for client-side
   failures.

 * Use the AWS JDBC Driver for PostgreSQL (preview) to take full advantage of
   the failover capabilities of Aurora PostgreSQL. For more information about
   the AWS JDBC Driver for PostgreSQL and complete instructions for using it,
   see the AWS JDBC Driver for PostgreSQL GitHub repository.

These are covered in more detail following.

Topics

 * Setting TCP keepalives parameters
 * Configuring your application for fast failover
 * Testing failover
 * Fast failover example in Java


SETTING TCP KEEPALIVES PARAMETERS

When you set up a TCP connection, a set of timers is associated with the
connection. When the keepalive timer reaches zero, a keepalive probe packet is
sent to the connection endpoint. If the probe is acknowledged, you can assume
that the connection is still up and running.

Turning on TCP keepalive parameters and setting them aggressively ensures that
if your client can't connect to the database, any active connections are quickly
closed. The application can then connect to a new endpoint.

Make sure to set the following TCP keepalive parameters:

 * tcp_keepalive_time controls the time, in seconds, after which a keepalive
   packet is sent when no data has been sent by the socket. ACKs aren't
   considered data. We recommend the following setting:
   
   tcp_keepalive_time = 1

 * tcp_keepalive_intvl controls the time, in seconds, between subsequent
   keepalive packets after the initial packet is sent. (The delay before the
   initial packet is controlled by the tcp_keepalive_time parameter.) We
   recommend the following setting:
   
   tcp_keepalive_intvl = 1

 * tcp_keepalive_probes is the number of unacknowledged keepalive probes that
   occur before the application is notified. We recommend the following setting:
   
   tcp_keepalive_probes = 5

These settings should notify the application within five seconds when the
database stops responding. If keepalive packets are often dropped within the
application's network, you can set a higher tcp_keepalive_probes value. Doing
this allows for more buffer in less reliable networks, although it increases the
time that it takes to detect an actual failure.

To set TCP keepalive parameters on Linux

 1. Test how to configure your TCP keepalive parameters.
    
    We recommend doing so by using the command line with the following commands.
    This suggested configuration is system-wide. In other words, it also affects
    all other applications that create sockets with the SO_KEEPALIVE option on.
    
    sudo sysctl net.ipv4.tcp_keepalive_time=1
    sudo sysctl net.ipv4.tcp_keepalive_intvl=1
    sudo sysctl net.ipv4.tcp_keepalive_probes=5

 2. After you've found a configuration that works for your application, persist
    these settings by adding the following lines to /etc/sysctl.conf, including
    any changes you made:
    
     net.ipv4.tcp_keepalive_time = 1
     net.ipv4.tcp_keepalive_intvl = 1
     net.ipv4.tcp_keepalive_probes = 5


CONFIGURING YOUR APPLICATION FOR FAST FAILOVER

Following, you can find a discussion of several configuration changes for Aurora
PostgreSQL that you can make for fast failover. To learn more about PostgreSQL
JDBC driver setup and configuration, see the PostgreSQL JDBC Driver
documentation.

Topics

 * Reducing DNS cache timeouts
 * Setting an Aurora PostgreSQL connection string for fast failover
 * Other options for obtaining the host string


REDUCING DNS CACHE TIMEOUTS

When your application tries to establish a connection after a failover, the new
Aurora PostgreSQL writer will be a previous reader. Until the DNS updates have
fully propagated, you can find it by using the Aurora read-only endpoint.
Setting the Java DNS time to live (TTL) to a low value, such as under 30
seconds, helps the application cycle between reader nodes on later connection
attempts.

// Sets the internal JVM DNS cache TTL to match the Aurora read-only endpoint TTL
java.security.Security.setProperty("networkaddress.cache.ttl", "1");
// If a lookup fails, cache the negative result only briefly so that it is
// retried on the next connection attempt
java.security.Security.setProperty("networkaddress.cache.negative.ttl", "3");


SETTING AN AURORA POSTGRESQL CONNECTION STRING FOR FAST FAILOVER

To use Aurora PostgreSQL fast failover, make sure that your application's
connection string has a list of hosts instead of just a single host. Following
is an example connection string that you can use to connect to an Aurora
PostgreSQL cluster. In this example, the host section lists both the cluster
endpoint and the read-only endpoint.

jdbc:postgresql://myauroracluster.cluster-c9bfei4hjlrd.us-east-1-beta.rds.amazonaws.com:5432,
myauroracluster.cluster-ro-c9bfei4hjlrd.us-east-1-beta.rds.amazonaws.com:5432
/postgres?user=<primaryuser>&password=<primarypw>&loginTimeout=2
&connectTimeout=2&cancelSignalTimeout=2&socketTimeout=60
&tcpKeepAlive=true&targetServerType=primary
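The connection string above can also be assembled in code, so that the host list and the timeout parameters stay in one place. The following is a minimal sketch; the class and method names are illustrative, and the parameter values mirror the example above (the user and password parameters are omitted, because you would normally supply credentials separately rather than embed them in the URL):

```java
import java.util.List;

public class AuroraUrlBuilder {
    // Builds a multi-host PostgreSQL JDBC URL with the aggressive timeout
    // settings recommended for fast failover. Each entry in hostPorts is a
    // "host:port" string. Because targetServerType=primary is set, the driver
    // connects only to the node that is currently the writer.
    public static String buildUrl(List<String> hostPorts, String database) {
        return "jdbc:postgresql://" + String.join(",", hostPorts)
                + "/" + database
                + "?loginTimeout=2&connectTimeout=2&cancelSignalTimeout=2"
                + "&socketTimeout=60&tcpKeepAlive=true&targetServerType=primary";
    }
}
```

Passing the cluster endpoint and the read-only endpoint from the example produces a URL with the same host list and parameters shown earlier.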

For best availability and to avoid a dependency on the RDS API, we recommend
that you maintain a file containing a host string that your application reads
when it establishes a connection to the database. This host string lists all
the Aurora endpoints available for the cluster. For
more information about Aurora endpoints, see Amazon Aurora connection
management.

For example, you might store your endpoints in a local file as shown following.

myauroracluster.cluster-c9bfei4hjlrd.us-east-1-beta.rds.amazonaws.com:5432,
myauroracluster.cluster-ro-c9bfei4hjlrd.us-east-1-beta.rds.amazonaws.com:5432

Your application reads from this file to populate the host section of the JDBC
connection string. Renaming the DB cluster causes these endpoints to change.
Make sure that your application handles this event if it occurs.
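The file-based approach can be sketched as follows. The class name and file layout are assumptions; the file holds the comma-separated endpoint list shown above, possibly wrapped across several lines:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EndpointFile {
    // Reads the endpoint file and returns the host section of the JDBC
    // connection string. Joining the lines and stripping all whitespace lets
    // the file wrap a long endpoint list across several lines.
    public static String readHostString(Path file) throws IOException {
        return String.join("", Files.readAllLines(file)).replaceAll("\\s+", "");
    }
}
```

The returned string plugs directly into the host section of jdbc:postgresql://hosts/database. Because renaming the DB cluster changes these endpoints, re-reading the file on each connection attempt picks up an updated list without an application restart.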

Another option is to use a list of DB instance nodes, as follows.

my-node1.cksc6xlmwcyw.us-east-1-beta.rds.amazonaws.com:5432,
my-node2.cksc6xlmwcyw.us-east-1-beta.rds.amazonaws.com:5432,
my-node3.cksc6xlmwcyw.us-east-1-beta.rds.amazonaws.com:5432,
my-node4.cksc6xlmwcyw.us-east-1-beta.rds.amazonaws.com:5432

The benefit of this approach is that the PostgreSQL JDBC connection driver loops
through all nodes on this list to find a valid connection. In contrast, when you
use the Aurora endpoints, only two nodes are tried in each connection attempt.
However, there's a downside to using DB instance nodes. If you add or remove
nodes from your cluster and the list of instance endpoints becomes stale, the
connection driver might never find the correct host to connect to.

To help ensure that your application doesn't wait too long to connect to any one
host, set the following parameters aggressively:

 * targetServerType – Controls whether the driver connects to a write or read
   node. To ensure that your applications reconnect only to a write node, set
   the targetServerType value to primary.
   
   Values for the targetServerType parameter include primary, secondary, any,
   and preferSecondary. The preferSecondary value attempts to establish a
   connection to a reader first. It connects to the writer if no reader
   connection can be established.

 * loginTimeout – Controls how long your application waits to log in to the
   database after a socket connection has been established.

 * connectTimeout – Controls how long the socket waits to establish a connection
   to the database.
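Taken together, these settings might appear in a connection string such as the following sketch. The host names are from the earlier example, and the specific timeout values shown (in seconds) are illustrative assumptions, not recommendations.

```
jdbc:postgresql://myauroracluster.cluster-c9bfei4hjlrd.us-east-1-beta.rds.amazonaws.com:5432,myauroracluster.cluster-ro-c9bfei4hjlrd.us-east-1-beta.rds.amazonaws.com:5432/postgres?targetServerType=primary&loginTimeout=2&connectTimeout=2
```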

You can modify other application parameters to speed up the connection process,
depending on how aggressive you want your application to be:

 * cancelSignalTimeout – In some applications, you might want to send a "best
   effort" cancel signal on a query that has timed out. If this cancel signal is
   in your failover path, consider setting this timeout aggressively to avoid
   sending the signal to a dead host.

 * socketTimeout – This parameter controls how long the socket waits for read
   operations. This parameter can be used as a global "query timeout" to ensure
   no query waits longer than this value. A good practice is to have two
   connection handlers. One connection handler runs short-lived queries and sets
   this value lower. Another connection handler, for long-running queries, has
   this value set much higher. With this approach, you can rely on TCP keepalive
   parameters to stop long-running queries if the server goes down.

 * tcpKeepAlive – Turn on this parameter to ensure the TCP keepalive parameters
   that you set are respected.

 * loadBalanceHosts – When set to true, this parameter has the application
   connect to a random host chosen from a list of candidate hosts.


OTHER OPTIONS FOR OBTAINING THE HOST STRING

You can get the host string from several sources, including the
aurora_replica_status function and by using the Amazon RDS API.

In many cases, you need to determine which DB instance is the writer of the
cluster or to find the other reader nodes in the cluster. To do this, your
application can connect to any DB instance in the DB cluster and query the
aurora_replica_status function.
You can use this function to reduce the amount of time it takes to find a host
to connect to. However, in certain network failure scenarios the
aurora_replica_status function might show out-of-date or incomplete information.

A good way to ensure that your application can find a node to connect to is to
try to connect to the cluster writer endpoint and then the cluster reader
endpoint. You do this until you can establish a readable connection. These
endpoints don't change unless you rename your DB cluster. Thus, you can
generally leave them as static members of your application or store them in a
resource file that your application reads from.

After you establish a connection using one of these endpoints, you can get
information about the rest of the cluster. To do this, call the
aurora_replica_status function. For example, the following command retrieves
information with aurora_replica_status.

postgres=> SELECT server_id, session_id, highest_lsn_rcvd, cur_replay_latency_in_usec, now(), last_update_timestamp
FROM aurora_replica_status();

server_id | session_id | highest_lsn_rcvd | cur_replay_latency_in_usec | now | last_update_timestamp
-----------+--------------------------------------+------------------+----------------------------+-------------------------------+------------------------
mynode-1 | 3e3c5044-02e2-11e7-b70d-95172646d6ca | 594221001 | 201421 | 2017-03-07 19:50:24.695322+00 | 2017-03-07 19:50:23+00
mynode-2 | 1efdd188-02e4-11e7-becd-f12d7c88a28a | 594221001 | 201350 | 2017-03-07 19:50:24.695322+00 | 2017-03-07 19:50:23+00
mynode-3 | MASTER_SESSION_ID | | | 2017-03-07 19:50:24.695322+00 | 2017-03-07 19:50:23+00
(3 rows)


For example, the hosts section of your connection string might start with both
the writer and reader cluster endpoints, as shown following.

myauroracluster.cluster-c9bfei4hjlrd.us-east-1-beta.rds.amazonaws.com:5432,
myauroracluster.cluster-ro-c9bfei4hjlrd.us-east-1-beta.rds.amazonaws.com:5432

In this scenario, your application attempts to establish a connection to any
node type, primary or secondary. When your application is connected, a good
practice is to first examine the read/write status of the node. To do this,
query for the result of the command SHOW transaction_read_only.
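As a sketch, that check reduces to interpreting the value that SHOW transaction_read_only returns. The helper class below is hypothetical; only the SQL command itself comes from the text above.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class NodeRoleCheck {
    // SHOW transaction_read_only returns "off" on the writer
    // and "on" on a read replica.
    public static boolean isWriter(String transactionReadOnly) {
        return "off".equalsIgnoreCase(transactionReadOnly.trim());
    }

    // Runs the same check over an established JDBC connection.
    public static boolean isWriter(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SHOW transaction_read_only")) {
            rs.next();
            return isWriter(rs.getString(1));
        }
    }
}
```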

If the return value of the query is OFF, then you successfully connected to the
primary node. However, suppose that the return value is ON and your application
requires a read/write connection. In this case, you can call the
aurora_replica_status function to determine the server_id that has
session_id='MASTER_SESSION_ID'. This function gives you the name of the primary
node. You can use this with the endpointPostfix described following.

Be aware that you might connect to a replica that has stale data. When this
happens, the aurora_replica_status function might show out-of-date
information. You can set a threshold for staleness at the application level. To
check this, you can look at the difference between the server time and the
last_update_timestamp value. In general, your application should avoid flipping
between two hosts due to conflicting information returned by the
aurora_replica_status function. Your application should try all known hosts
first instead of following the data returned by aurora_replica_status.
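The staleness threshold described above can be sketched as a comparison between the server clock (now()) and last_update_timestamp. The helper name and the threshold value in the usage are assumptions for illustration.

```java
import java.time.Duration;
import java.time.Instant;

public class StalenessCheck {
    // Treats a row from aurora_replica_status as stale when the gap between
    // the server clock (now()) and last_update_timestamp exceeds a threshold
    // chosen at the application level.
    public static boolean isStale(Instant serverNow, Instant lastUpdate, Duration threshold) {
        return Duration.between(lastUpdate, serverNow).compareTo(threshold) > 0;
    }
}
```

For example, with a 30-second threshold, a row whose last_update_timestamp trails now() by more than 30 seconds would be ignored.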

LISTING INSTANCES USING THE DESCRIBEDBCLUSTERS API OPERATION, EXAMPLE IN JAVA

You can programmatically find the list of instances by using the AWS SDK for
Java, specifically the DescribeDBClusters API operation.

Following is a small example of how you might do this in Java 8.

AmazonRDS rdsClient = AmazonRDSClientBuilder.defaultClient();
DescribeDBClustersRequest request = new DescribeDBClustersRequest()
   .withDBClusterIdentifier(clusterName);
DescribeDBClustersResult result = rdsClient.describeDBClusters(request);

DBCluster singleClusterResult = result.getDBClusters().get(0);

String pgJDBCEndpointStr =
   singleClusterResult.getDBClusterMembers().stream()
      .sorted(Comparator.comparing(DBClusterMember::getIsClusterWriter)
          .reversed()) // This puts the writer at the front of the list
      .map(m -> m.getDBInstanceIdentifier() + endpointPostfix + ":" + singleClusterResult.getPort())
      .collect(Collectors.joining(","));


Here, pgJDBCEndpointStr contains a formatted list of endpoints, as shown
following.

my-node1.cksc6xlmwcyw.us-east-1-beta.rds.amazonaws.com:5432,
my-node2.cksc6xlmwcyw.us-east-1-beta.rds.amazonaws.com:5432

The variable endpointPostfix can be a constant that your application sets. Or
your application can get it by querying the DescribeDBInstances API operation
for a single instance in your cluster. This value remains constant within an AWS
Region and for an individual customer. So it saves an API call to simply keep
this constant in a resource file that your application reads from. In the
example preceding, it's set to the following.

.cksc6xlmwcyw.us-east-1-beta.rds.amazonaws.com
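If your application derives the postfix from a known instance endpoint rather than keeping it as a constant, the extraction is simple string manipulation: everything from the first dot onward. The helper below is a hypothetical sketch of that step.

```java
public class EndpointPostfix {
    // Given a full instance endpoint such as
    // my-node1.cksc6xlmwcyw.us-east-1-beta.rds.amazonaws.com,
    // the postfix is everything from the first dot onward.
    public static String fromInstanceEndpoint(String instanceEndpoint) {
        int firstDot = instanceEndpoint.indexOf('.');
        if (firstDot < 0) {
            throw new IllegalArgumentException("Not a qualified endpoint: " + instanceEndpoint);
        }
        return instanceEndpoint.substring(firstDot);
    }
}
```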

For availability purposes, a good practice is to default to using the Aurora
endpoints of your DB cluster if the API isn't responding or takes too long to
respond. The endpoints are guaranteed to be up to date within the time it takes
to update the DNS record. Updating the DNS record with an endpoint typically
takes less than 30 seconds. You can store the endpoint in a resource file that
your application consumes.


TESTING FAILOVER

In all cases you must have a DB cluster with two or more DB instances in it.

From the server side, certain API operations can cause an outage that you can
use to test how your application responds:

 * FailoverDBCluster – This operation attempts to promote a new DB instance in
   your DB cluster to writer.
   
   The following code example shows how you can use failoverDBCluster to cause
   an outage. For more details about setting up an Amazon RDS client, see Using
   the AWS SDK for Java.
   
   public void causeFailover() {
       
       final AmazonRDS rdsClient = AmazonRDSClientBuilder.defaultClient();
      
       FailoverDBClusterRequest request = new FailoverDBClusterRequest();
       request.setDBClusterIdentifier("cluster-identifier");
   
       rdsClient.failoverDBCluster(request);
   }
   

 * RebootDBInstance – Failover isn't guaranteed with this API operation. It
   shuts down the database on the writer, however. You can use it to test how
   your application responds to connections dropping. The ForceFailover
   parameter doesn't apply for Aurora engines. Instead, use the
   FailoverDBCluster API operation.

 * ModifyDBCluster – Modifying the Port parameter causes an outage when the
   nodes in the cluster begin listening on a new port. Your application can
   handle this failure by ensuring that only your application controls port
   changes and that it can appropriately update the endpoints it depends on.
   You can do this by having someone manually update the port when they make
   modifications at the API level, or by using the RDS API in your application
   to determine whether the port has changed.

 * ModifyDBInstance – Modifying the DBInstanceClass parameter causes an outage.

 * DeleteDBInstance – Deleting the primary (writer) causes a new DB instance to
   be promoted to writer in your DB cluster.

From the application or client side, if you use Linux, you can test how the
application responds to sudden packet drops. You can use the iptables command
to drop packets based on port or host, or to block TCP keepalive packets from
being sent or received.
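For example, the following iptables rules are one sketch of this kind of test. They must run as root, and the port (5432) and direction are assumptions; adjust them to your environment before use.

```
# Drop outbound packets to the database port to simulate a network partition.
iptables -A OUTPUT -p tcp --dport 5432 -j DROP

# Remove the rule again once the test is done.
iptables -D OUTPUT -p tcp --dport 5432 -j DROP
```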


FAST FAILOVER EXAMPLE IN JAVA

The following code example shows how an application might set up an Aurora
PostgreSQL driver manager.

The application calls the getConnection function when it needs a connection. A
call to getConnection can fail to find a valid host. An example is when no
writer is found but the targetServerType parameter is set to primary. In this
case, the calling application should simply retry calling the function.

To avoid pushing the retry behavior onto the application, you can wrap this
retry call into a connection pooler. With most connection poolers, you can
specify a JDBC connection string. So your application can call into
getJdbcConnectionString and pass that to the connection pooler. Doing this means
you can use faster failover with Aurora PostgreSQL.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.joda.time.Duration;

public class FastFailoverDriverManager {
   private static Duration LOGIN_TIMEOUT = Duration.standardSeconds(2);
   private static Duration CONNECT_TIMEOUT = Duration.standardSeconds(2);
   private static Duration CANCEL_SIGNAL_TIMEOUT = Duration.standardSeconds(1);
   private static Duration DEFAULT_SOCKET_TIMEOUT = Duration.standardSeconds(5);

   public FastFailoverDriverManager() {
       try {
            Class.forName("org.postgresql.Driver");
       } catch (ClassNotFoundException e) {
            e.printStackTrace();
       }

       /*
        * The RO endpoint has a DNS TTL of 1 second; honor that here. Setting this
        * aggressively makes sure that when the PG JDBC driver creates a new connection,
        * it resolves a different RO endpoint on subsequent attempts (assuming there is
        * more than one read node in your cluster).
        */
       java.security.Security.setProperty("networkaddress.cache.ttl", "1");
       // If a lookup fails, cache the negative result only briefly so that the
       // lookup is retried soon.
       java.security.Security.setProperty("networkaddress.cache.negative.ttl", "3");
   }

   public Connection getConnection(String targetServerType) throws SQLException {
       return getConnection(targetServerType, DEFAULT_SOCKET_TIMEOUT);
   }

   public Connection getConnection(String targetServerType, Duration queryTimeout) throws SQLException {
        Connection conn = DriverManager.getConnection(getJdbcConnectionString(targetServerType, queryTimeout));

       /*
         * A good practice is to set socket and statement timeout to be the same thing since both 
         * the client AND server will stop the query at the same time, leaving no running queries 
         * on the backend
         */
        Statement st = conn.createStatement();
        st.execute("set statement_timeout to " + queryTimeout.getMillis());
        st.close();

       return conn;
   }

   private static String urlFormat = "jdbc:postgresql://%s"
           + "/postgres"
           + "?user=%s"
           + "&password=%s"
           + "&loginTimeout=%d"
           + "&connectTimeout=%d"
           + "&cancelSignalTimeout=%d"
           + "&socketTimeout=%d"
           + "&targetServerType=%s"
           + "&tcpKeepAlive=true"
           + "&ssl=true"
           + "&loadBalanceHosts=true";
   public String getJdbcConnectionString(String targetServerType, Duration queryTimeout) {
       return String.format(urlFormat, 
                getFormattedEndpointList(getLocalEndpointList()),
                CredentialManager.getUsername(),
                CredentialManager.getPassword(),
                LOGIN_TIMEOUT.getStandardSeconds(),
                CONNECT_TIMEOUT.getStandardSeconds(),
                CANCEL_SIGNAL_TIMEOUT.getStandardSeconds(),
                queryTimeout.getStandardSeconds(),
                targetServerType
       );
   }

   private List<String> getLocalEndpointList() {
       /*
         * As mentioned in the best practices doc, a good idea is to read a local resource file and parse the cluster endpoints. 
         * For illustration purposes, the endpoint list is hardcoded here
         */
        List<String> newEndpointList = new ArrayList<>();
        newEndpointList.add("myauroracluster.cluster-c9bfei4hjlrd.us-east-1-beta.rds.amazonaws.com:5432");
        newEndpointList.add("myauroracluster.cluster-ro-c9bfei4hjlrd.us-east-1-beta.rds.amazonaws.com:5432");

       return newEndpointList;
   }

   private static String getFormattedEndpointList(List<String> endpoints) {
       return String.join(",", endpoints);
   }
}        
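One way to keep the retry behavior out of application code, as described above, is a small bounded-retry helper around the connection attempt. The class and interface below are hypothetical sketches, not part of the JDBC driver or any AWS library.

```java
import java.sql.SQLException;

public class RetryHelper {
    // Functional interface for a connection attempt that can throw SQLException,
    // for example () -> driverManager.getConnection("primary").
    @FunctionalInterface
    public interface SqlSupplier<T> {
        T get() throws SQLException;
    }

    // Retries the supplied attempt up to maxAttempts times,
    // rethrowing the last failure if none succeeds.
    public static <T> T withRetries(SqlSupplier<T> attempt, int maxAttempts) throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SQLException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return attempt.get();
            } catch (SQLException e) {
                last = e;
            }
        }
        throw last;
    }
}
```

A caller might wrap the driver manager's getConnection in `RetryHelper.withRetries(() -> driverManager.getConnection("primary"), 3)`, or hand the JDBC string from getJdbcConnectionString to a connection pooler that does equivalent retries.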

© 2022, Amazon Web Services, Inc. or its affiliates. All rights reserved.