How does RDS determine when a “primary” instance fails over to a “standby” instance?

keithB · March 24, 2023, 5:37pm

I’m trying to understand how RDS determines when a “primary” instance fails over to a “standby” instance. The FAQ reads (emphasis mine):
> When you run a DB instance as a Multi-AZ deployment, the “primary” serves database writes and reads. In addition, Amazon RDS provisions and maintains a “standby” behind the scenes, which is an up-to-date replica of the primary. The standby is “promoted” in failover scenarios. After failover, the standby becomes the primary and accepts your database operations. You do not interact directly with the standby (e.g. for read operations) at any point prior to promotion.

Does anyone have any material that describes how exactly a failure of the primary is detected? I’ve tried searching online, but I only get high-level marketing material from Amazon in my search results.

krista · March 24, 2023, 6:54pm

Disclaimer: I work for AWS, but not for RDS.

It’s unlikely that you will find such material. The mechanisms for failover detection and triggering are probably being improved continually, and I imagine it’s not something the team would want to take a firm public stance on, since they may want to change it in the future.