AWS RDS Proxy for Aurora MySQL cluster connection issue

williamTiller · August 6, 2024, 1:27am

Hi all, I’m having a weird issue using AWS RDS Proxy with an Aurora MySQL cluster. When I check the status of the proxy targets, one of the instances is in the AVAILABLE state but the other is unhealthy. This is the output of `aws rds describe-db-proxy-targets`:

```
{
    "Targets": [
        {
            "Endpoint": "endpoint-instance-0.region.rds.amazonaws.com",
            "TrackedClusterId": "db-cluster",
            "RdsResourceId": "db-instance-0",
            "Port": 3306,
            "Type": "RDS_INSTANCE",
            "Role": "READ_WRITE",
            "TargetHealth": {
                "State": "AVAILABLE"
            }
        },
        {
            "Endpoint": "endpoint-instance-1.region.rds.amazonaws.com",
            "TrackedClusterId": "db-cluster",
            "RdsResourceId": "db-instance-1",
            "Port": 3306,
            "Type": "RDS_INSTANCE",
            "Role": "UNKNOWN",
            "TargetHealth": {
                "State": "UNAVAILABLE",
                "Reason": "UNREACHABLE",
                "Description": "Timeout connecting to the database"
            }
        },
        {
            "RdsResourceId": "db-cluster",
            "Port": 3306,
            "Type": "TRACKED_CLUSTER"
        }
    ]
}
```

And this is what I see in the instance logs:

```
[Server] Access denied for user 'rdsadmin'@'localhost' (using password: YES) (sql_authentication.cc:1412)
```
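For anyone following along: a minimal Python sketch for pulling the unhealthy targets out of a `describe-db-proxy-targets` response, so the check can be repeated without reading the JSON by eye. It assumes boto3 is installed and AWS credentials are configured for `fetch_targets`; the proxy name you pass in is your own. `unhealthy_targets` works on the plain response dict, so the CLI's JSON output (loaded with `json.loads`) can be fed to it as well.

```python
from typing import Any


def unhealthy_targets(response: dict[str, Any]) -> list[dict[str, str]]:
    """Return RDS_INSTANCE targets whose TargetHealth state is not AVAILABLE."""
    problems = []
    for target in response.get("Targets", []):
        # The TRACKED_CLUSTER entry has no TargetHealth, so only instances are checked.
        if target.get("Type") != "RDS_INSTANCE":
            continue
        health = target.get("TargetHealth", {})
        if health.get("State") != "AVAILABLE":
            problems.append(
                {
                    "instance": target.get("RdsResourceId", "?"),
                    "role": target.get("Role", "UNKNOWN"),
                    "state": health.get("State", "MISSING"),
                    "reason": health.get("Reason", ""),
                    "description": health.get("Description", ""),
                }
            )
    return problems


def fetch_targets(proxy_name: str) -> dict[str, Any]:
    """Call the RDS API for the proxy's targets (needs boto3 and AWS credentials)."""
    import boto3  # imported here so unhealthy_targets() can be used without boto3

    rds = boto3.client("rds")
    # Uses the proxy's default target group; large target lists may need pagination.
    return rds.describe_db_proxy_targets(DBProxyName=proxy_name)
```

Usage would be something like `for p in unhealthy_targets(fetch_targets("your-proxy-name")): print(p)`. With the output posted above, it should report only `db-instance-1` with reason `UNREACHABLE`.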

pOdom · August 6, 2024, 1:31am

I mean, trust the error message: bad password. Do you have automated password rotation via Secrets Manager?

pOdom · August 6, 2024, 1:31am

Though one of the errors says timeout.

williamTiller · August 6, 2024, 1:36am

I don’t use Secrets Manager; the password is self-managed, and I created the cluster and instances with Terraform. I can also connect to the instance endpoints and the cluster endpoint with DBeaver.

pOdom · August 6, 2024, 2:28am

So you can connect to all nodes in the cluster?

williamTiller · August 6, 2024, 3:26am

No, individually, with the master password. Connecting through the proxy also works, but it is probably using the available READ_WRITE instance.

pOdom · August 6, 2024, 3:37am

Gotcha, so you can’t go through the proxy to a specific endpoint?

pOdom · August 6, 2024, 4:37am

I assume that if all the instances are good, it’s a proxy misconfiguration somehow, like it’s trying to route to an old instance that was rebuilt or something.
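One way to test the "old instance" theory is to compare the instance IDs the proxy tracks with the cluster's current members. The sketch below is illustrative only: it assumes boto3 and credentials for `fetch_inputs`, and the proxy and cluster identifiers are whatever yours are. It relies on `RdsResourceId` holding the instance identifier for `RDS_INSTANCE` targets, as in the output posted above.

```python
from typing import Any


def compare_membership(
    proxy_targets: dict[str, Any], cluster: dict[str, Any]
) -> dict[str, set[str]]:
    """Compare instance IDs tracked by the proxy with the cluster's current members."""
    tracked = {
        t["RdsResourceId"]
        for t in proxy_targets.get("Targets", [])
        if t.get("Type") == "RDS_INSTANCE"
    }
    members = {m["DBInstanceIdentifier"] for m in cluster.get("DBClusterMembers", [])}
    return {
        # Tracked by the proxy but no longer in the cluster (e.g. a rebuilt instance).
        "stale": tracked - members,
        # In the cluster but not (yet) registered as a proxy target.
        "missing": members - tracked,
    }


def fetch_inputs(proxy_name: str, cluster_id: str) -> tuple[dict[str, Any], dict[str, Any]]:
    """Fetch both sides of the comparison (needs boto3 and AWS credentials)."""
    import boto3  # imported here so compare_membership() works without boto3

    rds = boto3.client("rds")
    targets = rds.describe_db_proxy_targets(DBProxyName=proxy_name)
    cluster = rds.describe_db_clusters(DBClusterIdentifier=cluster_id)["DBClusters"][0]
    return targets, cluster
```

If both sets come back empty, the proxy is pointing at the right instances and the problem is more likely in how the health check reaches the unhealthy one.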

pOdom · August 6, 2024, 5:08am

Or at least the health check is doing something different from what you do when you connect to the instance successfully.

pOdom · August 6, 2024, 5:18am

That’s the thread I would pull on, but unfortunately I don’t have an “aha, here’s the problem” answer.

pOdom · August 6, 2024, 6:08am

Are you logging in with the same account as the proxy health check?

pOdom · August 6, 2024, 6:08am

The fact that you got a bad password error makes me think that something changed

williamTiller · August 6, 2024, 6:36am

Yes, same account. The health check gets the credentials from a secret, but that secret replicates the master password.

pOdom · August 6, 2024, 7:28am

Can you just look at the values and make sure that they’re right

pOdom · August 6, 2024, 8:06am

Or make sure that they match, I guess.
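A minimal sketch of that comparison, assuming the proxy's auth secret is the usual JSON object with `username` and `password` keys. `proxy_secret_strings` assumes boto3 and credentials that can read the secret; the helper names are hypothetical. A stray trailing newline or a different username would show up here as a mismatch even when the password "looks" right.

```python
import hmac
import json


def secret_matches(secret_string: str, expected_user: str, expected_password: str) -> dict[str, bool]:
    """Check a proxy auth secret against the credentials you log in with."""
    data = json.loads(secret_string)
    return {
        "username": data.get("username") == expected_user,
        # compare_digest avoids leaking how many leading characters match.
        "password": hmac.compare_digest(
            str(data.get("password", "")).encode(), expected_password.encode()
        ),
    }


def proxy_secret_strings(proxy_name: str) -> list[str]:
    """Read every secret referenced by the proxy's Auth config (needs boto3)."""
    import boto3  # imported here so secret_matches() works without boto3

    rds = boto3.client("rds")
    sm = boto3.client("secretsmanager")
    proxy = rds.describe_db_proxies(DBProxyName=proxy_name)["DBProxies"][0]
    return [
        sm.get_secret_value(SecretId=auth["SecretArn"])["SecretString"]
        for auth in proxy.get("Auth", [])
    ]
```

Printing only the booleans, rather than the secret itself, keeps the password out of terminal history and logs.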

williamTiller · August 6, 2024, 8:20am

I believe something is wrong with the instance, but I can’t pinpoint it. The bad instance is in us-east-1c and the good one is in us-east-1a.
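To check whether the failures really line up with an Availability Zone (for example after adding more instances), target health can be grouped by each instance's AZ. This is a sketch under stated assumptions: `fetch_instance_azs` needs boto3 and credentials, and the function names are hypothetical.

```python
from typing import Any


def health_by_az(targets: dict[str, Any], instance_az: dict[str, str]) -> dict[str, list[str]]:
    """Group proxy target health states by the Availability Zone of each instance."""
    grouped: dict[str, list[str]] = {}
    for t in targets.get("Targets", []):
        if t.get("Type") != "RDS_INSTANCE":
            continue  # skip the TRACKED_CLUSTER entry
        instance = t["RdsResourceId"]
        az = instance_az.get(instance, "unknown-az")
        state = t.get("TargetHealth", {}).get("State", "MISSING")
        grouped.setdefault(az, []).append(f"{instance}={state}")
    return grouped


def fetch_instance_azs(instance_ids: list[str]) -> dict[str, str]:
    """Look up the AZ of each DB instance (needs boto3 and AWS credentials)."""
    import boto3  # imported here so health_by_az() works without boto3

    rds = boto3.client("rds")
    azs = {}
    for instance_id in instance_ids:
        db = rds.describe_db_instances(DBInstanceIdentifier=instance_id)["DBInstances"][0]
        azs[instance_id] = db["AvailabilityZone"]
    return azs
```

If every instance in one AZ is unhealthy and every instance in the other is fine, that points toward the network path into that AZ rather than the credentials.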

pOdom · August 6, 2024, 8:54am

You can take drastic measures but I would do all of the reasonable sanity checks first
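One cheap sanity check, given that the proxy reports "Timeout connecting to the database": confirm plain TCP reachability of each instance endpoint on port 3306 from a host inside the proxy's VPC, ideally in the same subnets and security group as the proxy. A minimal standard-library sketch, assuming you have such a host to run it from:

```python
import socket


def can_reach(host: str, port: int = 3306, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts, refused connections and DNS failures are all OSError subclasses.
        return False
```

Running `can_reach(...)` against both instance endpoints from the posted output would show whether the us-east-1c instance is reachable at the network level. A TCP success there would shift suspicion back to authentication; a timeout would point at security groups, NACLs, or routing.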

williamTiller · August 6, 2024, 9:31am

I’m thinking my next step should be putting them both in the same AZ, since I don’t need Multi-AZ at this point, but I still wonder why this happens, because Multi-AZ is supported.

pOdom · August 6, 2024, 9:51am

That seems even more drastic than I was thinking lol

pOdom · August 6, 2024, 10:39am

You have some threads to pull on; you’re not in an outage.