CloudFormation race condition: ECS service attaching to LoadBalancer with TargetGroup creation issue

ernestB · December 30, 2023, 2:29am

Has anyone come across an annoying race condition in CloudFormation when attaching an ECS::Service to a LoadBalancer, where the TargetGroup and the ListenerRule are created in the same stack?

Despite adding the TargetGroup to the ECS::Service DependsOn list, it fails with: The target group with targetGroupArn does not have an associated load balancer.

Any tricks for this, or do I really need yet another Lambda?

charlesH · December 30, 2023, 2:45am

Put the ListenerRule in the DependsOn list.

ernestB · December 30, 2023, 3:31am

Well, that’s the problem: the ListenerRule is already in the ECS::Service DependsOn list

ernestB · December 30, 2023, 3:33am

which is why this drives me crazy

ernestB · December 30, 2023, 3:35am

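```python
import logging as log
import os
import time

import boto3
from crhelper import CfnResource

LOG_LEVEL = os.getenv("LOG_LEVEL", "DEBUG")
client = boto3.client("elbv2")

helper = CfnResource(
    json_logging=False, log_level=LOG_LEVEL, boto_level="CRITICAL", sleep_on_delete=120
)


@helper.create
@helper.update
def wait_attachment(event, context):
    properties = event["ResourceProperties"]
    target_group_arn = properties["TargetGroupArn"]

    # Poll until the target group reports at least one load balancer.
    # describe_target_groups (not describe_target_health) returns LoadBalancerArns.
    while context.get_remaining_time_in_millis() > 10_000:
        response = client.describe_target_groups(TargetGroupArns=[target_group_arn])
        for group in response["TargetGroups"]:
            if group["TargetGroupArn"] == target_group_arn and group["LoadBalancerArns"]:
                return  # a bare break here would only leave the for loop
        time.sleep(5)

    # Fail the custom resource instead of letting the Lambda time out silently.
    raise RuntimeError(f"{target_group_arn} has no associated load balancer")


def lambda_handler(event, context):
    log.debug(f"EVENT: {event}")
    helper(event, context)
```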
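To try the polling logic without deploying anything, the check can be pulled out into a pure function and the AWS call injected. This is only a sketch: `has_load_balancer`, `wait_for_attachment` and the `describe` callable are names made up for illustration, and the response shape is assumed to match what elbv2 `describe_target_groups` returns (`TargetGroups` entries carrying `LoadBalancerArns`).

```python
import time
from typing import Callable


def has_load_balancer(response: dict, target_group_arn: str) -> bool:
    """True if the response lists at least one load balancer for the given ARN."""
    for group in response.get("TargetGroups", []):
        if group.get("TargetGroupArn") == target_group_arn:
            return bool(group.get("LoadBalancerArns"))
    return False


def wait_for_attachment(
    describe: Callable[[str], dict],  # hypothetical wrapper around the elbv2 call
    target_group_arn: str,
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the target group is attached; return False once the timeout passes."""
    deadline = clock() + timeout
    while True:
        if has_load_balancer(describe(target_group_arn), target_group_arn):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    attempts = {"n": 0}

    def fake_describe(arn: str) -> dict:
        # Pretend the association shows up on the third poll.
        attempts["n"] += 1
        lbs = ["arn:example:lb"] if attempts["n"] >= 3 else []
        return {"TargetGroups": [{"TargetGroupArn": arn, "LoadBalancerArns": lbs}]}

    print(wait_for_attachment(fake_describe, "arn:example:tg", sleep=lambda _: None))
```

In the Lambda, `describe` would be something like `lambda arn: client.describe_target_groups(TargetGroupArns=[arn])`, with the timeout kept below the function timeout.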

ernestB · December 30, 2023, 3:59am

Baked a quick Lambda, let’s see if it helps

charlesH · December 30, 2023, 4:54am

Hmm, it should work. I used ListenerRule (only) with DependsOn for multiple services.

ernestB · December 30, 2023, 5:21am

Well, further testing makes this even more bizarre - I have multiple ECS::Service resources in a single stack, each with its own ListenerRule, and only one of them is failing

charlesH · December 30, 2023, 5:45am

The target group is already a dependency of the ListenerRule, so it doesn’t need to be listed for the service.
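CloudFormation infers an ordering dependency whenever one resource references another through `Ref` or `Fn::GetAtt`, which is why the ListenerRule already waits for its target group. A rough sketch of that inference over a template loaded as a dict (JSON-style intrinsics) is below. The function name and the tiny template are made up for illustration, and `Fn::Sub` references are deliberately ignored to keep it short.

```python
def implicit_dependencies(resources: dict) -> dict:
    """Map each logical ID to the other logical IDs it references via Ref / Fn::GetAtt."""

    def refs(node) -> set:
        found = set()
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "Ref" and isinstance(value, str):
                    found.add(value)
                elif key == "Fn::GetAtt":
                    # Fn::GetAtt is either ["LogicalId", "Attr"] or "LogicalId.Attr".
                    found.add(value[0] if isinstance(value, list) else str(value).split(".")[0])
                else:
                    found |= refs(value)
        elif isinstance(node, list):
            for item in node:
                found |= refs(item)
        return found

    return {
        name: {r for r in refs(body.get("Properties", {})) if r in resources and r != name}
        for name, body in resources.items()
    }


if __name__ == "__main__":
    # Hypothetical, heavily trimmed template.
    template = {
        "Listener": {"Type": "AWS::ElasticLoadBalancingV2::Listener", "Properties": {}},
        "TargetGroup": {"Type": "AWS::ElasticLoadBalancingV2::TargetGroup", "Properties": {}},
        "Rule": {
            "Type": "AWS::ElasticLoadBalancingV2::ListenerRule",
            "Properties": {
                "ListenerArn": {"Ref": "Listener"},
                "Actions": [{"Type": "forward", "TargetGroupArn": {"Ref": "TargetGroup"}}],
            },
        },
        "Service": {
            "Type": "AWS::ECS::Service",
            "DependsOn": ["Rule"],
            "Properties": {"LoadBalancers": [{"TargetGroupArn": {"Ref": "TargetGroup"}}]},
        },
    }
    for name, deps in implicit_dependencies(template).items():
        print(name, sorted(deps))
```

Note that the service only picks up the target group implicitly; nothing in its properties points at the rule, so the rule still has to go in `DependsOn` by hand.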

ernestB · December 30, 2023, 5:58am

That’s my expectation as well, and this is not the first time I am dealing with ECS::Service + ELB, but it seems there is room for a race condition anyway

ernestB · December 30, 2023, 6:46am

let’s see whether the Lambda helps here

ernestB · December 30, 2023, 7:06am

if it does, then it should clearly be reported as a bug

ernestB · December 30, 2023, 7:14am

```yaml
        Type: AWS::ECS::Service
        DependsOn:
            -   PublicListener
            -   AttachRegionalCertificate
            -   AlertmanagerByHostname
            -   PublicTargetGroupAlertmanager
        Properties:
            Cluster: !Ref ServiceCluster
            DesiredCount: 1
            LaunchType: FARGATE
            NetworkConfiguration:
                AwsvpcConfiguration:
                    AssignPublicIp: DISABLED
                    SecurityGroups:
                        - !Ref AlertmanagerSecurityGroup
                        - !Ref EFSSecurityGroup
                    Subnets:
                        - !ImportValue RegionalPrivateSubnet0
            TaskDefinition: !Ref AlertmanagerTaskDefinition
            DeploymentConfiguration:
                MaximumPercent: 200
                MinimumHealthyPercent: 100
            SchedulingStrategy: REPLICA
            EnableExecuteCommand: true
            Tags:
                -   Key: Name
                    Value: observability-public-alertmanager
            LoadBalancers:
                -   ContainerName: alertmanager
                    ContainerPort: 9093
                    TargetGroupArn: !Ref PublicTargetGroupAlertmanager
            ServiceRegistries:
                -   RegistryArn: !GetAtt CloudmapServicealertmanager.Arn
                    ContainerName: alertmanager
                    ContainerPort: 9093

    PublicTargetGroupAlertmanager:
        Type: AWS::ElasticLoadBalancingV2::TargetGroup
        Properties:
            Name: alertmanager-public
            Port: 9093
            Protocol: HTTP
            VpcId: !ImportValue RegionalVPC
            TargetType: ip
            HealthCheckIntervalSeconds: 10
            HealthCheckPath: /metrics
            HealthCheckProtocol: HTTP
            HealthCheckTimeoutSeconds: 5
            HealthyThresholdCount: 2
            Tags:
                -   Key: Name
                    Value: observability-public-alertmanager

    AlertmanagerByHostname:
        Type: AWS::ElasticLoadBalancingV2::ListenerRule
        Properties:
            Actions:
                -   Type: forward
                    TargetGroupArn: !Ref PublicTargetGroupVMAgent
            Conditions:
                -   Field: host-header
                    HostHeaderConfig:
                        Values:
                            - !Sub "alerts.${PublicZoneName}"
                            - !Sub "alerts.${AWS::Region}.${PublicZoneName}"
            ListenerArn: !Ref PublicListener
            Priority: 3
```
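Since the error is about a target group with no load balancer behind it, one cheap static check on a template like this is to confirm that every target group an ECS service references is also the forward target of some listener or listener rule in the same template. The sketch below assumes the template has been loaded as a dict with JSON-style `{"Ref": ...}` intrinsics. The function names and the sample resources are hypothetical, and ARNs passed in via imports or parameters are not covered.

```python
LISTENER_ACTION_PROPS = {
    "AWS::ElasticLoadBalancingV2::Listener": "DefaultActions",
    "AWS::ElasticLoadBalancingV2::ListenerRule": "Actions",
}


def collect_refs(node) -> set:
    """All logical IDs referenced through {"Ref": ...} anywhere under node."""
    if isinstance(node, dict):
        if set(node) == {"Ref"} and isinstance(node["Ref"], str):
            return {node["Ref"]}
        return set().union(*(collect_refs(v) for v in node.values()))
    if isinstance(node, list):
        return set().union(*(collect_refs(v) for v in node))
    return set()


def unrouted_service_target_groups(resources: dict) -> dict:
    """Map each ECS service to the target groups it uses that no listener or rule forwards to."""
    routed = set()
    for body in resources.values():
        prop = LISTENER_ACTION_PROPS.get(body.get("Type"))
        if prop:
            routed |= collect_refs(body.get("Properties", {}).get(prop, []))

    problems = {}
    for name, body in resources.items():
        if body.get("Type") == "AWS::ECS::Service":
            wanted = collect_refs(body.get("Properties", {}).get("LoadBalancers", []))
            missing = sorted(wanted - routed)
            if missing:
                problems[name] = missing
    return problems


if __name__ == "__main__":
    # Hypothetical template: the rule forwards to TgB while the service uses TgA.
    resources = {
        "Rule": {
            "Type": "AWS::ElasticLoadBalancingV2::ListenerRule",
            "Properties": {"Actions": [{"Type": "forward", "TargetGroupArn": {"Ref": "TgB"}}]},
        },
        "Service": {
            "Type": "AWS::ECS::Service",
            "Properties": {"LoadBalancers": [{"TargetGroupArn": {"Ref": "TgA"}}]},
        },
    }
    print(unrouted_service_target_groups(resources))
```

An empty result does not prove the deployment will succeed; it only rules out a wiring mismatch between the service and the rules.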

charlesH · December 30, 2023, 7:17am

Alternatively, you can try making the ListenerRule a dependency of the TaskDefinition, to give it a bit more time before it tries to create the service. :shrug:

ernestB · December 30, 2023, 8:16am

We think alike, I already tried that

ernestB · December 30, 2023, 9:07am

Oh, I know why I am getting this race condition. It’s a stupid reason, and clearly a race condition

ernestB · December 30, 2023, 9:44am

Here, the ECS::Service was already defined and deployed, and I am extending it with a LoadBalancer association via an Update. The rest of the services were deployed with their LoadBalancer association via Create, i.e. the ECS::Service itself was being created as well; this one is only being updated.

ernestB · December 30, 2023, 9:58am

So, attaching a load balancer to an existing ECS::Service will throw the error above
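One way to spot this case before deploying is to create a change set and look at which services would be modified rather than added. The sketch below filters a response shaped like the one from CloudFormation's `describe_change_set` (`Changes[].ResourceChange` with `Action`, `ResourceType`, and `Details[].Target.Name`). The function name and sample data are made up, and the field layout is an assumption based on that API's documented output.

```python
def services_with_load_balancer_update(change_set: dict) -> list:
    """Logical IDs of ECS services that would be Modified with LoadBalancers changed."""
    hits = []
    for change in change_set.get("Changes", []):
        rc = change.get("ResourceChange", {})
        if rc.get("ResourceType") != "AWS::ECS::Service":
            continue
        if rc.get("Action") != "Modify":
            continue  # "Add" means the service is created together with its association
        changed = {d.get("Target", {}).get("Name") for d in rc.get("Details", [])}
        if "LoadBalancers" in changed:
            hits.append(rc.get("LogicalResourceId"))
    return hits


if __name__ == "__main__":
    # Hypothetical change set: one service is new, one already exists.
    sample = {
        "Changes": [
            {"ResourceChange": {
                "Action": "Add",
                "LogicalResourceId": "NewService",
                "ResourceType": "AWS::ECS::Service",
            }},
            {"ResourceChange": {
                "Action": "Modify",
                "LogicalResourceId": "ExistingService",
                "ResourceType": "AWS::ECS::Service",
                "Details": [{"Target": {"Attribute": "Properties", "Name": "LoadBalancers"}}],
            }},
        ]
    }
    print(services_with_load_balancer_update(sample))
```

With boto3 the input would come from `boto3.client("cloudformation").describe_change_set(...)`; any service this returns is on the Update path described above.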

ernestB · December 30, 2023, 10:15am

CF bug reported: https://github.com/aws-cloudformation/cloudformation-coverage-roadmap/issues/1868