Scaling IPv4 Addresses for EKS Cluster VPC

Hi friends, we have a VPC using IPv4 only, and this VPC is used by our EKS cluster. We're worried that we may not be able to scale the number of nodes due to the limited number of available IPv4 addresses left on our subnets. What should we do in this case?

Our VPC is set up using Terraform:

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "2.78.0"

  name                 = local.vpc_name
  cidr                 = "10.0.0.0/16"
  azs                  = data.aws_availability_zones.available.names
  private_subnets      = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets       = ["10.0.4.0/24", "10.0.5.0/24", "10.0.6.0/24"]
  enable_nat_gateway   = true
  single_nat_gateway   = true
  enable_dns_hostnames = true

  tags = {
    Terraform   = "true"
    Environment = local.workspace
  }
}
```

You have a big enough VPC CIDR. Can you create more private subnets (much larger CIDRs), then create a new EKS node group in those new subnets, then migrate all workloads to the new node group, then destroy the old?
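Roughly something like this for the new node group (just a sketch; the cluster name, IAM role, and subnet IDs below are hypothetical placeholders for whatever your config actually calls them):

```hcl
# Sketch only: a second managed node group pinned to the new, larger private
# subnets, so new pods land there while the old group drains.
resource "aws_eks_node_group" "large_subnets" {
  cluster_name    = var.cluster_name           # hypothetical: your EKS cluster name
  node_group_name = "workers-large-subnets"
  node_role_arn   = var.node_role_arn          # hypothetical: your existing node IAM role
  subnet_ids      = var.new_private_subnet_ids # hypothetical: IDs of the new, larger subnets

  scaling_config {
    desired_size = 3
    min_size     = 3
    max_size     = 6
  }
}
```

Once workloads are rescheduled onto this group, the old node group (and then the old subnets) can be removed.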

Is there anything we can do without having to migrate our workloads? I remember reading something about adding a secondary CIDR block to increase the number of IP addresses, but honestly I'm not that well read on AWS VPC/subnet networking in general, so I don't know what that looks like.

Naive question, but is it possible to increase the available IP addresses on the existing subnets we have? Or is it always going to come down to us having to create more subnets?

Aside: since we initially bootstrapped our VPC using the terraform-aws-vpc module, I have noticed there are a few new params: not only `public_subnets` and `private_subnets`, but now also `database_subnets`, `elasticache_subnets`, and `intra_subnets`. We also have AWS RDS and ElastiCache deployed in our existing VPC, so those params look interesting to us. (Not sure what `intra_subnets` are used for.)
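For reference, those params look something like this in the module (the CIDRs here are made up; they'd have to fit alongside our existing subnets):

```hcl
# Sketch of the extra subnet tiers the module supports; CIDR values are examples.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "2.78.0"

  # ... existing arguments unchanged ...

  database_subnets    = ["10.0.7.0/24", "10.0.8.0/24", "10.0.9.0/24"]    # used for an RDS subnet group
  elasticache_subnets = ["10.0.10.0/24", "10.0.11.0/24", "10.0.12.0/24"] # used for an ElastiCache subnet group
  intra_subnets       = ["10.0.13.0/24", "10.0.14.0/24", "10.0.15.0/24"] # no route to the internet at all
}
```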

You can add a CIDR to a VPC, but you can't resize a subnet.
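Adding a secondary CIDR looks roughly like this in plain Terraform (a sketch only; the `module.vpc.vpc_id` reference, the CIDR values, and the AZ are assumptions):

```hcl
# Sketch: associate a secondary IPv4 CIDR with the existing VPC, then create
# new subnets inside that range.
resource "aws_vpc_ipv4_cidr_block_association" "secondary" {
  vpc_id     = module.vpc.vpc_id # the existing VPC
  cidr_block = "10.1.0.0/16"     # example secondary range, must not overlap the primary
}

resource "aws_subnet" "private_secondary" {
  vpc_id            = aws_vpc_ipv4_cidr_block_association.secondary.vpc_id
  cidr_block        = "10.1.0.0/20"
  availability_zone = "us-east-2a"
}
```

Note you still end up creating new subnets in the secondary range; the existing subnets stay the size they are.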

I see, so when you introduce a secondary CIDR you would also introduce new subnets? And you're saying we have a big enough VPC CIDR, so my existing CIDR already supports creating many more subnets?

Yes, a new VPC CIDR gives you more address space to add subnets. But your existing CIDR is already a /16, so there is still plenty of space to add larger subnets.
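To put numbers on it: a /16 is 65,536 addresses, and your six /24s only use 1,536 of them. Terraform's built-in `cidrsubnet()` can work out larger blocks that don't overlap the existing /24s, e.g. (you can check these values in `terraform console`):

```hcl
# The existing /24s (10.0.1.0 to 10.0.6.0) all sit inside the first /20
# (10.0.0.0/20), so /20 blocks numbered 1 and up are free to use.
locals {
  larger_private_subnets = [
    cidrsubnet("10.0.0.0/16", 4, 1), # 10.0.16.0/20 - 4,096 addresses
    cidrsubnet("10.0.0.0/16", 4, 2), # 10.0.32.0/20
    cidrsubnet("10.0.0.0/16", 4, 3), # 10.0.48.0/20
  ]
}
```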

And now you're in the world of using a Terraform module, and needing it to do something else.

My gut reaction is:

  1. Move all workloads off subnet 10.0.3.0/24 (this might mean a new EKS node group specifying just the 10.0.1.0/24 and 10.0.2.0/24 subnets). This includes non-EKS stuff, like EC2 instances.
  2. Update `private_subnets` in Terraform, changing 10.0.3.0/24 to something much bigger. Terraform will destroy 10.0.3.0/24 and then replace it with the new subnet (there's a sketch of this below).
  3. Repeat for the other subnets.

It's a bit of a pain. The alternative is messing around with Terraform state removals/imports and making the subnet changes in the AWS console.
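For step 2, the change is just the `private_subnets` value. The /20 below is only an example of "something much bigger"; any free, non-overlapping block inside 10.0.0.0/16 works:

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "2.78.0"

  # ... other arguments unchanged ...

  # 10.0.3.0/24 swapped for a larger block that doesn't overlap the other subnets
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.48.0/20"]
}
```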

OK, I can build a dev environment to test these changes.

> Update private_subnets in Terraform from 10.0.3.0/24 to something much bigger.

I guess I'm unsure what this looks like. Is there some tool that generates the value for me?

If I understand correctly, you're suggesting keeping 3 public and 3 private subnets, but making each one "larger" so they each have more available IP addresses to begin with?

Sounds sensible, yes :+1:

So in the end it would come down to migrating our workloads, and in that case it might make sense to create a second VPC? We could use VPC peering and migrate our workloads over to the new VPC piece by piece.

The reason I suggest that is because migrating our workloads from one subnet to another may not be possible if all subnets only have a few IP addresses remaining. For example, one node group may not fit entirely on another subnet.

Also, some of our EKS workloads have EBS volumes attached, and those EBS volumes are AZ-specific (us-east-2c).

That could work. You would need to be sure you have connectivity between the VPCs where it's needed for app-to-app traffic.
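The peering itself isn't much Terraform, something like this (a sketch; the second VPC module `module.vpc_new` and its 10.1.0.0/16 CIDR are hypothetical):

```hcl
# Sketch: peer the old and new VPCs and add a route so workloads can talk to
# each other during the migration. A matching route is needed in the new VPC too.
resource "aws_vpc_peering_connection" "old_to_new" {
  vpc_id      = module.vpc.vpc_id     # existing VPC
  peer_vpc_id = module.vpc_new.vpc_id # hypothetical second VPC module
  auto_accept = true                  # same account and region
}

resource "aws_route" "to_new_vpc" {
  route_table_id            = module.vpc.private_route_table_ids[0]
  destination_cidr_block    = "10.1.0.0/16" # the new VPC's CIDR (example)
  vpc_peering_connection_id = aws_vpc_peering_connection.old_to_new.id
}
```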