High network thread usage on Kafka cluster's first broker

rosa · August 7, 2024, 1:16am

Hey, we have a number of Kafka clusters where the first broker consistently shows “more than average” CPU usage. Looking deeper at this: despite that broker not having more partitions or more messages, its network thread usage is high. I’m assuming this is mainly because all of our clients (we have many!) target the first broker in the bootstrap list first. We’ve eliminated most of the cases of clients bootstrapping excessively, but overall there are simply too many clients to make sure everyone is integrating optimally, so we need to accept some overhead.

I’m thinking about what could be done about this. My thoughts:

1. Actually make sure the busiest clients have a “random” order of bootstrap brokers
   a. Talking to ~10 teams would likely make the spread “random”
2. Revisit our Cruise Control setup to ensure the first brokers get less load
   a. If so: how?
3. Actually run a larger instance type for the first broker to cater for the load
   a. Seems like bad practice, but it also doesn’t make sense to scale up all brokers given the load spread

There’s a risk I am wrong and it’s not bootstrapping. Any other ideas on why the "first" broker would consistently be hit harder?
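
For the first idea (random bootstrap order), here is a minimal sketch of what a client team could do at startup. The broker addresses and the helper name `shuffled_bootstrap_servers` are made up for illustration, and it only helps if the client library really tries bootstrap servers in list order, which I haven't verified for every client.

```python
import random
from typing import Optional


def shuffled_bootstrap_servers(servers: str, rng: Optional[random.Random] = None) -> str:
    """Return the same comma-separated broker list, in random order.

    Hypothetical helper: it only reorders the string, it does not
    contact any broker.
    """
    rng = rng or random.Random()
    # Split on commas and drop whitespace and empty entries.
    hosts = [h.strip() for h in servers.split(",") if h.strip()]
    rng.shuffle(hosts)
    return ",".join(hosts)


# Placeholder addresses, not our real brokers.
BOOTSTRAP = "broker-1:9092,broker-2:9092,broker-3:9092"

# "bootstrap.servers" is the standard Kafka client property name;
# pass this dict to whichever client library the team uses.
client_config = {"bootstrap.servers": shuffled_bootstrap_servers(BOOTSTRAP)}
```

Each process gets its own order, so across many clients the first entry should spread roughly evenly over the listed brokers.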

kenS · August 7, 2024, 1:59am

We have the same problem: the brokers used for bootstrapping take ~15-20% more resources than the rest of the brokers. We spent some time investigating this, and the extra load is indeed due to bootstrapping.
Looking back on the initial deployment, it might have been a good idea to add all the brokers to the bootstrap list. We use 1 per dc/az, 4 in total.

rosa · August 7, 2024, 1:59am

Thanks for sharing! For our larger clusters we have 6 bootstrap brokers but in reality most clients hit the first one in the list and then move to the other ones only if needed.
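
A toy model of that behaviour, assuming each client contacts only the first entry of its list when bootstrapping (an assumption, real clients may differ). The function name and broker names are hypothetical, and this is a simulation, not measurements from our clusters.

```python
import random
from collections import Counter


def first_contact_counts(brokers, n_clients, shuffle, seed=0):
    """Count which broker each simulated client contacts first."""
    rng = random.Random(seed)
    # Start every broker at zero so untouched brokers still show up.
    counts = Counter({b: 0 for b in brokers})
    for _ in range(n_clients):
        order = list(brokers)
        if shuffle:
            rng.shuffle(order)  # per-client random order
        counts[order[0]] += 1
    return counts


brokers = ["b1", "b2", "b3", "b4", "b5", "b6"]  # placeholder names
in_order = first_contact_counts(brokers, 6000, shuffle=False)
shuffled = first_contact_counts(brokers, 6000, shuffle=True)
```

With the list in fixed order, every first contact lands on `b1`; with per-client shuffling, the first contacts are spread across all six.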

tScavuzzo · August 7, 2024, 1:59am

1. To mitigate the load on the first broker, ensure that the bootstrap servers list in the client configurations includes multiple brokers.
2. Consider setting up a load balancer in front of your Kafka brokers to distribute incoming client connections more evenly.

rosa · August 7, 2024, 2:29am

From my experience, listing multiple brokers does not help: clients still prioritize the first one in the list. Might depend on the client.
An LB might be worth looking into, thanks!