Kafka __consumer_offsets compaction not happening

Hi all,
I am using Kafka 2.0.0 and I have noticed that some partitions of the __consumer_offsets topic are 500-700 GB in size, with 5000-7000 segments each. These segments are more than 2-3 months old.

There are no errors in the logs, and the topic uses the default cleanup policy (compact).
What could be the problem?
What checks could I do?

My settings:
log.cleaner.enable=true
log.cleanup.policy = [delete]
log.retention.bytes = -1
log.segment.bytes = 268435456
log.retention.hours = 72
log.retention.check.interval.ms = 300000
...
offsets.commit.required.acks = -1
offsets.commit.timeout.ms = 5000
offsets.load.buffer.size = 5242880
offsets.retention.check.interval.ms = 600000
offsets.retention.minutes = 10080
offsets.topic.compression.codec = 0
offsets.topic.num.partitions = 50
offsets.topic.replication.factor = 3
offsets.topic.segment.bytes = 104857600
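
As a quick sanity check, the reported sizes line up with the segment counts if each segment is close to offsets.topic.segment.bytes (the segment size that applies to __consumer_offsets, rather than log.segment.bytes). A minimal sketch, assuming every segment is full; the helper name is made up for illustration:

```python
# offsets.topic.segment.bytes from the settings above (100 MiB).
SEGMENT_BYTES = 104_857_600


def partition_size_gb(segments, segment_bytes=SEGMENT_BYTES):
    """Approximate partition size in decimal GB, assuming every segment is full."""
    return segments * segment_bytes / 1e9


for n in (5000, 7000):
    print(f"{n} segments ~ {partition_size_gb(n):.0f} GB")
```

That gives roughly 520 to 730 GB, which is in the same range as the 500-700 GB observed, so the partitions look like months of uncompacted full segments.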

Maybe you have a few consumer groups that commit way too often. I’ve seen single groups that commit offsets 50k / second which is obviously pointless.

How many consumer groups do you have in your cluster?

Since you already checked the number of segments, it is indeed surprising that compaction does not seem to happen. The only cause I can think of is the dirty ratio. Have you checked the broker logs?
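
For context, here is a rough sketch of the dirty-ratio check as I understand it: a partition becomes a candidate for cleaning once the share of not-yet-compacted ("dirty") bytes exceeds min.cleanable.dirty.ratio. The function names and byte counts below are illustrative assumptions, not the broker's actual code:

```python
def dirty_ratio(clean_bytes, dirty_bytes):
    """Fraction of the log that has not been compacted yet."""
    total = clean_bytes + dirty_bytes
    return dirty_bytes / total if total else 0.0


def eligible_for_cleaning(clean_bytes, dirty_bytes, min_ratio=0.5):
    """True when the dirty share exceeds the threshold (0.5 is the default)."""
    return dirty_ratio(clean_bytes, dirty_bytes) > min_ratio


# Hypothetical partition: 100 MB already compacted, 900 MB written since.
print(dirty_ratio(100, 900), eligible_for_cleaning(100, 900))
```

Under this model, a partition that has accumulated months of uncleaned segments would have a ratio close to 1.0, well above 0.5, so the threshold alone would not be expected to block cleaning.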

I have only 39 consumer groups, there are no errors in the logs, and the dirty ratio is 0.5 (the default).
What other checks could I do?
We are really worried because the topic keeps growing.
Thanks :)

Do you happen to know how many offsets are committed? Just check the messages-in rate of that topic.

There are 2998905656 committed offsets.

But what’s the rate of incoming commits? In the end I can’t tell you why compaction is not working; I can only assume that the dirty ratio is the reason (I think there is a JMX metric to check the dirty ratio).

Personally I’d never mess with the topic configuration for the consumer offsets topic

The dirty ratio is 0.5.

There are 7 brokers, the rate is:
• Broker 1: 80-100 per second
• Brokers 2-7: 4000-6000 per second
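
To put those numbers in perspective, here is a back-of-the-envelope total. It assumes the figures above are per-broker message rates for __consumer_offsets and that commits are spread evenly across the 39 groups, which is almost certainly not the case in practice:

```python
def cluster_commit_rate(slow_broker, busy_broker, busy_brokers=6):
    """Total commits/s: broker 1 plus the six busier brokers (2-7)."""
    return slow_broker + busy_brokers * busy_broker


groups = 39  # number of consumer groups reported earlier in the thread

low = cluster_commit_rate(80, 4000)
high = cluster_commit_rate(100, 6000)

print(f"cluster total: {low} to {high} commits/s")
print(f"per group (even split): {low / groups:.0f} to {high / groups:.0f} commits/s")
```

Even with an even split that is several hundred commits per second per group, so a handful of groups are likely committing far more often than that.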

Sounds like way too many consumer group offset commits to me. You can identify these groups using https://github.com/cloudhut/kminion

Set scrapeMode to offsetsTopic, see: https://github.com/cloudhut/kminion/blob/master/docs/reference-config.yaml#L53-L61

This exporter can report the number of offset commits per consumer group ID, so you can see which consumer groups commit so often.

Is there a simpler way to find out which consumer groups these are from the command line?

You can list all consumer groups, but the admin API does not expose how many offsets are committed per group.
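
One possible workaround that needs nothing installed on the brokers is to read __consumer_offsets directly with kafka-console-consumer.sh and count records per group. On Kafka 2.0 I believe the formatter to pass is --formatter "kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter", but please verify that class name for your version. Let it run for a minute or so, redirect the output to a file (commits.txt here is just a placeholder name), and feed it to a small script like the one below. The sketch assumes each output line starts with [group,topic,partition]:: and skips anything else:

```python
import sys
from collections import Counter


def group_of(line):
    """Return the group id from '[group,topic,partition]::...', or None."""
    key, sep, _ = line.partition("]::")
    if not sep or not key.startswith("["):
        return None
    # Split from the right so a group id containing commas stays intact.
    parts = key[1:].rsplit(",", 2)
    if len(parts) != 3 or not parts[2].isdigit():
        return None
    return parts[0]


def count_commits(lines):
    """Count offset-commit records per consumer group."""
    counts = Counter()
    for line in lines:
        group = group_of(line.strip())
        if group is not None:
            counts[group] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_commits.py commits.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for group, n in count_commits(fh).most_common(20):
            print(f"{n:>10}  {group}")
```

Dividing each count by the number of seconds the consumer ran gives an approximate commit rate per group, which should make the noisy groups stand out.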

I don’t have the permissions to install that exporter.
Can the bad consumers cause these problems? If yes, how can I check?