Hi everyone,
My team and I provide managed Kafka clusters for our organization, and we’ve encountered an issue where clients sometimes leave producers running on non-production clusters, causing the Kafka data disk to run out of space and the brokers to fail. Our current fix is to increase the data disk size, then delete messages and shorten the retention period to free up space.
We’re looking for a “read-only” solution to prevent this. Specifically, we want to disable producing to the cluster once the VM’s disk usage reaches a certain threshold (e.g., 80%), while still allowing consuming. Our proposed solution is to raise min.insync.replicas above the maximum broker count. This would prevent producers from pushing to the cluster (at least those using acks=all, since that is the only setting for which min.insync.replicas is enforced), because the required number of ISRs could never be met, while consuming would still work. We plan to apply this change dynamically as a broker-level configuration, and when the disk usage drops back to a certain level (e.g., 70%), we will disable the read-only mode and revert min.insync.replicas to its normal value.
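A minimal sketch of how such a toggle might look, using the 80%/70% thresholds from above. The function names, the data directory path, the broker address, and the idea of running this from a periodic job are all illustrative assumptions. The command it builds uses the standard kafka-configs.sh tool to set or remove the cluster-wide dynamic default, so removing it falls back to whatever is in server.properties.

```python
import shutil

HIGH_WATERMARK = 0.80  # enter read-only mode at or above this disk usage
LOW_WATERMARK = 0.70   # leave read-only mode at or below this disk usage


def disk_usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def next_mode(read_only, usage):
    """Apply hysteresis: the mode flips only when a watermark is crossed."""
    if not read_only and usage >= HIGH_WATERMARK:
        return True
    if read_only and usage <= LOW_WATERMARK:
        return False
    return read_only


def kafka_configs_command(bootstrap, read_only, blocking_isr):
    """Build the kafka-configs.sh arguments for the cluster-wide broker default.

    blocking_isr is assumed to be larger than the broker count.
    """
    cmd = ["kafka-configs.sh", "--bootstrap-server", bootstrap,
           "--entity-type", "brokers", "--entity-default", "--alter"]
    if read_only:
        cmd += ["--add-config", f"min.insync.replicas={blocking_isr}"]
    else:
        # Removing the dynamic override reverts to the static broker value.
        cmd += ["--delete-config", "min.insync.replicas"]
    return cmd
```

A periodic job could call disk_usage_fraction("/var/lib/kafka") (hypothetical path), feed the result to next_mode together with the last known mode, and run the command via subprocess.run only when the mode actually changes. The gap between the two watermarks keeps the setting from flapping when usage hovers near 80%.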
We understand that this solution may not be considered best practice, as it relies on broker configurations and could potentially be overridden by topic-level configurations. However, it’s the best solution we’ve found so far.
Does anyone have thoughts on this approach or suggestions for a better solution?
Thanks in advance for your input!