Ensuring consistent Kafka environments

Question for you folks. For those that have an automated environment, how do you ensure you have consistent environments? On release, do you tear down the Kafka cluster and recreate it entirely from code stored in version control, or do you ‘upgrade’ the Kafka cluster in place using a script?

So what about the application-specific configurations: topics with a variety of configurations, ACLs/RBAC, etc.?
Say you release a new version of your application that needs 2 new topics and has a ksql job that requires various permissions.

I’ve seen people script that stuff and use Ansible. It’s more the stateful side of Kafka, definitely the harder piece, which IaC (at least in AWS) usually doesn’t extend to.

I’d suggest taking a look at: https://github.com/kafka-ops/julie

As well as: https://github.com/simplesteph/kafka-security-manager

We manage topics, ACLs, and connectors in Terraform, so pretty much everything is in code.
Data is persisted across restarts, so there’s no need to re-create the topics after a broker is restarted, if that’s what you were getting at.
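
For anyone wondering what "everything in code" boils down to, here is a minimal sketch of the diff step a declarative tool performs: compare the topics declared in version control against what the cluster reports, and plan only the difference. This is not how Terraform or any of the tools linked in this thread is actually implemented; `TopicSpec` and `plan_topic_changes` are hypothetical names, and the topic names and settings in the usage note are made up.

```python
from dataclasses import dataclass, field


@dataclass
class TopicSpec:
    """Hypothetical desired-state record for one topic."""
    name: str
    partitions: int
    config: dict = field(default_factory=dict)


def plan_topic_changes(desired, actual):
    """Diff two dicts of TopicSpec keyed by topic name.

    Returns (to_create, to_update, unmanaged):
      to_create: specs declared in code but missing from the cluster
      to_update: specs whose declared settings differ from the cluster
      unmanaged: names present on the cluster but not declared in code
    """
    to_create = [desired[n] for n in sorted(desired) if n not in actual]
    to_update = [
        desired[n]
        for n in sorted(desired)
        if n in actual and desired[n] != actual[n]
    ]
    unmanaged = sorted(n for n in actual if n not in desired)
    return to_create, to_update, unmanaged
```

With `desired` holding two new topics for a release plus an existing `orders` topic, and `actual` holding only `orders`, the plan creates the two new topics and leaves `orders` alone. Because nothing is planned when the two states already match, the same step can run on every release without re-creating anything.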

For the env, we run in k8s and tear the brokers down one by one. We wait until each broker stabilizes and the under-replicated partitions (URPs) clear, and then move on to the next broker until the whole cluster has been cycled. For the ACLs, we use kafka-security-manager. Our topics and consumer groups are not in code yet.
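
A rough sketch of that one-broker-at-a-time loop, for illustration only. `restart_broker` and `under_replicated_count` are hypothetical callables you would supply yourself (for example, deleting a pod, and reading the broker's `UnderReplicatedPartitions` metric); the polling interval and timeout are arbitrary assumptions, not values from this thread.

```python
import time


def rolling_restart(brokers, restart_broker, under_replicated_count,
                    poll_seconds=10.0, timeout_seconds=600.0,
                    sleep=time.sleep):
    """Restart brokers one at a time, waiting for URPs to clear in between.

    restart_broker(broker) and under_replicated_count() are supplied by the
    caller (hypothetical hooks). Raises TimeoutError if the cluster still
    reports under-replicated partitions after timeout_seconds.
    """
    restarted = []
    for broker in brokers:
        restart_broker(broker)
        waited = 0.0
        # Do not touch the next broker until replication has caught up.
        while under_replicated_count() > 0:
            if waited >= timeout_seconds:
                raise TimeoutError(
                    f"cluster still has URPs after restarting {broker}"
                )
            sleep(poll_seconds)
            waited += poll_seconds
        restarted.append(broker)
    return restarted
```

Stopping on timeout rather than pressing on is deliberate: restarting a second broker while partitions are still under-replicated is how a routine roll turns into an outage. A real version would probably also check that the restarted broker has rejoined the cluster before trusting the URP count.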

I’ve been keeping an eye on this project for a bit: https://github.com/devshawn/kafka-gitops

But I am also interested in this recent Confluent article on the topic: https://www.confluent.io/blog/devops-for-apache-kafka-with-kubernetes-and-gitops/

The streaming ops project is written by a colleague of mine, as is Julie. Both are pretty cool.
I’ve been wondering whether there is a way to help users externalize configuration, particularly from Confluent Cloud. Julie helps with this to some degree, and we had some internal discussions, but they didn’t land anywhere concrete. From your perspective, what’s on the devops wish list for Kafka/Confluent?

A big pain point I’ve seen is copying data between environments for testing. It’s always tonnes of effort, with implementations varying from copying the data directory to a MirrorMaker-type migration. Would be cool if there was just a CLI command for some set of topics that wasn’t too complex to configure (no unix pipe hell :sweat_smile:)

Yeah, that’s a good idea. There is some work ongoing in Confluent Cloud to provide that using cluster links (a nice feature, as there is no extra overhead like you’d get with MirrorMaker or Replicator). Not sure if there is a CLI for that or not.