We are setting up Kafka as a streaming platform on k8s. Our plan is to have internal producers that connect to Kafka directly, and external clients that also connect to Kafka directly to consume that data (all using OAuth2).
However, we were wondering what the best practices are when it comes to exposing Kafka directly to the internet. Is there a use case for it, or is it generally recommended to “hide” your Kafka from the outside world for various reasons, using something like Kafka REST Proxy or another technology in between?
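For context, “connect to Kafka directly with OAuth2” means SASL/OAUTHBEARER on the clients — roughly this kind of client config (URLs and credentials are placeholders; Kafka 3.1+ ships a built-in handler, though its package name has moved between versions):

```properties
# Sketch of the direct-connection client config under discussion.
# Token endpoint and client credentials below are placeholders.
security.protocol=SASL_SSL
sasl.mechanism=OAUTHBEARER
sasl.oauthbearer.token.endpoint.url=https://idp.example.com/oauth2/token
# Built-in handler; in some Kafka versions the class lives under
# ...security.oauthbearer.OAuthBearerLoginCallbackHandler instead.
sasl.login.callback.handler.class=org.apache.kafka.common.security.oauthbearer.secured.OAuthBearerLoginCallbackHandler
sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \
  clientId="my-client" \
  clientSecret="my-secret";
```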
I’d be happy if you can share your experiences.
This is one of those cases where the number and rate of change of external consumers/producers is going to matter a LOT to the advice given.
There’s also the question of what level of guarantees you want to be able to offer the external clients.
Personally I’d err on the side of a proxy of some description for external systems: authentication tends to be easier for large numbers of clients (Kafka libs plus OAuth sounds like a lot of work), and consumer-group rebalances can be taxing on a cluster if clients come and go a lot.
My current personal preference for this is an open-source tool called Zilla, which works as a REST proxy as well as a bridge for other real-time protocols like MQTT, WebSocket and SSE: https://github.com/aklivity/zilla
Thanks a lot for taking the time to answer my question.
I’ve looked into Zilla yesterday, and although it looks promising, it seems like OAuth authentication would be a challenge for us.
We now tend towards writing our own proxy.
Ahh, for that have you seen Gravitee?
I am fairly sure it does OAuth.
We already have our internal API Management solution, so we are restricted to that in our case.
then writing your own proxy makes more sense
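If you do roll your own, the OAuth part of the proxy mostly boils down to checking a bearer token before forwarding anything to Kafka. A minimal sketch of that check, assuming HS256-signed JWTs and a hypothetical shared secret (a real proxy would more likely validate RS256 tokens against the IdP’s JWKS):

```python
# Minimal token check a hand-rolled HTTP->Kafka proxy might do before
# producing/consuming on a caller's behalf. HS256 + shared secret is an
# assumption for illustration; adapt to your IdP's signing scheme.
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore the padding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def check_bearer_token(auth_header: str, secret: bytes) -> dict:
    """Return the token's claims if the header carries a valid, unexpired JWT."""
    scheme, _, token = auth_header.partition(" ")
    if scheme != "Bearer" or not token:
        raise ValueError("expected 'Authorization: Bearer <jwt>'")
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed JWT")
    # Verify the HS256 signature over "<header>.<payload>".
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims
```

Only after this check passes would the proxy touch the Kafka producer/consumer, which keeps rebalance-heavy, short-lived external clients off the cluster itself.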
There are a few other tools out there, but they are all managed services and not open source. If that sort of thing is what you’re after, Ably is an option: you’d use OAuth to grant a JWT and push data into Ably WebSockets to get it to the users.