Saving Kafka batch messages into MongoDB and avoiding duplicates

Hello,

I’m new here and I don’t know if this is the right place to ask this question.
Could anyone with experience consuming batches of messages from Kafka give me some help?
I need to consume batches of messages from Kafka and save them in a MongoDB database while avoiding duplicates. Is there a simple way to develop a consumer like that? Any code examples would be appreciated.
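One common way to make batch writes duplicate-safe is to make them idempotent. You key every document on a unique business id and upsert instead of insert. If offsets are committed only after a batch has been written, a crash can cause Kafka to redeliver that batch, and the upsert simply overwrites the earlier copy. The sketch below is a minimal illustration under stated assumptions: each message value is a dict with a unique "id" field, and a plain dict acts as a hypothetical stand-in for the MongoDB collection. With pymongo, the per-document write would be `collection.replace_one({"id": ...}, doc, upsert=True)`, or a `bulk_write` of `ReplaceOne(..., upsert=True)` operations, ideally backed by a unique index on "id".

```python
from typing import Any, Dict, Iterable, List

# Assumption (hypothetical): every message value is a dict with a unique "id".
Message = Dict[str, Any]


def dedupe_batch(batch: Iterable[Message], key: str = "id") -> List[Message]:
    """Collapse duplicates inside one batch, keeping the last version of each key."""
    latest: Dict[Any, Message] = {}
    for msg in batch:
        latest[msg[key]] = msg  # a later message with the same key replaces the earlier one
    return list(latest.values())


def upsert_batch(store: Dict[Any, Message], batch: Iterable[Message], key: str = "id") -> int:
    """Write a batch so that re-processing the same messages creates no duplicates.

    `store` is an in-memory stand-in for a MongoDB collection (hypothetical).
    Returns how many keys were newly inserted rather than replaced.
    """
    inserted = 0
    for msg in dedupe_batch(batch, key):
        if msg[key] not in store:
            inserted += 1
        store[msg[key]] = msg  # upsert: insert if missing, replace if present
    return inserted
```

For example, calling `upsert_batch(store, batch)` twice with the same batch inserts the documents the first time and returns 0 the second time, which is exactly the redelivery case. The design choice here is to accept at-least-once delivery and make the write side idempotent, rather than trying to guarantee that each message is delivered exactly once.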

Have you looked into Kafka Connect?

Hi

I was considering using Kafka Connect.
However, I need to transform two fields of the messages I read from the Kafka topic, and I’m not sure whether I can do that with an SMT (Single Message Transform).
Basically, I have to convert an id field from a string to a UUID and another string field to a Date.
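The per-record logic for those two conversions is small. The sketch below only illustrates that logic in Python. A real custom SMT would be written in Java against Kafka Connect's `org.apache.kafka.connect.transforms.Transformation` interface. The field names "id" and "createdAt" and the "YYYY-MM-DD" date pattern are hypothetical assumptions.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict


def transform_value(value: Dict[str, Any], date_format: str = "%Y-%m-%d") -> Dict[str, Any]:
    """Convert the (hypothetical) "id" field to a UUID and "createdAt" to a datetime."""
    out = dict(value)  # copy so the input record is not mutated
    out["id"] = uuid.UUID(value["id"])  # raises ValueError for a malformed id
    # Parse the date string and attach UTC so the result is an unambiguous instant.
    out["createdAt"] = datetime.strptime(value["createdAt"], date_format).replace(
        tzinfo=timezone.utc
    )
    return out
```

If the converted values are later written with pymongo 4, note that it refuses to encode `uuid.UUID` values unless the client is configured with a `uuidRepresentation` (for example `"standard"`).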

Those sound like good candidates for an SMT. If no existing SMT does it, a custom SMT certainly could, and writing an SMT might be simpler than writing and running a separate application just for those transforms. Here are some resources on SMTs that might be helpful (the last one includes an example of creating a custom SMT):
https://rmoff.net/2021/01/04/kafka-connect-deep-dive-into-single-message-transforms/
https://www.morling.dev/blog/single-message-transforms-swiss-army-knife-of-kafka-connect/
https://www.confluent.io/blog/kafka-connect-single-message-transformation-tutorial-with-examples/
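For the date field, Apache Kafka ships a built-in `TimestampConverter` SMT that can parse a string field using a `SimpleDateFormat` pattern. The UUID conversion has no built-in equivalent that I know of, so that part would likely still need a custom SMT. The sketch below only builds the transform entries you would merge into a connector's "config" map. The field name "createdAt", the alias "toDate" and the pattern "yyyy-MM-dd" are hypothetical assumptions.

```python
import json


def timestamp_converter_smt(field: str, fmt: str, target_type: str = "Timestamp") -> dict:
    """Config entries for Kafka's built-in TimestampConverter SMT on the record value."""
    prefix = "transforms.toDate."  # "toDate" is an arbitrary, hypothetical alias
    return {
        "transforms": "toDate",
        prefix + "type": "org.apache.kafka.connect.transforms.TimestampConverter$Value",
        prefix + "field": field,              # the string field to convert
        prefix + "format": fmt,               # SimpleDateFormat pattern used to parse it
        prefix + "target.type": target_type,  # one of: string, unix, Date, Time, Timestamp
    }


print(json.dumps(timestamp_converter_smt("createdAt", "yyyy-MM-dd"), indent=2))
```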

Oh, thanks a lot!
Do you think a MongoDB connector would let me consume batches (multiple messages) from Kafka while avoiding duplicate messages in my DB?
Would you by any chance have an example of how to build this connector?
I recently created a very simple MongoDB connector, just to test Confluent’s platform, but it certainly doesn’t meet my needs.
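For the duplicates part, the MongoDB Kafka sink connector can be configured to replace the document that has the same business key instead of inserting a new one. The sketch below builds a connector payload of the shape accepted by the Kafka Connect REST API's `POST /connectors` endpoint. It uses the connector's `PartialValueStrategy` id strategy together with the `ReplaceOneBusinessKeyStrategy` write model. The connector name, topic, URI, database, collection and key field are hypothetical placeholders. The property names follow the MongoDB connector's documented sink settings, but please check them against the connector version you install.

```python
import json

SINK_PKG = "com.mongodb.kafka.connect.sink"


def mongo_sink_payload(name: str, topic: str, key_field: str,
                       uri: str = "mongodb://localhost:27017",  # hypothetical placeholder
                       database: str = "mydb",                  # hypothetical placeholder
                       collection: str = "events") -> dict:     # hypothetical placeholder
    config = {
        "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
        "topics": topic,
        "connection.uri": uri,
        "database": database,
        "collection": collection,
        # Derive the document identity from a field of the message value...
        "document.id.strategy": SINK_PKG + ".processor.id.strategy.PartialValueStrategy",
        "document.id.strategy.partial.value.projection.type": "AllowList",
        "document.id.strategy.partial.value.projection.list": key_field,
        # ...and replace (upsert) the existing document with that key instead of inserting.
        "writemodel.strategy": SINK_PKG + ".writemodel.strategy.ReplaceOneBusinessKeyStrategy",
    }
    return {"name": name, "config": config}


print(json.dumps(mongo_sink_payload("mongo-sink", "events-topic", "id"), indent=2))
```

The printed JSON could then be sent to a Connect worker, for example with curl. A unique index on the key field in MongoDB is still worth adding, so that concurrent writes cannot create duplicates behind the connector's back.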

For this, you might want to post a new question in the channel. There are some brilliant and helpful people lurking there that might be able to help you out.

One question, though: are you trying to prevent duplicates from being created, or to deduplicate data where duplicates already exist? The latter might end up requiring a consumer application.
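If duplicates already exist, the cleanup logic amounts to keeping one document per business key and deleting the rest. This is a minimal sketch under stated assumptions. Documents are dicts with MongoDB's `_id` and a hypothetical business key "id", and the first document seen per key survives. With pymongo, the returned ids could be removed with `collection.delete_many({"_id": {"$in": ids}})`, and `collection.create_index("id", unique=True)` would then stop new duplicates from appearing.

```python
from typing import Any, Dict, Iterable, List


def duplicate_ids(docs: Iterable[Dict[str, Any]], key: str = "id") -> List[Any]:
    """Return the _id of every document whose business key was already seen.

    The first document per key (in iteration order) is kept; sort the input
    beforehand if a different survivor is wanted, e.g. the newest one.
    """
    seen = set()
    to_delete = []
    for doc in docs:
        k = doc[key]
        if k in seen:
            to_delete.append(doc["_id"])  # a later copy of an already-seen key
        else:
            seen.add(k)
    return to_delete
```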

OK, thanks again.