Saving Kafka batch messages into MongoDB and avoiding duplicates

danielF · August 25, 2021, 10:04pm

Hello,

I’m new here and I don’t know if this is the right place to ask this question…
Could anyone with experience consuming batches of messages from Kafka give me some help?
I need to consume a batch of messages from Kafka and save them in a MongoDB database while avoiding duplicates. Is there a simple way to develop a consumer like that? Any code examples would be appreciated.

ronaldM · August 25, 2021, 10:50pm

Have you looked into Kafka Connect?

danielF · August 26, 2021, 12:48am

Hi

I was considering using Kafka Connect.
However, I need to transform 2 fields of each message I read from the Kafka topic, and I’m not sure whether I can do that with an SMT.
Basically, I have to convert an id field from a string to a UUID and another string field to a Date.
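
To make the two conversions concrete, here is a minimal Python sketch of what the transform would need to do, wherever it ends up running. The field name `created_at`, the ISO 8601 input format, and the choice to treat timestamps without an offset as UTC are assumptions for illustration only; they are not taken from the real messages.

```python
import uuid
from datetime import datetime, timezone


def transform(message: dict) -> dict:
    """Return a copy of the message with `id` as a UUID and `created_at` as a datetime.

    `created_at` is a hypothetical field name; the real date field may differ.
    """
    doc = dict(message)  # shallow copy so the original message is left untouched

    # uuid.UUID raises ValueError if the string is not a valid UUID.
    doc["id"] = uuid.UUID(message["id"])

    # Assumes an ISO 8601 string such as "2021-08-25T22:04:00".
    created = datetime.fromisoformat(message["created_at"])
    if created.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        created = created.replace(tzinfo=timezone.utc)
    doc["created_at"] = created
    return doc
```

Both conversions fail loudly on malformed input, so a real pipeline would also need to decide what to do with messages that cannot be converted.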

ronaldM · August 26, 2021, 1:33am

Those sound like good candidates for an SMT (Single Message Transform). If an existing one doesn’t fit, a custom SMT could certainly do it, and writing an SMT might be simpler than writing and running another application just to do those transforms. Here are some resources on SMTs that might be helpful (the last one has an example of creating a custom SMT).
https://rmoff.net/2021/01/04/kafka-connect-deep-dive-into-single-message-transforms/
https://www.morling.dev/blog/single-message-transforms-swiss-army-knife-of-kafka-connect/
https://www.confluent.io/blog/kafka-connect-single-message-transformation-tutorial-with-examples/

danielF · August 26, 2021, 2:44am

Oh, thanks a lot!
Do you think that with a MongoDB connector I can solve the problem of consuming a batch (multiple messages) from Kafka while avoiding duplicate messages in my DB?
Would you by any chance have an example of how to build this connector?
I recently created a very simple MongoDB connector, just to test Confluent’s platform, but it certainly doesn’t meet my needs.

ronaldM · August 26, 2021, 4:24am

For this, you might want to post a new question in the channel. There are some brilliant and helpful people lurking there who might be able to help you out.

One question, though, is whether you are trying to prevent duplicates from being created or to deduplicate data where duplicates already exist. The latter might end up requiring a consumer application.
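
If it is the former, one common approach is to make the write idempotent: use a unique message id as the MongoDB `_id` and upsert, so a batch that gets redelivered overwrites existing documents instead of adding new ones. Below is a minimal Python sketch of that idea, not a finished consumer. It assumes every message carries a unique `id` field and that `collection` is a pymongo collection; the helper names are made up for illustration.

```python
def dedupe_batch(messages, key="id"):
    """Keep only the last message seen for each key within one batch."""
    latest = {}
    for msg in messages:
        latest[msg[key]] = msg  # later messages overwrite earlier ones
    return list(latest.values())


def upsert_specs(messages, key="id"):
    """Build (filter, replacement) pairs that use the message id as `_id`."""
    specs = []
    for msg in dedupe_batch(messages, key):
        body = {k: v for k, v in msg.items() if k != key}
        specs.append(({"_id": msg[key]}, body))
    return specs


def write_batch(collection, messages):
    """Upsert one batch; writing the same batch twice leaves the same documents."""
    # Imported here so the helpers above can be used without pymongo installed.
    from pymongo import ReplaceOne

    ops = [ReplaceOne(flt, body, upsert=True) for flt, body in upsert_specs(messages)]
    if ops:  # bulk_write rejects an empty list of operations
        collection.bulk_write(ops, ordered=False)
```

With this pattern the consumer would commit its Kafka offsets only after `write_batch` succeeds; a crash in between causes the batch to be read again, and the upserts make that replay harmless.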

danielF · August 26, 2021, 5:53am

OK, thanks again.