Drawbacks of storing Kafka producer state in a compacted topic?

Hi :wave:
I need to build a Kafka producer that has to store its state somewhere. I know Kafka can keep such state in a compacted topic, but compacted topics have a known drawback: no actual guarantee of getting the latest value for a key. That's crucial for my application (otherwise exactly-once delivery isn't guaranteed). Has this changed or not?

I've also read that a RocksDB key-value store was added to Kafka at some point. Maybe one could use it to store the producer state? Also, can one store the state and send the message atomically, within a transaction? From what I've read it seems possible, but I'm not sure.

Kafka Streams (https://kafka.apache.org/documentation/streams/) has a RocksDB state store.

Sure, but we use Rust and librdkafka, not the JVM.

Compacted topics have a known drawback: no actual guarantee of getting the latest value for a key

Yeah, log compaction doesn’t happen on the active segment, and happens asynchronously on closed segments, so you have to read to the end of the topic to ensure you have the latest value.
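A minimal sketch of that read-to-the-end rebuild, assuming the Rust rdkafka crate (since you're on librdkafka); the topic, group id, and single-partition layout are made up for illustration:

```rust
use std::collections::HashMap;
use std::time::Duration;

use rdkafka::config::ClientConfig;
use rdkafka::consumer::{BaseConsumer, Consumer};
use rdkafka::message::Message;
use rdkafka::topic_partition_list::{Offset, TopicPartitionList};

fn rebuild_state(brokers: &str, topic: &str) -> HashMap<Vec<u8>, Vec<u8>> {
    let consumer: BaseConsumer = ClientConfig::new()
        .set("bootstrap.servers", brokers)
        .set("group.id", "state-rebuild") // required by librdkafka, not used for commits
        .set("enable.auto.commit", "false")
        .create()
        .expect("consumer creation failed");

    // Assuming a single-partition changelog topic for simplicity.
    let (_low, high_watermark) = consumer
        .fetch_watermarks(topic, 0, Duration::from_secs(5))
        .expect("failed to fetch watermarks");

    let mut assignment = TopicPartitionList::new();
    assignment
        .add_partition_offset(topic, 0, Offset::Beginning)
        .expect("invalid partition/offset");
    consumer.assign(&assignment).expect("assign failed");

    let mut state: HashMap<Vec<u8>, Vec<u8>> = HashMap::new();
    let mut next_offset = 0;
    // Keep reading until the high watermark: only then can we be sure we
    // hold the latest value per key, since compaction runs asynchronously.
    while next_offset < high_watermark {
        if let Some(result) = consumer.poll(Duration::from_secs(1)) {
            let msg = result.expect("read error");
            next_offset = msg.offset() + 1;
            match (msg.key(), msg.payload()) {
                // A null payload is a tombstone: the key was deleted.
                (Some(k), None) => { state.remove(k); }
                (Some(k), Some(v)) => { state.insert(k.to_vec(), v.to_vec()); }
                _ => {}
            }
        }
    }
    state
}
```

Tracking the next offset from each delivered message handles the offset gaps compaction leaves behind, and an empty partition terminates the loop immediately.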

You can produce to multiple topics in a transaction, yep.
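For example, a rough sketch with rdkafka's transactional producer; the broker address, transactional.id, and topic names are placeholders:

```rust
use std::time::Duration;

use rdkafka::config::ClientConfig;
use rdkafka::error::KafkaError;
use rdkafka::producer::{BaseProducer, BaseRecord, Producer};

fn send_atomically() -> Result<(), KafkaError> {
    let producer: BaseProducer = ClientConfig::new()
        .set("bootstrap.servers", "localhost:9092")
        // A stable transactional.id fences zombie instances across restarts.
        .set("transactional.id", "my-producer-1")
        .create()?;

    producer.init_transactions(Duration::from_secs(10))?;
    producer.begin_transaction()?;

    // Both records become visible to read_committed consumers atomically.
    producer
        .send(BaseRecord::to("output-topic").key("k").payload("business event"))
        .map_err(|(e, _)| e)?;
    producer
        .send(BaseRecord::to("state-changelog").key("k").payload("new state"))
        .map_err(|(e, _)| e)?;

    // commit_transaction flushes the queued messages before committing.
    producer.commit_transaction(Duration::from_secs(10))
}
```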

As for storing state locally, you'll have to implement that yourself; RocksDB is one option, as mentioned above.
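If you go the local-store route, a minimal sketch with the Rust rocksdb crate; the path and key are made up:

```rust
use rocksdb::DB;

fn main() -> Result<(), rocksdb::Error> {
    // Open (or create) a local RocksDB instance at a hypothetical path.
    let db = DB::open_default("/tmp/producer-state")?;

    // Persist the last state for a key and read it back on restart.
    db.put(b"seq:partition-0", b"42")?;
    if let Some(value) = db.get(b"seq:partition-0")? {
        println!("recovered: {}", String::from_utf8_lossy(&value));
    }
    Ok(())
}
```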

No-no, if the producer's state is stored anywhere except Kafka, there's a possibility of duplicated messages.

Okay, so yeah, to be sure you've got the absolute latest message for a given key, you have to consume to the end of the topic partition, I'm afraid.

max.compaction.lag.ms can help here, but it's rather expensive and doesn't actually fix the problem.

It’s certainly possible to achieve what you want, but not out-of-the-box.

Kafka Streams follows a read-process-write pattern, so it's not a fit here. But the concepts it uses, like state stores, are things you could copy/rebuild.

Whether you need RocksDB or just an in-memory hash map depends on other app requirements/properties, like state size.

But if you write all state updates to a compacted changelog topic and read it to the end on startup (as pointed out earlier), you can always rebuild the latest state. And during processing you use a single TX to write into the actual producer topic plus the changelog topic, to keep both in sync, and update the state "on the side". Rough sketch below.
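Putting it together, a rough sketch of that loop with rdkafka; it assumes init_transactions() was already called on the producer, and the per-key counter state plus topic names are made up:

```rust
use std::collections::HashMap;
use std::time::Duration;

use rdkafka::error::KafkaError;
use rdkafka::producer::{BaseProducer, BaseRecord, Producer};

// Assumes `producer` was created with a transactional.id and that
// init_transactions() has already succeeded on it.
fn produce_with_state(
    producer: &BaseProducer,
    state: &mut HashMap<String, u64>,
    key: &str,
    payload: &[u8],
) -> Result<(), KafkaError> {
    // Compute the next state "on the side" (here: a made-up per-key counter).
    let next = state.get(key).copied().unwrap_or(0) + 1;
    let encoded = next.to_be_bytes();

    producer.begin_transaction()?;
    let result = (|| {
        // One transaction covers the business record and the state update.
        producer
            .send(BaseRecord::to("output-topic").key(key).payload(payload))
            .map_err(|(e, _)| e)?;
        producer
            .send(BaseRecord::to("state-changelog").key(key).payload(&encoded[..]))
            .map_err(|(e, _)| e)?;
        producer.commit_transaction(Duration::from_secs(10))
    })();

    match result {
        Ok(()) => {
            // Only mutate local state once the transaction has committed.
            state.insert(key.to_string(), next);
            Ok(())
        }
        Err(e) => {
            // Abort so that neither topic sees the writes.
            producer.abort_transaction(Duration::from_secs(10))?;
            Err(e)
        }
    }
}
```

On restart, the map gets rebuilt by reading the changelog to the end, as in the earlier sketch; that rebuilding consumer should set isolation.level=read_committed so it skips state updates from aborted transactions.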