I have a question about Kafka sharding strategies. We have a location-based stream going into a service that needs to build up some cache state based on that location, so we want to shard the data by “tiles” or something similar. Since we build up state per tile, we want the destination consumer for each tile to be as “sticky” as possible. Should we shard this stream by partition ID within a single topic, or split the stream into separate topics altogether?
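To make the single-topic option concrete, here is a rough sketch of what I mean by sharding on tile. The tile size, the partition count, and the `tile_key` / `partition_for` helpers are all made up for illustration, and the hash is only a stand-in to show the idea, not Kafka's own key hashing.

```python
import hashlib
import math

TILE_SIZE_DEG = 0.1   # hypothetical tile edge length, in degrees
NUM_PARTITIONS = 12   # hypothetical partition count for the topic


def tile_key(lat: float, lon: float, tile_size: float = TILE_SIZE_DEG) -> str:
    """Map a coordinate to the ID of the grid tile that contains it."""
    row = math.floor((lat + 90.0) / tile_size)
    col = math.floor((lon + 180.0) / tile_size)
    return f"{row}:{col}"


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable key -> partition mapping (illustrative, not Kafka's partitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


if __name__ == "__main__":
    key = tile_key(59.33, 18.06)
    print(key, "-> partition", partition_for(key))
```

In practice the producer would just send each record with the tile ID as the message key, and as long as the partition count stays the same, every record for a given tile ends up in the same partition.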
I know there’s the “sticky assignor” partition assignment strategy, which would keep partition reassignment down a bit. But since we expect this stream to scale up and down quite a lot (like 6 consumers in the daytime, 1 at night), I feel like most consumers will more or less get all of the partition IDs at some point, meaning the cache state on each one could potentially get huge.
But at the same time, putting things into completely separate topics feels a bit… off?