How to implement fair and balanced message processing when some users may send many messages in a short interval, and new users can be created at any time?
I have a system that syncs multiple remote storage providers. I keep in a Kafka topic the changes to files from one provider that need to be synced to another provider, say a file create or a file content change.
I could have a situation where one provider makes a lot of changes, so many messages are sent to the topic and consumers end up busy processing mostly that provider's messages, leaving little capacity for the other providers.
Ideally I would want a fair distribution: logically, a queue per provider with consumers processing round-robin across the queues, so that every provider's messages are handled fairly.
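To make the intent concrete, here is a minimal in-memory sketch of that round-robin idea (the `process` handler and the event shape are hypothetical; Kafka doesn't give you this per-key fairness out of the box):

```python
# Minimal sketch: one FIFO queue per provider, drained round-robin.
from collections import defaultdict, deque

queues: dict[str, deque] = defaultdict(deque)

def enqueue(provider_id: str, event: dict) -> None:
    # Per-provider FIFO preserves ordering within a provider.
    queues[provider_id].append(event)

def drain_round_robin() -> None:
    # Visit providers in turn so a noisy provider cannot starve the others.
    while any(queues.values()):
        for provider_id in list(queues):
            if queues[provider_id]:
                process(queues[provider_id].popleft())

def process(event: dict) -> None:
    print("processing", event)  # hypothetical sync handler
```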
Even if I route messages to partitions by hashing, e.g. provider_id % num_partitions, a busy provider will still affect all the other providers that land in the same partition.
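That routing would look roughly like this (a sketch with confluent-kafka; the topic name and broker address are assumptions). Keying by provider_id keeps per-provider order inside a partition, but providers hashing to the same partition still share it, so one noisy provider can delay the others:

```python
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed address

def publish_change(provider_id: str, change: dict) -> None:
    producer.produce(
        "file-changes",                       # hypothetical topic name
        key=provider_id,                      # default partitioner hashes the key
        value=json.dumps(change).encode(),
    )
    producer.poll(0)  # serve delivery callbacks
```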
Would it be OK to have a topic per user with 1 partition (since the order of messages matters)? With the new KRaft mode I read that a cluster can now handle topics in the order of millions.
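Creating such a topic on user signup would be straightforward; a sketch with confluent-kafka's AdminClient (the user-based topic naming is my assumption):

```python
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # assumed address

def create_user_topic(user_id: str) -> None:
    # Single partition so all of the user's changes stay strictly ordered.
    topic = NewTopic(f"user-{user_id}", num_partitions=1, replication_factor=3)
    futures = admin.create_topics([topic])
    futures[f"user-{user_id}"].result()  # raises if creation failed
```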
Each user has multiple providers, and I want to sync changes between multiple users' providers.
When user1 changes a file in their provider1, I want to propagate it to user2's provider42, for example.
It is for a system like this: https://github.com/radumarias/syncoxiders
It would be globally distributed and could maybe reach millions of users with many files, including large ones.
Maybe splitting into a cluster per group of countries, with at most on the order of 100k topics per cluster, could work?
I did some tests and Redis can handle 1 million streams, each holding some messages, while staying at around 4 GB of memory. So I think the best solution would be to ingest all changes into a Kafka topic with 20-50 partitions, then batch them grouped by user_id, push each group to a dedicated per-user Redis stream, and process from there with consumer groups.
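A sketch of that fan-out stage, assuming confluent-kafka and redis-py (the topic name, stream naming scheme, and message shape are my assumptions):

```python
import json
import redis
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "redis-fanout",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,
})
consumer.subscribe(["file-changes"])  # hypothetical ingest topic
r = redis.Redis()

while True:
    msgs = consumer.consume(num_messages=500, timeout=1.0)  # one batch
    if not msgs:
        continue
    pipe = r.pipeline()
    for msg in msgs:
        if msg.error():
            continue
        change = json.loads(msg.value())
        # One stream per user; XADD appends in arrival order.
        pipe.xadd(f"user:{change['user_id']}:changes", {"event": msg.value()})
    pipe.execute()
    consumer.commit(asynchronous=False)  # commit only after Redis took the batch
```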
Maybe I'll start with the many-topics solution and switch to Redis when I reach many users, as the Redis solution requires some orchestration for jobs: it's not trivial to consume in parallel from that many streams while retaining per-stream processing order (which is needed, since these are file operations). A sketch of that orchestration is below.
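One way to do it, sketched with redis-py: each worker claims a user stream with a short-lived NX lock so only one consumer reads it at a time, which preserves per-stream order (the key names, timeouts, and the `apply_file_operation` handler are my assumptions):

```python
import redis

r = redis.Redis()
GROUP = "sync-workers"

def process_stream(user_id: str, consumer_name: str) -> None:
    stream = f"user:{user_id}:changes"
    lock_key = f"lock:{stream}"
    # NX lock: skip this stream if another worker already owns it.
    if not r.set(lock_key, consumer_name, nx=True, ex=30):
        return
    try:
        try:
            r.xgroup_create(stream, GROUP, id="0", mkstream=True)
        except redis.ResponseError:
            pass  # group already exists
        while True:
            resp = r.xreadgroup(GROUP, consumer_name, {stream: ">"},
                                count=100, block=1000)
            if not resp:
                break
            for _, entries in resp:
                for entry_id, fields in entries:  # entries arrive in stream order
                    apply_file_operation(fields)   # hypothetical sync handler
                    r.xack(stream, GROUP, entry_id)
            r.set(lock_key, consumer_name, xx=True, ex=30)  # renew the lock
    finally:
        r.delete(lock_key)

def apply_file_operation(fields: dict) -> None:
    print("syncing", fields)
```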