I have a question about available strategies for scaling event processing with Kafka consumers. Please see the thread (I don’t want to clutter the main channel with a wall of text)
I’m writing a long message now.
Our Kafka use case is propagating data changes from our application database cluster to our data warehouse.
Each data change to an application record is produced to Kafka with that record’s key as the message key.
Each record type has its own topic.
The consumer listens for an event, does a very light data transformation, upserts the record in a transaction to an RDBMS database, and acknowledges it back to the Kafka broker (commits the offset).
Data changes have to be processed in order for a given record.
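For clarity, the consumer loop today looks roughly like this — a minimal sketch, not our actual code; the topic name, Postgres upsert, table, and transform are all made up for illustration:
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ChangeConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "dw-loader");         // all instances share one consumer group
        props.put("enable.auto.commit", "false");   // only commit offsets after the upsert succeeds
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection db = DriverManager.getConnection("jdbc:postgresql://dw-host/dw", "dw", "secret")) {
            db.setAutoCommit(false);                 // upserts happen inside a DB transaction
            consumer.subscribe(List.of("changes.customer"));   // hypothetical topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> rec : records) {
                    // Same key -> same partition -> same consumer instance,
                    // so changes for a given record are processed in order.
                    String transformed = transform(rec.value());   // the "light transformation"
                    try (PreparedStatement stmt = db.prepareStatement(
                            "INSERT INTO customer (id, payload) VALUES (?, ?) " +
                            "ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload")) {
                        stmt.setString(1, rec.key());
                        stmt.setString(2, transformed);
                        stmt.executeUpdate();
                    }
                }
                db.commit();             // commit the DB transaction for this batch
                consumer.commitSync();   // the "acknowledgment": commit offsets back to the broker
            }
        }
    }

    private static String transform(String value) {
        return value.trim();   // placeholder for the real transformation
    }
}
```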
So I’m wondering about concurrency and throughput.
The only tools in the box for concurrent processing are topics and partitions.
Let T = the number of topics and P = the number of partitions per topic.
Accordingly, I can have at most T*P consumers processing published changes concurrently (one consumer per partition within a group).
From what I’m reading online, P should generally be kept fairly low.
So I can’t just increase P indefinitely to get higher throughput.
What other strategies are available to me?
Can I increase T indefinitely and add Kafka broker nodes to the cluster as needed?
Is that the standard procedure for increasing throughput for this type of workload?
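To be concrete, the only lever I know of so far is adding partitions to an existing topic, e.g. something like the below (topic name hypothetical) — though my understanding is that adding partitions changes which partition new messages for an existing key land on, so per-key ordering across that change isn’t guaranteed:
```
bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --alter --topic changes.customer --partitions 12
```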
Are you planning to do something else with the data that you’re putting into Kafka, besides loading it into the data warehouse?
Other than some light transformations, not at this time.
But after researching this a bit, being new to Kafka and its ecosystem, I’ve come to realize that I really don’t want to be hand-rolling this consumer at all.
I need to be using a sink connector (a Kafka Connect cluster) instead.
I’m still researching, but if this is the correct approach I will be very happy.
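If it pans out, I’m picturing something like the Confluent JDBC sink connector, configured roughly like this — a sketch only, property names from memory of their docs and the connection details invented:
```json
{
  "name": "dw-jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "8",
    "topics.regex": "changes\\..*",
    "connection.url": "jdbc:postgresql://dw-host:5432/dw",
    "connection.user": "dw",
    "connection.password": "secret",
    "insert.mode": "upsert",
    "pk.mode": "record_key",
    "auto.create": "true",
    "auto.evolve": "true"
  }
}
```
Then parallelism is just tasks.max (up to the total partition count across the matched topics), and the Connect framework handles offset tracking and retries for me.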