Aggregating each node's data on the primary node

Hello everyone,
I'm new to Kafka, so sorry for any misunderstanding.
My use case: a 9-server cluster (3 nodes quorum-replicated, the other 6 just parallel nodes); all the nodes write some data to Postgres locally (separate Postgres clusters). I'd like to aggregate each node's data on the primary node, computing a summary from each node's data.

Is this an appropriate use case for Kafka/CDC/Debezium/whatever, or are they overkill for my use case? If they are, why? And what tools would be a better fit?

Thanks in advance

What do you mean by “3 nodes quorum-replicated, 6 others just parallel nodes” ?

1 primary node, 2 replicas of it
The other 6 nodes are basically on the same network doing the same work as the 'main' 3, but are not technically part of the quorum cluster

You might want to read up a bit on how Kafka works.

You set up Kafka clusters. It's not a primary/replica kind of thing like databases.

Can’t I just run a single-node Kafka cluster on the primary node?

Like I said, there is no primary node. You talk to the cluster.

You can have a single-broker cluster, but I would definitely not recommend that for production.

Well, I’m bound to a single primary node…
But anyway, thanks for suggestions :slightly_smiling_face:

Once you create topics, their replicas are created on other brokers in that cluster, assuming you are using replication-factor 3 and have 3 or more brokers in that cluster.
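As a concrete illustration of that replication behavior: when you create a topic with `--replication-factor 3`, each partition gets copies on 3 different brokers. A sketch with the stock Kafka CLI (broker hostname and topic name here are placeholders, not from the thread):

```shell
# Create a topic whose partitions are each replicated across 3 brokers.
# --bootstrap-server can point at any broker in the cluster.
kafka-topics.sh --create \
  --topic node-stats \
  --partitions 3 \
  --replication-factor 3 \
  --bootstrap-server broker1:9092
```

Note that the replication factor cannot exceed the number of brokers, which is why a single-broker cluster gives you no redundancy.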

I see — for this to work it's best to have several nodes for Kafka, several for the target store (data warehouse etc.), and maybe several for the source

Apparently I have no free room for a separate kafka cluster :smiley:

What is your objective? is this a PoC?

If it's a PoC you can do everything on one machine; just don't expect data resiliency if something goes wrong.

I have a storage system that is pretty much a rack of 9 nodes, 3 of them management nodes, 1 of which is primary at any given time
It has usage statistics on each node
I'd like to aggregate those stats in real time on the primary (and from there send them somewhere else, show them in a UI, or …)
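For that kind of setup, a minimal pull-based sketch (no Kafka at all) would be for the primary to periodically query each node's stats and fold them into one summary. Everything below is hypothetical — the node names, the counter names, and the `fetch_stats` stub stand in for however the stats are actually read from each node's local Postgres:

```python
NodeStats = dict[str, int]

def aggregate(per_node: dict[str, NodeStats]) -> NodeStats:
    """Sum each counter across all nodes into one cluster-wide summary."""
    summary: NodeStats = {}
    for stats in per_node.values():
        for key, value in stats.items():
            summary[key] = summary.get(key, 0) + value
    return summary

def fetch_stats(node: str) -> NodeStats:
    """Stand-in for a real query against a node's local Postgres."""
    fake = {
        "node1": {"reads": 100, "writes": 40},
        "node2": {"reads": 80, "writes": 60},
    }
    return fake[node]

per_node = {n: fetch_stats(n) for n in ["node1", "node2"]}
print(aggregate(per_node))  # {'reads': 180, 'writes': 100}
```

Running this in a loop on the primary (or on a schedule) may well be enough; Kafka/CDC earns its keep when you need durable, ordered, replayable change streams rather than periodic summaries.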

Why not use something like Prometheus? You don't need Kafka in that case.
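To make the Prometheus suggestion concrete: if each node can expose its usage stats over HTTP (e.g. via a small exporter), a single Prometheus server on the primary could scrape them all. A minimal `prometheus.yml` sketch — hostnames and port are placeholders, not from the thread:

```yaml
# prometheus.yml — scrape usage stats from every node periodically.
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "node-stats"
    static_configs:
      - targets:
          - "node1:9100"
          - "node2:9100"
          - "node3:9100"
```

Aggregation across nodes (sums, rates, etc.) then happens at query time with PromQL, rather than by shipping rows around.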

I should correct myself. I have no clue what storage system you are using.

If it's application logging you're after, I once used Logstash to get logs into Kafka. Not sure that's still the best approach. There might be a Kafka Connect connector for it these days. Something like https://docs.confluent.io/kafka-connect-spooldir/current/connectors/elf_source_connector.html