I am trying to understand what an acceptable amount of memory is for a 6-node cluster averaging about 20 MB/s of throughput. In our QA environment we saw low memory consumption, but once we went to prod, where we have 11k producers, we saw the brokers’ memory jump to 96%. Keep in mind that most of the producers are not producing much, but they are connected.
If the number of connections increases memory utilization, which settings affect that the most? Is what I am seeing indicative of a misconfiguration, or do we just need to allocate more memory? Is there a formula for how much memory each producer is expected to consume? Any tips would be appreciated.
What does `free -m` say?
A normal producer only takes a couple of KB of memory. However, depending on how you measure RAM consumption, given that you have a bunch of producers all producing to different partitions, you could be forcing the OS to fill the page cache.
The `available` column in `free` tells you how much memory is available to applications, including memory that is only being “used” by the page cache.
```
              total        used        free      shared  buff/cache   available
Mem:           62Gi       7.3Gi       3.2Gi       4.0Mi        52Gi        55Gi
```
It’s all used by page cache.
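To make the arithmetic behind that reading concrete, here is a minimal Python sketch that takes the `Mem:` line quoted above and compares "not free" memory with memory that is actually unavailable to applications. It assumes the six fields are in the standard `free` order (total, used, free, shared, buff/cache, available); the function names and the `MEM_LINE` constant are illustrative only, and which of these ratios a given monitoring dashboard reports is an assumption.

```python
# Values copied from the `free` output quoted in this thread.
# Assumed field order (standard procps `free`):
# total, used, free, shared, buff/cache, available
MEM_LINE = "Mem: 62Gi 7.3Gi 3.2Gi 4.0Mi 52Gi 55Gi"

# Conversion factors from each binary unit suffix to GiB.
UNITS_IN_GIB = {"Ki": 1 / 1024**2, "Mi": 1 / 1024, "Gi": 1.0, "Ti": 1024.0}


def to_gib(value: str) -> float:
    """Convert a human-readable size such as '7.3Gi' or '4.0Mi' to GiB."""
    number, unit = value[:-2], value[-2:]
    return float(number) * UNITS_IN_GIB[unit]


def summarize(mem_line: str) -> dict:
    """Return three percentages of total RAM derived from a `Mem:` line."""
    fields = mem_line.split()[1:]  # drop the leading "Mem:" label
    total, used, free, shared, buff_cache, available = (to_gib(f) for f in fields)
    return {
        # What a "used = total - free" style metric would report.
        "not_free_pct": 100 * (total - free) / total,
        # What applications genuinely cannot get back.
        "unavailable_pct": 100 * (total - available) / total,
        # Share of RAM currently holding page cache and buffers.
        "page_cache_pct": 100 * buff_cache / total,
    }


if __name__ == "__main__":
    for name, pct in summarize(MEM_LINE).items():
        print(f"{name}: {pct:.1f}%")
```

With the numbers from this thread, about 94.8% of RAM is "not free", but only about 11.3% is unavailable to applications; roughly 83.9% is page cache that the kernel can reclaim on demand.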
How many partitions do you have per broker?
Can you grab the stats for the machine too? CPUs and disk type.
Our brokers are running on GKE: 6 e2-highcpu-8 nodes.
We have about 1110 partitions across 41 topics.
The stats make sense. The CPU utilization is explained by the number of producers, and the memory is used by the page cache.
As long as your SLAs are good and you’re not hitting timeouts, I don’t see anything abnormal.
OK, so even though we have only 20 MB/s of throughput, the page cache is getting consumed because of the number of partitions?
When we did load tests in QA, we only had 30 partitions and were running 500 MB/s of throughput.
By default, each partition’s segments are 1 GB.
And you have to page in the whole segment
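As a rough back-of-the-envelope check of that explanation, the sketch below multiplies partition count by segment size to estimate how much data could compete for page cache if every partition has one active segment touched. It is only a sketch under stated assumptions: the replication factor is not given in this thread, so it defaults to 1 here (a lower bound), partitions are assumed to be spread evenly across brokers, 62Gi is treated as roughly 62 GB, and the function names are hypothetical.

```python
SEGMENT_GB = 1        # default segment size stated in this thread
BROKER_RAM_GB = 62    # total RAM per broker from the `free` output (approximate)


def cluster_footprint_gb(partitions: int,
                         segment_gb: float = SEGMENT_GB,
                         replication_factor: int = 1) -> float:
    """Data touched cluster-wide if each partition replica has one active segment.

    replication_factor=1 is an assumption; the thread does not state it.
    """
    return partitions * replication_factor * segment_gb


def per_broker_footprint_gb(partitions: int, brokers: int, **kwargs) -> float:
    """Same estimate per broker, assuming partitions are spread evenly."""
    return cluster_footprint_gb(partitions, **kwargs) / brokers


if __name__ == "__main__":
    prod = per_broker_footprint_gb(partitions=1110, brokers=6)
    qa_cluster = cluster_footprint_gb(partitions=30)
    print(f"prod: ~{prod:.0f} GB of active segments per broker "
          f"vs ~{BROKER_RAM_GB} GB of RAM")
    print(f"QA:   ~{qa_cluster:.0f} GB of active segments across the whole cluster")
```

Under these assumptions, prod has about 185 GB of active segments per broker against about 62 GB of RAM, so the page cache has more than enough data to fill up regardless of throughput, while the 30 QA partitions add up to only about 30 GB across the whole cluster.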
Thanks for the information. I really appreciate the community support. Would our best course be to increase memory on the nodes, rather than increase the number of nodes/brokers? We did go from 3 to 6 brokers, and saw the memory go right back up to 96%.
The memory isn’t really “used”