Hi all. I’m having trouble configuring alert rules for Kafka broker memory. My cluster is running on Kubernetes. I found that the memory usage ratio is always above 90%, even after I restart the broker. So I’m wondering whether it is necessary to alert on a high memory usage ratio…
I know the broker uses the page cache to achieve high performance, but will it use almost all free memory, to the point where the OS cannot allocate memory for it?
The OS page cache should be factored into your sizing, otherwise you can run into issues, e.g.:
if you receive 20 GB of messages every hour on each broker, and you allow consumers to lag behind by 3 hours’ worth of messages, you would require 60 GB of OS page cache
if you size it like this, you won’t run out of memory … keep in mind, your OS page cache is used by consumers to get messages from your topics … if it runs out, the messages will just be retrieved from disk through conventional IO (the main downside being speed at that point).
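To make that concrete, here’s a quick sketch of the sizing arithmetic, plus how to see how much of a host’s memory is actually page cache (the throughput numbers are just the hypothetical ones from the example above):

# Hypothetical figures from the example above; adjust to your own cluster.
INGEST_GB_PER_HOUR=20     # GB of messages received per broker per hour
CONSUMER_LAG_HOURS=3      # how far behind consumers are allowed to fall
echo "Required OS page cache: $((INGEST_GB_PER_HOUR * CONSUMER_LAG_HOURS)) GB"

# On the broker host, most of the "buff/cache" column is page cache:
free -h
grep -E '^(Cached|MemFree):' /proc/meminfo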
Yeah… I’m wondering, if I have only 50 GB of OS page cache, will it run out of memory? In Kubernetes, a pod that OOMs will be killed… It sounds like in that scenario the container would restart again and again. But my broker running in a pod always uses 90%+ memory and hardly ever OOMs. So I’m a little curious about that…
By default, if your OS page cache runs out, the system starts using IO … so, no, you won’t ever experience an OOM … the only way your brokers can end up with an OOM is if their JVM heap reaches its max and the next allocation fails …
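If you want to verify that in your own pods, here’s a rough way to check (assuming a standard cgroup setup; the paths differ between cgroup v1 and v2, and the pod/namespace names are placeholders):

# Inside the broker container: split memory into anonymous (heap etc.) vs. page cache.
# cgroup v1: "rss" is anonymous memory, "cache" is reclaimable page cache
grep -E '^(rss|cache) ' /sys/fs/cgroup/memory/memory.stat 2>/dev/null
# cgroup v2: "anon" and "file" play the same roles
grep -E '^(anon|file) ' /sys/fs/cgroup/memory.stat 2>/dev/null

# From outside: kubectl top reports the working set, which excludes inactive file
# cache, so a broker can show 90%+ raw usage without ever being OOM-killed.
kubectl top pod <kafka-broker-pod> -n <namespace>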
Oh yeah, got it! Many thanks!!! By the way, I’m also running Kafka in a resource-limited environment. The Kafka broker is only given 1 GB of memory… Do you have any suggested configuration for JVM heap and metaspace memory?
The way to calculate the heap is very dependent on how many topics you have, how many partitions each topic has, and your replication factor.
Quite some time ago there was the notion that heap could be estimated as roughly 1 MB for each partition handled by the given broker.
Given that notion, most Kafka setups start at a 4 GB heap and grow it in production situations. You could also ask yourself, of course, whether it wouldn’t be easier to just add another Kafka node and scale out instead of up.
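As a rough sketch of that old rule of thumb (the partition count here is made up, just to show the arithmetic):

# Old rule of thumb: roughly 1 MB of heap per partition replica hosted by the broker.
PARTITIONS_ON_BROKER=4000   # hypothetical total partition replicas on this broker
HEAP_MB=$((PARTITIONS_ON_BROKER * 1))
echo "Rough heap starting point: ${HEAP_MB} MB (~4 GB), then tune from observed GC behaviour"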
You can refer to this as a baseline for your case:
export KAFKA_HEAP_OPTS="-XX:ParallelGCThreads=2 -Xms4g -Xmx4g -XX:MetaspaceSize=96m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80 "
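For a pod limited to ~1 GB that baseline obviously won’t fit, so a hedged adaptation (my own suggestion, not an official recommendation) would be to shrink the heap and leave the rest of the limit for metaspace, thread stacks and a bit of page cache, e.g.:

# Sketch for a ~1 GB container: ~512 MB heap, leaving headroom for metaspace,
# thread stacks, direct buffers and a little page cache under the pod limit.
# G1HeapRegionSize is left at its default, since 16M regions are too coarse for such a small heap.
export KAFKA_HEAP_OPTS="-Xms512m -Xmx512m -XX:MetaspaceSize=96m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80"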
Thanks dear!! You really helped me a lot!!! Thank you~~