hey,
i have a small question. i have a custom exporter which collects data and exposes it as prometheus gauges.
i have this exporter running in each namespace (==environment) of my cluster.
i am trying to create a graph which will show a line for each namespace with the value of the gauge in each sample.
for some reason i cannot find the correct way to do so… the only functions that work for me are sum and avg, all other options give me either an error about the “by (namespace)” part, no data, or a flat line with the value of 0.
but i just want the actual value.
this is something that works: avg(metric_name) by (namespace)
example metric: my_metric_name{count="4",duration="0.179696",name="dashboard_overview",result="200",size="74157"} 0.179696
Your metric makes no sense as it carries the value as a label (and count and size are probably values too?). You will end up with a lot of series, most of them stale. How often do you get a sample that matches all those labels exactly? (i.e. if you query this exact metric name and these label values, how many results do you get over the last month?)
You can’t group something without an aggregation (and that holds in any query language).
Let me give a bit more detail about what I have in mind.
Let’s imagine a CPU metric: my_cpu_usage{hostname="web-app", datacenter="A"}
In this metric we have 2 labels:
• hostname: let’s say we only have 2 possible values (database and web-app)
• datacenter: we have 2 datacenters that are exactly the same, they are A and B.
Let’s say I scrape every minute. On each scrape, how many values do I end up writing to my TSDB? The answer is 4:
• the value for web-app in datacenter A
• the value for web-app in datacenter B
• the value for database in datacenter A
• the value for database in datacenter B
That makes sense, right? I can display those 4 values easily and plot 4 graphs.
So now if I want the values per datacenter (deliberately dropping hostname to get 2 graphs), I absolutely need to do a sum, an average, or some other aggregation!
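To make this concrete, here is a minimal Python sketch of what an aggregation by datacenter does to those 4 series. The sample values and the helper names (aggregate_by, mean) are made up for illustration; only the metric and label names come from the example.

```python
from collections import defaultdict

# One scrape of my_cpu_usage: 4 series, one per (hostname, datacenter) pair.
# The values are invented for illustration.
samples = [
    ({"hostname": "web-app", "datacenter": "A"}, 40.0),
    ({"hostname": "web-app", "datacenter": "B"}, 60.0),
    ({"hostname": "database", "datacenter": "A"}, 20.0),
    ({"hostname": "database", "datacenter": "B"}, 30.0),
]


def aggregate_by(samples, label, func):
    """Group samples by one label and reduce each group with func."""
    groups = defaultdict(list)
    for labels, value in samples:
        groups[labels[label]].append(value)
    return {key: func(values) for key, values in groups.items()}


def mean(values):
    return sum(values) / len(values)


sum_by_dc = aggregate_by(samples, "datacenter", sum)
avg_by_dc = aggregate_by(samples, "datacenter", mean)
print(sum_by_dc)  # {'A': 60.0, 'B': 90.0}
print(avg_by_dc)  # {'A': 30.0, 'B': 45.0}
```

Going from 4 series to 2 forces a choice of func: each datacenter holds two values, and something has to turn them into one. In PromQL the same idea is written as sum by (datacenter) (my_cpu_usage).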
Let’s add some more details on your specific problem now
Your metric my_metric_name{count="4",duration="0.179696",name="dashboard_overview",result="200",size="74157"} has several problems. The first one is that you can’t aggregate by namespace because the metric has no namespace label. Prometheus cannot tell you the value for namespace A, because those values are not linked to a namespace.
Now let’s say this label is added at ingestion like you said; then the by clause will work as intended. So no problem here, I suppose you just didn’t include this label in your copy-paste in Slack.
About the labels now. Prometheus sizing mostly depends on the number of time series you have. In my previous example I created 4 time series (every time I scrape, I append 4 values to the same 4 streams). Your labels are probably creating one time series per value. That’s not scalable: as ingestion grows, you will need bigger and bigger Prometheus systems.
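A rough Python sketch of why that happens, assuming a series is identified by its unique set of label key/value pairs. The observations and the series_count helper are hypothetical; only the label names come from your metric.

```python
# Hypothetical observations: (name, result, duration) for successive requests.
observations = [
    ("dashboard_overview", "200", 0.179696),
    ("dashboard_overview", "200", 0.181203),
    ("dashboard_overview", "200", 0.175410),
    ("dashboard_overview", "200", 0.179696),
]


def series_count(label_sets):
    """Count distinct series: one per unique set of label key/value pairs."""
    return len({frozenset(labels.items()) for labels in label_sets})


# Value duplicated into a label: almost every observation opens a new series.
with_value_label = [
    {"name": n, "result": r, "duration": str(d)} for n, r, d in observations
]
# Value kept only as the sample value: one series, appended to on each scrape.
without_value_label = [{"name": n, "result": r} for n, r, _ in observations]

print(series_count(with_value_label))     # 3
print(series_count(without_value_label))  # 1
```

With the duration in a label, a series is only reused when the exact same duration shows up again, which is why the count grows with the number of distinct values rather than with the number of things you monitor.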
first of all, thank you for the detailed explanation!
regarding the namespace label - yes, it is added during scraping and i can see it, and i tried using it with by
basically the reason i added those additional labels is to have the option to create more detailed graphs in the future, but honestly i am more interested in the value itself.
so i understand that i should reduce the number of labels to reduce the amount of data, but i would still expect the following to work:
last_over_time(my_metric_name [2h]) by (namespace)
which yields a 422 error - unparsed data left: “by (namespace)”
whereas this does work as expected, but i don’t want to use average: avg(my_metric_name) by (namespace)
Let’s look at a natural-language example. Here is my dataset:
• Patrick is a male engineer and has 20$ in his pocket
• Tom is a male gardener and has 10$ in his pocket
• Lila is a female director and has 30$ in her pocket
• Maria is a female ambassador and has 10$ in her pocket
Total amount by sex: 30$ for males, 40$ for females.
Average amount by sex: 15$ for males, 20$ for females.
Easy to get, right? But you don’t want an average or a sum.
So your question is “What is the amount by sex?” Using this example, can you give me the expected result and the logic behind it?
It’s a real question, as I believe we don’t use the same vocabulary. It is logically impossible to group values without doing any aggregation (aggregate comes from “grouping stuff” in Latin, so it makes sense).
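Here is the same dataset as a small Python sketch, just to show where the question gets stuck. The variable names are mine; the numbers are the ones from the list above.

```python
from collections import defaultdict

# (name, sex, amount in pocket), taken from the dataset above.
people = [
    ("Patrick", "male", 20),
    ("Tom", "male", 10),
    ("Lila", "female", 30),
    ("Maria", "female", 10),
]

# Grouping alone gives a list of amounts per sex, not a single value.
by_sex = defaultdict(list)
for name, sex, amount in people:
    by_sex[sex].append(amount)

# A single value per group only appears once a reduction is chosen.
total = {sex: sum(amounts) for sex, amounts in by_sex.items()}
average = {sex: sum(amounts) / len(amounts) for sex, amounts in by_sex.items()}

print(dict(by_sex))  # {'male': [20, 10], 'female': [30, 10]}
print(total)         # {'male': 30, 'female': 40}
print(average)       # {'male': 15.0, 'female': 20.0}
```

“The amount by sex” without a reduction is the by_sex dict: two amounts per group, which cannot be drawn as one line per group.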
but my dataset is different, my dataset looks more like this (or i want it to look like this):
Sample #1:
• Patrick is a male engineer and has 20$
• Tom is a male gardener and has 10$
• Maria is a female ambassador and has 30$
Sample #2:
• Patrick is a male engineer and has 15$
• Tom is a male gardener and has 20$
• Maria is a female ambassador and has 45$
Sample #3:
• Patrick is a male engineer and has 25$
• Tom is a male gardener and has 13$
• Maria is a female ambassador and has 22$
my question would be, how much money does X have in his pocket over time
hey, i changed my exporter to remove unnecessary labels, i now have the value of the gauge and the labels added during scraping (namespace,cluster,etc…)
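Sketched in Python, this dataset is one series per person, and the question is a plain lookup with no grouping. The dict layout is only an assumption for illustration; the numbers are the three samples above.

```python
# The three samples above, keyed by person: one time series per person.
series = {
    "Patrick": [20, 15, 25],
    "Tom": [10, 20, 13],
    "Maria": [30, 45, 22],
}
sex = {"Patrick": "male", "Tom": "male", "Maria": "female"}

# "How much money does X have over time" is a lookup; no aggregation needed.
print(series["Maria"])  # [30, 45, 22]

# One line per sex needs a reduction as soon as a group has two members.
male = [series[name] for name in series if sex[name] == "male"]
male_total = [sum(values) for values in zip(*male)]
print(male_total)  # [30, 35, 38]
```

The same would hold for the gauge, assuming there is exactly one series per namespace: the raw series is already one line per namespace, and by (namespace) only comes into play when several series share a namespace.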
however, i noticed that i am getting a line per sample (not sure if relevant that the container was restarted between the samples).