Help with Loki Dashboard Setup – Duplicate Fields

Hi all!
I’m currently trying to set up my first local Loki dashboard with Promtail (I know I should use Alloy instead, but I wanted to get Promtail working first). Why do my logs have duplicated fields, such as property and property_extracted? It’s making all my logs a little cluttered with fields. I’ve attached what my fields look like in Grafana, and my current config file is above ^

https://grafana.com/docs/loki/latest/configure/#limits_config
Check:
discover_service_name
discover_log_levels

P stands for Parsed, I for Indexed.
Example: you have level=INFO host=Ubuntu msg="Connected to PostgreSQL database" as a log line, and you send it to Loki with host as a label. When you run {host=~".*"} | logfmt, Loki returns both host and host_extracted, because parsing produces a key that conflicts with an existing indexed label.
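To make the renaming rule concrete, here is a minimal Python sketch of what happens at query time. It assumes a simplified logfmt parser (shlex-based quote handling, not Loki’s real parser), and the function names parse_logfmt and apply_logfmt are hypothetical. The only behavior it tries to mirror is the one described above: a parsed key that clashes with an existing stream label is kept under a name with an _extracted suffix.

```python
import shlex


def parse_logfmt(line: str) -> dict[str, str]:
    """Tiny logfmt parser: whitespace-separated key=value pairs, double quotes honored.

    Approximation only; Loki's real logfmt parser handles many more edge cases.
    """
    fields: dict[str, str] = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value
            fields[key] = value
    return fields


def apply_logfmt(stream_labels: dict[str, str], line: str) -> dict[str, str]:
    """Mimic '| logfmt': keys that clash with stream labels get an '_extracted' suffix."""
    result = dict(stream_labels)  # indexed labels ("I") are kept unchanged
    for key, value in parse_logfmt(line).items():
        if key in stream_labels:
            result[f"{key}_extracted"] = value  # conflict: keep both values
        else:
            result[key] = value  # no conflict: plain parsed field ("P")
    return result


line = 'level=INFO host=Ubuntu msg="Connected to PostgreSQL database"'
print(apply_logfmt({"host": "Ubuntu"}, line))  # host label present
print(apply_logfmt({}, line))                  # no host label
```

With host sent as a label, the result holds both host and host_extracted; without that label, only host appears. Nothing extra is stored: the extra key exists only in the query result.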

You are not storing duplicated keys; the conflicts only appear when you parse at query time, so the number of keys seems to grow. It has no impact on performance.

Right, so in this context, if I removed host from the labels in my config file, I would only get one key for host? Would it therefore be better to keep my labels and remove those values from the actual log line instead? Is having it in both the log line and the labels what causes this issue?

Since it has no impact on performance, it’s better to keep it this way.

Loki’s “ideal” scenario is when you index nothing about what is in the log line. You don’t even read it; you just ship it and add labels about where it’s coming from (hostname, filename, cluster name, etc.).
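A minimal sketch of that “ideal” shape, assuming the JSON body format of Loki’s push endpoint (/loki/api/v1/push): labels describe the source, and each line is shipped untouched with a nanosecond timestamp. The helper build_push_body and all label names and values are hypothetical, and nothing is actually sent.

```python
import json
import time


def build_push_body(labels: dict[str, str], lines: list[str]) -> str:
    """Build a JSON body shaped like Loki's push API payload.

    The labels only say where the logs come from; the lines are not read or parsed.
    """
    now_ns = time.time_ns()
    # Timestamps are Unix epoch nanoseconds encoded as strings; +i keeps them ordered.
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return json.dumps({"streams": [{"stream": labels, "values": values}]})


body = build_push_body(
    {"host": "my-server-1", "filename": "/var/log/app.log", "env": "dev"},  # hypothetical
    ['level=INFO msg="Connected to PostgreSQL database"'],
)
print(body)
```

Anything inside the line (level, msg, or even a host= key) is left for query-time parsing.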

In real-life scenarios it’s never perfect: you may need to add a label that is already present in the log line (a conflict), or you may need a detail that is already inside the line.

So Loki just uses this “_extracted” convention when parsing produces a key that already exists, but it has no impact on features or performance.

You will be happy to have it the day you have a real conflict

Right, so in many logs, something like “service” or “env” won’t be in the parsed fields and will have to come from labels.

Exactly!
Real example of why you want both extracted and indexed: an app on hostname=my-server-1 logs something like “I received this batch from hostname=my-backend-12”. You will be happy to have a hostname for both where the log comes from and what’s inside the log line.

Right, where does that indexed information come from if it’s not parsed from the log? I thought it was all just parsing text. I guess tools like Promtail or Alloy add this information under the hood before pushing to Loki?

Alloy/Promtail can either enrich from the environment or read the log line (extract data with regex).
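A rough Python analogue of those two sources, assuming a simple pipeline: labels taken from the environment (the machine’s hostname and the tailed file’s path) plus one value pulled out of the line with a named regex group. This is only loosely comparable to Promtail’s static labels and regex/labels stages; it is not Promtail or Alloy code, and the names enrich and LEVEL_RE are hypothetical.

```python
import re
import socket

# Hypothetical pattern: capture the level from a line such as "level=INFO ...".
LEVEL_RE = re.compile(r"level=(?P<level>\w+)")


def enrich(line: str, filename: str) -> dict[str, str]:
    """Build labels for one line: environment-derived ones, plus a regex extraction."""
    labels = {
        "host": socket.gethostname(),  # from the environment, not from the line
        "filename": filename,          # path of the file being tailed
    }
    match = LEVEL_RE.search(line)
    if match:
        labels["level"] = match.group("level")  # read out of the line itself
    return labels


print(enrich('level=INFO msg="Connected to PostgreSQL database"', "/var/log/app.log"))
```

If such an extracted value is promoted to a label and the same key also sits in the line, a later | logfmt query shows the _extracted variant.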

Loki does that extraction. It is configured in limits_config, and I provided the config parameters. As I understand it, _extracted does not come from the LogQL query.

Right, but doesn’t that just say that any label listed in discover_service_name is used to populate the service_name field, and any label listed in log_level_fields is used to populate the level field? That doesn’t quite explain why things like env_extracted are being created, or how to stop them from showing. The explanation that it’s pulling these fields from both the log line and the label config, effectively creating the same field twice, makes sense to me though.
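A simplified Python sketch of the discovery behavior described above, to show why it is separate from the _extracted keys: it fills in service_name from the first configured label that exists, and a level from the first matching field name. The name lists are a hypothetical subset (check the limits_config reference for Loki’s real defaults), the unknown_service fallback is an assumption, and the key name Loki actually writes for the level may differ by version. Nothing here renames conflicting keys; that only happens when a query parses the line.

```python
# Hypothetical subsets; Loki's actual defaults are listed in the limits_config docs.
DISCOVER_SERVICE_NAME = ["service", "app", "application", "container", "job"]
LOG_LEVEL_FIELDS = ["level", "LEVEL", "Level", "severity"]


def discover(labels: dict[str, str], fields: dict[str, str]) -> dict[str, str]:
    """Return keys that ingestion-time discovery would add (simplified)."""
    found: dict[str, str] = {}
    # The first label name in the configured order wins; fallback value is assumed.
    found["service_name"] = next(
        (labels[name] for name in DISCOVER_SERVICE_NAME if name in labels),
        "unknown_service",
    )
    # The level comes from the first field name present, if any.
    for name in LOG_LEVEL_FIELDS:
        if name in fields:
            found["level"] = fields[name]
            break
    return found


print(discover({"app": "api", "job": "backend"}, {"level": "INFO"}))
```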

As a last question, I want to check that I’ve understood labels correctly. Labels are used to create streams for logs, and the more unique label values there are, the more demanding this is on the machine running Loki. I previously had client_ip and user_id (hashed) fields in my labels (as I want to be able to monitor bad actors or malicious traffic), but moved them out of the labels and into the log line. Is this the correct practice? Have I misunderstood anything there?

Correct. You should avoid labels with high cardinality. client_ip and user_id are probably bad candidates for labels.
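A small Python sketch of why that matters, assuming each unique combination of label values becomes one stream: with hypothetical traffic from 2 hosts and 1,000 distinct client IPs, putting client_ip in the labels multiplies the stream count by 1,000.

```python
from itertools import product

# Hypothetical traffic: 2 hosts, 1 environment, 1000 distinct client IPs.
hosts = ["web-1", "web-2"]
envs = ["prod"]
client_ips = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]


def count_streams(label_sets) -> int:
    """Each unique combination of label key/value pairs is one stream."""
    return len({tuple(sorted(ls.items())) for ls in label_sets})


with_ip = [{"host": h, "env": e, "client_ip": ip} for h, e, ip in product(hosts, envs, client_ips)]
without_ip = [{"host": h, "env": e} for h, e, _ in product(hosts, envs, client_ips)]

print(count_streams(with_ip))     # 2000 streams
print(count_streams(without_ip))  # 2 streams
```

With client_ip kept in the line, it can still be filtered at query time with a parser, for example {env="prod"} | logfmt | client_ip="10.0.0.1", without creating extra streams.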