[Grafana] What is the best time-series database for storing sizeable blobs that can also be graphed in Grafana?

We have a Postgres table that stores all of our server's HTTP requests/responses with external services, allowing us to review the request/response bodies when an issue is reported. We're also starting to build Grafana panels off of this table, since it is essentially a time series. Question: I'm trying to get a sense of how people typically model something like this in a more typical time-series database. I assume that storing the entire request/response body in a TS database event would be overkill, and that you'd typically want to keep the data down to simple numeric/string values, but preserving the request/response bodies for potential later review is an important use case. So if I wanted to use a more robust TS database to build graphs on, how would I model this? My current thinking: continue to use the Postgres table for storing req/resp bodies (and potentially prune old rows, since they're pretty large), and in addition write slimmer statistical HttpInteraction events with duration/status_code/error to a more seasoned TS database that can keep historical data around much longer for graphing purposes.
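(For concreteness, here's a minimal Python sketch of the dual-write model I have in mind. Everything here is a hypothetical stand-in: `bodies_store`, `ts_events`, and the `HttpInteraction` fields are made-up names, with a plain dict and list in place of Postgres and the TS database.)

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

# Stand-ins for the real stores: a dict in place of the Postgres table,
# a list in place of the time-series database.
bodies_store = {}   # full request/response bodies, keyed by UUID (prune-able)
ts_events = []      # slim events kept around much longer for graphing

@dataclass
class HttpInteraction:
    """Slim TS event: no bodies, just stats plus a back-reference UUID."""
    body_id: str            # points at the full bodies in the relational store
    duration_ms: float
    status_code: int
    error: Optional[str]
    ts: float = field(default_factory=time.time)

def record_interaction(request_body: str, response_body: str,
                       duration_ms: float, status_code: int,
                       error: Optional[str] = None) -> HttpInteraction:
    body_id = str(uuid.uuid4())
    # 1. Full bodies go to the prune-able relational store.
    bodies_store[body_id] = {"request": request_body, "response": response_body}
    # 2. Only simple numeric/string values go to the TS store,
    #    with the UUID as the cross-reference for later drill-down.
    event = HttpInteraction(body_id, duration_ms, status_code, error)
    ts_events.append(event)
    return event
```

Pruning old Postgres rows would then only lose the bodies; the slim events (and the graphs built on them) survive.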

What you're describing sounds a lot like a log-indexing problem. You could take a look at how Loki works; it's pretty much the use case you have 🙂

Sorry, I misread that: you're talking about request/response bodies. I would probably still index the logs and include in each log line a reference to the req/res body's ID. The bodies can then be stored under that ID either in a DB or in object storage (S3/GCS).
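(A rough Python sketch of what I mean, assuming a logfmt-style line and a hypothetical `body_id` field: the indexed fields stay small, the ID stays greppable in the line, and the bodies live elsewhere under that ID.)

```python
import uuid

def format_log_line(method: str, status_code: int, duration_ms: float,
                    body_id: str) -> str:
    # logfmt-style line: small indexed fields plus the body reference.
    # The bodies themselves are stored separately (DB or S3/GCS) under body_id.
    return (f"method={method} status={status_code} "
            f"duration_ms={duration_ms} body_id={body_id}")

line = format_log_line("POST", 502, 843.2, str(uuid.uuid4()))
```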

That makes sense to me. In my case there wouldn't be a log line as such, but rather a UUID for the record, which I could stash as part of the TS event.

Follow-up question: Grafana graphs are typically aggregated/averaged values over time; if I see a problematic region in a graph, how do I drill down and ultimately find the problematic UUIDs so that I can look up the req/resp bodies in our PG database?

(I think this is a very general question that could apply to lots of Grafana use cases/scenarios: 1. how do you de-aggregate the data to get at specific events, and 2. how would you add links that take you to another, non-Grafana site to inspect the contents of the req/resps?)

So in terms of problematic exemplars, there are a couple of things to be said:

  1. Metrics (as in a series of (ts, value) pairs) are not de-aggregatable. This is a design limitation: storing every raw event would be prohibitively expensive.
  2. The answer is usually sampling: Prometheus has exemplars, which have some support in Grafana as of 7.4.
  3. In terms of links, Grafana has support for data links (https://grafana.com/docs/grafana/latest/linking/data-links/).
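On point 3, a data link in a panel's JSON model could look roughly like the sketch below. The hostname, path, and the `body_id` label are hypothetical; `fieldConfig.defaults.links`, the `${__field.labels.<name>}` template variable, and `targetBlank` come from the data-links docs linked above.

```json
{
  "fieldConfig": {
    "defaults": {
      "links": [
        {
          "title": "Open req/resp bodies",
          "url": "https://internal.example.com/http-log/${__field.labels.body_id}",
          "targetBlank": true
        }
      ]
    }
  }
}
```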

Here’s a blog post that might be relevant - https://grafana.com/blog/2020/11/09/trace-discovery-in-grafana-tempo-using-prometheus-exemplars-loki-2.0-queries-and-more/.

I'm reading that blog post and the glossary, and I still don't really understand what an exemplar is.

An exemplar is essentially a sampled measurement in a time series that carries extra labels. A common use case is to use those labels to hold the IDs of traces or logs, so that you can cross-reference a metric measurement against a trace or a log line. You can check out this video, for example, which expands on the idea: https://www.youtube.com/watch?v=TzNZIEvhAdA.
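To make that concrete, here's a toy Python sketch of the idea (not the actual Prometheus/OpenMetrics format): a histogram keeps only aggregate bucket counts, but each bucket can also hold one recent exemplar whose labels, e.g. a hypothetical `trace_id`, let you jump from the aggregate back to a concrete trace or log.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Exemplar:
    value: float        # the one sampled measurement, e.g. a single request's latency
    labels: dict        # extra labels, e.g. {"trace_id": "..."}, for cross-referencing
    ts: float = field(default_factory=time.time)

@dataclass
class Bucket:
    le: float                            # upper bound, as in a Prometheus histogram
    count: int = 0                       # the aggregate: individual events are lost here
    exemplar: Optional[Exemplar] = None  # ...but one concrete sample is kept around

def observe(buckets: List[Bucket], value: float, trace_id: str) -> None:
    placed = False
    for b in sorted(buckets, key=lambda b: b.le):
        if value <= b.le:
            b.count += 1                 # buckets are cumulative, Prometheus-style
            if not placed:               # attach the exemplar to the narrowest bucket
                b.exemplar = Exemplar(value, {"trace_id": trace_id})
                placed = True
```

A dashboard graphs the counts; clicking an exemplar dot on the graph uses `trace_id` to look up the corresponding trace or log line.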