hey all, I’m configuring database monitoring and tracing in one of our production clusters and we’ve noticed that deploying the change blows up datadog-agent memory usage on one of the pods (it goes from 1 GB to 6+ GB; the rest are mostly unaffected). I need some advice on which <https://github.com/DataDog/integrations-core/blob/master/postgres/datadog_checks/postgres/data/conf.yaml.example|postgres monitoring parameters> I should tune to optimise this a little.
```
cluster_check = true
init_config   = {}
instances = [
  for endpoint in setunion(
    [var.dd_agent_postgres_URL],
    [for instance in var.dd_agent_postgres_replicas : "${instance}.${local.db_host}"]
  ) : {
    dbm      = true
    host     = endpoint
    port     = 5432
    username = "datadog"
    dbname   = var.dd_agent_db_name
    password = var.dd_agent_postgres_password
    dbstrict = true

    collect_settings = {
      enabled = true
    }
    collect_schemas = {
      enabled = true
    }
    collect_function_metrics = true
    collect_bloat_metrics    = true

    query_samples = {
      enabled                       = true
      explain_parameterized_queries = true
    }

    tags = [
      "aws_account_environment:${var.environment}",
      "env:${var.environment}",
      "region:${var.region}",
      "service:aurora_${var.dd_agent_db_name}",
    ]

    max_relations = 1000
    relations = [
      {
        relation_regex = ".*"
        relkind        = ["r", "i", "s", "t", "m", "c", "f", "p"]
      }
    ]
  }
]
```
that’s our TF config for the agent. Postgres is an Aurora cluster: 1 writer and 2 read replicas
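looking at the example conf, my guess is the heaviest knobs in that instance block are the catch-all relations entry (regex ".*" across every relkind, with max_relations bumped to 1000) plus collect_schemas. this is the scoped-down variant I’m considering. just a sketch, assuming the TF keys map 1:1 onto the YAML options; the app_ prefix and public schema are placeholders for whatever we actually query:
```
# Track only the tables we actually care about instead of ".*" over every relkind.
relations = [
  {
    relation_regex = "^app_.*"   # hypothetical prefix, narrow as needed
    relkind        = ["r", "p"]  # ordinary + partitioned tables only
    schemas        = ["public"]
  }
]
max_relations = 300              # lower the cap (300 is the example-file default, IIRC)

# Schema collection is expensive on wide databases; stretch the interval.
collect_schemas = {
  enabled             = true
  collection_interval = 3600     # seconds, vs. the much shorter example-file default
}
```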
I have a feeling this may be caused by the config being repeated for each instance separately, but that’s the only way I was able to get the metrics populated correctly
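for context on why only one pod blows up: that for expression expands to three full DBM instances inside a single cluster check, and with cluster_check = true the whole check presumably gets scheduled onto one cluster-check runner. roughly this (replica hostnames made up):
```
# What the for expression effectively produces: one check, three instances,
# each carrying all of the dbm/collect_* options above.
instances = [
  { host = var.dd_agent_postgres_URL, dbm = true },      # writer
  { host = "replica-1.${local.db_host}", dbm = true },   # reader (hypothetical name)
  { host = "replica-2.${local.db_host}", dbm = true },   # reader (hypothetical name)
]
```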
ok so I pinpointed it: if I disable query_metrics, the memory usage does not increase. unfortunately that also disables most of the DBM features, so I’ll try tuning the collection interval there I guess
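something along these lines, sketch only, assuming query_metrics / query_samples take an explicit collection_interval the way they do in the conf.yaml.example:
```
# Keep query metrics on but poll less aggressively.
query_metrics = {
  enabled             = true
  collection_interval = 60   # seconds; the example-file default is much shorter
}

# Samples are the other per-query collector; back this one off too.
query_samples = {
  enabled                       = true
  collection_interval           = 10   # vs. the ~1s default, if I'm reading the example right
  explain_parameterized_queries = true
}
```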