Topk – largest k elements by sample value
topk(k, <metric_expression>)
Examples
Top 5 CPU usage metrics
topk(5, sum(rate(container_cpu_usage_seconds_total[5m])) by (container_name))
This query calculates the rate of CPU usage for each container over the last 5 minutes, sums it up by container_name, and then returns the top 5 containers with the highest CPU usage.
Top 10 HTTP Request Rates
topk(10, sum(rate(http_requests_total[5m])) by (instance))
This query calculates the per-second rate of HTTP requests over the last 5 minutes, sums it by instance, and then returns the 10 instances with the highest request rates.
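Note that topk is evaluated independently at every evaluation step, so when the query is graphed over a time range, more than k distinct series can appear as the ranking changes. One possible way to get a steadier ranking is to rank by an average taken over a subquery. The sketch below assumes that a 1-hour window at 1-minute resolution suits the data; both values are illustrative.

```promql
# Rank instances by their average request rate over the last hour.
# The 1h window and 1m resolution are assumptions, not values from the examples above.
topk(10, avg_over_time((sum by (instance) (rate(http_requests_total[5m])))[1h:1m]))
```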
Request rate
Throughput
Note: the examples below apply to span metrics generated by the OTel span metrics connector.
The rate of processed requests can be queried as follows:
sum(rate(duration_seconds_count{job="<service_name>"}[5m]))
The output time series is measured in requests per second (RPS).
If you prefer to see requests per minute (RPM), multiply it by 60:
sum(rate(duration_seconds_count{job="<service_name>"}[5m])) * 60
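To see which operations contribute to the total, the same counter can be grouped by a label instead of being collapsed into a single series. The sketch below assumes the span metrics carry a span_name label, which depends on how the connector's dimensions are configured.

```promql
# Throughput in RPS, broken down per operation.
# span_name is an assumed label; substitute whichever dimension your span metrics expose.
sum by (span_name) (rate(duration_seconds_count{job="<service_name>"}[5m]))
```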
Hit rate
It can also be useful to look at the rate of hits to the service. Ideally it should be close to the throughput; the throughput cannot be higher than the hit rate. As before, the result is measured in hits per second.
sum(rate(calls_total{}[5m]))
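Since the throughput is expected to track the hit rate, one way to compare the two is to divide one query by the other. This is only a sketch: it assumes calls_total carries the same job label as duration_seconds_count. A result near 1 would suggest the two rates agree.

```promql
# Ratio of throughput to hit rate for one service.
# The job matcher on calls_total is an assumption; the hit-rate example above uses no matcher.
sum(rate(duration_seconds_count{job="<service_name>"}[5m]))
/
sum(rate(calls_total{job="<service_name>"}[5m]))
```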
P95 response duration
histogram_quantile(0.95, sum(rate(duration_seconds_bucket{span_kind=~"SPAN_KIND_SERVER|SPAN_KIND_CONSUMER", job="<service_name>"}[5m])) by (le))
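The quantile can be combined with topk from the first section to surface the slowest operations. The sketch below again assumes a span_name label on the span metrics. Note that le has to stay in the grouping, because histogram_quantile needs the bucket boundaries to compute the quantile.

```promql
# The 5 operations with the highest P95 response duration.
# span_name is an assumed label; le must be kept so the buckets survive the aggregation.
topk(5,
  histogram_quantile(0.95,
    sum by (le, span_name) (
      rate(duration_seconds_bucket{span_kind=~"SPAN_KIND_SERVER|SPAN_KIND_CONSUMER", job="<service_name>"}[5m])
    )
  )
)
```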