From 8e0579478465de5be1d9275e6a69b1580d3f138c Mon Sep 17 00:00:00 2001
From: Ara Sadoyan
Date: Wed, 28 May 2025 21:24:22 +0200
Subject: [PATCH] Metrics exporter for Prometheus

---
 METRICS.md | 116 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 116 insertions(+)
 create mode 100644 METRICS.md

diff --git a/METRICS.md b/METRICS.md
new file mode 100644
index 0000000..caec570
--- /dev/null
+++ b/METRICS.md
@@ -0,0 +1,116 @@
+# 📈 Gazan Prometheus Metrics Reference
+
+This document describes the Prometheus metrics exposed by the [Gazan](https://github.com/sadoyan/gazan) reverse proxy.
+These metrics can be used for monitoring, alerting, and performance analysis.
+
+Exposed at `http://config_address/metrics`
+
+By default: `http://127.0.0.1:3000/metrics`
+
+---
+
+## 🛠️ Prometheus Metrics
+
+### 1. `gazan_requests_total`
+
+- **Type**: `Counter`
+- **Purpose**: Total number of requests served by Gazan.
+
+**PromQL example:**
+
+```promql
+rate(gazan_requests_total[5m])
+```
+
+---
+
+### 2. `gazan_errors_total`
+
+- **Type**: `Counter`
+- **Purpose**: Count of requests that resulted in an error.
+
+**PromQL example:**
+
+```promql
+rate(gazan_errors_total[5m])
+```
+
+---
+
+### 3. `gazan_responses_total{status="200"}`
+
+- **Type**: `CounterVec`
+- **Purpose**: Count of responses by HTTP status code.
+
+**PromQL example:**
+
+```promql
+rate(gazan_responses_total{status=~"5.."}[5m]) > 0
+```
+
+> Useful for alerting on 5xx errors.
+
+---
+
+### 4. `gazan_response_latency_seconds`
+
+- **Type**: `Histogram`
+- **Purpose**: Tracks the latency of responses in seconds.
+
+**Example bucket output:**
+
+```prometheus
+gazan_response_latency_seconds_bucket{le="0.01"} 15
+gazan_response_latency_seconds_bucket{le="0.1"} 120
+gazan_response_latency_seconds_bucket{le="0.25"} 245
+gazan_response_latency_seconds_bucket{le="0.5"} 500
+...
+gazan_response_latency_seconds_count 1023
+gazan_response_latency_seconds_sum 42.6
+```
+
+| Metric                  | Meaning                                                       |
+|-------------------------|---------------------------------------------------------------|
+| `bucket{le="0.1"} 120`  | 120 requests took ≤ 100 ms                                    |
+| `bucket{le="0.25"} 245` | 245 requests took ≤ 250 ms                                    |
+| `count`                 | Total number of observations (i.e., total responses measured) |
+| `sum`                   | Total time of all responses, in seconds                       |
+
+### 🔍 How to interpret:
+
+- `le` means “less than or equal to”; bucket counts are cumulative.
+- `count` is the total number of observations.
+- `sum` is the total time (in seconds) of all responses.
+
+**PromQL examples:**
+
+🔹 **95th percentile latency**
+
+```promql
+histogram_quantile(0.95, rate(gazan_response_latency_seconds_bucket[5m]))
+
+```
+
+🔹 **Average latency**
+
+```promql
+rate(gazan_response_latency_seconds_sum[5m]) / rate(gazan_response_latency_seconds_count[5m])
+```
+
+---
+
+## ✅ Notes
+
+- Metrics are registered after the first served request.
+
+---
+## ✅ Summary of key metrics
+
+| Metric Name                           | Type       | What it Tells You         |
+|---------------------------------------|------------|---------------------------|
+| `gazan_requests_total`                | Counter    | Total requests served     |
+| `gazan_errors_total`                  | Counter    | Number of failed requests |
+| `gazan_responses_total{status="200"}` | CounterVec | Response status breakdown |
+| `gazan_response_latency_seconds`      | Histogram  | How fast responses are    |
+
+📘 *Last updated: May 2025*