From 8e0579478465de5be1d9275e6a69b1580d3f138c Mon Sep 17 00:00:00 2001
From: Ara Sadoyan <ara.sadoyan@netangels.net>
Date: Wed, 28 May 2025 21:24:22 +0200
Subject: [PATCH] Metrics exporter for Prometheus

---
 METRICS.md | 116 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 116 insertions(+)
 create mode 100644 METRICS.md

diff --git a/METRICS.md b/METRICS.md
new file mode 100644
index 0000000..caec570
--- /dev/null
+++ b/METRICS.md
@@ -0,0 +1,116 @@
+# 📈 Gazan Prometheus Metrics Reference
+
+This document outlines Prometheus metrics for the [Gazan](https://github.com/sadoyan/gazan) reverse proxy.
+These metrics can be used for monitoring, alerting and performance analysis.
+
+Metrics are exposed at `http://config_address/metrics`.
+
+By default, this is `http://127.0.0.1:3000/metrics`.
+
+---
+
+## 🛠️ Prometheus Metrics
+
+### 1. `gazan_requests_total`
+
+- **Type**: `Counter`
+- **Purpose**: Total number of requests served by Gazan.
+
+**PromQL example:**
+
+```promql
+rate(gazan_requests_total[5m])
+```
+
+---
+
+### 2. `gazan_errors_total`
+
+- **Type**: `Counter`
+- **Purpose**: Count of requests that resulted in an error.
+
+**PromQL example:**
+
+```promql
+rate(gazan_errors_total[5m])
+```
+
+---
+
+### 3. `gazan_responses_total{status="200"}`
+
+- **Type**: `CounterVec`
+- **Purpose**: Count of responses by HTTP status code.
+
+**PromQL example:**
+
+```promql
+rate(gazan_responses_total{status=~"5.."}[5m]) > 0
+```
+
+> Useful for alerting on 5xx errors.
+
+---
+
+### 4. `gazan_response_latency_seconds`
+
+- **Type**: `Histogram`
+- **Purpose**: Tracks the latency of responses in seconds.
+
+**Example bucket output:**
+
+```prometheus
+gazan_response_latency_seconds_bucket{le="0.01"}  15
+gazan_response_latency_seconds_bucket{le="0.1"}   120
+gazan_response_latency_seconds_bucket{le="0.25"}  245
+gazan_response_latency_seconds_bucket{le="0.5"}   500
+...
+gazan_response_latency_seconds_count  1023
+gazan_response_latency_seconds_sum    42.6
+```
+
+| Metric                  | Meaning                                                       |
+|-------------------------|---------------------------------------------------------------|
+| `bucket{le="0.1"} 120`  | 120 responses took ≤ 100 ms (cumulative)                      |
+| `bucket{le="0.25"} 245` | 245 responses took ≤ 250 ms (cumulative)                      |
+| `count`                 | Total number of observations (i.e., total responses measured) |
+| `sum`                   | Total time of all responses, in seconds                       |
+
+### 🔍 How to interpret:
+
+- `le` means “less than or equal to”; buckets are cumulative, so each one includes all faster responses.
+- `count` is the total number of observations.
+- `sum` is the total time (in seconds) of all responses.
+
+**PromQL examples:**
+
+🔹 **95th percentile latency**
+
+```promql
+# Aggregate by `le` so the quantile is computed across all series.
+histogram_quantile(0.95, sum by (le) (rate(gazan_response_latency_seconds_bucket[5m])))
+```
+
+🔹 **Average latency**
+
+```promql
+rate(gazan_response_latency_seconds_sum[5m]) / rate(gazan_response_latency_seconds_count[5m])
+```
+
+---
+
+## ✅ Notes
+
+- Metrics are registered after the first served request.
+
+---
+## ✅ Summary of key metrics
+
+| Metric Name                           | Type       | What it Tells You         |
+|---------------------------------------|------------|---------------------------|
+| `gazan_requests_total`                | Counter    | Total requests served     |
+| `gazan_errors_total`                  | Counter    | Number of failed requests |
+| `gazan_responses_total{status="200"}` | CounterVec | Response status breakdown |
+| `gazan_response_latency_seconds`      | Histogram  | How fast responses are    |
+
+📘 *Last updated: May 2025*