
Monitoring

Zeridion Flare provides three monitoring pillars out of the box: OpenTelemetry tracing, a Metrics API for custom dashboards, and health endpoints for container orchestration.

OpenTelemetry tracing

The Flare API ships with OpenTelemetry pre-configured. Every request generates distributed traces with automatic instrumentation for:

  • ASP.NET Core — HTTP request spans with route, status code, and duration
  • HttpClient — outbound HTTP call spans
  • Npgsql — database query spans with SQL command text and duration

Tenant tagging

Authenticated /v1/* request spans are enriched with tenant context by the TenantTelemetryMiddleware:

| Tag | Value | Description |
| --- | --- | --- |
| `tenant.id` | Project ID | Identifies which project (customer) owns the request |
| `tenant.plan` | Plan name | The project's pricing tier (`free`, `starter`, `pro`, `business`) |

This makes it straightforward to filter traces and metrics by tenant in your observability platform.

Exporters

| Exporter | Configuration |
| --- | --- |
| Console | Enabled automatically when `IsDevelopment()` returns true |
| Azure Monitor | Enabled whenever `ApplicationInsights:ConnectionString` is set, regardless of environment |

In development, trace output appears in the terminal. When ApplicationInsights:ConnectionString is configured (typically in production, but also usable in development), traces are additionally exported to Azure Application Insights for querying, alerting, and dashboarding.
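For reference, a minimal appsettings sketch that enables the Azure Monitor exporter. The `ApplicationInsights:ConnectionString` key is the one named above; the value shown is a placeholder, not a real connection string:

```json
{
  "ApplicationInsights": {
    "ConnectionString": "InstrumentationKey=00000000-0000-0000-0000-000000000000"
  }
}
```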

Metrics API

Three endpoints provide aggregate job metrics, all scoped to the authenticated project. Use them to build custom dashboards, feed alerting systems, or integrate with external monitoring tools.

Summary

GET /v1/metrics/summary?period=24h

Returns state counts, success rate, and average duration for the specified period.

Query parameters:

| Parameter | Values | Default | Description |
| --- | --- | --- | --- |
| `period` | `1h`, `24h`, `7d`, `30d` | `24h` | Time window for aggregation |

Response:

```json
{
  "total": 1523,
  "pending": 12,
  "scheduled": 3,
  "processing": 8,
  "succeeded": 1450,
  "failed": 5,
  "cancelled": 20,
  "dead_letter": 25,
  "success_rate": 0.9797,
  "avg_duration_ms": 342.5,
  "period": "24h"
}
```

| Field | Description |
| --- | --- |
| `success_rate` | `succeeded / (succeeded + failed + dead_letter)`, rounded to 4 decimal places |
| `avg_duration_ms` | Average execution time across jobs that reported `duration_ms`, or `null` if none |

Throughput

GET /v1/metrics/throughput?period=7d&granularity=hour

Returns time-bucketed counts of enqueued, succeeded, and failed jobs.

Query parameters:

| Parameter | Values | Default | Description |
| --- | --- | --- | --- |
| `period` | `1h`, `24h`, `7d`, `30d` | `24h` | Time window |
| `granularity` | `minute`, `hour`, `day` | Auto | Bucket size |

Auto-granularity (when granularity is omitted):

| Period | Default granularity |
| --- | --- |
| `1h` | `minute` |
| `24h` | `hour` |
| `7d` | `hour` |
| `30d` | `day` |
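The table above amounts to a one-line lookup; a Python sketch:

```python
def default_granularity(period: str) -> str:
    """Bucket size used when the granularity query parameter is omitted."""
    return {"1h": "minute", "24h": "hour", "7d": "hour", "30d": "day"}[period]

print(default_granularity("7d"))  # → hour
```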

Response:

```json
{
  "period": "7d",
  "granularity": "hour",
  "data": [
    {
      "timestamp": "2026-03-20T00:00:00+00:00",
      "enqueued": 45,
      "succeeded": 42,
      "failed": 1
    },
    {
      "timestamp": "2026-03-20T01:00:00+00:00",
      "enqueued": 38,
      "succeeded": 37,
      "failed": 0
    }
  ]
}
```
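When feeding this into a chart or alert, you typically aggregate buckets client-side. An illustrative Python sketch over the sample payload above:

```python
import json

sample = """
{
  "period": "7d",
  "granularity": "hour",
  "data": [
    {"timestamp": "2026-03-20T00:00:00+00:00", "enqueued": 45, "succeeded": 42, "failed": 1},
    {"timestamp": "2026-03-20T01:00:00+00:00", "enqueued": 38, "succeeded": 37, "failed": 0}
  ]
}
"""

buckets = json.loads(sample)["data"]
totals = {key: sum(b[key] for b in buckets) for key in ("enqueued", "succeeded", "failed")}
print(totals)  # → {'enqueued': 83, 'succeeded': 79, 'failed': 1}
```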

Queue depth

GET /v1/metrics/queues

Returns the current depth of each queue — how many jobs are pending, processing, or scheduled per queue.

Response:

```json
{
  "queues": [
    {
      "name": "default",
      "pending": 15,
      "processing": 5,
      "scheduled": 2
    },
    {
      "name": "email",
      "pending": 3,
      "processing": 1,
      "scheduled": 0
    }
  ]
}
```

Use queue depth for autoscaling decisions: when pending grows faster than processing can drain it, you need more workers on that queue.
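One simple scaling heuristic, sketched in Python; the per-worker throughput, drain window, and worker bounds are all assumptions you would tune for your workload:

```python
import math

def desired_workers(pending: int, processing: int,
                    jobs_per_worker_per_min: float,
                    drain_minutes: int = 5,
                    min_workers: int = 1, max_workers: int = 20) -> int:
    """Workers needed to drain the current backlog within drain_minutes."""
    backlog = pending + processing
    needed = math.ceil(backlog / (jobs_per_worker_per_min * drain_minutes))
    return max(min_workers, min(needed, max_workers))

# Queue "default" from the sample response, assuming 2 jobs/worker/minute:
print(desired_workers(pending=15, processing=5, jobs_per_worker_per_min=2))  # → 2
```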

Health endpoints

Two unauthenticated endpoints for container orchestration probes:

Liveness

GET /health/live

Returns 200 Healthy if the process is running and can accept HTTP requests. No external dependencies are checked — this is purely a process liveness signal.

Use as: Kubernetes liveness probe, Azure Container Apps liveness probe.

Readiness

GET /health/ready

Returns 200 Healthy if the process can serve traffic, including verifying that PostgreSQL is reachable. Returns 503 Unhealthy if the database connection fails.

Use as: Kubernetes readiness probe, Azure Container Apps readiness probe, load balancer health check.

note

Both health endpoints are unauthenticated — no API key required. They are excluded from the authentication middleware pipeline.
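As a sketch, the corresponding Kubernetes probe configuration; the container port (8080 here) and probe timings are assumptions for your deployment:

```yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
```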

Rate limit monitoring

Every /v1/* response includes rate limit headers:

| Header | Description |
| --- | --- |
| `X-RateLimit-Limit` | Maximum requests per hour for this project |
| `X-RateLimit-Remaining` | Requests remaining in the current window |
| `X-RateLimit-Reset` | Unix timestamp (seconds) when the window resets |

Monitor X-RateLimit-Remaining proactively. When it drops below 10% of X-RateLimit-Limit, throttle your request rate to avoid 429 Too Many Requests responses.
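The 10% guideline can be checked on every response; an illustrative Python sketch (the header names are as documented, the helper itself is hypothetical):

```python
def should_throttle(headers: dict[str, str], threshold: float = 0.10) -> bool:
    """True once remaining quota drops below `threshold` of the hourly limit."""
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    return remaining < limit * threshold

print(should_throttle({"X-RateLimit-Limit": "1000", "X-RateLimit-Remaining": "80"}))  # → True
```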

See the Rate Limits reference for tier details and backoff strategies.

Dashboard

The built-in Zeridion dashboard visualizes all metrics in real time:

  • Overview page — summary cards (total, succeeded, failed, dead letter), throughput chart, state distribution pie chart, queue depth bar chart
  • Jobs list — filterable by state, searchable, with cursor-based pagination
  • Job detail — payload, error details, progress bar, metadata, state badge

The dashboard polls the metrics API automatically using TanStack React Query, so data stays fresh without manual refresh.

Alerting patterns

While built-in alerting (email, Slack) is planned for a future release, you can build custom alerting today by polling the metrics API.

tip

The Flare API returns snake_case JSON (e.g., dead_letter, success_rate). When using GetFromJsonAsync with custom POCOs, pass JsonSerializerOptions with JsonNamingPolicy.SnakeCaseLower or use [JsonPropertyName] attributes to match the API field names.

Dead letter alert

Poll the summary endpoint and alert when dead letter count exceeds a threshold:

```csharp
using System.Net.Http.Json;
using System.Text.Json;

public class DeadLetterMonitor(HttpClient http, IAlertService alerts) : BackgroundService
{
    private static readonly JsonSerializerOptions JsonOptions = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseLower,
        PropertyNameCaseInsensitive = true
    };

    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            var summary = await http.GetFromJsonAsync<MetricsSummary>(
                "/v1/metrics/summary?period=1h", JsonOptions, ct);

            if (summary?.DeadLetter > 10)
            {
                await alerts.SendAsync(
                    $"Dead letter alert: {summary.DeadLetter} dead-lettered jobs in the last hour");
            }

            await Task.Delay(TimeSpan.FromMinutes(5), ct);
        }
    }
}
```

Queue backlog alert

Monitor queue depth and alert when pending jobs accumulate beyond capacity:

```csharp
var queues = await http.GetFromJsonAsync<QueueDepthResponse>(
    "/v1/metrics/queues", JsonOptions, ct);

foreach (var queue in queues?.Queues ?? [])
{
    if (queue.Pending > 1000)
    {
        await alerts.SendAsync(
            $"Queue backlog: {queue.Name} has {queue.Pending} pending jobs");
    }
}
```

Success rate drop

Alert when the success rate falls below an acceptable threshold:

```csharp
var summary = await http.GetFromJsonAsync<MetricsSummary>(
    "/v1/metrics/summary?period=1h", JsonOptions, ct);

if (summary is not null && summary.SuccessRate < 0.95)
{
    await alerts.SendAsync(
        $"Success rate dropped to {summary.SuccessRate:P1} in the last hour");
}
```

See also