
Monitoring

Zeridion Flare provides three monitoring pillars out of the box: OpenTelemetry tracing, a Metrics API for custom dashboards, and health endpoints for container orchestration.

OpenTelemetry tracing

The Flare API ships with OpenTelemetry pre-configured. Every request generates distributed traces with automatic instrumentation for:

  • ASP.NET Core — HTTP request spans with route, status code, and duration
  • HttpClient — outbound HTTP call spans
  • Npgsql — database query spans with SQL command text and duration

Tenant tagging

Authenticated /v1/* request spans are enriched with tenant context by the TenantTelemetryMiddleware:

| Tag | Value | Description |
| --- | --- | --- |
| `tenant.id` | Project ID | Identifies which project (customer) owns the request |
| `tenant.plan` | Plan name | The project's pricing tier (`free`, `starter`, `pro`, `business`) |

This makes it straightforward to filter traces and metrics by tenant in your observability platform.

Exporters

| Exporter | Configuration |
| --- | --- |
| Console | Enabled automatically when `IsDevelopment()` returns true |
| Azure Monitor | Enabled whenever `ApplicationInsights:ConnectionString` is set, regardless of environment |

In development, trace output appears in the terminal. When ApplicationInsights:ConnectionString is configured (typically in production, but also usable in development), traces are additionally exported to Azure Application Insights for querying, alerting, and dashboarding.
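For reference, a minimal appsettings sketch that enables the Azure Monitor exporter. The `ApplicationInsights:ConnectionString` key is the one named above; the value shown is a placeholder, not a real connection string:

```json
{
  "ApplicationInsights": {
    "ConnectionString": "InstrumentationKey=00000000-0000-0000-0000-000000000000"
  }
}
```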

Metrics API

Three endpoints provide aggregate job metrics, all scoped to the authenticated project. Use them to build custom dashboards, feed alerting systems, or integrate with external monitoring tools.

Summary

GET /v1/metrics/summary?period=24h

Returns state counts, success rate, and average duration for the specified period.

Query parameters:

| Parameter | Values | Default | Description |
| --- | --- | --- | --- |
| `period` | `1h`, `24h`, `7d`, `30d` | `24h` | Time window for aggregation |

Response:

```json
{
  "total": 1523,
  "pending": 12,
  "scheduled": 3,
  "processing": 8,
  "succeeded": 1450,
  "failed": 5,
  "cancelled": 20,
  "dead_letter": 25,
  "success_rate": 0.9797,
  "avg_duration_ms": 342.5,
  "period": "24h"
}
```

| Field | Description |
| --- | --- |
| `success_rate` | `succeeded / (succeeded + failed + dead_letter)`, rounded to 4 decimal places |
| `avg_duration_ms` | Average execution time across jobs that reported `duration_ms`, or `null` if none |

Throughput

GET /v1/metrics/throughput?period=7d&granularity=hour

Returns time-bucketed counts of enqueued, succeeded, and failed jobs.

Query parameters:

| Parameter | Values | Default | Description |
| --- | --- | --- | --- |
| `period` | `1h`, `24h`, `7d`, `30d` | `24h` | Time window |
| `granularity` | `minute`, `hour`, `day` | Auto | Bucket size |

Auto-granularity (when granularity is omitted):

| Period | Default granularity |
| --- | --- |
| `1h` | `minute` |
| `24h` | `hour` |
| `7d` | `hour` |
| `30d` | `day` |
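The table above amounts to a one-line lookup; a Python sketch:

```python
def default_granularity(period: str) -> str:
    """Bucket size used when the granularity query parameter is omitted."""
    return {"1h": "minute", "24h": "hour", "7d": "hour", "30d": "day"}[period]

print(default_granularity("7d"))  # → hour
```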

Response:

```json
{
  "period": "7d",
  "granularity": "hour",
  "data": [
    {
      "timestamp": "2026-03-20T00:00:00+00:00",
      "enqueued": 45,
      "succeeded": 42,
      "failed": 1
    },
    {
      "timestamp": "2026-03-20T01:00:00+00:00",
      "enqueued": 38,
      "succeeded": 37,
      "failed": 0
    }
  ]
}
```
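When feeding this into a chart or alert, you typically aggregate buckets client-side. An illustrative Python sketch over the sample payload above:

```python
import json

sample = """
{
  "period": "7d",
  "granularity": "hour",
  "data": [
    {"timestamp": "2026-03-20T00:00:00+00:00", "enqueued": 45, "succeeded": 42, "failed": 1},
    {"timestamp": "2026-03-20T01:00:00+00:00", "enqueued": 38, "succeeded": 37, "failed": 0}
  ]
}
"""

buckets = json.loads(sample)["data"]
totals = {key: sum(b[key] for b in buckets) for key in ("enqueued", "succeeded", "failed")}
print(totals)  # → {'enqueued': 83, 'succeeded': 79, 'failed': 1}
```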

Queue depth

GET /v1/metrics/queues

Returns the current depth of each queue — how many jobs are pending, processing, or scheduled per queue.

Response:

```json
{
  "queues": [
    {
      "name": "default",
      "pending": 15,
      "processing": 5,
      "scheduled": 2
    },
    {
      "name": "email",
      "pending": 3,
      "processing": 1,
      "scheduled": 0
    }
  ]
}
```

Use queue depth for autoscaling decisions: when pending grows faster than processing can drain it, you need more workers on that queue.
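One simple scaling heuristic, sketched in Python; the per-worker throughput, drain window, and worker bounds are all assumptions you would tune for your workload:

```python
import math

def desired_workers(pending: int, processing: int,
                    jobs_per_worker_per_min: float,
                    drain_minutes: int = 5,
                    min_workers: int = 1, max_workers: int = 20) -> int:
    """Workers needed to drain the current backlog within drain_minutes."""
    backlog = pending + processing
    needed = math.ceil(backlog / (jobs_per_worker_per_min * drain_minutes))
    return max(min_workers, min(needed, max_workers))

# Queue "default" from the sample response, assuming 2 jobs/worker/minute:
print(desired_workers(pending=15, processing=5, jobs_per_worker_per_min=2))  # → 2
```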

Health endpoints

Two unauthenticated endpoints for container orchestration probes:

Liveness

GET /health/live

Returns 200 Healthy if the process is running and can accept HTTP requests. No external dependencies are checked — this is purely a process liveness signal.

Use as: Kubernetes liveness probe, Azure Container Apps liveness probe.

Readiness

GET /health/ready

Returns 200 Healthy if the process can serve traffic, including verifying that PostgreSQL is reachable. Returns 503 Unhealthy if the database connection fails.

Use as: Kubernetes readiness probe, Azure Container Apps readiness probe, load balancer health check.

note

Both health endpoints are unauthenticated — no API key required. They are excluded from the authentication middleware pipeline.
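As a sketch, the corresponding Kubernetes probe configuration; the container port (8080 here) and probe timings are assumptions for your deployment:

```yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
```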

Rate limit monitoring

Every /v1/* response includes rate limit headers:

| Header | Description |
| --- | --- |
| `X-RateLimit-Limit` | Maximum requests per hour for this project |
| `X-RateLimit-Remaining` | Requests remaining in the current window |
| `X-RateLimit-Reset` | Unix timestamp (seconds) when the window resets |

Monitor X-RateLimit-Remaining proactively. When it drops below 10% of X-RateLimit-Limit, throttle your request rate to avoid 429 Too Many Requests responses.
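The 10% guideline can be checked on every response; an illustrative Python sketch (the header names are as documented, the helper itself is hypothetical):

```python
def should_throttle(headers: dict[str, str], threshold: float = 0.10) -> bool:
    """True once remaining quota drops below `threshold` of the hourly limit."""
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    return remaining < limit * threshold

print(should_throttle({"X-RateLimit-Limit": "1000", "X-RateLimit-Remaining": "80"}))  # → True
```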

See the Rate Limits reference for tier details and backoff strategies.

Dashboard

The built-in Zeridion dashboard visualizes all metrics in real time:

  • Overview page — summary cards (total, succeeded, failed, dead letter), throughput chart, state distribution pie chart, queue depth bar chart
  • Jobs list — filterable by state, searchable, with cursor-based pagination
  • Job detail — payload, error details, progress bar, metadata, state badge

The dashboard polls the metrics API automatically using TanStack React Query, so data stays fresh without manual refresh.

Alerting patterns

While built-in alerting (email, Slack) is planned for a future release, you can build custom alerting today by polling the metrics API.

tip

The Flare API returns snake_case JSON (e.g., dead_letter, success_rate). When using GetFromJsonAsync with custom POCOs, pass JsonSerializerOptions with JsonNamingPolicy.SnakeCaseLower or use [JsonPropertyName] attributes to match the API field names.

Dead letter alert

Poll the summary endpoint and alert when dead letter count exceeds a threshold:

```csharp
using System.Net.Http.Json;
using System.Text.Json;

public class DeadLetterMonitor(HttpClient http, IAlertService alerts) : BackgroundService
{
    private static readonly JsonSerializerOptions JsonOptions = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.SnakeCaseLower,
        PropertyNameCaseInsensitive = true
    };

    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            var summary = await http.GetFromJsonAsync<MetricsSummary>(
                "/v1/metrics/summary?period=1h", JsonOptions, ct);

            if (summary?.DeadLetter > 10)
            {
                await alerts.SendAsync(
                    $"Dead letter alert: {summary.DeadLetter} dead-lettered jobs in the last hour");
            }

            await Task.Delay(TimeSpan.FromMinutes(5), ct);
        }
    }
}
```

Queue backlog alert

Monitor queue depth and alert when pending jobs accumulate beyond capacity:

```csharp
var queues = await http.GetFromJsonAsync<QueueDepthResponse>(
    "/v1/metrics/queues", JsonOptions, ct);

foreach (var queue in queues?.Queues ?? [])
{
    if (queue.Pending > 1000)
    {
        await alerts.SendAsync(
            $"Queue backlog: {queue.Name} has {queue.Pending} pending jobs");
    }
}
```

Success rate drop

Alert when the success rate falls below an acceptable threshold:

```csharp
var summary = await http.GetFromJsonAsync<MetricsSummary>(
    "/v1/metrics/summary?period=1h", JsonOptions, ct);

if (summary is not null && summary.SuccessRate < 0.95)
{
    await alerts.SendAsync(
        $"Success rate dropped to {summary.SuccessRate:P1} in the last hour");
}
```

See also