
Queues and Concurrency

Queues let you isolate different types of work so they don't compete for the same worker slots. Concurrency controls limit how many jobs each worker processes in parallel. Together, they give you fine-grained control over throughput and resource usage.

Named queues

Every job is assigned to a queue. The default queue is "default". Use named queues to separate workloads — fast email sends should not be blocked by slow report generation:

[JobConfig(Queue = "email")]
public class SendWelcomeEmail : IJob<NewUserPayload> { ... }

[JobConfig(Queue = "reports")]
public class GenerateMonthlyReport : IJob<ReportPayload> { ... }

[JobConfig(Queue = "critical")]
public class ProcessPayment : IJob<PaymentPayload> { ... }

Assigning a queue

You can set the queue at two levels. More specific settings override less specific ones.

Per-class (attribute)

[JobConfig(Queue = "email")]
public class SendWelcomeEmail : IJob<NewUserPayload>
{
    public async Task ExecuteAsync(NewUserPayload payload, JobContext ctx)
    {
        // Always enqueued to the "email" queue by default
    }
}

Per-call (options)

await jobs.EnqueueAsync<SendWelcomeEmail>(payload, new JobOptions
{
    Queue = "critical" // Overrides the class-level "email" queue
});

Resolution order

| Level | How to set | Default |
|---|---|---|
| Per-call | new JobOptions { Queue = "..." } | (none) |
| Per-class | [JobConfig(Queue = "...")] | "default" |

Resolution: JobOptions (per-call) > [JobConfig] (per-class) > "default".

Queue names are trimmed and normalized by the server. Max length is 100 characters.
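
As a sketch, the resolution and normalization rules could be expressed like this (ResolveQueue and Normalize are hypothetical helpers for illustration, not part of the SDK):

```csharp
using System;

// Hypothetical helper illustrating the resolution order:
// per-call JobOptions wins, then the class-level [JobConfig], then "default".
static string ResolveQueue(string? perCallQueue, string? perClassQueue)
{
    // Mirror the server's normalization: trim, then enforce the 100-char limit.
    static string Normalize(string queue)
    {
        var trimmed = queue.Trim();
        if (trimmed.Length > 100)
            throw new ArgumentException("Queue name exceeds 100 characters.");
        return trimmed;
    }

    if (!string.IsNullOrWhiteSpace(perCallQueue)) return Normalize(perCallQueue);
    if (!string.IsNullOrWhiteSpace(perClassQueue)) return Normalize(perClassQueue);
    return "default";
}

Console.WriteLine(ResolveQueue("critical", "email")); // "critical": per-call wins
Console.WriteLine(ResolveQueue(null, "email"));       // "email": class-level fallback
Console.WriteLine(ResolveQueue(null, null));          // "default"
```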

Worker queue binding

The SDK worker automatically polls all queues used by its registered job types. When the worker starts, it discovers which queues to listen on from the JobTypeRegistry:

If your application registers three job types with queues email, reports, and critical, the worker sends all three queue names in its POST /v1/workers/poll request.
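
That discovery step can be sketched as a pure function over the registered job types (DiscoverQueues and the tuple shape are illustrative, not SDK API):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of the startup discovery step: from the registered job types and
// their configured queues, compute the distinct queue list that goes into
// the POST /v1/workers/poll request body.
static string[] DiscoverQueues(IEnumerable<(string JobType, string Queue)> registrations) =>
    registrations.Select(r => r.Queue).Distinct().OrderBy(q => q).ToArray();

var registrations = new[]
{
    ("SendWelcomeEmail",      "email"),
    ("GenerateMonthlyReport", "reports"),
    ("ProcessPayment",        "critical"),
    ("SendPasswordReset",     "email"),   // duplicate queues collapse to one entry
};

Console.WriteLine(string.Join(", ", DiscoverQueues(registrations)));
// critical, email, reports
```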

Queue isolation with separate workers

For strict workload isolation, deploy separate worker instances that only register specific job types:

// Worker A: handles email jobs only
builder.Services.AddZeridionFlare(o =>
{
o.ApiKey = "...";
o.JobAssemblies = [typeof(SendWelcomeEmail).Assembly];
});
// Worker B: handles report jobs only
builder.Services.AddZeridionFlare(o =>
{
o.ApiKey = "...";
o.JobAssemblies = [typeof(GenerateMonthlyReport).Assembly];
});

Worker A only polls the email queue; Worker B only polls the reports queue. Slow reports cannot starve email delivery.

How polling works

The SDK worker uses long-polling to claim jobs from the API.

Key behaviors:

  • SELECT ... FOR UPDATE SKIP LOCKED — atomically claims jobs without blocking other workers. Two workers polling simultaneously never claim the same job.
  • WHERE "Queue" = ANY(@queues) — only jobs in the worker's registered queues are considered.
  • LIMIT @capacity — the worker only requests as many jobs as it has available concurrency slots.
  • Long-polling — if no jobs are available, the server holds the connection for up to 30 seconds, checking every 2 seconds, before returning an empty response.
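
The loop these behaviors describe can be sketched with an in-memory stand-in for the poll endpoint (RunPollLoop and its delegates are hypothetical; the real worker awaits PollInterval on empty polls and dispatches jobs concurrently):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified sketch of the worker's poll loop. `claim` stands in for
// POST /v1/workers/poll: it receives the current capacity and returns at
// most that many job ids.
static List<string> RunPollLoop(Func<int, List<string>> claim, Action<string> execute,
                                int capacity, int maxEmptyPolls)
{
    var processed = new List<string>();
    var emptyPolls = 0;
    while (emptyPolls < maxEmptyPolls)
    {
        var jobs = claim(capacity);             // never ask for more than capacity
        if (jobs.Count == 0)
        {
            emptyPolls++;                       // real worker: await Task.Delay(PollInterval)
            continue;
        }
        emptyPolls = 0;                         // jobs available: poll again immediately
        foreach (var job in jobs) execute(job); // real worker: dispatch concurrently
        processed.AddRange(jobs);
    }
    return processed;
}

// Drive it with an in-memory backlog of five jobs and capacity 2.
var backlog = new Queue<string>(new[] { "j1", "j2", "j3", "j4", "j5" });
var done = RunPollLoop(
    claim: cap => Enumerable.Range(0, Math.Min(cap, backlog.Count))
                            .Select(_ => backlog.Dequeue()).ToList(),
    execute: _ => { },
    capacity: 2,
    maxEmptyPolls: 1);
Console.WriteLine(done.Count); // 5
```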

Concurrency control

ConcurrencyLimit controls how many jobs a single worker instance processes in parallel. It defaults to 10:

builder.Services.AddZeridionFlare(o =>
{
    o.ApiKey = "...";
    o.ConcurrencyLimit = 5;
});

Under the hood, the worker uses a SemaphoreSlim initialized to ConcurrencyLimit. Before starting each job, the worker acquires a semaphore slot. When the job completes, the slot is released. The poll request's capacity parameter equals the number of available semaphore slots, so the worker never claims more jobs than it can handle.
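
A minimal sketch of that gating mechanism, using only SemaphoreSlim from the BCL (the bookkeeping names here are illustrative, not SDK internals):

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// A SemaphoreSlim sized to ConcurrencyLimit gates job execution. The number
// of free slots, slots.CurrentCount, is what the worker reports as `capacity`
// when polling.
const int concurrencyLimit = 5;
var slots = new SemaphoreSlim(concurrencyLimit, concurrencyLimit);
var gate = new object();
int inFlight = 0, peak = 0;

async Task RunJobAsync()
{
    await slots.WaitAsync();                 // acquire a slot before starting the job
    try
    {
        lock (gate) peak = Math.Max(peak, ++inFlight);
        await Task.Delay(50);                // simulate job work
        lock (gate) inFlight--;
    }
    finally
    {
        slots.Release();                     // release the slot when the job completes
    }
}

// Launch 20 jobs; the semaphore caps parallelism at 5.
await Task.WhenAll(Enumerable.Range(0, 20).Select(_ => RunJobAsync()));
Console.WriteLine(peak <= concurrencyLimit); // True
```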

Choosing the right limit

| Job type | Recommended ConcurrencyLimit | Rationale |
|---|---|---|
| I/O-bound (HTTP calls, email) | 10–20 | Jobs spend most time waiting; higher parallelism is safe |
| CPU-bound (image processing) | 2–4 | Jobs consume CPU; too many in parallel causes contention |
| Memory-intensive (large reports) | 2–5 | Each job uses significant memory; limit prevents OOM |
| Mixed workload | 10 (default) | Good general-purpose starting point |

Scaling workers

Scale horizontally by deploying multiple worker instances. Each instance polls independently and SKIP LOCKED ensures no double-claiming:

| Workers | ConcurrencyLimit | Max parallel jobs |
|---|---|---|
| 1 | 10 | 10 |
| 2 | 10 | 20 |
| 3 | 10 | 30 |
| 5 | 20 | 100 |

Scaling strategies

Uniform scaling — all workers process all job types. Simple to deploy, good for balanced workloads:

Worker Instance 1: all queues, ConcurrencyLimit = 10
Worker Instance 2: all queues, ConcurrencyLimit = 10
Worker Instance 3: all queues, ConcurrencyLimit = 10

Queue-isolated scaling — dedicated workers per queue. Scale each workload independently:

Email Workers (3 instances): queue = "email", ConcurrencyLimit = 20
Report Workers (1 instance): queue = "reports", ConcurrencyLimit = 2
Payment Workers (2 instances): queue = "critical", ConcurrencyLimit = 5

Use Azure Container Apps or Kubernetes to auto-scale worker replicas based on queue depth metrics.

Queue depth monitoring

GET /v1/metrics/queues returns the current depth of each queue:

{
  "queues": [
    {
      "name": "default",
      "pending": 45,
      "processing": 10,
      "scheduled": 3
    },
    {
      "name": "email",
      "pending": 120,
      "processing": 15,
      "scheduled": 0
    }
  ]
}

| Field | Description |
|---|---|
| pending | Jobs waiting to be claimed by a worker |
| processing | Jobs currently being executed by a worker |
| scheduled | Jobs with a future RunAt or waiting for a parent to complete |
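
Consuming that response client-side might look like this; the JSON is the sample above, and the processing logic (finding the deepest backlog) is illustrative:

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Parse a GET /v1/metrics/queues response and find the queue with the most
// pending jobs.
var json = """
{
  "queues": [
    { "name": "default", "pending": 45,  "processing": 10, "scheduled": 3 },
    { "name": "email",   "pending": 120, "processing": 15, "scheduled": 0 }
  ]
}
""";

using var doc = JsonDocument.Parse(json);
var deepest = doc.RootElement.GetProperty("queues")
    .EnumerateArray()
    .OrderByDescending(q => q.GetProperty("pending").GetInt32())
    .First();

var deepestName = deepest.GetProperty("name").GetString();
var deepestPending = deepest.GetProperty("pending").GetInt32();
Console.WriteLine($"{deepestName}: {deepestPending} pending"); // email: 120 pending
```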

Backlog detection

A growing pending count means jobs are arriving faster than workers can process them. Possible responses:

  1. Increase ConcurrencyLimit — if workers have idle CPU/memory
  2. Add worker instances — horizontal scaling
  3. Investigate slow jobs — a single slow job type may be consuming all worker slots
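
A naive growth check over successive pending samples might look like this (IsBacklogGrowing is a hypothetical helper; a production check would smooth over transient spikes):

```csharp
using System;
using System.Linq;

// Flag a backlog when the pending count rises across the whole sampling
// window, i.e. jobs arrive faster than workers drain them.
static bool IsBacklogGrowing(int[] pendingSamples) =>
    pendingSamples.Length >= 2 &&
    pendingSamples.Zip(pendingSamples.Skip(1), (a, b) => b >= a).All(x => x) &&
    pendingSamples[^1] > pendingSamples[0];

Console.WriteLine(IsBacklogGrowing(new[] { 40, 55, 80, 120 })); // True: monotonically rising
Console.WriteLine(IsBacklogGrowing(new[] { 120, 80, 60, 10 })); // False: workers are catching up
```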

Autoscaling with KEDA

On Azure Container Apps with KEDA, you can scale worker replicas based on queue depth by polling the metrics endpoint and feeding it into a custom KEDA scaler.

Poll interval

PollInterval controls how long the worker waits between poll cycles when the previous poll returned no jobs:

builder.Services.AddZeridionFlare(o =>
{
    o.PollInterval = TimeSpan.FromSeconds(5); // Default: 2s
});

Lower values increase responsiveness (faster job pickup) but increase API call volume. The default of 2 seconds is a good balance for most workloads.

note

The poll interval only applies when the long-poll returns empty. When jobs are available, the worker polls again immediately after processing the claimed batch.

Graceful shutdown

When the host shuts down (e.g., SIGTERM from a container orchestrator), the worker:

  1. Stops accepting new poll requests
  2. Waits for all in-flight jobs to complete
  3. Acks each completed job (success or failure)
  4. Exits cleanly

This prevents jobs from being orphaned mid-execution. The server's stuck job reaper (heartbeat timeout) provides a safety net for cases where the worker crashes without completing the shutdown sequence.
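
The four steps above can be sketched with a CancellationTokenSource and Task.WhenAll (FakeJobAsync stands in for real job execution):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var stopping = new CancellationTokenSource();
var inFlight = new List<Task<string>>();
var nextId = 0;

async Task<string> FakeJobAsync(int id)
{
    await Task.Delay(100);                    // simulate work that outlives the stop signal
    return $"job-{id}: acked";
}

// Claim jobs only while the stop signal has not arrived.
while (!stopping.IsCancellationRequested)
{
    inFlight.Add(FakeJobAsync(nextId++));
    if (inFlight.Count == 3)
        stopping.Cancel();                    // 1. SIGTERM: stop issuing new poll requests
}

var results = await Task.WhenAll(inFlight);   // 2. wait for all in-flight jobs to complete
foreach (var r in results)
    Console.WriteLine(r);                     // 3. ack each completed job (here: print)
// 4. exit cleanly
```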

Best practices

  1. Use descriptive queue names — email, reports, billing, imports are immediately meaningful. Avoid generic names like queue1.

  2. Isolate long-running jobs — put slow jobs (report generation, data imports) in their own queue so they don't block fast jobs (email sends, webhook deliveries).

  3. Match concurrency to resource requirements — CPU-bound jobs need lower concurrency than I/O-bound jobs. Start with the default (10) and adjust based on monitoring.

  4. Monitor queue depth — track pending counts via GET /v1/metrics/queues. Rising backlogs mean you need more workers or faster jobs.

  5. Scale horizontally, not just vertically — adding worker instances is generally more effective than increasing ConcurrencyLimit beyond 20, because each instance gets its own process memory and CPU scheduling.

See also