Background jobs, retries you can defend, failures the dashboard surfaces.

Background jobs are the part of the product that goes wrong in ways no one notices. We build queues with explicit retry policies, idempotent handlers, dead-letter queues for the genuinely broken work, and dashboards that surface what's running, what's stuck, and what failed. The ops team sees the truth; the product team stops finding out about issues from support tickets.

What we build

BullMQ on Redis for the queue tier

Typed job payloads in TypeScript, per-job timeouts, configurable concurrency per queue, and rate-limited workers where downstream APIs need them. Redis is the queue; Postgres is the truth. We don't conflate them.
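As a rough illustration of what typed payloads and per-queue settings can look like, here is a dependency-free sketch. The job names, payload fields, and numbers are hypothetical. The `concurrency` and `limiter` fields are modelled on the shape of BullMQ's worker options, while `timeoutMs` and `withTimeout` are our own assumptions about enforcing a deadline inside the handler.

```typescript
// Hypothetical job names and payload shapes; a real product defines its own.
interface JobPayloads {
  "send-digest": { userId: string; period: "daily" | "weekly" };
  "export-orders": { storeId: string; format: "csv" | "json" };
}

type JobName = keyof JobPayloads;

// Per-queue settings. `concurrency` and `limiter` follow the shape of BullMQ
// worker options; `timeoutMs` is an illustrative field enforced by withTimeout.
interface QueueSettings {
  concurrency: number;
  timeoutMs: number;
  limiter?: { max: number; duration: number }; // at most `max` jobs per `duration` ms
}

const settings: Record<JobName, QueueSettings> = {
  "send-digest": { concurrency: 10, timeoutMs: 30_000 },
  "export-orders": {
    concurrency: 2,
    timeoutMs: 120_000,
    limiter: { max: 5, duration: 1_000 },
  },
};

// Rejects if `work` has not settled within `ms` milliseconds.
// Note: this stops waiting; it does not cancel the underlying work.
async function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`job timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// The compiler rejects a payload that does not match the job name.
function describeJob<N extends JobName>(name: N, payload: JobPayloads[N]): string {
  return `${name} ${JSON.stringify(payload)} (timeout ${settings[name].timeoutMs} ms)`;
}
```

With this shape, `describeJob("send-digest", { storeId: "s1", format: "csv" })` fails at compile time rather than in production, which is the point of typing the payload map.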

Idempotent handlers as the default

Every handler is safe to run twice. State changes are deduplicated on a job-derived key. A retry storm doesn't corrupt the database; a manual replay during incident response doesn't either.
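A minimal sketch of the idea, under stated assumptions: the `ChargeJob` shape, the key format, and the in-memory `ProcessedKeys` store are all hypothetical. In a real system the key would more likely sit behind a unique constraint in Postgres and be written in the same transaction as the state change.

```typescript
// In-memory stand-in for a table with a unique constraint on the key.
class ProcessedKeys {
  private seen = new Set<string>();

  // Returns true only the first time a given key is claimed.
  claim(key: string): boolean {
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}

// Hypothetical job payload.
interface ChargeJob {
  orderId: string;
  amountCents: number;
}

const processed = new ProcessedKeys();
const balances = new Map<string, number>();

// The key is derived from the job's content, not from the attempt,
// so a retry or a manual replay maps to the same key.
function idempotencyKey(job: ChargeJob): string {
  return `charge:${job.orderId}`;
}

function handleCharge(job: ChargeJob): "applied" | "skipped" {
  if (!processed.claim(idempotencyKey(job))) return "skipped";
  balances.set(job.orderId, (balances.get(job.orderId) ?? 0) + job.amountCents);
  return "applied";
}
```

Running `handleCharge` twice with the same job applies the charge once and skips the second call, which is the behaviour a retry storm or replay relies on.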

Retry policies, not retry hope

Per-job retry counts, exponential backoff with jitter, and explicit handling for retryable vs non-retryable errors. A 429 from a vendor retries; a 400 doesn't. The policy is documented in the job definition, not learned the hard way.
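One way such a policy might be written down, as a sketch rather than a definitive implementation. `RetryPolicy`, `HttpError`, and `nextAction` are hypothetical names. Treating 5xx responses and unknown errors as retryable is our assumption; the jitter here is "full jitter", a uniform random delay between zero and the exponential ceiling.

```typescript
interface RetryPolicy {
  attempts: number;    // total attempts allowed, including the first
  baseDelayMs: number; // backoff ceiling after the first failure
  maxDelayMs: number;  // hard cap on the backoff ceiling
}

// Hypothetical error type carrying the vendor's HTTP status.
class HttpError extends Error {
  constructor(public readonly status: number) {
    super(`HTTP ${status}`);
    this.name = "HttpError";
    // Keeps instanceof working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, HttpError.prototype);
  }
}

// 429 retries, 400 does not. Assumption: 5xx and unknown errors are transient.
function isRetryable(err: unknown): boolean {
  if (err instanceof HttpError) {
    return err.status === 429 || err.status >= 500;
  }
  return true;
}

// Exponential backoff with full jitter. `attempt` is the 1-based number of the
// attempt that just failed; `random` is injectable so tests are deterministic.
function backoffDelayMs(
  attempt: number,
  policy: RetryPolicy,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(policy.maxDelayMs, policy.baseDelayMs * 2 ** (attempt - 1));
  return Math.floor(random() * ceiling);
}

type RetryDecision =
  | { retry: true; delayMs: number }
  | { retry: false; reason: "non-retryable" | "exhausted" };

function nextAction(
  err: unknown,
  attempt: number,
  policy: RetryPolicy,
  random: () => number = Math.random,
): RetryDecision {
  if (!isRetryable(err)) return { retry: false, reason: "non-retryable" };
  if (attempt >= policy.attempts) return { retry: false, reason: "exhausted" };
  return { retry: true, delayMs: backoffDelayMs(attempt, policy, random) };
}
```

Keeping the decision in one pure function means the policy can be unit-tested and read in code review, instead of being inferred from production behaviour.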

Dead-letter queues that someone reads

Jobs that exhaust their retries land in a DLQ with the full payload, error chain, and timestamps. The DLQ has a UI, an owner, and an SLA. It's not where jobs go to die quietly.
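To make "full payload, error chain, and timestamps" concrete, here is a sketch of what a dead-letter record might carry. The `DeadLetter` and `FailedJob` shapes and their field names are hypothetical, not a BullMQ type. The error chain is built by following each error's `cause` property where one is set.

```typescript
// Hypothetical view of a job that has exhausted its retries.
interface FailedJob<P> {
  id: string;
  queue: string;
  payload: P;
  attemptsMade: number;
  firstAttemptAt: Date;
}

// Hypothetical dead-letter record: everything needed to triage or replay.
interface DeadLetter<P> {
  queue: string;
  jobId: string;
  payload: P;
  errorChain: string[]; // outermost error first
  attempts: number;
  firstAttemptAt: string; // ISO 8601
  failedAt: string;       // ISO 8601
}

// Walks err -> err.cause -> ... and records each link.
// Capped at 10 entries to guard against cyclic cause chains.
function errorChain(err: unknown): string[] {
  const chain: string[] = [];
  let current: unknown = err;
  while (current instanceof Error && chain.length < 10) {
    chain.push(`${current.name}: ${current.message}`);
    current = (current as Error & { cause?: unknown }).cause;
  }
  return chain;
}

function toDeadLetter<P>(
  job: FailedJob<P>,
  err: unknown,
  now: Date = new Date(),
): DeadLetter<P> {
  return {
    queue: job.queue,
    jobId: job.id,
    payload: job.payload,
    errorChain: errorChain(err),
    attempts: job.attemptsMade,
    firstAttemptAt: job.firstAttemptAt.toISOString(),
    failedAt: now.toISOString(),
  };
}
```

A record like this is what lets the DLQ owner decide between replaying the job and fixing the payload, without digging through logs first.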

Observability via Sentry + queue dashboard

Every handler is wrapped in a Sentry span; failures get full stack traces with the job payload. The BullMQ board shows current depth per queue, lag, and recent failures. Production isn't a black box.

Scheduled jobs as code

Cron-shaped work is defined alongside the queue, version-controlled, and visible. No 'who set up that cron on the box' surprises. Recurring backups, reconciliations, and digest emails all run through the same scheduler the rest of the queue uses.
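"As code" can be as simple as a reviewed list of schedules checked at startup. The sketch below is illustrative only: the schedule names, cron expressions, and queue names are hypothetical, and the validation is a light sanity check (unique names, five cron fields), not a full cron parser. Registering the entries with the queue's scheduler is left out.

```typescript
interface Schedule {
  name: string;        // unique identifier, visible in dashboards
  cron: string;        // standard five-field cron expression
  queue: string;       // queue the job is enqueued on
  description: string; // answers "who set this up, and why"
}

// Hypothetical schedules for the kinds of work named above.
const schedules: readonly Schedule[] = [
  { name: "nightly-backup", cron: "0 3 * * *", queue: "maintenance", description: "Recurring database backup" },
  { name: "reconcile-payments", cron: "*/15 * * * *", queue: "billing", description: "Reconcile ledger with vendor" },
  { name: "weekly-digest", cron: "0 8 * * 1", queue: "email", description: "Monday digest email" },
];

// Returns a list of problems; an empty list means the definitions look sane.
function validateSchedules(list: readonly Schedule[]): string[] {
  const problems: string[] = [];
  const names = new Set<string>();
  for (const s of list) {
    if (names.has(s.name)) problems.push(`duplicate schedule name: ${s.name}`);
    names.add(s.name);
    const fields = s.cron.trim().split(/\s+/);
    if (fields.length !== 5) {
      problems.push(`${s.name}: expected 5 cron fields, got ${fields.length}`);
    }
  }
  return problems;
}
```

Running `validateSchedules(schedules)` in CI or at worker startup means a malformed or duplicated schedule is caught in review rather than discovered when a backup silently stops running.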

Where this fits

Your product has been shipping background work via setTimeout or unmanaged crons and it's starting to bite.

You have a queue but no observability into what's stuck, and you find out about it when a customer asks why their export didn't arrive.

You're integrating with three vendor APIs, each rate-limited differently, and the current code lives in one handler that throws when any of them is slow.

Insyte

Operations SaaS

The workspace dropshipping operators move into when their store grows past one person

Tech stack

TypeScript
BullMQ
Redis
Postgres
Sentry

Want this for your team?

30 minutes with a founder or senior engineer. We'll scope what you need and tell you straight whether Stacklane fits.

Book a Free Call
