Hardening

Your scheduler works. Now make it trustworthy. The goal: you should feel safe running bun run sync at any time — after a reboot, after editing a schedule, after doing nothing. No side effects. No surprises.

Idempotent Sync

Run sync twice:

bun run src/cli.ts sync
bun run src/cli.ts sync

The second run reports "unchanged" for all jobs. The manager generates the plist, compares it byte-for-byte against the installed one, and skips jobs that haven't changed. No unnecessary bootout/bootstrap cycles.

This is the difference between "it works" and "I trust it." An idempotent sync means you can put bun run sync in your login script, run it on a cron, trigger it from a git hook — it converges to the right state and does nothing extra.

Error Handling in Run Scripts

Your run scripts should never crash silently. The pattern:

#!/bin/bash
set -euo pipefail  # bash: exit on any error

#!/usr/bin/env bun
// bun: catch and report at the top level
main().catch((error) => {
  console.error("[job] Failed:", error);
  process.exit(1);
});

A non-zero exit code shows up in jobs status and jobs doctor. Silent failures (exit 0 but wrong output) are harder to catch — log enough to know whether the job actually did its work.

Retry in AI Jobs

LLM APIs fail. Network drops. Rate limits hit. Don't retry at the launchd level (it doesn't have retry logic for exit codes). Retry inside the run script:

async function withRetry<T>(fn: () => Promise<T>, retries = 3): Promise<T> {
  for (let i = 0; i < retries; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === retries - 1) throw err;
      const ms = 1000 * Math.pow(2, i);
      console.warn(`[retry] Attempt ${i + 1} failed, waiting ${ms}ms`);
      await Bun.sleep(ms);
    }
  }
  throw new Error("unreachable");
}

Exponential backoff. Three retries. If all three fail, exit non-zero and let the logs tell you what happened.

What "Production" Means Here

This isn't a server you deploy. It's infrastructure that runs on YOUR machine. "Production" means:

It survives reboots (launchd handles this)
It recovers from failures (retry + sync reconverge)
It's observable (jobs status, jobs doctor, jobs logs)
It's safe to change (idempotent sync, dry-run)
It doesn't surprise you

That's it. No uptime monitoring. No alerting pipeline. No on-call rotation. Just a system you trust to do its job while you do yours.

Verification

# Sync is idempotent
bun run src/cli.ts sync
bun run src/cli.ts sync  # should show all unchanged

# Doctor is clean
bun run src/cli.ts doctor  # should show 0 errors

# Jobs are running
bun run src/cli.ts status  # should show all loaded, exit 0

What You Learned

Idempotent sync = safe to repeat = trustworthy
Error handling in scripts: set -euo pipefail (bash) or .catch(exit 1) (bun)
Retry with exponential backoff for API calls
"Production" for personal infrastructure = it works, it's observable, it doesn't surprise you