The Attention Model

The spec had this figured out. Use dmux's sophisticated three-level attention system — armed state, fingerprinting, focus-aware routing. Handle visual notifications with surgical precision. Done.

That lasted until the first multi-workspace test session. Five agents across five workspaces, all completing at once. The notification system turned into a fork bomb. macOS alert popups stacked like playing cards. The "sophisticated" solution was a disaster.

By the end of this lesson, you'll understand why the final attention model uses exactly two signals, why focus detection matters more than interaction heuristics, and how fleet operation changes everything about notification design.

Iteration 1: The dmux inheritance

The spec started with dmux's battle-tested model from research/07-PATTERNS-STOLEN.md. Three attention levels with smart triggering:

Armed State → User expects output, suppress external noise
Fingerprinting → Visual baseline detection, only notify on real changes  
Focus-aware → Route notifications based on window focus

This sounded perfect. Proven in production. Well-documented patterns. The implementation plan was clear: port dmux's NotificationManager to cmux, adapt the focus detection APIs, ship it.

That plan died during the first design session. dmux was single-workspace. cmux is multi-workspace. The focus models are fundamentally incompatible. dmux's "user is looking at THIS workspace" becomes meaningless when agents live across five or more workspaces. The sophisticated system was built for the wrong problem.

Decision: Defer the sophistication. Start simple. Add complexity only when simple fails.

Iteration 2: 30-second interaction window

Simple heuristic: if the user typed anything in the last 30 seconds, they're active. Suppress notifications during active periods.

// The interaction recency experiment
function isUserActive(): boolean {
  const lastInput = getLastInputTime();
  const nowMs = Date.now();
  return (nowMs - lastInput) < 30_000; // 30 seconds
}

pi.on("agent_end", async () => {
  if (!isUserActive()) {
    notify("pi", "Agent complete - needs input");
  }
});

This worked for single-agent sessions. The user types a prompt, pi responds, the interaction window suppresses the completion notification because they just typed 10 seconds ago. Perfect.

It broke immediately in multi-workspace scenarios. User types in workspace 1. Agent completes in workspace 2. The interaction in workspace 1 suppresses the notification from workspace 2. The user never knows workspace 2 finished. The completion goes unnoticed until they happen to check that workspace hours later.

Failure mode: Interaction recency ≠ focus. Typing in one workspace doesn't mean you're paying attention to another.
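
The failure is easy to reproduce in isolation. The sketch below is a hypothetical model of the heuristic, not the extension's code: `GlobalActivityTracker`, `recordInput`, `wouldNotify`, and the workspace names are illustrative, and the clock is passed in explicitly so nothing has to wait 30 seconds.

```typescript
// Hypothetical model of the Iteration 2 heuristic.
const ACTIVE_WINDOW_MS = 30_000;

class GlobalActivityTracker {
  private lastInputMs = Number.NEGATIVE_INFINITY;

  // Record a keystroke. The workspace is ignored, which is exactly the flaw:
  // input anywhere counts as activity everywhere.
  recordInput(_workspace: string, atMs: number): void {
    this.lastInputMs = atMs;
  }

  isUserActive(nowMs: number): boolean {
    return nowMs - this.lastInputMs < ACTIVE_WINDOW_MS;
  }
}

// True when a completion at `nowMs` would produce a notification.
// The completing workspace plays no part in the decision.
function wouldNotify(
  tracker: GlobalActivityTracker,
  _completingWorkspace: string,
  nowMs: number,
): boolean {
  return !tracker.isUserActive(nowMs);
}
```

With input recorded in workspace 1 at t = 1s, a completion in workspace 2 at t = 11s is suppressed even though the user never looked at workspace 2. The same completion at t = 31.001s would notify, but only because the user stopped typing, not because they changed what they were looking at.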

Iteration 3: Always notify

Remove all heuristics. Every agent completion triggers a macOS notification. Simple, reliable, guaranteed visibility.

pi.on("agent_end", async () => {
  const sessionName = pi.getSessionName();
  notify("pi", sessionName 
    ? `${sessionName} — waiting for input`
    : "Waiting for input");
});

This solved the visibility problem completely. Every completion was announced. No missed agents. Perfect awareness across all workspaces.

It created a new problem: notification spam. During fleet testing (5 agents, 20-minute tasks), the notifications never stopped. Every few minutes, another agent completed. Another popup. Another interruption. The user couldn't focus on anything because the fleet kept demanding attention.

Insight: "Always notify" works for single agents. It's unbearable for fleets.

Iteration 4: Focus detection

Only notify when the user is NOT looking at the completing agent's workspace. Use cmux identify to compare the caller's surface with the currently focused surface.

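function isFocused(): boolean {
  const raw = cmuxSafe("identify");
  if (!raw) return false; // no answer from cmux: treat as unfocused so we still notify
  try {
    const info = JSON.parse(raw);
    const callerRef = info?.caller?.surface_ref;
    // Require a real caller ref. Otherwise two missing refs would compare as
    // undefined === undefined and wrongly suppress the notification.
    return callerRef != null && callerRef === info?.focused?.surface_ref;
  } catch {
    return false; // malformed JSON: treat as unfocused
  }
}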

pi.on("agent_end", async (event, ctx) => {
  stopHeartbeat();
  _turnCount++;
  setStatus(STATUS_NEEDS_INPUT);
  
  if (!isFocused()) {
    const sessionName = pi.getSessionName();
    notify("pi", sessionName
      ? `${sessionName} — waiting for input`
      : "Waiting for input");
    playPeonPing("stop");
  }
});

The logic: If you're literally looking at the surface where the agent completed, you already know it's done. No notification needed. If you're looking at a different surface, you get the notification.

This was the breakthrough. Fleet testing went from unbearable to manageable. When you're focused on workspace 1, completions in workspaces 2-5 notify you. When you're in workspace 3, only workspaces 1, 2, 4, and 5 send notifications. You always know about completions you can't see. You never get interrupted by completions you're already watching.

Success: Focus detection solved the fleet notification problem. The simple binary (focused/not focused) works better than sophisticated heuristics.
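
The focus decision can also be separated from the cmux call so it can be exercised without a running session. This is a minimal sketch under assumptions: the payload shape is inferred only from the two fields that isFocused() reads, and `IdentifyPayload`, `decideNotification`, and the surface ref values used to try it out are hypothetical.

```typescript
// Shape assumed from the fields isFocused() reads; the real payload may carry more.
interface IdentifyPayload {
  caller?: { surface_ref?: string };
  focused?: { surface_ref?: string };
}

type FocusDecision = "suppress" | "notify";

// Decide from a raw `identify` response whether a completion should notify.
// Anything missing, unparseable, or incomplete falls back to "notify":
// a redundant alert is cheaper than a missed completion.
function decideNotification(raw: string | null): FocusDecision {
  if (!raw) return "notify";
  let info: IdentifyPayload;
  try {
    info = JSON.parse(raw) as IdentifyPayload;
  } catch {
    return "notify";
  }
  const caller = info?.caller?.surface_ref;
  const focused = info?.focused?.surface_ref;
  if (typeof caller !== "string" || typeof focused !== "string") {
    return "notify";
  }
  return caller === focused ? "suppress" : "notify";
}
```

The design choice worth noting is the direction of failure. Every error path resolves to "notify", which matches the fleet default of making completions visible unless there is positive evidence the user is already watching.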

Iteration 5: Debouncing for fleet

One edge case remained: completion detection firing too early, for example during model switching. An agent stops, completion detection fires, and then the user immediately continues. The notification arrives before the user has had a chance to continue on their own, which makes it a false-positive interruption.

Add an idle threshold to completion detection:

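let idleCount = 0;
let completionFired = false; // ensures one event per idle stretch
const IDLE_THRESHOLD = 3; // 3 consecutive quiet polls, one second apart

setInterval(() => {
  if (pi.isWaiting()) {
    idleCount++;
    // Fire once when the threshold is reached, not again every 3 polls.
    if (idleCount >= IDLE_THRESHOLD && !completionFired) {
      completionFired = true;
      pi.emit("completion_detected");
    }
  } else {
    // Agent became active again: start over and re-arm.
    idleCount = 0;
    completionFired = false;
  }
}, 1000);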

This prevented false-positive notifications during natural session transitions. The agent stops, but completion detection waits for three consecutive quiet polls, about 3 seconds, before deciding it's really done. If the agent becomes active again within that window, for instance because the user typed, the count resets.

Fleet operation became smooth. Each workspace correctly detects completion. Focus detection prevents notification spam. Debouncing prevents false positives.
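
The debounce rule is easier to reason about when it is pulled away from the timer. The sketch below is an illustration rather than the extension's implementation: `IdleDebouncer` and `poll` are hypothetical names, and each call to `poll` stands in for one tick of the polling loop.

```typescript
// Hypothetical debouncer: reports completion once per idle stretch,
// after `threshold` consecutive idle polls.
class IdleDebouncer {
  private readonly threshold: number;
  private idleCount = 0;
  private fired = false;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Feed one poll result. Returns true only on the poll where the
  // threshold is first reached within the current idle stretch.
  poll(isWaiting: boolean): boolean {
    if (!isWaiting) {
      // Activity resumed: reset the count and re-arm.
      this.idleCount = 0;
      this.fired = false;
      return false;
    }
    if (this.fired) return false;
    this.idleCount++;
    if (this.idleCount < this.threshold) return false;
    this.fired = true;
    return true;
  }
}
```

With a threshold of 3, the poll sequence waiting, waiting, active, waiting, waiting, waiting, waiting fires only on the sixth poll. The two idle polls before the burst of activity are discarded, and the seventh poll stays quiet because the stretch has already been reported.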

The two-signal system

The final contract is elegantly simple:

Agent runs  → sidebar: "Running" (blue bolt)     | NO notification (quiet)
Agent stops → sidebar: "Needs input" (blue bell)  | macOS notification if unfocused (LOUD)
User types  → sidebar: "Idle" (gray pause)        | clear notifications (RESET)

Two audiences, two mechanisms:

Sidebar status = Cross-workspace ambient awareness. Visible from any workspace without switching. Each workspace shows its own agent state. You glance at the sidebar to see which workspaces need attention. Quiet, non-interrupting, always available.

macOS notifications = Focus-stealing alerts for completions you can't see. They fire only when you're NOT looking at the completing workspace. Loud, interrupting, demanding immediate attention. They include audio (peon-ping) for maximum awareness.

The input handler resets everything:

pi.on("input", async () => {
  setStatus(STATUS_IDLE);
  cmuxSafe("clear-notifications");
  cmuxSafe("claude-hook", "prompt-submit");
});

When you type, your agent goes to "Idle" status, all queued notifications clear, and the attention model resets. The cycle begins again.
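
The whole contract fits in one pure transition function, which makes it easy to check against the table above. This is a sketch under assumptions: the status strings mirror the sidebar labels rather than the real STATUS_* constants, `agent_start` is a hypothetical event name (only `agent_end` and `input` appear in this lesson), and `transition`, `Transition`, and `Effect` are illustrative.

```typescript
type Status = "Running" | "Needs input" | "Idle";
type AgentEvent = "agent_start" | "agent_end" | "input";
type Effect = "notify" | "clear-notifications";

interface Transition {
  status: Status;    // what the sidebar shows for this workspace
  effects: Effect[]; // side effects to perform, possibly none
}

// Pure transition for one workspace. `focused` says whether the user is
// looking at this workspace's surface when the event arrives.
function transition(event: AgentEvent, focused: boolean): Transition {
  switch (event) {
    case "agent_start":
      // Quiet signal only: the sidebar changes, nothing interrupts.
      return { status: "Running", effects: [] };
    case "agent_end":
      // Loud signal only when the completion would otherwise go unseen.
      return { status: "Needs input", effects: focused ? [] : ["notify"] };
    case "input":
      // Engaging with the workspace resets the cycle.
      return { status: "Idle", effects: ["clear-notifications"] };
  }
}
```

The sidebar status changes on every event, while the notification effect appears in exactly one cell: an agent stopping on a surface the user is not looking at.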

Platform collision gotcha

cmux ships with a built-in claude_code status key that conflicts with the extension's pi_agent status. The built-in status appears for all pi activity, but it doesn't understand the running/idle/needs-input states. Both statuses appear simultaneously, creating visual confusion.

The solution: actively suppress the built-in status on every update.

function clearBuiltinStatus(): void {
  cmuxSafe("clear-status", "claude_code");
}

function setStatus(status: string): void {
  clearBuiltinStatus();
  cmuxSafe("set-status", "pi_agent", status, iconForStatus(status));
}

Every time the extension updates its status, it clears cmux's built-in status. This ensures only one pi status appears in the sidebar. Extensions must manage platform collisions explicitly.
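
The ordering matters enough to be worth pinning down in a test. One way to do that, sketched here under assumptions, is to inject the command runner so the calls can be recorded. `CmuxRunner`, `makeSetStatus`, and the icon names in `ICONS` are hypothetical; the real iconForStatus mapping is not shown in this lesson.

```typescript
// Stand-in for cmuxSafe: receives the subcommand and its arguments.
type CmuxRunner = (...args: string[]) => void;

const BUILTIN_KEY = "claude_code";
const EXTENSION_KEY = "pi_agent";

// Placeholder icon names, assumed for illustration only.
const ICONS: Record<string, string> = {
  "Running": "bolt",
  "Needs input": "bell",
  "Idle": "pause",
};

// Build a setStatus function bound to a given runner.
function makeSetStatus(run: CmuxRunner): (status: string) => void {
  return (status: string): void => {
    // Clear the built-in key first so the two entries never sit side by side.
    run("clear-status", BUILTIN_KEY);
    run("set-status", EXTENSION_KEY, status, ICONS[status] ?? "");
  };
}
```

Passing a runner that pushes its arguments onto an array shows that every status update produces two calls, clear then set, in that order.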

The deferred sophistication

dmux's fingerprinting and baseline detection still exist in the research. They may return when fleet notification volume becomes a problem — imagine 20 agents, all completing every few minutes. The current focus detection might not be granular enough. Visual change detection might be necessary.

But that's future complexity for future problems. The two-signal system handles fleets of up to 5-8 agents gracefully. The sophisticated system is aimed at fleets of 50+ agents, and we're not there yet.

Principle: Start simple, add sophistication when the failure mode demands it. The deferral itself is the lesson.

Fleet implications

Multi-workspace operation changes everything about attention models:

Single agent: Notifications are usually unwanted. You're watching the agent run. You know when it stops.

Fleet: Notifications are essential. You're watching one workspace while four others complete. Without notifications, completions go unnoticed for hours.

The reversal: What was noise in single-agent mode becomes signal in fleet mode. The attention model must flip its defaults. Always notify, except when literally focused on the completing surface.

This explains why dmux's sophisticated system didn't transfer. It was optimized for "avoid unnecessary notifications." Fleet operation requires "ensure visibility of all completions." The opposite goal demands a different solution.

Test the system

Run a three-workspace fleet, with workers in two workspaces and your focus on the third:

Spawn worker in workspace 1
Spawn worker in workspace 2
Stay focused on workspace 3

Watch the sidebar. Two workspaces show "Running" (blue bolt). One shows "Idle" (gray pause). When workspaces 1 and 2 complete, you get two notifications. The sidebar updates to show "Needs input" (blue bell) for both.

Switch to workspace 1, type a prompt. The sidebar resets to "Idle". No more notifications from workspace 1. Workspace 2 still shows "Needs input" until you service it.

This is fleet attention working correctly. Quiet awareness in the sidebar. Loud alerts for completions you can't see. Immediate reset when you engage. The two-signal system scales to any fleet size you can mentally track.

That's the difference between designing for single agents and designing for fleets. The user's attention is the scarce resource, not compute cycles.