Incident & Postmortem Accountability Automation

ezCater, Senior Engineering Manager, Events + Messaging Platform, 2022–2025

Situation

ezCater had a persistent incident problem that was not improving despite process changes. Postmortems were completed inconsistently. Follow-up actions were tracked loosely and often dropped. The same classes of incidents recurred.

The organizational assumption was that this was a process problem — that better templates or more reminders would fix it. I diagnosed it as a cultural and operational problem.

Decision

I backed that diagnosis with data. I pulled incident records, postmortem completion rates, and follow-up action closure rates. The numbers showed that the issue was not awareness (people knew what to do) but accountability: nothing systematically enforced postmortem completion or action follow-through.
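A minimal sketch of the two rates behind that analysis. The record shape and field names are hypothetical and used only for illustration; this is not the actual data or tooling.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class IncidentRecord:
    # Hypothetical record shape; real incident data will look different.
    key: str
    postmortem_done: bool
    actions_total: int    # follow-up actions opened for this incident
    actions_closed: int   # follow-up actions actually closed


def postmortem_completion_rate(records: Iterable[IncidentRecord]) -> float:
    """Share of incidents with a completed postmortem."""
    records = list(records)
    if not records:
        return 0.0
    return sum(r.postmortem_done for r in records) / len(records)


def action_closure_rate(records: Iterable[IncidentRecord]) -> float:
    """Share of all follow-up actions that were closed."""
    records = list(records)
    total = sum(r.actions_total for r in records)
    if total == 0:
        return 0.0
    return sum(r.actions_closed for r in records) / total
```

Tracking the two rates separately matters: a team can complete every postmortem and still close none of the actions that come out of them.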

I conceived of, created, and managed a program with two components:

First, I created a tagging system in Jira for teams to tag their incident-inspired investments in the backlog, then built a Jira board to filter on those items. This made something visible that had been invisible: whether teams were actually investing in work that came out of incidents. They weren’t.

Second, I created a dedicated Jira project to track postmortem completion. When an incident was declared in Slack, the existing Slack automation, augmented for this program, created a postmortem tracking item with a two-week completion window. Progress, or the lack of it, was visible on the board, and email reminders nudged teams toward completion.
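The two-week window can be sketched as follows. This is an illustration only: the class, the status names, and the rule for who gets a reminder are assumptions, not the actual Slack or Jira automation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Two-week completion window, as described above.
COMPLETION_WINDOW = timedelta(days=14)


@dataclass
class PostmortemItem:
    # Hypothetical tracking item; the real one lived in a Jira project.
    incident_id: str
    declared_on: date
    completed_on: Optional[date] = None

    @property
    def due_on(self) -> date:
        return self.declared_on + COMPLETION_WINDOW

    def status(self, today: date) -> str:
        """Return 'done', 'done_late', 'open', or 'overdue' (assumed labels)."""
        if self.completed_on is not None:
            return "done" if self.completed_on <= self.due_on else "done_late"
        return "overdue" if today > self.due_on else "open"


def needs_reminder(items: List[PostmortemItem], today: date) -> List[PostmortemItem]:
    """Assumed nudge rule: remind on every item past its window and not done."""
    return [item for item in items if item.status(today) == "overdue"]
```

Deriving the status from dates, rather than storing it, means the board and the reminders cannot disagree about what is late.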

Risk

There was pushback that this reduced product capacity. The argument: engineering time spent on postmortems and follow-up actions is time not spent on features.

This is a real tradeoff, and I accepted it deliberately. The counter-argument I made: if we don’t operate what we’ve already built, every new feature compounds the problem. Reliability debt was accumulating faster than we were paying it down.

Change

Postmortem completion rates ticked up, though the quality of the reports and the level of incident-inspired investment stayed about the same. That was expected: the primary goal was not to fix everything immediately, but to gather enough data on the problem and the behaviors behind it to iterate.

The real change was visibility. Before the program, nobody could answer “are teams investing in incident-driven work?” or “are postmortems getting done?” After, both questions had clear, visible answers — and the answers were uncomfortable enough to sustain the conversation.

What This Demonstrates