Incident Postmortems
Describe a production incident you handled and how you structured the postmortem. What makes a good blameless postmortem?
Answer out loud first, then check yourself against the model answer.
More SRE interview questions
Also worth your time on this topic
Error Budget Management
Your service has a 99.9% availability SLO over a 30-day window. How much downtime does that give you, and what do you actually do with that error budget day-to-day?
mid
How to Build an Effective On-Call Rotation and Escalation Policy
Your phone buzzed at 3:14 AM for a disk warning that auto-resolved by 3:16. Nobody fixes the alert. The next person on rotation hates their life. Here is how to build on-call schedules, escalation policies, and alert rules that respect your engineers.
SLOs, SLIs, and Error Budgets: A Practical Implementation Guide
A step-by-step checklist for defining service level objectives, picking the right service level indicators, and using error budgets to make better decisions about reliability vs. feature velocity.
45-90 minutes