Incident Date: September 24, 2025
Duration: 15 minutes
Impact: Temporary unavailability of Jamf School services in eu-central-1
Summary
On September 17, Jamf School experienced a performance degradation. To remediate and prevent recurrence, we scheduled a schema change to add a supporting index to a high-concurrency table. When executed, the index creation acquired an unexpected lock that blocked concurrent application queries, resulting in a full Jamf School service interruption for eu-central-1 customers. Automated and manual monitoring triggered a rapid investigation; engineers terminated the locking DDL session and the service fully recovered within 15 minutes.
Root Cause & Timeline
Root Cause: An online index creation on a high-concurrency table resulted in a blocking table-level lock due to contention and execution plan timing. This prevented critical application queries from proceeding, causing service unavailability.
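The report does not record the database engine or the exact DDL statement that was run. The sketch below is a minimal illustration of how such an index build could be issued so that it fails fast rather than queueing behind in-flight transactions; it assumes a MySQL-compatible backend, and the table, index, column, and connection details are hypothetical.

```python
# Illustrative only: the engine, schema, and DDL below are assumptions,
# not details taken from this report.
import pymysql

conn = pymysql.connect(host="db.internal", user="dba", password="***",
                       database="jamf_school", autocommit=True)
with conn.cursor() as cur:
    # Fail fast: wait at most 5 seconds for the metadata lock instead of
    # queueing behind long-running transactions and blocking application queries.
    cur.execute("SET SESSION lock_wait_timeout = 5")

    # Request a non-locking online build explicitly; MySQL raises an error
    # if the index cannot be built without locking the table.
    cur.execute(
        "ALTER TABLE device_checkins "
        "ADD INDEX idx_device_last_seen (device_id, last_seen), "
        "ALGORITHM=INPLACE, LOCK=NONE"
    )
conn.close()
```

With a short lock_wait_timeout, the worst case becomes a failed change that can be retried off-peak rather than a stalled application.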
Key Events:
- 08:30 UTC – Planned schema/index change initiated
- 08:31 UTC – Monitoring alert: elevated errors / unresponsive application
- 08:34 UTC – Incident response bridge (war room) activated
- 08:38 UTC – Root cause isolated: blocking lock on core table from index creation
- 08:44 UTC – Blocking DDL session terminated / lock cleared (see the sketch after this timeline)
- 08:46 UTC – Services fully recovered; period of elevated monitoring initiated
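For context on the 08:38–08:44 steps, the sketch below illustrates one way the waiting sessions and the blocking DDL could be identified and terminated. It assumes a MySQL-compatible backend with illustrative connection details; the actual commands run during the incident are not recorded in this report.

```python
# Illustrative only: assumes a MySQL-compatible backend; connection details
# and the KILL-based termination are examples, not the incident's exact steps.
import pymysql

conn = pymysql.connect(host="db.internal", user="dba", password="***", autocommit=True)
with conn.cursor() as cur:
    # Application sessions stuck behind the metadata lock taken by the index build.
    cur.execute(
        "SELECT ID, TIME, STATE, INFO FROM information_schema.PROCESSLIST "
        "WHERE STATE = 'Waiting for table metadata lock'"
    )
    waiting = cur.fetchall()
    print(f"{len(waiting)} sessions waiting on a metadata lock")

    # The long-running ALTER holding the lock.
    cur.execute(
        "SELECT ID FROM information_schema.PROCESSLIST "
        "WHERE INFO LIKE 'ALTER TABLE%' ORDER BY TIME DESC LIMIT 1"
    )
    row = cur.fetchone()
    if row and waiting:
        # Terminating the DDL session releases the lock and lets queued
        # application queries proceed (the 08:44 UTC recovery action).
        cur.execute(f"KILL {row[0]}")
        print(f"Terminated blocking DDL session {row[0]}")
conn.close()
```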
Business & User Impact
- Jamf School web console: Unavailable for 15 minutes
- Device management (MDM communication): Temporarily degraded (devices unable to check in)
- Scope: eu-central-1 environment only
- Data integrity: No data loss or corruption observed
Gaps Identified
- The eu-central-1 environment exhibited higher peak concurrent activity than other regions where the same index change had already been applied successfully.
- The change was executed during a period of high traffic in the eu-central-1 region
- Staging and prior rollout experience did not fully reproduce the eu-central-1 peak concurrency, reducing the predictive value of the pre-change assessment.
Remediation Items & Next Steps
- The status page was updated to reflect the outage and ensure customer awareness – Complete
- Implement a detailed change risk assessment for all database changes (even those identified as low risk)
- Add a pre-change load/lock simulation harness that reproduces production-like concurrency
- Enforce execution windows for high-risk database changes (off-peak / maintenance window); see the sketch below
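As a starting point for the execution-window item, the sketch below outlines a pre-change gate that refuses to run a high-risk DDL outside an assumed maintenance window or while concurrency is above an assumed threshold. The window, threshold, connection details, and pymysql usage are illustrative assumptions, not part of the agreed remediation.

```python
# Illustrative only: window, threshold, and connection details are assumptions.
import datetime
import pymysql

MAINTENANCE_WINDOW_UTC = (datetime.time(2, 0), datetime.time(4, 0))  # assumed off-peak window
MAX_THREADS_RUNNING = 20                                             # assumed concurrency ceiling

def change_allowed(conn) -> bool:
    """Allow a high-risk DDL only inside the window and under low concurrency."""
    now = datetime.datetime.now(datetime.timezone.utc).time()
    start, end = MAINTENANCE_WINDOW_UTC
    if not (start <= now <= end):
        print("Outside the approved maintenance window; aborting change")
        return False
    with conn.cursor() as cur:
        cur.execute("SHOW GLOBAL STATUS LIKE 'Threads_running'")
        threads_running = int(cur.fetchone()[1])
    if threads_running > MAX_THREADS_RUNNING:
        print(f"Concurrency too high ({threads_running} running threads); aborting change")
        return False
    return True

conn = pymysql.connect(host="db.internal", user="dba", password="***", autocommit=True)
if change_allowed(conn):
    # Proceed with the fail-fast index DDL shown earlier in this report.
    ...
conn.close()
```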