# Playbook: Analysis Failures

When to use: Customer reports "analysis not completing", "stuck in progress", or AI quality issues.
## Diagnosis Steps

### 1. Check AI Analytics

```
MCP: get_ai_analytics
workspace_id: "<workspace_id>"
```

Look for elevated error rates, increased latency, or low completion rates.
### 2. Check Analysis Engine Health

```
MCP: remote_debug
workspace_id: "<workspace_id>"
command: "analysis_status"
```

### 3. Review Recent Analyses

```
MCP: remote_debug
workspace_id: "<workspace_id>"
command: "recent_analyses"
params: { "limit": 10 }
```

Look for patterns: all failing, specific incident types failing, intermittent failures.
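The pattern check above can be sketched as a small triage helper. This is a minimal sketch, not product code: the list-of-dicts shape with `status` and `incident_type` keys is an assumption about what `recent_analyses` returns.

```python
from collections import Counter

def triage(analyses):
    """Classify the failure pattern in a batch of recent analyses.

    `analyses` is assumed to be a list of dicts with a "status" key
    ("completed" / "failed" / "in_progress") and an "incident_type" key.
    """
    failed = [a for a in analyses if a["status"] == "failed"]
    if not failed:
        return "no failures"
    if len(failed) == len(analyses):
        return "all failing"
    # If every failure shares one incident type, suspect type-specific input
    types = Counter(a["incident_type"] for a in failed)
    if len(types) == 1:
        return f"only {failed[0]['incident_type']} failing"
    return "intermittent"
```

"All failing" points at a systemic cause (provider, engine health); a single failing incident type points at that type's inputs; intermittent failures suggest rate limits or flaky infrastructure.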
## Common Causes

| Symptom | Likely Cause | Fix |
|---|---|---|
| All analyses failing with timeout | LLM provider degraded | Check provider status page, wait or switch provider |
| Analyses stuck “in_progress” | Worker process crashed | Restart analysis engine pods |
| Partial results, low quality | Context window exceeded | Check incident group size; log filtering may be needed |
| Rate limit errors | Too many concurrent analyses | Check concurrency settings, consider queue backpressure |
## LLM Provider Issues

If the LLM provider is degraded:
- Check the provider status page
- Review error codes in AI analytics (429 = rate limit, 503 = provider down)
- If persistent, consider temporarily increasing timeout or switching provider
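For transient 429/503 errors, the standard mitigation is retry with exponential backoff and jitter. A minimal sketch, assuming a zero-argument `call` that returns a `(status, body)` pair; this is not a specific provider SDK's API.

```python
import random
import time

RETRYABLE = {429, 503}  # rate limited / provider unavailable

def call_with_backoff(call, max_attempts=4, base_delay=1.0):
    """Retry a provider call on transient errors.

    `call` is assumed to return (status_code, body). Non-retryable
    statuses are returned immediately; retryable ones back off as
    base_delay * 2**attempt with random jitter.
    """
    for attempt in range(max_attempts):
        status, body = call()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            # Jittered exponential backoff: ~1s, ~2s, ~4s, ...
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    return status, body
```

Backoff only papers over a degraded provider; if error rates stay elevated, switching providers or raising timeouts (as above) is the real fix.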
## Escalation

Collect the AI analytics output and the recent analysis IDs, then escalate to the engineering team.