
Playbook: Analysis Failures

When to use: Customer reports “analysis not completing”, “stuck in progress”, or AI quality issues.

```
MCP: get_ai_analytics
workspace_id: "<workspace_id>"
```
Look for elevated error rates, increased latency, or low completion rates.
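Those three signals can be checked mechanically. A minimal sketch, assuming the analytics payload exposes `total`, `completed`, `errors`, and `p95_latency_ms` counters (field names and thresholds are illustrative, not the real `get_ai_analytics` schema):

```python
def analytics_findings(stats: dict) -> list[str]:
    """Return human-readable warnings for elevated error rate,
    high latency, or a low completion rate.

    Field names and thresholds are assumptions for illustration.
    """
    findings = []
    total = stats.get("total", 0)
    if total == 0:
        return ["no analyses in window"]
    error_rate = stats.get("errors", 0) / total
    completion_rate = stats.get("completed", 0) / total
    if error_rate > 0.05:  # assumed alerting threshold
        findings.append(f"elevated error rate: {error_rate:.0%}")
    if completion_rate < 0.90:  # assumed healthy floor
        findings.append(f"low completion rate: {completion_rate:.0%}")
    if stats.get("p95_latency_ms", 0) > 60_000:  # assumed 60 s budget
        findings.append(f"high p95 latency: {stats['p95_latency_ms']} ms")
    return findings
```

An empty list means the numbers look healthy and the problem is likely elsewhere.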

```
MCP: remote_debug
workspace_id: "<workspace_id>"
command: "analysis_status"
```

```
MCP: remote_debug
workspace_id: "<workspace_id>"
command: "recent_analyses"
params: { "limit": 10 }
```

Look for patterns: all analyses failing, only specific incident types failing, or intermittent failures.
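The pattern check above can be sketched as a small classifier over the recent-analyses output. This assumes each record carries `status` and `incident_type` fields, which is an illustrative shape, not the documented `recent_analyses` schema:

```python
def failure_pattern(analyses: list[dict]) -> str:
    """Classify recent analyses as 'healthy', 'all_failing',
    'type_specific' (only some incident types fail), or 'intermittent'.

    Record shape ('status', 'incident_type') is an assumption.
    """
    failed = [a for a in analyses if a["status"] == "failed"]
    if not failed:
        return "healthy"
    if len(failed) == len(analyses):
        return "all_failing"
    failed_types = {a["incident_type"] for a in failed}
    ok_types = {a["incident_type"] for a in analyses if a["status"] != "failed"}
    # If no incident type both fails and succeeds, the failures
    # track a specific type; otherwise they are intermittent.
    if failed_types.isdisjoint(ok_types):
        return "type_specific"
    return "intermittent"
```

Each outcome maps to a different row in the symptom table that follows.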

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| All analyses failing with timeout | LLM provider degraded | Check provider status page, wait or switch provider |
| Analyses stuck "in_progress" | Worker process crashed | Restart analysis engine pods |
| Partial results, low quality | Context window exceeded | Check incident group size, may need log filtering |
| Rate limit errors | Too many concurrent analyses | Check concurrency settings, consider queue backpressure |

If the LLM provider is degraded:

  1. Check the provider status page
  2. Review error codes in AI analytics (429 = rate limit, 503 = provider down)
  3. If persistent, consider temporarily increasing timeout or switching provider
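The error-code triage in step 2 can be expressed as a simple mapping. The action names here are shorthand for the playbook's responses, not an API:

```python
def provider_action(status_code: int) -> str:
    """Map LLM provider HTTP error codes to the playbook's actions.

    429 = rate limit -> back off / reduce concurrency.
    503 = provider down -> check status page, consider switching provider.
    Action labels are illustrative shorthand, not a real API.
    """
    if status_code == 429:
        return "backoff"
    if status_code == 503:
        return "check_provider"
    if 500 <= status_code < 600:
        return "retry"  # other transient server errors
    return "investigate"
```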

Collect the AI analytics output and the recent analysis IDs, then escalate to the engineering team.
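A minimal sketch of packaging that escalation bundle, assuming the raw analytics dict and the recent-analyses list from the earlier steps (the bundle keys are illustrative, not a required escalation format):

```python
import json

def escalation_bundle(workspace_id: str, analytics: dict, recent: list[dict]) -> str:
    """Serialize the AI analytics output plus recent analysis IDs
    into one JSON blob to attach to the escalation.

    Key names are illustrative; adjust to your escalation template.
    """
    return json.dumps(
        {
            "workspace_id": workspace_id,
            "analytics": analytics,
            "analysis_ids": [a["id"] for a in recent],
        },
        indent=2,
        default=str,  # tolerate datetimes in the analytics payload
    )
```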