Unpacking LLM Performance Variability: Is Your AI Getting Tired at Night?
Users of large language models (LLMs) such as Claude Opus frequently report a noticeable drop in performance during specific periods, often coinciding with peak usage times such as evenings in the US Eastern time zone. The perceived degradation includes models getting stuck in unproductive "rabbit holes," proposing overly aggressive or incorrect refactors that break code, and requiring significantly more time and user intervention to reach the desired result.
Theories on Performance Variation
The most prevalent explanation put forward by users is that LLM providers engage in dynamic load balancing. During high demand, the theory goes, they might transparently switch to less capable models, serve lower-quality quantizations of the main model, or allocate less inference-time compute per request. This could manifest as a "model router" directing traffic to different versions based on load. Users have reported similar experiences with other models such as Google's Gemini and OpenAI's GPT, which some take as a sign of a broader industry practice.
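To be clear, none of this is confirmed, and no provider has published such routing logic. Purely as a way of picturing what users are theorizing, here is a toy sketch of a load-based router; every name in it (ModelTier, route_request, the tier labels, the thresholds) is hypothetical.

```python
# Hypothetical illustration of the "model router" theory; not any provider's real code.
from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    relative_cost: float  # rough compute cost per request

FULL_PRECISION = ModelTier("opus-full", relative_cost=1.0)
QUANTIZED = ModelTier("opus-quantized", relative_cost=0.4)
SMALLER_MODEL = ModelTier("sonnet", relative_cost=0.2)

def route_request(current_load: float) -> ModelTier:
    """Pick a tier based on fleet load (0.0 = idle, 1.0 = saturated)."""
    if current_load < 0.7:
        return FULL_PRECISION
    if current_load < 0.9:
        return QUANTIZED       # lower-quality quantization under moderate pressure
    return SMALLER_MODEL       # fall back to a cheaper model at peak load

# Example: at 85% load this toy router would serve the quantized tier.
print(route_request(0.85).name)
```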
While companies like Anthropic have directly denied such practices, user anecdotes persist, with some noting that quality issues often appear after service interruptions or around major product announcements that drive an influx of users. The counterargument is that even if no deliberate model fallback occurs, infrastructure under heavy load may be unable to consistently deliver optimal compute for every request, which would still produce performance variability.
Some users also suggest that "internal quotas" might be at play, causing output quality to drop abruptly and not recover even in new sessions. Another observation links perceived quality drops in one model (e.g., Sonnet 4.5) to the imminent release of a newer, more powerful one (e.g., Opus 4.5), fueling theories of strategic model routing or resource reallocation.
It's important to note that many of these observations are anecdotal, and calls for systematic, controlled benchmarks that run consistently over time to gather concrete data remain unmet. Some suggest that perceived degradation might also be partly due to user fatigue, affecting their ability to detect issues and their patience levels.
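To make the call for systematic benchmarks concrete, a minimal sketch of a recurring probe might look like the following: the same fixed task is sent at regular intervals, scored deterministically, and logged with a timestamp so time-of-day trends can be examined later. The call_model placeholder is an assumption to be wired to whichever provider SDK you use, and a real benchmark would use a larger task set and stronger scoring.

```python
# Minimal sketch of a recurring benchmark probe; call_model is a placeholder.
import csv
import time
from datetime import datetime, timezone

FIXED_PROMPT = "Write a Python function that reverses a string. Return only code."

def call_model(prompt: str) -> str:
    # Placeholder: connect this to your provider's API client of choice.
    raise NotImplementedError

def score(output: str) -> bool:
    # Crude deterministic check; a real harness would execute the code or run tests.
    return "def " in output and "[::-1]" in output

def run_once(writer) -> None:
    started = time.monotonic()
    output = call_model(FIXED_PROMPT)
    elapsed = time.monotonic() - started
    writer.writerow([datetime.now(timezone.utc).isoformat(), round(elapsed, 2), score(output)])

if __name__ == "__main__":
    with open("llm_benchmark_log.csv", "a", newline="") as f:
        log = csv.writer(f)
        while True:
            run_once(log)
            f.flush()
            time.sleep(3600)  # hourly, so both peak and off-peak windows get sampled
```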
Strategies for Mitigating Performance Drops
When a model appears to be performing poorly, several user-suggested tactics have shown some success:
- Be explicit with instructions: Clearly state requirements for minimal, localized changes rather than broad refactors.
- Control refactoring: Instruct the model not to refactor code unless explicitly told to do so or it is absolutely necessary (see the prompt sketch at the end of this section).
- Break down tasks: Divide complex requests into smaller, more manageable steps, and consider "locking in" earlier decisions to prevent the model from revisiting them.
- Switch models: If one model is performing poorly, try prompting an alternative model (e.g., Codex, Gemini) and return to the primary model later.
- Manage context: Some users speculate that specific "poison pill files" or an overly large context window might contribute to poor performance, suggesting it's worth examining the input context when issues arise.
These strategies aim to guide the model more tightly, reducing its propensity for exploratory, less productive behaviors, especially when it might be operating under suboptimal conditions.
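As one way of operationalizing the first two tactics, the sketch below bakes the constraints into a reusable system prompt. It assumes the Anthropic Python SDK's messages.create interface; the model ID and the exact wording of the guardrail prompt are placeholders, not a recommended configuration.

```python
# Sketch: encode "minimal, localized changes" and "no unrequested refactors"
# as a reusable system prompt. Model ID below is a placeholder.
import anthropic

GUARDRAIL_SYSTEM_PROMPT = (
    "Make the smallest, most localized change that satisfies the request. "
    "Do not refactor, rename, or reorganize code unless explicitly asked. "
    "If a broader change seems necessary, stop and ask before making it."
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def constrained_edit(task: str) -> str:
    response = client.messages.create(
        model="claude-opus-4-5",      # placeholder; use whichever model you run
        max_tokens=2048,
        system=GUARDRAIL_SYSTEM_PROMPT,
        messages=[{"role": "user", "content": task}],
    )
    return response.content[0].text

# Example: a narrowly scoped request that pairs well with the guardrail prompt.
# print(constrained_edit("Fix the off-by-one error in pagination.py without touching other files."))
```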