Applies to:
- Plan
- Deployment
Summary
Issue: Requests to a model with a large context window return a context_length_exceeded error with a smaller token limit than expected.
Cause: Braintrust load balances across all providers that support a matching model name. If multiple providers expose the same model name, requests can silently be routed to a secondary provider whose underlying model has a smaller context window (the sketch below reproduces this symptom).
Resolution: Disable or remove secondary providers that don't have the correct model deployed or that enforce smaller limits.
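To see the symptom in isolation, the following is a minimal sketch that sends an oversized prompt through the Braintrust AI proxy with an OpenAI-compatible client and prints the error it gets back. The model name, prompt size, and environment variable are illustrative; substitute your own values.

```python
import os
from openai import OpenAI, BadRequestError

# OpenAI-compatible client pointed at the Braintrust AI proxy.
client = OpenAI(
    base_url="https://api.braintrust.dev/v1/proxy",
    api_key=os.environ["BRAINTRUST_API_KEY"],
)

# Illustrative prompt sized for a large-context model (hundreds of thousands of tokens).
long_prompt = "lorem ipsum " * 100_000

try:
    response = client.chat.completions.create(
        model="gpt-4o",  # example: a model name configured on more than one provider
        messages=[{"role": "user", "content": long_prompt}],
    )
    print("Served by:", response.model)
except BadRequestError as err:
    # If a secondary provider with a smaller context window handled the request,
    # the error reports a lower token limit than the primary model supports.
    print("Request rejected:", err)
```

If the reported limit is smaller than the primary model's context window, a secondary provider most likely handled the request.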
Resolution steps
Step 1: Confirm which provider handled the request
Open the failed run's trace in Braintrust and check:
- The purple chat completion span (not the root span)
- metadata.model, which confirms which model ID was sent to the provider
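As a complement to inspecting the trace in the UI, you can send a small request through the proxy and compare the model reported in the response with the metadata.model value on the span. This sketch assumes the serving provider echoes its actual model identifier in the response's model field (some gateways simply echo the requested name); the model name is a placeholder.

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.braintrust.dev/v1/proxy",
    api_key=os.environ["BRAINTRUST_API_KEY"],
)

# Send a trivial request with the same model name that failed.
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder: use the model name from the failed run
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=1,
)

# Compare this with metadata.model on the chat completion span in the trace.
# A mismatch suggests a secondary provider or an older deployment answered.
print("Requested: gpt-4o")
print("Reported by provider:", response.model)
```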
Step 2: Fix provider configuration
Go to Settings > AI providers and do one of the following:
- Option A: Fix the primary provider. Update the provider configuration to match the model's actual limits.
- Option B: Remove or disable additional providers with matching models. Remove or disable any providers that don't have the correct model deployed, or that map to older or smaller versions of the model.
- Option C: Check custom model mappings. If you use custom endpoints or Azure deployments, verify that the deployment name actually corresponds to the model you expect (a verification sketch follows the note below).

Note: Braintrust automatically load balances across all providers where a model name matches. Any provider with a matching model name will receive requests, even if the underlying deployment differs.
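For Option C, one way to verify an Azure deployment mapping is to query the deployment directly and check which model the service reports. This is a sketch using the openai Python package's AzureOpenAI client; the endpoint, API version, and deployment name are placeholders for your own values.

```python
import os
from openai import AzureOpenAI

# Placeholders: substitute your Azure OpenAI resource details.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="my-gpt-4o-deployment",  # the Azure *deployment* name, not the model name
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=1,
)

# If this prints an older or smaller model than you expect, the deployment name
# referenced in your Braintrust provider configuration maps to the wrong model.
print("Deployment serves model:", response.model)
```

Once the misconfigured provider is fixed or disabled, re-run the original request and confirm it no longer fails with context_length_exceeded.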