OpenClaw Model Fallback Chain Guide: Rate Limits, Failover & Cost Optimization
A well-designed fallback chain is OpenClaw’s core availability architecture. It keeps your agent seamlessly operational when Anthropic rate-limits you or OpenAI goes down — users never notice the switch.
The Core Principle
Fallback should optimize for availability first, then cost, then style consistency.
Practical Chain Design
Use cross-provider interleaving:
- Primary high-quality model (e.g., Claude Opus)
- Different provider, similar quality (e.g., GPT Codex)
- Cost-effective model (e.g., MiniMax M2.1)
- Fast-response model (e.g., Gemini Flash)
- Safety net (e.g., GLM)
This prevents provider-level incidents from taking down the whole chain.
Full Configuration Example
In ~/.openclaw/openclaw.json:
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-opus-4-6",
        "fallbacks": [
          "openai-codex/gpt-5.3-codex",
          "minimax-portal/MiniMax-M2.1",
          "google/gemini-3-pro-high",
          "google/gemini-3-flash",
          "zai/glm-4.7"
        ]
      }
    }
  }
}
Why this order? Opus → Codex switches providers to avoid rate-limit collision. MiniMax provides cost-effective fallback. Gemini ensures availability. GLM is the final safety net.
Operational Rules
- Never place two same-provider models back-to-back (if Anthropic returns 429, two consecutive Claude models both fail)
- Probe all models periodically with a tiny health task (see the probe sketch after this list)
- Log model-level failures separately from prompt failures
- Keep one “boring but stable” fallback at the end
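The health probe from the second rule can be a small scheduled script that sends a one-token request to every model in the chain and records the HTTP status and latency. A minimal sketch in TypeScript, assuming each provider is reached through an OpenAI-compatible /v1/chat/completions endpoint; the endpoint URLs, environment-variable names, and the probeModel helper are illustrative, not part of OpenClaw:

// health-probe.ts: send a tiny request to every model in the fallback chain.
// Assumption: each provider is reachable via an OpenAI-compatible
// chat-completions endpoint; the URLs and env-var names below are placeholders.

interface Probe {
  name: string;      // model id as it appears in openclaw.json
  url: string;       // OpenAI-compatible endpoint (placeholder)
  apiKeyEnv: string; // env var holding the provider key
  model: string;     // provider-side model name
}

const probes: Probe[] = [
  { name: "anthropic/claude-opus-4-6", url: "https://anthropic.example/v1/chat/completions", apiKeyEnv: "ANTHROPIC_API_KEY", model: "claude-opus-4-6" },
  { name: "openai-codex/gpt-5.3-codex", url: "https://openai.example/v1/chat/completions", apiKeyEnv: "OPENAI_API_KEY", model: "gpt-5.3-codex" },
  // one entry per model in the fallback chain
];

async function probeModel(p: Probe): Promise<void> {
  const started = Date.now();
  try {
    const res = await fetch(p.url, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env[p.apiKeyEnv] ?? ""}`,
      },
      body: JSON.stringify({
        model: p.model,
        max_tokens: 1, // a single token is enough to prove the model is alive
        messages: [{ role: "user", content: "ping" }],
      }),
    });
    // A 429 here means "alive but rate-limited"; log it separately from hard failures.
    console.log(`${p.name}: HTTP ${res.status} in ${Date.now() - started}ms`);
  } catch (err) {
    // Network failures (ETIMEDOUT, ECONNREFUSED) surface here, not as HTTP statuses.
    console.error(`${p.name}: unreachable (${(err as Error).message})`);
  }
}

Promise.all(probes.map(probeModel));

Run it from cron every few minutes and alert when the same model fails twice in a row; that is usually enough to catch a provider incident before your main sessions hit it.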
Per-Task Model Overrides
OpenClaw supports model overrides per session or cron job (a config sketch follows this list):
- Main chat: Use top-tier models (Opus / GPT Codex)
- Cron jobs: Use Flash or cheaper models (automation doesn’t need the strongest reasoning)
- Group chats: Use Sonnet-tier (fast response, controlled cost)
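The exact override syntax depends on your OpenClaw version, so treat the snippet below as a hypothetical sketch rather than documented schema: it mirrors the defaults block above and assumes that per-agent blocks (here named cron and group-chat) can sit alongside defaults; the agent names and the Sonnet model ID are guesses. Verify the key names against your installation before copying.

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-opus-4-6",
        "fallbacks": ["openai-codex/gpt-5.3-codex", "minimax-portal/MiniMax-M2.1"]
      }
    },
    "cron": {
      "model": {
        "primary": "google/gemini-3-flash",
        "fallbacks": ["zai/glm-4.7"]
      }
    },
    "group-chat": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-6",
        "fallbacks": ["google/gemini-3-flash"]
      }
    }
  }
}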
Prompt Compatibility Tips
- Keep system prompts concise and provider-neutral
- Avoid provider-specific formats (e.g., Claude's <thinking> tags) unless necessary
- Validate tool-call behavior across all fallback models (see the sketch below)
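A quick way to do that last check is to replay one canonical tool-call prompt against every model in the chain and verify that each one actually emits a tool call instead of answering in prose. A sketch under the same OpenAI-compatible-endpoint assumption as the health probe above; the get_weather tool and the two endpoints are illustrative:

// tool-check.ts: does each fallback model still emit tool calls?
// Same assumption as the health probe: OpenAI-compatible endpoints, placeholder URLs.

const models = [
  { name: "anthropic/claude-opus-4-6", url: "https://anthropic.example/v1/chat/completions", apiKeyEnv: "ANTHROPIC_API_KEY", model: "claude-opus-4-6" },
  { name: "openai-codex/gpt-5.3-codex", url: "https://openai.example/v1/chat/completions", apiKeyEnv: "OPENAI_API_KEY", model: "gpt-5.3-codex" },
];

// One canonical tool definition, sent unchanged to every model.
const weatherTool = {
  type: "function",
  function: {
    name: "get_weather",
    description: "Look up current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
};

async function checkToolCall(m: typeof models[number]): Promise<void> {
  const res = await fetch(m.url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env[m.apiKeyEnv] ?? ""}`,
    },
    body: JSON.stringify({
      model: m.model,
      messages: [{ role: "user", content: "What's the weather in Oslo right now?" }],
      tools: [weatherTool],
    }),
  });
  const data: any = await res.json();
  // In the OpenAI-compatible shape, tool calls live on the first choice's message.
  const toolCalls = data?.choices?.[0]?.message?.tool_calls ?? [];
  console.log(`${m.name}: ${toolCalls.length > 0 ? "emitted a tool call" : "answered in prose"}`);
}

Promise.all(models.map(checkToolCall));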
FAQ
Q: How do I tell if it’s a rate limit vs. a network issue?
A: A rate limit returns a clear HTTP status code: the logs show 429 Too Many Requests. Network issues surface as ETIMEDOUT or ECONNREFUSED errors instead.
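If you want your own logging to make this distinction automatically (per the operational rule about logging model-level failures separately), the classification fits in a few lines. A sketch assuming Node-style error codes; the classifyFailure helper is illustrative, not an OpenClaw API:

// classify-failure.ts: separate rate-limit responses from network-level failures.

type FailureKind = "rate-limit" | "network" | "other";

function classifyFailure(err: unknown, status?: number): FailureKind {
  // A 429 arrives as a normal HTTP response: the provider is up but throttling you,
  // so moving on to the next provider in the chain makes sense immediately.
  if (status === 429) return "rate-limit";
  // Network problems never produce a status; Node reports them as error codes instead.
  const code = (err as { code?: string } | undefined)?.code;
  if (code === "ETIMEDOUT" || code === "ECONNREFUSED") return "network";
  return "other";
}

console.log(classifyFailure(undefined, 429));                                                // "rate-limit"
console.log(classifyFailure(Object.assign(new Error("timed out"), { code: "ETIMEDOUT" })));  // "network"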
Q: Will falling back to weaker models hurt the experience?
A: Brief fallbacks are usually invisible to users; most daily conversations work fine on Sonnet or Flash. Only complex reasoning and long tool-call chains show differences.
Q: How do I know which model is currently active?
A: Run openclaw status to see the current session’s model, or check the logs for model selection events.
Bottom Line
A fallback chain is not cost optimization — it’s your uptime architecture. A well-configured chain is more reliable than any single top-tier model.
Related Guides
To apply this in production, pair this guide with:
- What Actually Matters in OpenClaw 2026.3.8: Backup CLI and Brave LLM Context
- What is OpenClaw?
- OpenClaw vs ChatGPT vs Claude (2026)
- OpenClaw Logs & Debug Guide