6/5/2026» 1@bot_blogs #blog #bot #brtkwr_com ~brtkwr.com

# [Notes on a stale provider cooldown in OpenClaw](https://brtkwr.com/posts/2026-06-05-openclaw-stale-provider-cooldown/)
My OpenClaw gateway went silent for three days after a usage spike, even though I could still chat to the same provider normally via its web interface. The API was serving requests, but OpenClaw had turned a “next reset in 6 days” message into a literal `blockedUntil` timestamp and refused to try the profile again. Without a fallback model configured, the probe-during-cooldown code never fires, so the profile stays blocked. I cleared the field in `auth-state.json` and restarted the gateway. I’ve also added an hourly watchdog; I haven’t seen it fire yet.
## Motivation
Family chat went quiet and daily cron jobs stopped posting, though the gateway was up. The provider’s weekly cap in `openclaw models status` showed something like `Week 44% left ⏱5d 17h`, so there was credit available. I could open the provider’s web app in a browser and chat normally. But every cron run in the gateway log showed `decision=skip_candidate ... Provider <id> is in cooldown (suspending lanes)`.
Last time this happened the heartbeat was firing every 30 minutes. The heartbeat was off this time (the trajectory log had zero events for three days), so this wasn’t a runaway client. The block was local state.
## Tracing where the belief is stored
A direct capability call worked:

    openclaw infer model run --prompt "say hello in one word"
    # Hello

So the API was fine. The belief had to live somewhere local. It was in `auth-state.json`:

    {
      "usageStats": {
        "<provider>:<account>": {
          "blockedUntil": 1780846982712,
          "blockedReason": "subscription_limit",
          "blockedSource": "wham",
          "errorCount": 1,
          "failureCounts": { "rate_limit": 1 },
          "lastFailureAt": 1780401970719
        }
      }
    }

That `blockedUntil` decodes to four days in the future. It was set during the original spike, when the upstream returned “You've reached your subscription usage limit. Next reset in 6 days, Jun 7 at 3:43 PM UTC”. OpenClaw stored that reset time verbatim as the block expiry. But the provider’s weekly cap is a rolling window: it recovers gradually as the oldest usage ages out of the window, not in one step at the “next reset” time. The API was serving small requests against the partially recovered cap days before that date.
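For anyone checking their own state file, the two timestamps are Unix epoch values in milliseconds. A minimal sketch of decoding them with the Python standard library follows; `ms_to_utc` is just a helper name I made up for illustration.

```python
from datetime import datetime, timezone


def ms_to_utc(ms: int) -> datetime:
    """Convert a Unix timestamp in milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Values copied from the auth-state.json excerpt above.
blocked_until = ms_to_utc(1780846982712)
last_failure = ms_to_utc(1780401970719)

print(blocked_until.isoformat(timespec="seconds"))  # 2026-06-07T15:43:02+00:00
print(last_failure.isoformat(timespec="seconds"))   # 2026-06-02T12:06:10+00:00
```

The decoded `blockedUntil` lines up with the "Jun 7 at 3:43 PM UTC" in the upstream message, which is what points to the timestamp having been taken straight from that message.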
## Why nothing tried to recover
OpenClaw does have a probe-during-cooldown path. In `model-fallback-DRgKirrj.js`:

    function shouldProbePrimaryDuringCooldown(params) {
      if (!params.isPrimary || !params.hasFallbackCandidates) return false;
      ...
    }

If a fallback model is configured, OpenClaw periodically retries the primary during cooldown, detects the recovery, and switches back. With `fallbacks: []`, the short-circuit on `hasFallbackCandidates` means no probe ever runs. The profile stays blocked until `blockedUntil` arrives, even if the API recovered the day after the failure.
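To make the short-circuit concrete, here is a small Python paraphrase of only the guard quoted above. It is not OpenClaw's code: the real function is JavaScript and has further conditions that the excerpt elides, so I've collapsed those into a made-up `rest_ok` argument purely so the sketch can run.

```python
def should_probe_primary_during_cooldown(
    is_primary: bool,
    has_fallback_candidates: bool,
    rest_ok: bool = True,  # hypothetical stand-in for the elided conditions
) -> bool:
    # Mirrors the guard in the excerpt: no fallback candidates means no probe,
    # no matter what the remaining conditions would have said.
    if not is_primary or not has_fallback_candidates:
        return False
    return rest_ok


fallbacks = []  # the configuration described in this post
print(should_probe_primary_during_cooldown(True, bool(fallbacks)))  # False

fallbacks = ["some-other-model"]  # hypothetical fallback entry
print(should_probe_primary_during_cooldown(True, bool(fallbacks)))  # True
```

With an empty fallback list the function returns before any other condition is looked at, which matches the behaviour I saw: nothing ever re-tested the primary.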
There’s an open upstream issue (openclaw/openclaw#54278, filed in March) describing exactly this pattern. It proposes a separate quota_wait state with periodic probing regardless of fallback configuration. Not implemented yet.
## The fix
Two things, neither of which requires adding a fallback.
First, clear the stale block:

    import json

    # Path to the agent's auth state; adjust for your install.
    p = "/path/to/.openclaw/agents/main/agent/auth-state.json"

    with open(p) as f:
        state = json.load(f)

    # Drop the stale block fields for the affected profile.
    prof = state["usageStats"]["<provider>:<account>"]
    for k in ["blockedUntil", "blockedReason", "blockedSource", "errorCount", "failureCounts"]:
        prof.pop(k, None)

    with open(p, "w") as f:
        json.dump(state, f, indent=2)

Restart the gateway and the profile is callable again.
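Before or after clearing, it can be handy to check which profiles the state file still considers blocked. This is a minimal sketch that assumes the file has the shape shown earlier in this post; `blocked_profiles` is a hypothetical helper of mine, not part of OpenClaw.

```python
import json
import time


def blocked_profiles(state: dict, now_ms: int) -> dict:
    """Return {profile: hours_remaining} for profiles whose blockedUntil is still in the future."""
    remaining = {}
    for name, stats in state.get("usageStats", {}).items():
        until = stats.get("blockedUntil")
        if isinstance(until, (int, float)) and until > now_ms:
            remaining[name] = round((until - now_ms) / 3_600_000, 1)
    return remaining


# Sample state mirroring the excerpt above, evaluated at the moment of the failure.
sample = {
    "usageStats": {
        "<provider>:<account>": {
            "blockedUntil": 1780846982712,
            "lastFailureAt": 1780401970719,
        }
    }
}
print(blocked_profiles(sample, now_ms=1780401970719))
# {'<provider>:<account>': 123.6}

# Against a real file (path is a placeholder, as in the script above):
# with open("/path/to/.openclaw/agents/main/agent/auth-state.json") as f:
#     print(blocked_profiles(json.load(f), int(time.time() * 1000)))
```

An empty result after running the clearing script and restarting would suggest no stale block is left in the file.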
…