If a gunicorn worker is killed (SIGKILL, OOM) while holding a token, the
token is never returned to the pool. With no TTL on the capacity key,
`setnx` would never fire again, so the pool shrinks permanently — with
`limit=3` you silently end up at `limit=2`, then `limit=1`, etc.
Set a 1-hour TTL (`_CAPACITY_KEY_TTL`) on the capacity key via the
`NX EX` form of SET in the Lua init script. When the key expires the next
request re-initializes the pool to full capacity, so the semaphore is
self-healing without manual Redis key deletion.
Replace the `setnx` + pipeline pair with a Lua script evaluated in a
single round-trip. The prior approach had a race window: between the
`SET NX` succeeding and the `MULTI/EXEC` pipeline running, a concurrent
worker could BLPOP from the list just before `DEL` wiped it — losing
tokens permanently. A process crash in that window left the capacity flag
set but the token list empty, breaking the semaphore with no recovery path.
The Lua script makes the check-and-initialize atomic: Redis executes it as
a single unit with no interleaving, so the race window is closed.
Replace the INCRBY-based polling loop with a proper token pool backed by
a Redis LIST. BLPOP blocks until a token is available instead of sleeping
and retrying, which is more efficient and avoids the check-then-act race
of the old counter approach.
Other fixes bundled in:
- Add `blpop` and `setnx` wrappers to `RedisWrapper` so all key prefixing
goes through `make_key` consistently
- Cache `_default_limit()` result with `@redis_cache(shared=True)` to
avoid importing `multiprocessing` on every request
- Fix `limit=0` edge case: use `is not None` guard instead of falsy check
- Guard `_release()` against pushing the `"fallback"` token back into the
pool when Redis was unavailable during acquire