“Headless `opencode run` held up driving a 64-session local-model concurrency ramp”
I used OpenCode for something it isn't really marketed on: as a headless agent runner for a parallel experiment, not as an interactive coding TUI. I pointed `opencode.json` at a local Kimi K2.6 endpoint instead of a hosted frontier model. The endpoint is OpenAI-compatible with a 262144-token context window, and OpenCode took the config without complaint: no provider gymnastics, just the base URL and model name, and it routed requests there. That alone is why I reached for it over alternatives that assume you're on someone's API.

The real test was `opencode run`, the fully headless invocation. I wrote a ramp harness (`run_full_ramp.sh` wrapping `run_ramp.py`) that fired OpenCode at real GitHub issues with no human in the loop, then ramped concurrency in steps: 8 simultaneous sessions, then 16, then 32, then 64. Each session got its own APFS copy-on-write clone of the repo plus a per-session config file, so the agents weren't stepping on each other's working trees. That gave clean isolation per run, and it was cheap to spin up because CoW clones are nearly free on APFS.

What I cared about was whether OpenCode itself would buckle as the session count climbed. It didn't. Under that load OpenCode was never the bottleneck; the constraint sat in the model endpoint and the host, not in the runner orchestrating the agents. I ran the full ramp out to 64 concurrent sessions and the headless runner held. That's the result that matters to me: I could treat `opencode run` as a primitive I shell out to N times in parallel and trust it to behave the same at 64 as at 8.

Practically, this is what made the large parallel agent experiment feasible at all. Without a clean headless mode I'd have been scripting a TUI or building my own agent loop around the model; instead I leaned on OpenCode's runner and spent my time on the harness and the per-session cloning.
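
For anyone wanting to reproduce the shape of that ramp, here is a minimal Python sketch of the fan-out. It is not the actual `run_ramp.py`. The function names, the session labels, and the choice of a thread pool are my own assumptions for illustration. It assumes macOS (where `cp -c` requests a `clonefile(2)` copy-on-write clone) and that the prompt is passed as the positional message to `opencode run`.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

RAMP_STEPS = (8, 16, 32, 64)  # concurrency levels used in the review


def clone_command(repo: Path, dest: Path) -> list:
    """Build the macOS `cp` call that clones a repo copy-on-write.

    `-c` asks cp to use clonefile(2); `-R` recurses into the tree.
    """
    return ["cp", "-c", "-R", str(repo), str(dest)]


def run_session(repo: Path, workdir: Path, label: str, prompt: str) -> int:
    """Clone the repo for one session and run OpenCode headlessly inside it."""
    dest = workdir / label  # hypothetical layout: one directory per session
    subprocess.run(clone_command(repo, dest), check=True)
    # Assumption: the prompt is the positional message of `opencode run`,
    # and the clone is used as the working directory.
    return subprocess.run(["opencode", "run", prompt], cwd=dest).returncode


def run_ramp(launch, steps=RAMP_STEPS) -> dict:
    """Call `launch(label)` n times in parallel per step; collect results per step."""
    results = {}
    for n in steps:
        labels = [f"c{n:02d}-s{i:02d}" for i in range(n)]
        # One thread per session is enough: each thread only waits on a child process.
        with ThreadPoolExecutor(max_workers=n) as pool:
            results[n] = list(pool.map(launch, labels))
    return results
```

A caller would bind the repo, scratch directory, and issue prompt, for example `run_ramp(lambda label: run_session(repo, workdir, label, prompt))`, then inspect the exit codes collected for each concurrency step.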
The fact that it's model-agnostic enough to drive a local Kimi K2.6 build, and headless enough to fan out to dozens of isolated repo clones, is the combination I was looking for.

Why 4 and not 5: this is me reporting on one specific use (headless `opencode run` against a local OpenAI-compatible endpoint at concurrency), not a broad verdict on the interactive product, the prompt quality, or how it compares with hosted models. Within that scope it did exactly the job. It took the local config, it ran headless, it scaled across the 8→16→32→64 ramp without becoming the limiting factor, and it let me build the experiment around it. If you're trying to drive a local model in parallel and need a runner you can call from a script, this held up where I expected it to be the weak link.
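
For the per-session config, a rough sketch of what generating one could look like is below. The key layout follows my understanding of OpenCode's custom-provider config format (an `@ai-sdk/openai-compatible` provider with a `baseURL` option); check it against the current OpenCode docs before relying on it. The provider key `local`, the helper names, and every value passed in (base URL, model id, output limit) are placeholders, not the settings from this experiment.

```python
import json
from pathlib import Path


def build_config(base_url: str, model_id: str, context_tokens: int, output_tokens: int) -> dict:
    """Return an OpenCode-style config dict for an OpenAI-compatible endpoint."""
    provider_id = "local"  # hypothetical provider key, any unique name should do
    return {
        "$schema": "https://opencode.ai/config.json",
        "provider": {
            provider_id: {
                "npm": "@ai-sdk/openai-compatible",
                "name": "Local endpoint",
                "options": {"baseURL": base_url},
                "models": {
                    model_id: {
                        "name": model_id,
                        # Assumption: token limits are declared per model like this.
                        "limit": {"context": context_tokens, "output": output_tokens},
                    }
                },
            }
        },
        # Default model, written as "<provider>/<model>".
        "model": f"{provider_id}/{model_id}",
    }


def write_session_config(clone_dir: Path, config: dict) -> Path:
    """Write the config as opencode.json at the root of one session's clone."""
    path = clone_dir / "opencode.json"
    path.write_text(json.dumps(config, indent=2) + "\n")
    return path
```

Writing the file into each clone's root keeps the config next to the working tree it belongs to, so a session can be inspected or rerun on its own after the ramp finishes.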