date: 2026-06-10
tags: [inbox, project/cli-agent, type/design-note]
public: true

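lh Local System Shell IO Buffer Optimization Plan

Background

Related experiment: lh-shell-io-buffer-pressure-experiment

Related side track: lh-shell-process-lifecycle-experiment

Goal: design a bounded-memory scheme for managing the output of the Local System runCommand in lh connect, so that shell stdout/stderr no longer enters the Node/V8 heap in proportion to the total historical output.

Target invariant:

Memory(lh connect) = base + O(active_sessions * retained_cap) + O(metadata)

What must not happen: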

Memory(lh connect) = base + O(total_output_bytes)

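Claude Code approach notes

Overall assessment

Ordinary Bash/PowerShell commands in Claude Code run in file mode by default: stdout and stderr are written directly to the output file fd, the parent process's JS does not listen on the stdout/stderr streams, and the full output is never accumulated in the JS heap.

It keeps a pipe mode, but the current ordinary BashTool/PowerShellTool calls do not pass onStdout, so they never take the pipe-mode path. onProgress is not the same as onStdout: onProgress still works in file mode by polling the tail of the output file.

Mode selection

Key locations:

In ../claude-code/src/utils/Shell.ts, ExecOptions.onStdout is an internal option.
usePipeMode = !!onStdout.
new TaskOutput(taskId, onProgress ?? null, !usePipeMode).

Meaning:

Condition	Mode	Where stdout/stderr go
`onStdout` present	pipe mode	Into a Node stream, then into `TaskOutput`
`onStdout` absent	file mode	Written directly to the output file fd

The ordinary BashTool/PowerShellTool pass only onProgress and never onStdout, so the default is file mode.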

Writing to the file fd

In file mode, Claude Code first opens taskOutput.path:

outputHandle = await open(
  taskOutput.path,
  process.platform === 'win32'
    ? 'w'
    : fsConstants.O_WRONLY |
        fsConstants.O_CREAT |
        fsConstants.O_APPEND |
        O_NOFOLLOW,
)

At spawn time, stdout and stderr are both attached to the same fd:

stdio: usePipeMode
  ? ['pipe', 'pipe', 'pipe']
  : ['pipe', outputHandle?.fd, outputHandle?.fd],

This means:

stdin is still a pipe.
stdout is the output file fd.
stderr is the same output file fd.
The child process writes bytes directly to the fd.
The JS side has no childProcess.stdout.on('data') hot path.

After spawn, the parent process closes its own copy of the fd. The child has already inherited the fd and can keep writing.
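The fd wiring described above can be sketched in a few lines of Node. This is a minimal illustration, not Claude Code's implementation: `runToFile` is a hypothetical helper, it uses `spawnSync` instead of an async spawn, ignores stdin instead of piping it, and omits `O_NOFOLLOW`.

```typescript
import { spawnSync } from "node:child_process";
import { closeSync, mkdtempSync, openSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: run a command with stdout and stderr attached to one file fd.
export function runToFile(command: string, args: string[], outputPath: string): number | null {
  // "a" opens the file for appending and creates it if it does not exist.
  const fd = openSync(outputPath, "a");
  try {
    const result = spawnSync(command, args, {
      // stdin is ignored in this sketch; stdout and stderr share the same fd,
      // so the child writes bytes to the file without any JS stream in between.
      stdio: ["ignore", fd, fd],
    });
    return result.status;
  } finally {
    // The parent closes its own copy of the fd.
    closeSync(fd);
  }
}

// Usage: both streams of the child end up in the output file.
const demoDir = mkdtempSync(join(tmpdir(), "fd-direct-"));
const demoPath = join(demoDir, "out.log");
const demoStatus = runToFile(
  process.execPath,
  ["-e", "console.log('to-stdout'); console.error('to-stderr')"],
  demoPath,
);
const demoWritten = readFileSync(demoPath, "utf8");
console.log(demoStatus, demoWritten.length);
```

Because the parent never receives a `data` event, its heap usage does not depend on how much the child prints; only the file grows.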

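Why the output never enters the JS heap

ShellCommandImpl creates a StreamWrapper only when childProcess.stdout / childProcess.stderr exist.

In file mode stdout/stderr are already bound to the file fd, so childProcess.stdout and childProcess.stderr are null and no StreamWrapper is created.

Ordinary Bash output therefore never goes through:

data event -> Buffer -> data.toString() -> JS string -> array retention

This is the core difference from the current lh.

How progress works

In file mode, progress is not read from the stdout stream; instead, the tail of the output file is read on a timer.

TaskOutput.startPolling(taskId) adds the task to active polling. Once per second the poller reads a tail of PROGRESS_TAIL_BYTES = 4096 and estimates from it:

The last 5 lines.
The last 100 lines.
total lines.
total bytes.
Whether the current content is incomplete.

This lets the UI/agent see progress without loading the full output into memory.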
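A bounded tail read of this kind might look like the sketch below. `readTail` and `TailSnapshot` are hypothetical names; only the 4096-byte default comes from the note, and the line estimation that Claude Code performs on top of the tail is left out.

```typescript
import { closeSync, fstatSync, mkdtempSync, openSync, readSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

export interface TailSnapshot {
  totalBytes: number; // current size of the output file
  tailText: string; // decoded tail, at most tailBytes bytes
  startsMidFile: boolean; // true when earlier content was skipped
}

// Hypothetical helper: read at most tailBytes from the end of the file.
export function readTail(path: string, tailBytes = 4096): TailSnapshot {
  const fd = openSync(path, "r");
  try {
    const totalBytes = fstatSync(fd).size;
    const length = Math.min(tailBytes, totalBytes);
    const buffer = Buffer.alloc(length);
    // Positioned read: only the last `length` bytes are loaded into memory.
    const bytesRead = length > 0 ? readSync(fd, buffer, 0, length, totalBytes - length) : 0;
    return {
      totalBytes,
      tailText: buffer.subarray(0, bytesRead).toString("utf8"),
      startsMidFile: totalBytes > length,
    };
  } finally {
    closeSync(fd);
  }
}

// Usage: a 10,000-byte file yields a 4096-byte tail.
const tailDir = mkdtempSync(join(tmpdir(), "tail-read-"));
const tailPath = join(tailDir, "out.log");
writeFileSync(tailPath, "line\n".repeat(2000));
const snapshot = readTail(tailPath);
console.log(snapshot.totalBytes, snapshot.tailText.length, snapshot.startsMidFile);
```

The memory cost per poll is bounded by `tailBytes`, regardless of file size. A tail that starts mid-file may begin in the middle of a line or a multi-byte character, which a real poller would need to tolerate.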

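How the result is returned

After the command finishes, ShellCommandImpl calls:

const stdout = await this.taskOutput.getStdout()

In file mode getStdout() calls #readStdoutFromFile(), which uses readFileRange(this.path, 0, maxBytes) to read a bounded range.

If the output file was read in full:

outputFileRedundant = true.
stdout goes directly into the tool result.
The output file can be deleted.

If the output file is too large:

stdout contains only a bounded preview.
The result carries outputFilePath, outputFileSize, and outputTaskId.
The upper layer then builds a persisted output message so the model knows the full output lives in a file.

How background commands are handled

After a command moves to the background, file mode still has the child write directly to the file fd. Because the foreground timeout no longer constrains the task, Claude Code adds a size watchdog.

The watchdog stats the output file every 5 seconds. If the output file of a background command exceeds MAX_TASK_OUTPUT_BYTES = 5GB, the process is killed with SIGKILL so that unbounded logs cannot fill the disk.

This shows that Claude Code's file mode solves the JS heap problem but shifts the risk to disk capacity, which is why it needs a disk cap and a watchdog.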
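A size watchdog along these lines could be sketched as follows. The names `WatchdogDeps`, `checkOutputCap`, and `startSizeWatchdog` are hypothetical, and the stat and kill operations are injected so the decision logic can be exercised without a real process; only the 5-second interval and the kill-on-overflow behavior come from the description above.

```typescript
export interface WatchdogDeps {
  statSize: () => number; // e.g. () => fs.statSync(outputPath).size
  kill: () => void; // e.g. () => child.kill("SIGKILL")
}

// One watchdog tick: kill the process when the output file exceeds the cap.
export function checkOutputCap(deps: WatchdogDeps, capBytes: number): boolean {
  if (deps.statSize() > capBytes) {
    deps.kill();
    return true;
  }
  return false;
}

// Runs checkOutputCap on a timer and returns a function that stops the watchdog.
export function startSizeWatchdog(
  deps: WatchdogDeps,
  capBytes: number,
  intervalMs = 5000,
): () => void {
  const timer = setInterval(() => {
    if (checkOutputCap(deps, capBytes)) clearInterval(timer);
  }, intervalMs);
  return () => clearInterval(timer);
}

// Usage with fake dependencies: a 10-byte file against a 5-byte cap triggers a kill.
let killCount = 0;
const overCap = checkOutputCap({ statSize: () => 10, kill: () => { killCount += 1; } }, 5);
const underCap = checkOutputCap({ statSize: () => 3, kill: () => { killCount += 1; } }, 5);
console.log(overCap, underCap, killCount);
```

Between two ticks the file can still grow past the cap by up to one interval's worth of output, so the cap is approximate rather than exact.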

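Takeaways

Keep the hot path out of JS: stdout/stderr go straight to an fd, avoiding data.toString() and JS strings that retain the full history.
stdout/stderr can be merged into one chronologically interleaved output file, which suits human reading and tail previews.
Progress reads only the tail, so memory cost is decoupled from total output volume.
The result returns only a bounded preview; large output is expressed as a file reference.
High-output background commands need a disk cap, otherwise heap OOM turns into disk fill.
Output directories need per-session isolation so that sessions do not delete or overwrite each other's files.
Opening the file must consider symlink safety, for example O_NOFOLLOW on POSIX.

Design questions when mapping to lh

If lh adopts a Claude Code style file mode, the following semantics need to be settled:

Output file location and permissions.
How file paths are exposed in the remote device scenario.
Whether stdout/stderr should be kept separate.
Whether getCommandOutput returns a delta, a tail, or a file reference.
Whether output files are deleted on session cleanup.
Whether foreground and background commands both always create an output file.
The disk cap for output files, the over-limit behavior, and the error message.

Capabilities that can be adopted directly:

The stdout/stderr hot path writes straight to a file or spills to disk as early as possible.
getCommandOutput returns a tail/preview and never reads the full history.
Return metadata such as output_file_path, total_bytes, and preview_truncated.
Add a per-session disk cap and a cleanup policy.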

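Codex approach notes

Overall assessment

The Codex exec server uses a bounded retained buffer: output still passes through exec server memory, but each process keeps only the most recent fixed number of bytes, and seq / after_seq / max_bytes define the read protocol.

This approach does not depend on output file paths and suits a command/session/output polling model. Its core semantics: the server only guarantees that output inside the most recent window is readable; it does not guarantee that the full output history stays readable.

Retained output structure

Codex maintains a retained output queue for each running process: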

const RETAINED_OUTPUT_BYTES_PER_PROCESS: usize = 1024 * 1024;

struct RetainedOutputChunk {
    seq: u64,
    stream: ExecOutputStream,
    chunk: Vec<u8>,
}

struct RunningProcess {
    output: VecDeque<RetainedOutputChunk>,
    retained_bytes: usize,
    next_seq: u64,
}

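Meaning:

seq is a monotonically increasing sequence number.
stream distinguishes stdout, stderr, or pty.
chunk holds the raw bytes.
retained_bytes tracks the current size of the retained window.
VecDeque supports appending new chunks at the tail and evicting old chunks from the head.

Eviction by retained cap on write

When a stdout/stderr chunk arrives, Codex assigns a seq, appends the chunk to output, and adds its length to retained_bytes. If the total exceeds RETAINED_OUTPUT_BYTES_PER_PROCESS, old chunks are evicted from the head of the queue: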

let seq = process.next_seq;
process.next_seq += 1;
process.retained_bytes += chunk.len();
process.output.push_back(RetainedOutputChunk {
    seq,
    stream,
    chunk: chunk.clone(),
});

while process.retained_bytes > RETAINED_OUTPUT_BYTES_PER_PROCESS {
    let Some(evicted) = process.output.pop_front() else {
        break;
    };
    process.retained_bytes = process.retained_bytes.saturating_sub(evicted.chunk.len());
}

This policy bounds the memory cost to:

O(active_processes * retained_cap)

It does not grow linearly with the full output history.
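For a Node-based lh, the same eviction policy could be ported roughly as below. This is a sketch under assumptions: the class name `RetainedOutput` is hypothetical, seq is assumed to start at 1 (the quoted Rust does not show the initial value), and a plain array stands in for `VecDeque`.

```typescript
type OutputStream = "stdout" | "stderr";

interface RetainedChunk {
  seq: number;
  stream: OutputStream;
  chunk: Uint8Array;
}

// Hypothetical TypeScript port of the retained-output queue.
export class RetainedOutput {
  private readonly chunks: RetainedChunk[] = [];
  private readonly capBytes: number;
  private retainedBytes = 0;
  private nextSeq = 1; // assumption: sequence numbers start at 1

  constructor(capBytes: number) {
    this.capBytes = capBytes;
  }

  // Append a chunk, then evict from the head until the window fits the cap.
  push(stream: OutputStream, chunk: Uint8Array): number {
    const seq = this.nextSeq++;
    this.chunks.push({ seq, stream, chunk });
    this.retainedBytes += chunk.byteLength;
    while (this.retainedBytes > this.capBytes) {
      const evicted = this.chunks.shift();
      if (!evicted) break;
      this.retainedBytes -= evicted.chunk.byteLength;
    }
    return seq;
  }

  get bytes(): number {
    return this.retainedBytes;
  }

  get oldestSeq(): number | undefined {
    return this.chunks[0]?.seq;
  }
}

// Usage: with a 10-byte cap, four 4-byte chunks leave only the last two retained.
const retained = new RetainedOutput(10);
for (let i = 0; i < 4; i++) retained.push("stdout", new Uint8Array(4));
console.log(retained.bytes, retained.oldestSeq);
```

As in the quoted Rust loop, a single chunk larger than the cap is evicted immediately, so the window can be empty right after a very large write. `Array.prototype.shift` is linear in the queue length; a ring buffer would be the better fit for many small chunks.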

Read protocol

The Codex read parameters are:

pub struct ReadParams {
    pub process_id: ProcessId,
    pub after_seq: Option<u64>,
    pub max_bytes: Option<usize>,
    pub wait_ms: Option<u64>,
}

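Read semantics:

Return the retained chunks of the process identified by process_id whose seq > after_seq, up to max_bytes. As the loop below shows, max_bytes is a soft limit: the first matching chunk is always returned, even if it alone exceeds max_bytes.

In the implementation, the server iterates over the retained buffer: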

for retained in process.output.iter().filter(|chunk| chunk.seq > after_seq) {
    let chunk_len = retained.chunk.len();
    if !chunks.is_empty() && total_bytes + chunk_len > max_bytes {
        break;
    }
    total_bytes += chunk_len;
    chunks.push(ProcessOutputChunk {
        seq: retained.seq,
        stream: retained.stream,
        chunk: retained.chunk.clone().into(),
    });
    next_seq = retained.seq + 1;
    if total_bytes >= max_bytes {
        break;
    }
}

The response carries next_seq. On its next read the caller can continue from the new cursor.
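The read loop translates to TypeScript almost line by line. In this sketch `readRetained` is a hypothetical name and the initial value of `nextSeq` (used when nothing matches) is an assumption, since the quoted Rust does not show it. One detail worth noting: because the filter is `seq > after_seq` while `next_seq` is the last returned seq plus one, the follow-up call here passes `nextSeq - 1` as the cursor; how the real Codex client derives its cursor is not shown in this note.

```typescript
interface RetainedChunk {
  seq: number;
  stream: "stdout" | "stderr";
  chunk: Uint8Array;
}

interface ReadResult {
  chunks: RetainedChunk[];
  nextSeq: number;
}

// Hypothetical port of the read loop: chunks with seq > afterSeq, soft-capped by maxBytes.
export function readRetained(
  retained: readonly RetainedChunk[],
  afterSeq: number,
  maxBytes: number,
): ReadResult {
  const chunks: RetainedChunk[] = [];
  let totalBytes = 0;
  let nextSeq = afterSeq + 1; // assumption: unchanged cursor when nothing matches
  for (const item of retained) {
    if (item.seq <= afterSeq) continue;
    const len = item.chunk.byteLength;
    // The first chunk is always returned, even if it alone exceeds maxBytes.
    if (chunks.length > 0 && totalBytes + len > maxBytes) break;
    totalBytes += len;
    chunks.push(item);
    nextSeq = item.seq + 1;
    if (totalBytes >= maxBytes) break;
  }
  return { chunks, nextSeq };
}

// Usage: four 4-byte chunks read in two pages of at most 8 bytes.
const buffer: RetainedChunk[] = [1, 2, 3, 4].map((seq) => ({
  seq,
  stream: "stdout" as const,
  chunk: new Uint8Array(4),
}));
const page1 = readRetained(buffer, 0, 8);
const page2 = readRetained(buffer, page1.nextSeq - 1, 8);
console.log(page1.chunks.map((c) => c.seq), page2.chunks.map((c) => c.seq));
```

Applying `maxBytes` while iterating means the response is bounded before any concatenation happens, which is the property the current join-then-truncate path in lh lacks.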

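Semantics of the approach

The output semantics of the Codex approach are those of a retained window:

A process can keep producing output of any size.
The server keeps only the most recent N bytes.
A read returns only the content in the retained window that satisfies after_seq.
A single read can be limited by max_bytes.
If the caller does not read for a long time, early output may already have been evicted.

Output completeness therefore has to be expressed as explicit metadata. When adapting this to lh, the response could include:

total_stdout_bytes.
total_stderr_bytes.
retained_stdout_bytes.
retained_stderr_bytes.
dropped_stdout_bytes.
dropped_stderr_bytes.
oldest_retained_seq.
next_seq.
output_truncated or output_incomplete.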
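How such metadata could be derived is sketched below for a single interleaved queue. All names are hypothetical, and the sketch assumes sequence numbers are contiguous so that a gap between the caller's cursor and the oldest retained seq equals the number of missed chunks.

```typescript
export interface OutputWindow {
  totalBytes: number; // bytes ever produced by the process
  retainedBytes: number; // bytes currently held in the retained window
  oldestRetainedSeq: number | null; // null when the window is empty
  nextSeq: number; // seq that the next chunk will receive
}

export interface Completeness {
  droppedBytes: number;
  missedChunks: number;
  outputIncomplete: boolean;
}

// Hypothetical helper: compare the caller's cursor with the retained window.
export function describeCompleteness(window: OutputWindow, afterSeq: number): Completeness {
  const droppedBytes = window.totalBytes - window.retainedBytes;
  const firstWanted = afterSeq + 1;
  const firstAvailable = window.oldestRetainedSeq ?? window.nextSeq;
  // Chunks the caller never saw and can no longer read.
  const missedChunks = Math.max(0, firstAvailable - firstWanted);
  return { droppedBytes, missedChunks, outputIncomplete: missedChunks > 0 };
}

// Usage: the caller last saw seq 3, but the window now starts at seq 7.
const lagging = describeCompleteness(
  { totalBytes: 100, retainedBytes: 40, oldestRetainedSeq: 7, nextSeq: 11 },
  3,
);
console.log(lagging);
```

Note that `droppedBytes` describes the process as a whole, while `outputIncomplete` is relative to one caller's cursor: a caller that kept up sees a complete stream even though early bytes have been dropped.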

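Takeaways

Manage retained output with a byte cap instead of keeping the full history in a string array.
Replace the lastReadStdout / lastReadStderr array indexes with a seq cursor.
Apply max_bytes before assembling the response, instead of joining the full history first and truncating afterwards.
stdout/stderr can share one chunk queue, distinguished by the stream field.
read can support wait_ms to give polling long-poll semantics.
Even if the agent never reads the output, connect memory stays bounded by the retained cap.

Design questions when mapping to lh

If lh adopts a Codex style retained buffer, the following semantics need to be settled:

Whether the retained cap is per stream, per session, or a global budget.
Whether the cap is measured in UTF-8 bytes, Buffer bytes, or JS string length.
Whether getCommandOutput keeps the current cursor-less API or exposes after_seq.
Whether the current lastReadStdout / lastReadStderr become lastReadSeq.
Whether a caller cursor that lags behind oldest_retained_seq yields a warning or an error.
Whether stdout/stderr use separate queues or one interleaved queue.
Whether filter applies to the retained preview or to the full output.

Capabilities that can be adopted directly:

Each shell session maintains bounded retained chunks.
Chunks keep the raw bytes, or at least record an accurate byte length.
getCommandOutput returns a bounded delta and never concatenates the full history.
Return dropped/truncated/total bytes metadata.
Background session output keeps flowing into the retained buffer, but memory does not grow with historical output.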

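opencode approach notes

Overall assessment

The opencode shell tool combines a pipe stream with a truncation file: stdout/stderr enter a JS stream and are read as one merged stream; once the output exceeds the configured limit, the full output is written to a truncation file and the tool result returns a tail preview plus the file path.

This is neither Claude Code's direct fd write nor Codex's retained window protocol. It is closer to:

pipe stdout/stderr -> merged read through a JS stream -> write truncation file once over the limit -> result returns tail + file path

How commands are executed

The opencode shell tool creates the command in cmd():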

return ChildProcess.make(command, [], {
  shell,
  cwd,
  env,
  stdin: "ignore",
  detached: process.platform !== "win32",
})

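stdout/stderr are not bound to a file fd here. The Effect spawner defaults stdout/stderr to pipe. The shell tool later reads handle.all, which is the merged stdout/stderr stream.

Output collection structure

The shell run maintains several pieces of state: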

const keep = limits.maxBytes * 2
let full = ""
let last = ""
const list: Chunk[] = []
let used = 0
let file = ""
let sink: ReturnType<typeof createWriteStream> | undefined
let cut = false

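Meaning:

full: the complete output accumulated before the limit is exceeded.
last: the metadata/progress preview, limited to MAX_METADATA_LENGTH = 30000.
list: the list of recent output chunks, used for the final tail preview.
used: the bytes currently retained in list.
keep: the retention limit for list, equal to limits.maxBytes * 2.
file: the truncation file path.
sink: the append stream used after the limit is exceeded.
cut: whether the output has been truncated.

Streaming read and tail retention

opencode reads the merged output: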

Stream.runForEach(Stream.decodeText(handle.all), (chunk) => {
  const size = Buffer.byteLength(chunk, "utf-8")
  list.push({ text: chunk, size })
  used += size
  while (used > keep && list.length > 1) {
    const item = list.shift()
    if (!item) break
    used -= item.size
    cut = true
  }

  last = preview(last + chunk)
  ...
})

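This part guarantees that the final result always has a bounded tail buffer. list does not keep the full history; old chunks are evicted according to keep. Because the loop stops at list.length > 1, the most recent chunk is always kept, even when it alone is larger than keep.

Writing the truncation file once over the limit

When full exceeds limits.maxBytes, opencode writes the complete output so far to a truncation file and appends subsequent chunks to the same file: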

full += chunk
if (Buffer.byteLength(full, "utf-8") > limits.maxBytes) {
  return trunc.write(full).pipe(
    Effect.andThen((next) =>
      Effect.sync(() => {
        file = next
        cut = true
        sink = createWriteStream(next, { flags: "a" })
        full = ""
      }),
    ),
  )
}

Subsequent chunks:

if (file) {
  sink?.write(chunk)
}

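So once over the limit, opencode keeps the complete output in a file, while JS memory mainly holds the most recent tail and the metadata preview.

Truncation configuration and cleanup

The default limits live in truncate.ts: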

export const MAX_LINES = 2000
export const MAX_BYTES = 50 * 1024
const RETENTION = Duration.days(7)

The configuration can be overridden through tool_output:

{
  "tool_output": { "max_lines": 200, "max_bytes": 8192 }
}

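Truncation files are written to TRUNCATION_DIR, and old files past the retention period are cleaned up periodically.

How the result is returned

After the command finishes, opencode builds the tail preview from list: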

const raw = list.map((item) => item.text).join("")
const end = tail(raw, limits.maxLines, limits.maxBytes)
if (end.cut) cut = true
if (!file && end.cut) {
  file = yield* trunc.write(raw)
}

let output = end.text
if (cut && file) {
  output = `...output truncated...\n\nFull output saved to: ${file}\n\n` + output
}

The metadata is flagged as well:

metadata: {
  output: last || preview(output),
  exit: code,
  description: input.description,
  truncated: cut,
  ...(cut && file ? { outputPath: file } : {}),
}

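The tests cover:

Output over the line limit is truncated.
Output over the byte limit is truncated.
Small output is not truncated.
On truncation, the full output is saved to a file.

Takeaways

The shell prompt tells the model explicitly that over-limit output is saved to a file and that it should not truncate output itself with head / tail / a pager.
The full output file is kept after the limit is exceeded, and the model can search it with Read offset/limit or Grep.
The result returns both the tail preview and the full output path.
metadata keeps a separate bounded progress preview so that UI/progress never consumes the full output.
The tail computation considers both max lines and max bytes.
Truncation files have a unified directory and retention cleanup.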
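A tail computation that honors both limits can be sketched as below. This is an illustration of the idea, not opencode's `tail` implementation: `tailByLinesAndBytes` is a hypothetical function that walks lines from the end and stops at whichever limit is hit first, counting UTF-8 bytes including the joining newlines.

```typescript
export interface TailResult {
  text: string;
  cut: boolean; // true when earlier content was left out
}

// Hypothetical helper: keep whole lines from the end, bounded by line count and UTF-8 bytes.
export function tailByLinesAndBytes(raw: string, maxLines: number, maxBytes: number): TailResult {
  const encoder = new TextEncoder();
  const lines = raw.split("\n");
  const kept: string[] = [];
  let bytes = 0;
  for (let i = lines.length - 1; i >= 0; i--) {
    const line = lines[i] ?? "";
    // Each line after the first kept one costs one extra byte for the newline.
    const size = encoder.encode(line).length + (kept.length > 0 ? 1 : 0);
    if (kept.length >= maxLines || bytes + size > maxBytes) {
      return { text: kept.join("\n"), cut: true };
    }
    kept.unshift(line);
    bytes += size;
  }
  return { text: raw, cut: false };
}

// Usage: the line limit and the byte limit each truncate independently.
const byLines = tailByLinesAndBytes("l1\nl2\nl3\nl4", 2, 100);
const byBytes = tailByLinesAndBytes("aaaa\nbbbb\ncccc", 10, 9);
console.log(byLines, byBytes);
```

A line-based tail returns an empty preview when the last line alone exceeds `maxBytes`, so a very long single line needs separate handling such as a byte-level cut.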

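Design questions when mapping to lh

If lh adopts an opencode style pipe plus truncation file, the following semantics need to be settled:

Whether stdout/stderr are stored merged or separately.
Whether the limit is defined by max lines, max bytes, or both.
Whether full is allowed to grow in JS memory up to the threshold before the limit is hit.
How errors from appending to the file after the limit are reported to getCommandOutput.
Whether the output returned to the agent is a head preview, a tail preview, or a delta preview.
How the full output path is accessed in the local system/gateway/remote device scenarios.

Capabilities that can be adopted directly:

Write a truncation file once the output exceeds the limit.
Return Full output saved to: <file> in the tool result.
Return truncated and outputPath in metadata.
Tell the model explicitly in the prompt that Read/Grep can be used to inspect the full output.
Constrain the tail preview by both max lines and max bytes.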

Local CLI build prerequisites

How lh currently resolves on this machine:

which lh
# /home/cy948/.bun/bin/lh

type -a lh
# lh is /home/cy948/.bun/bin/lh
# lh is /home/cy948/.bun/bin/lh
# lh is /home/cy948/.local/share/pnpm/lh

The experiments therefore use the CLI that bun link points at the local workspace by default. After changing packages/local-file-shell, apps/cli must be rebuilt and re-linked:

cd apps/cli
bun run build
bun run cli:link
which lh
lh --version

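Verification must record:

The current git branch and commit.
The variant name.
The output of which lh.
The output of lh --version.
The mtime or hash of apps/cli/dist/index.js.
The Node old space limit flag. The current experiments fix NODE_OPTIONS=--max-old-space-size=512 to simulate the V8 heap redline on a 1G system / container.
The gateway/server ports and the graph agent env switches.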

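Preparing the comparative experiment implementations

Experiment code is managed entirely through git branches. expr/local-system-io-buffer/** is a branch namespace, not a directory inside the repo.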

canary
expr/local-system-io-buffer/fd-direct
expr/local-system-io-buffer/tail-spool

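Meaning of the three variants:

variant	Management model	Goal
`baseline-current`	Current `stdout: string[]` and `stderr: string[]` on `canary`	Problem reproduction and control baseline; no code changes needed
`fd-direct`	stdout/stderr written directly to the output fd; the JS side only reads tail/range	Verify whether the Claude Code style decouples heap from total output volume
`tail-spool`	stdout/stderr still enter the JS stream; memory keeps only a bounded tail; overflow is written to a spool file	Verify the gain of the opencode style while keeping the current protocol shape

How the branches are used:

git switch canary

git switch -c expr/local-system-io-buffer/fd-direct
# modify packages/local-file-shell/src/shell/* directly

git switch canary
git switch -c expr/local-system-io-buffer/tail-spool
# modify packages/local-file-shell/src/shell/* directly

Each experiment branch changes the original code path directly, which avoids maintaining a copy detached from the production path. The comparison uses the same smoke commands and the same sampling script for every variant, and results are archived by branch name.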

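Core metrics:

output_bytes_total: total bytes actually emitted by the shell.
rss_mb, heap_used_mb, external_mb, array_buffers_mb.
delta_heap_per_output_mb: heap increase per 1 MB of output.
delta_rss_per_output_mb: RSS increase per 1 MB of output.
retained_bytes_in_memory: output bytes that the Local System process is allowed to retain by design.
spooled_file_bytes: output bytes written to disk.
dropped_or_truncated_bytes: historical bytes that cannot be returned directly to the agent.
event_loop_delay_p95: main loop delay under output pressure.
tool_call_latency_ms: response latency of getCommandOutput or a tool call.
oom: whether a process OOM was triggered.
gateway_ws_1006: whether the gateway saw an abnormal close.

Key criteria:

Ideal: delta_heap_per_output_mb is close to 0, and heap grows mainly with the retained cap and the number of active sessions.
Acceptable: RSS shows management overhead and file IO buffer fluctuation but does not grow linearly with total_output_bytes.
Fail: heap_used_mb or RSS grows roughly linearly with total_output_bytes and hits OOM under the old space limit.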
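The slope metric can be computed from three numbers per run. The sketch below assumes delta_heap_per_output_mb is defined as (peak heap minus idle heap) divided by total output; the thresholds passed to `classifyHeapSlope` are hypothetical cut-offs chosen for illustration, since the note only gives qualitative criteria.

```typescript
export interface RunSummary {
  idleHeapMb: number; // heapUsed before any output
  peakHeapMb: number; // highest sampled heapUsed
  totalOutputMb: number; // total shell output for the run
}

export type Verdict = "ideal" | "needs-review" | "fail";

// Assumed definition: heap growth per MB of shell output.
export function deltaHeapPerOutputMb(run: RunSummary): number {
  return (run.peakHeapMb - run.idleHeapMb) / run.totalOutputMb;
}

// idealBelow and failAbove are hypothetical thresholds, not values from the experiment.
export function classifyHeapSlope(run: RunSummary, idealBelow = 0.05, failAbove = 0.5): Verdict {
  const slope = deltaHeapPerOutputMb(run);
  if (slope < idealBelow) return "ideal";
  if (slope > failAbove) return "fail";
  return "needs-review";
}

// Usage with illustrative numbers: 80 MB of output on top of a 17 MiB idle heap.
const linearRun: RunSummary = { idleHeapMb: 17, peakHeapMb: 94.987, totalOutputMb: 80 };
const flatRun: RunSummary = { idleHeapMb: 17, peakHeapMb: 17.267, totalOutputMb: 80 };
console.log(deltaHeapPerOutputMb(linearRun), classifyHeapSlope(linearRun));
console.log(deltaHeapPerOutputMb(flatRun), classifyHeapSlope(flatRun));
```

A single run only gives a point estimate; comparing the slope across several output sizes is what distinguishes linear growth from a fixed overhead.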

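hard gate:

Experiment environment: no Docker; lh connect is started directly with NODE_OPTIONS=--max-old-space-size=512.

Must pass:
- After the hard cases finish, lh connect has not hit a V8 fatal OOM.
- lh connect does not exit and does not cause an abnormal gateway ws close 1006.
- Peak heap_used_mb stays below the 512 MiB redline.

Real observations:
- Curves are plotted from real sampled values, with no preset target curve.
- The `heap_used_mb` chart marks only the 512 MiB redline.
- `rss_mb`, `external_mb`, `array_buffers_mb`, `output_bytes_total`, `delta_heap_per_output_mb`, and `delta_rss_per_output_mb` are all shown as raw data.
- The curve shape, peak, slope, and hard edge case results of baseline-current, fd-direct, and tail-spool are compared only at proto evaluation time.

hard edge cases:

B4_stdout_4x20MB: concurrent background commands with large stdout.
C2_stderr_4x20MB: concurrent background commands with large stderr.
R2_log_stream_4x20MB: realistic log stream shape.
B6_small_chunks_30MB: many small chunks, amplifying JS string and array management cost.
huge_single_line_30MB: one extremely long line, testing the boundary of line-based tail.
slow_stream_120s: slow long-running stream, testing the observation window and polling.
no_poll_background: getCommandOutput is never called during background execution, testing whether memory stays bounded with no consumer.
poll_after_spool: poll after the output has spilled to disk, testing return semantics and the file reference.
kill_while_outputting: kill in the middle of output, testing fd/spool cleanup and exit status.
disk_cap_edge: output file close to the cap, testing disk protection and the error message.

Result presentation:

Curves of rss_mb / heap_used_mb / output_bytes_total over time.
Curve of delta_heap_mb against output_mb.
Curve of peak_rss_mb against concurrency.
A 0/1 hard edge case table for each variant.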

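TODO

canary baseline: run the baseline experiment under NODE_OPTIONS=--max-old-space-size=512 and record curves, hard gate, and gateway ws 1006.
expr/local-system-io-buffer/tail-spool: create the branch from canary and modify packages/local-file-shell/src/shell/* directly to implement bounded tail plus spool file.
tail-spool experiment: rebuild and link apps/cli, run the same set of cases, and record curves, hard gate, and gateway ws 1006.
expr/local-system-io-buffer/fd-direct: create the branch from canary and modify packages/local-file-shell/src/shell/* directly to write stdout/stderr straight to the output fd.
fd-direct experiment: rebuild and link apps/cli, run the same set of cases, and record curves, hard gate, and gateway ws 1006.
Comparative summary: compare the curve shape, peak, slope, and hard edge case results of baseline-current, tail-spool, and fd-direct.
When done, call notify in the ../lobe-search-eval scripts to notify the user.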

Experiment log

baseline-current / canary / R2 / 512 MiB

Environment:

branch: canary
commit: 65ba08668
lh: /home/cy948/.bun/bin/lh
lh version: 0.0.24
server: http://localhost:3210
device gateway: http://localhost:8787
agent gateway: http://localhost:8788
NODE_OPTIONS: --max-old-space-size=512 --inspect=127.0.0.1:9320
case: R2-real-log-flood, 4 background commands, each emits 20 MiB stdout then sleeps

Results:

tool calls: 4/4 runCommand success
shell ids: sh-1, sh-2, sh-3, sh-4
V8 fatal OOM: no
lh connect exited: no
gateway ws 1006: not observed in connect pane
hard gate: pass for this R2 size under 512 MiB

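Sampling notes:

The long-lived inspector sampler was disrupted when the Node inspector session ended and did not fully cover the output ramp-up phase.
Switching to a short-connection poll sampler produced stable post-output samples.

Post-output sampling summary: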

sample file: /tmp/lh-shell-io-buffer/baseline-current-r2-512-poll-post.jsonl
rows: 12
duration: 5.576s
heapUsed: 90.81 MiB to 90.83 MiB
heapTotal: 103.51 MiB
rss: 209.54 MiB
external: 15.83 MiB
arrayBuffers: 0.12 MiB

A second set of post-output samples:

sample file: /tmp/lh-shell-io-buffer/baseline-current-r2-512-post.jsonl
rows: 10
duration: 4.536s
heapUsed: 94.31 MiB to 94.33 MiB
heapTotal: 211.01 MiB
rss: 220.45 MiB to 220.58 MiB
external: 36.45 MiB
arrayBuffers: 20.74 MiB

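Interpretation:

R2 with four 20 MiB streams did not trigger OOM under a 512 MiB old space.
However, the memory held for output by the baseline is still of the same order as total_output_bytes: post-output heapUsed is about 91 to 94 MiB, against an idle heapUsed of about 17 MiB at startup.
This case is a pass for the baseline, but it does not prove the baseline design is safe; it only shows that 80 MiB of total output does not reach the 512 MiB redline.
Follow-up protos need to compare post-output retained memory on the same case, and increase the per-task size or the concurrency to observe the slope.

Experiment side track:

When cleaning up the R2 background commands, ordinary parent-process cleanup could not reach the timeout/sh grandchild processes that had detached from the parent.
They ultimately had to be cleaned up by their respective process groups.
This again confirms that the process lifecycle problem should be handled separately from the IO buffer design.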

tail-spool / R2 / 512 MiB

Environment:

branch: expr/local-system-io-buffer/tail-spool
base commit: 65ba08668
lh: /home/cy948/.bun/bin/lh
lh version: 0.0.24
apps/cli/dist/index.js mtime: 2026-06-10 16:44:36
server: http://localhost:3210
device gateway: http://localhost:8787
agent gateway: http://localhost:8788
NODE_OPTIONS: --max-old-space-size=512 --inspect=127.0.0.1:9320
case: R2-real-log-flood, 4 background commands, each emits 20 MiB stdout then sleeps

Implementation summary:

stdout/stderr still enter a Node stream.
Each stream uses bounded tail chunks, with a default tailBytes of 256 KiB.
Once spoolThresholdBytes (512 KiB) is exceeded, output is written under /tmp/lobe-local-file-shell-output.
getCommandOutput reads the retained tail and reports the spool path when the output is missing history or has been spooled to disk.

Verification:

bun run type-check: pass
packages/local-file-shell shell tests: 37 passed
apps/cli build: pass
apps/cli cli:link: pass

Results:

tool calls: 4/4 runCommand success
shell ids: sh-1, sh-2, sh-3, sh-4
V8 fatal OOM: no
lh connect exited: no
gateway ws 1006: not observed in connect pane
hard gate: pass for this R2 size under 512 MiB

Sampling summary:

sample file: /tmp/lh-shell-io-buffer/debug-sample.jsonl
rows: 259
duration: 130.455s
heapUsed: 11.98 MiB to 12.59 MiB
heapTotal: 14.02 MiB
rss: 129.14 MiB to 129.89 MiB
external: 15.93 MiB
arrayBuffers: 0.12 MiB

Spool files:

/tmp/lobe-local-file-shell-output/sh-1-stdout-1781081147854.log: 20.73 MiB
/tmp/lobe-local-file-shell-output/sh-2-stdout-1781081147867.log: 20.68 MiB
/tmp/lobe-local-file-shell-output/sh-3-stdout-1781081147873.log: 20.68 MiB
/tmp/lobe-local-file-shell-output/sh-4-stdout-1781081147882.log: 20.68 MiB

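Interpretation:

Under the same R2 load of four 20 MiB streams, tail-spool keeps post-output heapUsed at about 12 MiB.
The bulk of the output goes into spool files, and the connect process keeps only a bounded tail.
Compared with the baseline's post-output heapUsed of about 91 to 94 MiB, tail-spool clearly lowers the retained heap.
This proto shows that even when stdout/stderr still pass through a JS stream, retained memory drops significantly as long as the full history is not kept in the JS heap.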

fd-direct / R2 / 512 MiB

Environment:

branch: expr/local-system-io-buffer/fd-direct
base commit: 65ba08668
lh: /home/cy948/.bun/bin/lh
lh version: 0.0.24
apps/cli/dist/index.js mtime: 2026-06-10 16:53:22
server: http://localhost:3210
device gateway: http://localhost:8787
agent gateway: http://localhost:8788
NODE_OPTIONS: --max-old-space-size=512 --inspect=127.0.0.1:9320
case: R2-real-log-flood, 4 background commands, each emits 20 MiB stdout then sleeps

Implementation summary:

spawn stdio is set to a stdout file fd and a stderr file fd.
The JS side does not listen for childProcess.stdout/stderr data.
getCommandOutput reads new content from the stdout/stderr output files by offset.
When a single read exceeds 256 KiB, it returns the tail and reports the path where the full output is saved.

Verification:

bun run type-check: pass
packages/local-file-shell shell tests: 36 passed
apps/cli build: pass
apps/cli cli:link: pass

Results:

tool calls: 4/4 runCommand success
shell ids: sh-1, sh-2, sh-3, sh-4
V8 fatal OOM: no
lh connect exited: no
gateway ws 1006: not observed in connect pane
hard gate: pass for this R2 size under 512 MiB

Sampling summary:

sample file: /tmp/lh-shell-io-buffer/fd-direct-r2-512-post.jsonl
rows: 104
duration: 52.089s
heapUsed: 16.13 MiB to 16.37 MiB
heapTotal: 17.52 MiB
rss: 121.43 MiB to 121.56 MiB
external: 15.93 MiB
arrayBuffers: 0.12 MiB

Output files:

/tmp/lobe-local-file-shell-fd-output/sh-1-stdout-1781081658681.log: 20.00 MiB
/tmp/lobe-local-file-shell-fd-output/sh-2-stdout-1781081658687.log: 20.00 MiB
/tmp/lobe-local-file-shell-fd-output/sh-3-stdout-1781081658693.log: 20.00 MiB
/tmp/lobe-local-file-shell-fd-output/sh-4-stdout-1781081658702.log: 20.00 MiB

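Interpretation:

fd-direct takes the stdout/stderr hot path out of the JS stream, and the bulk of the R2 output goes straight into files.
Post-output heapUsed is about 16 MiB and RSS is about 121 MiB.
Compared with tail-spool, fd-direct's heapUsed is slightly higher than this run's tail-spool post-output sample, but its RSS is slightly lower.
This proto shows that bypassing the JS stream entirely passes the 512 MiB hard gate reliably and moves the output history from the V8 heap to the file system.

R2 comparative summary

Common case: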

R2-real-log-flood
4 background runCommand
each command emits 20 MiB stdout then sleeps
total stdout bytes about 80 MiB
NODE_OPTIONS=--max-old-space-size=512

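Results table:

variant	branch	commit	hard gate	post-output heapUsed	post-output RSS	Where the output history goes
baseline-current	`canary`	`65ba08668`	pass	about 91 to 94 MiB	about 210 to 221 MiB	stdout/stderr history retained in the JS heap
tail-spool	`expr/local-system-io-buffer/tail-spool`	`1424e9cda`	pass	about 12 MiB	about 129 MiB	JS stream feeds a bounded tail; full output written to a spool file
fd-direct	`expr/local-system-io-buffer/fd-direct`	`c8f5f6add`	pass	about 16 MiB	about 121 MiB	stdout/stderr written directly to the output file fd

Observations:

All three variants pass the 512 MiB hard gate at R2's 80 MiB total output.
baseline-current does not OOM, but its post-output heapUsed clearly grows with the output history.
tail-spool and fd-direct both bring the retained heap down to the low tens of MiB.
tail-spool still goes through the JS stream, so it keeps the current pipe model and its changes stay close to the existing protocol.
fd-direct bypasses the JS stream hot path and has the lowest RSS in this sample, but its output semantics depend more heavily on file management.

Assessment:

tail-spool suits validating the smallest protocol change: it keeps the current stdout/stderr read model and mainly fixes buffer retention.
fd-direct is closer to Claude Code: it avoids the data event to Buffer to String to heap retention chain at the root.
If the final goal is a hard guarantee that Memory(lh connect) does not grow linearly with total_output_bytes, both protos satisfy the first round of evidence from R2.
If the priority is a lower-risk change, tail-spool can be productized first.
If the priority is the most robust hot path and the limits of large output, fd-direct is more worth further work on disk cap, cleanup, permissions, and path semantics.

Sampling limitations:

The long-connection Node inspector sampler was unstable in the local session, so the short-connection poll sampler was used in the end.
Some full-curve samples do not cover the whole output ramp-up phase, so this round mainly compares post-output retained memory and the hard gate.
To plot full growth curves in the next round, connect could temporarily expose a memory sample log internally, or external RSS sampling could be combined with short-connection inspector sampling.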

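Next round: growth curve experiment

Current status:

Full growth curves at concurrency 2, 4, 8, 16, and 32 have not been obtained yet.
So far there are only the post-output retained memory, hard gate, and gateway ws 1006 observations for R2 4x20MiB.
The existing results therefore show the retained-memory difference at R2's 80 MiB total output, but they cannot fully describe the slope across concurrency levels and total output volumes.

Fixed case: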

case: background_stdout_growth_20MiB
concurrency: 2, 4, 8, 16, 32
per command stdout: 20 MiB
total stdout: 40 MiB, 80 MiB, 160 MiB, 320 MiB, 640 MiB
NODE_OPTIONS: --max-old-space-size=512
heap redline: 512 MiB
variants: baseline-current, tail-spool, fd-direct

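Agent constraints:

All shell commands must be started through runCommand.
Every runCommand must set run_in_background=true.
The agent is only responsible for starting N background commands concurrently and returning shell_id and success.
The agent does not call getCommandOutput.
The agent does not call killCommand.
The agent does not read output, summarize output, or sample logs.
Cleanup and sampling are done by an external experiment script or by tmux control.

Second-round experiment results

Environment: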

case: background_stdout_growth_20MiB
per command stdout: 20 MiB
concurrency: 2, 4, 8, 16, 32
agent: background runCommand only
getCommandOutput: not called
killCommand: not called
NODE_OPTIONS: --max-old-space-size=512 --inspect=127.0.0.1:9320
server: http://localhost:3210
device gateway: http://localhost:8787
agent gateway: http://localhost:8788
summary file: /tmp/lh-shell-io-buffer/growth-curve-report/summary.clean.jsonl

SVG artifacts:

/tmp/lh-shell-io-buffer/growth-curve-report/final/concurrency-peak-heap.svg
/tmp/lh-shell-io-buffer/growth-curve-report/final/concurrency-peak-rss.svg
/tmp/lh-shell-io-buffer/growth-curve-report/final/output-delta-heap.svg
/tmp/lh-shell-io-buffer/growth-curve-report/final/output-delta-rss.svg
/tmp/lh-shell-io-buffer/growth-curve-report/final/time-heap-used.svg
/tmp/lh-shell-io-buffer/growth-curve-report/final/time-rss.svg

raw data summary:

variant	N	total MB	peak heap MB	final heap MB	peak RSS MB	delta heap/output	OOM	ws1006	raw
baseline-current	2	40	56.495	51.563	159.84	0.984	false	false	`/tmp/lh-shell-io-buffer/growth-curve-raw/baseline-current/baseline-current_n2_20260610T173900.jsonl`
baseline-current	4	80	94.987	91.035	214.172	0.973	false	false	`/tmp/lh-shell-io-buffer/growth-curve-raw/baseline-current/baseline-current_n4_20260610T174038.jsonl`
baseline-current	8	160	171.298	171.276	424.41	0.963	false	false	`/tmp/lh-shell-io-buffer/growth-curve-raw/baseline-current/baseline-current_n8_20260610T174130.jsonl`
baseline-current	16	320	333.947	331.41	547.523	0.99	false	false	`/tmp/lh-shell-io-buffer/growth-curve-raw/baseline-current/baseline-current_n16_20260610T174226.jsonl`
baseline-current	32	640	365.273	365.273	601.578	0.544	true	false	`/tmp/lh-shell-io-buffer/growth-curve-raw/baseline-current/baseline-current_n32_20260610T174758.jsonl`
tail-spool	2	40	24.016	11.501	141.223	0.172	false	false	`/tmp/lh-shell-io-buffer/growth-curve-raw/tail-spool/tail-spool_n2_20260610T175104.jsonl`
tail-spool	4	80	37.179	12.066	176.113	0.25	false	false	`/tmp/lh-shell-io-buffer/growth-curve-raw/tail-spool/tail-spool_n4_20260610T175230.jsonl`
tail-spool	8	160	30.802	13.168	194.449	0.085	false	false	`/tmp/lh-shell-io-buffer/growth-curve-raw/tail-spool/tail-spool_n8_20260610T175340.jsonl`
tail-spool	16	320	43.962	15.125	312.043	0.084	false	false	`/tmp/lh-shell-io-buffer/growth-curve-raw/tail-spool/tail-spool_n16_20260610T175452.jsonl`
tail-spool	32	640	66.205	24.275	352.426	0.077	false	false	`/tmp/lh-shell-io-buffer/growth-curve-raw/tail-spool/tail-spool_n32_20260610T175621.jsonl`
fd-direct	2	40	17.559	15.861	125.984	0.011	false	false	`/tmp/lh-shell-io-buffer/growth-curve-raw/fd-direct/fd-direct_n2_20260610T175743.jsonl`
fd-direct	4	80	17.267	16.382	125.836	0.002	false	false	`/tmp/lh-shell-io-buffer/growth-curve-raw/fd-direct/fd-direct_n4_20260610T175854.jsonl`
fd-direct	8	160	17.267	16.098	125.578	0.001	false	false	`/tmp/lh-shell-io-buffer/growth-curve-raw/fd-direct/fd-direct_n8_20260610T180006.jsonl`
fd-direct	16	320	17.269	16.203	125.883	0	false	false	`/tmp/lh-shell-io-buffer/growth-curve-raw/fd-direct/fd-direct_n16_20260610T180123.jsonl`
fd-direct	32	640	17.277	16.229	125.801	0	false	false	`/tmp/lh-shell-io-buffer/growth-curve-raw/fd-direct/fd-direct_n32_20260610T180247.jsonl`

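Observations:

baseline-current shows near-linear heap growth at the 2, 4, 8, and 16 levels.
baseline-current reproduces a V8 heap OOM at the 32x20MiB level.
The peak heap in the baseline-current N=32 row is the last point the inspector could sample, not the true peak at the moment of OOM.
The connect log shows GC points of about 433 to 447 MiB old space before the OOM, followed by JavaScript heap out of memory.
tail-spool keeps post-output heap at about 11 to 24 MiB, with an N=32 peak heap of about 66 MiB.
fd-direct's heap stays essentially flat, with an N=32 peak heap of about 17.277 MiB.
No gateway ws 1006 was observed in any of the three experiment groups.

Real-world scenario profile

Estimation baseline:

The estimates are updated with the 2, 4, 8, 16, and 32 levels from the second-round growth curves.
baseline-current has an idle heapUsed of about 17 MiB at startup.
baseline-current has a delta_heap_per_output_mb of about 0.963 to 0.99 at the 2, 4, 8, and 16 levels.
baseline-current reproduces a V8 heap OOM at the 32x20MiB level.
The last inspector sample for baseline-current N=32 is heapUsed 365 MiB, but the connect log shows GC had already reached about 433 to 447 MiB old space before the OOM.
tail-spool at N=32 has a peak heapUsed of about 66 MiB and a final heapUsed of about 24 MiB.
fd-direct at N=32 has a peak heapUsed of about 17 MiB and a final heapUsed of about 16 MiB.
The heap cost of the protos is estimated from active shell management cost, short-lived stream buffers, the bounded retained tail, and metadata, not as linear in the full output.

Scenario estimates:

Scenario	Typical commands	Output per task	Concurrency	Rough extra heap, baseline-current	Rough extra heap, proto	Risk
quick probe	`pwd`, `ls`, `git status --short`	under 1 MiB	1 to 4	under 4 MiB	under 4 MiB	low
normal build	`pnpm build`, `bun run build`	5 to 30 MiB	1 to 2	about 5 to 60 MiB	tail-spool low to medium, fd-direct near 0	medium
verbose test	`vitest --reporter verbose`, integration logs	20 to 100 MiB	1 to 4	about 20 to 400 MiB	tail-spool peak depends on stream bursts, fd-direct near 0	medium to high
dependency install	`pnpm install`, package manager debug logs	KB-scale for an ordinary warm install; debug or abnormal retries are estimated separately	1 to 2	negligible in the ordinary case; a debug flood is estimated by its output volume	tail-spool low, fd-direct near 0	low to medium
recursive search flood	`find /`, `grep -R`, `rg` without ignore	`find /` path enumeration is 1 to 4 MiB in the sample image; content matching or content dumps can reach 100 MiB to several GiB	1 to 4	low risk for pure path enumeration; content-reading commands easily approach or exceed 512 MiB	heap should stay bounded; disk and process lifecycle become the main risks	medium to high
background fanout	multiple agents calling runCommand at once	20 MiB	8 to 32	on the order of 160 to 640 MiB; OOM reproduced at 32-way	tail-spool N=32 peak about 66 MiB, fd-direct N=32 peak about 17 MiB	high
long-running log stream	server/dev watcher/test watcher	several MiB to hundreds of MiB per minute	1 to 8	grows continuously with run time	heap should stay bounded; a disk cap is required	high
accidental binary dump	`cat large.bin`, base64 dump, model/log artifact	100 MiB to several GiB	1	OOM is likely	heap should stay bounded; the disk cap must take effect	high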

Capacity estimation formulas:

baseline-current_heap_used_mb ≈ idle_heap_mb + total_output_mb * observed_heap_slope
observed_heap_slope is about 0.963 to 0.99 at concurrency levels 2 to 16

proto_heap_used_mb ≈ idle_heap_mb + active_shell_count * retained_tail_cap_mb + metadata_cost_mb
proto_disk_used_mb ≈ total_output_mb, until the disk cap or cleanup takes effect
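The two formulas can be evaluated directly. In the sketch below the function names are hypothetical; the idle heap (about 17 MiB), the slope (0.99), and the 256 KiB per-stream tail come from the measurements in this note, while the 1 MiB metadata cost and the choice of two streams per shell are assumptions for illustration.

```typescript
// baseline-current: heap grows with total output.
export function estimateBaselineHeapMb(
  idleHeapMb: number,
  totalOutputMb: number,
  observedHeapSlope: number,
): number {
  return idleHeapMb + totalOutputMb * observedHeapSlope;
}

// proto: heap grows with the number of active shells, not with total output.
export function estimateProtoHeapMb(
  idleHeapMb: number,
  activeShellCount: number,
  retainedTailCapMb: number,
  metadataCostMb: number,
): number {
  return idleHeapMb + activeShellCount * retainedTailCapMb + metadataCostMb;
}

const REDLINE_MB = 512;

// 16 x 20 MiB and 32 x 20 MiB on the baseline, using the upper observed slope.
const baseline320 = estimateBaselineHeapMb(17, 320, 0.99);
const baseline640 = estimateBaselineHeapMb(17, 640, 0.99);

// 32 shells, 0.25 MiB tail per stream and two streams per shell (assumed), 1 MiB metadata (assumed).
const proto32 = estimateProtoHeapMb(17, 32, 0.5, 1);

console.log(baseline320, baseline640 > REDLINE_MB, proto32);
```

With these inputs the baseline estimate for 320 MB is about 333.8 MiB, close to the sampled peak of 333.947 MiB, and the 640 MB estimate is above the 512 MiB redline, consistent with the reproduced OOM. The proto estimate leaves out short-lived stream buffers, so a measured peak such as tail-spool's 66 MiB at N=32 can sit above it.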

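Capacity redlines:

Under a 512 MiB old space, the danger zone for baseline-current starts at roughly 400 to 640 MiB of total output.
In this round, 320 MiB of output did not OOM and 640 MiB of output reproduced the OOM.
Given GC, arrayBuffers, external, the agent gateway, the inspector, and business objects, the real OOM point shifts with concurrency, chunk shape, and timing.
For the protos, the heap redline should not be determined by total_output_mb but by the number of active shells, the tail cap, metadata, and short-lived stream buffers.
The main capacity redlines of the protos move to disk space, file count, cleanup, and process lifecycle.

dependency install output calibration

Measured commands: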

cwd: /home/cy948/workspace/github/lobe-chat
command: pnpm i
pnpm: 10.33.0
mode A: script -q -c 'pnpm i' tmp/local-system-output-size/pnpm-i.tty.log
mode B: pnpm i > tmp/local-system-output-size/pnpm-i.pipe.log 2>&1

Results:

TTY log: 16,568 bytes
pipe log: 4,622 bytes
TTY elapsed: 53s
pipe elapsed: 54s
exit: 0

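Interpretation:

The output of an ordinary pnpm i is far below the earlier rough dependency install estimate of 10 to 80 MiB.
In this repo's warm install scenario, pnpm progress, peer warnings, and prepare output together are still KB-scale.
dependency install should not be treated as a default high-output risk scenario.
It only needs to be estimated as a high-output scenario under debug mode, abnormal retries, network errors that repeatedly flood the screen, long postinstall logs, or CI verbose configuration.

recursive search flood output calibration

Measured commands: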

repo: /home/cy948/workspace/github/lobe-search-agent-eval
image: alexgshaw/break-filter-js-from-html:20251031
container limits: --memory=1g --cpus=2 --pids-limit=512 --network=none
command A: timeout 180s find / > /out/find-root.stdout 2> /out/find-root.stderr
command B: timeout 180s find / -xdev > /out/find-root-xdev.stdout 2> /out/find-root-xdev.stderr
artifact dir: tmp/local-system-real-output/recursive-search-flood

Results:

find /:
exit: 0
duration: 1s
stdout: 3,218,099 bytes
stderr: 0 bytes
stdout lines: 57,092

find / -xdev:
exit: 0
duration: 0s
stdout: 1,356,751 bytes
stderr: 0 bytes
stdout lines: 25,086

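Interpretation:

In the break-filter-js-from-html image, a plain find / only prints a list of paths; the result is MB-scale and not a scenario that blows up the buffer by default.
The high risk of a recursive search flood comes from reading or matching content, for example grep -R matching the content of many files, cat/base64 dumps, large amounts of stderr permission/log output, or several background tasks adding up concurrently.
The capacity profile should therefore separate path enumeration from content flood, and find / should not be equated with 100 MiB to several GiB of output.

After the experiments finish, call notify in the ../lobe-search-agent-eval scripts to notify the user.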
