
CUDA OOM log cleaner for coding agents

Paste raw PyTorch training output and get a compact brief that keeps memory evidence while folding progress bars, wandb chatter, NCCL spam, and dependency frames.

Raw CUDA out-of-memory logs are expensive to paste into Codex or Claude Code: the few lines that carry the memory signal are often surrounded by repeated progress output and distributed-training noise, and every extra line costs tokens. This page shows which evidence tokensift keeps and what it folds.

Before and after

Before: raw paste

$ python train.py --model llama-7b --batch-size 64 --precision fp16
[12:04:09] INFO worker=2 prefetch batch=812
[12:04:09] INFO worker=2 prefetch batch=812
Epoch 3: 82%|████████▏| 821/1000 [loss=2.39]
wandb: step=821 loss=2.39 lr=2e-4
wandb: step=821 loss=2.39 lr=2e-4
NCCL INFO Bootstrap : Using eth0
NCCL INFO Bootstrap : Using eth0
Traceback (most recent call last):
  File "/workspace/train.py", line 214, in <module>
    loss = trainer.step(batch)
  File ".../site-packages/torch/nn/modules/module.py", line 1775, in _call_impl
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.25 GiB.
GPU0: 23.65 GiB total; 1.08 GiB free; 21.40 GiB allocated; 22.49 GiB reserved.

After: compact debugging brief

Debug CUDA OOM.

Type:
- CUDA / PyTorch OOM

Keep:
- cmd: python train.py --model llama-7b --batch-size 64 --precision fp16
- frame: /workspace/train.py:214 -> loss = trainer.step(batch)
- error: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.25 GiB.
- detail: model=llama-7b; batch-size=64; precision=fp16
- memory: GPU0 23.65 GiB total; 1.08 GiB free; 21.40 GiB allocated; 22.49 GiB reserved

Folded:
- duplicate worker INFO, progress bar, wandb x2, NCCL x2, torch internals

Ask: root cause, smallest fix, verify command.
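The frame line in the brief points at user code rather than the PyTorch frame that actually raised. A minimal sketch of that selection in Python, assuming that any frame whose path contains site-packages or dist-packages is dependency code. The name top_user_frame is hypothetical and is not tokensift's API:

```python
import re
from typing import Optional

# Matches CPython traceback frame headers such as:
#   File "/workspace/train.py", line 214, in <module>
FRAME_RE = re.compile(r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+), in ')

# Assumption: paths containing these fragments are installed dependencies.
DEPENDENCY_MARKERS = ("site-packages", "dist-packages")


def top_user_frame(log_text: str) -> Optional[str]:
    """Return 'path:line -> source' for the innermost frame outside dependencies."""
    lines = log_text.splitlines()
    best = None
    for i, raw in enumerate(lines):
        header = FRAME_RE.match(raw)
        if header is None or any(mark in header["path"] for mark in DEPENDENCY_MARKERS):
            continue
        location = f'{header["path"]}:{header["line"]}'
        # CPython prints the source line, indented four spaces, right after the header.
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        source = nxt.strip() if nxt.startswith("    ") else ""
        # Frames are listed outermost first, so the last user frame is the innermost one.
        best = f"{location} -> {source}" if source else location
    return best


sample = """Traceback (most recent call last):
  File "/workspace/train.py", line 214, in <module>
    loss = trainer.step(batch)
  File ".../site-packages/torch/nn/modules/module.py", line 1775, in _call_impl
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.25 GiB."""

print(top_user_frame(sample))  # /workspace/train.py:214 -> loss = trainer.step(batch)
```

Keeping the innermost user frame, instead of the innermost frame overall, is what lets the brief say where your code asked for memory rather than where PyTorch noticed it was missing.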

Preserved evidence

  • training command and model arguments
  • batch size and precision
  • top user-code frame
  • PyTorch OOM message
  • GPU total/free/allocated/reserved memory stats
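The memory stats are worth keeping because two useful numbers fall straight out of them: memory PyTorch has reserved but not handed to any tensor (reserved minus allocated), and memory in use outside PyTorch's allocator (total minus free minus reserved). A minimal sketch, assuming the memory line has exactly the shape shown in the sample; real PyTorch messages phrase these numbers differently across versions, and parse_gpu_memory and GpuMemory are hypothetical names:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: the memory line looks like the sample, e.g.
#   GPU0: 23.65 GiB total; 1.08 GiB free; 21.40 GiB allocated; 22.49 GiB reserved.
MEM_RE = re.compile(
    r"GPU(?P<gpu>\d+):?\s+(?P<total>[\d.]+) GiB total;\s*(?P<free>[\d.]+) GiB free;\s*"
    r"(?P<allocated>[\d.]+) GiB allocated;\s*(?P<reserved>[\d.]+) GiB reserved"
)
TRIED_RE = re.compile(r"Tried to allocate (?P<tried>[\d.]+) GiB")


@dataclass
class GpuMemory:
    gpu: int
    total: float
    free: float
    allocated: float
    reserved: float
    tried: Optional[float] = None

    @property
    def reserved_unallocated(self) -> float:
        # Cached by PyTorch's allocator but not backing any live tensor.
        return self.reserved - self.allocated

    @property
    def outside_pytorch(self) -> float:
        # In use on the device but not reserved by PyTorch (CUDA context, other processes).
        return self.total - self.free - self.reserved


def parse_gpu_memory(log_text: str) -> Optional[GpuMemory]:
    mem = MEM_RE.search(log_text)
    if mem is None:
        return None
    tried = TRIED_RE.search(log_text)
    return GpuMemory(
        gpu=int(mem["gpu"]),
        total=float(mem["total"]),
        free=float(mem["free"]),
        allocated=float(mem["allocated"]),
        reserved=float(mem["reserved"]),
        tried=float(tried["tried"]) if tried else None,
    )


log = (
    "torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.25 GiB.\n"
    "GPU0: 23.65 GiB total; 1.08 GiB free; 21.40 GiB allocated; 22.49 GiB reserved.\n"
)
stats = parse_gpu_memory(log)
print(round(stats.reserved_unallocated, 2), round(stats.outside_pytorch, 2))  # 1.09 0.08
```

For the sample, about 1.09 GiB is reserved but unallocated and about 0.08 GiB is used outside PyTorch, so nearly all device memory belongs to this process's allocator.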

Folded noise

  • duplicate worker INFO lines
  • tqdm-style progress updates
  • wandb metric chatter
  • repeated NCCL bootstrap lines
  • PyTorch internal frames that are unlikely to be the root cause
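Folding can be sketched as one pass that drops lines matching a noise rule, collapses exact repeats, and keeps a count per category so the brief can still report what was removed. The regular expressions below are assumptions modeled on the sample log, not tokensift's actual rules, and fold_noise is a hypothetical name:

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical noise rules modeled on the sample log; tokensift's real rules may differ.
NOISE_RULES = [
    ("progress bar", re.compile(r"\d+%\|")),
    ("wandb", re.compile(r"^wandb:")),
    ("NCCL", re.compile(r"^NCCL INFO")),
    ("torch internals", re.compile(r'^\s*File ".*site-packages/torch/')),
]


def fold_noise(lines: Iterable[str]) -> Tuple[List[str], Counter]:
    """Drop noise lines and exact repeats of the last kept line, counting each fold."""
    kept: List[str] = []
    folded: Counter = Counter()
    for line in lines:
        label = next((name for name, rx in NOISE_RULES if rx.search(line)), None)
        if label is not None:
            folded[label] += 1
        elif kept and line == kept[-1]:
            folded["duplicate"] += 1
        else:
            kept.append(line)
    return kept, folded


raw = """[12:04:09] INFO worker=2 prefetch batch=812
[12:04:09] INFO worker=2 prefetch batch=812
Epoch 3: 82%|████████▏| 821/1000 [loss=2.39]
wandb: step=821 loss=2.39 lr=2e-4
wandb: step=821 loss=2.39 lr=2e-4
NCCL INFO Bootstrap : Using eth0
NCCL INFO Bootstrap : Using eth0
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.25 GiB.""".splitlines()

kept, folded = fold_noise(raw)
print(kept)
print(dict(folded))
```

On these sample lines the sketch keeps the first worker INFO line and the OOM error, and counts two wandb lines, two NCCL lines, one progress update, and one duplicate. The brief in the example also drops the first worker line; that takes a relevance judgment this sketch leaves out.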

How to use it well

When you paste, include the command line, batch size, precision, and the GPU memory line. Those details usually decide whether the smallest fix is a smaller batch, gradient accumulation, a precision change, activation checkpointing, or allocator configuration.
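One way to turn those details into a first guess is a small heuristic over the brief's numbers. This is an assumption, not tokensift behavior: it suggests allocator tuning when free memory plus reserved-but-unallocated memory would have covered the request (the bytes exist but are split into pieces too small to use), then gradient accumulation that keeps the effective batch size. PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True is a real PyTorch allocator setting; first_fix_hints is a hypothetical name and its rule is only a rough heuristic:

```python
from typing import List


def first_fix_hints(batch_size: int, tried_gib: float, free_gib: float,
                    allocated_gib: float, reserved_gib: float) -> List[str]:
    """Heuristic first-fix hints from the numbers in an OOM brief (not a diagnosis)."""
    hints: List[str] = []
    cached_unused = reserved_gib - allocated_gib
    # Assumed heuristic: if free plus cached-but-unused memory covers the request,
    # the bytes exist but are fragmented, so allocator tuning is cheap to try first.
    if free_gib + cached_unused >= tried_gib:
        hints.append("allocator: try PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True")
    # Halving the micro-batch while doubling accumulation steps keeps the effective batch.
    if batch_size >= 2 and batch_size % 2 == 0:
        micro = batch_size // 2
        steps = batch_size // micro
        hints.append(f"accumulate: micro-batch {micro} x {steps} steps = {micro * steps}")
    hints.append("if still OOM: enable activation checkpointing")
    return hints


for hint in first_fix_hints(64, tried_gib=1.25, free_gib=1.08,
                            allocated_gib=21.40, reserved_gib=22.49):
    print(hint)
```

With the sample numbers, 1.08 GiB free plus about 1.09 GiB cached but unused exceeds the 1.25 GiB request, so the allocator hint comes first, followed by a micro-batch of 32 accumulated over 2 steps.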

Paste the relevant failure output into the main tool, run Sift, then review the compact brief before sending it to Codex, Claude Code, ChatGPT, OpenCode, or another coding agent.

Privacy note: tokensift makes no LLM API call, and raw logs are not sent to external model providers, ad providers, or third-party analytics. Hosted submissions may be stored in a private first-party database for diagnostics and are retained for up to 30 days by default, so review and redact your log before pasting it.
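Part of that redaction can be automated before you paste. A minimal sketch with hypothetical rules for usernames in home paths, KEY/TOKEN/SECRET-style environment values, and IPv4 addresses; the rules and the name redact are assumptions, not exhaustive, and the result still needs a human read:

```python
import re

# Hypothetical redaction rules; extend them for whatever your own logs contain.
REDACTIONS = [
    # Home directories reveal usernames: /home/alice/... -> /home/<user>/...
    (re.compile(r"/home/[^/\s]+"), "/home/<user>"),
    # NAME=value pairs whose name mentions KEY, TOKEN, or SECRET (e.g. WANDB_API_KEY).
    (re.compile(r"\b([A-Z0-9_]*(?:KEY|TOKEN|SECRET)[A-Z0-9_]*)=\S+"), r"\1=<redacted>"),
    # Dotted IPv4 addresses, e.g. a MASTER_ADDR value in distributed-training output.
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<ip>"),
]


def redact(text: str) -> str:
    """Apply each rule in order; the output still needs review before pasting."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


print(redact("WANDB_API_KEY=abc123 cd /home/alice/proj MASTER_ADDR=10.0.0.5"))
# WANDB_API_KEY=<redacted> cd /home/<user>/proj MASTER_ADDR=<ip>
```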

Open tokensift and paste your own log.