GPU Build Picker

Pick a reference build above, or edit any line. Live cost + provenance + capability score + a machine that builds itself.

🧮 Local-LLM Capability Estimator

capability
Model weights
KV cache
Total resident
Est. decode (1 user)
Est. prefill
VRAM capacity
Decode speed

Rough estimates. Decode is modeled as memory-bandwidth-bound and calibrated to measured 3090 numbers (dense 27B≈40, 35B-A3B≈87, 70B≈18 t/s on one card; 120B-A12B MoE≈67 t/s resident on a 4×3090 box). MoE decode tracks ACTIVE params, so a big MoE that fits in VRAM runs far faster than its total size suggests. Multi-GPU decode ≈ per-card bandwidth (pipeline-parallel; tensor-parallel without NVLink is comms-bound). Spill to system RAM drops you toward system-RAM bandwidth. Prefill scales with FP16 TFLOPS. KV-cache size = weights' sibling cost: it grows linearly with context and shrinks with KV quant (f16→q8_0 ≈ half, q4_0 ≈ quarter); GQA + MoE keep it modest. KV is f16-estimated per model and added to the footprint that must fit. Capability reads ONLY parts' specs — an unknown spec is treated as unknown, never 0.

Item Spec / notes Qty Unit $ Source URL Ext.
Subtotal$0
Tax %$0
Shipping $$0
Grand total$0
Budget line: $ ·

💡 Total cost of ownership

Hardware (upfront)
Avg power draw
Electricity
Total cost of ownership
Effective $/month

Provenance: every price carries a freshness badge — ● sourced fresh (<7d), ◌ estimate needs verifying, ● stale (>21d). Estimates are flagged, never hidden; nothing is ever fabricated. Autofill: paste a product URL and click — it reads structured data only, through a public CORS proxy, and says so when it can't.