GPU Build Picker

Add from catalog

—

🧮 Local-LLM Capability Estimator

Total VRAM (GB)

System RAM (GB)

Per-GPU bandwidth (GB/s)

System-RAM bandwidth (GB/s)

GPUs

Aggregate FP16 (TFLOPS)

Context window 32K

KV quant

Attention

–capability

—

Model weights—

KV cache —

Total resident—

Est. decode (1 user)—

Est. prefill—

VRAM capacity

Decode speed

Rough estimates. Decode is modeled as memory-bandwidth-bound and calibrated to measured 3090 numbers (dense 27B≈40, 35B-A3B≈87, 70B≈18 t/s on one card; 120B-A12B MoE≈67 t/s resident on a 4×3090 box). MoE decode tracks ACTIVE params, so a big MoE that fits in VRAM runs far faster than its total size suggests. Multi-GPU decode ≈ per-card bandwidth (pipeline-parallel; tensor-parallel without NVLink is comms-bound). Spill to system RAM drops you toward system-RAM bandwidth. Prefill scales with FP16 TFLOPS. KV-cache size = weights' sibling cost: it grows linearly with context and shrinks with KV quant (f16→q8_0 ≈ half, q4_0 ≈ quarter); GQA + MoE keep it modest. KV is f16-estimated per model and added to the footprint that must fit. Capability reads ONLY parts' specs — an unknown spec is treated as unknown, never 0.

Item	Spec / notes	Qty	Unit $	Source URL	Ext.

Subtotal$0

Tax %$0

Shipping $$0

Grand total$0

Budget line: $ ·

💡 Total cost of ownership

Years

Utilization %

Electricity $/kWh

Full-load draw (W)

Hardware (upfront)—

Avg power draw —

Electricity —

Total cost of ownership—

Effective $/month—

Provenance: every price carries a freshness badge — ● sourced fresh (<7d), ◌ estimate needs verifying, ● stale (>21d). Estimates are flagged, never hidden; nothing is ever fabricated. Autofill: paste a product URL and click ⤓ — it reads structured data only, through a public CORS proxy, and says so when it can't.