GLM-5.1 benchmark on Mac Studio

Running GLM-5.1 (744B) Locally on a Mac Studio: Benchmark Results

TL;DR I loaded Z.ai’s GLM-5.1 — a 744B parameter MoE model with 40B active parameters — onto a Mac Studio M3 Ultra with 256GB unified memory using a 2-bit quantized GGUF via llama.cpp. It runs at 5.8 tok/s with a 120-second time to first token. The financial analysis quality is genuinely impressive, but it eats 222GB of the 256GB available, leaving room for literally nothing else. It’s a “clear the schedule” model, not an always-on one. ...

April 13, 2026 · 8 min · zolty
Benchmarking every subsystem on four Lenovo M920q Proxmox hosts — NVMe, CPU, memory, and 10GbE network

Benchmarking Every Subsystem: NVMe, CPU, Memory, and 10GbE on Four Proxmox Hosts

TL;DR Prometheus and Grafana both crashed with I/O errors on the same node. Before blaming software, I ran a full hardware audit across all four Proxmox hosts: SMART health, NVMe disk benchmarks (fio), CPU benchmarks (sysbench), memory bandwidth tests, and 10GbE network throughput (iperf3). The result: all hardware is healthy. The I/O errors came from Longhorn CSI virtual block device corruption, not physical disk failure. Along the way, I established baseline performance numbers for every subsystem and discovered that custom cooling makes a dramatic difference in thermal performance. ...

February 22, 2026 · 11 min · zolty

Affiliate Disclosure: Some links on this site are affiliate links (Amazon Associates, DigitalOcean referral). As an Amazon Associate, I earn from qualifying purchases. This does not affect the price you pay or my editorial independence — I only recommend products and services I personally use and trust.