Mac Studio M3 Ultra as a GPU appliance proxied into a k3s cluster

The Mac Studio as a GPU appliance: serving Ollama and ComfyUI to a k3s cluster

TL;DR A Mac Studio M3 Ultra costs the same as a single 4090 but comes with 256 GB of unified memory and a 60-core GPU, and it draws only 100–200 W under inference. I stopped trying to pass MPS (Metal Performance Shaders) into containers. Instead I run Ollama and ComfyUI natively on macOS and proxy them back into k3s as plain Kubernetes Services with manually managed Endpoints. Two Mac Studios connected via Thunderbolt 5 split the load: one handles hot-path LLM inference and embeddings, the other runs the heavy forge for diffusion and long-horizon reasoning. Both are cheaper to run than a single-socket A100 and require no special driver stacks. ...
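A minimal sketch of the "Service with manual Endpoints" pattern, written in Python so it emits JSON that `kubectl apply -f -` accepts. The service names (`ollama-hot`, `ollama-forge`, `comfyui`), the `ai` namespace, and the 192.168.1.x addresses are hypothetical placeholders; 11434 and 8188 are the default ports for Ollama and ComfyUI. Each Service has no selector, so Kubernetes does not manage its endpoints, and the hand-written Endpoints object with the same name and namespace tells the cluster which off-cluster host to send traffic to.

```python
import json

# Hypothetical layout: (service name, Mac Studio LAN IP, port).
# Addresses are placeholders; 11434 is Ollama's default port, 8188 is ComfyUI's.
BACKENDS = [
    ("ollama-hot", "192.168.1.50", 11434),    # hot path: chat + embeddings
    ("ollama-forge", "192.168.1.51", 11434),  # long-horizon reasoning
    ("comfyui", "192.168.1.51", 8188),        # diffusion workflows
]


def external_service(name: str, ip: str, port: int, namespace: str = "ai") -> list[dict]:
    """Build a selectorless Service plus a same-named Endpoints object.

    With no selector, Kubernetes leaves the Service's endpoints alone;
    the Endpoints object (matched by name and namespace) points the
    Service at a host that lives outside the cluster.
    """
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # No "selector" key on purpose.
            "ports": [{"name": "http", "port": port, "targetPort": port, "protocol": "TCP"}],
        },
    }
    endpoints = {
        "apiVersion": "v1",
        "kind": "Endpoints",
        "metadata": {"name": name, "namespace": namespace},
        "subsets": [{
            "addresses": [{"ip": ip}],
            # Port name must match the Service's port name.
            "ports": [{"name": "http", "port": port, "protocol": "TCP"}],
        }],
    }
    return [service, endpoints]


def build_manifest() -> dict:
    """Wrap every Service/Endpoints pair in a single v1 List."""
    items: list[dict] = []
    for name, ip, port in BACKENDS:
        items.extend(external_service(name, ip, port))
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    print(json.dumps(build_manifest(), indent=2))
```

Piping the output into `kubectl apply -f -` creates all six objects; pods can then reach, for example, `http://ollama-hot.ai.svc.cluster.local:11434`. This only works if the Mac-side servers listen beyond loopback: Ollama via `OLLAMA_HOST=0.0.0.0`, ComfyUI by starting it with `--listen`. Recent Kubernetes releases deprecate the v1 Endpoints API in favor of EndpointSlice, but it is still served, and it keeps the manifest short.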

June 4, 2026 · 8 min · zolty

Affiliate Disclosure: Some links on this site are affiliate links (Amazon Associates, DigitalOcean referral). As an Amazon Associate, I earn from qualifying purchases. This does not affect the price you pay or my editorial independence — I only recommend products and services I personally use and trust.