TL;DR
I deployed ComfyUI natively on my Mac Studio M3 Ultra using Apple’s MPS GPU backend, proxied it through k3s Traefik ingress with Authentik SSO, wired it into Open WebUI as the image generation backend (replacing $0.04/image Bedrock calls), and built an MCP server so AI agents can generate images programmatically. The whole pipeline is Ansible-managed and generates images for free on local hardware.
Why native instead of containerized
ComfyUI needs GPU access. On Linux, that’s straightforward — pass through the GPU via device plugins. On macOS, there’s no container runtime that exposes MPS (Metal Performance Shaders) to containers. Docker Desktop on Mac runs a Linux VM — no Metal, no MPS.
So ComfyUI runs natively on macOS, managed by launchd. The k3s cluster handles routing, authentication, and TLS via a proxy-only namespace — no pods run in the cluster for this service. Just a Service with manual Endpoints pointing to the Mac Studio’s IP.
This is the same pattern I use for Ollama. The Mac Studio is a GPU compute host; the k3s cluster is the control plane.
Ansible role
The installation is fully automated via an Ansible role:
```yaml
# ansible/playbooks/mac-studio-comfyui.yml
- hosts: mac-studio
  roles:
    - mac_studio_comfyui
```
The role handles:
- Git clone — pulls ComfyUI from GitHub to `~/comfyui`
- Python venv — isolated environment at `~/comfyui/venv/`
- PyTorch with MPS — installs the MPS-enabled PyTorch build
- Dependencies — `pip install -r requirements.txt`
- Model directories — creates the standard ComfyUI model tree
- NAS overflow — optional NFS mount for archive models via `extra_model_paths.yaml`
- launchd service — templates and loads `com.comfyui.server.plist`
- Health check — polls `/system_stats` until ComfyUI is ready
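The health-check step is just a poll loop against `/system_stats`. A minimal Python sketch of the same logic — the `fetch` callable is a hypothetical stand-in for an HTTP GET against `http://<mac-studio>:8188/system_stats`, injected so the loop can be exercised without a live server:

```python
import json
import time

def wait_for_comfyui(fetch, timeout_s: float = 120.0, interval_s: float = 2.0) -> dict:
    """Poll /system_stats until ComfyUI answers with valid JSON or the timeout expires.

    `fetch` is any callable returning the raw response body for /system_stats,
    or raising while the server is still starting up.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            stats = json.loads(fetch())
            if "system" in stats:   # ComfyUI reports a "system" block once it's up
                return stats
        except Exception:
            pass                     # not listening yet; keep polling
        time.sleep(interval_s)
    raise TimeoutError("ComfyUI did not become ready in time")
```

In the Ansible role the equivalent is a `uri` task in a retry loop; the sketch above is the same contract expressed as code.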
MPS configuration
One environment variable in the launchd plist makes MPS work well on the M3 Ultra:
```xml
<key>EnvironmentVariables</key>
<dict>
    <key>PYTORCH_MPS_HIGH_WATERMARK_RATIO</key>
    <string>0.0</string>
</dict>
```
`PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0` disables PyTorch’s MPS memory caching. By default, PyTorch reserves GPU memory in chunks and doesn’t release it back to the system. On a machine with 256GB unified memory that also runs Ollama with 70B+ models, that memory pressure matters. Setting it to 0 makes PyTorch allocate and release on demand.
The launch flags:
```
--listen 0.0.0.0 --port 8188 --force-fp16 --preview-method auto
```
`--force-fp16` is important — it forces 16-bit floating point instead of 32-bit, halving memory usage per model with minimal quality loss. On Apple Silicon, where CPU and GPU share the same memory pool, every GB saved for inference is a GB available for Ollama.
k3s ingress proxy
The Kubernetes side is a proxy-only namespace — no pods, just routing resources:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: comfyui
  namespace: comfyui
spec:
  ports:
    - port: 8188
      targetPort: 8188
---
apiVersion: v1
kind: Endpoints
metadata:
  name: comfyui
  namespace: comfyui
subsets:
  - addresses:
      - ip: 192.168.1.216
    ports:
      - port: 8188
```
The Service has no selector — it doesn’t match pods. Instead, a manual Endpoints object points to the Mac Studio’s IP. Kubernetes DNS resolves `comfyui.comfyui.svc.cluster.local` to this endpoint, and Traefik routes external traffic through it.
Authentik SSO
The IngressRoute applies forward-auth middleware so ComfyUI is behind the same SSO as every other cluster service:
```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: comfyui
  namespace: comfyui
spec:
  entryPoints: [websecure]
  routes:
    - match: Host(`comfyui.k3s.internal.zolty.systems`)
      kind: Rule
      middlewares:
        - name: authentik-forward-auth
          namespace: public-ingress
      services:
        - name: comfyui
          port: 8188
    - match: Host(`comfyui.k3s.internal.zolty.systems`) && PathPrefix(`/outpost.goauthentik.io/`)
      kind: Rule
      services:
        - name: authentik-server
          namespace: authentik
          port: 80
  tls:
    secretName: comfyui-tls
```
Two routes: the main route applies Authentik forward auth, and the outpost callback route bypasses auth (necessary for the OAuth flow to complete). TLS via cert-manager with Let’s Encrypt DNS-01 challenge.
Users authenticate once through Authentik, then get direct access to the ComfyUI web interface. No separate login.
Open WebUI integration
The real payoff: Open WebUI can use ComfyUI as its image generation backend. Before this, image generation used Amazon Bedrock’s Nova Canvas at $0.04 per image.
```yaml
# open-webui-values.yaml
ENABLE_IMAGE_GENERATION: "true"
IMAGE_GENERATION_ENGINE: "comfyui"
COMFYUI_BASE_URL: "http://comfyui.comfyui.svc.cluster.local:8188"
IMAGE_SIZE: "1024x1024"
```
That’s it. Open WebUI connects to ComfyUI over internal cluster DNS, submits SDXL workflows, and displays the generated images inline in chat. Users type “generate an image of…” and it just works.
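Under the hood this is a plain HTTP exchange: ComfyUI accepts a workflow graph as a POST to `/prompt` and replies with a `prompt_id`. A standard-library sketch of that submission (not Open WebUI’s actual client code — the cluster URL is the one from the values file, and the helper names are mine):

```python
import json
import urllib.request

COMFYUI_URL = "http://comfyui.comfyui.svc.cluster.local:8188"

def build_prompt_body(workflow: dict) -> bytes:
    """ComfyUI expects the node graph wrapped under a top-level "prompt" key."""
    return json.dumps({"prompt": workflow}).encode("utf-8")

def submit_workflow(workflow: dict) -> str:
    """POST the workflow to /prompt; the response carries a prompt_id to poll."""
    req = urllib.request.Request(
        f"{COMFYUI_URL}/prompt",
        data=build_prompt_body(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["prompt_id"]
```

Everything Open WebUI does — and everything the MCP server below does — is built on this one endpoint plus polling for results.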
The cost change:
| | Before (Bedrock) | After (Local) |
|---|---|---|
| Cost per image | $0.04 | $0.00 |
| Latency | ~2s (API round-trip) | Sub-second (MPS) |
| Privacy | Images stored in AWS | Images stay local |
| Model control | Nova Canvas only | Any SDXL/Flux checkpoint |
At 100 images/day (a reasonable estimate for a household with an AI chat interface), that’s $4/day or ~$1,460/year saved. The Mac Studio was already running 24/7 for Ollama — ComfyUI adds negligible power draw.
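The arithmetic, spelled out (the 100 images/day figure is the estimate above, not a measurement):

```python
# Back-of-envelope savings vs. Bedrock Nova Canvas at $0.04/image.
images_per_day = 100
bedrock_cost_per_image = 0.04

daily_savings = images_per_day * bedrock_cost_per_image   # dollars per day
annual_savings = daily_savings * 365                      # dollars per year

print(f"${daily_savings:.2f}/day -> ${annual_savings:,.0f}/year")
```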
MCP server
For AI agents (Claude Code, OpenClaw) to generate images programmatically, I built an MCP server that wraps ComfyUI’s workflow API:
```python
from fastmcp import FastMCP

mcp = FastMCP("comfyui")

@mcp.tool()
def generate_image(
    prompt: str,
    negative_prompt: str = "",
    checkpoint: str = "RealVisXL_V5.0_fp16.safetensors",
    width: int = 1024,
    height: int = 1024,
    steps: int = 25,
    cfg: float = 7.0,
    seed: int = -1,
) -> str:
    """Generate an image using ComfyUI SDXL pipeline."""
    workflow = build_sdxl_workflow(prompt, negative_prompt, checkpoint, ...)
    prompt_id = submit_workflow(workflow)
    return poll_until_complete(prompt_id)
```
Five tools exposed via MCP:
- `generate_image` — submit an SDXL txt2img workflow, poll until complete, return image paths
- `list_models` — available checkpoints and LoRAs
- `get_queue_status` — running and pending generation jobs
- `get_system_stats` — ComfyUI version, device type (MPS), VRAM status
- `get_recent_images` — recently generated images with file sizes
The workflow JSON is built programmatically — no need for users to understand ComfyUI’s node graph. The agent says “generate a cyberpunk cityscape” and gets back a file path.
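The `poll_until_complete` helper from the snippet above can be sketched as a loop over ComfyUI’s `/history/{prompt_id}` endpoint, which returns an empty object until the job finishes and then an entry whose `outputs` are keyed by node id. The `fetch` callable is a hypothetical injected HTTP GET, so the parsing logic stands alone:

```python
import json
import time

def poll_until_complete(prompt_id: str, fetch,
                        timeout_s: float = 300.0, interval_s: float = 1.0) -> list[str]:
    """Poll /history/{prompt_id} until outputs appear; return the image filenames.

    `fetch` takes a prompt_id and returns the raw /history/{prompt_id} JSON body
    (an empty object {} while the job is still queued or running).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        history = json.loads(fetch(prompt_id))
        entry = history.get(prompt_id)
        if entry:  # present once the workflow has finished executing
            return [
                img["filename"]
                for node in entry["outputs"].values()
                for img in node.get("images", [])
            ]
        time.sleep(interval_s)
    raise TimeoutError(f"generation {prompt_id} did not finish in time")
```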
Model management
Models live in two tiers:
Local SSD (hot models): SD 3.5 Large, Juggernaut XI, RealVisXL V5.0. Plus LoRAs for realism enhancement, detail boosting, and anti-blur.
NAS (archive): Older checkpoints, specialized models. Mounted via NFS at `/Volumes/comfyui-models` and registered in `extra_model_paths.yaml`. ComfyUI searches both locations when loading.
```yaml
# extra_model_paths.yaml
nas_overflow:
  base_path: /Volumes/comfyui-models
  checkpoints: checkpoints/
  loras: loras/
  vae: vae/
```
This keeps the Mac Studio’s 2TB SSD for hot models while archiving everything else on the NAS. ComfyUI’s model loader searches both paths transparently.
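The two-tier lookup amounts to a first-match search across an ordered list of directories. A sketch of that behavior — the paths are illustrative, not ComfyUI’s internal loader code:

```python
from pathlib import Path

# Search order: local SSD first (hot models), NAS mount second (archive).
DEFAULT_DIRS = [
    Path.home() / "comfyui" / "models" / "checkpoints",
    Path("/Volumes/comfyui-models/checkpoints"),
]

def resolve_checkpoint(name: str, search_dirs=DEFAULT_DIRS):
    """Return the first directory where the checkpoint file exists, or None."""
    for d in search_dirs:
        candidate = d / name
        if candidate.is_file():
            return candidate
    return None
```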
What I’d do differently
Monitor ComfyUI inference metrics. Unlike Ollama, ComfyUI doesn’t have a clean metrics proxy available. Right now I monitor it via process presence and log shipping, but I’d like per-generation latency and queue depth in Prometheus. A lightweight Python wrapper around `/queue` and `/history` would do it.
Add a webhook for generation completion. The MCP server currently polls `/history/{prompt_id}` every second until the image is done. A WebSocket listener on ComfyUI’s `/ws` endpoint would be more efficient for long-running generations (img2img, high step counts).
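The WebSocket approach can be sketched without committing to a client library. Assuming the message shape from ComfyUI’s own API examples — an `executing` message whose `node` is `null` for your `prompt_id` marks the end of a run — the completion logic is a small parser over a stream of JSON strings (the iterable stands in for `recv()` calls on the socket):

```python
import json

def wait_for_completion(messages, prompt_id: str) -> bool:
    """Consume ComfyUI /ws messages until the given prompt finishes.

    ComfyUI emits {"type": "executing", "data": {"node": <id>, "prompt_id": ...}}
    as each node runs; "node": null for our prompt_id means execution is done.
    `messages` is any iterable of raw JSON strings.
    """
    for raw in messages:
        msg = json.loads(raw)
        if msg.get("type") == "executing":
            data = msg.get("data", {})
            if data.get("prompt_id") == prompt_id and data.get("node") is None:
                return True
    return False  # stream ended without a completion event
```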
Lessons
- Native macOS + k3s proxy is the right pattern for GPU workloads on Apple Silicon. You can’t pass MPS into a container, so don’t try. Run native, proxy through the cluster.
- `PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0` is mandatory when sharing memory with Ollama. Without it, PyTorch hoards GPU memory and starves other workloads.
- Open WebUI’s ComfyUI integration is a one-line config change. The hardest part was deploying ComfyUI itself, not connecting it.
- MCP servers turn local tools into agent capabilities. Five tools, 200 lines of Python, and now any MCP-compatible agent can generate images.
- NAS overflow for models is worth the NFS setup. A 70GB checkpoint collection grows fast. Tiered storage keeps the SSD fast and the NAS useful.
Don’t have a homelab? ComfyUI runs on any machine with a GPU — including Windows with CUDA or Linux with ROCm. The k3s ingress proxy pattern works for any external service you want to expose through your cluster. For a basic setup, a DigitalOcean GPU Droplet can run ComfyUI directly without the proxy layer.