TL;DR
I deployed ComfyUI natively on my Mac Studio M3 Ultra using Apple’s MPS GPU backend, proxied it through k3s Traefik ingress with Authentik SSO, wired it into Open WebUI as the image generation backend (replacing $0.04/image Bedrock calls), and built an MCP server so AI agents can generate images programmatically. The whole pipeline is Ansible-managed and generates images for free on local hardware.
Why native instead of containerized
ComfyUI needs GPU access. On Linux, that’s straightforward — pass through the GPU via device plugins. On macOS, there’s no container runtime that exposes MPS (Metal Performance Shaders) to containers. Docker Desktop on Mac runs a Linux VM — no Metal, no MPS.
So ComfyUI runs natively on macOS, managed by launchd. The k3s cluster handles routing, authentication, and TLS via a proxy-only namespace — no pods run in the cluster for this service. Just a Service with manual Endpoints pointing to the Mac Studio’s IP.
This is the same pattern I use for Ollama. The Mac Studio is a GPU compute host; the k3s cluster is the control plane.
Ansible role
The installation is fully automated via an Ansible role:
```yaml
# ansible/playbooks/mac-studio-comfyui.yml
- hosts: mac-studio
  roles:
    - mac_studio_comfyui
```
The role handles:
- Git clone — pulls ComfyUI from GitHub to `~/comfyui`
- Python venv — isolated environment at `~/comfyui/venv/`
- PyTorch with MPS — installs the MPS-enabled PyTorch build
- Dependencies — `pip install -r requirements.txt`
- Model directories — creates the standard ComfyUI model tree
- NAS overflow — optional NFS mount for archive models via `extra_model_paths.yaml`
- launchd service — templates and loads `com.comfyui.server.plist`
- Health check — polls `/system_stats` until ComfyUI is ready
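The health-check step is just a poll loop against `/system_stats`. A minimal Python sketch of the same logic — the `fetch` callable is a hypothetical stand-in for an HTTP GET against `http://<mac-studio>:8188/system_stats`, injected so the loop can be exercised without a live server:

```python
import json
import time

def wait_for_comfyui(fetch, timeout_s: float = 120.0, interval_s: float = 2.0) -> dict:
    """Poll /system_stats until ComfyUI answers with valid JSON or the timeout expires.

    `fetch` is any callable returning the raw response body for /system_stats,
    or raising while the server is still starting up.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            stats = json.loads(fetch())
            if "system" in stats:   # ComfyUI reports a "system" block once it's up
                return stats
        except Exception:
            pass                     # not listening yet; keep polling
        time.sleep(interval_s)
    raise TimeoutError("ComfyUI did not become ready in time")
```

In the Ansible role the equivalent is a `uri` task in a retry loop; the sketch above is the same contract expressed as code.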
MPS configuration
One environment variable in the launchd plist makes MPS work well on the M3 Ultra:
```xml
<key>EnvironmentVariables</key>
<dict>
    <key>PYTORCH_MPS_HIGH_WATERMARK_RATIO</key>
    <string>0.0</string>
</dict>
```
`PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0` disables PyTorch’s MPS memory caching. By default, PyTorch reserves GPU memory in chunks and doesn’t release it back to the system. On a machine with 256GB unified memory that also runs Ollama with 70B+ models, that memory pressure matters. Setting it to 0 makes PyTorch allocate and release on demand.
The launch flags:
```
--listen 0.0.0.0 --port 8188 --force-fp16 --preview-method auto
```
`--force-fp16` is important — it forces 16-bit floating point instead of 32-bit, halving memory usage per model with minimal quality loss. On Apple Silicon, where CPU and GPU share the same memory pool, every GB saved for inference is a GB available for Ollama.
k3s ingress proxy
The Kubernetes side is a proxy-only namespace — no pods, just routing resources:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: comfyui
  namespace: comfyui
spec:
  ports:
    - port: 8188
      targetPort: 8188
---
apiVersion: v1
kind: Endpoints
metadata:
  name: comfyui
  namespace: comfyui
subsets:
  - addresses:
      - ip: 192.168.1.216
    ports:
      - port: 8188
```
The Service has no selector — it doesn’t match pods. Instead, a manual Endpoints object points to the Mac Studio’s IP. Kubernetes DNS resolves `comfyui.comfyui.svc.cluster.local` to this endpoint, and Traefik routes external traffic through it.
Authentik SSO
The IngressRoute applies forward-auth middleware so ComfyUI is behind the same SSO as every other cluster service:
```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: comfyui
  namespace: comfyui
spec:
  entryPoints: [websecure]
  routes:
    - match: Host(`comfyui.k3s.internal.zolty.systems`)
      kind: Rule
      middlewares:
        - name: authentik-forward-auth
          namespace: public-ingress
      services:
        - name: comfyui
          port: 8188
    - match: Host(`comfyui.k3s.internal.zolty.systems`) && PathPrefix(`/outpost.goauthentik.io/`)
      kind: Rule
      services:
        - name: authentik-server
          namespace: authentik
          port: 80
  tls:
    secretName: comfyui-tls
```
Two routes: the main route applies Authentik forward auth, and the outpost callback route bypasses auth (necessary for the OAuth flow to complete). TLS via cert-manager with Let’s Encrypt DNS-01 challenge.
Users authenticate once through Authentik, then get direct access to the ComfyUI web interface. No separate login.
Open WebUI integration
The real payoff: Open WebUI can use ComfyUI as its image generation backend. Before this, image generation used Amazon Bedrock’s Nova Canvas at $0.04 per image.
```yaml
# open-webui-values.yaml
ENABLE_IMAGE_GENERATION: "true"
IMAGE_GENERATION_ENGINE: "comfyui"
COMFYUI_BASE_URL: "http://comfyui.comfyui.svc.cluster.local:8188"
IMAGE_SIZE: "1024x1024"
```
That’s it. Open WebUI connects to ComfyUI over internal cluster DNS, submits SDXL workflows, and displays the generated images inline in chat. Users type “generate an image of…” and it just works.
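Under the hood this is a plain HTTP exchange: ComfyUI accepts a workflow graph as a POST to `/prompt` and replies with a `prompt_id`. A standard-library sketch of that submission (not Open WebUI’s actual client code — the cluster URL is the one from the values file, and the helper names are mine):

```python
import json
import urllib.request

COMFYUI_URL = "http://comfyui.comfyui.svc.cluster.local:8188"

def build_prompt_body(workflow: dict) -> bytes:
    """ComfyUI expects the node graph wrapped under a top-level "prompt" key."""
    return json.dumps({"prompt": workflow}).encode("utf-8")

def submit_workflow(workflow: dict) -> str:
    """POST the workflow to /prompt; the response carries a prompt_id to poll."""
    req = urllib.request.Request(
        f"{COMFYUI_URL}/prompt",
        data=build_prompt_body(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["prompt_id"]
```

Everything Open WebUI does — and everything the MCP server below does — is built on this one endpoint plus polling for results.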
The cost change:
| | Before (Bedrock) | After (Local) |
|---|---|---|
| Cost per image | $0.04 | $0.00 |
| Latency | ~2s (API round-trip) | Sub-second (MPS) |
| Privacy | Images stored in AWS | Images stay local |
| Model control | Nova Canvas only | Any SDXL/Flux checkpoint |
At 100 images/day (a reasonable estimate for a household with an AI chat interface), that’s $4/day or ~$1,460/year saved. The Mac Studio was already running 24/7 for Ollama — ComfyUI adds negligible power draw.
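The arithmetic, spelled out (the 100 images/day figure is the estimate above, not a measurement):

```python
# Back-of-envelope savings vs. Bedrock Nova Canvas at $0.04/image.
images_per_day = 100
bedrock_cost_per_image = 0.04

daily_savings = images_per_day * bedrock_cost_per_image   # dollars per day
annual_savings = daily_savings * 365                      # dollars per year

print(f"${daily_savings:.2f}/day -> ${annual_savings:,.0f}/year")
```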
MCP server
For AI agents (Claude Code, OpenClaw) to generate images programmatically, I built an MCP server that wraps ComfyUI’s workflow API:
```python
from fastmcp import FastMCP

mcp = FastMCP("comfyui")

@mcp.tool()
def generate_image(
    prompt: str,
    negative_prompt: str = "",
    checkpoint: str = "RealVisXL_V5.0_fp16.safetensors",
    width: int = 1024,
    height: int = 1024,
    steps: int = 25,
    cfg: float = 7.0,
    seed: int = -1,
) -> str:
    """Generate an image using ComfyUI SDXL pipeline."""
    workflow = build_sdxl_workflow(prompt, negative_prompt, checkpoint, ...)
    prompt_id = submit_workflow(workflow)
    return poll_until_complete(prompt_id)
```
Five tools exposed via MCP:
- `generate_image` — submit an SDXL txt2img workflow, poll until complete, return image paths
- `list_models` — available checkpoints and LoRAs
- `get_queue_status` — running and pending generation jobs
- `get_system_stats` — ComfyUI version, device type (MPS), VRAM status
- `get_recent_images` — recently generated images with file sizes
The workflow JSON is built programmatically — no need for users to understand ComfyUI’s node graph. The agent says “generate a cyberpunk cityscape” and gets back a file path.
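The `poll_until_complete` helper from the snippet above can be sketched as a loop over ComfyUI’s `/history/{prompt_id}` endpoint, which returns an empty object until the job finishes and then an entry whose `outputs` are keyed by node id. The `fetch` callable is a hypothetical injected HTTP GET, so the parsing logic stands alone:

```python
import json
import time

def poll_until_complete(prompt_id: str, fetch,
                        timeout_s: float = 300.0, interval_s: float = 1.0) -> list[str]:
    """Poll /history/{prompt_id} until outputs appear; return the image filenames.

    `fetch` takes a prompt_id and returns the raw /history/{prompt_id} JSON body
    (an empty object {} while the job is still queued or running).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        history = json.loads(fetch(prompt_id))
        entry = history.get(prompt_id)
        if entry:  # present once the workflow has finished executing
            return [
                img["filename"]
                for node in entry["outputs"].values()
                for img in node.get("images", [])
            ]
        time.sleep(interval_s)
    raise TimeoutError(f"generation {prompt_id} did not finish in time")
```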
Model management
Models live in two tiers:
Local SSD (hot models): SD 3.5 Large, Juggernaut XI, RealVisXL V5.0. Plus LoRAs for realism enhancement, detail boosting, and anti-blur.
NAS (archive): Older checkpoints, specialized models. Mounted via NFS at `/Volumes/comfyui-models` and registered in `extra_model_paths.yaml`. ComfyUI searches both locations when loading.
```yaml
# extra_model_paths.yaml
nas_overflow:
  base_path: /Volumes/comfyui-models
  checkpoints: checkpoints/
  loras: loras/
  vae: vae/
```
This keeps the Mac Studio’s 2TB SSD for hot models while archiving everything else on the NAS. ComfyUI’s model loader searches both paths transparently.
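The two-tier lookup amounts to a first-match search across an ordered list of directories. A sketch of that behavior — the paths are illustrative, not ComfyUI’s internal loader code:

```python
from pathlib import Path

# Search order: local SSD first (hot models), NAS mount second (archive).
DEFAULT_DIRS = [
    Path.home() / "comfyui" / "models" / "checkpoints",
    Path("/Volumes/comfyui-models/checkpoints"),
]

def resolve_checkpoint(name: str, search_dirs=DEFAULT_DIRS):
    """Return the first directory where the checkpoint file exists, or None."""
    for d in search_dirs:
        candidate = d / name
        if candidate.is_file():
            return candidate
    return None
```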
What I’d do differently
Monitor ComfyUI inference metrics. Unlike Ollama, ComfyUI doesn’t have a clean metrics proxy available. Right now I monitor it via process presence and log shipping, but I’d like per-generation latency and queue depth in Prometheus. A lightweight Python wrapper around `/queue` and `/history` would do it.
Add a webhook for generation completion. The MCP server currently polls `/history/{prompt_id}` every second until the image is done. A WebSocket listener on ComfyUI’s `/ws` endpoint would be more efficient for long-running generations (img2img, high step counts).
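The WebSocket approach can be sketched without committing to a client library. Assuming the message shape from ComfyUI’s own API examples — an `executing` message whose `node` is `null` for your `prompt_id` marks the end of a run — the completion logic is a small parser over a stream of JSON strings (the iterable stands in for `recv()` calls on the socket):

```python
import json

def wait_for_completion(messages, prompt_id: str) -> bool:
    """Consume ComfyUI /ws messages until the given prompt finishes.

    ComfyUI emits {"type": "executing", "data": {"node": <id>, "prompt_id": ...}}
    as each node runs; "node": null for our prompt_id means execution is done.
    `messages` is any iterable of raw JSON strings.
    """
    for raw in messages:
        msg = json.loads(raw)
        if msg.get("type") == "executing":
            data = msg.get("data", {})
            if data.get("prompt_id") == prompt_id and data.get("node") is None:
                return True
    return False  # stream ended without a completion event
```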
Lessons
- Native macOS + k3s proxy is the right pattern for GPU workloads on Apple Silicon. You can’t pass MPS into a container, so don’t try. Run native, proxy through the cluster.
- `PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0` is mandatory when sharing memory with Ollama. Without it, PyTorch hoards GPU memory and starves other workloads.
- Open WebUI’s ComfyUI integration is a one-line config change. The hardest part was deploying ComfyUI itself, not connecting it.
- MCP servers turn local tools into agent capabilities. Five tools, 200 lines of Python, and now any MCP-compatible agent can generate images.
- NAS overflow for models is worth the NFS setup. A 70GB checkpoint collection grows fast. Tiered storage keeps the SSD fast and the NAS useful.
Don’t have a homelab? ComfyUI runs on any machine with a GPU — including Windows with CUDA or Linux with ROCm. The k3s ingress proxy pattern works for any external service you want to expose through your cluster. For a basic setup, a DigitalOcean GPU Droplet can run ComfyUI directly without the proxy layer.