From prompt to published: how every image on this blog comes out of a local ComfyUI

TL;DR

I don’t pay for stock photos and I don’t open Canva. Every raster image on this blog is generated on a Mac Studio sitting three feet from me, by asking Claude Code to call a generate_image MCP tool that wraps ComfyUI. The pipeline is: prompt → ComfyUI (MPS) → PNG on disk → upload_media.py → S3 → CloudFront → a Markdown reference in the post. It costs $0 per image, takes ~15 seconds, and the whole thing is repeatable because the prompt and settings live in the commit history.

I already wrote about getting ComfyUI deployed on the Mac Studio behind k3s ingress. That post was about standing it up. This one is about using it — the part that actually shows up on the page.

Why bother generating images at all

Technical blogs have an image problem. You either:

Ship walls of text with no visuals (high bounce rate, looks lazy), or
Bolt on generic stock photos of “businessman pointing at server” (worse — it actively signals low-effort content), or
Pay a per-image cloud generation API and slowly bleed money, or
Spend 20 minutes per post in a design tool you don’t enjoy.

None of those appealed to me. I already had a GPU running 24/7 for local LLM inference. ComfyUI runs on the same box for the cost of the electrons. So the calculus was simple: if image generation is free, local, and scriptable, every post can have a custom header and inline diagrams without the marginal cost ever entering my head.

The key word is scriptable. I am not clicking around the ComfyUI node graph for each image. I describe what I want in plain language, an agent submits the workflow, and a file lands on disk. The node graph is an implementation detail I almost never touch.

The one tool I actually call

The infrastructure post covers the MCP server in detail, and I go deep on writing your own MCP servers elsewhere. Day to day, though, only one part matters. It’s a single tool with sane defaults:

generate_image(
    prompt: str,
    negative_prompt: str = "",
    checkpoint: str = "RealVisXL_V5.0_fp16.safetensors",
    width: int = 1024,
    height: int = 1024,
    steps: int = 25,
    cfg: float = 7.0,
    seed: int = -1,
)

That’s it. From inside any Claude Code session I can say “generate a header image of a small server rack in a closet, warm light, photoreal” and the agent fills in the arguments, submits the SDXL workflow to ComfyUI over 127.0.0.1:8188, polls until it’s done, and hands me back a path like ~/comfyui/output/mcp_00042_.png. Because it’s an MCP tool, the agent picks the checkpoint and size based on what I asked for — I don’t have to remember the filenames.
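Here is a minimal sketch of what a tool like generate_image might do under the hood, using ComfyUI's HTTP API. It builds a graph in API format, POSTs it to /prompt, then polls /history/<prompt_id> until outputs appear. This is not the author's MCP server. The graph is the stock text-to-image layout. The sampler (euler), the scheduler (normal), and the node ids are assumptions. The "mcp" filename prefix was chosen only because SaveImage then produces names like mcp_00042_.png.

```python
import json
import random
import time
import urllib.request
import uuid
from pathlib import Path

COMFY_URL = "http://127.0.0.1:8188"  # ComfyUI's default host and port


def build_workflow(prompt, negative_prompt="",
                   checkpoint="RealVisXL_V5.0_fp16.safetensors",
                   width=1024, height=1024, steps=25, cfg=7.0, seed=-1):
    """Map the tool's arguments onto a stock text-to-image graph (ComfyUI API format).

    ["4", 1] means "output 1 of node 4". Sampler, scheduler and node ids are assumptions.
    """
    if seed == -1:  # -1 means random; resolve it here so the real value can be logged
        seed = random.randint(0, 2**32 - 1)
    return {
        "4": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": checkpoint}},
        "6": {"class_type": "CLIPTextEncode", "inputs": {"text": prompt, "clip": ["4", 1]}},
        "7": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative_prompt, "clip": ["4", 1]}},
        "5": {"class_type": "EmptyLatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "3": {"class_type": "KSampler", "inputs": {
            "seed": seed, "steps": steps, "cfg": cfg,
            "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0,
            "model": ["4", 0], "positive": ["6", 0], "negative": ["7", 0],
            "latent_image": ["5", 0]}},
        "8": {"class_type": "VAEDecode", "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
        # SaveImage writes <prefix>_<counter>_.png into ComfyUI's output directory
        "9": {"class_type": "SaveImage",
              "inputs": {"filename_prefix": "mcp", "images": ["8", 0]}},
    }


def queue_prompt(workflow, base_url=COMFY_URL):
    """POST the graph to /prompt and return the prompt_id ComfyUI assigns."""
    body = json.dumps({"prompt": workflow, "client_id": str(uuid.uuid4())}).encode()
    req = urllib.request.Request(f"{base_url}/prompt", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["prompt_id"]


def image_paths(history_entry, output_dir):
    """Collect the saved output files listed in one /history entry."""
    paths = []
    for node_output in history_entry.get("outputs", {}).values():
        for img in node_output.get("images", []):
            if img.get("type") == "output":
                paths.append(Path(output_dir) / img.get("subfolder", "") / img["filename"])
    return paths


def wait_for_images(prompt_id, output_dir, base_url=COMFY_URL, timeout=300.0, poll=1.0):
    """Poll /history/<prompt_id> until the finished job appears, then return its files."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{base_url}/history/{prompt_id}") as resp:
            history = json.load(resp)
        if prompt_id in history:
            return image_paths(history[prompt_id], output_dir)
        time.sleep(poll)
    raise TimeoutError(f"prompt {prompt_id} not finished after {timeout}s")
```

Usage against a running server would look like `wait_for_images(queue_prompt(build_workflow("a small server rack in a closet, warm light, photoreal")), Path("~/comfyui/output").expanduser())`.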

Four companion tools round it out: list_models (what checkpoints and LoRAs are loaded), get_queue_status (is anything already running), get_system_stats (which device, how much memory is free), and get_recent_images (show me what just came out, with file sizes). That last one is what I use to eyeball results before deciding whether to upload or re-roll.

Picking the right model for the job

This is the part nobody tells you: the model matters more than the prompt. I keep three checkpoints hot on the SSD, each good at a different thing. Reaching for the wrong one is the single most common reason a generation comes out looking like AI slop.

| Job | Checkpoint | Steps | CFG | Size (px) |
|---|---|---|---|---|
| Photoreal blog header | `RealVisXL_V5.0` | 30 | 7.0 | 1200×630 |
| Technical diagram / illustration | `sd3.5_large` | 25 | 7.5 | 1024×768 |
| Wiki / concept illustration | `sd3.5_large` | 25 | 7.0 | 1024×1024 |
| Portraits, cinematic scenes | `Juggernaut-XI` | 30 | 6.5 | 1024×1024 |
| Quick throwaway draft | `sd3.5_large` | 15 | 7.0 | 512×512 |

The reasoning:

RealVisXL V5.0 is my default for anything that should look like a photograph — hardware on a bench, a rack, a workspace. It does materials and lighting convincingly.
SD 3.5 Large follows prompts the most literally of the three, which is exactly what you want for “clean technical illustration, white background, minimal” diagram-style art. It’s a bigger model (15 GB on disk), but it understands compositional instructions the SDXL checkpoints fumble.
Juggernaut XI is the one I reach for when there’s a person or a cinematic mood involved. Overkill for a network diagram, ideal for a “lone operator at 2am” vibe.
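If you drive the tool from your own scripts instead of through an agent, the table can live in code as presets. This is a sketch only. The job keys and the tool_args helper are made up for illustration. The checkpoint names are copied from the table above, so the real filenames on disk (for example with a .safetensors suffix) may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    checkpoint: str  # name as written in the table; real filename may differ
    steps: int
    cfg: float
    width: int
    height: int


# Values from the table; the dictionary keys are illustrative names.
PRESETS = {
    "photo_header": Preset("RealVisXL_V5.0", 30, 7.0, 1200, 630),
    "diagram": Preset("sd3.5_large", 25, 7.5, 1024, 768),
    "concept": Preset("sd3.5_large", 25, 7.0, 1024, 1024),
    "portrait": Preset("Juggernaut-XI", 30, 6.5, 1024, 1024),
    "draft": Preset("sd3.5_large", 15, 7.0, 512, 512),
}


def tool_args(job, prompt, negative_prompt=""):
    """Expand a job name into keyword arguments shaped like generate_image's."""
    p = PRESETS[job]
    return {"prompt": prompt, "negative_prompt": negative_prompt,
            "checkpoint": p.checkpoint, "width": p.width, "height": p.height,
            "steps": p.steps, "cfg": p.cfg}
```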

1200×630 is the recommended Open Graph image size (roughly 1.91:1), so headers slot straight into social cards without cropping. Everything inside a post I generate square or 4:3 and let Hugo handle the sizing.
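There is one wrinkle with 1200×630. Stable Diffusion-family models denoise in a latent space that is downscaled 8× per side, and 630 is not a multiple of 8, so ComfyUI's empty-latent node may quietly hand back 1200×624. Here is one hedged workaround, which is not necessarily what this blog does. Generate at the next size up where both sides divide by 8, then center-crop to the exact Open Graph size. Both helper names are hypothetical.

```python
def latent_safe_size(width, height, multiple=8):
    """Round each side up to the next multiple of `multiple` (8 = the VAE's downscale)."""
    def up(n):
        return -(-n // multiple) * multiple  # ceiling division, then scale back up
    return up(width), up(height)


def center_crop_box(gen_size, target_size):
    """(left, top, right, bottom) box that center-crops gen_size down to target_size."""
    (gw, gh), (tw, th) = gen_size, target_size
    left, top = (gw - tw) // 2, (gh - th) // 2
    return left, top, left + tw, top + th
```

With Pillow installed, `Image.open(path).crop(box).save(out)` performs the actual cut, because `Image.crop` takes the same (left, top, right, bottom) tuple. For 1200×630 this means generating at 1200×632 and trimming one row from the top and one from the bottom.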

Prompts that work for technical art

After a few hundred generations, the prompts that consistently land share a structure. I think of it as four slots:

Subject — the literal thing. “A 3-node small-form-factor PC cluster wired together.”
Style anchor — the look. “Clean technical illustration, isometric, flat color, minimal.”
Context cues — “white background, soft shadows, lots of negative space.”
Negatives — what to actively exclude.

A real prompt I used for a diagram-style header:

a clean isometric illustration of three small desktop computers connected by ethernet cables forming a cluster, flat vector style, muted blue and slate palette, white background, generous negative space, technical, minimal

…with a negative prompt of:

photo, realistic, cluttered, busy background, text, watermark, logo, gradient, noise
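Because the four slots are just comma-joined phrases, they map naturally onto a tiny helper. This is an illustrative sketch: build_prompt is not part of any tool mentioned here. Fed the slots below, it reproduces the exact prompt and negative prompt quoted above.

```python
def build_prompt(subject, style=(), context=(), negatives=()):
    """Join the four prompt slots into (positive, negative) comma-separated strings."""
    positive = ", ".join([subject, *style, *context])
    negative = ", ".join(negatives)
    return positive, negative


positive, negative = build_prompt(
    subject="a clean isometric illustration of three small desktop computers "
            "connected by ethernet cables forming a cluster",
    style=["flat vector style", "muted blue and slate palette"],
    context=["white background", "generous negative space", "technical", "minimal"],
    negatives=["photo", "realistic", "cluttered", "busy background",
               "text", "watermark", "logo", "gradient", "noise"],
)
```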

A few hard-won rules:

Always negative-prompt text, watermark, logo. Diffusion models love to scribble fake garbled text into images. Banning it up front saves a re-roll.
“Generous negative space” earns its keep because a header with breathing room looks intentional, and it leaves room for the title overlay.
Name a palette. “Muted blue and slate” gives the whole blog a consistent feel across posts without me building an actual brand guide.
Don’t ask for legible labels. None of these checkpoints reliably writes “k3s control plane” cleanly inside the picture. If a diagram needs real labels, I generate the scene and add text in the layout layer, or just hand-draw an SVG. The category cover art on this blog is hand-authored SVG for exactly that reason; generation is for photos and illustrations, not labeled schematics.
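The first rule is easy to enforce mechanically. The hypothetical helper below appends text, watermark, and logo to any negative prompt that lacks them and leaves the other terms untouched.

```python
ALWAYS_NEGATIVE = ("text", "watermark", "logo")


def with_required_negatives(negative_prompt):
    """Append any always-banned term that isn't already in the negative prompt."""
    terms = [t.strip() for t in negative_prompt.split(",") if t.strip()]
    terms += [t for t in ALWAYS_NEGATIVE if t not in terms]
    return ", ".join(terms)
```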

From the GPU to the page

Generation is half the job. The image still has to get onto the CDN. The path:

# 1. Generate (via the agent / MCP) → lands in ~/comfyui/output/mcp_*.png
# 2. Eyeball it
#    get_recent_images(limit=5)
# 3. Push it to S3 + CloudFront
python content-gen/scripts/upload_media.py mcp_00042_.png --prefix media/photos

upload_media.py pushes the file to the content bucket under the media/photos/ prefix with a one-year immutable cache header, and CloudFront serves it from https://blog.zolty.systems/media/photos/.... The script detects mime types and converts where needed, so an iPhone HEIC dropped in the same folder comes out the other side as a browser-friendly JPEG.
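upload_media.py itself isn't shown here, but the core of an upload like this is small with boto3. The sketch below rests on several assumptions. The bucket name is a placeholder. The CloudFront distribution is assumed to serve the bucket root at https://blog.zolty.systems/. The HEIC-to-JPEG conversion is left out. upload_plan is pure, so it can be checked without AWS credentials.

```python
import mimetypes
from pathlib import Path

CDN_BASE = "https://blog.zolty.systems"
# One year; "immutable" tells browsers not to revalidate while the copy is fresh.
CACHE_CONTROL = "public, max-age=31536000, immutable"


def upload_plan(filename, prefix="media/photos"):
    """Work out the S3 key, upload headers, and public URL for one file."""
    name = Path(filename).name
    key = f"{prefix.strip('/')}/{name}"
    content_type, _ = mimetypes.guess_type(name)
    extra_args = {
        "ContentType": content_type or "application/octet-stream",
        "CacheControl": CACHE_CONTROL,
    }
    return key, extra_args, f"{CDN_BASE}/{key}"


def upload(filename, bucket, prefix="media/photos"):
    """Upload with boto3 and return the CDN URL. `bucket` is a placeholder name."""
    import boto3  # imported lazily so upload_plan() works without boto3 installed

    key, extra_args, url = upload_plan(filename, prefix)
    boto3.client("s3").upload_file(str(filename), bucket, key, ExtraArgs=extra_args)
    return url
```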

Then in the post it’s a one-liner:


Generated locally, $0.

For the post’s cover image specifically, I co-locate the file in the page bundle directory next to index.md and reference it relatively — Hugo’s page-bundle convention. CDN-hosted media goes through upload_media.py; bundle images ride along in git. Either way, no manual S3 console clicking.
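The two ways of referencing an image differ only in the link target. The helper below is hypothetical, as are the example filenames. It emits a Markdown image reference: a relative path for page-bundle files and an absolute CDN URL for uploaded media.

```python
from pathlib import PurePosixPath

CDN_BASE = "https://blog.zolty.systems"


def image_ref(filename, alt, caption="", bundle=False, prefix="media/photos"):
    """Markdown image reference: relative for page-bundle files, CDN URL otherwise."""
    name = PurePosixPath(filename).name
    src = name if bundle else f"{CDN_BASE}/{prefix.strip('/')}/{name}"
    title = f' "{caption}"' if caption else ""
    return f"![{alt}]({src}{title})"
```

For example, `image_ref("mcp_00042_.png", "Server rack in a closet")` points at the CDN, while `image_ref("cover.png", "Cover", bundle=True)` resolves next to index.md.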

The honest caveats

I’m not going to pretend this is magic.

SDXL cannot do text or precise diagrams. If the image needs accurate labels, real UI, or a faithful schematic, generation is the wrong tool. I draw those by hand or in code.
Hands, fine print, and tiny repeated detail still break. I crop around problems more often than I’d like.
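Reproducibility is seed-dependent. seed=-1 means “pick a random seed”, so if I want the exact image back I have to capture the seed generate_image actually used. I started logging seeds for anything I might want to regenerate. One catch: the same seed only reproduces the image at the same size and settings, because a different resolution changes the shape of the starting noise. A bigger copy is better made by upscaling the original.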
MPS has sharp edges. Apple’s Metal backend occasionally hits an op it doesn’t support. With PYTORCH_ENABLE_MPS_FALLBACK=1 set, PyTorch runs that op on the CPU instead of crashing outright, which tanks throughput for that step; a “slow” generation is usually that.
Curation is the real cost. Free generation means I generate four and keep one. The GPU time is free; my taste is the bottleneck.
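ComfyUI picks its device on its own, so in practice the fallback variable goes in whatever launches it. If you write your own Python script that uses PyTorch on the same Mac, the minimal sketch below sets the variable before torch is imported (PyTorch reads it when the library loads) and then prefers MPS. pick_device is a hypothetical helper. It falls back to "cpu" when torch isn't installed.

```python
import os

# Must be set before torch is imported; setdefault respects an explicit override.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


def pick_device():
    """Prefer Apple's Metal (MPS) backend, then CUDA, then CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)  # absent on very old PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"
```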

Lessons

The model picker is 80% of quality. Match the checkpoint to the job before you touch the prompt.
A single well-defaulted MCP tool beats a node graph. I describe the image; the agent handles ComfyUI. I’ve opened the actual ComfyUI UI maybe twice this month.
Negative-prompt the garbage text every time. It’s the cheapest quality win available.
Log your seeds, along with the size and settings. Future-you will want that exact header back, whether to tweak it or to upscale it 2× for a retina cover.
Generation is for art, not schematics. Photos and illustrations: yes. Labeled diagrams: hand-draw the SVG.

No homelab GPU? The exact same pipeline works against a cloud GPU — a DigitalOcean GPU Droplet running ComfyUI plus their Spaces object storage in place of S3 gets you prompt-to-CDN without owning the hardware. The only piece that’s truly local is the GPU; everything downstream is portable.
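Spaces speaks the S3 API, so moving the upload step over is mostly a matter of pointing boto3 at a different endpoint. This is a minimal sketch. The provider names and the s3_client_kwargs helper are hypothetical, and region slugs such as nyc3 depend on where your Space lives.

```python
def s3_client_kwargs(provider="aws", region="us-east-1"):
    """Keyword arguments for boto3.client("s3", **kwargs) on AWS or DigitalOcean Spaces."""
    if provider == "aws":
        return {"region_name": region}
    if provider == "spaces":
        # Spaces exposes an S3-compatible endpoint per region.
        return {"region_name": region,
                "endpoint_url": f"https://{region}.digitaloceanspaces.com"}
    raise ValueError(f"unknown provider: {provider!r}")
```

You would then call `boto3.client("s3", **s3_client_kwargs("spaces", "nyc3"))`. The Spaces access keys go through the usual AWS credential environment variables or profile.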
