From prompt to published: how every image on this blog comes out of a local ComfyUI
Every cover photo and inline illustration on this blog is generated locally by asking an agent to call a ComfyUI MCP tool. Here's the model-per-job table, the prompts that actually work, and the prompt-to-CDN pipeline.
I don’t pay for stock photos and I don’t open Canva. Every raster image on this blog is generated on a Mac Studio sitting three feet from me, by asking Claude Code to call a generate_image MCP tool that wraps ComfyUI. The pipeline is: prompt → ComfyUI (MPS) → PNG on disk → upload_media.py → S3 → CloudFront → a Markdown reference in the post. It costs $0 per image, takes ~15 seconds, and the whole thing is repeatable because the prompt and settings live in the commit history.
Technical blogs have an image problem. You either:
Ship walls of text with no visuals (high bounce rate, looks lazy), or
Bolt on generic stock photos of “businessman pointing at server” (worse — it actively signals low-effort content), or
Pay a per-image cloud generation API and slowly bleed money, or
Spend 20 minutes per post in a design tool you don’t enjoy.
None of those appealed to me. I already had a GPU running 24/7 for local LLM inference. ComfyUI runs on the same box for the cost of the electrons. So the calculus was simple: if image generation is free, local, and scriptable, every post can have a custom header and inline diagrams without the marginal cost ever entering my head.
The key word is scriptable. I am not clicking around the ComfyUI node graph for each image. I describe what I want in plain language, an agent submits the workflow, and a file lands on disk. The node graph is an implementation detail I almost never touch.
The infrastructure post covers the MCP server in detail (and I go deep on writing your own MCP servers elsewhere), but here’s the only part that matters day to day: a single generate_image tool with sane defaults.
That’s it. From inside any Claude Code session I can say “generate a header image of a small server rack in a closet, warm light, photoreal” and the agent fills in the arguments, submits the SDXL workflow to ComfyUI over 127.0.0.1:8188, polls until it’s done, and hands me back a path like ~/comfyui/output/mcp_00042_.png. Because it’s an MCP tool, the agent picks the checkpoint and size based on what I asked for — I don’t have to remember the filenames.
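If you’re curious what “submits the workflow and polls” amounts to, here’s a minimal Python sketch against ComfyUI’s HTTP API. It is not the MCP server’s actual code. It assumes you already have the workflow as the API-format JSON that ComfyUI can export, and it uses the stock POST /prompt and GET /history/{prompt_id} endpoints. The helper names are made up for illustration.

```python
import json
import time
import urllib.request
import uuid
from typing import Optional

COMFY_URL = "http://127.0.0.1:8188"  # local ComfyUI, as described in the post


def build_prompt_request(workflow: dict, client_id: Optional[str] = None) -> bytes:
    """Wrap an API-format workflow graph in the JSON body POST /prompt expects."""
    body = {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}
    return json.dumps(body).encode("utf-8")


def extract_images(history: dict, prompt_id: str) -> list[str]:
    """Collect output image paths (relative to ComfyUI's output dir) for one prompt.

    Returns [] while the job is still queued or running (no history entry yet).
    """
    entry = history.get(prompt_id)
    if not entry:
        return []
    files = []
    for node_output in entry.get("outputs", {}).values():
        for image in node_output.get("images", []):
            sub = image.get("subfolder", "")
            files.append(f"{sub}/{image['filename']}" if sub else image["filename"])
    return files


def submit_and_wait(workflow: dict, timeout_s: float = 300.0, poll_s: float = 1.0) -> list[str]:
    """Queue a workflow, then poll /history until its images appear or we time out."""
    req = urllib.request.Request(
        f"{COMFY_URL}/prompt",
        data=build_prompt_request(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        prompt_id = json.load(resp)["prompt_id"]

    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{COMFY_URL}/history/{prompt_id}") as resp:
            files = extract_images(json.load(resp), prompt_id)
        if files:
            return files
        time.sleep(poll_s)
    raise TimeoutError(f"ComfyUI prompt {prompt_id} produced no images within {timeout_s}s")
```

Polling /history keeps the sketch dependency-free; ComfyUI also pushes progress over a websocket, which a real tool might prefer.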
Four companion tools round it out: list_models (what checkpoints and LoRAs are loaded), get_queue_status (is anything already running), get_system_stats (which device, how much memory is free), and get_recent_images (show me what just came out, with file sizes). That last one is what I use to eyeball results before deciding whether to upload or re-roll.
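get_recent_images is simple enough to sketch: list the newest files in the output folder with their sizes. This assumes outputs land in a single directory with the mcp_ filename prefix seen above; RecentImage and recent_images are hypothetical names, not the tool’s real signature.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RecentImage:
    path: Path
    size_kb: float


def recent_images(output_dir: Path, limit: int = 5, pattern: str = "mcp_*.png") -> list[RecentImage]:
    """Newest-first listing of generated files, roughly what a get_recent_images tool returns."""
    files = sorted(output_dir.glob(pattern), key=lambda p: p.stat().st_mtime, reverse=True)
    return [RecentImage(p, round(p.stat().st_size / 1024, 1)) for p in files[:limit]]
```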
This is the part nobody tells you: the model matters more than the prompt. I keep three checkpoints hot on the SSD, each good at a different thing. Reaching for the wrong one is the single most common reason a generation comes out looking like AI slop.
| Job | Checkpoint | Steps | CFG | Size (px) |
|---|---|---|---|---|
| Photoreal blog header | RealVisXL_V5.0 | 30 | 7.0 | 1200×630 |
| Technical diagram / illustration | sd3.5_large | 25 | 7.5 | 1024×768 |
| Wiki / concept illustration | sd3.5_large | 25 | 7.0 | 1024×1024 |
| Portraits, cinematic scenes | Juggernaut-XI | 30 | 6.5 | 1024×1024 |
| Quick throwaway draft | sd3.5_large | 15 | 7.0 | 512×512 |
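To make the table something an agent or a script can apply mechanically, a preset lookup like the one below is one option. It’s a sketch: the job keys are labels I made up, and the checkpoint strings are copied from the table, so the real filenames on disk may carry a .safetensors suffix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    checkpoint: str
    steps: int
    cfg: float
    width: int
    height: int


# Values copied from the model-per-job table; the dictionary keys are hypothetical labels.
PRESETS = {
    "header": Preset("RealVisXL_V5.0", 30, 7.0, 1200, 630),
    "diagram": Preset("sd3.5_large", 25, 7.5, 1024, 768),
    "concept": Preset("sd3.5_large", 25, 7.0, 1024, 1024),
    "portrait": Preset("Juggernaut-XI", 30, 6.5, 1024, 1024),
    "draft": Preset("sd3.5_large", 15, 7.0, 512, 512),
}


def pick_preset(job: str) -> Preset:
    """Match the checkpoint to the job before touching the prompt."""
    try:
        return PRESETS[job]
    except KeyError:
        raise ValueError(f"unknown job {job!r}; expected one of {sorted(PRESETS)}") from None
```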
The reasoning:
RealVisXL V5.0 is my default for anything that should look like a photograph — hardware on a bench, a rack, a workspace. It does materials and lighting convincingly.
SD 3.5 Large follows prompts the most literally, which is exactly what you want for “clean technical illustration, white background, minimal” diagram-style art. It’s a bigger model (15 GB) but it understands compositional instructions the SDXL checkpoints fumble.
Juggernaut XI is the one I reach for when there’s a person or a cinematic mood involved. Overkill for a network diagram, ideal for a “lone operator at 2am” vibe.
1200×630 is the Open Graph aspect ratio, so headers slot straight into social cards without cropping. Everything internal to a post I generate square or 4:3 and let Hugo handle the sizing.
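One sizing caveat, hedged because it depends on how the workflow builds its empty latent: SD-family VAEs work at 1/8 of the pixel resolution, and 630 is not a multiple of 8, so a 1200×630 request can come back a few pixels short (ComfyUI’s stock EmptyLatentImage node uses integer division). If exact OG dimensions matter, one option is to generate slightly larger and center-crop. The helpers below are hypothetical:

```python
def snap_up(value: int, multiple: int = 8) -> int:
    """Round up to the next multiple (ceiling division, then scale back)."""
    return -(-value // multiple) * multiple


def generation_size(target_w: int, target_h: int, multiple: int = 8) -> tuple[int, int]:
    """Smallest latent-friendly size that still covers the target."""
    return snap_up(target_w, multiple), snap_up(target_h, multiple)


def center_crop_box(gen_w: int, gen_h: int, target_w: int, target_h: int) -> tuple[int, int, int, int]:
    """(left, upper, right, lower) box that trims the generated image back to the target."""
    left = (gen_w - target_w) // 2
    top = (gen_h - target_h) // 2
    return left, top, left + target_w, top + target_h
```

For 1200×630 this generates at 1200×632 and trims one pixel row from the top and one from the bottom; the box uses the (left, upper, right, lower) order that Pillow’s Image.crop expects.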
Here’s the kind of prompt that works for an inline illustration:

```text
a clean isometric illustration of three small desktop computers connected by ethernet cables forming a cluster, flat vector style, muted blue and slate palette, white background, generous negative space, technical, minimal
```
Always negative-prompt text, watermark, logo. Diffusion models love to scribble fake garbled text into images. Banning it up front saves a re-roll.
“Generous negative space” earns its keep because a header with breathing room looks intentional, and it leaves room for the title overlay.
Name a palette. “Muted blue and slate” gives the whole blog a consistent feel across posts without me building an actual brand guide.
Don’t ask for legible labels. None of these checkpoints can reliably write “k3s control plane” inside the picture. If a diagram needs real labels, I generate the scene and add text in the layout layer, or just hand-draw an SVG. The category cover art on this blog is hand-authored SVG for exactly that reason; generation is for photos and illustrations, not labeled schematics.
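These rules are easy to bake into a tiny helper so every prompt gets the house palette, the negative space, and the default negative prompt. A sketch; build_prompts and the constants are hypothetical names, and the wording is lifted from the example prompt:

```python
DEFAULT_NEGATIVE = "text, watermark, logo"   # always ban garbled fake text
HOUSE_PALETTE = "muted blue and slate palette"


def build_prompts(subject: str, style: str, extra_negative: str = "") -> tuple[str, str]:
    """Return (positive, negative) prompts with the house conventions applied."""
    positive = ", ".join([subject, style, HOUSE_PALETTE, "generous negative space"])
    negative = DEFAULT_NEGATIVE + (f", {extra_negative}" if extra_negative else "")
    return positive, negative
```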
Generation is half the job. The image still has to get onto the CDN. The path:
```bash
# 1. Generate (via the agent / MCP) → lands in ~/comfyui/output/mcp_*.png
# 2. Eyeball it
#    get_recent_images(limit=5)
# 3. Push it to S3 + CloudFront
python content-gen/scripts/upload_media.py mcp_00042_.png --prefix media/photos
```
upload_media.py pushes the file to the content bucket under the media/photos/ prefix with a one-year immutable cache header, and CloudFront serves it from https://blog.zolty.systems/media/photos/.... The script detects MIME types and converts formats where needed, so an iPhone HEIC dropped in the same folder comes out the other side as a browser-friendly JPEG.
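The core of an upload like this fits in a few lines of boto3. This is a sketch, not the real upload_media.py: it assumes boto3 is installed and AWS credentials are configured, the bucket name is yours to supply, and HEIC conversion is left out. It also returns a Markdown image reference, with the file stem as placeholder alt text.

```python
import mimetypes
from pathlib import Path

CDN_BASE = "https://blog.zolty.systems"
ONE_YEAR = 365 * 24 * 60 * 60  # 31536000 seconds


def upload_args(local: Path, prefix: str = "media/photos") -> tuple[str, dict]:
    """S3 key plus ExtraArgs for a long-lived, immutable CDN object."""
    content_type = mimetypes.guess_type(local.name)[0] or "application/octet-stream"
    key = f"{prefix.strip('/')}/{local.name}"
    extra = {
        "ContentType": content_type,
        "CacheControl": f"public, max-age={ONE_YEAR}, immutable",
    }
    return key, extra


def upload(local: Path, bucket: str, prefix: str = "media/photos") -> str:
    """Upload one file and return a Markdown image reference for the post."""
    import boto3  # imported lazily so upload_args works without boto3 installed

    key, extra = upload_args(local, prefix)
    boto3.client("s3").upload_file(str(local), bucket, key, ExtraArgs=extra)
    return f"![{local.stem}]({CDN_BASE}/{key})"
```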
Then in the post it’s a one-liner:
Generated locally, $0.
For the post’s cover image specifically, I co-locate the file in the page bundle directory next to index.md and reference it relatively — Hugo’s page-bundle convention. CDN-hosted media goes through upload_media.py; bundle images ride along in git. Either way, no manual S3 console clicking.
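Co-locating a cover in the bundle is just a copy. A minimal sketch, assuming a leaf bundle (a directory containing index.md); the cover.png name and the helper are hypothetical, and how front matter points at the file depends on your theme:

```python
import shutil
from pathlib import Path


def add_cover_to_bundle(image: Path, bundle_dir: Path, name: str = "cover.png") -> str:
    """Copy a generated image next to the bundle's index.md and return its relative reference."""
    if not (bundle_dir / "index.md").is_file():
        raise FileNotFoundError(f"{bundle_dir} is not a leaf bundle (no index.md)")
    shutil.copy2(image, bundle_dir / name)
    return name  # relative to index.md, per Hugo's page-bundle convention
```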
None of these models does text or precise diagrams well. If the image needs accurate labels, real UI, or a faithful schematic, generation is the wrong tool. I draw those by hand or in code.
Hands, fine print, and tiny repeated detail still break. I crop around problems more often than I’d like.
Reproducibility is seed-dependent. A seed of -1 means random; if I want the exact image back, I have to capture the seed generate_image actually used. I started logging seeds for anything I might want to regenerate at a different size.
MPS has sharp edges. Apple’s Metal backend occasionally hits an op it doesn’t support. With PYTORCH_ENABLE_MPS_FALLBACK=1 set, PyTorch runs that op on the CPU instead of crashing outright, but the detour tanks throughput; when a generation is unexpectedly slow, that fallback is usually why.
Curation is the real cost. Free generation means I generate four and keep one. The GPU time is free; my taste is the bottleneck.
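On the MPS point, the fallback variable is easiest to set in the environment ComfyUI is launched with, so it’s in place before PyTorch loads. A sketch, assuming a stock ComfyUI checkout (main.py with its --listen and --port flags) and that this script runs under the same Python environment ComfyUI uses:

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Mapping, Optional


def comfyui_env(base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy of the environment with PyTorch's MPS-to-CPU fallback switched on."""
    env = dict(os.environ if base is None else base)
    env["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
    return env


def launch_comfyui(comfy_dir: Path, port: int = 8188) -> subprocess.Popen:
    """Start ComfyUI bound to localhost with the fallback variable in its environment."""
    # Assumes sys.executable is the interpreter ComfyUI's dependencies are installed into.
    cmd = [sys.executable, "main.py", "--listen", "127.0.0.1", "--port", str(port)]
    return subprocess.Popen(cmd, cwd=comfy_dir, env=comfyui_env())
```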
The model picker is 80% of quality. Match the checkpoint to the job before you touch the prompt.
A single well-defaulted MCP tool beats a node graph. I describe the image; the agent handles ComfyUI. I’ve opened the actual ComfyUI UI maybe twice this month.
Negative-prompt the garbage text every time. It’s the cheapest quality win available.
Log your seeds. Future-you will want to regenerate that header at 2× for a retina cover.
Generation is for art, not schematics. Photos and illustrations: yes. Labeled diagrams: hand-draw the SVG.
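Seed logging can be a JSON-lines file. Because a seed only reproduces an image together with the same checkpoint, prompt, sampler settings, and size, the record below keeps all of them. A sketch: the -1 sentinel comes from generate_image, but resolve_seed, log_generation, and the 32-bit range are my own assumptions.

```python
import json
import random
import time
from pathlib import Path

MAX_SEED = 2**32 - 1  # conservative range chosen for this sketch


def resolve_seed(seed: int) -> int:
    """Turn the 'random' sentinel (-1) into a concrete seed so it can be logged."""
    return random.randint(0, MAX_SEED) if seed == -1 else seed


def log_generation(log_path: Path, image: str, prompt: str, checkpoint: str,
                   seed: int, width: int, height: int) -> dict:
    """Append one JSON line per image with everything needed to re-run it."""
    record = {"ts": int(time.time()), "image": image, "prompt": prompt,
              "checkpoint": checkpoint, "seed": seed, "width": width, "height": height}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```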
No homelab GPU? The exact same pipeline works against a cloud GPU — a DigitalOcean GPU Droplet running ComfyUI plus their Spaces object storage in place of S3 gets you prompt-to-CDN without owning the hardware. The only piece that’s truly local is the GPU; everything downstream is portable.
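Because Spaces speaks the S3 API, the upload side mostly changes by pointing boto3 at a different endpoint. A sketch; the region default is just an example, credentials are read the usual boto3 way, and the CDN URL assumes you’ve enabled the Space’s built-in CDN:

```python
def spaces_client_kwargs(region: str = "nyc3") -> dict[str, str]:
    """boto3 S3 client settings pointed at DigitalOcean Spaces instead of AWS."""
    return {"region_name": region, "endpoint_url": f"https://{region}.digitaloceanspaces.com"}


def spaces_cdn_url(bucket: str, region: str, key: str) -> str:
    """Public URL through the Space's CDN endpoint (CDN must be enabled on the Space)."""
    return f"https://{bucket}.{region}.cdn.digitaloceanspaces.com/{key.lstrip('/')}"


def make_spaces_client(region: str = "nyc3"):
    import boto3  # lazy import; Spaces keys go where AWS credentials normally would
    return boto3.client("s3", **spaces_client_kwargs(region))
```

From there, boto3’s upload_file works the same way it does against S3.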
Affiliate Disclosure: Some links on this site are affiliate links (Amazon Associates, DigitalOcean referral). As an Amazon Associate, I earn from qualifying purchases. This does not affect the price you pay or my editorial independence — I only recommend products and services I personally use and trust.