Writing MCP servers for your homelab: five tools, 200 lines, and your agents get hands

TL;DR

Model Context Protocol (MCP) is an open protocol that lets Claude and other LLM agents call tools with typed signatures and structured responses. Any HTTP API running on your homelab (ComfyUI, a wiki, a dashboard, a custom service) can become a set of agent-callable tools by wrapping it in a FastMCP server. A typical server takes 150–250 lines of Python, exposes 3–5 tools via @mcp.tool() decorators, and runs as a stdio process. The pattern scales from single-purpose (image generation) to multi-tool (queue status, model listing, system stats) without much added complexity. This post shows the anatomy by dissecting the ComfyUI MCP server: how to build workflows, poll for completion, parse results, and return structured JSON that agents can actually use.

Why local MCP servers matter

Before MCP, if you wanted an agent to do something on your local network — generate an image, fetch a wiki page, update an inventory database — you had two bad paths:

Write Python and pray. The agent writes and executes Python code that imports your library or hits the API, which means managing dependencies, sandboxing, error handling, and the agent getting it wrong.
Expose via a public API. Terrible for a homelab. HTTPS tunnels, auth, rate limiting — all friction.

MCP sidesteps both. You define a set of well-typed tool functions, such as generate_image(prompt: str, checkpoint: str = "...") -> str, and the agent calls them like any other tool. The server advertises a JSON schema for each tool over the protocol, so the agent sees typed parameters and knows which are required and which are optional. Execution happens on your machine, in a process you control. No code generation, no untrusted remote APIs.
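To make "typed parameters" concrete, here is a rough sketch of how an input schema can be derived from a Python signature. It uses only the standard library and illustrates the idea; it is not FastMCP's actual implementation, and the helper name sketch_input_schema and its simplified type mapping are my own assumptions.

```python
import inspect
import typing

# Simplified mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_input_schema(func) -> dict:
    """Build a JSON-Schema-like dict from a function's type hints (illustrative only)."""
    hints = typing.get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        prop = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the agent must supply it
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {"type": "object", "properties": properties, "required": required}


def generate_image(prompt: str, checkpoint: str = "RealVisXL_V5.0_fp16.safetensors") -> str:
    return ""


schema = sketch_input_schema(generate_image)
```

Only prompt ends up in the required list; checkpoint carries its default, which is how an agent can tell what it may leave out.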

The real win is composability. An agent running locally in Claude Code can call ComfyUI, wiki, and dashboard tools in the same prompt. They all speak the same protocol. Add a tool in 20 lines of Python and the agent sees it once the server restarts, with no config changes.

Anatomy of an MCP server

Here’s the skeleton:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my-service", instructions="Brief description of what I do.")

@mcp.tool()
def my_tool(param1: str, param2: int = 42) -> str:
    """Docstring becomes the tool description that agents see."""
    return "result as a string"

if __name__ == "__main__":
    mcp.run()

That’s it for a single-tool server. To run it directly:

python server.py

Or in .mcp.json:

{
  "mcpServers": {
    "my-service": {
      "type": "stdio",
      "command": "python3",
      "args": ["server.py"],
      "cwd": "/path/to/repo"
    }
  }
}

Claude Code reads .mcp.json, launches the stdio process, and the server can now be invoked. When the agent calls my_tool, the client sends the arguments as JSON over the process’s stdin, the server validates them, calls the Python function, and ships the result back over stdout.
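Under the hood, MCP messages are JSON-RPC 2.0, and a tool invocation is a tools/call request carrying the tool name and its arguments. The sketch below hand-rolls that round trip to show the message shapes. FastMCP does all of this for you; handle_tools_call and the TOOLS registry are hypothetical stand-ins, not SDK names.

```python
import json


def my_tool(param1: str, param2: int = 42) -> str:
    return f"{param1}:{param2}"


# Hypothetical registry standing in for what @mcp.tool() builds internally.
TOOLS = {"my_tool": my_tool}


def handle_tools_call(raw_request: str) -> str:
    """Decode a JSON-RPC tools/call request, run the tool, and encode the reply."""
    request = json.loads(raw_request)
    params = request["params"]
    text = TOOLS[params["name"]](**params["arguments"])
    reply = {
        "jsonrpc": "2.0",
        "id": request["id"],  # the reply echoes the request id
        "result": {"content": [{"type": "text", "text": text}], "isError": False},
    }
    return json.dumps(reply)


request = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "my_tool", "arguments": {"param1": "hello"}},
})
reply = json.loads(handle_tools_call(request))
```

The string a tool returns travels back as a text content item, which is why the tools in this post return JSON-encoded strings.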

Worked example: the ComfyUI MCP server

ComfyUI has an HTTP API (/prompt, /history, /queue, etc.). To make it agent-callable, I wrapped it in a FastMCP server with five tools. (If you’re not yet familiar with ComfyUI’s setup on the Mac Studio, a separate post covers running it natively with MPS GPU access.) Here’s the real implementation:

Setup: environment and helpers

import json
import os
import time
import urllib.request
import urllib.error
import urllib.parse
import uuid
from pathlib import Path
from mcp.server.fastmcp import FastMCP

COMFYUI_URL = os.environ.get("COMFYUI_URL", "http://127.0.0.1:8188")
OUTPUT_DIR = os.environ.get("COMFYUI_OUTPUT_DIR", os.path.expanduser("~/comfyui/output"))

mcp = FastMCP("comfyui", instructions="Generate images using ComfyUI on the Mac Studio M3 Ultra (MPS GPU).")

def _api_get(path: str) -> dict:
    req = urllib.request.Request(f"{COMFYUI_URL}{path}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())

def _api_post(path: str, data: dict) -> dict:
    body = json.dumps(data).encode()
    req = urllib.request.Request(f"{COMFYUI_URL}{path}", data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())

The helpers _api_get and _api_post handle the boilerplate. Centralizing the HTTP calls here means that if ComfyUI’s API changes, I only update one place.
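As written, the helpers let urllib exceptions propagate to the caller. One way to turn those into structured errors is a small decorator applied to each tool function. This is a sketch under my own assumptions; json_errors is a hypothetical name and the set of caught exceptions is a starting point, not an exhaustive list.

```python
import functools
import json
import urllib.error


def json_errors(func):
    """Turn network and decoding failures into a structured JSON error string."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except urllib.error.HTTPError as exc:  # must precede URLError (its parent class)
            return json.dumps({"status": "error", "message": f"HTTP {exc.code}: {exc.reason}"})
        except (urllib.error.URLError, TimeoutError) as exc:
            return json.dumps({"status": "error", "message": f"Service unreachable: {exc}"})
        except (KeyError, json.JSONDecodeError) as exc:
            return json.dumps({"status": "error", "message": f"Unexpected response shape: {exc!r}"})
    return wrapper


@json_errors
def flaky_tool() -> str:
    """Stand-in tool that fails the way an unreachable ComfyUI would."""
    raise urllib.error.URLError("connection refused")


result = json.loads(flaky_tool())
```

The agent then gets a status and a message it can act on instead of an opaque failure.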

Tool 1: generate_image — the core workflow

@mcp.tool()
def generate_image(
    prompt: str,
    negative_prompt: str = "",
    checkpoint: str = "RealVisXL_V5.0_fp16.safetensors",
    width: int = 1024,
    height: int = 1024,
    steps: int = 25,
    cfg: float = 7.0,
    seed: int = -1,
) -> str:
    """Generate an image from a text prompt using Stable Diffusion on Mac Studio MPS GPU.
    
    Args:
        prompt: Text description of the image to generate
        negative_prompt: Things to avoid in the image (e.g. "blurry, low quality")
        checkpoint: Model checkpoint to use. Available: RealVisXL_V5.0_fp16.safetensors, sd3.5_large.safetensors, Juggernaut-XI-byRunDiffusion.safetensors
        width: Image width in pixels (default 1024)
        height: Image height in pixels (default 1024)
        steps: Number of diffusion steps (more = higher quality, slower). Default 25.
        cfg: Classifier-free guidance scale (higher = closer to prompt). Default 7.0.
        seed: Random seed (-1 for random)
    """
    if seed == -1:
        seed = int(time.time() * 1000) % (2**32)
    
    # Build the workflow JSON — this is a ComfyUI-specific detail.
    # It maps the user's request to the right node graph.
    workflow = _build_sdxl_workflow(prompt, negative_prompt, checkpoint, width, height, steps, cfg, seed)
    client_id = str(uuid.uuid4())
    
    # POST the workflow to /prompt; ComfyUI returns a prompt_id
    result = _api_post("/prompt", {"prompt": workflow, "client_id": client_id})
    prompt_id = result["prompt_id"]
    
    # Poll for completion with a 5-minute timeout
    for attempt in range(300):
        time.sleep(1)
        history = _api_get(f"/history/{prompt_id}")
        if prompt_id in history:
            outputs = history[prompt_id].get("outputs", {})
            # Iterate over node outputs until we find the SaveImage node
            for node_id, node_output in outputs.items():
                if "images" in node_output:
                    images = node_output["images"]
                    paths = []
                    # Download each generated image to local disk
                    for img in images:
                        filename = img["filename"]
                        subfolder = img.get("subfolder", "")
                        # URL-encode the query so spaces or '&' in names survive
                        query = {"filename": filename, "type": "output"}
                        if subfolder:
                            query["subfolder"] = subfolder
                        view_url = f"{COMFYUI_URL}/view?{urllib.parse.urlencode(query)}"
                        view_req = urllib.request.Request(view_url)
                        output_path = Path(OUTPUT_DIR)
                        output_path.mkdir(parents=True, exist_ok=True)
                        local_path = output_path / filename
                        with urllib.request.urlopen(view_req, timeout=30) as resp:
                            local_path.write_bytes(resp.read())
                        paths.append(str(local_path))
                    
                    # Return structured JSON so agents can parse it
                    return json.dumps({
                        "status": "success",
                        "prompt_id": prompt_id,
                        "seed": seed,
                        "checkpoint": checkpoint,
                        "images": paths,
                        "message": f"Generated {len(paths)} image(s) at {', '.join(paths)}",
                    })
            
            # Check for errors in the history
            status = history[prompt_id].get("status", {})
            if status.get("status_str") == "error":
                return json.dumps({"status": "error", "message": status.get("messages", "Unknown error")})
    
    return json.dumps({"status": "timeout", "prompt_id": prompt_id, "message": "Generation timed out after 5 minutes"})
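The listing calls _build_sdxl_workflow, which isn't shown in this post. The sketch below is one plausible shape for it: a minimal text-to-image graph in ComfyUI's API format using stock nodes. The node ids, the euler sampler, the normal scheduler, and the "mcp" filename prefix are my assumptions, and other model families may need different nodes, so treat it as a starting point and not as the actual helper.

```python
import json


def _build_sdxl_workflow(prompt, negative_prompt, checkpoint, width, height, steps, cfg, seed):
    """Return a minimal text-to-image graph in ComfyUI's API format (a sketch)."""
    # Each key is a node id; a link is written as [source_node_id, output_index].
    return {
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": checkpoint}},
        "2": {"class_type": "CLIPTextEncode", "inputs": {"text": prompt, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode", "inputs": {"text": negative_prompt, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                         "latent_image": ["4", 0], "seed": seed, "steps": steps, "cfg": cfg,
                         "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
        "6": {"class_type": "VAEDecode", "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage", "inputs": {"images": ["6", 0], "filename_prefix": "mcp"}},
    }


workflow = _build_sdxl_workflow(
    "a lighthouse at dusk", "blurry", "RealVisXL_V5.0_fp16.safetensors", 1024, 1024, 25, 7.0, 42
)
payload = json.dumps({"prompt": workflow})  # the body that would be POSTed to /prompt
```

The SaveImage node at the end is what produces the "images" entry that the polling loop looks for.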

Key design:

Sensible defaults. An agent doesn’t need to remember checkpoint filenames: it supplies only the prompt and the defaults cover everything else. It can still override any parameter when it needs to.
Synchronous polling. ComfyUI queues jobs and returns a prompt_id. We poll /history/{prompt_id} until the result is ready. This blocks the MCP tool, which is fine for an agent-driven workflow (agents expect synchronous results).
Structured JSON. The return value is JSON with status, image paths, and metadata the agent can parse. Not a blob of text.
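One caveat on the polling loop: it counts 300 iterations, and each iteration costs a one-second sleep plus the time of the HTTP request, so the real wall-clock limit can drift past five minutes. A deadline-based variant is sketched below; poll_until and its parameters are hypothetical names, and the fetch callable stands in for the /history lookup.

```python
import time


def poll_until(fetch, timeout_s: float = 300.0, interval_s: float = 1.0):
    """Call fetch() until it returns a non-None value or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        if time.monotonic() + interval_s > deadline:
            return None  # the next attempt would land past the deadline
        time.sleep(interval_s)


# Usage with a stand-in fetcher that becomes ready on the third call.
calls = {"n": 0}


def fake_fetch():
    calls["n"] += 1
    return {"outputs": {}} if calls["n"] >= 3 else None


done = poll_until(fake_fetch, timeout_s=1.0, interval_s=0.01)
timed_out = poll_until(lambda: None, timeout_s=0.05, interval_s=0.01)
```

Using time.monotonic() keeps the limit honest even if the system clock is adjusted mid-run.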

Tool 2: list_models — introspect what’s loaded

@mcp.tool()
def list_models() -> str:
    """List available ComfyUI model checkpoints and LoRAs."""
    info = _api_get("/object_info/CheckpointLoaderSimple")
    checkpoints = info["CheckpointLoaderSimple"]["input"]["required"]["ckpt_name"][0]
    
    lora_info = _api_get("/object_info/LoraLoader")
    loras = lora_info["LoraLoader"]["input"]["required"]["lora_name"][0]
    
    return json.dumps({
        "checkpoints": checkpoints,
        "loras": loras,
    })

This queries ComfyUI’s introspection API to list the models currently available. An agent can call this before generate_image to pick a model, or to report what’s ready. With a handful of lines of Python, the agent gains model awareness.
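Agents often pass a loose model name instead of the exact filename. A sketch of resolving such a name against the list that list_models returns might look like this. resolve_checkpoint is a hypothetical helper, and the matching rules (exact, then substring, then fuzzy with a 0.6 cutoff) are my own choices.

```python
import difflib
from typing import Optional


def resolve_checkpoint(requested: str, available: list) -> Optional[str]:
    """Map a loose model name onto an installed checkpoint filename, if any."""
    if requested in available:
        return requested
    lowered = {name.lower(): name for name in available}
    # Prefer a case-insensitive substring hit, e.g. "realvis" inside a filename.
    for low, name in lowered.items():
        if requested.lower() in low:
            return name
    # Otherwise fall back to fuzzy matching on the lowercased names.
    close = difflib.get_close_matches(requested.lower(), list(lowered), n=1, cutoff=0.6)
    return lowered[close[0]] if close else None


available = ["RealVisXL_V5.0_fp16.safetensors", "sd3.5_large.safetensors"]
match = resolve_checkpoint("RealVisXL_V5.0", available)
missing = resolve_checkpoint("flux", available)
```

Returning None for an unknown name lets the tool reply with a clear error and the list of valid checkpoints, instead of submitting a job that ComfyUI will reject.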

Tools 3–5: queue, system, recent images

@mcp.tool()
def get_queue_status() -> str:
    """Check ComfyUI's current generation queue."""
    queue = _api_get("/queue")
    return json.dumps({
        "running": len(queue.get("queue_running", [])),
        "pending": len(queue.get("queue_pending", [])),
    })

@mcp.tool()
def get_system_stats() -> str:
    """Get ComfyUI system stats — version, GPU device, memory."""
    stats = _api_get("/system_stats")
    system = stats.get("system", {})
    devices = stats.get("devices", [])
    device = devices[0] if devices else {}
    vram_total_gb = device.get("vram_total", 0) / (1024**3)
    vram_free_gb = device.get("vram_free", 0) / (1024**3)
    return json.dumps({
        "version": system.get("comfyui_version"),
        "pytorch": system.get("pytorch_version"),
        "device": device.get("name"),
        "device_type": device.get("type"),
        "vram_total_gb": round(vram_total_gb, 1),
        "vram_free_gb": round(vram_free_gb, 1),
    })

@mcp.tool()
def get_recent_images(limit: int = 5) -> str:
    """List recently generated images from the output directory."""
    output_path = Path(OUTPUT_DIR)
    if not output_path.exists():
        return json.dumps({"images": [], "message": "Output directory not found"})
    
    images = sorted(output_path.glob("*.png"), key=lambda p: p.stat().st_mtime, reverse=True)[:limit]
    return json.dumps({
        "images": [{"path": str(img), "name": img.name, "size_mb": round(img.stat().st_size / (1024**2), 2)} for img in images],
    })

Three read-only queries: queue depth, system health, and what images just came out. Lightweight, composable. An agent can check the queue before submitting a big job, or list recent outputs to report what’s been generated.

Design patterns that work

1. Return structured JSON, not prose

# Bad
return "Generated an image with seed 12345 at /path/to/image.png successfully"

# Good
return json.dumps({"status": "success", "seed": 12345, "path": "/path/to/image.png"})

Agents can parse JSON. They can’t reliably parse prose.
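A small helper keeps the response shape consistent across tools, so every result has a status field in the same place. This is a sketch; tool_result is a hypothetical name and not part of FastMCP.

```python
import json


def tool_result(status: str, message: str = "", **fields) -> str:
    """Serialize a tool response with a predictable top-level shape."""
    payload = {"status": status, **fields}
    if message:
        payload["message"] = message
    return json.dumps(payload)


ok = tool_result("success", seed=12345, path="/path/to/image.png")
err = tool_result("error", message="ComfyUI is not reachable")

# A caller can branch on one field instead of pattern-matching prose.
parsed = json.loads(ok)
```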

2. Expose read-only status first

Before adding a write tool (generate, update, delete), add read-only companions (list, status, get). An agent can check the queue depth, list available models, or verify system health before submitting a long-running job. This gives agents a way to avoid piling work onto a service that is already busy.
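The check-before-write step can be sketched as a tiny guard over the JSON that get_queue_status returns. queue_has_room and the max_pending threshold are hypothetical; pick a limit that suits your hardware.

```python
import json


def queue_has_room(queue_status_json: str, max_pending: int = 2) -> bool:
    """Decide whether to submit a job, given get_queue_status() output."""
    status = json.loads(queue_status_json)
    return status.get("running", 0) + status.get("pending", 0) <= max_pending


busy = json.dumps({"running": 1, "pending": 4})
idle = json.dumps({"running": 0, "pending": 0})

can_submit_when_idle = queue_has_room(idle)
can_submit_when_busy = queue_has_room(busy)
```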

3. Defaults win over config

# Bad: no defaults, so the agent has to supply (or guess) every value
generate_image(prompt="...", checkpoint="?", width="?", height="?", steps="?", cfg="?")

# Good: agent provides the prompt, the signature's defaults cover everything else
generate_image(prompt="...")

4. Idempotent reads, transactional writes

A read tool like get_queue_status() should be safe to call multiple times with no side effects. A write tool like generate_image should be transactional — either it succeeds fully or fails clearly. No half-states.
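For file-producing tools, one way to avoid half-states is to write to a temporary file and rename it into place only once the download has finished. The sketch below assumes the temp file and the target live on the same filesystem; write_atomically is a hypothetical helper name.

```python
import os
import tempfile
from pathlib import Path


def write_atomically(target: Path, data: bytes) -> Path:
    """Write data to a temp file in the same directory, then rename into place."""
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_name, target)  # atomic when both paths share a filesystem
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)  # clean up the partial file
        raise
    return target


demo_dir = Path(tempfile.mkdtemp())
saved = write_atomically(demo_dir / "out" / "image.png", b"\x89PNG")
```

A reader such as get_recent_images then never sees a truncated PNG, because the final filename only appears after the bytes are fully written.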

Honest caveats

Long tool calls block. If a tool takes 5 minutes (like ComfyUI generation), the agent waits on that call the whole time. FastMCP also supports HTTP-based transports (SSE and, in recent SDK versions, streamable HTTP) for setups that need to serve several clients, but stdio is the default because it’s simpler. Fine for homelab scale.
Tools are an attack surface. Every tool is one more thing an agent can misuse. If an @mcp.tool()-decorated function runs shell commands or writes to the filesystem, audit it. A malicious agent (or a good one with a bug) could cause damage.
Secrets must come from the environment. Never hardcode API keys. .mcp.json can pass environment variables to each server process through an env block. Store the secrets in a password manager such as Bitwarden and load them into your shell before launching Claude Code.
Error handling is your job. The agent can’t see stack traces. If a tool fails, return structured JSON with a status: "error" and a human-readable message.
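For the secrets caveat, a fail-fast lookup at server startup beats discovering a missing key in the middle of a tool call. A minimal sketch, where require_env is a hypothetical helper and WIKI_API_TOKEN is an invented variable name used only for the demo:

```python
import os


def require_env(name: str) -> str:
    """Return a required environment variable or fail fast with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# WIKI_API_TOKEN is a hypothetical variable, set here only so the demo runs.
os.environ["WIKI_API_TOKEN"] = "demo-token"
token = require_env("WIKI_API_TOKEN")
```

In a real server the value would come from the launching environment and the assignment line would not exist.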

Lessons

An MCP server is just a thin HTTP wrapper plus a schema. The hard part is understanding your underlying API and mapping it to agent-sensible tools, not writing the MCP plumbing.
Design for the agent, not the API. If ComfyUI’s raw JSON is verbose or nested, flatten it in the tool’s return value. An agent doesn’t care about your API’s shape — it cares about readable, parseable data.
A few sharp tools beat many dull ones. Five well-designed tools that compose together beat twenty half-baked ones. Prioritize breadth (reads: status, list, get) before depth (writes).
Synchronous tools work at homelab scale. For a personal agent calling local services, 5-minute generation timeouts are acceptable. If you need concurrency, reach for async tool functions and one of FastMCP’s HTTP-based transports, or a custom transport.
Defaults and clear names prevent agent confusion. generate_image(prompt="...", checkpoint="RealVisXL_V5.0") is more agent-usable than call_model(input="...", model_id=2, cfg_scale=7.0).
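If concurrency does become a need, one option is to move the blocking work off the event loop. The sketch below uses only asyncio and a stand-in for the slow call; it assumes your FastMCP version accepts async def tool functions and that the transport in use permits overlapping requests. blocking_generate and generate_async are hypothetical names.

```python
import asyncio
import time


def blocking_generate(label: str, seconds: float) -> str:
    """Stand-in for a slow, blocking call such as the polling loop."""
    time.sleep(seconds)
    return f"{label} done"


async def generate_async(label: str, seconds: float) -> str:
    # Run the blocking work in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_generate, label, seconds)


async def main() -> list:
    # Both calls run at once, so total time is close to one call, not two.
    return await asyncio.gather(generate_async("a", 0.3), generate_async("b", 0.3))


start = time.monotonic()
results = asyncio.run(main())
elapsed = time.monotonic() - start
```

Note that ComfyUI itself still processes its queue one job at a time, so this helps the server stay responsive more than it speeds up generation.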

What’s next

The same pattern wraps any HTTP service. I have MCP servers for:

A local wiki (read/search/create pages)
System dashboards (fetch metrics, alert status)
Custom inventory APIs (read/write tracking)

Each is 100–250 lines of Python. The pattern scales: import your client library, decorate your functions, let the type hints define the schema, describe each tool in its docstring, return JSON. For a worked example of how MCP servers feed into a blog image generation pipeline, see the companion post on building the ComfyUI-to-blog workflow.

No homelab? The exact same pattern works against cloud APIs — wrap them in FastMCP running on a laptop, authenticate with your credentials, and Claude Code calls them seamlessly. The difference is transport (stdio vs HTTP) and scale (homelab vs shared instance), not the core design. DigitalOcean’s App Platform can host a FastMCP server if you want it reachable across machines, though for single-user homelab work local stdio is simpler and cheaper.
