TL;DR
I am designing a WireGuard VPN mesh to connect a small tech collective – a group of friends who each run their own infrastructure. The topology is hub-and-spoke with my k3s cluster as the hub, connecting 4+ remote sites over encrypted tunnels. Shared services include Jellyfin media federation, distributed CI/CD runners, LAN gaming, and centralized monitoring. The logging pipeline is privacy-first: all log filtering and anonymization happens at the edge (spoke side) before anything ships to the hub. This post covers the network design, the three-layer firewall architecture, the privacy model, and the phased rollout plan.
The Collective
This is not a corporate VPN. It is a group of friends who are all technical, all run some form of home infrastructure, and want to share resources without giving up control of their own networks. Everyone has their own ISP, their own hardware, and their own opinions about how things should work.
The requirements came from actual conversations:
- “Can I watch your Jellyfin library from my house?”
- “Can I run CI jobs on your cluster when my laptop is too slow?”
- “Can we play Minecraft without someone hosting a public server?”
- “I want monitoring for my homelab but I do not want to run Grafana myself”
The answer to all of these is “yes, with a VPN.” But the implementation has to respect that these are independent sites, not branch offices.
Network Design
Topology: Hub-and-Spoke
```
                   ┌─────────────────────┐
                   │  Hub (k3s cluster)  │
                   │    10.100.0.0/24    │
                   │  WireGuard :51821   │
                   └──────────┬──────────┘
                              │
         ┌────────────────────┼────────────────────┐
         │                    │                    │
 ┌───────▼───────┐    ┌───────▼───────┐    ┌───────▼───────┐
 │    Spoke 1    │    │    Spoke 2    │    │    Spoke 3    │
 │ 10.100.1.0/24 │    │ 10.100.2.0/24 │    │ 10.100.3.0/24 │
 │    Steve's    │    │    Alex's     │    │    Bryce's    │
 └───────────────┘    └───────────────┘    └───────────────┘
```
Hub-and-spoke, not full mesh. Reasons:
- Simplicity. Full mesh with N sites requires N*(N-1)/2 tunnels. Hub-and-spoke requires N tunnels. With 4+ sites, the management overhead difference is significant.
- Centralized services. The hub has the monitoring stack, the CI runners, and the media library. Most traffic flows to/from the hub anyway.
- Selective direct links. If two spokes need low-latency connectivity (gaming), I can add a direct WireGuard tunnel between them without changing the overall architecture.
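Concretely, the hub/spoke relationship comes down to each peer's AllowedIPs: the hub routes exactly one /24 to each spoke, while every spoke routes the whole supernet through the hub. A minimal sketch of the two wg-quick configs – key material, the hub hostname, and file paths are placeholders, not values from this deployment:

```ini
# --- Hub: /etc/wireguard/wg-collective.conf (sketch) ---
[Interface]
Address    = 10.100.0.1/16
ListenPort = 51821
PrivateKey = <hub-private-key>

# Spoke 1 (Steve) – AllowedIPs limits what the hub will route to this peer
[Peer]
PublicKey  = <spoke1-public-key>
AllowedIPs = 10.100.1.0/24

# --- Spoke 1: /etc/wireguard/wg-collective.conf (sketch) ---
[Interface]
Address    = 10.100.1.1/16
PrivateKey = <spoke1-private-key>

# The hub is the only peer; the whole supernet is reachable through it
[Peer]
PublicKey           = <hub-public-key>
Endpoint            = hub.example.net:51821
AllowedIPs          = 10.100.0.0/16
PersistentKeepalive = 25
```

Because each spoke's AllowedIPs covers the full /16, spoke-to-spoke traffic transits the hub (which must have IP forwarding enabled) – which is exactly what makes the later direct spoke-to-spoke links an additive change rather than a redesign.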
Address Space
Collective supernet: 10.100.0.0/16
Hub (k3s cluster): 10.100.0.0/24
Spoke 1: 10.100.1.0/24
Spoke 2: 10.100.2.0/24
...
Seedbox VPN (separate): 10.200.0.0/24 ← never touches collective traffic
The collective supernet is 10.100.0.0/16. Each site gets a /24. The seedbox VPN (10.200.0.0/24) stays completely isolated – different WireGuard interface, different routing table, no cross-contamination.
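One way to enforce the "different routing table" part of that isolation is wg-quick's `Table` option, which installs the seedbox routes into their own table instead of `main`. A sketch, with placeholder keys and an assumed table number:

```ini
# /etc/wireguard/wg-seedbox.conf (sketch) – kept fully apart from the mesh
[Interface]
Address    = 10.200.0.1/24
PrivateKey = <seedbox-private-key>
# Routes go into table 200, never into the main table the mesh uses
Table      = 200
PostUp     = ip rule add from 10.200.0.0/24 lookup 200
PostDown   = ip rule del from 10.200.0.0/24 lookup 200

[Peer]
PublicKey  = <seedbox-peer-public-key>
AllowedIPs = 10.200.0.0/24
```

With separate interfaces and separate tables, a routing mistake on one VPN cannot leak traffic into the other.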
Transport
WireGuard is the primary transport. UDP 51821 on the hub, forwarded through the UDM Pro. Most members are on FiOS with symmetric gigabit, so throughput is not a concern.
For sites behind restrictive firewalls that block UDP, a fallback path uses wstunnel to wrap WireGuard in HTTPS (TCP 443). This adds latency but works through corporate firewalls and hotel networks.
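The fallback path can be sketched as a wstunnel client/server pair; the flags below follow the v7+ CLI and may differ by version, and the hostname is a placeholder. The spoke then points its WireGuard `Endpoint` at the local forwarder instead of the hub:

```shell
# Hub side: terminate HTTPS on TCP 443 and unwrap back to the tunnel
wstunnel server wss://0.0.0.0:443

# Spoke side: forward local UDP 51821 to the hub's WireGuard port over wss
wstunnel client -L 'udp://51821:127.0.0.1:51821' wss://hub.example.net:443

# The spoke's WireGuard peer config then uses the local forwarder:
#   Endpoint = 127.0.0.1:51821
```

WireGuard itself is unchanged; it just sees a local UDP endpoint, while wstunnel carries the packets over TCP 443.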
Three-Layer Firewall
Security is not optional when you are connecting other people’s networks to yours. Three independent layers, each with a different scope:
Layer 1: iptables on the Hub Node
YAML-driven IaC. The firewall rules are defined in a config file, rendered by Ansible, and applied to the WireGuard interface. This controls which spoke subnets can reach which hub services at the IP level.
```yaml
# Example: Spoke 1 can reach Jellyfin and Grafana, nothing else
spoke_1:
  allowed_destinations:
    - 10.100.0.10:8096   # Jellyfin
    - 10.100.0.10:3000   # Grafana
  denied_destinations:
    - 10.100.0.0/24      # Everything else on hub
```
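For reference, the rendered output of that policy is plain iptables. A sketch of what the Ansible template might emit – the `wg-collective` interface name and hub service IP are assumptions:

```shell
# Allow spoke 1 to reach exactly the two permitted services
iptables -A FORWARD -i wg-collective -s 10.100.1.0/24 -d 10.100.0.10 -p tcp --dport 8096 -j ACCEPT  # Jellyfin
iptables -A FORWARD -i wg-collective -s 10.100.1.0/24 -d 10.100.0.10 -p tcp --dport 3000 -j ACCEPT  # Grafana

# Default deny for everything else on the hub subnet
iptables -A FORWARD -i wg-collective -s 10.100.1.0/24 -d 10.100.0.0/24 -j DROP
```

Rule order matters here: the ACCEPT rules must precede the subnet-wide DROP, which is exactly the kind of invariant the YAML-to-template layer exists to guarantee.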
Layer 2: Kubernetes NetworkPolicy
Standard k3s NetworkPolicies restrict which pods accept traffic from the collective supernet. Even if iptables is misconfigured, a pod without an explicit NetworkPolicy allowing 10.100.0.0/16 ingress will reject mesh traffic.
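A sketch of such a policy for a single service – the namespace, labels, and port are illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: jellyfin-allow-collective
  namespace: media
spec:
  podSelector:
    matchLabels:
      app: jellyfin
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 10.100.0.0/16
      ports:
        - protocol: TCP
          port: 8096
```

Note that `ipBlock` matches the source IP as the pod sees it, so mesh traffic must not be SNAT'd on its way into the cluster for this layer to discriminate correctly.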
Layer 3: Traefik ipAllowList Middleware
For web-exposed services, Traefik middleware restricts access to collective IPs:
```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: collective-only
spec:
  ipAllowList:
    sourceRange:
      - 10.100.0.0/16
```
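Attaching the middleware to a route looks like this – the hostname, service name, and entrypoint are illustrative:

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: jellyfin-mesh
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`jellyfin.collective.internal`)
      kind: Rule
      middlewares:
        - name: collective-only
      services:
        - name: jellyfin
          port: 8096
```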
With three independent layers, a misconfiguration in any one of them does not expose services; an attacker would need to bypass all three simultaneously.
Privacy-First Logging
This is the part that required the most design thought. Centralized monitoring is useful, but shipping raw logs from someone else’s network to your server creates a trust problem. Even among friends.
The Principle
All log filtering and anonymization happens at the spoke, before anything leaves the site. The hub never sees raw DNS queries, browsing data, or internal hostnames.
Implementation: Grafana Alloy at the Edge
Each spoke runs Grafana Alloy (formerly Grafana Agent) as the log shipper. Alloy has built-in processing stages that filter, transform, and anonymize logs before they leave the spoke.
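A sketch of what that spoke-side pipeline could look like in Alloy's configuration language – the Loki endpoint, regexes, and stage choices here are illustrative, not the production config:

```alloy
// Spoke-side pipeline (sketch): anonymize before anything leaves the site
loki.source.journal "local" {
  forward_to = [loki.process.anonymize.receiver]
}

loki.process "anonymize" {
  // Drop raw DNS query lines entirely; only aggregate counts are exported
  stage.drop {
    expression = "query\\[.*\\]"
  }

  // Redact anything that looks like an internal IPv4 address
  stage.replace {
    expression = "(10\\.\\d+\\.\\d+\\.\\d+)"
    replace    = "REDACTED_IP"
  }

  forward_to = [loki.write.hub.receiver]
}

loki.write "hub" {
  endpoint {
    url = "http://10.100.0.10:3100/loki/api/v1/push"
  }
}
```

The key property: the `loki.write` target only ever receives what survives the processing stages, so the hub-side Loki cannot accumulate data the spoke chose not to share.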
Log Classification
| Log Type | Treatment | What the Hub Sees |
|---|---|---|
| DNS queries | Anonymized | Query counts by category, not actual domains |
| Firewall logs | Aggregated | Block/allow counts by source subnet, not IPs |
| WireGuard handshakes | Passed through | Peer connection status (needed for monitoring) |
| System metrics | Passed through | CPU, memory, disk, network (non-sensitive) |
| Application logs | Filtered | Health status only, no content |
Alert Tiers
Not every alert should be visible to every member:
| Tier | Visibility | Example |
|---|---|---|
| Site-only | Only the spoke owner | Disk space warning on their NAS |
| Collective-security | All members | WireGuard handshake failures, brute-force attempts |
| Collective-health | All members | Hub service degradation, mesh connectivity loss |
A spoke owner sees all alerts about their own site. They see collective-wide security and health alerts. They never see another spoke’s site-only alerts.
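This tiering maps naturally onto Alertmanager routing, assuming the alert rules attach `tier` and `site` labels; receiver names here are placeholders:

```yaml
# Alertmanager routing sketch for the three tiers
route:
  receiver: hub-admin
  routes:
    # Site-only alerts go back to the owning spoke, matched by site label
    - matchers: ['tier = "site-only"', 'site = "spoke1"']
      receiver: spoke1-owner
    # Collective-wide tiers fan out to every member
    - matchers: ['tier =~ "collective-(security|health)"']
      receiver: all-members

receivers:
  - name: hub-admin
  - name: spoke1-owner
  - name: all-members
```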
Shared Services
The whole point of the mesh. Six tiers, deployed incrementally:
Tier 1: Monitoring (Day 1)
Every spoke gets access to the hub’s Grafana and Prometheus. Their Alloy agents ship metrics to the hub’s Prometheus via remote-write. They see their own dashboards in Grafana, plus collective-wide views of mesh health. Wiki.js provides shared documentation. Harbor and Gitea provide package hosting.
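The spoke-side metrics path can live in the same Alloy instance as the log pipeline. A sketch, with an assumed hub Prometheus URL (the hub's Prometheus must have remote-write receiving enabled):

```alloy
// Spoke-side metrics shipping (sketch)
prometheus.exporter.unix "node" { }

prometheus.scrape "local" {
  targets    = prometheus.exporter.unix.node.targets
  forward_to = [prometheus.remote_write.hub.receiver]
}

prometheus.remote_write "hub" {
  endpoint {
    url = "http://10.100.0.10:9090/api/v1/write"
  }
}
```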
Tier 2: Media Sharing
Jellyfin federation across the mesh. Spoke members can browse and stream from the hub’s media library without exposing it to the internet. This is the killer feature – the reason most people join.
Tier 3: Distributed CI/CD
GitHub ARC runner pods on the hub, available to collective members’ repositories. When someone pushes to their repo, CI jobs run on the cluster’s hardware instead of GitHub’s shared runners. Faster builds, no usage limits.
Tier 4: LAN Gaming
Minecraft, Valheim, and Steam Remote Play over the mesh. WireGuard’s low latency makes this practical. Game servers run on the hub as k3s deployments. Selective direct spoke-to-spoke links for latency-sensitive games.
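As a sketch of the "game servers as k3s deployments" idea, a minimal Minecraft deployment using the widely used itzg/minecraft-server image – namespace and sizing are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minecraft
  namespace: games
spec:
  replicas: 1
  selector:
    matchLabels: { app: minecraft }
  template:
    metadata:
      labels: { app: minecraft }
    spec:
      containers:
        - name: minecraft
          image: itzg/minecraft-server
          env:
            - name: EULA
              value: "TRUE"
          ports:
            - containerPort: 25565
---
apiVersion: v1
kind: Service
metadata:
  name: minecraft
  namespace: games
spec:
  selector: { app: minecraft }
  ports:
    - port: 25565
      targetPort: 25565
```

The Service is only reachable over the mesh (the three firewall layers see to that), so no game server is ever exposed to the public internet.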
Tier 5: Web Hosting
Shared static site hosting for collective members. S3-compatible storage on the hub, CloudFront distributions managed via Terraform. Members can deploy personal sites without managing their own infrastructure.
Tier 6: Future – Shared AI Compute
When the local LLM inference stack goes live (Ray Serve + vLLM on the GPU nodes), collective members get access to self-hosted AI models over the mesh. No API costs, no data leaving the collective.
Phased Rollout
| Phase | Duration | Scope |
|---|---|---|
| 1 | 1-2 days | Hub prep: WireGuard role, port forwarding, firewall IaC |
| 2 | 1-2 days | First spoke onboarding (Steve – most technical, best debugging partner) |
| 3 | 2-3 days | Monitoring infrastructure: Alloy config, Grafana dashboards, alert routing |
| 4 | 2-3 days | Multi-spoke rollout (remaining members) |
| 5 | 1-2 days | HTTPS fallback via wstunnel |
| 6 | 3-5 days | Service sharing: Jellyfin federation, CI/CD, game servers |
Total estimated effort: 2-3 weeks of evenings and weekends. The new repository (collective-mesh) will contain all Ansible roles for hub, spoke, firewall, and logging configuration.
Identity Integration
This mesh design was built with Authentik in mind. Phase 4 of the Authentik plan deploys mesh-specific groups, an LDAP outpost for services that need it, and OIDC endpoints accessible from the collective supernet. Every mesh member authenticates through one identity system, regardless of which site they are connecting from.
Why Not Tailscale or ZeroTier?
Both are excellent products. I use Tailscale personally. But for this use case:
- Control. I want to see every packet that crosses my network boundary. Tailscale’s relay servers (DERP) and ZeroTier’s root servers are third-party infrastructure I cannot inspect or audit.
- Cost at scale. Free tiers work for personal use. A collective with shared services and multiple users per site pushes into paid territory.
- Learning. Building WireGuard infrastructure from scratch teaches networking fundamentals that managed VPN products abstract away. The collective members are all technical – they want to understand the system, not just use it.
- Integration. Native WireGuard peers integrate cleanly with iptables, Kubernetes NetworkPolicies, and Traefik middleware. Overlay networks add abstraction layers that complicate firewall rules.
Lessons Learned (So Far)
This project is still in planning, so these are design-phase lessons rather than operational ones:
Privacy-first logging is a design constraint, not a feature toggle. You cannot retrofit anonymization onto a logging pipeline that was built to ship everything. The privacy model had to be designed before any code was written.
Hub-and-spoke scales better than you think. For a small collective (4-8 sites), the hub is not a bottleneck. WireGuard’s CPU overhead is negligible, and most shared services are request/response patterns, not continuous streaming.
Trust boundaries are more important than encryption. WireGuard encrypts everything in transit. But the harder question is: what data should the hub be allowed to see at all? Encryption solves confidentiality in transit; data minimization solves confidentiality at rest.
Three firewall layers seems like overkill until you misconfigure one. During planning, I found two scenarios where a single-layer firewall would have exposed internal services. Defense in depth is not paranoia – it is engineering for human error.