TL;DR
I am designing a WireGuard VPN mesh to connect a small tech collective – a group of friends who each run their own infrastructure. The topology is hub-and-spoke with my k3s cluster as the hub, connecting 4+ remote sites over encrypted tunnels. Shared services include Jellyfin media federation, distributed CI/CD runners, LAN gaming, and centralized monitoring. The logging pipeline is privacy-first: all log filtering and anonymization happens at the edge (spoke side) before anything ships to the hub. This post covers the network design, the three-layer firewall architecture, the privacy model, and the phased rollout plan.
The Collective
This is not a corporate VPN. It is a group of friends who are all technical, all run some form of home infrastructure, and want to share resources without giving up control of their own networks. Everyone has their own ISP, their own hardware, and their own opinions about how things should work.
The requirements came from actual conversations:
- “Can I watch your Jellyfin library from my house?”
- “Can I run CI jobs on your cluster when my laptop is too slow?”
- “Can we play Minecraft without someone hosting a public server?”
- “I want monitoring for my homelab but I do not want to run Grafana myself”
The answer to all of these is “yes, with a VPN.” But the implementation has to respect that these are independent sites, not branch offices.
Network Design
Topology: Hub-and-Spoke
```
                   ┌─────────────────────┐
                   │  Hub (k3s cluster)  │
                   │    10.100.0.0/24    │
                   │  WireGuard :51821   │
                   └──────────┬──────────┘
                              │
         ┌────────────────────┼────────────────────┐
         │                    │                    │
 ┌───────▼───────┐    ┌───────▼───────┐    ┌───────▼───────┐
 │    Spoke 1    │    │    Spoke 2    │    │    Spoke 3    │
 │ 10.100.1.0/24 │    │ 10.100.2.0/24 │    │ 10.100.3.0/24 │
 │    Steve's    │    │    Alex's     │    │    Bryce's    │
 └───────────────┘    └───────────────┘    └───────────────┘
```
Hub-and-spoke, not full mesh. Reasons:
- Simplicity. Full mesh with N sites requires N*(N-1)/2 tunnels. Hub-and-spoke requires N tunnels. With 4+ sites, the management overhead difference is significant.
- Centralized services. The hub has the monitoring stack, the CI runners, and the media library. Most traffic flows to/from the hub anyway.
- Selective direct links. If two spokes need low-latency connectivity (gaming), I can add a direct WireGuard tunnel between them without changing the overall architecture.
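Concretely, the hub/spoke relationship comes down to each peer's AllowedIPs: the hub routes exactly one /24 to each spoke, while every spoke routes the whole supernet through the hub. A minimal sketch of the two wg-quick configs – key material, the hub hostname, and file paths are placeholders, not values from this deployment:

```ini
# --- Hub: /etc/wireguard/wg-collective.conf (sketch) ---
[Interface]
Address    = 10.100.0.1/16
ListenPort = 51821
PrivateKey = <hub-private-key>

# Spoke 1 (Steve) – AllowedIPs limits what the hub will route to this peer
[Peer]
PublicKey  = <spoke1-public-key>
AllowedIPs = 10.100.1.0/24

# --- Spoke 1: /etc/wireguard/wg-collective.conf (sketch) ---
[Interface]
Address    = 10.100.1.1/16
PrivateKey = <spoke1-private-key>

# The hub is the only peer; the whole supernet is reachable through it
[Peer]
PublicKey           = <hub-public-key>
Endpoint            = hub.example.net:51821
AllowedIPs          = 10.100.0.0/16
PersistentKeepalive = 25
```

Because each spoke's AllowedIPs covers the full /16, spoke-to-spoke traffic transits the hub (which must have IP forwarding enabled) – which is exactly what makes the later direct spoke-to-spoke links an additive change rather than a redesign.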
Address Space
Collective supernet: 10.100.0.0/16
Hub (k3s cluster): 10.100.0.0/24
Spoke 1: 10.100.1.0/24
Spoke 2: 10.100.2.0/24
...
Seedbox VPN (separate): 10.200.0.0/24 ← never touches collective traffic
The collective supernet is 10.100.0.0/16. Each site gets a /24. The seedbox VPN (10.200.0.0/24) stays completely isolated – different WireGuard interface, different routing table, no cross-contamination.
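One way to enforce the "different routing table" part of that isolation is wg-quick's `Table` option, which installs the seedbox routes into their own table instead of `main`. A sketch, with placeholder keys and an assumed table number:

```ini
# /etc/wireguard/wg-seedbox.conf (sketch) – kept fully apart from the mesh
[Interface]
Address    = 10.200.0.1/24
PrivateKey = <seedbox-private-key>
# Routes go into table 200, never into the main table the mesh uses
Table      = 200
PostUp     = ip rule add from 10.200.0.0/24 lookup 200
PostDown   = ip rule del from 10.200.0.0/24 lookup 200

[Peer]
PublicKey  = <seedbox-peer-public-key>
AllowedIPs = 10.200.0.0/24
```

With separate interfaces and separate tables, a routing mistake on one VPN cannot leak traffic into the other.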
Transport
WireGuard is the primary transport. UDP 51821 on the hub, forwarded through the UDM Pro. Most members are on FiOS with symmetric gigabit, so throughput is not a concern.
For sites behind restrictive firewalls that block UDP, a fallback path uses wstunnel to wrap WireGuard in HTTPS (TCP 443). This adds latency but works through corporate firewalls and hotel networks.
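The fallback path can be sketched as a wstunnel client/server pair; the flags below follow the v7+ CLI and may differ by version, and the hostname is a placeholder. The spoke then points its WireGuard `Endpoint` at the local forwarder instead of the hub:

```shell
# Hub side: terminate HTTPS on TCP 443 and unwrap back to the tunnel
wstunnel server wss://0.0.0.0:443

# Spoke side: forward local UDP 51821 to the hub's WireGuard port over wss
wstunnel client -L 'udp://51821:127.0.0.1:51821' wss://hub.example.net:443

# The spoke's WireGuard peer config then uses the local forwarder:
#   Endpoint = 127.0.0.1:51821
```

WireGuard itself is unchanged; it just sees a local UDP endpoint, while wstunnel carries the packets over TCP 443.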
Three-Layer Firewall
Security is not optional when you are connecting other people’s networks to yours. Three independent layers, each with a different scope:
Layer 1: iptables on the Hub Node
YAML-driven IaC. The firewall rules are defined in a config file, rendered by Ansible, and applied to the WireGuard interface. This controls which spoke subnets can reach which hub services at the IP level.
```yaml
# Example: Spoke 1 can reach Jellyfin and Grafana, nothing else
spoke_1:
  allowed_destinations:
    - 10.100.0.10:8096   # Jellyfin
    - 10.100.0.10:3000   # Grafana
  denied_destinations:
    - 10.100.0.0/24      # Everything else on hub
```
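For reference, the rendered output of that policy is plain iptables. A sketch of what the Ansible template might emit – the `wg-collective` interface name and hub service IP are assumptions:

```shell
# Allow spoke 1 to reach exactly the two permitted services
iptables -A FORWARD -i wg-collective -s 10.100.1.0/24 -d 10.100.0.10 -p tcp --dport 8096 -j ACCEPT  # Jellyfin
iptables -A FORWARD -i wg-collective -s 10.100.1.0/24 -d 10.100.0.10 -p tcp --dport 3000 -j ACCEPT  # Grafana

# Default deny for everything else on the hub subnet
iptables -A FORWARD -i wg-collective -s 10.100.1.0/24 -d 10.100.0.0/24 -j DROP
```

Rule order matters here: the ACCEPT rules must precede the subnet-wide DROP, which is exactly the kind of invariant the YAML-to-template layer exists to guarantee.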
Layer 2: Kubernetes NetworkPolicy
Standard k3s NetworkPolicies restrict which pods accept traffic from the collective supernet. Even if iptables is misconfigured, a pod without an explicit NetworkPolicy allowing 10.100.0.0/16 ingress will reject mesh traffic.
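A sketch of such a policy for a single service – the namespace, labels, and port are illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: jellyfin-allow-collective
  namespace: media
spec:
  podSelector:
    matchLabels:
      app: jellyfin
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 10.100.0.0/16
      ports:
        - protocol: TCP
          port: 8096
```

Note that `ipBlock` matches the source IP as the pod sees it, so mesh traffic must not be SNAT'd on its way into the cluster for this layer to discriminate correctly.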
Layer 3: Traefik ipAllowList Middleware
For web-exposed services, Traefik middleware restricts access to collective IPs:
```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: collective-only
spec:
  ipAllowList:
    sourceRange:
      - 10.100.0.0/16
```
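Attaching the middleware to a route looks like this – the hostname, service name, and entrypoint are illustrative:

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: jellyfin-mesh
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`jellyfin.collective.internal`)
      kind: Rule
      middlewares:
        - name: collective-only
      services:
        - name: jellyfin
          port: 8096
```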
With three independent layers, a misconfiguration in any one of them does not expose services; an attacker would need to bypass all three simultaneously.
Privacy-First Logging
This is the part that required the most design thought. Centralized monitoring is useful, but shipping raw logs from someone else’s network to your server creates a trust problem. Even among friends.
The Principle
All log filtering and anonymization happens at the spoke, before anything leaves the site. The hub never sees raw DNS queries, browsing data, or internal hostnames.
Implementation: Grafana Alloy at the Edge
Each spoke runs Grafana Alloy (formerly Grafana Agent) as the log shipper. Alloy has built-in processing stages that filter, transform, and anonymize logs before they leave the spoke.
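A sketch of what that spoke-side pipeline could look like in Alloy's configuration language – the Loki endpoint, regexes, and stage choices here are illustrative, not the production config:

```alloy
// Spoke-side pipeline (sketch): anonymize before anything leaves the site
loki.source.journal "local" {
  forward_to = [loki.process.anonymize.receiver]
}

loki.process "anonymize" {
  // Drop raw DNS query lines entirely; only aggregate counts are exported
  stage.drop {
    expression = "query\\[.*\\]"
  }

  // Redact anything that looks like an internal IPv4 address
  stage.replace {
    expression = "(10\\.\\d+\\.\\d+\\.\\d+)"
    replace    = "REDACTED_IP"
  }

  forward_to = [loki.write.hub.receiver]
}

loki.write "hub" {
  endpoint {
    url = "http://10.100.0.10:3100/loki/api/v1/push"
  }
}
```

The key property: the `loki.write` target only ever receives what survives the processing stages, so the hub-side Loki cannot accumulate data the spoke chose not to share.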
Log Classification
| Log Type | Treatment | What the Hub Sees |
|---|---|---|
| DNS queries | Anonymized | Query counts by category, not actual domains |
| Firewall logs | Aggregated | Block/allow counts by source subnet, not IPs |
| WireGuard handshakes | Passed through | Peer connection status (needed for monitoring) |
| System metrics | Passed through | CPU, memory, disk, network (non-sensitive) |
| Application logs | Filtered | Health status only, no content |
Alert Tiers
Not every alert should be visible to every member:
| Tier | Visibility | Example |
|---|---|---|
| Site-only | Only the spoke owner | Disk space warning on their NAS |
| Collective-security | All members | WireGuard handshake failures, brute-force attempts |
| Collective-health | All members | Hub service degradation, mesh connectivity loss |
A spoke owner sees all alerts about their own site. They see collective-wide security and health alerts. They never see another spoke’s site-only alerts.
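This tiering maps naturally onto Alertmanager routing, assuming the alert rules attach `tier` and `site` labels; receiver names here are placeholders:

```yaml
# Alertmanager routing sketch for the three tiers
route:
  receiver: hub-admin
  routes:
    # Site-only alerts go back to the owning spoke, matched by site label
    - matchers: ['tier = "site-only"', 'site = "spoke1"']
      receiver: spoke1-owner
    # Collective-wide tiers fan out to every member
    - matchers: ['tier =~ "collective-(security|health)"']
      receiver: all-members

receivers:
  - name: hub-admin
  - name: spoke1-owner
  - name: all-members
```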
Shared Services
The whole point of the mesh. Six tiers, deployed incrementally:
Tier 1: Monitoring (Day 1)
Every spoke gets access to the hub’s Grafana and Prometheus. Their Alloy agents ship metrics to the hub’s Prometheus via remote-write. They see their own dashboards in Grafana, plus collective-wide views of mesh health. Wiki.js provides shared documentation. Harbor and Gitea provide package hosting.
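The spoke-side metrics path can live in the same Alloy instance as the log pipeline. A sketch, with an assumed hub Prometheus URL (the hub's Prometheus must have remote-write receiving enabled):

```alloy
// Spoke-side metrics shipping (sketch)
prometheus.exporter.unix "node" { }

prometheus.scrape "local" {
  targets    = prometheus.exporter.unix.node.targets
  forward_to = [prometheus.remote_write.hub.receiver]
}

prometheus.remote_write "hub" {
  endpoint {
    url = "http://10.100.0.10:9090/api/v1/write"
  }
}
```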
Tier 2: Media Sharing
Jellyfin federation across the mesh. Spoke members can browse and stream from the hub’s media library without exposing it to the internet. This is the killer feature – the reason most people join.
Tier 3: Distributed CI/CD
GitHub ARC runner pods on the hub, available to collective members’ repositories. When someone pushes to their repo, CI jobs run on the cluster’s hardware instead of GitHub’s shared runners. Faster builds, no usage limits.
Tier 4: LAN Gaming
Minecraft, Valheim, and Steam Remote Play over the mesh. WireGuard’s low latency makes this practical. Game servers run on the hub as k3s deployments. Selective direct spoke-to-spoke links for latency-sensitive games.
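As a sketch of the "game servers as k3s deployments" idea, a minimal Minecraft deployment using the widely used itzg/minecraft-server image – namespace and sizing are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minecraft
  namespace: games
spec:
  replicas: 1
  selector:
    matchLabels: { app: minecraft }
  template:
    metadata:
      labels: { app: minecraft }
    spec:
      containers:
        - name: minecraft
          image: itzg/minecraft-server
          env:
            - name: EULA
              value: "TRUE"
          ports:
            - containerPort: 25565
---
apiVersion: v1
kind: Service
metadata:
  name: minecraft
  namespace: games
spec:
  selector: { app: minecraft }
  ports:
    - port: 25565
      targetPort: 25565
```

The Service is only reachable over the mesh (the three firewall layers see to that), so no game server is ever exposed to the public internet.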
Tier 5: Web Hosting
Shared static site hosting for collective members. S3-compatible storage on the hub, CloudFront distributions managed via Terraform. Members can deploy personal sites without managing their own infrastructure.
Tier 6: Future – Shared AI Compute
When the local LLM inference stack goes live (Ray Serve + vLLM on the GPU nodes), collective members get access to self-hosted AI models over the mesh. No API costs, no data leaving the collective.
Phased Rollout
| Phase | Duration | Scope |
|---|---|---|
| 1 | 1-2 days | Hub prep: WireGuard role, port forwarding, firewall IaC |
| 2 | 1-2 days | First spoke onboarding (Steve – most technical, best debugging partner) |
| 3 | 2-3 days | Monitoring infrastructure: Alloy config, Grafana dashboards, alert routing |
| 4 | 2-3 days | Multi-spoke rollout (remaining members) |
| 5 | 1-2 days | HTTPS fallback via wstunnel |
| 6 | 3-5 days | Service sharing: Jellyfin federation, CI/CD, game servers |
Total estimated effort: 2-3 weeks of evenings and weekends. The new repository (collective-mesh) will contain all Ansible roles for hub, spoke, firewall, and logging configuration.
Identity Integration
This mesh design was built with Authentik in mind. Phase 4 of the Authentik plan deploys mesh-specific groups, an LDAP outpost for services that need it, and OIDC endpoints accessible from the collective supernet. Every mesh member authenticates through one identity system, regardless of which site they are connecting from.
Why Not Tailscale or ZeroTier?
Both are excellent products. I use Tailscale personally. But for this use case:
- Control. I want to see every packet that crosses my network boundary. Tailscale’s relay servers (DERP) and ZeroTier’s root servers are third-party infrastructure I cannot inspect or audit.
- Cost at scale. Free tiers work for personal use. A collective with shared services and multiple users per site pushes into paid territory.
- Learning. Building WireGuard infrastructure from scratch teaches networking fundamentals that managed VPN products abstract away. The collective members are all technical – they want to understand the system, not just use it.
- Integration. Native WireGuard peers integrate cleanly with iptables, Kubernetes NetworkPolicies, and Traefik middleware. Overlay networks add abstraction layers that complicate firewall rules.
Lessons Learned (So Far)
This project is still in planning, so these are design-phase lessons rather than operational ones:
Privacy-first logging is a design constraint, not a feature toggle. You cannot retrofit anonymization onto a logging pipeline that was built to ship everything. The privacy model had to be designed before any code was written.
Hub-and-spoke scales better than you think. For a small collective (4-8 sites), the hub is not a bottleneck. WireGuard’s CPU overhead is negligible, and most shared services are request/response patterns, not continuous streaming.
Trust boundaries are more important than encryption. WireGuard encrypts everything in transit. But the harder question is: what data should the hub be allowed to see at all? Encryption solves confidentiality in transit; data minimization solves confidentiality at rest.
Three firewall layers seems like overkill until you misconfigure one. During planning, I found two scenarios where a single-layer firewall would have exposed internal services. Defense in depth is not paranoia – it is engineering for human error.