TL;DR
Every node in my k3s cluster used to pull images directly from docker.io, ghcr.io, lscr.io, and quay.io. That meant Docker Hub rate limits, occasional 5xx storms from ghcr, and a hard outage when quay.io went sideways for a few hours. I put Harbor in front of all of them as a proxy cache, pointed containerd at Harbor, and the registry-related noise in my cluster effectively went to zero. Image pulls also got faster — 10GbE LAN beats every public CDN I’ve measured against.
The problem
A k3s cluster with seven nodes, dozens of namespaces, and frequent pod churn does a lot of image pulls. Most of them are unauthenticated. Docker Hub started enforcing rate limits years ago — 100 anonymous pulls per IP per six hours. My cluster shares a single NAT IP. Math is unkind.
The first time I noticed was during a cluster upgrade. Half the deployments got stuck in ImagePullBackOff with 429 Too Many Requests. I worked around it by authenticating to Docker Hub and adding a pull secret to every namespace, which lifts the limit to 200 pulls per six hours. That bought me six months before the same problem came back.
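A quick sketch of the arithmetic, assuming the shared budget were split evenly across nodes (it never is; upgrades bunch pulls together, which makes things worse). The function name and the even-split assumption are illustrative; the limits and node count are the ones quoted above.

```python
def per_node_budget(limit_per_window: int, nodes: int) -> int:
    """Pulls each node gets per window if the shared limit is split evenly."""
    return limit_per_window // nodes


NODES = 7            # nodes behind the single NAT IP
ANONYMOUS = 100      # anonymous pulls per IP per six hours
AUTHENTICATED = 200  # authenticated pulls per six hours

print(per_node_budget(ANONYMOUS, NODES))      # 14
print(per_node_budget(AUTHENTICATED, NODES))  # 28
```

Fourteen pulls per node per six hours is less than a single busy node needs during a rolling upgrade.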
Beyond rate limits:
- ghcr.io occasionally returns 5xx during their incidents. My cluster sees it.
- lscr.io (LinuxServer.io) is fast but their CDN region selection is mediocre from where I sit.
- quay.io had a multi-hour outage that took out anything pulling Red Hat-adjacent images.
I already ran Harbor for my own pushed images. Harbor supports proxy-cache projects out of the box. The fix was obvious; I’d just been lazy.
How a Harbor proxy cache works
A Harbor proxy cache project is a project configured with an upstream registry endpoint. When a client pulls harbor.k3s.internal.zolty.systems/dockerhub-proxy/library/postgres:16-alpine, Harbor:
- Checks if it has the manifest+layers cached.
- If not, fetches from docker.io/library/postgres:16-alpine.
- Stores everything locally.
- Serves it back.
Subsequent pulls hit Harbor only. Manifests are revalidated according to a TTL so you still get updates, but layers — which are content-addressed — never get re-downloaded once cached.
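The flow above can be sketched as a toy pull-through cache. This is a simplified model for illustration, not Harbor's actual code: the class, the `fetch` callback, and `fake_upstream` are hypothetical names, and the sketch deliberately ignores manifest revalidation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ProxyCacheProject:
    """Toy model of one proxy-cache project (not Harbor's real implementation)."""
    name: str                      # e.g. "dockerhub-proxy"
    upstream: str                  # e.g. "docker.io"
    fetch: Callable[[str], bytes]  # hypothetical upstream fetcher
    cache: Dict[str, bytes] = field(default_factory=dict)
    upstream_fetches: int = 0

    def pull(self, repo_and_tag: str) -> bytes:
        # Check the local cache first.
        if repo_and_tag in self.cache:
            return self.cache[repo_and_tag]
        # Cache miss: fetch from the upstream registry.
        blob = self.fetch(f"{self.upstream}/{repo_and_tag}")
        self.upstream_fetches += 1
        # Store locally, then serve it back.
        self.cache[repo_and_tag] = blob
        return blob


def fake_upstream(ref: str) -> bytes:
    # Stand-in for a real registry round trip.
    return f"image-bytes-for:{ref}".encode()


project = ProxyCacheProject("dockerhub-proxy", "docker.io", fake_upstream)
first = project.pull("library/postgres:16-alpine")
second = project.pull("library/postgres:16-alpine")
print(project.upstream_fetches)  # 1
```

The second pull never reaches `fake_upstream`, which is the whole point: only the first request per image costs an upstream round trip.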
Configuration
Step 1: Registry endpoints
In Harbor → Administration → Registries, I added one endpoint per upstream:
| Name | Provider | URL |
|---|---|---|
| dockerhub | Docker Hub | https://hub.docker.com |
| ghcr | Docker Registry | https://ghcr.io |
| lscr | Docker Registry | https://lscr.io |
| quay | Quay | https://quay.io |
For Docker Hub I added authenticated credentials — the rate limit on authenticated pulls is much higher, and Harbor is the only thing pulling, so one set of creds covers the whole cluster.
Step 2: Proxy-cache projects
For each endpoint, a project with Proxy Cache enabled:
- dockerhub-proxy → dockerhub
- ghcr-proxy → ghcr
- lscr-proxy → lscr
- quay-proxy → quay
Step 3: Point containerd at Harbor
The k3s registries.yaml file (/etc/rancher/k3s/registries.yaml on every node) gets a mirror entry per upstream:
    mirrors:
      docker.io:
        endpoint:
          - "https://harbor.k3s.internal.zolty.systems/v2/dockerhub-proxy"
      ghcr.io:
        endpoint:
          - "https://harbor.k3s.internal.zolty.systems/v2/ghcr-proxy"
      lscr.io:
        endpoint:
          - "https://harbor.k3s.internal.zolty.systems/v2/lscr-proxy"
      quay.io:
        endpoint:
          - "https://harbor.k3s.internal.zolty.systems/v2/quay-proxy"
    configs:
      "harbor.k3s.internal.zolty.systems":
        auth:
          username: robot$cluster-pull
          password: <robot-token>
A pull-only Harbor robot account (robot$cluster-pull) authenticates the cluster against Harbor. Restart k3s on each node and crictl pull docker.io/library/alpine now silently goes through the proxy.
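With seven nodes, hand-editing the same file invites drift. A minimal sketch of generating the mirrors section from one mapping, assuming the project names above; `render_mirrors` and the `PROXIES` dict are illustrative names, and the `configs:` block with the robot credentials is deliberately left out so no secret ends up in a script.

```python
HARBOR = "harbor.k3s.internal.zolty.systems"

# Upstream registry -> Harbor proxy-cache project
PROXIES = {
    "docker.io": "dockerhub-proxy",
    "ghcr.io": "ghcr-proxy",
    "lscr.io": "lscr-proxy",
    "quay.io": "quay-proxy",
}


def render_mirrors(harbor_host: str, proxies: dict) -> str:
    """Render the mirrors: section of registries.yaml as text."""
    lines = ["mirrors:"]
    for upstream, project in proxies.items():
        lines.append(f"  {upstream}:")
        lines.append("    endpoint:")
        lines.append(f'      - "https://{harbor_host}/v2/{project}"')
    return "\n".join(lines) + "\n"


print(render_mirrors(HARBOR, PROXIES), end="")
```

Adding a fifth upstream then means one new dictionary entry plus a matching proxy-cache project in Harbor.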
Step 4: Storage sizing
Proxy caches grow. I gave Harbor a 500GB Longhorn PVC and added a Harbor garbage-collection schedule (Administration → Garbage Collection → weekly). After two months of cluster churn the cache settled around 180GB.
Verification
The fastest sanity check: pull an image you know was never on the cluster, and watch Harbor’s project view.
    $ crictl pull docker.io/library/postgres:16-alpine
    Image is up to date for sha256:8b...
    # Harbor UI: dockerhub-proxy → library/postgres now shows the manifest
Then pull it again from a different node. The second pull is dramatically faster (LAN vs. WAN) and the upstream registry never gets touched.
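The same check can be scripted instead of clicking through the UI. A hedged sketch that only builds the URL: to my knowledge Harbor's v2.0 REST API lists cached artifacts under `/api/v2.0/projects/{project}/repositories/{repository}/artifacts` and expects slashes in the repository name to be URL-encoded twice, but treat that path as an assumption and confirm it against the API docs for your Harbor version. `artifacts_url` is an illustrative helper, not part of any Harbor client.

```python
from urllib.parse import quote


def artifacts_url(harbor_host: str, project: str, repository: str) -> str:
    """Build the (assumed) Harbor API URL listing artifacts in a repository."""
    # Encode the repository name twice: "/" -> "%2F" -> "%252F".
    encoded = quote(quote(repository, safe=""), safe="")
    return (
        f"https://{harbor_host}/api/v2.0/projects/{project}"
        f"/repositories/{encoded}/artifacts"
    )


url = artifacts_url(
    "harbor.k3s.internal.zolty.systems", "dockerhub-proxy", "library/postgres"
)
print(url)
```

Fetching that URL with an account that can read the project should return the cached manifests once the first pull has gone through.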
What I gained
- No more rate-limit incidents. A single Docker Hub pull-through credential covers the whole cluster.
- Survived a quay.io outage. Pods that referenced quay images kept restarting fine because the manifests and layers were already cached.
- Faster pulls. 10GbE LAN beats CDN edges. Pulling a 200MB image onto a node that didn’t have it yet, with Harbor already holding the layers, dropped from ~8s to ~1.5s.
- One audit point. Every image my cluster runs is observable through Harbor — one place to look for “what did we pull, when, from where”.
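To put the pull-time numbers in context, here is the effective throughput they imply. The timings are the approximate figures from the list above; the function name is illustrative.

```python
def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Effective throughput of a pull, in megabytes per second."""
    return size_mb / seconds


SIZE_MB = 200
before = throughput_mb_per_s(SIZE_MB, 8.0)  # via the public registry
after = throughput_mb_per_s(SIZE_MB, 1.5)   # via Harbor on the LAN

print(f"before: {before:.0f} MB/s")   # before: 25 MB/s
print(f"after:  {after:.0f} MB/s")    # after:  133 MB/s
print(f"speedup: {8.0 / 1.5:.1f}x")   # speedup: 5.3x
```

About 133 MB/s is roughly 1 Gbit/s, well short of what a 10GbE link can carry, so after the switch the network is probably no longer the bottleneck; layer decompression and disk writes on the node are the likelier suspects.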
Gotchas
- Don’t push to a proxy-cache project. It’s read-only from the client side; pushes go to a normal Harbor project.
- The robot account needs pull-only scope across all proxy-cache projects. I made the mistake of giving it project-admin once and then panicked when I realized it could delete cached blobs.
- Manifest TTL matters. The default revalidation interval for latest tags is short, but pinned tags are cached forever. That’s correct behavior (content-addressed layers are immutable), but worth understanding.
- Don’t proxy-cache your own registry. I considered making harbor-proxy mirror itself for fun. Don’t.
What’s next
The next phase is retiring Harbor’s role as a push registry — GitLab’s container registry takes over for new builds — but Harbor stays around forever as the proxy-cache layer. Splitting “where I push” from “where the cluster pulls” is the right architectural seam, and I should have done it sooner.