TL;DR

Every node in my k3s cluster used to pull images directly from docker.io, ghcr.io, lscr.io, and quay.io. That meant Docker Hub rate limits, occasional 5xx storms from ghcr, and a hard outage when quay.io went sideways for a few hours. I put Harbor in front of all of them as a proxy cache, pointed containerd at Harbor, and the registry-related noise in my cluster effectively went to zero. Image pulls also got faster — 10GbE LAN beats every public CDN I’ve measured against.

The problem

A k3s cluster with seven nodes, dozens of namespaces, and frequent pod churn does a lot of image pulls. Most of them are unauthenticated. Docker Hub started enforcing rate limits years ago — 100 anonymous pulls per IP per six hours. My cluster shares a single NAT IP. Math is unkind.

The first time I noticed was during a cluster upgrade. Half the deployments got stuck in ImagePullBackOff with 429 Too Many Requests. I worked around it by authenticating to Docker Hub and adding a pull secret to every namespace, which lifts the limit to 200 pulls per six hours. That bought me six months before the same problem came back.
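To see how close a shared NAT IP is to the limit, Docker publishes a test repository (ratelimitpreview/test) whose manifest responses carry ratelimit-limit and ratelimit-remaining headers. The sketch below assumes curl and jq are installed; the function names are mine, and the numbers in the headers depend on Docker's current policy rather than on anything in this post.

```shell
#!/bin/sh
# Nothing here touches the network until check_dockerhub_limit is called.

# Turn a header value such as "100;w=21600" into a readable summary.
parse_ratelimit() {
  value="$1"
  count="${value%%;*}"      # number of pulls (limit or remaining)
  window="${value##*w=}"    # window length in seconds
  echo "$count pulls per $((window / 3600))h"
}

# Fetch an anonymous token, then issue a HEAD request against Docker's
# rate-limit test repository and print the ratelimit-* headers.
check_dockerhub_limit() {
  token=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
  curl -s --head -H "Authorization: Bearer $token" \
    "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest" \
    | tr -d '\r' | grep -i '^ratelimit-'
}
```

Running check_dockerhub_limit from any node behind the NAT shows the quota the whole cluster shares. The request is a HEAD, which Docker does not count as a pull.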

Beyond rate limits:

  • ghcr.io occasionally returns 5xx during their incidents. My cluster sees it.
  • lscr.io (LinuxServer.io) is fast but their CDN region selection is mediocre from where I sit.
  • quay.io had a multi-hour outage that took out anything pulling Red Hat-adjacent images.

I already ran Harbor for my own pushed images. Harbor supports proxy-cache projects out of the box. The fix was obvious; I’d just been lazy.

How a Harbor proxy cache works

A Harbor proxy cache project is a project configured with an upstream registry endpoint. When a client pulls harbor.k3s.internal.zolty.systems/dockerhub-proxy/library/postgres:16-alpine, Harbor:

  1. Checks if it has the manifest+layers cached.
  2. If not, fetches from docker.io/library/postgres:16-alpine.
  3. Stores everything locally.
  4. Serves it back.

Subsequent pulls hit Harbor only. Manifests are revalidated according to a TTL so you still get updates, but layers — which are content-addressed — never get re-downloaded once cached.
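The naming scheme is mechanical: the upstream registry host is swapped for the Harbor host plus a proxy-cache project, and the rest of the reference is unchanged. A minimal sketch of that mapping, using the hostname and the four project names from this post; the helper name to_proxy_ref is hypothetical, and it expects a fully qualified reference.

```shell
#!/bin/sh
HARBOR_HOST="harbor.k3s.internal.zolty.systems"

# Rewrite an upstream image reference to the path Harbor serves it under.
to_proxy_ref() {
  ref="$1"
  registry="${ref%%/*}"   # text before the first slash, e.g. docker.io
  path="${ref#*/}"        # everything after it, e.g. library/postgres:16-alpine
  case "$registry" in
    docker.io) project="dockerhub-proxy" ;;
    ghcr.io)   project="ghcr-proxy" ;;
    lscr.io)   project="lscr-proxy" ;;
    quay.io)   project="quay-proxy" ;;
    *) echo "no proxy-cache project for: $registry" >&2; return 1 ;;
  esac
  printf '%s/%s/%s\n' "$HARBOR_HOST" "$project" "$path"
}
```

For example, to_proxy_ref docker.io/library/postgres:16-alpine prints the Harbor path shown above, which is handy for pulling through the cache by hand before containerd is reconfigured.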

Configuration

Step 1: Registry endpoints

In Harbor → Administration → Registries, I added one endpoint per upstream:

Name        Provider          URL
dockerhub   Docker Hub        https://hub.docker.com
ghcr        Docker Registry   https://ghcr.io
lscr        Docker Registry   https://lscr.io
quay        Quay              https://quay.io

For Docker Hub I added authenticated credentials — the rate limit on authenticated pulls is much higher, and Harbor is the only thing pulling, so one set of creds covers the whole cluster.

Step 2: Proxy-cache projects

For each endpoint, a project with Proxy Cache enabled:

  • dockerhub-proxy → dockerhub
  • ghcr-proxy → ghcr
  • lscr-proxy → lscr
  • quay-proxy → quay
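The same projects can be created through Harbor's v2.0 REST API instead of the UI. This is a sketch, not what I actually ran: it assumes HARBOR_URL, HARBOR_USER and HARBOR_PASS are set by the caller, that the projects should be private, and that you have looked up each endpoint's numeric ID first (GET /api/v2.0/registries lists them). The function names are hypothetical.

```shell
#!/bin/sh
# Build the JSON body for POST /api/v2.0/projects. Setting registry_id is
# what makes the project a proxy cache for that endpoint.
project_payload() {
  name="$1"
  registry_id="$2"
  printf '{"project_name":"%s","registry_id":%s,"metadata":{"public":"false"}}\n' \
    "$name" "$registry_id"
}

# Create one proxy-cache project. Nothing is sent until this is called.
create_proxy_project() {
  curl -sf -u "$HARBOR_USER:$HARBOR_PASS" \
    -H 'Content-Type: application/json' \
    -X POST "$HARBOR_URL/api/v2.0/projects" \
    -d "$(project_payload "$1" "$2")"
}
```

A call such as create_proxy_project dockerhub-proxy 1 would then create the first project, where 1 stands in for whatever ID Harbor assigned to the dockerhub endpoint.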

Step 3: Point containerd at Harbor

The k3s registries.yaml file (/etc/rancher/k3s/registries.yaml on every node) gets a mirror entry per upstream:

# Redirect each upstream registry to its Harbor proxy-cache project.
mirrors:
  docker.io:
    endpoint:
      - "https://harbor.k3s.internal.zolty.systems/v2/dockerhub-proxy"
  ghcr.io:
    endpoint:
      - "https://harbor.k3s.internal.zolty.systems/v2/ghcr-proxy"
  lscr.io:
    endpoint:
      - "https://harbor.k3s.internal.zolty.systems/v2/lscr-proxy"
  quay.io:
    endpoint:
      - "https://harbor.k3s.internal.zolty.systems/v2/quay-proxy"

# Credentials containerd presents when it talks to Harbor.
configs:
  "harbor.k3s.internal.zolty.systems":
    auth:
      username: robot$cluster-pull
      password: <robot-token>

A pull-only Harbor robot account (robot$cluster-pull) authenticates the cluster against Harbor. After restarting k3s on each node (the k3s service on servers, k3s-agent on agents), crictl pull docker.io/library/alpine silently goes through the proxy.
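With seven nodes, hand-editing the mirrors section invites drift. One option is to generate it from a single list of upstream-to-project pairs. The sketch below prints only the mirrors section; the function names are hypothetical, and the configs block with the robot credentials still has to be appended separately.

```shell
#!/bin/sh
HARBOR_HOST="harbor.k3s.internal.zolty.systems"

# Print one mirror entry: upstream registry -> Harbor proxy-cache project.
mirror_entry() {
  printf '  %s:\n    endpoint:\n      - "https://%s/v2/%s"\n' \
    "$1" "$HARBOR_HOST" "$2"
}

# Print the whole mirrors section for the four upstreams in this post.
gen_mirrors() {
  echo "mirrors:"
  mirror_entry docker.io dockerhub-proxy
  mirror_entry ghcr.io   ghcr-proxy
  mirror_entry lscr.io   lscr-proxy
  mirror_entry quay.io   quay-proxy
}
```

Redirecting gen_mirrors into a file and diffing it against /etc/rancher/k3s/registries.yaml on each node is a cheap way to confirm the nodes agree.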

Step 4: Storage sizing

Proxy caches grow. I gave Harbor a 500GB Longhorn PVC and added a Harbor garbage-collection schedule (Administration → Garbage Collection → weekly). After two months of cluster churn the cache settled around 180GB.

Verification

The fastest sanity check: pull an image you know was never on the cluster, and watch Harbor’s project view.

$ crictl pull docker.io/library/postgres:16-alpine
Image is up to date for sha256:8b...

# Harbor UI: dockerhub-proxy → library/postgres now shows the manifest

Then pull it again from a different node. The second pull is dramatically faster (LAN vs. WAN) because every layer now comes from Harbor’s cache instead of the upstream registry.
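The same check can be scripted against Harbor's v2.0 API instead of the UI by listing the artifacts cached under a repository. One detail is easy to miss: Harbor wants slashes inside the repository name URL-encoded twice, so library/postgres becomes library%252Fpostgres. A sketch, with hypothetical function names and assuming HARBOR_USER and HARBOR_PASS are set by the caller:

```shell
#!/bin/sh
HARBOR_URL="https://harbor.k3s.internal.zolty.systems"

# Double-encode slashes in a repository name: "/" -> "%252F".
encode_repo() {
  printf '%s\n' "$1" | sed 's|/|%252F|g'
}

# Build the artifact-listing URL for a project and repository.
artifacts_url() {
  project="$1"
  repo="$2"
  printf '%s/api/v2.0/projects/%s/repositories/%s/artifacts\n' \
    "$HARBOR_URL" "$project" "$(encode_repo "$repo")"
}

# Fetch the cached artifacts as JSON. Nothing is sent until this is called.
list_cached() {
  curl -sf -u "$HARBOR_USER:$HARBOR_PASS" "$(artifacts_url "$1" "$2")"
}
```

After the crictl pull above, list_cached dockerhub-proxy library/postgres should return a non-empty JSON array if the image went through the proxy.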

What I gained

  • No more rate-limit incidents. A single Docker Hub pull-through credential covers the whole cluster.
  • Survived a quay.io outage. Pods that referenced quay images could still be rescheduled and restarted because the manifests and layers were already cached.
  • Faster pulls. 10GbE LAN beats CDN edges. Pull latency for a 200MB image that is new to the node but already cached in Harbor dropped from ~8s to ~1.5s.
  • One audit point. Every image my cluster runs is observable through Harbor — one place to look for “what did we pull, when, from where”.

Gotchas

  • Don’t push to a proxy-cache project. It’s read-only from the client side; pushes go to a normal Harbor project.
  • The robot account needs pull-only scope across all proxy-cache projects. I made the mistake of giving it project-admin once and then panicked when I realized it could delete cached blobs.
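  • Manifest TTL matters. The default revalidation interval for latest tags is short, but pinned tags are cached forever. That’s the right behavior for layers, which are content-addressed and immutable, but a pinned tag is still a mutable pointer: if upstream republishes it, only a pull by digest guarantees the same bytes. Worth understanding.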
  • Don’t proxy-cache your own registry. I considered making harbor-proxy mirror itself for fun. Don’t.

What’s next

The next phase is retiring Harbor’s role as a push registry — GitLab’s container registry takes over for new builds — but Harbor stays around forever as the proxy-cache layer. Splitting “where I push” from “where the cluster pulls” is the right architectural seam, and I should have done it sooner.