TL;DR
I moved every private homelab repo off GitHub onto a self-hosted GitLab CE 18.10 instance running on my k3s cluster. GitHub stays as a read-only mirror plus the break-glass `k3s_bootstrap` repo. Two weeks later I accidentally `blkdiscard`'d the GitLab volume and rebuilt the entire instance from an S3 backup. It worked, but the boring parts — runner re-registration, group tokens, container-registry pull secrets — were the real cost.
Why bother
GitHub was fine. GitHub Actions was fine. The thing that pushed me over was billing math plus blast radius:
- ARC runners on k3s talked to GitHub over the internet. Every job pulled images, hit the GitHub API, and burned a per-installation rate-limit budget. When that budget was exhausted, the listener crashlooped with `403 API rate limit exceeded` and nothing scheduled.
- My private code lived on someone else's lawyers' servers. Not the worst place, but that distinction stops mattering on the day they decide to interpret a TOS clause differently.
- GitLab CE is one Helm chart and a Postgres. It runs entirely inside the cluster I already operate. The container registry comes free.
The plan: GitLab is canonical for `home_k3s_cluster` and every other private repo. GitHub stays the mirror, with one exception — `k3s_bootstrap` lives on GitHub forever, because if my cluster is offline I can't pull break-glass scripts from a GitLab instance hosted on the cluster. That's a circular dependency I refuse to debug at 2 a.m.
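For reference, push mirroring can be wired up per project through GitLab's remote mirrors API instead of clicking through the UI. A minimal sketch, assuming a project ID of `42` and tokens already in the environment (none of these values are from my actual setup):

```sh
# Hypothetical sketch: add a GitHub push mirror to project 42.
# $GITLAB_TOKEN is a GitLab PAT with api scope; $GITHUB_TOKEN is a
# GitHub PAT with repo scope, embedded in the mirror URL.
curl --request POST \
  --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
  --data-urlencode "url=https://zolty-mat:${GITHUB_TOKEN}@github.com/zolty-mat/home_k3s_cluster.git" \
  --data "enabled=true" \
  "https://gitlab.k3s.internal.zolty.systems/api/v4/projects/42/remote_mirrors"
```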
Architecture
```
┌──────────────────────────┐
│     developer laptop     │
└────────────┬─────────────┘
             │ git push
             ▼
┌─────────────────────────────────────────────────┐
│ gitlab.k3s.internal.zolty.systems   (primary)   │
│   ├─ gitaly (Longhorn PVC)                      │
│   ├─ postgres (Longhorn PVC)                    │
│   ├─ container registry (Longhorn PVC)          │
│   └─ S3 backups → s3://gitlab-backup            │
└─────────────────────────────────────────────────┘
                        │ mirror push
                        ▼
┌─────────────────────────────────────────────────┐
│ github.com/zolty-mat/<repo>  (read-only mirror) │
└─────────────────────────────────────────────────┘
```
Runners (in the `gitlab-runner` namespace) register against the GitLab instance directly — no internet round-trip. CI templates live in a `zolty-mat/ci-templates` group repo and every project `include:`s them.
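The per-project wiring in `.gitlab-ci.yml` looks roughly like this (a sketch; the template filename is illustrative, not my actual layout):

```yaml
# Pull shared job definitions from the group-level template repo.
include:
  - project: 'zolty-mat/ci-templates'
    ref: main
    file: '/templates/docker-build.yml'   # hypothetical template path
```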
What broke during the cutover
1. Group-level CI/CD variables aren’t visible to subgroups by default
The first pipeline failed because `HARBOR_PUSH_TOKEN` wasn't reachable. GitHub org secrets are global; GitLab group variables stop at the group boundary. Everything had to be re-defined at the top-level `zolty-mat` group, marked masked (and protected for the registry credentials), and inherited downward.
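Re-defining them by hand in the UI got old fast; the group variables API does the same job scriptably (a sketch, assuming a PAT in `$GITLAB_TOKEN`; `<group-id>` is the numeric ID of the `zolty-mat` group):

```sh
# Hypothetical sketch: create a masked CI/CD variable at group level.
curl --request POST \
  --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
  --data "key=HARBOR_PUSH_TOKEN" \
  --data "value=${HARBOR_PUSH_TOKEN}" \
  --data "masked=true" \
  "https://gitlab.k3s.internal.zolty.systems/api/v4/groups/<group-id>/variables"
```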
2. Scheduled jobs needed a different gating pattern
GitHub Actions has `schedule:` cron at the workflow level. GitLab uses Pipeline Schedules (Build → Pipeline Schedules in the UI) plus a variable convention. I settled on:
```yaml
nightly:backup:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule" && $SCHEDULED_JOB == "nightly_backup"'
  script:
    - ./scripts/backup.sh
```
Each schedule sets its own `SCHEDULED_JOB` value. One pipeline file, multiple cron entries, no duplicated YAML.
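The schedules themselves can also be created through the API, which keeps them reproducible after a rebuild (a sketch; the cron string and placeholder IDs are illustrative):

```sh
# Hypothetical sketch: create a nightly schedule, then attach the
# SCHEDULED_JOB variable that the rules: clause above keys on.
curl --request POST --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
  --data "description=nightly backup" \
  --data "ref=main" \
  --data-urlencode "cron=0 3 * * *" \
  "https://gitlab.k3s.internal.zolty.systems/api/v4/projects/<project-id>/pipeline_schedules"

curl --request POST --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
  --data "key=SCHEDULED_JOB" \
  --data "value=nightly_backup" \
  "https://gitlab.k3s.internal.zolty.systems/api/v4/projects/<project-id>/pipeline_schedules/<schedule-id>/variables"
```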
3. ARC's `runs-on` labels don't translate
GitHub Actions used `runs-on: [self-hosted, k3s-runner-v2]`. GitLab runners use tags. Re-tagging every job took an afternoon of `sed`.
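The translation is mechanical (job name and script are illustrative):

```yaml
# Before (GitHub Actions):
#   build:
#     runs-on: [self-hosted, k3s-runner-v2]

# After (GitLab CI), with the runner registered under a matching tag:
build:
  tags:
    - k3s-runner-v2
  script:
    - make build
```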
4. Container registry auth changed shape
Harbor pull secrets are static and live forever. GitLab's container registry uses deploy tokens scoped per group. I created `gitlab-registry-token`, dropped it into every namespace that pulls from the new registry, and added it to the default service account's `imagePullSecrets`.
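Per consuming namespace, that amounts to something like this (a sketch: the registry hostname and the `media` namespace are assumptions, and the username/token come from the group deploy token):

```sh
# Hypothetical sketch: create the pull secret and attach it to the
# namespace's default service account.
kubectl create secret docker-registry gitlab-registry-token \
  --docker-server=registry.k3s.internal.zolty.systems \
  --docker-username=<deploy-token-username> \
  --docker-password=<deploy-token> \
  --namespace=media

kubectl patch serviceaccount default --namespace=media \
  -p '{"imagePullSecrets": [{"name": "gitlab-registry-token"}]}'
```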
The disaster-recovery rebuild
Two weeks in, I was reorganizing Longhorn volumes on `k3s-agent-4` and ran `blkdiscard` against what I thought was a stale replica. It was the live GitLab data volume. Longhorn faithfully replicated the discard to the other replicas. GitLab was gone.
S3 to the rescue. Backups had been running nightly to `s3://gitlab-backup/` since day one of the migration:

```sh
# inside the gitlab pod (back when it existed)
gitlab-backup create CRON=1
# uploaded by the gitlab helm chart's built-in S3 backup hook
```
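In chart values the relevant knobs look roughly like this (a sketch against the GitLab Helm chart; the S3 credentials secret name is an assumption):

```yaml
# Hypothetical sketch of the backup-related Helm values.
global:
  appConfig:
    backups:
      bucket: gitlab-backup        # s3://gitlab-backup/
gitlab:
  toolbox:
    backups:
      cron:
        enabled: true              # nightly in-cluster CronJob
        schedule: "0 2 * * *"
      objectStorage:
        config:
          secret: gitlab-backup-s3 # assumed secret holding S3 creds
          key: config
```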
Recovery procedure:
- Delete the broken Helm release: `helm uninstall gitlab -n gitlab`.
- Delete the orphaned PVCs (Longhorn was confused).
- Reinstall the Helm chart with the same values file.
- Wait for the new (empty) instance to come up healthy.
- Copy the most recent backup tarball into the new gitaly pod.
- Run `gitlab-backup restore BACKUP=<timestamp>` and `gitlab-ctl reconfigure`.
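The same procedure as one command sequence (a sketch: pod names, the values file, and `<timestamp>` are placeholders for whatever your instance reports):

```sh
# Hypothetical sketch of the rebuild, step by step.
helm uninstall gitlab -n gitlab
kubectl delete pvc --all -n gitlab          # clear the orphaned PVCs
helm install gitlab gitlab/gitlab -n gitlab -f values.yaml

# Once the empty instance is healthy, copy the tarball in and restore.
kubectl cp <timestamp>_gitlab_backup.tar gitlab/<gitaly-pod>:/tmp/
kubectl exec -n gitlab <gitaly-pod> -- gitlab-backup restore BACKUP=<timestamp>
kubectl exec -n gitlab <gitaly-pod> -- gitlab-ctl reconfigure
```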
Total downtime: about 90 minutes. Total data lost: 6 hours of commits, all of which were already pushed to the GitHub mirror, so I cherry-picked them back.
What the rebuild actually cost
The restore itself was the easy part. The boring parts that took the rest of the day:
- Every CI runner needed re-registration. Runner registration tokens are server-side state; the new server didn't recognize the old `glrt-*` tokens. I uninstalled the runner Helm release and reinstalled fresh — `helm upgrade` doesn't fix it (sketch after this list).
- Every group access token had to be regenerated. I had at least five (Bitwarden sync, GitHub mirror push, k8s pull secrets, an Ansible inventory, a Terraform backend). Each consumer needed the new token.
- Harbor's `ci-push` robot token rotation script only updates GitHub secrets. It silently left the GitLab group variable stale for hours until I noticed builds failing.
- The container registry had to be re-pushed. Image tags were referenced in running deployments; I re-tagged from Harbor's proxy cache and pushed back to the new GitLab registry.
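The runner reinstall, for the record (a sketch: `<glrt-token>` is the authentication token minted on the rebuilt server, and every chart value beyond these two is elided):

```sh
# Hypothetical sketch: wipe the stale release, then register fresh
# against the rebuilt server with the new runner token.
helm uninstall gitlab-runner -n gitlab-runner
helm install gitlab-runner gitlab/gitlab-runner -n gitlab-runner \
  --set gitlabUrl=https://gitlab.k3s.internal.zolty.systems \
  --set runnerToken=<glrt-token>
```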
Lessons
- Test the restore before you need it. I had backups. I had not run `gitlab-backup restore` end-to-end on a scratch instance. The first time I did it was during the actual incident. It worked, but only because the GitLab Helm chart's restore path is unusually well-documented.
- Helm state is sticky. The reflex `helm upgrade --install` doesn't reset runner registrations or rebuild PVCs. Sometimes you need `helm uninstall` and a clean install — especially for things with server-side identity.
- Token rotation scripts must update every consumer. A script that only updates GitHub secrets is a foot-gun the day GitLab becomes the source of truth. Audit every rotator after a migration (a sketch of a dual-target rotator follows this list).
- Keep one repo on the other platform. `k3s_bootstrap` on GitHub is the only way I could imagine recovering from a hypothetical “the cluster won’t come up” scenario. Self-hosting everything is a circular dependency you only notice when it bites.
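For comparison, a rotator that updates both consumers in one pass (a sketch: the rotation helper and `<group-id>` are placeholders; the `gh` and GitLab API calls are the real interfaces):

```sh
# Hypothetical sketch: push a rotated Harbor robot token to BOTH platforms.
NEW_TOKEN="$(./rotate-harbor-robot.sh)"   # assumed existing rotation step

# GitHub org secret (what the old script already did).
gh secret set HARBOR_PUSH_TOKEN --org zolty-mat --body "$NEW_TOKEN"

# GitLab group variable (the consumer the old script forgot).
curl --request PUT --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
  --data-urlencode "value=${NEW_TOKEN}" \
  "https://gitlab.k3s.internal.zolty.systems/api/v4/groups/<group-id>/variables/HARBOR_PUSH_TOKEN"
```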
What’s next
Retiring Harbor as the container registry is in flight — GitLab's registry is taking over for new builds, with Harbor staying on as a proxy cache for upstream images (docker.io, ghcr.io, lscr.io, quay.io). That's the next post.