TL;DR

JF_hw_stress is a headless transcoding stress tester that answers one question: how many concurrent transcode streams can your GPU actually handle before quality degrades? It runs escalating FFmpeg transcodes against real media files using VAAPI hardware acceleration, measures FPS ratios, and outputs a JSON report. I run it as a Kubernetes Job on the same k3s cluster from Cluster Genesis, scheduled exclusively on the GPU node (Intel UHD 630). The job auto-deletes after 10 minutes so it does not accumulate stale pods.

TODO_REDDIT_LINK

Source: GitHub - ZoltyMat/JF_hw_stress

The Problem

When I set up Jellyfin on Kubernetes with GPU passthrough on pve4, I needed to know the actual capacity of the Intel UHD 630 for hardware transcoding. Intel’s spec sheets give theoretical decode/encode capabilities, but the real-world throughput depends on the source codec, resolution, bitrate, and how many concurrent streams the GPU can handle before frame drops start.

I documented some of this in my transcoding capacity analysis, but those were estimates based on Intel’s QSV documentation. I wanted measured numbers from my actual hardware, running my actual media files, through the actual VAAPI pipeline that Jellyfin uses.

Existing benchmarks were not useful because:

  • HandBrake benchmarks test offline encoding speed, not real-time streaming throughput
  • Phoronix test suites use synthetic test clips, not real media with varying bitrates
  • Jellyfin’s own metrics only tell you about streams that are already running – they do not tell you how many more the GPU can handle before quality drops

How It Works

JF_hw_stress runs an escalating series of concurrent FFmpeg transcodes against real video files from your media library:

Phase 1: 1 concurrent stream   → measure FPS ratio
Phase 2: 2 concurrent streams  → measure FPS ratio
Phase 3: 3 concurrent streams  → measure FPS ratio
...continues until FPS ratio drops below threshold or timeout...
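
The escalation loop boils down to a few lines. This is a simplified sketch, not the tool's actual code; `measure_ratio` stands in for launching the concurrent FFmpeg processes and sampling their throughput:

```python
def find_stream_limit(measure_ratio, threshold=0.9, max_streams=16):
    """Escalate the concurrent stream count until the FPS ratio drops
    below the threshold.

    measure_ratio(n) runs n concurrent transcodes and returns the worst
    per-stream FPS ratio (actual encode FPS / source FPS).
    Returns the highest stream count that still passed, or 0 if even a
    single stream fails.
    """
    limit = 0
    for n in range(1, max_streams + 1):
        ratio = measure_ratio(n)
        status = "PASS" if ratio >= threshold else "FAIL"
        print(f"Phase {n}: {n} stream(s) -> ratio {ratio:.2f}x ({status})")
        if ratio < threshold:
            break
        limit = n
    return limit
```

The loop stops at the first failing phase rather than testing every count, which keeps total runtime bounded.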

Each stream transcodes a real file from the media library using the same VAAPI pipeline Jellyfin uses:

ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
  -hwaccel_output_format vaapi \
  -i /media/movies/SomeMovie/SomeMovie.mkv \
  -c:v h264_vaapi -b:v 4M \
  -f null /dev/null

The key metric is the FPS ratio: actual encoding FPS divided by the source video’s FPS. At 1.0 the GPU is encoding exactly in real time; above 1.0 it has headroom. When the ratio drops below 0.9 (configurable), the GPU is saturated – that is your practical stream limit.
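
FFmpeg can report the encode rate on a machine-readable channel via `-progress pipe:1`, which emits `key=value` lines (`frame=`, `fps=`, `speed=`, ...). A minimal parser for the ratio might look like this; it is a sketch, not the tool's actual implementation:

```python
def fps_ratio(progress_text: str, source_fps: float) -> float:
    """Take the most recent fps= value from ffmpeg's -progress output
    and divide it by the source frame rate."""
    fps = None
    for line in progress_text.splitlines():
        if line.startswith("fps="):
            fps = float(line.split("=", 1)[1])  # keep the last one seen
    if fps is None:
        raise ValueError("no fps= line in progress output")
    return fps / source_fps
```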

Test Parameters

| Parameter | Value | Why |
| --- | --- | --- |
| FPS threshold | 90% of source FPS | Below this, viewers see buffering |
| Max duration | 180 seconds | Enough to reach steady-state GPU utilization |
| Source resolution | 1080p (primary) | Most common transcode scenario |
| Output codec | H.264 VAAPI | What Jellyfin actually uses for HLS |
| Auto-cleanup | 10 minutes after completion | Prevents stale pods from accumulating |

The Kubernetes Job

The stress tester runs as a k3s Job with specific node affinity for the GPU node:

apiVersion: batch/v1
kind: Job
metadata:
  name: jf-hw-stress
  namespace: media
spec:
  ttlSecondsAfterFinished: 600
  template:
    spec:
      nodeSelector:
        gpu: intel-uhd-630
      containers:
        - name: stress
          image: python:3.12-slim
          command: ["bash", "-c"]
          args:
            - |
              apt-get update && apt-get install -y ffmpeg vainfo intel-media-va-driver-non-free curl
              pip install rich
              curl -sL https://raw.githubusercontent.com/ZoltyMat/JF_hw_stress/d9b25b2/jf_hw_stress.py -o /tmp/stress.py
              python3 /tmp/stress.py --device /dev/dri/renderD128 --source /media --duration 180 --threshold 0.9
          resources:
            limits:
              gpu.intel.com/i915: 1
          volumeMounts:
            - name: media
              mountPath: /media
              readOnly: true
            - name: dri
              mountPath: /dev/dri
      volumes:
        - name: media
          nfs:
            server: 192.168.1.100
            path: /volume1/media
        - name: dri
          hostPath:
            path: /dev/dri
      restartPolicy: Never

Key decisions:

  • ttlSecondsAfterFinished: 600 – the Job and its pod auto-delete after 10 minutes. No orphaned test pods.
  • nodeSelector: gpu: intel-uhd-630 – only schedules on the GPU node. If the node is down, the Job stays pending instead of running without hardware acceleration.
  • Pinned commit hash (d9b25b2) – the script is fetched from GitHub at a specific commit, not main. This prevents a broken HEAD from ruining a benchmark run.
  • Read-only media mount – the test never writes to the media library.
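
Running a test is then one `kubectl apply` away. The manifest filename below is an assumption; adjust it to wherever you keep the Job spec:

```shell
# Launch the stress test and wait for it to finish (up to 30 minutes)
kubectl apply -f jf-hw-stress-job.yaml
kubectl -n media wait --for=condition=complete job/jf-hw-stress --timeout=30m

# Grab the results before the 10-minute TTL deletes the pod
kubectl -n media logs job/jf-hw-stress > stress-results.txt
```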

What the Output Looks Like

The stress tester outputs a formatted summary via rich tables and a JSON report:

JF_hw_stress Results — Intel UHD 630 (VAAPI)
┌──────────┬─────────────┬───────────┬──────────┐
│ Streams  │ Avg FPS     │ FPS Ratio │ Status   │
├──────────┼─────────────┼───────────┼──────────┤
│ 1        │ 142.3       │ 5.93x     │ PASS     │
│ 2        │ 71.8        │ 2.99x     │ PASS     │
│ 3        │ 48.1        │ 2.00x     │ PASS     │
│ 4        │ 35.2        │ 1.47x     │ PASS     │
│ 5        │ 27.6        │ 1.15x     │ PASS     │
│ 6        │ 22.1        │ 0.92x     │ PASS     │
│ 7        │ 18.3        │ 0.76x     │ FAIL     │
└──────────┴─────────────┴───────────┴──────────┘

Recommendation: 6 concurrent 1080p HEVC→H.264 streams (90% threshold)

The JSON report includes per-stream breakdown, GPU utilization samples, and the source files used – useful for comparing across different media libraries or after hardware changes.
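
Parsing the ceiling back out of the report is straightforward. The field names below are illustrative, not the tool's actual schema; check a report from your own run before scripting against it:

```python
import json

# Illustrative report shape -- NOT the tool's actual schema
report = json.loads("""
{
  "gpu": "Intel UHD 630",
  "threshold": 0.9,
  "phases": [
    {"streams": 5, "avg_fps": 27.6, "ratio": 1.15, "status": "PASS"},
    {"streams": 6, "avg_fps": 22.1, "ratio": 0.92, "status": "PASS"},
    {"streams": 7, "avg_fps": 18.3, "ratio": 0.76, "status": "FAIL"}
  ]
}
""")

# Highest phase that still passed is the practical stream ceiling
ceiling = max(p["streams"] for p in report["phases"] if p["status"] == "PASS")
print(f"Ceiling: {ceiling} concurrent streams")
```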

Results: Intel UHD 630 Capacity

Running against my media library (mix of 1080p and 4K HEVC Blu-ray rips), the Intel UHD 630 on pve4 handles:

| Scenario | Concurrent Streams | Notes |
| --- | --- | --- |
| 4K HEVC → 1080p H.264 | 2-3 | The heavy case – 4K decode + 1080p encode |
| 1080p HEVC → 720p H.264 | 6-7 | Most common for mobile clients |
| 1080p H.264 → 720p H.264 | 8+ | Lightest – H.264 decode is cheaper than HEVC |

These numbers align with Intel’s published Quick Sync capabilities for Coffee Lake, but now I have measured confirmation from my specific hardware, drivers, and media.

Why Not Use Jellyfin’s Built-in Metrics?

Jellyfin exposes transcode session data through its API, and I scrape it with Prometheus. But those metrics only tell you about sessions that already exist. They cannot tell you how many more sessions the GPU can handle before quality degrades.

The stress tester answers a different question: what is the ceiling? Once you know the ceiling, you can set Jellyfin’s max transcode limit appropriately and alert when you are approaching capacity.

Lessons Learned

  1. Real media files produce different results than synthetic test clips. My 4K HEVC Blu-ray rips with high bitrates (40-60 Mbps) stress the decoder harder than YouTube-quality test clips. Always benchmark with your actual content.

  2. VAAPI driver version matters. The intel-media-va-driver-non-free package has meaningful performance differences between versions. Pinning the driver version in the container ensures reproducible results.

  3. GPU thermal throttling is real on passthrough VMs. The UHD 630 in pve4 throttles after sustained load because the VM does not have direct fan control. The stress test captures this – FPS drops after about 90 seconds of full utilization, which is why the 180-second test duration matters.

  4. The 90% threshold is the right default. At 1.0x FPS ratio, the GPU is exactly keeping up – but any spike in scene complexity causes frame drops. 90% gives enough headroom for variable bitrate scenes.

  5. Auto-cleanup with ttlSecondsAfterFinished is essential for test Jobs. Without it, completed Jobs accumulate and clutter kubectl get jobs. The 10-minute window gives enough time to grab logs before cleanup.
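
The thermal throttling in lesson 3 is easy to confirm from per-second FPS samples: look for the point where throughput falls and stays below the initial steady-state mean. A rough detector, assuming one sample per second, might look like this (a sketch, not part of the tool):

```python
def throttle_onset(fps_samples, baseline_secs=30, drop_frac=0.15):
    """Find the first second at which FPS falls more than drop_frac below
    the mean of the initial baseline window and stays there.
    Returns the sample index (seconds) or None if no throttling is seen."""
    baseline = sum(fps_samples[:baseline_secs]) / baseline_secs
    floor = baseline * (1 - drop_frac)
    for i in range(baseline_secs, len(fps_samples)):
        # Sustained drop: every sample from here on is below the floor
        if fps_samples[i] < floor and all(s < floor for s in fps_samples[i:]):
            return i
    return None
```

On a throttling run this flags the point where the sustained decline begins; on a healthy run it returns None.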