TL;DR
JF_hw_stress is a headless transcoding stress tester that answers one question: how many concurrent transcode streams can your GPU actually handle before quality degrades? It runs escalating FFmpeg transcodes against real media files using VAAPI hardware acceleration, measures FPS ratios, and outputs a JSON report. I run it as a Kubernetes Job on the same k3s cluster from Cluster Genesis, scheduled exclusively on the GPU node (Intel UHD 630). The job auto-deletes after 10 minutes so it does not accumulate stale pods.
TODO_REDDIT_LINK
Source: GitHub - ZoltyMat/JF_hw_stress
The Problem
When I set up Jellyfin on Kubernetes with GPU passthrough on pve4, I needed to know the actual capacity of the Intel UHD 630 for hardware transcoding. Intel’s spec sheets give theoretical decode/encode capabilities, but the real-world throughput depends on the source codec, resolution, bitrate, and how many concurrent streams the GPU can handle before frame drops start.
I documented some of this in my transcoding capacity analysis, but those were estimates based on Intel’s QSV documentation. I wanted measured numbers from my actual hardware, running my actual media files, through the actual VAAPI pipeline that Jellyfin uses.
Existing benchmarks were not useful because:
- HandBrake benchmarks test offline encoding speed, not real-time streaming throughput
- Phoronix test suites use synthetic test clips, not real media with varying bitrates
- Jellyfin’s own metrics only tell you about streams that are already running – they do not tell you how many more the GPU can handle before quality drops
How It Works
JF_hw_stress runs an escalating series of concurrent FFmpeg transcodes against real video files from your media library:
```text
Phase 1: 1 concurrent stream  → measure FPS ratio
Phase 2: 2 concurrent streams → measure FPS ratio
Phase 3: 3 concurrent streams → measure FPS ratio
...continues until FPS ratio drops below threshold or timeout...
```
Each stream transcodes a real file from the media library using the same VAAPI pipeline Jellyfin uses:
```bash
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
  -hwaccel_output_format vaapi \
  -i /media/movies/SomeMovie/SomeMovie.mkv \
  -c:v h264_vaapi -b:v 4M \
  -f null /dev/null
```
The key metric is the FPS ratio: actual encoding FPS divided by the source video’s FPS. A ratio above 1.0 means the GPU is keeping up with real-time. When the ratio drops below 0.9 (configurable), the GPU is saturated – that is your practical stream limit.
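The ratio check itself reduces to a few lines. This is an illustrative Python sketch of the metric as described above, not the repo's actual code; the function names are mine:

```python
def fps_ratio(measured_fps: float, source_fps: float) -> float:
    """Ratio of encoding throughput to the source's real-time playback rate."""
    return measured_fps / source_fps

def is_saturated(measured_fps: float, source_fps: float, threshold: float = 0.9) -> bool:
    """The GPU counts as saturated once it can no longer hold `threshold` of real-time."""
    return fps_ratio(measured_fps, source_fps) < threshold

# A 23.976 fps source encoded at 48.1 fps is comfortably above real-time:
print(fps_ratio(48.1, 23.976))     # ~2.0x
print(is_saturated(18.3, 23.976))  # True: 0.76x is below the 0.9 cut-off
```

Anything above 1.0x is faster than real-time; the 0.9 default leaves the headroom discussed in the lessons below.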
Test Parameters
| Parameter | Value | Why |
|---|---|---|
| FPS threshold | 90% of source FPS | Below this, viewers see buffering |
| Max duration | 180 seconds | Enough to reach steady-state GPU utilization |
| Source resolution | 1080p (primary) | Most common transcode scenario |
| Output codec | H.264 VAAPI | What Jellyfin actually uses for HLS |
| Auto-cleanup | 10 minutes after completion | Prevents stale pods accumulating |
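The escalation loop these parameters drive is easy to sketch. This is my own illustrative version, not the project's code: the measurement function (which in the real tool would spawn N concurrent FFmpeg transcodes and average their reported FPS) is injected as a callable so the loop logic stands alone:

```python
from typing import Callable

def find_stream_ceiling(
    measure_avg_fps: Callable[[int], float],  # runs N concurrent transcodes, returns avg per-stream FPS
    source_fps: float,
    threshold: float = 0.9,
    max_streams: int = 16,
) -> int:
    """Escalate concurrency until the per-stream FPS ratio drops below threshold.

    Returns the highest stream count that still passed.
    """
    ceiling = 0
    for n in range(1, max_streams + 1):
        ratio = measure_avg_fps(n) / source_fps
        print(f"streams={n} ratio={ratio:.2f}x {'PASS' if ratio >= threshold else 'FAIL'}")
        if ratio < threshold:
            break
        ceiling = n
    return ceiling

# Fake measurements mirroring the UHD 630 numbers from the results table below:
fake = {1: 142.3, 2: 71.8, 3: 48.1, 4: 35.2, 5: 27.6, 6: 22.1, 7: 18.3}
print(find_stream_ceiling(lambda n: fake[n], source_fps=23.976))  # 6
```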
The Kubernetes Job
The stress tester runs as a k3s Job with specific node affinity for the GPU node:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: jf-hw-stress
  namespace: media
spec:
  ttlSecondsAfterFinished: 600
  template:
    spec:
      nodeSelector:
        gpu: intel-uhd-630
      containers:
        - name: stress
          image: python:3.12-slim
          command: ["bash", "-c"]
          args:
            - |
              apt-get update && apt-get install -y ffmpeg vainfo intel-media-va-driver-non-free curl
              pip install rich
              curl -sL https://raw.githubusercontent.com/ZoltyMat/JF_hw_stress/d9b25b2/jf_hw_stress.py -o /tmp/stress.py
              python3 /tmp/stress.py --device /dev/dri/renderD128 --source /media --duration 180 --threshold 0.9
          resources:
            limits:
              gpu.intel.com/i915: 1
          volumeMounts:
            - name: media
              mountPath: /media
              readOnly: true
            - name: dri
              mountPath: /dev/dri
      volumes:
        - name: media
          nfs:
            server: 192.168.1.100
            path: /volume1/media
        - name: dri
          hostPath:
            path: /dev/dri
      restartPolicy: Never
```
Key decisions:
- `ttlSecondsAfterFinished: 600` – the Job and its pod auto-delete after 10 minutes. No orphaned test pods.
- `nodeSelector: gpu: intel-uhd-630` – only schedules on the GPU node. If the node is down, the Job stays pending instead of running without hardware acceleration.
- Pinned commit hash (`d9b25b2`) – the script is fetched from GitHub at a specific commit, not `main`. This prevents a broken HEAD from ruining a benchmark run.
- Read-only media mount – the test never writes to the media library.
What the Output Looks Like
The stress tester outputs a formatted summary via rich tables and a JSON report:
```text
JF_hw_stress Results — Intel UHD 630 (VAAPI)
┌──────────┬─────────────┬───────────┬──────────┐
│ Streams  │ Avg FPS     │ FPS Ratio │ Status   │
├──────────┼─────────────┼───────────┼──────────┤
│ 1        │ 142.3       │ 5.93x     │ PASS     │
│ 2        │ 71.8        │ 2.99x     │ PASS     │
│ 3        │ 48.1        │ 2.00x     │ PASS     │
│ 4        │ 35.2        │ 1.47x     │ PASS     │
│ 5        │ 27.6        │ 1.15x     │ PASS     │
│ 6        │ 22.1        │ 0.92x     │ PASS     │
│ 7        │ 18.3       │ 0.76x     │ FAIL     │
└──────────┴─────────────┴───────────┴──────────┘
Recommendation: 6 concurrent 1080p HEVC→H.264 streams (90% threshold)
```
The JSON report includes per-stream breakdown, GPU utilization samples, and the source files used – useful for comparing across different media libraries or after hardware changes.
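Comparing two such reports is a one-liner once you settle on the fields you care about. A hedged sketch: the field names below (`gpu`, `recommended_streams`) and both sets of numbers are hypothetical placeholders for illustration, not the tool's actual schema:

```python
import json

# Hypothetical report fragments from two runs (e.g. before/after a hardware change).
# Field names and values are my assumption, not the real JF_hw_stress schema.
before = json.loads('{"gpu": "Intel UHD 630", "recommended_streams": 6}')
after = json.loads('{"gpu": "Intel UHD 630 (new driver)", "recommended_streams": 11}')

delta = after["recommended_streams"] - before["recommended_streams"]
print(f'{before["gpu"]} -> {after["gpu"]}: {delta:+d} streams at the 90% threshold')
```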
Results: Intel UHD 630 Capacity
Running against my media library (mix of 1080p and 4K HEVC Blu-ray rips), the Intel UHD 630 on pve4 handles:
| Scenario | Concurrent Streams | Notes |
|---|---|---|
| 4K HEVC → 1080p H.264 | 2-3 | The heavy case – 4K decode + 1080p encode |
| 1080p HEVC → 720p H.264 | 6-7 | Most common for mobile clients |
| 1080p H.264 → 720p H.264 | 8+ | Lightest – H.264 decode is cheaper than HEVC |
These numbers align with Intel’s published Quick Sync capabilities for Coffee Lake, but now I have measured confirmation from my specific hardware, drivers, and media.
Why Not Use Jellyfin’s Built-in Metrics?
Jellyfin exposes transcode session data through its API, and I scrape it with Prometheus. But those metrics only tell you about sessions that already exist. They cannot tell you how many more sessions the GPU can handle before quality degrades.
The stress tester answers a different question: what is the ceiling? Once you know the ceiling, you can set Jellyfin’s max transcode limit appropriately and alert when you are approaching capacity.
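Turning the measured ceiling into an alert rule can look like this. A minimal sketch under my own assumptions: the 80% warning margin is my choice, and the session count would come from whatever scrapes Jellyfin's API (not shown):

```python
def transcode_headroom(active_sessions: int, ceiling: int, warn_fraction: float = 0.8) -> str:
    """Classify current transcode load against the measured stream ceiling.

    `warn_fraction` is an arbitrary alerting margin, not something JF_hw_stress outputs.
    """
    if active_sessions >= ceiling:
        return "CRITICAL: at or above measured GPU capacity"
    if active_sessions >= ceiling * warn_fraction:
        return "WARNING: approaching capacity"
    return "OK"

# With the measured ceiling of 6 streams for 1080p HEVC -> H.264:
print(transcode_headroom(3, 6))  # OK
print(transcode_headroom(5, 6))  # WARNING (5 >= 6 * 0.8)
print(transcode_headroom(6, 6))  # CRITICAL
```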
Lessons Learned
Real media files produce different results than synthetic test clips. My 4K HEVC Blu-ray rips with high bitrates (40-60 Mbps) stress the decoder harder than YouTube-quality test clips. Always benchmark with your actual content.
VAAPI driver version matters. The `intel-media-va-driver-non-free` package has meaningful performance differences between versions. Pinning the driver version in the container ensures reproducible results.

GPU thermal throttling is real on passthrough VMs. The UHD 630 in pve4 throttles after sustained load because the VM does not have direct fan control. The stress test captures this – FPS drops after about 90 seconds of full utilization, which is why the 180-second test duration matters.
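Detecting that kind of sag in a run is straightforward if you keep per-second FPS samples. A sketch of one possible check (the windowing and 10% drop cut-off are my own illustrative choices, not the tool's):

```python
def throttling_detected(fps_samples: list[float], window: int = 10, drop: float = 0.10) -> bool:
    """Flag thermal throttling when the mean of the last `window` samples falls
    more than `drop` below the mean of the first `window` samples."""
    if len(fps_samples) < 2 * window:
        return False
    early = sum(fps_samples[:window]) / window
    late = sum(fps_samples[-window:]) / window
    return late < early * (1 - drop)

# Synthetic 1 Hz samples: steady ~60 fps for 90 s, then a throttle-induced sag.
samples = [60.0] * 90 + [48.0] * 90
print(throttling_detected(samples))  # True
```

A 60-second test would only see the flat first half and report an optimistic ceiling, which is the argument for the 180-second duration.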
The 90% threshold is the right default. At 1.0x FPS ratio, the GPU is exactly keeping up – but any spike in scene complexity causes frame drops. 90% gives enough headroom for variable bitrate scenes.
Auto-cleanup with `ttlSecondsAfterFinished` is essential for test Jobs. Without it, completed Jobs accumulate and clutter `kubectl get jobs`. The 10-minute window gives enough time to grab logs before cleanup.