SELinux MCS challenges with GitLab Runners

01 May 2026

Table of Contents
Introduction
The MCS problem
The test script
GitLab’s official suggestion and why it falls short
How GNOME currently handles this
Exploring libkrun
Firecracker and the custom executor path
What comes next

Introduction

GNOME’s GitLab runners use Podman as the container runtime with SELinux in Enforcing mode on Fedora. The GitLab Runner Docker/Podman executor spawns multiple containers per job: a helper container that clones the repository and handles artifacts, and a build container that runs the actual CI script. Both containers need to share a /builds volume — and this is where SELinux’s Multi-Category Security (MCS) becomes a problem.

The MCS problem

An SELinux label has four fields: user:role:type:level. For containers the interesting part is the level, also called the MCS field. A level looks like s0:c123,c456: s0 is the sensitivity (always s0 in targeted policy), and c123,c456 are the categories. A label can carry any number of categories (c0.c1023 is shorthand for all 1024 of them), but by default Podman assigns each container a pair.
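
To make the four fields concrete, here is a minimal POSIX shell sketch that splits a full label into its parts. The function name split_label is hypothetical, used only for illustration. The one subtlety is that the level itself contains colons, so read is asked to put everything after the third colon into the last variable.

```shell
#!/bin/sh
# split_label LABEL: print the user, role, type and level of an SELinux label.
# split_label is a hypothetical helper. With IFS=':', read assigns the first
# three fields to se_user, se_role and se_type, and the rest of the line
# (colons included) to se_level.
split_label() {
    IFS=: read -r se_user se_role se_type se_level <<EOF
$1
EOF
    printf 'user=%s role=%s type=%s level=%s\n' \
        "$se_user" "$se_role" "$se_type" "$se_level"
}

split_label "system_u:system_r:container_t:s0:c20,c540"
# user=system_u role=system_r type=container_t level=s0:c20,c540
split_label "system_u:system_r:container_runtime_t:s0-s0:c0.c1023"
# user=system_u role=system_r type=container_runtime_t level=s0-s0:c0.c1023
```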

MCS access is based on dominance. A subject’s label dominates an object’s label if the subject’s categories are a superset of (or equal to) the object’s categories:

Subject	Object	Access?	Why
`s0:c100,c200`	`s0:c100,c200`	Yes	Exact match
`s0:c100,c200`	`s0:c100`	Yes	Subject’s categories are a superset
`s0:c100,c200`	`s0:c100,c300`	No	Subject lacks `c300`
`s0:c0.c1023`	`s0:c100,c200`	Yes	Full range dominates everything
`s0`	`s0:c100,c200`	No	An empty category set cannot dominate a non-empty one
`s0`	`s0`	Yes	Both have no categories

How this applies to the runners:

Container A runs as container_t:s0:c100,c100 — it can only access objects labeled s0:c100,c100 (or s0:c100, or s0)
Container B runs as container_t:s0:c200,c200 — it can only access objects labeled s0:c200,c200 (or s0:c200, or s0)
Container A cannot access Container B’s files — c100,c100 doesn’t dominate c200,c200
Overlay layers labeled s0 (no categories) — accessible by all containers since every category set dominates the empty set
Podman at container_runtime_t:s0-s0:c0.c1023 — the full range means it dominates every possible category combination, so it can manage all containers

The range syntax (s0-s0:c0.c1023) is used for processes that need to operate across multiple levels. It means “my low clearance is s0 and my high clearance is s0:c0.c1023.” The process can read objects at any level within that range and create objects at any level within it. This is why Podman needs the full range — it creates containers with different MCS labels and needs to access all of them.
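
The dominance rules in the table, together with the range rule just described, fit in a few lines of POSIX shell. This is a minimal sketch, not how the kernel evaluates access. It assumes the targeted policy (sensitivity always s0) and ignores the policy's per-permission MCS constraints. The helpers expand_categories, mcs_dominates and mcs_in_range are hypothetical names invented for illustration.

```shell
#!/bin/sh
# Sketch of the MCS dominance rule (targeted policy: sensitivity is always s0).
# expand_categories, mcs_dominates and mcs_in_range are hypothetical helpers.

# expand_categories LEVEL: print LEVEL's category numbers, one per line.
# Accepts "s0", "s0:c100", "s0:c100,c200" and ranges such as "s0:c0.c1023".
expand_categories() {
    case "$1" in
        *:*) ;;
        *) return 0 ;;                  # bare "s0": no categories at all
    esac
    old_ifs=$IFS
    IFS=,
    for part in ${1#*:}; do             # "c100,c200" -> "c100" "c200"
        case "$part" in
            *.*)                        # "cLOW.cHIGH" range: expand it
                i=${part%%.*}; i=${i#c}
                hi=${part##*.}; hi=${hi#c}
                while [ "$i" -le "$hi" ]; do
                    echo "$i"
                    i=$((i + 1))
                done ;;
            *) echo "${part#c}" ;;
        esac
    done
    IFS=$old_ifs
}

# mcs_dominates SUBJECT OBJECT: succeed if every category of OBJECT
# also appears in SUBJECT (superset or equal).
mcs_dominates() {
    subject_cats=$(expand_categories "$1")
    for c in $(expand_categories "$2"); do
        printf '%s\n' "$subject_cats" | grep -qx "$c" || return 1
    done
    return 0
}

# mcs_in_range LOW-HIGH LEVEL: succeed if HIGH dominates LEVEL and
# LEVEL dominates LOW, e.g. LEVEL lies inside Podman's s0-s0:c0.c1023.
mcs_in_range() {
    mcs_dominates "${1#*-}" "$2" && mcs_dominates "$2" "${1%%-*}"
}

# Replay the table rows; prints yes, yes, no, yes, no, yes.
for pair in "s0:c100,c200 s0:c100,c200" "s0:c100,c200 s0:c100" \
            "s0:c100,c200 s0:c100,c300" "s0:c0.c1023 s0:c100,c200" \
            "s0 s0:c100,c200" "s0 s0"; do
    set -- $pair
    if mcs_dominates "$1" "$2"; then echo "$1 -> $2: yes"; else echo "$1 -> $2: no"; fi
done
```

The range check is just two dominance checks. That is why a process at s0-s0:c0.c1023 can handle every container level: the high end dominates any category pair, and any level dominates the category-free low end.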

When Podman starts a container, it picks a random pair of categories (e.g., s0:c512,c768) from within its allowed range and assigns that as the container’s process label. Files created by the container inherit that label. Another container gets a different random pair (e.g., s0:c33,c901). Since c512,c768 and c33,c901 do not match — neither is a superset of the other — SELinux denies cross-container file access. This is the isolation mechanism, and the root cause of the problem with GitLab Runner’s multi-container-per-job architecture.
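
The allocation step can be sketched as follows. This is an illustration under stated assumptions, not Podman's code. It draws two distinct categories from c0 to c1023 and puts them in ascending order, which matches the shape of every label in the test output later in this post. Podman's real allocator (in the go-selinux library) also keeps track of pairs that are already reserved, so running containers do not collide. The function names are hypothetical.

```shell
#!/bin/sh
# random_category / random_mcs_pair are hypothetical helpers.
# Unlike Podman's allocator, this sketch does NOT track pairs already in use.

# random_category: print a random category number in 0..1023.
random_category() {
    # Two random bytes as an unsigned integer (0..65535), reduced mod 1024.
    n=$(od -An -N2 -tu2 /dev/urandom | tr -d ' ')
    echo $((n % 1024))
}

# random_mcs_pair: print a level "s0:cA,cB" with two distinct categories, A < B.
random_mcs_pair() {
    while :; do
        a=$(random_category)
        b=$(random_category)
        [ "$a" -ne "$b" ] && break      # retry until the categories differ
    done
    if [ "$a" -gt "$b" ]; then t=$a; a=$b; b=$t; fi   # ascending order
    echo "s0:c$a,c$b"
}

# Two containers almost always end up with unrelated pairs.
echo "helper container: $(random_mcs_pair)"
echo "build container:  $(random_mcs_pair)"
```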

The helper container gets one random MCS pair, writes the cloned repo to /builds labeled with that pair, and the build container gets a different pair. The build container cannot read or write those files. The :Z volume flag (exclusive relabel) relabels the volume to the mounting container’s category, but that only helps the first container — the second one still has a different label.

The test script

I wrote a script that demonstrates the problem with both standard containers (crun) and microVMs (libkrun). The script creates two containers per test — a helper that writes a file to a shared /builds volume, and a build container that tries to read it — simulating the GitLab Runner workflow:

#!/bin/bash
# Description: SELinux MCS Diagnostic (crun vs krun)

if [ "$(getenforce)" != "Enforcing" ]; then
    echo "WARNING: SELinux is not in Enforcing mode. This test requires Enforcing mode."
    exit 1
fi

TEST_BASE="/tmp/gitlab-runner-mcs-test"
CRUN_DIR="$TEST_BASE/crun-builds"
KRUN_DIR="$TEST_BASE/krun-builds"

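# Cleanup from previous runs: leftover containers (which would make
# "podman create --name ..." fail) and the test directories
podman rm -f --ignore crun-helper crun-build krun-helper krun-build > /dev/null 2>&1
rm -rf "$TEST_BASE"
mkdir -p "$CRUN_DIR" "$KRUN_DIR"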

echo "======================================================="
echo " TEST 1: Standard Container Isolation (crun)"
echo "======================================================="

# 1. CREATE Helper
podman create --name crun-helper -v "$CRUN_DIR:/builds:Z" fedora bash -c "
    echo '[crun] -> Helper Process Context (Inside):'
    cat /proc/self/attr/current
    echo 'crun-data' > /builds/artifact.txt
    echo '[crun] -> File Label INSIDE Helper:'
    ls -Z /builds/artifact.txt
" > /dev/null

echo "[crun] Starting Helper Container (applying :Z relabel)..."
HELPER_HOST_LABEL_CRUN=$(podman inspect -f '{{.ProcessLabel}}' crun-helper)
echo "[crun] -> HOST METADATA: Podman assigned process label: $HELPER_HOST_LABEL_CRUN"
podman start -a crun-helper

echo ""
echo "[crun] -> File Label ON HOST (Notice the specific MCS category):"
ls -Z "$CRUN_DIR/artifact.txt"

# 2. CREATE Build Container (The Victim)
podman create --name crun-build -v "$CRUN_DIR:/builds" fedora bash -c "
    echo '  [Build-Internal] Process Context:'
    cat /proc/self/attr/current 2>/dev/null
    echo '  [Build-Internal] Executing ls -laZ /builds :'
    ls -laZ /builds 2>&1 | sed 's/^/    /'
    echo '  [Build-Internal] Executing cat /builds/artifact.txt :'
    cat /builds/artifact.txt 2>&1 | sed 's/^/    /'
" > /dev/null

echo ""
echo "[crun] Starting Build Container to inspect shared volume..."
BUILD_HOST_LABEL_CRUN=$(podman inspect -f '{{.ProcessLabel}}' crun-build)
echo "[crun] -> HOST METADATA: Podman assigned process label: $BUILD_HOST_LABEL_CRUN"
echo "        *** COMPARE THE cXXX,cYYY ABOVE TO THE FILE LABEL. THIS MISMATCH CAUSES THE DENIAL ***"
podman start -a crun-build

podman rm -f crun-helper crun-build > /dev/null


echo ""
echo "======================================================="
echo " TEST 2: MicroVM Isolation (libkrun / virtio-fs) FIXED"
echo "======================================================="

# --- Write the execution scripts to the host to avoid parsing errors ---
cat << 'EOF' > "$TEST_BASE/krun_helper.sh"
#!/bin/bash
echo '[krun] -> Helper Process Context (Inside VM):'
cat /proc/self/attr/current 2>/dev/null || echo '    (SELinux disabled/unavailable in guest kernel)'
echo 'krun-data' > /builds/artifact.txt
echo '[krun] -> File Label INSIDE Helper VM (Blindspot):'
ls -laZ /builds/artifact.txt 2>&1 | sed 's/^/    /'
EOF

cat << 'EOF' > "$TEST_BASE/krun_build.sh"
#!/bin/bash
echo '  [Build-Internal] Process Context (Inside VM):'
cat /proc/self/attr/current 2>/dev/null || echo '    (SELinux disabled/unavailable in guest kernel)'
echo '  [Build-Internal] Executing ls -laZ /builds :'
ls -laZ /builds 2>&1 | sed 's/^/    /'
echo '  [Build-Internal] Executing cat /builds/artifact.txt :'
cat /builds/artifact.txt 2>&1 | sed 's/^/    /'
EOF

chmod +x "$TEST_BASE/krun_helper.sh" "$TEST_BASE/krun_build.sh"
# ---------------------------------------------------------------------

# 1. CREATE Helper MicroVM
podman create --name krun-helper --runtime krun --memory=1024m \
    -v "$KRUN_DIR:/builds:Z" \
    -v "$TEST_BASE/krun_helper.sh:/script.sh:ro,Z" \
    fedora /script.sh > /dev/null

echo "[krun] Starting Helper MicroVM (applying :Z relabel)..."
HELPER_HOST_LABEL_KRUN=$(podman inspect -f '{{.ProcessLabel}}' krun-helper)
echo "[krun] -> HOST METADATA: Podman assigned process label: $HELPER_HOST_LABEL_KRUN"
podman start -a krun-helper

echo ""
echo "[krun] -> File Label ON HOST (Podman applied the helper's MCS category via :Z):"
ls -Z "$KRUN_DIR/artifact.txt"

# 2. CREATE Build MicroVM (The Victim)
podman create --name krun-build --runtime krun --memory=1024m \
    -v "$KRUN_DIR:/builds" \
    -v "$TEST_BASE/krun_build.sh:/script.sh:ro,Z" \
    fedora /script.sh > /dev/null

echo ""
echo "[krun] Starting Build MicroVM to inspect shared volume..."
BUILD_HOST_LABEL_KRUN=$(podman inspect -f '{{.ProcessLabel}}' krun-build)
echo "[krun] -> HOST METADATA: Podman assigned process label: $BUILD_HOST_LABEL_KRUN"
echo "        *** THE virtiofsd DAEMON ON THE HOST IS TRAPPED IN THIS CONTEXT ***"
podman start -a krun-build

# Cleanup
podman rm -f krun-helper krun-build > /dev/null

echo ""
echo "======================================================="
echo " Test Complete."

Test 1 (crun) creates a helper container that mounts the builds directory with :Z (exclusive relabel) and writes artifact.txt. Podman assigns it a random MCS label — in this run it was s0:c20,c540. The file on disk inherits that label. Then a second container (the build container) mounts the same path without :Z and gets a different random label (s0:c46,c331). Since c46,c331 does not dominate c20,c540, the build container is denied access to the file.

Test 2 (krun) runs the same scenario but with --runtime krun, which boots each container inside a lightweight microVM via libkrun. The helper VM gets container_kvm_t:s0:c823,c999 and the build VM gets container_kvm_t:s0:c309,c405 — same MCS mismatch, same denial. The type changes from container_t to container_kvm_t, but the MCS mechanism is identical. On the host side, virtiofsd — the daemon that serves the volume into the VM via virtio-fs — runs under the MCS label Podman assigned to the VM. The build VM’s virtiofsd is trapped in s0:c309,c405 and cannot access files labeled s0:c823,c999.

An interesting detail: inside the libkrun VMs, cat /proc/self/attr/current returns just kernel — SELinux is not available in the guest. The VM thinks it has no mandatory access control, but the host-side virtiofsd is still fully subject to MCS enforcement. This is a blindspot worth being aware of.

The output from a run on Fedora with SELinux Enforcing and Podman 5.8.2:

=======================================================
 TEST 1: Standard Container Isolation (crun)
=======================================================
[crun] Starting Helper Container (applying :Z relabel)...
[crun] -> HOST METADATA: Podman assigned process label: system_u:system_r:container_t:s0:c20,c540
[crun] -> Helper Process Context (Inside):
system_u:system_r:container_t:s0:c20,c540 [crun] -> File Label INSIDE Helper:
system_u:object_r:container_file_t:s0:c20,c540 /builds/artifact.txt

[crun] -> File Label ON HOST (Notice the specific MCS category):
system_u:object_r:container_file_t:s0:c20,c540 /tmp/gitlab-runner-mcs-test/crun-builds/artifact.txt

[crun] Starting Build Container to inspect shared volume...
[crun] -> HOST METADATA: Podman assigned process label: system_u:system_r:container_t:s0:c46,c331
        *** COMPARE THE cXXX,cYYY ABOVE TO THE FILE LABEL. THIS MISMATCH CAUSES THE DENIAL ***
  [Build-Internal] Process Context:
system_u:system_r:container_t:s0:c46,c331   [Build-Internal] Executing ls -laZ /builds :
    ls: cannot open directory '/builds': Permission denied
  [Build-Internal] Executing cat /builds/artifact.txt :
    cat: /builds/artifact.txt: Permission denied

=======================================================
 TEST 2: MicroVM Isolation (libkrun / virtio-fs) FIXED
=======================================================
[krun] Starting Helper MicroVM (applying :Z relabel)...
[krun] -> HOST METADATA: Podman assigned process label: system_u:system_r:container_kvm_t:s0:c823,c999
[krun] -> Helper Process Context (Inside VM):
kernel [krun] -> File Label INSIDE Helper VM (Blindspot):
    -rw-r--r--. 1 root root system_u:object_r:container_file_t:s0:c823,c999 10 May  2  2026 /builds/artifact.txt

[krun] -> File Label ON HOST (Podman applied the helper's MCS category via :Z):
system_u:object_r:container_file_t:s0:c823,c999 /tmp/gitlab-runner-mcs-test/krun-builds/artifact.txt

[krun] Starting Build MicroVM to inspect shared volume...
[krun] -> HOST METADATA: Podman assigned process label: system_u:system_r:container_kvm_t:s0:c309,c405
        *** THE virtiofsd DAEMON ON THE HOST IS TRAPPED IN THIS CONTEXT ***
  [Build-Internal] Process Context (Inside VM):
kernel   [Build-Internal] Executing ls -laZ /builds :
    ls: /builds: Permission denied
    ls: cannot open directory '/builds': Permission denied
  [Build-Internal] Executing cat /builds/artifact.txt :
    cat: /builds/artifact.txt: Permission denied

=======================================================
 Test Complete.

GitLab’s official suggestion and why it falls short

GitLab’s documentation on configuring SELinux MCS suggests applying the same MCS label to all containers launched by a runner:

[[runners]]
  [runners.docker]
    security_opt = ["label=level:s0:c1000,c1000"]

This works — all containers get the same category pair, so the helper and build containers can share files. But it collapses MCS isolation between all concurrent jobs on that runner. With concurrent = 4, four simultaneous jobs all run as s0:c1000,c1000 and can read each other’s /builds content — cloned source code, build artifacts, cached dependencies. On a shared or multi-tenant runner, this is a security regression: it trades MCS isolation for functionality.

For runners with concurrent = 1, or for dedicated single-tenant runners, this is an acceptable tradeoff. It does not generalize to shared infrastructure where multiple untrusted projects run side by side.

How GNOME currently handles this

GNOME’s runners are managed via an Ansible role that enforces SELinux in Enforcing mode, installs rootless Podman running as a dedicated podman system user with linger enabled, and deploys custom SELinux policy modules. The Podman service runs under SELinuxContext=system_u:system_r:container_runtime_t:s0-s0:c0.c1023 via a systemd override — the full MCS range (s0-s0:c0.c1023) gives the container runtime the ability to spawn containers at any MCS level and relabel volumes accordingly, as explained in the dominance rules above.

Four custom SELinux .te modules are compiled and loaded on every runner host: pydocuum (allows the image cleanup daemon to talk to the Podman socket), podman (grants user_namespace create and /dev/null mapping), flatpak (permits the filesystem mounts flatpak builds need), and gnome_runner (covers binfmt_misc access, device nodes, and other permissions GNOME OS builds require).

For the MCS problem specifically, the runner config.toml — rendered from a Jinja2 template via per-host Ansible variables — sets a fixed MCS label per runner type. Here’s a representative snippet from one of the runner hosts:

[[runners]]
  name = "a15948139c78"
  executor = "docker"
  [runners.docker]
    image = "quay.io/fedora/fedora:latest"
    privileged = false
    security_opt = ["label=level:s0:c100,c100"]
    devices = ["/dev/kvm", "/dev/udmabuf"]
    cap_add = ["SYS_PTRACE", "SYS_CHROOT"]

[[runners]]
  name = "a15948139c78-flatpak"
  executor = "docker"
  [runners.docker]
    image = "quay.io/gnome_infrastructure/gnome-runtime-images:gnome-master"
    privileged = false
    security_opt = ["seccomp:/home/podman/gitlab-runner/flatpak.seccomp.json", "label=level:s0:c200,c200"]
    cap_drop = ["all"]

This is the same approach GitLab’s documentation suggests, with one refinement: we use different fixed categories per runner type — c100,c100 for untagged runners and c200,c200 for flatpak runners — so that flatpak builds and regular builds remain MCS-isolated from each other, even though builds of the same type share a category.
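
Because isolation between runner types now depends on each type having its own fixed level, a quick audit can confirm that no two runner entries in config.toml reuse one. Below is a small sketch. list_runner_levels and shared_levels are hypothetical helpers, and the awk matching assumes the one-setting-per-line layout of the snippet above rather than parsing TOML properly.

```shell
#!/bin/sh
# list_runner_levels FILE: print "<runner name> <MCS level>" for every runner
# whose security_opt contains "label=level:...". Hypothetical helper; this is
# line-based matching, not a real TOML parser.
list_runner_levels() {
    awk '
        /^[ \t]*name[ \t]*=/ {          # name = "a15948139c78"
            name = $0
            sub(/^[^"]*"/, "", name)    # drop everything up to the first quote
            sub(/".*$/, "", name)       # drop the closing quote and the rest
        }
        /label=level:/ {
            level = $0
            sub(/.*label=level:/, "", level)
            sub(/".*$/, "", level)
            print name, level
        }
    ' "$1"
}

# shared_levels FILE: print every MCS level used by more than one runner.
# Empty output means each runner type has a level of its own.
shared_levels() {
    list_runner_levels "$1" | awk '{ print $2 }' | sort | uniq -d
}

# Example using the two runner entries from the snippet above.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[[runners]]
  name = "a15948139c78"
  [runners.docker]
    security_opt = ["label=level:s0:c100,c100"]
[[runners]]
  name = "a15948139c78-flatpak"
  [runners.docker]
    security_opt = ["seccomp:/home/podman/gitlab-runner/flatpak.seccomp.json", "label=level:s0:c200,c200"]
EOF
list_runner_levels "$cfg"
# a15948139c78 s0:c100,c100
# a15948139c78-flatpak s0:c200,c200
shared_levels "$cfg"    # prints nothing: the two runner types use different levels
rm -f "$cfg"
```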

This is a pragmatic compromise, not an ideal solution. All concurrent jobs on the same runner type share the same MCS categories. With concurrent = 4 on our Hetzner runners, four simultaneous untagged jobs can read each other’s /builds content. For GNOME’s use case, a community CI infrastructure where the runners are shared by GNOME project maintainers, this is an acceptable tradeoff. The alternative, leaving MCS labels random, would break every single job. But it is precisely this tradeoff that motivates exploring per-job VM isolation via microVMs.

Exploring libkrun

libkrun is a lightweight Virtual Machine Monitor (VMM) that integrates with Podman via --runtime krun, running each container inside a microVM with its own lightweight kernel. The appeal is strong: per-container VM isolation would give each job its own kernel and address space, making the MCS cross-container problem irrelevant inside the VM.

I tested libkrun on a Fedora system and hit an immediate blocker: Fatal glibc error: rseq registration failed. The rseq (Restartable Sequences) syscall was introduced in Linux 4.18, and glibc 2.35 and later register an rseq area for every thread at startup. libkrun uses a custom minimal kernel that does not expose working rseq support. Since the guest images, Fedora in our case, ship a modern glibc that performs this registration, it fails inside the VM and the process aborts at startup before any user code runs.

The libkrun guest kernel is not chosen per container: it is bundled into a companion library, libkrunfw, so there is no runtime option to swap or patch it, and changing it means rebuilding that library. This is not a configuration issue but a limitation of the current libkrun release.

Even if the rseq issue were resolved, the MCS challenge would remain, as Test 2 of the script demonstrates. On the host side, Podman assigns MCS labels to the virtiofsd process that serves the volume into the VM via virtio-fs. Different VMs get different host-side MCS labels, so the same :Z relabel and cross-container access denial apply. The transport changes from a bind mount to virtio-fs, but the SELinux enforcement is identical: virtiofsd for the build VM runs at container_kvm_t:s0:c309,c405 and cannot access files carrying the helper VM’s label s0:c823,c999.

Firecracker and the custom executor path

Firecracker is another microVM technology, the one behind AWS Lambda and Fly.io, that could provide strong per-job isolation. However, there is no native GitLab Runner executor for Firecracker. The only integration path is the Custom Executor, which requires implementing prepare, run, and cleanup scripts from scratch.

The job image is exposed via CUSTOM_ENV_CI_JOB_IMAGE, but everything else is on the operator: pulling the OCI image, extracting a rootfs, booting a Firecracker VM with the right kernel and network configuration, injecting the build script, mounting or copying the cloned repository into the VM, collecting artifacts and cache after the job finishes, and tearing the VM down. GitLab provides an LXD-based example that shows the pattern — prepare creates a container and installs dependencies, run pipes the job script into it, cleanup destroys it — but adapting that to microVMs adds the complexity of VM lifecycle management, kernel and rootfs preparation, networking, and storage. This is a significant engineering effort, essentially rebuilding the entire Docker executor workflow from scratch.
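
To give a sense of the plumbing, here is a minimal sketch of just the run step. It assumes the Custom Executor contract as documented by GitLab: the script configured as run_exec in the [runners.custom] section (alongside prepare_exec and cleanup_exec) is called once per stage with the path of a generated script and the stage name, and it reports job failures by exiting with BUILD_FAILURE_EXIT_CODE. vm_exec is a hypothetical placeholder that runs the script locally. A real setup would execute it inside the job's Firecracker VM, much as the LXD example pipes the script into its container.

```shell
#!/bin/sh
# Sketch of a Custom Executor run_exec script, invoked by GitLab Runner as:
#   run.sh <path-to-generated-script> <stage-name>
#
# vm_exec is a HYPOTHETICAL placeholder. A real Firecracker setup would copy
# the script into the job's microVM (for example over ssh or vsock) and run it
# there with the shell GitLab generated it for (typically bash). Here it runs
# the script locally so the control flow can be exercised.
vm_exec() {
    sh "$1"
}

run_stage() {
    script_path=$1
    stage=$2
    echo "Running stage $stage" >&2
    if ! vm_exec "$script_path"; then
        # The job's own commands failed: report a build failure, as opposed
        # to an infrastructure failure (SYSTEM_FAILURE_EXIT_CODE).
        exit "${BUILD_FAILURE_EXIT_CODE:-1}"
    fi
}

# Only dispatch when invoked by GitLab Runner with both arguments.
if [ "$#" -eq 2 ]; then
    run_stage "$1" "$2"
fi
```

Everything interesting (booting the VM in prepare, getting the repository and caches in and out, destroying the VM in cleanup) hides behind vm_exec and its siblings, which is where the engineering effort described above goes.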

What comes next

MCS is a core SELinux feature. Type enforcement (TE) already confines processes by type — container_t can only access container_file_t, not user_home_t or httpd_sys_content_t — but TE alone cannot distinguish one container_t process from another. MCS adds that layer: by assigning each container a unique category pair, the kernel enforces isolation between processes that share the same type. Container A at s0:c100,c100 and Container B at s0:c200,c200 are both container_t, but MCS ensures they cannot touch each other’s files. The conflict with GitLab Runner’s multi-container-per-job architecture is that two containers that need to share a volume are given different categories by default. The workarounds we deploy today, including the fixed MCS labels on GNOME’s runners, trade that inter-container isolation for functionality.

The most promising direction I’ve found so far is the combination of Cloud Hypervisor and the fleeting-plugin-fleetingd plugin. Cloud Hypervisor, a project started by Intel, is built on the rust-vmm crates and is essentially a more capable sibling of Firecracker: it supports CPU and memory hotplug, VFIO device passthrough, and virtio-fs. Complex CI tasks such as building large binaries or running UI tests often need these features, and Firecracker’s minimalist design deliberately omits them.

The fleeting-plugin-fleetingd is a community plugin for GitLab’s Instance Executor (the modern evolution of the Custom Executor) that automates the full VM lifecycle: downloading cloud images, creating copy-on-write disks, launching Cloud Hypervisor VMs with direct kernel boot, provisioning them via cloud-init, and tearing them down after each build. Each job gets a fresh, disposable VM, which is exactly the per-job isolation model we need. The plugin already handles networking via TAP interfaces and nftables SNAT, and supports customizing the VM image through cloud-init commands, so preinstalling Podman or other build tools is straightforward.

Beyond that, I’ll also keep evaluating libkrun (promising Red Hat technology), Firecracker with a hand-rolled custom executor, and QEMU’s microvm machine type. The common denominator across all of these — except for the fleeting-plugin-fleetingd path — is that none of them have an existing GitLab Runner integration. Regardless of which microVM technology we settle on, the path forward involves either building a workflow from scratch using the Custom Executor and its prepare, run, cleanup hooks, or leveraging the fleeting plugin ecosystem that GitLab has been building around the Instance and Docker Autoscaler executors.

That should be all for today, stay tuned!