Sandboxing Soldatserver with Bubblewrap and Seccomp

Sat 16 May 2020

by minus

tagged sandboxing, soldat, bubblewrap, seccomp

Today we'll talk about sandboxing. In particular, about sandboxing soldatserver, the dedicated server for the game Soldat. It's a fairly old and not especially hardened code base, which makes it a good candidate for sandboxing. We'll sandbox it using Linux namespaces and a simple seccomp filter.

This article depicts my journey through sandboxing soldatserver but should also be a good starting point for sandboxing other (non-graphical) applications. For the rest of the article, I assume the reader is familiar with the techniques used for sandboxing on Linux, namely: chroot (not used directly but the concept is the same), namespaces and seccomp. Even if you're already familiar with what seccomp does on a high level, the kernel docs and manpage are definitely worth a read.

Sandboxing software

There are a couple of choices out there to sandbox applications:

Docker: Probably the most popular one. Its main selling point is arguably its software bundling functionality rather than its sandboxing. Requires packing software into an image to be runnable. Not quite what we want here.
firejail: Mainly intended for desktop applications. Supports running with an isolated X11 server and integrating with the host's PulseAudio server, for example. Comes with a lot of pre-built software profiles to sandbox various common desktop applications with zero effort. A bit overkill for what we want.
bubblewrap: A minimal sandbox with no fancy features by itself. Originates from Flatpak. Looks good.
systemd-nspawn: Comes with systemd, thus won't work on systems without systemd. Not the choice I want to make for this project.
building your own sandbox: While certainly possible, that's a bit of a bigger undertaking. Even bubblewrap, which is fairly minimal, has over 3000 lines of code. Not so practical.

We'll go with bubblewrap for its simplicity. By default it only creates a new mount namespace with an empty tmpfs as root. That's it. Whatever you want to run in it, you either need to bind-mount or copy; including the loader should you want to run a dynamically linked program.

Discovering dependencies

Our goal is to run soldatserver in a bubblewrap sandbox, so the server obviously has to be mounted into the new mount namespace.

% curl -O https://static.soldat.pl/downloads/soldatserver2.8.2_1.7.1.1.zip
% unzip soldatserver2.8.2_1.7.1.1.zip
% rm soldatserver2.8.2_1.7.1.1.zip
% mv soldatserver2.8.2_1.7.1 soldat

% bwrap --bind ./soldat /soldat --chdir /soldat ./soldatserver

If soldatserver were statically linked, we'd be done now. Instead, we're greeted with the following, slightly misleading error message:

bwrap: execvp ./soldatserver: No such file or directory

The execvp call fails because the dynamic linker/loader specified in the executable can't be found. To find out what exactly is missing, we can reach for ldd, which is mainly a wrapper around the loader called with --list. On an Arch Linux box, I get the following output:

% ldd soldat/soldatserver
    not a dynamic executable

Oops. So either soldatserver is a statically linked executable, or it's not x86_64 like the host system.

% /lib/ld-linux-x86-64.so.2 soldat/soldatserver
soldat/soldatserver: error while loading shared libraries: soldat/soldatserver: wrong ELF class: ELFCLASS32

Yep, there we go. Install the respective 32-bit libraries and we're good:

% ldd soldat/soldatserver
    linux-gate.so.1 (0xf7fae000)
    libpthread.so.0 => /usr/lib32/libpthread.so.0 (0xf7f39000)
    libdl.so.2 => /usr/lib32/libdl.so.2 (0xf7f33000)
    libc.so.6 => /usr/lib32/libc.so.6 (0xf7d47000)
    /lib/ld-linux.so.2 => /usr/lib/ld-linux.so.2 (0xf7fb0000)
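The class check that the loader complained about can also be done by looking at the ELF header directly. Here is a minimal sketch in Python, assuming only the standard ELF header layout; the function name elf_class_and_machine and the two small lookup tables are made up for illustration and only cover the values relevant to this article.

```python
import struct

# e_ident[EI_CLASS] values and two e_machine values from the ELF specification.
ELF_CLASSES = {1: "ELFCLASS32", 2: "ELFCLASS64"}
ELF_MACHINES = {3: "x86", 62: "x86_64"}


def elf_class_and_machine(header: bytes):
    """Return (class name, machine name) given the first bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] is the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both 32- and 64-bit headers.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return (
        ELF_CLASSES.get(header[4], "unknown"),
        ELF_MACHINES.get(machine, f"machine {machine}"),
    )
```

Feeding it the first 20 bytes of soldat/soldatserver (for example via open(path, "rb").read(20)) should report the same ELFCLASS32 that the 64-bit loader tripped over.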

Sandboxing with bubblewrap

Since I intend to run the server on an Alpine Linux system, let's try it on one. Unfortunately, my Alpine hosts are all on x86_64 and Alpine has no multilib repo that lets you install 32-bit libraries, so just mounting or copying the loader and libraries from the host won't work. Instead, we can set up a proper 32-bit Alpine root filesystem in a directory:

# NOTE: This is neither using HTTPS nor verifying signatures in any way!
% curl http://dl-cdn.alpinelinux.org/alpine/latest-stable/main/x86/apk-tools-static-2.10.5-r0.apk | tar --transform='s/.*/apk/' -xz sbin/apk.static
% ./apk -X http://dl-cdn.alpinelinux.org/alpine/latest-stable/main -U --allow-untrusted --root alpine_x86 --initdb add musl busybox

To prevent loaders installed on my desktop from interfering with testing, let's just run the whole thing with bubblewrap. By the way, mounting /dev and /proc is not really necessary here but could otherwise come in handy. Mounting resolv.conf is essential if you need name resolution, which soldatserver does: it won't even start without it.

% bwrap \
    --bind ./alpine_x86 / \
    --dev /dev --proc /proc \
    --setenv PATH /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    --ro-bind /etc/resolv.conf /etc/resolv.conf \
    --bind ./soldat /soldat \
    --chdir /soldat \
    /bin/sh
/soldat $ mkdir /usr/bin && /bin/busybox --install -s
/soldat $ /lib/ld-musl-i386.so.1 --list ./soldatserver
    /lib/ld-linux.so.2 (0xf7f29000)
    libpthread.so.0 => /lib/ld-linux.so.2 (0xf7f29000)
    libdl.so.2 => /lib/ld-linux.so.2 (0xf7f29000)
    libc.so.6 => /lib/ld-linux.so.2 (0xf7f29000)

Conveniently, the musl loader already contains libc, libdl and libpthread, so it alone should be enough to run soldatserver. Unfortunately, the path to the loader is incorrect and does not exist, since soldatserver is linked against glibc and expects the loader in the location glibc normally puts it. When we try to run ./soldatserver, we get the same error message as in the beginning:

/soldat $ ./soldatserver
/bin/sh: ./soldatserver: not found
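Which loader path an executable asks for is recorded in its PT_INTERP program header. As a rough sketch, the following Python function pulls that path out of an ELF image; it assumes a little-endian file and uses the field offsets from the ELF specification. The name elf_interpreter is mine, not something from any tool used here.

```python
import struct

PT_INTERP = 3  # program header type that holds the loader path


def elf_interpreter(data: bytes):
    """Return the requested loader path, or None for static executables."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[5] != 1:
        raise ValueError("this sketch only handles little-endian files")
    if data[4] == 1:    # ELFCLASS32: e_phoff at 28, e_phentsize/e_phnum at 42
        (phoff,) = struct.unpack_from("<I", data, 28)
        phentsize, phnum = struct.unpack_from("<HH", data, 42)
        field, offset_pos, filesz_pos = "<I", 4, 16
    elif data[4] == 2:  # ELFCLASS64: e_phoff at 32, e_phentsize/e_phnum at 54
        (phoff,) = struct.unpack_from("<Q", data, 32)
        phentsize, phnum = struct.unpack_from("<HH", data, 54)
        field, offset_pos, filesz_pos = "<Q", 8, 32
    else:
        raise ValueError("unknown ELF class")
    for index in range(phnum):
        base = phoff + index * phentsize
        (p_type,) = struct.unpack_from("<I", data, base)
        if p_type != PT_INTERP:
            continue
        (p_offset,) = struct.unpack_from(field, data, base + offset_pos)
        (p_filesz,) = struct.unpack_from(field, data, base + filesz_pos)
        # The segment content is the NUL-terminated loader path.
        return data[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None
```

Run on the contents of ./soldatserver, this should return the glibc-style loader path that the shell failed to find.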

We can work around that by invoking the loader directly:

/soldat $ /lib/ld-musl-i386.so.1 ./soldatserver

             -= Soldat Dedicated Server 1.7.1 - 2.8.2 =-
…
Unable to open file "/lib/banned.txt"
Shutting down server...
…
/soldat $

For some reason that I haven't figured out yet, this changes the working directory to that of the loader executable, so it won't do. Symlinking the loader to the expected location and running the executable directly works though:

/soldat $ ln -s /lib/ld-musl-i386.so.1 /lib/ld-linux.so.2
/soldat $ ./soldatserver

             -= Soldat Dedicated Server 1.7.1 - 2.8.2 =-
…

Sweet, it's up and running! Now all that's left for the basic sandbox is to trim down the bubblewrap command and make it start soldatserver directly, …and to actually enable sandboxing. Until now, there is only a new mount namespace.

Adding --unshare-all to the bwrap command line fixes that. Except it also creates a new network namespace with only a lo interface – not ideal if we want to have connectivity! The easiest way around that is to add the --share-net switch, which simply causes bubblewrap to not create a new network namespace. This is the option we'll continue with. It is possible to make a separate network namespace connected to the outside world by creating a pair of veth network interfaces and moving one to the new namespace. Docker and Firejail do that for you, bubblewrap does not (and that's okay).
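To see which namespaces actually got unshared, one can compare the symlinks under /proc/<pid>/ns for a process outside and one inside the sandbox. The following is a small sketch of that idea in Python; the helper names namespace_ids and unshared are hypothetical, and reading another process's ns directory may require sufficient privileges.

```python
import os


def namespace_ids(proc_dir="/proc/self"):
    """Map each namespace type (mnt, net, pid, ...) to its identifier string."""
    ns_dir = os.path.join(proc_dir, "ns")
    return {
        name: os.readlink(os.path.join(ns_dir, name))
        for name in sorted(os.listdir(ns_dir))
    }


def unshared(outside, inside):
    """Namespace types whose identifiers differ between the two snapshots."""
    return sorted(
        name for name in outside
        if name in inside and outside[name] != inside[name]
    )
```

Calling unshared(namespace_ids(), namespace_ids("/proc/<pid of soldatserver>")) from the host should then list every namespace type except net when --share-net is given.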

When running in a terminal (as in: the program can access the tty), supplying --new-session to bwrap will cause it to create a new terminal session and thus prevent soldatserver from taking over your terminal. In that case you might also want to add a --die-with-parent; otherwise the server will keep on running in the background if you close the terminal or hit ^C.

% bwrap \
    --unshare-all --share-net \
    --ro-bind ./alpine_x86/lib/ld-musl-i386.so.1 /lib/ld-linux.so.2 \
    --ro-bind /etc/resolv.conf /etc/resolv.conf \
    --bind ./soldat /soldat \
    --chdir /soldat \
    ./soldatserver

Note how I replaced the root bind-mount with just the loader (in the desired target location). Now if anything goes wrong, the most that can happen is something getting written to the ./soldat directory (including executables) and arbitrary syscalls getting executed.

The former could be further restricted by mounting the ./soldat directory read-only and just mounting logs writable and with noexec. Mounting with noexec is not supported by bubblewrap though, so we'll pass on that.

Syscalls can be limited. Which brings us to the next point:

Filtering syscalls with seccomp

Bubblewrap allows us to load an arbitrary compiled seccomp filter. How we get to that point is none of its concern though. The go-to way of creating a seccomp filter is to use libseccomp. It allows you to create a sort-of access control list matching on syscall number and parameters, but doesn't let you express nested combinations or the like. For those you would have to write cBPF bytecode yourself. Seccomp does not support the more powerful features like maps that eBPF offers.

Since we're gonna build a whitelist, the next step is to find out which syscalls we need. One way to find out is to run bubblewrap with a filter that just logs all syscalls:

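#include <seccomp.h>

int main(void) {
    /* Default action: log every syscall but let it through. */
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_LOG);
    if (ctx == NULL)
        return 1;
    /* The native arch (x86_64 here) is included by default; add 32-bit x86. */
    seccomp_arch_add(ctx, SCMP_ARCH_X86);
    seccomp_export_bpf(ctx, 1);  /* BPF bytecode to stdout */
    seccomp_export_pfc(ctx, 2);  /* human-readable pseudo filter code to stderr */
    seccomp_release(ctx);
    return 0;
}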

The program will write a human-readable form of the filter to stderr and the BPF bytecode to stdout:

% gcc -o /tmp/genseccomp genseccomp.c -lseccomp && /tmp/genseccomp >/tmp/logging.bpf
#
# pseudo filter code start
#
# filter for arch x86_64 (3221225534)
if ($arch == 3221225534)
  # default action
  action LOG;
# filter for arch x86 (1073741827)
if ($arch == 1073741827)
  # default action
  action LOG;
# invalid architecture action
action KILL;
#
# pseudo filter code end
#

Now let's run bubblewrap with that filter. bwrap reads the filter from a file descriptor passed to it, so we have to give it an fd number instead of a file name.¹ Stop the server quickly after starting it, as it creates a huge amount of logs.

% bwrap \
    --unshare-all --share-net \
    --ro-bind ./alpine_x86/lib/ld-musl-i386.so.1 /lib/ld-linux.so.2 \
    --ro-bind /etc/resolv.conf /etc/resolv.conf \
    --bind ./soldat /soldat \
    --chdir /soldat \
    --seccomp 10 10</tmp/logging.bpf \
    ./soldatserver
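If the shell redirection is a nuisance, the same fd hand-off can be done from a small launcher. This is a sketch under the assumption that Python is available on the host; run_with_fd and bwrap_argv are hypothetical helper names, and the bwrap options are simply the ones from the command above.

```python
import os
import subprocess
import sys


def run_with_fd(path, build_argv, **kwargs):
    """Open `path` read-only and run a command that inherits the descriptor.

    `build_argv` receives the descriptor number and returns the argument list.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # pass_fds keeps the descriptor open, under the same number, in the child.
        return subprocess.run(build_argv(fd), pass_fds=[fd], **kwargs)
    finally:
        os.close(fd)


def bwrap_argv(fd):
    """Same options as the command above; only the fd number is dynamic."""
    return [
        "bwrap", "--unshare-all", "--share-net",
        "--ro-bind", "./alpine_x86/lib/ld-musl-i386.so.1", "/lib/ld-linux.so.2",
        "--ro-bind", "/etc/resolv.conf", "/etc/resolv.conf",
        "--bind", "./soldat", "/soldat",
        "--chdir", "/soldat",
        "--seccomp", str(fd),
        "./soldatserver",
    ]
```

With that, run_with_fd("/tmp/logging.bpf", bwrap_argv) would start the server with the logging filter. The same mechanism should also work with the read end of an os.pipe() instead of a file.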

Now filter the seccomp logs out of your system logs:

% journalctl --since=-1m | grep SECCOMP > seccomp.log
# or
% grep SECCOMP /var/log/messages > seccomp.log

% head -n4 seccomp.log
SECCOMP auid=1001 uid=1001 gid=1001 ses=1 pid=64236 comm="bwrap" exe="/usr/bin/bwrap" sig=0 arch=c000003e syscall=61 compat=0 ip=0x7f852590fb8a code=0x7ffc0000
SECCOMP auid=1001 uid=1001 gid=1001 ses=1 pid=64237 comm="bwrap" exe="/usr/bin/bwrap" sig=0 arch=c000003e syscall=59 compat=0 ip=0x7f85259100ab code=0x7ffc0000
SECCOMP auid=1001 uid=1001 gid=1001 ses=1 pid=64237 comm="soldatserver" exe="/soldat/soldatserver" sig=0 arch=40000003 syscall=243 compat=1 ip=0xf7ec670b code=0x7ffc0000
SECCOMP auid=1001 uid=1001 gid=1001 ses=1 pid=64237 comm="soldatserver" exe="/soldat/soldatserver" sig=0 arch=40000003 syscall=258 compat=1 ip=0xf7e92153 code=0x7ffc0000

Taking a look at them you will notice that not only does the seccomp filter catch soldatserver syscalls, but also some from bwrap. This makes sense. After all, bubblewrap installs the filter. Since it has to exec soldatserver afterwards that unfortunately also means we have to allow the execve syscall.

You may also notice that we get two different values for arch. That makes sense, since the host is x86_64 and soldatserver x86. This is also the reason we added x86 with seccomp_arch_add when generating the bytecode. Since syscall numbers can be different between architectures, libseccomp wraps the syscall rules in architecture checks.
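The arch values in the logs are not arbitrary: they are the ELF machine number combined with two flag bits for word size and endianness. A short sketch that takes them apart, using constants from linux/audit.h and linux/elf-em.h (the function name decode_audit_arch is made up for this example):

```python
# Flag bits from <linux/audit.h>, machine numbers from <linux/elf-em.h>.
AUDIT_ARCH_64BIT = 0x80000000
AUDIT_ARCH_LE = 0x40000000
ELF_MACHINES = {3: "x86", 62: "x86_64"}


def decode_audit_arch(value: int):
    """Split an audit arch value into machine name, word size and endianness."""
    machine = value & 0xFFFF
    return {
        "machine": ELF_MACHINES.get(machine, f"machine {machine}"),
        "bits": 64 if value & AUDIT_ARCH_64BIT else 32,
        "little_endian": bool(value & AUDIT_ARCH_LE),
    }
```

The hexadecimal c000003e and 40000003 from the logs are the same numbers as the decimal 3221225534 and 1073741827 in the pseudo filter code, so decode_audit_arch(int("c000003e", 16)) and decode_audit_arch(3221225534) describe the same architecture.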

The next step is to get those syscalls whitelisted. For that I wrote a little Python script that parses the logs and spits out C code calls to seccomp_rule_add for each syscall, sorted from the most frequently called syscall. Because, you know, efficiency.

#!/usr/bin/env python3

import sys
import re
from collections import defaultdict
from operator import itemgetter
from functools import lru_cache
from subprocess import check_output

if '-h' in sys.argv:
    print(f"usage: {sys.argv[0]} < seccomp_audit_logs")
    sys.exit(0)

@lru_cache(maxsize=None)
def syscall_name(arch, syscall):
    return check_output(['scmp_sys_resolver', '-a', arch, syscall]).decode().strip()

arch_map = {
        '40000003': 'x86',
        'c000003e': 'x86_64',
}
syscalls = defaultdict(int)
syscalls_per_arch = {arch: defaultdict(int) for arch in arch_map.values()}

log_re = re.compile(r"SECCOMP.*arch=(?P<arch>[0-9a-f]+).*syscall=(?P<syscall>\d+)")
for line in sys.stdin:
    m = log_re.search(line)
    if not m: continue
    arch = arch_map[m.group('arch')]
    syscall = syscall_name(arch, m.group('syscall'))
    syscalls_per_arch[arch][syscall] += 1
    syscalls[syscall] += 1

for syscall, cnt in sorted(syscalls.items(), key=itemgetter(1), reverse=True):
    print(f"seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS({syscall}), 0);")

Feed the logs to it and put the code in a file:

% python gensyscalls.py < seccomp.log > genseccomp-rules.inc
% head -n2 genseccomp-rules.inc
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(_newselect), 0);

Those rules blanket-whitelist the syscalls. One could filter based on parameters as well, but the seccomp logs don't contain those, so another tool to trace syscalls would be necessary. We'll pass on that.²

Adjust the genseccomp.c program, setting the default action to kill and including the rules:

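#include <seccomp.h>

int main(void) {
    /* Default action: kill on any syscall that is not whitelisted below. */
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
    if (ctx == NULL)
        return 1;
    seccomp_arch_add(ctx, SCMP_ARCH_X86);
    /* Generated seccomp_rule_add(...) calls, one per allowed syscall. */
#include "genseccomp-rules.inc"
    seccomp_export_bpf(ctx, 1);  /* BPF bytecode to stdout */
    seccomp_export_pfc(ctx, 2);  /* human-readable pseudo filter code to stderr */
    seccomp_release(ctx);
    return 0;
}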

Now let's run it again with our new seccomp filter:

% gcc -o /tmp/genseccomp genseccomp.c -lseccomp && /tmp/genseccomp >/tmp/soldatserver.bpf 2>/dev/null
% bwrap \
    --unshare-all --share-net \
    --ro-bind ./alpine_x86/lib/ld-musl-i386.so.1 /lib/ld-linux.so.2 \
    --ro-bind /etc/resolv.conf /etc/resolv.conf \
    --bind ./soldat /soldat \
    --chdir /soldat \
    --seccomp 10 10</tmp/soldatserver.bpf \
    ./soldatserver

             -= Soldat Dedicated Server 1.7.1 - 2.8.2 =-

Aaand it runs, nice! Try removing just a single rule and see soldatserver getting killed.

That's it, wasn't that a breeze?

¹ Unfortunately, shells don't support directly passing another command's output as a specific fd to another process, so we need to go with a file here. In fact, zsh doesn't even support the syntax with a file, so I had to put the command in a file and run it with sh.
² By the way, another interesting tool to generate seccomp filters that also match syscall parameters in a nicely readable way is kafel. Go check it out!