Seccomp
Seccomp in Docker
Seccomp (Secure Computing Mode) is a Linux kernel feature that limits the system calls (syscalls) a process can make. Restricting syscalls shrinks the kernel attack surface a process can reach, so a compromised program can do less damage.
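To check whether the running kernel supports seccomp, inspect its build configuration file, usually /boot/config-$(uname -r), for CONFIG_SECCOMP=y (and CONFIG_SECCOMP_FILTER=y for filter mode), for example with grep CONFIG_SECCOMP /boot/config-$(uname -r).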
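The same check can be scripted. Below is a minimal Python sketch. It assumes the kernel config lives at /boot/config-<release>, which is the usual location on Debian/Ubuntu-style systems; some distributions expose /proc/config.gz instead. The helper names are hypothetical.

```python
import platform


def seccomp_kernel_options(config_text: str) -> dict:
    """Return the CONFIG_SECCOMP* options found in a kernel config file's text."""
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Enabled options look like "CONFIG_SECCOMP=y"; disabled ones are "# ... is not set" comments.
        if line.startswith("CONFIG_SECCOMP") and "=" in line:
            key, value = line.split("=", 1)
            options[key] = value
    return options


def seccomp_supported(config_text: str) -> bool:
    """Filter mode (used by Docker and Kubernetes) needs both options built in."""
    opts = seccomp_kernel_options(config_text)
    return opts.get("CONFIG_SECCOMP") == "y" and opts.get("CONFIG_SECCOMP_FILTER") == "y"


if __name__ == "__main__":
    path = f"/boot/config-{platform.release()}"  # assumed location of the kernel config
    try:
        with open(path) as f:
            print("seccomp filter support:", seccomp_supported(f.read()))
    except OSError:
        print(f"could not read {path}; some systems provide /proc/config.gz instead")
```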
Built-in Seccomp Filters in Docker
Docker applies a built-in seccomp profile to every container it creates, as long as the host kernel has seccomp enabled. The profile rejects any syscall it does not explicitly allow. Here is a simplified excerpt of its structure:
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": [
    "SCMP_ARCH_X86_64",
    "SCMP_ARCH_X86",
    "SCMP_ARCH_X32"
  ],
  "syscalls": [
    {
      "name": "accept4",
      "action": "SCMP_ACT_ALLOW"
    },
    {
      "name": "access",
      "action": "SCMP_ACT_ALLOW"
    },
    {
      "name": "adjtimex",
      "action": "SCMP_ACT_ALLOW"
    },
    // ... additional syscalls ...
  ]
}
Seccomp Modes
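Seccomp operates in one of the following modes. The number is the value reported in the Seccomp field of /proc/<pid>/status:
- 0 - Disabled
- 1 - Strict Mode: only read, write, _exit and sigreturn are permitted
- 2 - Filter Mode: a BPF filter decides what happens to each syscall
User notification, described below, is a feature of filter mode rather than a separate mode.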
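You can read the current mode of a process from the Seccomp field of /proc/<pid>/status. Below is a minimal Python sketch for Linux. The mode table reflects the kernel's SECCOMP_MODE_* values, and `seccomp_mode` is a hypothetical helper name.

```python
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str) -> str:
    """Extract the seccomp mode from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        # The line looks like "Seccomp:\t2" (newer kernels also add "Seccomp_filters:").
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1])
            return SECCOMP_MODES.get(value, f"unknown ({value})")
    return "not reported (kernel built without CONFIG_SECCOMP?)"


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as f:
            print("seccomp mode of this process:", seccomp_mode(f.read()))
    except OSError:
        print("/proc/self/status is not available on this system")
```

Inside a container started with Docker's default profile, this typically reports filter. With --security-opt seccomp=unconfined, it reports disabled.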
Filter Mode
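In filter mode, a BPF program attached to the process inspects each syscall (its number, the architecture and its arguments). The program returns an action, such as allowing the call, failing it with an error, logging it or killing the process. Container runtimes build this program from a JSON profile.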
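Example filter profile that allows only read, write and exit and rejects every other syscall:
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    { "name": "read", "action": "SCMP_ACT_ALLOW" },
    { "name": "write", "action": "SCMP_ACT_ALLOW" },
    { "name": "exit", "action": "SCMP_ACT_ALLOW" }
  ]
}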
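To make the semantics concrete, here is a deliberately simplified Python model of how a profile decides a syscall's fate. If a rule names the syscall, that rule supplies the action; otherwise defaultAction applies. The model ignores argument filters, architecture handling and the rule priorities that real runtimes implement through libseccomp. The helper `action_for` is hypothetical.

```python
import json

PROFILE = json.loads("""
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {"name": "read", "action": "SCMP_ACT_ALLOW"},
    {"name": "write", "action": "SCMP_ACT_ALLOW"},
    {"names": ["exit", "exit_group"], "action": "SCMP_ACT_ALLOW"}
  ]
}
""")


def action_for(profile: dict, syscall: str) -> str:
    """Return the action a (simplified) profile assigns to a syscall name."""
    for rule in profile.get("syscalls", []):
        # Docker-style profiles accept either a single "name" or a "names" list.
        names = rule.get("names") or [rule.get("name")]
        if syscall in names:
            return rule["action"]
    # No rule matched: fall back to the profile's default action.
    return profile["defaultAction"]


for call in ("read", "exit_group", "mkdir"):
    print(f"{call:>10} -> {action_for(PROFILE, call)}")
```

With this profile, read and exit_group are allowed. mkdir matches no rule, so it falls through to SCMP_ACT_ERRNO.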
User Notification Mode
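This feature is available since Linux 5.0 through the SECCOMP_RET_USER_NOTIF filter action. When a process makes a matching syscall, the call does not run immediately. Instead, the kernel suspends the calling process and notifies a supervisor process through a listener file descriptor. The supervisor then decides how to handle the call: it can return an error, return a result it computed itself, or let the call continue.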
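Example in C (Linux 5.0 or later headers). It installs a filter that hands mkdir(2) to a supervisor and allows every other syscall. On success, seccomp() returns a listener file descriptor. A real program would pass that descriptor to a supervisor process, which reads notifications with the SECCOMP_IOCTL_NOTIF_RECV ioctl; that part is omitted here:
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
int main(void) {
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, 0), /* load seccomp_data.nr (offset 0) */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_mkdir, 0, 1), /* mkdir? (arch check omitted) */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_USER_NOTIF), /* yes: notify the supervisor */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW), /* no: allow the call */
    };
    struct sock_fprog prog = {.len = 4, .filter = filter};
    prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); /* required without CAP_SYS_ADMIN */
    return syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog) < 0;
}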
Seccomp Profiles
Seccomp profiles control the system calls allowed for a process.
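- Are enforced by the kernel in seccomp filter mode (seccomp-bpf).
- Can be written by hand or derived from an existing profile, such as Docker's default profile.
A Seccomp profile consists of:
- Default Action (defaultAction): the action applied to any syscall that no rule matches.
- Architectures (architectures): the CPU architectures, such as SCMP_ARCH_X86_64, whose syscall numbers the profile covers.
- Syscalls Array (syscalls): rules that name one or more syscalls and the action to take for them.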
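You can check these three parts before handing a profile to a runtime. Below is a minimal Python sketch. It assumes the action names listed in the seccomp section of the OCI runtime specification. The function `validate_profile` is hypothetical, and it checks structure only, not whether the syscall names exist.

```python
import json

# Actions defined by the OCI runtime spec for seccomp profiles.
KNOWN_ACTIONS = {
    "SCMP_ACT_KILL", "SCMP_ACT_KILL_PROCESS", "SCMP_ACT_KILL_THREAD",
    "SCMP_ACT_TRAP", "SCMP_ACT_ERRNO", "SCMP_ACT_TRACE",
    "SCMP_ACT_ALLOW", "SCMP_ACT_LOG", "SCMP_ACT_NOTIFY",
}


def validate_profile(profile: dict) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if profile.get("defaultAction") not in KNOWN_ACTIONS:
        problems.append(f"unknown defaultAction: {profile.get('defaultAction')!r}")
    for arch in profile.get("architectures", []):
        if not arch.startswith("SCMP_ARCH_"):
            problems.append(f"architecture {arch!r} should use the SCMP_ARCH_* form")
    for i, rule in enumerate(profile.get("syscalls", [])):
        if not (rule.get("name") or rule.get("names")):
            problems.append(f"syscalls[{i}] has no name/names")
        if rule.get("action") not in KNOWN_ACTIONS:
            problems.append(f"syscalls[{i}] has a missing or unknown action")
    return problems


if __name__ == "__main__":
    # A rule without "action" and a non-SCMP_ARCH_* architecture are both flagged.
    sample = json.loads('{"defaultAction": "SCMP_ACT_ALLOW", "architectures": ["amd64"],'
                        ' "syscalls": [{"name": "read"}]}')
    for problem in validate_profile(sample):
        print(problem)
```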
Here's an example of a simple seccomp profile in JSON format that allows only a few basic syscalls and rejects all others:
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    { "name": "read", "action": "SCMP_ACT_ALLOW" },
    { "name": "write", "action": "SCMP_ACT_ALLOW" },
    { "name": "exit", "action": "SCMP_ACT_ALLOW" }
  ]
}
Note that there are two types of profiles:
- Whitelist - Allows the listed syscalls and denies the rest.
- Blacklist - Rejects the listed syscalls and allows the rest.
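The two styles differ only in which action is the default and which is attached to the listed syscalls. Below is a minimal Python sketch that generates either kind. `make_profile` is a hypothetical helper. It assumes SCMP_ACT_ERRNO as the deny action (SCMP_ACT_KILL_PROCESS would be stricter) and x86_64 as the only architecture.

```python
import json


def make_profile(syscalls, style="whitelist"):
    """Build a Docker-style seccomp profile dict.

    whitelist: listed syscalls are allowed, everything else fails with an errno.
    blacklist: listed syscalls fail with an errno, everything else is allowed.
    """
    if style == "whitelist":
        default, listed = "SCMP_ACT_ERRNO", "SCMP_ACT_ALLOW"
    elif style == "blacklist":
        default, listed = "SCMP_ACT_ALLOW", "SCMP_ACT_ERRNO"
    else:
        raise ValueError(f"unknown style: {style}")
    return {
        "defaultAction": default,
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [{"names": sorted(syscalls), "action": listed}],
    }


if __name__ == "__main__":
    # Deny a few syscalls that containers rarely need and allow the rest.
    print(json.dumps(make_profile(["mount", "reboot", "swapon"], "blacklist"), indent=2))
```

Whitelists are generally the safer choice, and Docker's own default profile is one: a syscall that was forgotten or added in a newer kernel is denied rather than allowed.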
Specifying a Custom Seccomp Profile
Create a custom Seccomp profile and use it when running containers:
## custom.json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    { "name": "read", "action": "SCMP_ACT_ALLOW" },
    { "name": "write", "action": "SCMP_ACT_ALLOW" },
    { "name": "exit", "action": "SCMP_ACT_ALLOW" },
    { "name": "exit_group", "action": "SCMP_ACT_ALLOW" },
    { "name": "open", "action": "SCMP_ACT_ALLOW" },
    { "name": "close", "action": "SCMP_ACT_ALLOW" },
    { "name": "fstat", "action": "SCMP_ACT_ALLOW" },
    { "name": "arch_prctl", "action": "SCMP_ACT_ALLOW" },
    { "name": "brk", "action": "SCMP_ACT_ALLOW" },
    { "name": "munmap", "action": "SCMP_ACT_ALLOW" },
    { "name": "mmap", "action": "SCMP_ACT_ALLOW" }
  ]
}
This profile rejects every syscall it does not list, so the short list above is not enough to start a real container (execve, for example, is missing). Add more syscalls as needed. JSON does not allow comments, so keep notes like this outside the file.
To use this profile:
docker run --security-opt seccomp=/path/to/custom.json -it ubuntu:latest
Disable Seccomp When Running Container
We can also tell Docker to run a container without any seccomp profile:
docker run \
--security-opt seccomp=unconfined \
-it ubuntu:latest
With this setting, seccomp no longer filters any syscalls made inside the container. Other restrictions, such as dropped capabilities, still apply.
This is NOT RECOMMENDED.
Seccomp in Kubernetes
In Kubernetes, seccomp isn't applied by default. Unless the cluster is configured otherwise, containers run Unconfined, with no seccomp profile at all. A pod started without a seccomp setting therefore blocks fewer syscalls than the same image run under Docker's default profile, and seccomp reports as disabled inside the container.
To enable seccomp, specify a profile in the security context of the Pod definition file:
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-pod
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: my-container
      image: nginx:latest
      securityContext:
        allowPrivilegeEscalation: false
  # Add more containers or configurations if needed
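A seccompProfile can be set at the pod level, where it applies to all containers, or on an individual container, where it overrides the pod setting. Below is a minimal Python sketch of that precedence. It works on a manifest already parsed into a dict, for example with a YAML library. `effective_seccomp` is a hypothetical helper. The Unconfined fallback assumes the kubelet's default seccomp behaviour has not been changed.

```python
def effective_seccomp(pod: dict, container_name: str) -> dict:
    """Return the seccompProfile that applies to one container of a Pod manifest."""
    spec = pod["spec"]
    for container in spec.get("containers", []):
        if container["name"] == container_name:
            # A container-level profile takes precedence over the pod-level one.
            profile = container.get("securityContext", {}).get("seccompProfile")
            if profile:
                return profile
            break
    else:
        raise KeyError(f"no container named {container_name!r}")
    pod_profile = spec.get("securityContext", {}).get("seccompProfile")
    # With neither setting present, the container runs Unconfined by default.
    return pod_profile or {"type": "Unconfined"}


pod = {
    "spec": {
        "securityContext": {"seccompProfile": {"type": "RuntimeDefault"}},
        "containers": [
            {"name": "my-container", "image": "nginx:latest",
             "securityContext": {"allowPrivilegeEscalation": False}},
        ],
    }
}
print(effective_seccomp(pod, "my-container"))
```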
Using a Custom Seccomp Profile in Kubernetes
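To use a custom profile, save it on each node under the kubelet's seccomp directory, /var/lib/kubelet/seccomp/ (for example in /var/lib/kubelet/seccomp/profiles/). Then reference it in the pod definition by its path relative to that directory.
Example custom profile, saved as profiles/audit.json, which logs every syscall instead of blocking it: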
{
  "defaultAction": "SCMP_ACT_LOG"
}
Kubernetes pod YAML using the custom profile:
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/audit.json
  containers:
    - name: my-container
      image: nginx:latest
      securityContext:
        allowPrivilegeEscalation: false
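The localhostProfile value is a path relative to the kubelet's seccomp directory. Above, profiles/audit.json therefore refers to /var/lib/kubelet/seccomp/profiles/audit.json on each node. Below is a small Python sketch of that mapping. It assumes the default kubelet root directory /var/lib/kubelet, which can be changed with the kubelet's --root-dir flag. `localhost_profile_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath

KUBELET_ROOT = PurePosixPath("/var/lib/kubelet")  # default kubelet --root-dir


def localhost_profile_path(localhost_profile: str, kubelet_root=KUBELET_ROOT) -> PurePosixPath:
    """Map a Pod's localhostProfile value to the file the kubelet will read."""
    relative = PurePosixPath(localhost_profile)
    # Kubernetes requires a descending relative path: no absolute paths, no "..".
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"invalid localhostProfile: {localhost_profile!r}")
    return PurePosixPath(kubelet_root) / "seccomp" / relative


print(localhost_profile_path("profiles/audit.json"))
```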
Once the pod is created, the kernel logs the syscalls made by its container. On a typical Ubuntu node, the entries appear in the /var/log/syslog file.
Each entry records the number of the syscall the container made (the syscall= field). These numbers map to syscall names defined in /usr/include/asm/unistd_64.h on x86_64 hosts.
For example, some x86_64 syscall numbers and their names are: 0 = read, 1 = write, 2 = open, 3 = close, 59 = execve, 60 = exit, 231 = exit_group.
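Instead of looking numbers up by hand, you can parse the header. Below is a minimal Python sketch with these assumptions:
- The host is x86_64.
- The header contains lines of the form #define __NR_<name> <number>. Its path may differ by distribution; Debian/Ubuntu, for example, use /usr/include/x86_64-linux-gnu/asm/unistd_64.h.
- Log entries carry a syscall=<number> field, as SECCOMP audit records do.

The helper names are hypothetical.

```python
import re

NR_DEFINE = re.compile(r"^#define\s+__NR_(\w+)\s+(\d+)\s*$")
SYSCALL_FIELD = re.compile(r"\bsyscall=(\d+)\b")


def syscall_table(header_text: str) -> dict:
    """Map syscall numbers to names from the text of unistd_64.h."""
    table = {}
    for line in header_text.splitlines():
        match = NR_DEFINE.match(line.strip())
        if match:
            name, number = match.groups()
            table[int(number)] = name
    return table


def syscalls_in_log(log_text: str, table: dict) -> list:
    """Translate every syscall=<n> field in log text into a syscall name."""
    return [table.get(int(n), f"unknown({n})") for n in SYSCALL_FIELD.findall(log_text)]


if __name__ == "__main__":
    header = "/usr/include/asm/unistd_64.h"  # path may differ per distribution
    try:
        with open(header) as f:
            table = syscall_table(f.read())
        print(len(table), "syscalls found; 231 ->", table.get(231))
    except OSError:
        print(f"{header} not found")
```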
Rejecting All Syscalls in Kubernetes
Create a seccomp profile that rejects all syscalls, saved as profiles/violation.json:
{
  "defaultAction": "SCMP_ACT_ERRNO"
}
Then, use it in your pod definition:
apiVersion: v1
kind: Pod
metadata:
  name: test-violation
spec:
  restartPolicy: Never
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/violation.json
  containers:
    - name: my-container
      image: nginx:latest
      securityContext:
        allowPrivilegeEscalation: false
After you apply this profile, the pod goes into "ContainerCannotRun" status. Every syscall is rejected, so the container process cannot even be started.