Lab 4.0 - Linux kernel capabilities

Return to Workshop

Containers provide a certain degree of process isolation via kernel namespaces. In this module, we’ll examine the capabilities of a process running in a containerized namespace. Specifically we’ll look at how Linux capabilities can be used to grant a particular or subset of root privileges to a process running in a container.

It would be worth taking a few minutes to read this blog post before beginning this lab.

Capabilities

Capabilities are distinct units of privilege that can be independently enabled or disabled.

Start by examining the Linux process capabilities header file.

vim -R /usr/include/linux/capability.h

Turn on line numbering in vim.

:set number

The capability bit mask definitions begin around line 103. Each capability is defined by a bit position followed by a short description. For example, the first capability is CAP_CHOWN (bit 0). This capability grants permission to change the ownership of files.

103 /*
104  ** POSIX-draft defined capabilities.
105  **/
106
107 /* In a system with the [_POSIX_CHOWN_RESTRICTED] option defined, this
108    overrides the restriction of changing file ownership and group
109    ownership. */
110
111 #define CAP_CHOWN            0

Exit vim:

:q

Now examine the capabilities bit mask of a rootless process running on the host.

grep CapEff /proc/self/status
CapEff:	0000000000000000

The bitmask returned is 0x0 meaning this process does not have any capabilities. For example, since bit 0 is not set (CAP_CHOWN) a process with this CapEff bitmask can not execute chown or chgrp on a file that it does not own.

Now examine the capabilities bit mask of a root process on the host.

sudo grep CapEff /proc/self/status
CapEff:	000000ffffffffff

Notice that all 38 capability bits are set indicating this process has a full set of capabilities. In a later exercise, you’ll discover that a container running as root has a filtered set of capabilities by default but this can be changed at run time.

The capsh command can decode a capabilities bitmask into a human readable output. Try it out!

capsh --decode=ffffffffff
0x000000ffffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,38,39

NOTE: for those interested, capability '38' allows the following:

* CAP_PERFMON relaxes the verifier checks further:
 * - BPF progs can use of pointer-to-integer conversions
 * - speculation attack hardening measures are bypassed
 * - bpf_probe_read to read arbitrary kernel memory is allowed
 * - bpf_trace_printk to print kernel memory is allowed

NOTE: for those interested, capability '39' allows the following:

 * CAP_BPF allows the following BPF operations:
 * - Creating all types of BPF maps
 * - Advanced verifier features
 *   - Indirect variable access
 *   - Bounded loops
 *   - BPF to BPF function calls
 *   - Scalar precision tracking
 *   - Larger complexity limits
 *   - Dead code elimination
 *   - And potentially other features
 * - Loading BPF Type Format (BTF) data
 * - Retrieve xlated and JITed code of BPF programs
 * - Use bpf_spin_lock() helper

In comparison, '200' indicates the immutable capability (CAP_LINUX_IMMUTABLE):

capsh --decode=200
0x0000000000000200=cap_linux_immutable

Exploring the capabilities of containers.

A non-null CapEff value indicates the process has some capabilities. You will discover that the capabilities of a container are often less than what a root process has running on the host.

Start by running a rootless container as --user=32767 and look at it’s capabilities.

podman run --rm -it --user=32767 registry.access.redhat.com/ubi8/ubi grep CapEff /proc/self/status
CapEff:	0000000000000000

Next run a rootless container as --user=0 and look at it’s capabilities. Note that it’s capabilities are filtered from that of a root process on the host.

podman run --user=0 --rm -it registry.access.redhat.com/ubi8/ubi grep CapEff /proc/self/status
CapEff:	00000000a80425fb

Now run a container as privileged and compare the results to the previous exercises. What conclusions can you draw?

sudo podman run --user=0 --rm -it --privileged registry.access.redhat.com/ubi8/ubi grep CapEff /proc/self/status
CapEff: 000000ffffffffff

Examining container processes

Run a container in the background that runs for a few minutes.

podman run --user=0 --name=sleepy -it -d registry.access.redhat.com/ubi8/ubi sleep 999

Use podman top to examine the container’s capabilities.

podman top sleepy capeff
EFFECTIVE CAPS
AUDIT_WRITE,CHOWN,DAC_OVERRIDE,FOWNER,FSETID,KILL,MKNOD,NET_BIND_SERVICE,NET_RAW,SETFCAP,SETGID,SETPCAP,SETUID,SYS_CHROOT

Try out the following very useful shortcut (-l). It tells podman to act on the latest container it is aware of:

podman top -l capeff

Podman top has many additional format descriptors you can check out.

podman top -h

Capabilities Challenge #1

How could you determine which capabilities podman filters from a root process running in a container?

From a previous exercise we know that a root process on the host has a capabilities mask of CapEff = 000000FFFFFFFFFF

From a previous exercise we know that a root process in a container has a capabilities mask of CapEff = 00000000A80427FB

Hint: Below is an example that uses the Linux binary calculator bc to add hexadecimal numbers (0x9 + 0x1) = A.

sudo yum install bc -y
echo 'obase=16;ibase=16;9+1' | bc
A

Capabilities Challenge #2

Suppose an application had a legitimate reason to change the date (ntpd, license testing, etc) How would you allow a container to change the date on the host? What capabilities are needed to allow this?

Run a container, save the date then try to change the date.

podman run --rm -ti --user 0 --name temp registry.access.redhat.com/ubi8/ubi bash
savethedate=$(date)
date -s "$savethedate"
date: cannot set date: Operation not permitted
Mon Apr  8 21:45:24 UTC 2019
exit

Capabilities Challenge #3

You have been given a container image to deploy (quay.io/bkozdemb/hello). The application needs to use the chattr utility but must not be allowed to ping any hosts. Use what you’ve learned about capabilities to properly deploy this application using podman.

For example, ping succeeds but chattr fails. We want the opposite.

podman run -it --name=chattr_no_ping --rm quay.io/bkozdemb/utils bash
ping -c1 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.035 ms

--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.035/0.035/0.035/0.000 ms
cd /tmp
touch file
chattr +i file
chattr: Operation not permitted while setting flags on file

Exit from the container:

exit

Example Solutions to Challenges

Challenge #1: One approach would be to use your favorite binary calculator (bc) to calculate the difference in CapEff between a host root process (0xffffffffff) and a containerized root process (0xa80427fb).

  0xFFFFFFFFFF
- 0x00A80427FB
  ------------
  0xFF57FBD804
echo 'obase=16;ibase=16;FFFFFFFFFF-A80427FB' | bc
FF57FBD804

To produce a human readable list, use capsh to decode the vector.

capsh --decode=FF57FBD804
0x000000ff57fbd804=cap_dac_read_search,cap_net_broadcast,cap_net_admin,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_lease,cap_audit_control,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,38,39

Challenge #2: To allow a container to set the system clock, the sys_time capability must be added. Add this capability then try setting the date again.

sudo podman run --rm -ti --user 0 --name temp --cap-add=sys_time registry.access.redhat.com/ubi8/ubi bash
savethedate=$(date)
date -s "$savethedate"
Mon Apr  8 21:46:18 UTC 2019

And exit from the container:

exit

Challenge #3: Drop all capabilities then add linux_immutable. The trick with this is the container must run as root because linux_immutable is a filtered capability.

sudo podman run -it --name=chattr_no_ping --rm --cap-drop=all --cap-add=linux_immutable quay.io/bkozdemb/utils bash

The chattr command should succeed in making a file read only:

cd /tmp
touch file
chattr +i file
rm -rf file
rm: cannot remove 'file': Operation not permitted
lsattr file
----i--------------- file

Remember to reset the file attributes so the container can shutdown cleanly:

chattr -i file
lsattr file
-------------------- file

On the host, check the capabilities of the container. This will require you to open another terminal:

sudo podman top chattr_no_ping capeff
EFFECTIVE CAPS
LINUX_IMMUTABLE

Exit the container (in the original container):

exit

Workshop Details

Domain Red Hat Logo
Workshop
Student ID

Return to Workshop