Experiment with Time Namespace
tl;dr: I tried out the time namespace, which was merged in Linux 5.6.
What is a Linux namespace?
Linux containers belong to “namespaces”, which isolate various kinds of resources from the host. There is one namespace per resource type: for example, a network namespace, a mount namespace for mount point information, and a UTS namespace for things like the host name. Up to Linux 5.5, seven namespaces were implemented.
The set of namespaces, which was thought to be fixed at seven, has gained a new member: the Time Namespace.
Then, what is a Time Namespace?
Time Namespace is literally a namespace about time. This means that you can create a container that has a different concept of time than the host.
However, the first “clock” that comes to mind, CLOCK_REALTIME, has nothing to do with this namespace (no tests for this clock were added to the patch's test cases). If you change the time with the date -s command in a container that unshares the Time Namespace, the host's time changes as well, as shown below.
vagrant@ubuntu-groovy:~/util-linux$ sudo ./unshare -T --fork
root@ubuntu-groovy:/home/vagrant/util-linux# date
Thu Apr 2 18:38:02 UTC 2020
root@ubuntu-groovy:/home/vagrant/util-linux# date -s '2010-01-01 00:00:00'
Fri Jan 1 00:00:00 UTC 2010
root@ubuntu-groovy:/home/vagrant/util-linux# exit
logout
vagrant@ubuntu-groovy:~/util-linux$ date
Fri Jan 1 00:00:33 UTC 2010
As you can see with strace, date -s calls clock_settime(CLOCK_REALTIME) internally, which changes the time on the host.
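For reference, the three clocks that come up in this article can be read from userland. The following is a minimal sketch, assuming Linux and Python 3.7 or later (where time.CLOCK_BOOTTIME is available); the helper name read_clocks is made up for illustration.

```python
import time


def read_clocks():
    """Read the three clocks discussed here, in seconds (Linux only)."""
    return {
        # Wall-clock time; settable with clock_settime(2), not namespaced.
        "realtime": time.clock_gettime(time.CLOCK_REALTIME),
        # Time since an unspecified starting point; cannot be set.
        "monotonic": time.clock_gettime(time.CLOCK_MONOTONIC),
        # Like monotonic, but also counts time spent suspended.
        "boottime": time.clock_gettime(time.CLOCK_BOOTTIME),
    }


for name, value in read_clocks().items():
    print(f"{name:10s} {value:.3f}")
```

Running it on the host and inside a shell started with unshare -T and an offset should show the monotonic and boottime values shifted while realtime stays the same.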
So what are the benefits of Time Namespace?
For one thing, as mentioned in the patch comments, there seems to be a desire to use it for container migration.
There is a technology called CRIU that lets you checkpoint processes and containers and restore them later. However, it basically restores the container on a best-effort basis from process attributes, registers and memory dumps.
For example, if you use CRIU to checkpoint a container on one host and restore it on another, the time since the kernel booted (CLOCK_MONOTONIC and CLOCK_BOOTTIME) differs between the two hosts. The container is affected by this: the values of CLOCK_MONOTONIC and CLOCK_BOOTTIME it sees differ before and after the migration. If your program does important time-related work, this can cause problems.
CLOCK_MONOTONIC, for example, is a clock that is guaranteed never to go backwards, but after a migration it can appear to have jumped back. Inevitably, some programs are not prepared for that.
This is not a problem in VM migration, but it can be in containers. It can be solved if CRIU saves the “current clock information” at checkpoint time and replays it on restore using the Time Namespace.
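The idea of replaying a saved clock can be sketched with simple arithmetic. This is only an illustration of the offset calculation, not CRIU's actual implementation; the function name and the example values are hypothetical.

```python
NSEC_PER_SEC = 1_000_000_000


def restore_offset_ns(checkpoint_clock_ns: int, target_host_clock_ns: int) -> int:
    """Offset that makes the clock on the target host resume from the checkpoint value.

    Both arguments are readings of the same clock (e.g. CLOCK_MONOTONIC)
    in nanoseconds: one saved at checkpoint time, one taken on the new host.
    """
    return checkpoint_clock_ns - target_host_clock_ns


# Hypothetical values: the container saw 500000 s at checkpoint time,
# while the target host has only been up for 1200 s.
saved = 500_000 * NSEC_PER_SEC
target = 1_200 * NSEC_PER_SEC
offset = restore_offset_ns(saved, target)

# Inside a namespace with this offset, the clock continues from the saved value.
print(offset // NSEC_PER_SEC, (target + offset) // NSEC_PER_SEC)
```

A positive offset moves the container's clock ahead of the host's; if the target host has been up longer than the source, the offset is negative.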
Note that if you just want to stub CLOCK_REALTIME-like time at the userland API level, you can override the functions within the programming language (e.g., timecop in Ruby) or override specific functions with LD_PRELOAD (e.g., libfaketime).
Let’s try the functionality on Linux 5.8. First, let’s prepare an environment that supports time namespaces: Ubuntu 20.10 Groovy.
vagrant@ubuntu-groovy:~$ uname -a
Linux ubuntu-groovy 5.8.0-53-generic #60-Ubuntu SMP Thu May 6 07:46:32 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Let’s try the unshare command.
vagrant@ubuntu-groovy:~$ unshare -h
Usage:
unshare [options] [<program> [<argument>...]]
Run a program with some namespaces unshared from the parent.
Options:
-m, --mount[=<file>] unshare mounts namespace
-u, --uts[=<file>] unshare UTS namespace (hostname etc)
-i, --ipc[=<file>] unshare System V IPC namespace
-n, --net[=<file>] unshare network namespace
-p, --pid[=<file>] unshare pid namespace
-U, --user[=<file>] unshare user namespace
-C, --cgroup[=<file>] unshare cgroup namespace
-T, --time[=<file>] unshare time namespace
...
--monotonic <offset> set clock monotonic offset (seconds) in time namespaces
--boottime <offset> set clock boottime offset (seconds) in time namespaces
Ubuntu Groovy’s unshare has the -T and --monotonic/--boottime options, as shown above.
Changing CLOCK_BOOTTIME should affect /proc/uptime, so I’ll try it. You can mount a separate /proc filesystem inside the container with unshare’s --mount-proc option, so let’s use that.
vagrant@ubuntu-groovy:~$ uptime
19:32:40 up 1 day, 14:17, 2 users, load average: 0.07, 0.05, 0.01
vagrant@ubuntu-groovy:~/util-linux$ sudo unshare --mount-proc -T --boottime=86400 --root=/var/run/myroot
root@ubuntu-groovy:/# uptime
19:32:47 up 2 days, 14:17, 0 users, load average: 0.06, 0.05, 0.01
root@ubuntu-groovy:/# exit
logout
vagrant@ubuntu-groovy:~$ sudo unshare --mount-proc -T --boottime=-86400 --root=/var/run/myroot
root@ubuntu-groovy:/# uptime
19:32:59 up 14:17, 0 users, load average: 0.05, 0.05, 0.00
root@ubuntu-groovy:/# exit
logout
You can see that the “uptime” inside the container can be changed in either direction: into the past or into the future.
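To check the relationship between /proc/uptime and CLOCK_BOOTTIME from a program, something like the following sketch could be used. It assumes Linux and Python 3.7 or later; the function names are made up for illustration.

```python
import time
from typing import Tuple


def parse_uptime_seconds(text: str) -> float:
    """Return the first field of /proc/uptime: seconds since boot."""
    return float(text.split()[0])


def uptime_vs_boottime() -> Tuple[float, float]:
    """Read /proc/uptime and CLOCK_BOOTTIME back to back (Linux only)."""
    with open("/proc/uptime") as f:
        from_proc = parse_uptime_seconds(f.read())
    from_clock = time.clock_gettime(time.CLOCK_BOOTTIME)
    return from_proc, from_clock
```

Calling uptime_vs_boottime() inside a shell started with unshare --mount-proc -T --boottime=86400 should return two values that are close to each other and both about a day ahead of the host's.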
But how does the kernel shift the time after unshare? One thing is curious here: immediately after unsharing, the boottime and other offsets are the same in the host and the container (just as other namespaces start out identical to the parent). So where does the “shift” come from?
Functions like clock_settime(2) cannot set CLOCK_MONOTONIC or CLOCK_BOOTTIME (ref: clock_getres(2)), so the offset must be applied in a different way.
In fact, you can find out how by passing an invalid value on purpose.
vagrant@ubuntu-groovy:~$ sudo unshare --mount-proc -T --boottime=-186400 --mount-proc --root=/var/run/myroot
unshare: failed to write to /proc/self/timens_offsets: Numerical result out of range
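The host had been up for about 1 day and 14 hours (roughly 138,000 seconds), so an offset of -186400 would make the boot time clock negative, and the write is rejected as out of range. The error message also reveals the mechanism: the file /proc/$PID/timens_offsets seems to be the key.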
An explanation can be found in the time_namespaces(7) manual.
According to it, you can set the monotonic/boottime offsets by writing to /proc/$PID/timens_offsets in the following format.
<clock-id> <offset-secs> <offset-nanosecs>
The write is allowed only when there are no member processes in the namespace yet. Here, <clock-id> supports only monotonic (= CLOCK_MONOTONIC) and boottime (= CLOCK_BOOTTIME).
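As a rough sketch of the record format above, the following helper builds one line from an offset in nanoseconds. The function name is hypothetical, and keeping the nanosecond field in the range 0..999999999 (so negative fractional offsets are rounded down in the seconds field) is my assumption about what the kernel expects.

```python
NSEC_PER_SEC = 1_000_000_000
CLOCK_IDS = ("monotonic", "boottime")


def format_offset_record(clock_id: str, offset_ns: int) -> str:
    """Build a '<clock-id> <offset-secs> <offset-nanosecs>' line."""
    if clock_id not in CLOCK_IDS:
        raise ValueError(f"unsupported clock id: {clock_id!r}")
    # divmod floors toward negative infinity, so nanos is always 0..999999999.
    secs, nanos = divmod(offset_ns, NSEC_PER_SEC)
    return f"{clock_id} {secs} {nanos}\n"


print(format_offset_record("boottime", 86400 * NSEC_PER_SEC), end="")
```

The sketch only builds the string. Actually writing it to /proc/self/timens_offsets, as unshare does, requires privileges and a freshly unshared time namespace, so that step is left out here.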
You can also see the current offsets by reading the file.
vagrant@ubuntu-groovy:~/util-linux$ sudo unshare --mount-proc -T --boottime=86400 --root=/var/run/myroot
root@ubuntu-groovy:/# cat /proc/self/timens_offsets
monotonic 0 0
boottime 86400 0
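Reading the file from a program could look like the sketch below. The parser name is made up, and the sample text is simply the output shown above rather than a live read.

```python
from typing import Dict

NSEC_PER_SEC = 1_000_000_000


def parse_timens_offsets(text: str) -> Dict[str, int]:
    """Map each clock id to its offset in nanoseconds."""
    offsets = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        clock_id, secs, nanos = line.split()
        offsets[clock_id] = int(secs) * NSEC_PER_SEC + int(nanos)
    return offsets


# Sample taken from the transcript above; on a real system, read
# /proc/self/timens_offsets instead.
sample = "monotonic 0 0\nboottime 86400 0\n"
print(parse_timens_offsets(sample))
```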
The implementation of Time Namespace handling in “unshare” is very helpful for understanding this.
Basically, the namespace was added for the purpose of container migration, as described in the kernel patch and the man page, but it could also be used, for example, to make the uptime command show the time since each container was started, by setting an offset that brings the uptime back to 0 when the container is created.
If you know about it, it might be useful for something…
References to related patches and implementations are listed by tenforward. Note it’s in Japanese.
// Original Japanese article: