Resources

The Wrong Docker

On Debian Stretch the docker package is described as follows:

Docker is a docking application (WindowMaker dock app) which acts as a system tray for any desktop environment, allowing you to have a system tray without running the KDE/GNOME panel. Docker was designed to work with Openbox 2, but it should work fine in any window manager.

Let that sink in. This is not the Docker you are looking for! That totally exceeds Red Hat’s Anaconda installer naming train wreck! So do not apt install docker. That package is now a transitional one which renames their docker to wmdocker. But again, totally unrelated!

What Exactly Is This Crazy Thing?

It looks like it’s the userland controls for some newish Linux kernel features. (Let’s not even waste too much time pondering how Windows people are running it — their reverse WINE? No idea. Don’t care.) The two notable features are cgroups and the overlay file system.

Cgroups (control groups) are to other Linux-managed resources what ordinary permission groups are to file access. It seems there is a lot of fine-grained control over these resources too. Old-style Unix file group ownership just controls who can access a file in a boolean way. Cgroups can even effectively replace other quota mechanisms (thank god). It looks like this feature can isolate and ration out CPU, memory, disk, and network. The network is probably instantaneous bandwidth and the disk is probably total bytes used because that is pragmatic. But while this is great, I’m not breathing a big sigh of relief that all desirable management controls are possible because I suspect they are not. I have had situations where I would have loved to ration out weird things like disk access bandwidth, not total bytes written. If you see some non-zero values when you look at /proc/cgroups then you’re probably using this feature. Note that on the host system, all the processes from the cgroups are visible together in the kernel’s master process list, though their PID numbers are completely different.
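
A quick way to peek at this on the host (output varies by kernel and cgroup version):

cat /proc/cgroups      # Non-zero num_cgroups values mean the feature is in use.
mount | grep cgroup    # Where the cgroup hierarchies are mounted.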

The other recent addition to the kernel that makes Docker possible is the overlay filesystem. Like the old aufs and unionfs, this is a union file system strategy. I am not sure exactly what improvements have been made but my guess is that this new system is much better at stacking many layers of file systems and also mounting to arbitrary points on a file system tree (not just top levels of volumes). It also can be exported over NFS. If you feel like you do not need anything but a way to manage installation cruft of dependency hogs, you might not need Docker at all; it’s possible that just sensibly using overlay mounts can suffice. Ubuntu has the overlayroot-chroot program which helps out doing the --rbind mounts for /proc, /run, /sys, etc., just like we used to do for Gentoo installs. Not sure about where /dev gets taken care of.
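
If you want to play with the overlay idea without Docker, the basic move looks something like this (a sketch with made-up paths; upperdir and workdir must be on the same filesystem):

mkdir -p /tmp/ov/{lower,upper,work,merged}
echo base > /tmp/ov/lower/file
sudo mount -t overlay overlay \
    -o lowerdir=/tmp/ov/lower,upperdir=/tmp/ov/upper,workdir=/tmp/ov/work \
    /tmp/ov/merged
# Writes under /tmp/ov/merged land in upper/; lower/ stays pristine.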

Sometimes I hear about "namespaces" being a thing. But that is kind of vague, as in this official description: "Docker uses a technology called namespaces to provide the isolated workspace called the container. When you run a container, Docker creates a set of namespaces for that container." Circular definitions are definitions that are circular! Thanks, Docker! I think it just entails Docker doing things with remapping users and groups into other identities. The subuid man page is interesting. It may be a new (to me) extension of shadow-utils that allows for "subordinate user ids", i.e. a list of allowable ID remappings usually found in /etc/subuid and /etc/subgid.
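
The format of those files is simple: one user:start:count entry per line. An illustrative entry:

$ cat /etc/subuid
xed:100000:65536

Meaning the user xed may remap 65536 subordinate IDs starting at 100000.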

I suspect a docker "container" is really just a cgroup configuration organized and set up plus an overlay filesystem over some base image’s file system plus a chroot.

Note that Docker is a privileged daemon process and anyone who can run it (any normal user who can create or run a container) can probably wreck the host system with an escalation. This is a good example and discussion of security. A process running inside the container might be able to be a bit more constrained, but be careful about such thinking.

Installation

Installation is quite unnerving for security-sensitive applications. While entirely unremarkable to Windows (and Apple) users, this extremely privileged software needs you to give docker.com complete control of your system to do basically anything!

In theory there are ways to manage docker as a non-root user which may slightly help with security. But probably not since it misses the point of where I’m insinuating the attack surface really is. The real utility of adding your users to a docker group (helpfully set up by the Ubuntu package) is to spare them tedious sudo requirements. I don’t think it really contributes to meaningful security in any way. Still, it does in fact make using the tools a lot less annoying.

DOCKERURL=https://download.docker.com/linux/ubuntu
DOCKERKEYURL=${DOCKERURL}/gpg
wget -qO- $DOCKERKEYURL | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] ${DOCKERURL} focal stable"

Once you are trusting them to do literally anything they want to your 1s and 0s, you should be able to pretend like the rest of the installation is easy! The "ce" here is "Community Edition", as opposed to the enterprisey "ee" (Enterprise Edition).

sudo apt update
sudo apt install docker-ce
sudo gpasswd -a xed docker

Status Testing

Wow. Doing docker run --help took 3.6 seconds on my system. YMMV.

Here are some systemd commands and such that can help check if things are automagically working more or less.

systemctl is-active docker
systemctl is-enabled docker
systemctl status docker
docker version
docker ${DOCKER_CMD} --help     # To get sub-command specific help.

Running A Simple Image

The official documentation clarifies important terminology: "A Docker registry stores Docker images. Docker Hub is a public registry that (in theory) anyone can use, and Docker is configured to look for images on Docker Hub by default."

To use Docker you need images. Images are like classes — a Docker container is an instance of a Docker image. A container is a writeable layer stacked onto the image’s fixed layers.

docker pull TheImage
docker volume create TheImage-data
docker run -d --name TheImage-instance -p 80:80 TheImage

The -p can map any needed ports (or they might be invisible outside the container). The -d is detach, like a background daemon. Official documentation for the run command. The volume created by the middle command probably needs some kind of hook-up on the run, but it’s something like that. A docker volume inspect of it also shows where the volumes really live. Symlink it if you want access from a more convenient place.

How is the whole Docker show going? Here are some more diagnostics and a full test cycle.

docker info
docker images
docker run hello-world      # An official test image.
docker images
docker rmi -f feb5d9fea6a5

That hex string identifier came from the "IMAGE ID" field of the images sub-command.

Another fancier test.

docker run -d -p 80:80 docker/getting-started

Then you can point a browser at that machine (using http — not https!) and see a fancy web server serving a docker demo (obviously you’ll probably be able to find this on the real www but that’s not the point I guess). Unfortunately it tells you right away about how to access the "[Docker Dashboard]…for either Mac or Windows." A Linux thing that does not support Linux. Love it.

Another good test image that may be more useful generally.

docker run bash:latest

Note here that bash is the value of the REPOSITORY field and latest is the TAG as viewed in docker images.

Multifarious Ways To Activate Containers

Run

  • Creates a writeable container over the specified image.

  • Must start with an image.

  • --attach or -a possible.

  • Similar to create followed by start.

  • "…the container process that runs is isolated in that it has its own file system, its own networking, and its own isolated process tree separate from the host." From Docker run reference.

  • --interactive or -i, and --tty or -t are possible and often go together. Or use -a to get specific streams like docker run -a stdin -a stdout -i -t ubuntu /bin/bash.

  • --restart=<mode> can be no, always, unless-stopped, on-failure.

  • --rm will get rid of the upper overlay used by the container. This means you can’t get any data out of it (debug?) but it won’t be clogging things up either.

  • --network=host will allow the host network to be visible on the container.

There are also all kinds of ways to limit resources. Even stuff like --device-write-bps= to rate limit block devices.
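
For example (the values and device path here are illustrative):

docker run -it --memory=512m --cpus=1.5 \
    --device-write-bps=/dev/sda:10mb ubuntu bash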

Exec

  • Runs a new command in a running container.

  • Cannot be paused — the container must be running.

  • Needs to be a single command, so a shell is needed for stuff like this: sh -c "cat A && cat B".

  • -ti interactive TTY is possible.

  • --user possibly useful for those processes that don’t like you running as root.

  • If you want to see what is going on in a container shell, see attach which will give you a duplicate connection to it. Exec lets you open multiple terminals on the same container, as shown below. For example, this is handy running something like ROS with all its many pieces that have to be up simultaneously.
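
A sketch, assuming a running container named happy_hopper and a user xed that exists inside it:

docker exec -it happy_hopper bash             # A second interactive shell on the same container.
docker exec --user xed happy_hopper whoami    # A one-off command run as a non-root user.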

Start

"Start one or more stopped containers"

"A stopped container can be restarted with all its previous changes intact using docker start." Why is this phrase in the documentation for run?

  • -ia: interactive and attach - but not TTY for some reason.

Restart

  • Kills a currently running container and then starts it again.

  • Note that it doesn’t attach automatically and I don’t see that as an option.

Attach

  • Attach normal STD streams to your local normal ones (that you’re likely using to type and see action).

  • Pressing Ctrl-C while attached seems to break (send SIGKILL) the attachment rather than whatever was going on inside. So if you were running a shell and wanted to stop a sub-process, Ctrl-C may just kill the whole container run. To prevent this, check out --sig-proxy.

  • To get out of an "attachment" quietly while leaving it running, use the magic key combo of [Ctrl+p,Ctrl+q].

  • If you’re doing some interactive thing in one terminal and then dk attach <hashid> over in another (dk being my shell alias for docker), both terminals will control the original attachment simultaneously. If this is not what you want, see exec.

Unpause

I can’t remember exactly what SIGs we’re talking about but it’s something like SIGSTOP for docker pause <id> and SIGCONT for this unpause action. This will allow the process to be resumed. I do not think this survives a reboot of course.
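
The round trip looks like this (using a shortened container ID as before):

docker pause b7      # Freeze every process in the container.
docker unpause b7    # Thaw them; execution picks up where it left off.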

Behavior

What exactly is Docker doing? It’s mostly faking you out into thinking you’re looking at a system when you’re only looking at some part of the system filtered by Linux namespaces. Here are some things to try both in a Docker created environment and also in the host environment. You can do this in a tmux session with two screens. Start by firing up a container.

docker run -i -t ubuntu /bin/bash   # Interactive TTY shell of standard Ubuntu image.

Then look at the differences in the host session with stuff like this.

getent passwd  # What users are present.
ps -ef         # Note that container processes are visible on the host but the PIDs are (likely) different!
cat /etc/hosts
cat /etc/resolv.conf
netstat -lntu
mount | wc -l
ls /proc
ifconfig -a    # apt install net-tools
date           # Yes, this can be different - at least the TZ.
date > /dev/shm/proof    # Run on both sides, then ls; each /dev/shm is separate.

Here is the host system looking at the PID of a container’s namespace giving some hints about precisely what is isolated.

$ sudo ls -al /proc/4995/ns/
total 0
dr-x--x--x 2 root root 0 Mar 26 14:51 .
dr-xr-xr-x 9 root root 0 Mar 26 14:10 ..
lrwxrwxrwx 1 root root 0 Mar 26 14:51 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 root root 0 Mar 26 14:51 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 root root 0 Mar 26 14:51 mnt -> 'mnt:[4026531840]'
lrwxrwxrwx 1 root root 0 Mar 26 14:51 net -> 'net:[4026531992]'
lrwxrwxrwx 1 root root 0 Mar 26 14:51 pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 Mar 26 14:51 pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 Mar 26 14:51 time -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 Mar 26 14:51 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 Mar 26 14:51 user -> 'user:[4026531837]'
lrwxrwxrwx 1 root root 0 Mar 26 14:51 uts -> 'uts:[4026531838]'

Since Docker is using cgroups there are some possibilities related to resource limiting (like classic quotas), prioritization (like classic nice), accounting (like classic pacct), and control (freezing and resuming groups).
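
A quick way to see the limiting part in action (the container name and value here are mine):

docker run -d --memory=256m --name memtest ubuntu sleep 300
docker stats --no-stream memtest   # The MEM USAGE / LIMIT column should show the 256MiB cap.
docker rm -f memtest               # Clean up.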

Management

I’m kind of thankful the automagical graphical interface has no support for Linux. Not a problem.

There are two fundamentally different things that need management: images and containers. (Let’s ignore cgroup fancy jails for now — like 99% of Docker users do.) Think of an image like an old fashioned installation CD. You can’t really do much with that. If you want to change it, that’s kind of an involved PITA usually requiring a new thing. But once you get it the way you like it, you can now install that CD’s OS on as many machines as your patience allows. Each of the machines you install to will be "running that" OS and it is its own independent thing. In Docker, this is all done with overlay filesystems. There are overlay file systems for the base "install CDs" and there are more overlays for the "installations". These are images and containers respectively.

Containers

To see what containers are in play check with this.

docker ps           # Only shows _running_ containers.
docker container ls # Same command, newer spelling.
docker ps -a        # Show stopped ones too.
docker container ls -a # Yup, same still

Note that the silly names you might see in the "NAMES" field (e.g. "happy_hopper") are auto-generated by Docker if you don’t provide one.

Something running you didn’t think was and didn’t want to be?

docker stop 99f468406ae7

The hash looking ID is shown in the ps output. These can be shortened as much as you like as long as they remain unambiguous.

Ok, it’s stopped, but is it really gone? Probably not. To get rid of this overlay filesystem that contains a container, use the rm command.

docker rm 99f468406ae7
docker rm -f 99f468406ae7 # If running this will force a stop first.

Need to get rid of a big long list of containers?

dk ps -a |tail -n+2|cut -c 1-4|while read C; do echo $C; docker rm $C; done

Also you can get rid of all stopped containers with this.

docker container prune

Very commonly I find that I have a stopped container and I actually want to continue using it. First remember that just because you get nothing in docker ps doesn’t mean there’s nothing there — remember to pretty much always use -a for commands like this.

$ dk ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
$ dk ps -a
CONTAINER ID   IMAGE           COMMAND   CREATED        STATUS                       PORTS   NAMES
b736e1121eec   xed:base-cool   "bash"    31 hours ago   Exited (127) 6 seconds ago           happy_hopper

Ok, so you’ve found a stopped container and you’d like to use it. Note that it can be referred by some shortening of its container ID.

docker start b7

Now nothing really seems to happen but docker ps now shows this container even without the -a and its status field is now "Up 6 seconds" or something like that. But you’re still not using it or doing anything useful. Or maybe you are. If it is running some server and the "PORTS" field shows them mapped out properly, then you may be all set. But if you want to use a shell on this container, you’ll need to attach to it.

docker attach b7

You can also start and attach at the same time with the -a option of start.

You can also run as a different user. Note that just attaching as a different user is kind of bad thinking since the container is already running as whatever user it is — you’re just really hooking up STDIN/STDOUT to it, whatever it is. The run command runs a command in a new container (a shell if unspecified).

docker run -it --user xed imagename /usr/bin/bash

Hmm. Just tried run and it left the container status as "created". I have to also "start" it apparently.

When you exit out of that shell with a normal exit command, it seems to stop the action of the container. I think technically, the shell or highest parent process is killed. If you want that to nohup then you can exit with a "detach keyboard sequence". This is [ctrl+p,ctrl+q] and if you don’t like that for some reason, you can configure it in ~/.docker/config.json by setting the value of detachKeys. I think the normal setting would be {"detachKeys": "ctrl-p,ctrl-q"}. This can also be passed in at attach time.

docker attach --detach-keys="ctrl-p,ctrl-q" b7

Note that docker pause seems to do a [Ctrl-Z] kind of SIGSTOP for all the container’s processes causing it to look like it’s hanging. And docker restart does not un-pause! It essentially reboots the container. Sensibly the correct command is unpause. There are the save and load commands that seem pretty sane — these pipe out (yes, on STDOUT by default) a tar of an image (export and import are the container-side equivalents).
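
For example (using the image name from earlier; the compression is optional):

docker save xed:base-cool | gzip > base-cool.tar.gz
gunzip -c base-cool.tar.gz | docker load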

Don’t forget about docker logs <ID> to find out more details about what it knows about a container. Looks kind of like shell history by default. And docker history <ID> seems to show all the layers that compose the image.

Images

This is all very similar to the containers but I don’t think you can use them directly. You need to spin up a container using an image as the "lower" directory tree to the container’s "upper" for the composite overlay.

Check what images Docker knows about.

docker images

And if you find some you’d really like to be gone.

docker rmi 99f468406ae7
docker rmi -f 99f468406ae7 # Force removal, e.g. if the image is tagged in multiple repositories.

Sometimes dk images -a gives you a bunch of stuff like this.

:->(36.3ms)[ip-172-31-11-3:~/X/AE/imgbuild-testROS1]$ dk images -a
REPOSITORY      TAG            IMAGE ID       CREATED         SIZE
<none>          <none>         edd792601680   6 minutes ago   6.01GB
<none>          <none>         b9f4b42ada17   6 minutes ago   6.01GB
<none>          <none>         2c22873502e0   6 minutes ago   6.01GB
...

And you are trying to delete things. One thing you can do is find unnecessary images with this command.

docker images --filter "dangling=true" -q --no-trunc

Whatever that produces, add it to an rmi operation.

dk rmi sha256:2c22873502e02df38f7a4875f6d6d22e16f87225a853b0d9ffe01faefcb1b553

Or if you’re 100% sure you want to nuke all images Docker knows about, you can do this.

docker rmi -f $(docker images -aq)

How about deleting all containers?

docker stop $(docker ps -aq)
docker rm $(docker ps -aq)

I think that the stop is only necessary if you need some kind of graceful shutdown. For example, maybe you don’t want to leave scrambled eggs in a shared mount or something; though I’m not really sure how rough the stop is.
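
For what it’s worth, stop sends a SIGTERM, waits a grace period (10 seconds by default), and then sends SIGKILL. The grace period is tunable:

docker stop -t 30 99f468406ae7   # Allow 30 seconds for a graceful shutdown before the SIGKILL.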

Scanning What?

I wondered what exactly the docker scan command was scanning for (using a proprietary module); turns out it is a vulnerability scanning thing (powered by Snyk). Which is creepy and weird and, who knows, maybe helpful once you get the cruft really pouring in.

Creating And Managing Containers

Something like this can create a container.

docker run -it bash:latest

The -it puts you into an interactive terminal. You can then see this container on the host.

docker ps

This should tip you off to the 12 character hex ID. What if you didn’t manage to connect to the container in an interactive way? Maybe the container is working and everything is set aside for it and it could be doing stuff but it maybe isn’t and you’d like to "switch to" it, whatever that means. Often it means chrooting to the top of the overlay containing this image and getting your cgroups to help you with the illusion that you’re "contained" and then running a shell. This can be done at any time with this.

docker exec -it bfbfcc62f603 bash

Note that this also is how you can log in multiple times to the same running container. There is a similar but subtly different command called attach.

docker container attach bfb

This one just hooks up your STDIN and STDOUT to the container. If you already executed it with a bash shell, this will pick that up exactly and give you two points of control (e.g. from tmux). I got it to do that anyway. I don’t know which one it chooses if you do that with two or more bash processes running. What’s tricky is detaching; the [Ctrl+p,Ctrl+q] sequence mentioned earlier is the way out. When you exit the shell normally, it exits both of them and kind of nerfs the container.

There is a whole giant collection of docker container sub-commands.

  • inspect

  • stats

  • top - Or while sleep 1; do clear && docker top 37990fd13d80; done

  • kill - Unclear on details; does it send SIGKILL to all processes like pause sends Ctrl-Z type signals?

  • rm - Removes containers entirely. I presume it’s stopped and the overlay file system is unlinked.

  • run - Runs a command, e.g. start a server or ML job.

  • pause - Sends a Ctrl-Z type signal to all processes in the container.

  • unpause

  • stop - Stops Docker action on this container completely.

  • export - As a tar archive.

Dockerfile

A Dockerfile is a text-based script used to create a container image. Here is the Dockerfile official reference. Highlights:

  • The first instruction must be FROM.

  • # starts a comment (only at the beginning of a line). Comment lines are entirely removed before execution, so an instruction continued across lines with backslashes can safely have comment lines in the middle.

  • Instructions are not case sensitive, but the convention is all caps.

  • Executed in order.

Example Dockerfile:

FROM ubuntu
# No idea where this goes.
ENV EDITOR=/usr/bin/vim
RUN echo "This runs in /bin/sh -c by default."
RUN /bin/bash -c 'echo "This can change the shell"'
# This also can change the shell.
SHELL ["/usr/bin/bash", "-c"]
# Only one CMD is valid; only the last one runs.
CMD echo "I think this is like the RUN instruction but this is what \
the container is left with, so /usr/bin/bash could be sane."
# See also ENTRYPOINT.
# This just adds a key value pair(s) to the image.
LABEL version="1.0" original_date="2022-03-26"
LABEL sense="CCW"
# This helps for situations where you want a port to be published.
# This does not actually do that, but allows for it.
EXPOSE 80/udp
EXPOSE 80/tcp
# Creates a mount point with the specified name.
VOLUME /path
# This is the user that the RUN, CMD, and ENTRYPOINT commands use.
USER xed:xed
# This is the directory for the RUN, CMD, ENTRYPOINT.
WORKDIR /tmp
# This allows for customization at build time.
ARG buildtimevar=DefaultValue
# Copy local files on the host to the destination.
COPY /source/path  /destination/path
ADD /source/path  /destination/path
# Additional features of ADD - download external file to destination.
ADD http://external.file/url  /destination/path
# It can also auto-extract local tar archives.
ADD source.file.tar.gz /destination/path

You’ll often see MAINTAINER show up too, but it is deprecated. To achieve the same effect, use the LABEL instruction. I’m not sure if there’s an official format, but it seems pretty flexible.

LABEL maintainer="Chris X Edwards - xed.ch"

An interesting one is COPY --from=<name>. This copies files from a previous FROM base-image AS <name> build stage. This can be used to make multi-stage builds. If a build stage with the specified name can’t be found, an image with the same name is used instead. See the sketch below.
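
Here is a minimal multi-stage sketch (the image names, paths, and app are all illustrative):

FROM golang:1.21 AS builder
WORKDIR /src
COPY . .
RUN go build -o /out/app .

# Only the compiled binary makes it into the final image.
FROM debian:stable-slim
COPY --from=builder /out/app /usr/local/bin/app
CMD ["app"]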

Ok, you have a valid Dockerfile; now what? Now you have to "build" an "image".

tmux  # Would be smart. This is going to take a while and be flaky.
mkdir dockerbuild-my_project
cd dockerbuild-my_project
ln -s SomeDockerFile Dockerfile    # Or skip the symlink and point at it with -f SomeDockerFile.
docker build -t "xedch:myserverimage" .

It is also possible to create an image by changing files in a container and saving the modified file system as a new image. The commit command creates a new image from a container’s changes.

# Seems to be the same as `docker container commit`.
docker commit \
--message "Commit message" \
--author "Chris X Edwards <author at xed.ch>" \
--change "ENV DEBUG=true" \
49fca79a5e67 \
xedch/myserverimage:version2

The penultimate argument is the current base container ID. Then the name of the new image. The change option contains Dockerfile syntax for what you’d like to be different.

Creating An Image

When you have a Dockerfile set up (i.e. named Dockerfile and in the current directory) and ready you’ll need to build the container image with something like this.

docker build -t "xedch:testROS1" .

The -t seems to be "tag" and its exact functionality can get complex but pretend this is just how you name the image in simple cases like this.

Docker Hub Images

The whole Docker Hub thing sounds tedious. It seems to be like pastebin but for Docker images and you need to make an account and log in. But the general idea is something like this.

docker login    # Log in with that Docker Hub account.
docker tag MyLittleImage ${DockerUserName}/MyLittleImage
docker push ${DockerUserName}/MyLittleImage

And with that there, I guess you can run these images from anywhere with an internet connection. Seems like a massive security hole if you’re trying to keep a complex proprietary computing pipeline somewhat secret.

Persistent Storage

When dealing with Docker it can bring in who knows what and while it can do this somewhat efficiently, it’s easy to have all that efficiency blow out some small root partition. The first question then is where this is in the file system. As far as I can tell, all Docker file system usage that it natively knows about is under /var/lib/docker. You can do other custom mounts to other interesting places, but if it’s intrinsic to Docker, that’s where to look for it.

Generally Docker containers seem to use some kind of fake file system that vaporizes when the container is removed. Actually there are a few possibilities.

Bind Mounts

Where the containers are hooked up to something in the host’s base lower filesystem. This is similar to how Gentoo installs would --rbind down to the install system’s /proc and so on before a chroot to the new system.

mkdir ~/X/dkfs
cp /tmp/stuff4container2use ~/X/dkfs/
dk run --mount type=bind,src=/home/xed/X/dkfs,dst=/root/dkmnt -it xed:mycon bash

Note that the target volume’s mount point, /root/dkmnt, is created automatically if it doesn’t exist (not sure if it’s a full -p parent kind of thing but definitely one missing level was fine).

Note also that if you put the --mount <arg> option after the image name, it throws a cryptic and stupid jumble of inappropriate errors.

docker: Error response from daemon: failed to create shim: OCI runtime
   create failed: runc create failed: unable to start container process:
   exec: "--mount": executable file not found in $PATH: unknown.

So don’t do that!

Volume Mounts

Where the Docker system manages a file system space itself. It is physically located in the /var/lib/docker tree. A named volume actually persists until you docker volume rm (or prune) it, even when the container dies. It seems like this doesn’t do much to make data conveniently available outside the container. It also seems low performance (docs hint at that by saying bind mounts are "performant").

docker volume create TheImage-data
docker volume inspect TheImage-data
ls /var/lib/docker/volumes/TheImage-data/_data

To use this, use the -v option to mount it when you run a container.

docker run -d -v TheImage-data:/mnt/mtpt_in_container hello-world

Or you can skip a lot of nonsense and use "bind mounts" where you specify both the host path and the target container’s path. Note that the host path must be absolute; otherwise -v thinks it is a named volume.

docker run -d -v /mnt/mtpt_in_host:/mnt/mtpt_in_container hello-world

tmpfs

I think this is a way to hook containers up with some private /dev/shm space. Vanishes immediately. Quick but takes RAM.
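
A sketch of both spellings (the path and size are made up):

docker run -it --tmpfs /scratch:rw,size=64m ubuntu bash
docker run -it --mount type=tmpfs,dst=/scratch,tmpfs-size=64m ubuntu bash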