Resources
- I was very impressed with this excellent high-quality Docker advice. A lot of good tips here.
The Wrong Docker
On Debian Stretch the docker package is described as follows:
Docker is a docking application (WindowMaker dock app) which acts as a system tray for any desktop environment, allowing you to have a system tray without running the KDE/GNOME panel. Docker was designed to work with Openbox 2, but it should work fine in any window manager.
Let that sink in. This is not the Docker you are looking for! That
totally exceeds Red Hat’s Anaconda installer naming train wreck! So,
do not try apt install docker.
This is now a transitional package trying to rename their docker to
wmdocker. But again, totally unrelated!
What Exactly Is This Crazy Thing?
It looks like it’s the userland controls for some newish Linux kernel features. (Let’s not even waste too much time pondering how Windows people are running it — their reverse WINE? No idea. Don’t care.) The two notable features are cgroups and the overlay file system.
Cgroups (control groups) are to other Linux-managed resources what
ordinary permission groups are to file access. It seems there is a
lot of fine-grained control over these resources too. Old style unix
file group ownership just controls who can access that file in a
boolean way. Cgroups can even effectively replace other quota
mechanisms (thank god). It looks like this feature can isolate and
ration out CPU, memory, disk, and network. The network is probably
instantaneous bandwidth and the disk is probably total bytes used
because that is pragmatic. But while this is great, I’m not breathing
a big sigh of relief that all desirable management controls are
possible because I suspect they are not. I have had situations where
I would have loved to ration out weird things like disk access
bandwidth, not total bytes written.
If you see some non-zero values when you look at /proc/cgroups then
you’re probably using this feature. Note that on the host system, all
the processes from the cgroups are visible and together in the
kernel’s master process list; their PID numbers are completely
different.
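To get a feel for what Docker leans on, here is a minimal by-hand cgroup v2 sketch, assuming a cgroup2 filesystem is mounted at /sys/fs/cgroup (the names and limits are illustrative).
cat /proc/cgroups # Non-zero values suggest cgroups are in use.
sudo mkdir /sys/fs/cgroup/demo # Create a new cgroup.
echo 100M | sudo tee /sys/fs/cgroup/demo/memory.max # Cap its memory.
echo $$ | sudo tee /sys/fs/cgroup/demo/cgroup.procs # Move this shell into it.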
The other recent addition to the kernel that makes Docker possible is
the overlay filesystem. Like the old aufs and unionfs, this is a
union file system strategy. I am not sure what improvements exactly
have been made but my guess is that this new system is much better at
stacking many layers of file systems and also mounting to arbitrary
points on a file system tree (not just top levels of volumes). It
also can be exported over NFS. If you feel like you do not need
anything but a way to manage installation cruft of dependency hogs,
you might not need Docker at all; it’s possible that just sensibly
using overlay mounts can suffice. Ubuntu has the overlayroot-chroot
program which helps out doing the --rbind mounts for /proc/, /run/,
/sys, etc., just like we used to do for Gentoo installs. Not sure
about where /dev gets taken care of.
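Here is a minimal by-hand overlay mount sketch to show the moving parts (directory names are illustrative); writes land in upper, lower stays pristine, and merged is the combined view.
mkdir -p /tmp/{lower,upper,work,merged}
sudo mount -t overlay overlay \
    -o lowerdir=/tmp/lower,upperdir=/tmp/upper,workdir=/tmp/work \
    /tmp/merged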
Sometimes I hear about "namespaces" being a thing. But that is kind
of vague as in this official description: "Docker uses a technology
called namespaces to provide the isolated workspace called the
container. When you run a container, Docker creates a set of
namespaces for that container." Circular definitions are definitions
that are circular! Thanks, Docker! I think it just entails Docker
doing things with remapping users and groups into other identities.
The subuid man page is interesting. It may be a new (to me) extension
of shadow-utils that allows for "subordinate user ids", i.e. a list
of allowable ID remappings usually found in /etc/subuid and
/etc/subgid.
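For reference, each /etc/subuid line is just name:start:count. An entry like the following (the user name is illustrative) means user xed may map 65536 subordinate UIDs starting at 100000.
xed:100000:65536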
I suspect a docker "container" is really just a cgroup configuration
organized and set up, plus an overlay filesystem over some base
image’s file system, plus a chroot.
Note that Docker is a privileged daemon process and anyone who can run it (any normal user who can create or run a container) can probably wreck the host system with an escalation. This is a good example and discussion of security. A process running inside the container might be able to be a bit more constrained, but be careful about such thinking.
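To make the escalation concrete, here is the classic demonstration (a sketch; the ubuntu image is just an example): any docker-group member can mount the host’s root filesystem into a container and chroot into it as root.
docker run -it -v /:/host ubuntu chroot /host /bin/bash # A root shell on the HOST's filesystem.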
Installation
Installation is quite unnerving for security sensitive applications. While entirely unremarkable to Windows (and Apple) users, this extremely privileged software needs you to give docker.com complete control of your system to do basically anything!
In theory there are ways to manage docker as a non-root user which
may slightly help with security. But probably not, since it misses
the point of where I’m insinuating the attack surface really is. The
real utility of adding your users to a docker group (helpfully set up
by the Ubuntu package) is to spare them tedious sudo requirements. I
don’t think it really contributes to meaningful security in any way.
Still, it does in fact make using the tools a lot less annoying.
DOCKERURL=https://download.docker.com/linux/ubuntu
DOCKERKEYURL=${DOCKERURL}/gpg
wget -qO- $DOCKERKEYURL | sudo apt-key add - # Trust Docker's package signing key.
sudo add-apt-repository "deb [arch=amd64] ${DOCKERURL} focal stable" # focal = Ubuntu 20.04; use your release codename.
Once you are trusting them to do literally anything they want to your 1s and 0s, you should be able to pretend like the rest of the installation is easy! The "ce" here is officially "Community Edition" (as opposed to the enterprisey docker-ee), though "container engine" works as a mnemonic too.
sudo apt update
sudo apt install docker-ce
sudo gpasswd -a xed docker # Add your user (xed here) to the docker group.
Status Testing
Wow. Doing docker run --help took 3.6 seconds on my system. YMMV.
Here are some systemd commands and such that can help check if things are automagically working more or less.
systemctl is-active docker
systemctl is-enabled docker
systemctl status docker
docker version
docker ${DOCKER_CMD} --help # To get sub-command specific help.
Running A Simple Image
The official documentation clarifies important terminology: "A Docker registry stores Docker images. Docker Hub is a public registry that (in theory) anyone can use, and Docker is configured to look for images on Docker Hub by default."
To use Docker you need images. Images are like classes — a Docker container is an instance of a Docker image. A container is a writeable layer stacked onto the image’s fixed read-only layers.
docker pull TheImage
docker volume create TheImage-data
docker run -d --name TheImage-instance -p 80:80 TheImage
The -p can map any needed ports (or they might be invisible outside
the container). The -d is detach, like a background daemon. Official
documentation for the run command. The volume created by the middle
command probably needs some kind of hook up on the run, but it’s
something like that. That also shows where the volumes really live.
Symlink it if you want access from a more convenient place.
How is the whole Docker show going? Here are some more diagnostics and a full test cycle.
docker info
docker images
docker run hello-world # An official test image.
docker images
docker rmi -f feb5d9fea6a5
That hex string identifier came from the "IMAGE ID" field of the images sub-command.
Another fancier test.
docker run -d -p 80:80 docker/getting-started
Then you can point a browser at that machine (using http — not https!) and see a fancy web server serving a docker demo (obviously you’ll probably be able to find this on the real www but that’s not the point I guess). Unfortunately it tells you right away about how to access the "[Docker Dashboard]…for either Mac or Windows." Linux thing does not support Linux. Love it.
Another good test image that may be more useful generally.
docker run bash:latest
Note here that bash is the value of the REPOSITORY field and latest
is the TAG as viewed in docker images.
Multifarious Ways To Activate Containers
Run
- Creates a writeable container over the specified image.
- Must start with an image.
- --attach or -a possible.
- Similar to create followed by start.
- "…the container process that runs is isolated in that it has its own file system, its own networking, and its own isolated process tree separate from the host." From the Docker run reference.
- --interactive or -i, and --tty or -t are possible and often go together. Or use -a to get specific streams like docker run -a stdin -a stdout -i -t ubuntu /bin/bash.
- --restart=<mode> can be no, always, unless-stopped, or on-failure.
- --rm will get rid of the upper overlay used by the container. This means you can’t get any data out of it (debug?) but it won’t be clogging things up either.
- --network=host will allow the host network to be visible on the container.
There are also all kinds of ways to limit resources, even stuff like
--device-write-bps= to rate limit block devices, as sketched below.
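A hedged example of run-time resource limits (the values and device path are illustrative):
docker run -it \
    --memory=512m \
    --cpus=1.5 \
    --device-write-bps /dev/sda:10mb \
    ubuntu /bin/bash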
Exec
- Runs a new command in a running container.
- Can not be paused — must be running.
- Needs to be a single command, so the shell command is needed for stuff like sh -c "cat A && cat B".
- -it interactive tty is possible.
- --user possibly useful for those processes that don’t like you running as root.
- If you want to see what is going on in a container shell, see attach which will give you a duplicate connection to it. Exec lets you open multiple terminals on the same container (see the sketch after this list). This is handy running something like ROS with all its many pieces that have to be up simultaneously.
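A quick sketch of both tricks (the container name mycontainer is illustrative):
docker exec -it mycontainer bash # A second shell in the same running container.
docker exec --user xed mycontainer sh -c "cat A && cat B" # One-shot compound command as another user.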
Start
"Start one or more stopped containers"
"A stopped container can be restarted with all its previous changes
intact using docker start." Why is this phrase in the documentation
for run?
- -ia interactive and attach - but not TTY for some reason.
Restart
- Kills a currently running container and then starts it again.
- Note that it doesn’t attach automatically and I don’t see that as an option.
Attach
- Attach normal STD streams to your local normal ones (that you’re likely using to type and see action).
- Pressing Ctrl-C while attached seems to break (send SIGKILL) the attachment rather than whatever was going on inside. So if you were running a shell and wanted to stop a sub-process, Ctrl-C may just kill the whole container run. To prevent this, check out --sig-proxy.
- To get out of an "attachment" quietly while leaving it running, use the magic key combo of [Ctrl+p,Ctrl+q].
- If you’re doing some interactive thing in one terminal and then dk attach <hashid> (dk being my alias for docker) over in another, both terminals will control the original attachment simultaneously. If this is not what you want, see exec.
Unpause
I can’t remember exactly what SIGs we’re talking about but it’s
something like SIGSTOP for docker pause <id> and SIGCONT for this
unpause action. This will allow the process to be resumed. I do not
think this survives reboot, of course.
Behavior
What exactly is Docker doing? It’s mostly faking you out into thinking you’re looking at a system when you’re only looking at some part of the system filtered by Linux namespaces. Here are some things to try both in a Docker created environment and also in the host environment. You can do this in a tmux session with two screens. Start by firing up a container.
docker run -i -t ubuntu /bin/bash # Interactive TTY shell of standard Ubuntu image.
Then look at the differences in the host session with stuff like this.
getent passwd # What users are present.
ps -ef # Note that container processes are visible on the host but the PIDs are (likely) different!
cat /etc/hosts
cat /etc/resolv.conf
netstat -lntu
mount | wc -l
ls /proc
ifconfig -a # apt install net-tools
date # Yes, this can be different, at least the TZ.
date > /dev/shm/docker # In the container; use /dev/shm/real on the host. Then ls these.
Here is the host system looking at the namespaces of a container’s process (by its host PID), giving some hints about precisely what is isolated.
$ sudo ls -al /proc/4995/ns/
total 0
dr-x--x--x 2 root root 0 Mar 26 14:51 .
dr-xr-xr-x 9 root root 0 Mar 26 14:10 ..
lrwxrwxrwx 1 root root 0 Mar 26 14:51 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 root root 0 Mar 26 14:51 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 root root 0 Mar 26 14:51 mnt -> 'mnt:[4026531840]'
lrwxrwxrwx 1 root root 0 Mar 26 14:51 net -> 'net:[4026531992]'
lrwxrwxrwx 1 root root 0 Mar 26 14:51 pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 Mar 26 14:51 pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 Mar 26 14:51 time -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 Mar 26 14:51 time_for_children -> 'time:[4026531834]'
lrwxrwxrwx 1 root root 0 Mar 26 14:51 user -> 'user:[4026531837]'
lrwxrwxrwx 1 root root 0 Mar 26 14:51 uts -> 'uts:[4026531838]'
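Compare with the namespaces of the host's PID 1; wherever the bracketed inode numbers differ, the container process lives in its own namespace.
sudo ls -al /proc/1/ns/ # Host init's namespaces for comparison.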
Since Docker is using cgroups there are some possibilities related to resource limiting (like classic quotas), prioritization (like classic nice), accounting (like classic pacct), and control (freezing and resuming groups).
Management
I’m kind of thankful the automagical graphical interface has no support for Linux. Not a problem.
There are two fundamentally different things that need management: images and containers. (Let’s ignore cgroup fancy jails for now — like 99% of Docker users do.) Think of an image like an old-fashioned installation CD. You can’t really do much with that. If you want to change it, that’s kind of an involved PITA usually requiring a new thing. But once you get it the way you like it, you can install that CD’s OS on as many machines as your patience allows. Each of the machines you install to will be running that OS and each is its own independent thing. In Docker, this is all done with overlay filesystems. There are overlay file systems for the base "install CDs" and there are more overlays for the "installations". These are images and containers respectively.
Containers
To see what containers are in play check with this.
docker ps # Only shows _running_ containers.
docker container ls # Very similar or possibly the same.
docker ps -a # Show stopped ones too.
docker container ls -a # Yup, same still
Note that the silly names you might see in the "NAMES" field (e.g. "happy_hopper") are auto-generated by Docker if you don’t provide one.
Something running you didn’t think was and didn’t want to be?
docker stop 99f468406ae7
The hash looking ID is shown in the ps output. These can be shortened
as much as you like as long as they remain unambiguous.
Ok, it’s stopped, but is it really gone? Probably not. To get rid of
this overlay filesystem that contains a container, use the rm
command.
docker rm 99f468406ae7
docker rm -f 99f468406ae7 # If running this will force a stop first.
Need to get rid of a big long list of containers?
dk ps -a |tail -n+2|cut -c 1-4|while read C; do echo $C; docker rm $C; done
Also you can get rid of all stopped containers with this.
docker container prune
Very commonly I find that I have a stopped container and I actually
want to continue using it. First remember that just because you get
nothing in docker ps doesn’t mean there’s nothing there — remember to
pretty much always use -a for commands like this.
$ dk ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
$ dk ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b736e1121eec xed:base-cool "bash" 31 hours ago Exited (127) 6 seconds ago happy_hopper
Ok, so you’ve found a stopped container and you’d like to use it. Note that it can be referred by some shortening of its container ID.
docker start b7
Now nothing really seems to happen but docker ps now shows this
container even without the -a and its status field is now "Up 6
seconds" or something like that. But you’re still not using it or
doing anything useful. Or maybe you are. If it is running some server
and the "PORTS" field shows them mapped out properly, then you may be
all set. But if you want to use a shell on this container, you’ll
need to attach to it.
docker attach b7
You can also start and attach at the same time with the -a option of
start.
You can also run as a different user. Note that just attaching as a
different user is kind of bad thinking since the container is already
running as whatever user it is — you’re just really hooking up
STDIN/STDOUT to it, whatever it is. The run command runs a command in
a new container (a shell if unspecified).
docker run -a stdin -a stdout -i --user xed imagename /usr/bin/bash # -a/--attach needs a stream argument.
Hmm. Just tried run and it left the container status as "created". I have to also "start" it apparently.
When you exit out of that shell with a normal exit command, it seems
to stop the action of the container. I think technically, the shell
or highest parent process is killed. If you want that to nohup then
you can exit with a "detach keyboard sequence". This is
[ctrl+p,ctrl+q] and if you don’t like that for some reason, you can
configure it in ~/.docker/config.json by setting the value of
detachKeys. The normal setting would be {"detachKeys": "ctrl-p,ctrl-q"}.
This can also be passed in at attach time.
docker attach --detach-keys="ctrl-p,ctrl-q" b7
Note that docker pause seems to do a [Ctrl-Z] kind of SIGSTOP for all
the container’s processes causing it to look like it’s hanging. And
docker restart does not un-pause! It essentially reboots the
container. Sensibly the correct command is unpause. There are the
save and load commands that seem pretty sane — these pipe out (yes,
on STDOUT by default) a tar of an image (export/import do something
similar for containers).
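A sketch of that round trip (the image name is illustrative):
docker save myimage | gzip > myimage.tar.gz # Image layers as a tar on STDOUT.
gunzip -c myimage.tar.gz | docker load # And back in again on STDIN.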
Don’t forget about docker logs <ID> to find out more details about
what it knows about a container. Looks kind of like shell history by
default. And docker history <ID> seems to show all the layers that
compose the image.
Images
This is all very similar to the containers but I don’t think you can use them directly. You need to spin up a container using an image as the "lower" directory tree to the container’s "upper" for the composite overlay.
Check what images Docker knows about.
docker images
And if you find some you’d really like to be gone.
docker rmi 99f468406ae7
docker rmi -f 99f468406ae7 # If running this will force a stop first.
Sometimes dk images -a gives you a bunch of stuff like this.
:->(36.3ms)[ip-172-31-11-3:~/X/AE/imgbuild-testROS1]$ dk images -a
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> edd792601680 6 minutes ago 6.01GB
<none> <none> b9f4b42ada17 6 minutes ago 6.01GB
<none> <none> 2c22873502e0 6 minutes ago 6.01GB
...
And you are trying to delete things. One thing you can do is find unnecessary images with this command.
docker images --filter "dangling=true" -q --no-trunc
Whatever that produces, add it to an rmi operation.
dk rmi sha256:2c22873502e02df38f7a4875f6d6d22e16f87225a853b0d9ffe01faefcb1b553
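There is also a one-step way to do this dangling-image cleanup:
docker image prune # Remove dangling images.
docker image prune -a # Remove all images not used by any container.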
Or if you’re 100% sure you want to nuke all images Docker knows about, you can do this.
docker rmi $(docker images -aq) -f
How about deleting all containers?
docker stop $(docker ps -aq)
docker rm $(docker ps -aq)
I think that the stop is only necessary if you need some kind of
graceful shutdown. For example, maybe you don’t want to leave
scrambled eggs in a shared mount or something; though I’m not really
sure how rough the stop is.
Scanning What?
I wondered what exactly the docker scan command was scanning for (using a proprietary module); turns out it is a vulnerability scanning thing (Snyk-powered) looking for known bad stuff in your images. Which is creepy and weird and, who knows, maybe helpful once you get the cruft really pouring in.
Creating And Managing Containers
Something like this can create a container.
docker run -it bash:latest
The -it puts you into an interactive terminal.
You can then see this container on the host.
docker ps
This should tip you off to the 12 character hex ID. What if you didn’t manage to connect to the container in an interactive way? Maybe the container is working and everything is set aside for it and it could be doing stuff but it maybe isn’t and you’d like to "switch to" it, whatever that means. Often it means chrooting to the top of the overlay containing this image and getting your cgroups to help you with the illusion that you’re "contained" and then running a shell. This can be done at any time with this.
docker exec -it bfbfcc62f603 bash
Note that this also is how you can log in multiple times to the same
running container. There is a similar but subtly different command
called attach.
docker container attach bfb
This one just hooks up your STDIN and STDOUT to the container. If you
already executed it with a bash shell, this will pick that up exactly
and give you two points of control (e.g. from tmux). I got it to do
that anyway. I don’t know which one it chooses if you do that with
two or more bash processes running. What’s tricky is detaching: when
you exit the shell, it exits both of them and kind of nerfs the
container (the [ctrl+p,ctrl+q] detach sequence mentioned earlier is
the way out).
There is a whole giant collection of docker container sub-commands. A couple are shown in action after this list.
- inspect
- stats
- top - Or while sleep 1; do clear && docker top 37990fd13d80; done
- kill - Unclear on details; does it send SIGKILL to all processes like pause sends Ctrl-Z signals?
- rm - Removes containers entirely. I presume it’s stopped and the overlay file system is unlinked.
- run - Runs a command, e.g. start a server or ML job.
- pause - Sends Ctrl-Z type signal to all processes in container.
- unpause
- stop - Stops Docker action on this container completely.
- export - As a tar archive.
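For example (the container ID is illustrative):
docker inspect bfbfcc62f603 --format '{{.State.Status}}' # Pull one field out of the big JSON blob.
docker stats --no-stream # A one-shot resource usage snapshot instead of a live display.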
Dockerfile
A Dockerfile is a text-based script used to create a container image. Here is the Dockerfile official reference. Highlights:
- The first instruction must be FROM (though an ARG may precede it).
- # starts a comment (only if at the beginning of the line). These are entirely removed before execution, allowing you to continue on to the next line with backslashes while assuming the comment will be gone.
- Not case sensitive but all caps is conventional for instructions.
- Executed in order.
Example Dockerfile:
FROM ubuntu
# ENV values persist into the running container's environment.
ENV EDITOR=/usr/bin/vim
RUN echo "This runs in /bin/sh -c by default."
# Note the quoting: bash -c needs the whole command as one argument.
RUN /bin/bash -c 'echo "This can change the shell"'
# This also can change the shell (SHELL requires the JSON array form).
SHELL ["/usr/bin/bash", "-c"]
# Only one CMD is valid; only the last one runs.
CMD echo "I think this is like the RUN instruction but this is what \
the container is left with, so /usr/bin/bash could be sane."
# See also ENTRYPOINT.
# This just adds a key value pair(s) to the image.
LABEL version="1.0" original_date="2022-03-26"
LABEL sense="CCW"
# This helps for situations where you want a port to be published.
# This does not actually do that, but allows for it.
EXPOSE 80/udp
EXPOSE 80/tcp
# Creates a mount point with the specified name.
VOLUME /path
# This is the user that the RUN, CMD, and ENTRYPOINT commands use.
USER xed:xed
# This is the directory for the RUN, CMD, ENTRYPOINT.
WORKDIR /tmp
# This allows for customization at build time.
ARG buildtimevar=DefaultValue
# Copy local files on the host to the destination.
COPY /source/path /destination/path
ADD /source/path /destination/path
# Additional features of ADD - download external file to destination.
ADD http://external.file/url /destination/path
# Can decompress local tar archives too.
ADD source.file.tar.gz /destination/path
You’ll often see MAINTAINER show up too, but it is deprecated. To
achieve the same effect, use the LABEL instruction. I’m not sure if
there’s an official format, but it seems pretty flexible.
LABEL maintainer="Chris X Edwards - xed.ch"
An interesting one is COPY --from=<name>. This copies files from a
previous FROM base-image AS <name> build stage. This can be used to
make multi-stage builds, as sketched below. If a build stage with the
specified name can’t be found, an image with the same name is used
instead.
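Here is a minimal multi-stage sketch (image names and paths are illustrative); the final image carries only the built binary, not the whole build toolchain.
FROM golang:1.19 AS builder
WORKDIR /src
COPY . .
RUN go build -o /app .

FROM debian:stable-slim
COPY --from=builder /app /usr/local/bin/app
CMD ["app"]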
Ok, you have a valid Dockerfile; now what? Now you have to "build" an
"image".
tmux # Would be smart. This is going to take a while and be flaky.
mkdir dockerbuild-my_project
cd dockerbuild-my_project
ln -s SomeDockerFile Dockerfile # Symlink to the default name, or skip this and use -f as shown.
docker build -f SomeDockerFile -t "xedch:myserverimage" .
It is also possible to create an image by changing files in a
container and saving the modified file system as a new image. The
commit command creates a new image from a container’s changes.
# Seems to be the same as `docker container commit`.
docker commit \
    --message "Commit message" \
    --author "Chris X Edwards <author at xed.ch>" \
    --change "ENV DEBUG=true" \
    49fca79a5e67 \
    xedch/myserverimage:version2
The penultimate argument is the current base container ID. Then the name of the new image. The change option contains Dockerfile syntax for what you’d like to be different.
Creating An Image
When you have a Dockerfile set up (i.e. named Dockerfile and in the
current directory) and ready, you’ll need to build the container
image with something like this.
docker build -t "xedch:testROS1" .
The -t seems to be "tag" and its exact functionality can get complex
but pretend this is just how you name the image in simple cases like
this.
Docker Hub Images
The whole Docker Hub thing sounds tedious. It seems to be like pastebin but for Docker images and you need to make an account and log in. But the general idea is something like this.
docker tag MyLittleImage ${DockerUserName}/MyLittleImage
docker push ${DockerUserName}/MyLittleImage
And with that there, I guess you can run these images from anywhere
with an internet connection. Seems like a massive security hole if
you’re trying to keep a complex proprietary computing pipeline
somewhat secret.
Persistent Storage
When dealing with Docker it can bring in who knows what, and while it
can do this somewhat efficiently, it’s easy to get all that
efficiency blowing out some small root partition. The first question
then is where all this lives in the file system. As far as I can
tell, all Docker file system usage that it natively knows about is
under /var/lib/docker. You can do other custom mounts to other
interesting places, but if it’s intrinsic to Docker, that’s where to
look for it.
Generally Docker containers seem to use some kind of fake file system that vaporizes when the container is removed. Actually there are a few possibilities.
Bind Mounts
Where the containers are hooked up to something in the host’s base
lower filesystem. This is similar to how Gentoo installs would
--rbind down to the install system’s /proc and so on before a chroot
to the new system.
mkdir ~/X/dkfs
cp /tmp/stuff4container2use ~/X/dkfs/
dk run --mount type=bind,src=/home/xed/X/dkfs,dst=/root/dkmnt -it xed:mycon bash
Note that the target volume’s mount point, /root/dkmnt, is created
automatically if it doesn’t exist (not sure if it’s a full -p parent
kind of thing but definitely one missing level was fine).
Note also that if you put the --mount <arg> option after the -it it
throws a cryptic and stupid jumble of inappropriate errors.
docker: Error response from daemon: failed to create shim: OCI runtime
create failed: runc create failed: unable to start container process:
exec: "--mount": executable file not found in $PATH: unknown.
So don’t do that!
Volume Mounts
Where the Docker system manages a file system space itself. It is
physically located in the /var/lib/docker tree (the ls below confirms
this). Named volumes outlive their containers; they stick around
until you docker volume rm or docker volume prune them. It seems like
this doesn’t do much to make data available to the containers. It
also seems low performance (docs hint at that by saying bind mounts
are "performant").
docker volume create TheImage-data
docker volume inspect TheImage-data
ls /var/lib/docker/volumes/TheImage-data/_data
To use this, use the -v option to mount it when you run a container.
docker run -d -v TheImage-data:/mnt/mtpt_in_container hello-world
Or you can skip a lot of nonsense and use "bind mounts" where you specify both the host path and the target container’s path. The host side must be an absolute path or -v will quietly treat it as a named volume.
docker run -d -v /path/on/host:/mnt/mtpt_in_container hello-world
tmpfs
I think this is a way to hook containers up with some private
/dev/shm style space. Vanishes immediately when the container stops.
Quick but takes RAM.
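A sketch (the mount point and size are illustrative):
docker run -it --tmpfs /scratch:rw,size=64m ubuntu /bin/bash # RAM-backed, gone when the container stops.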