Linux. What’s not to like? It seems odd that I didn’t have a Linux help document after 17 years of using it exclusively, but the fact is that almost all of my notes are about the Linux lifestyle. See my kernel notes or my Gentoo notes, for example. This document is for more generic, broad-reaching, system-level things.

What Happens When You Type cat /etc/passwd?

Before some interviews I had a look to see what kinds of things were typically asked. One of the common ones was "What exactly happens when you type cat /etc/passwd?" (Or basically execute a command.) Perhaps ironically I was somewhat surprised to get this unoriginal rote question. What really bugged me was that I was dismissed from consideration as "not knowing enough about Linux" because of this question (the only one that was asked about Linux). My answer was "Well, I don’t really know. But it involves some stuff like fork() and exec() and [a ton of other details I could remember]." I now feel like the correct part about that really was the "I don’t know" since I’m pretty confident that nobody really does.

Wanting to explore why I felt that "I have almost no clue" is actually a better answer than the ones that are typically advised, I looked into the topic in more detail. What follows is my limited and perhaps heavily flawed account of what happens when commands are run. The one thing I am certain of is that I’m glossing over 99.999% of it. There are also probably outright factual errors. But I think this is sufficient to establish that, yes indeed, it is complicated, and that I have given it some thought.

Since writing this, this more serious look at the topic was published.

Bash

Let’s assume by "command line" we’re talking about the default that’s been in every Linux distro I’ve ever seen - this means the software you were just using is the Bash shell. Here are some of the things I believe Bash does in preparation to dispatch a program to be run.

  • Checks $POSIXLY_CORRECT and the shopt options.

  • Does some mystical locale stuff - $LC_ALL, $LC_COLLATE, $LANG?

  • Checks history settings - $HISTIGNORE, $histchars, and whether the history length (history | wc -l) is still under $HISTSIZE.

  • Checks for brace expansion, tilde expansion, parameter expansion, command substitution, arithmetic expansion, pathname expansion (globbing), et al.

  • Checks for alias expansion.

  • Checks syntax (e.g. comments with #, line continuation with \, compound commands with ;, unclosed quotes).

  • Checks $IFS and splits words.

  • Looks for functions and then builtins.

  • Checks for and sets up redirection, here docs, here strings, file descriptors, et al.

  • Checks for pre-command variable assignments.

  • Sets $BASH_COMMAND, $BASH_ARGC, $BASH_ARGV.

  • Searches $PATH, checking the command hash first (the table managed by the hash builtin).

  • If the command is not present, the command_not_found_handle function is called if it exists.

  • If the command is found it is checked to see if it is in a directly executable format (probably with stat(2) system call).

    • Is the file empty?

    • Is it some sort of special file (sockets, symlinks, named pipe)?

  • Variables marked for export or specified with the command are prepared for insertion into the new execution environment. Also $_ is set.

  • The shell attempts to execute the command.

Note that Bash can handle interpreter scripts (#! lines) itself, but on Linux execve does this in the kernel first (and presumably doesn’t fail), so Bash’s own check only comes into play if execve fails.
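To make the $PATH and command hash steps a bit more concrete, here is a minimal Python sketch of that kind of lookup. It is not how Bash is implemented. The names find_command and _is_executable_file and the plain dict standing in for the hash table are my own inventions for illustration.

```python
import os
import stat


def _is_executable_file(candidate):
    """True for a regular file (after following symlinks) that we may execute."""
    try:
        info = os.stat(candidate)
    except OSError:
        return False
    return stat.S_ISREG(info.st_mode) and os.access(candidate, os.X_OK)


def find_command(name, path, cache):
    """Return the full path for `name`, or None if nothing suitable is found.

    `cache` is a plain dict playing the role of the shell's command hash.
    """
    # A name containing a slash is used as given; no $PATH search happens.
    if "/" in name:
        return name if _is_executable_file(name) else None
    # Remembered from an earlier lookup? Then skip the directory walk.
    if name in cache:
        return cache[name]
    for directory in path.split(":"):
        # An empty $PATH element means the current directory.
        candidate = os.path.join(directory or ".", name)
        if _is_executable_file(candidate):
            cache[name] = candidate
            return candidate
    return None
```

Calling find_command("cat", os.environ.get("PATH", ""), {}) would walk the directories in order and remember the first hit, which is roughly why a stale hash entry can make a shell keep running an old copy of a program.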

fork (Or More Accurately clone)

When this esoteric topic comes up, it is usually talked about as "forking", which may not be entirely correct. There was a time (when textbooks were written?) when a fork() system call did all of this work. Today (and even as early as 2001) on Linux, the preferred mechanism is clone. The fork() function from glibc is now a wrapper that invokes the Linux clone system call (sys_clone in the kernel) with flags that reproduce the traditional fork behavior. clone differs from fork in that the child process can share parts of its execution context with the calling process (though when used as a fork replacement it does not do much of this). That sharing is what makes clone suitable for creating threads, but it is not important for running programs.

Here are some things that I believe clone does.

  • Checks if process limit is reached.

  • Determines a unique process ID for the child.

  • The child is created as a near-exact duplicate of the parent (memory pages are shared copy-on-write rather than copied up front), except that the following are not inherited.

    • pending signals (sigpending(2)); the child starts with none

    • semaphore adjustments (semop(2))

    • process resource utilizations (getrusage(2)) (reset to zero in the child)

    • CPU time counters (times(2)) (set to zero in the child)

    • directory change notifications (dnotify)

    • memory mappings that the parent marked with the madvise(2) MADV_DONTFORK flag

  • The child process gets its PR_SET_PDEATHSIG setting reset so it does not receive a signal when its parent terminates.

  • clone returns the child’s PID to the parent and returns 0 to the child.
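The two return values are easier to believe when you see them. Below is a small Python sketch, assuming a Linux system where os.fork is available. The function name fork_demo and the exit value 7 are arbitrary choices of mine.

```python
import os


def fork_demo():
    """Fork once; return (PID the parent got, PID the child reported, exit code)."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: fork returned 0. Report our real PID through the pipe.
        os.close(read_end)
        os.write(write_end, str(os.getpid()).encode())
        os._exit(7)  # leave immediately, skipping the parent's cleanup handlers
    # Parent: fork returned the child's PID.
    os.close(write_end)
    reported = int(os.read(read_end, 64))
    os.close(read_end)
    _, status = os.waitpid(pid, 0)
    return pid, reported, os.WEXITSTATUS(status)
```

The first two values returned by fork_demo() should match, since the PID handed to the parent is the same one the child sees from getpid().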

exec (Or more accurately execve)

Once a new process is running it needs to change from what the parent was doing to what the child is supposed to be doing. This is done by the exec family of calls which replaces the current stuff going on in the process with the desired program’s stuff. The ve in execve indicates that this version of the call (there are many but this is dominant) includes an argument vector (i.e. argv) and an environment (i.e. envp). Here are some of the things I believe execve does.

  • Make sure environment is not too large.

  • Make sure target code is a proper file.

  • Make sure process, and then system, does not have too many files open.

  • Make sure the target code file is not open for writing.

  • Checks the program file’s set-user-ID bit and, conditionally, the file system’s MS_NOSUID mount flag, then sets the calling process’s effective user ID. Same with group.

  • May check if the calling process is being ptraced.

  • Checks if executable is a.out. It shouldn’t be.

  • Checks if the executable is ELF; if so, the interpreter named in the PT_INTERP segment (typically /lib/ld-linux.so.2 on 32-bit x86, or /lib64/ld-linux-x86-64.so.2 on x86-64, when linked with glibc 2) is used to load the needed shared libraries.

  • The executable is loaded or located in memory.

  • Process attributes are "preserved" with some exceptions like memory mappings, timers, semaphores, directory streams, memory locks, exit handlers, floating-point environment (per POSIX.1-2001).

  • All threads but the calling one are destroyed. Mutexes, pthread objects are also cut loose.

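  • Dispositions of caught signals are reset to their defaults; signals that were being ignored stay ignored.

  • Any outstanding asynchronous I/O operations are canceled.

  • File descriptors stay open across execve unless they are marked close-on-exec (FD_CLOEXEC), in which case they are closed.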

  • The target code is executed.

  • A default locale is set.

  • execve does not return on success.
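Here is a rough Python sketch of the fork-then-execve pattern, showing the argument vector and the environment being handed over explicitly. It assumes /bin/sh exists. The helper name run_with_env is made up for this example, and using 127 as the failure status just mimics what shells report for a command that could not be run.

```python
import os


def run_with_env(argv, env, program="/bin/sh"):
    """Fork, execve `program` in the child, return (stdout bytes, exit code)."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_end)
            os.dup2(write_end, 1)  # the child's stdout now points at the pipe
            os.close(write_end)
            # On success this never returns: the process image is replaced.
            os.execve(program, argv, env)
        except OSError:
            pass
        os._exit(127)  # only reached if something above failed
    # Parent: collect whatever the new program writes, then reap it.
    os.close(write_end)
    chunks = []
    while True:
        chunk = os.read(read_end, 65536)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_end)
    _, status = os.waitpid(pid, 0)
    return b"".join(chunks), os.WEXITSTATUS(status)
```

Something like run_with_env(["sh", "-c", 'printf %s "$GREETING"'], {"GREETING": "hello"}) should hand back the bytes hello, and the new program sees only the environment it was given, not the parent’s.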

cat Command Actually Running

Now that the command’s compiled code is loaded into memory and being executed by the processor, things are simple. Not. If you think you know exactly what’s going on internally on a simple command like cat, you are most likely wrong. Fortunately there is a way to not rely on our limited human capacity to mentally emulate computers to find this out.

strace cat /etc/passwd
execve("/bin/cat", ["cat", "/etc/passwd"], [/* 38 vars */]) = 0
brk(0)                                  = 0x1912000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f399cad7000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=132403, ...}) = 0
mmap(NULL, 132403, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f399cab6000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\37\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1852120, ...}) = 0
mmap(NULL, 3966008, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f399c4ee000
mprotect(0x7f399c6ad000, 2093056, PROT_NONE) = 0
mmap(0x7f399c8ac000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1be000) = 0x7f399c8ac000
mmap(0x7f399c8b2000, 17464, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f399c8b2000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f399cab5000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f399cab3000
arch_prctl(ARCH_SET_FS, 0x7f399cab3740) = 0
mprotect(0x7f399c8ac000, 16384, PROT_READ) = 0
mprotect(0x60a000, 4096, PROT_READ)     = 0
mprotect(0x7f399cad9000, 4096, PROT_READ) = 0
munmap(0x7f399cab6000, 132403)          = 0
brk(0)                                  = 0x1912000
brk(0x1933000)                          = 0x1933000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=7212544, ...}) = 0
mmap(NULL, 7212544, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f399be0d000
close(3)                                = 0
fstat(1, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
open("/etc/passwd", O_RDONLY)           = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=1914, ...}) = 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 65536) = 1914
write(1, "root:x:0:0:root:/root:/bin/bash\n"..., 1914) = 1914
read(3, "", 65536)                      = 0
close(3)                                = 0
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
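
The last dozen lines of that trace (open, fstat, fadvise64, read, write, close) are the part that is actually cat. As a sketch only, here is roughly the same loop in Python using the thin os wrappers around those system calls. The function name cat and the 65536 byte default are taken from the trace above, not from the coreutils source.

```python
import os


def cat(path, out_fd=1, chunk_size=65536):
    """Copy the file at `path` to `out_fd`; return the number of bytes written."""
    in_fd = os.open(path, os.O_RDONLY)
    total = 0
    try:
        try:
            # Hint that we will read front to back, like fadvise64 in the trace.
            os.posix_fadvise(in_fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        except OSError:
            pass  # the hint is advisory; carry on without it
        while True:
            chunk = os.read(in_fd, chunk_size)
            if not chunk:  # read returned 0 bytes: end of file
                break
            view = memoryview(chunk)
            while view:  # write may accept fewer bytes than offered
                written = os.write(out_fd, view)
                view = view[written:]
                total += written
    finally:
        os.close(in_fd)
    return total
```

Everything before those lines in the trace is the dynamic loader and libc getting ready, which this sketch does not attempt to imitate.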

Clean up

  • The shell, which has been waiting for the child to finish (e.g. with waitpid(2)), sets the exit status parameter $? (maybe others) to the exit value of the command.
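
As a hedged illustration of that last step, this Python sketch turns the raw status from waitpid into the number a shell would show in $?. The 128 plus signal number rule is the convention Bash uses for commands killed by a signal. The function names shell_style_status and run_child are hypothetical.

```python
import os
import signal


def shell_style_status(wait_status):
    """Map a raw waitpid status to the value Bash would store in $?."""
    if os.WIFEXITED(wait_status):
        return os.WEXITSTATUS(wait_status)
    if os.WIFSIGNALED(wait_status):
        return 128 + os.WTERMSIG(wait_status)  # Bash's convention for fatal signals
    raise ValueError("child has not terminated")


def run_child(action):
    """Fork, run `action` in the child, and return the shell-style status."""
    pid = os.fork()
    if pid == 0:
        try:
            action()
        finally:
            os._exit(0)  # never let the child fall back into the caller's code
    _, status = os.waitpid(pid, 0)
    return shell_style_status(status)
```

So a child that calls os._exit(3) yields 3, while one killed by SIGKILL (signal 9) yields 137.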

Threads? Processes?

What Exactly Happens When Linux Boots?

A good start on this one can be found on the sysresccd page discussing initrd issues.

And this excellent discussion of UEFI can help tame that monster.