Did you just start a job where you now have a Linux account? Do you own a Mac? These are both good reasons for people who otherwise never expected to worry about Unix to learn a little bit about it. It turns out that the Unix way of doing things isn’t confined to Unix or Linux. Since Apple products don’t intentionally suck, a pretty complete collection of Unix tools comes pre-installed on every Apple machine, making it practically as functional as Linux. If you didn’t know this, it’s like discovering your car has a second engine that doubles its power.

Note For Mac Users

It is not strictly necessary, but when exploring Unix on a Mac, the experience is enhanced by having all resources installed and available. To start with, it is good to install Apple’s developer tools, known as Xcode. On Mavericks (OS X 10.9) in 2014, when it was the latest and greatest, I found that I could trigger the Xcode installation by simply opening a terminal and typing gcc. Since the GNU Compiler Collection was missing, a fancy GUI box opened and asked if I wanted Xcode to be automagically installed. After Xcode is successfully installed, the next thing to have to make your Apple system really useful is Homebrew, a package manager for Macs that makes installing the things Apple neglected quite easy. Just go to the Homebrew page and cut and paste the line they show for "Install Homebrew". After providing your system password one or many times, it’s ready to go. Then you can do things like brew install wget to install the very useful program wget. Here are a few of my favorite free software packages that Apple should have included, but didn’t, which you can get with brew: mercurial, cvs, source-highlight, imagemagick, asciidoc.

Windows, unfortunately, does intentionally suck and that has forced clear thinking people to seek alternative solutions. Fortunately there are many.

The most traditional solution to the severe shortcomings of Windows is Cygwin. There are also some newer solutions such as GNUWin32 and Gnu On Windows and Babun. You could also just run genuine Linux in a virtual machine. Or run a genuine Linux on a USB memory stick.

From about the late 1960s to the mid 1990s, having the power of Unix at your command was generally rather expensive. However, that was 20 years ago. People have been getting comfortable with Unix for 40 years. By now, thanks to Linux, there is no excuse for it to cost, well, anything. This is why if you are not taking advantage of its power, you are missing out.

Some people might be skeptical that a 40-year-old computer technology could be useful in the modern world. Keep in mind that pretty much all the serious software you have ever used was written in a programming language whose general syntax was first formulated about the same time. The main principles of computer science have not radically changed since being established.

Unix methodology allows you to get beyond the clumsy wasteful interfaces designed to help people who know nothing about computers. If you know nothing about computers, maybe you’re not ready for Unix. But if your job or college major is in something involving data or computation (which today is most everything) then it is often valuable to know something about your most important tool and how to use it to best effect.

With Unix you can powerfully find things. You can organize things very unambiguously. Unix file features are very sophisticated and solve many difficult problems.

Even better, you can apply the maximum resources of your computer to the actual problem you’re trying to solve. This may seem unimportant unless you realize that in normal consumer computers most of the computing power, arguably the most expensive aspect, goes towards supporting interface infrastructure and not whatever it is that you wanted a computer to compute in the first place. If you’re just trying to log into Facebook, this is fine, but if you really want your computer to produce useful work, it can be problematic.

Unix also allows you to automate things so that the computer works when you don’t. Finally, you can communicate and share things easily; the defining hallmark that separated the original personal computers from expensive computers was the "feature" that they did not communicate with each other over a network, they were "personal". It’s not hard to see why that was a dead end.


Logging In

The first thing you might have to do to start using Linux is log into your account. The way Linux and Mac people log into other Linux and Mac machines is to simply open up a terminal and type:

$ ssh chris@example.com

In this case chris is my username and example.com stands in for the name of the machine I want to remotely use.

Windows users can install an ssh client that can do this too.

Shells

The ssh command stands for "Secure SHell". This means that you want a "shell" on a remote operating system, transferred securely, i.e. no eavesdropping. But what is this "shell" business? In a human conversation your ears and mouth are like a "shell" to your brain. It’s called a shell because it’s on the outside and regulates what comes in and goes out. A telephone would be analogous to a remote ssh connection. Through the shell you tell the computer what you want and it tells you anything it thinks you should know.

Strictly speaking the shell is an abstract part of the system that mediates how the OS interacts with the user. In practice, shells are often run in a "terminal" also called a "console". This is a program that hosts a shell and actually draws something on your screen to interact with. Some very primitive terminals exist that just pass the shell output to the text screen and pass key presses to the shell. But fancy ones have colors, selectable fonts, scrollbars, and are resizable, etc.

It’s important to note here that if you’re not used to fixed width fonts, that is fonts where the "W" is as wide as the "i", then it’s time to change that. The kind of feedback a computer wants to show you is far more likely to resemble regular matrices than the kinds of normal printing typeface idiosyncrasies inspired by human handwriting such as kerning. The basic point is to make sure that if you type…

WWWWWWWWWW
iiiiiiiiii

…they turn out to be the same length.

bash vs tcsh

The next issue is bash vs. tcsh. If your job puts you in a culture surrounded by people who are all happily using tcsh, then you may have to be comfortable with that and you certainly need to understand what the difference is. Everyone else can skip ahead and take bash for granted. The reason for this is that the whole world has pretty well standardized on bash as a default. Apple started out with tcsh but has since moved to bash.

Basically bash and tcsh are both programs that implement shells. They are quite similar but there are important differences. If you’re a tcsh user and find yourself in a bash shell, typing tcsh will often put you back into your favorite shell. An important tip is that this almost always works the other way too: if someone’s given you a tcsh account, typing bash will get you a bash shell.
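
If you’re not sure which shell you’re currently in, the shell’s own name is usually held in the special variable $0. A minimal check (output varies a bit by system; "bash" is just the common case shown here):

```shell
# $0 holds the name the running shell was started as.
# Asking a fresh bash shell directly:
bash -c 'echo $0'    # prints: bash
```

From there, typing tcsh or bash at the prompt starts the other shell, and exit returns you to where you were.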

To demystify it a bit, bash is a derivative or extension of the ancient Unix shell called simply "sh", sometimes referred to as the "Bourne shell". Turns out that sh is still a valid command and you can still run this primitive shell on pretty much all Unix systems. The derivative shell, bash, is short for "Bourne Again SHell". On the other side of the spectrum is tcsh, an enhanced descendant of csh, the "C SHell" (the leading t comes from the old TENEX operating system). Notice how both compete fiercely for snazziest pun. Unix is like that. Get used to it. Anyway, in theory, the csh family embodies more syntax elements borrowed from the C programming language.

If you’re very interested in the differences between bash and tcsh, this discussion highlighting the differences is quite detailed. Or maybe this classic discussion will be interesting to you.

Running Commands

Primarily what a shell does is it allows a user to specify commands which it then causes the operating system to actually execute. When you first log in or start a terminal, the shell confirms its presence with a "prompt". This is a character that is supposed to be the user’s cue or prompt to do something. If you don’t see the prompt then it is not prompting you to do something and you probably need to wait until something is finished. The prompt is conventionally $ (or > in tcsh) but it could be whatever you set it to. At the prompt, you can type commands. Here’s a nice example of that in action:

$ cal 7 2011
      July 2011
Su Mo Tu We Th Fr Sa
                1  2
 3  4  5  6  7  8  9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30
31

The prompt, $, invited me to do something; I entered the command cal 7 2011, whose function I hope is self-evident from the results which then appeared. The prompt then appears again, inviting you to do something else.

Tab Completion

People new to working with their computers through a command line interface or CLI (as opposed to a GUI, graphic user interface) often panic about how tedious they imagine typing in long complex commands will be. When I was starting out learning Linux I asked myself how likely it was that the smartest computer scientists and programmers, all of whom use a CLI to some extent, would do something flagrantly tedious? The answer is, of course, not too likely. It turns out that entering commands by typing them is quite efficient thanks to a couple of key tricks.

If you’re familiar with the ancient DOS command line you may not know about these tricks as command line input with that system was indeed dreadful. But all modern full-featured shells allow the user to only partially type things and, if the thing is sufficiently unambiguous, magically fill in the rest. This slashes typing by quite a bit. What typing remains is highly specific and important information you are trying to impart to your computer. Since you’re touch typing (it is helpful to learn to type well!) you’ll find that in many situations, you can tell your computer what your intentions are much faster than if you had to fumble for a mouse and go through what is essentially a little target shooting video game with the mouse pointer.

In the previous example, cal is a pretty short command. What if the command were factor? There are lots of commands that start with f so that’s not enough to be unambiguous. There are several that start with fa, so that’s not enough. But there is only one that starts with fac (on my system). Because fac is unambiguous, I can type f then a then c and then Tab and the command factor will appear. Then type a space and then your argument, which is just a fancy way of saying the thing you want this program to think about when it runs. In the following arbitrary example, I’m showing that 81 is 3 multiplied by itself four times (so the fourth root of 81 is 3).

$ factor 81
81: 3 3 3 3

It turns out that the arguments can also often be "tab completed", generally when they are file names. Here’s an example:

$ file /usr/bin/eject

This command was specified by typing file then space then forward slash then u then Tab then b then Tab then ej followed by a final Tab. This sequence specifies the parts of the location of this file named eject. That’s 13 keys instead of 19, or less than 5 seconds for a very slow typist. These parts are separated by forward slashes. If you’re used to DOS, you might think that something like C:\Trash contains a "slash". Not so. That is a "backslash". Get used to the terminology of "slash" meaning /. This is important in Unix which uses proper slashes to help organize files.

Getting Help With man

A very important Unix trick to know is the man command. This is short for "manual" and it is the key to not having to memorize all of the arcane details of every Unix command. Most commands and programs have a "man page" which is an explicit record of all the command’s acceptable syntax. This includes a reference list of all of the command’s options, a description of its output and a description of what is required for input. An example:

$ man factor

This shows the man page for the factor command. Use arrows to navigate and q to quit viewing the man page.

Another trick for getting help is to try the -h option. Many commands adhere to the convention of interpreting the -h option to mean that the user wants some information, albeit very terse, about how the program is used. Here’s an example:

$ file -h
Usage: file [-bcikLhnNsvz] [-f namefile] [-F separator] [-m magicfiles] file...
       file -C -m magicfiles
Try `file --help' for more information.

As it says, using the long form option, --help, produces even more help. This is very important for programs that use the -h option for something else (often "human readable", as in ls or df).

$ ls --help

This produces a pretty comprehensive summary of what options ls can take.
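
To see that other, clashing meaning of -h, compare the plain and "human readable" outputs of df, which reports disk space (with -h, sizes come out in K, M, and G units instead of raw block counts):

```shell
# df reports disk space per filesystem; -h scales sizes to K, M, G
df
df -h

# ls long listings also accept -h for human-readable file sizes
ls -lh /etc
```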

Command History

The other critical trick that the command line shell has is called "history". Even if forced to type a complicated command sequence, Unix experts would feel foolish doing that twice. The shells remember hundreds of the most recent command sequences you send to the computer and can reuse them. The easiest way to use history is to use the up and down arrows. The up arrow puts the most recent command on the command line. You can edit it and resubmit it or just resubmit it. Or you can keep pressing the up arrow until you get to the command from the past that you want. This is very useful and makes things, especially recovering from mistakes, much quicker.

You can also type the command history which will show you the history as the shell remembers it. There are much fancier history tricks for more advanced usage.
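
For example, to peek at just the tail end of what the shell remembers (history is a shell builtin, so this is best tried at an interactive prompt):

```shell
# Show only the five most recent remembered commands
history | tail -n 5

# At an interactive prompt, !! re-runs the previous command verbatim,
# and something like !cal re-runs the last command starting with "cal"
```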

Files

One fundamental idea of the Unix philosophy is that "everything is a file". Most people have heard that in actual fact, on a computer, everything is really a one or a zero. It turns out that scheme is too hard for humans to deal with. Unix represents the closest thing to that which humans can still easily understand: text and files. Binary digits, 1s and 0s, are coded into human letters like "abcd" and so on, and that’s pretty much it. You can even have big chunks of raw 1s and 0s, but keeping track of them still requires some human letters for organization. Since this is such a fundamental concept, the early Unix designers thought to go ahead and impart file-like properties to many other system features, the thinking being that tools to work with files would become highly sophisticated. They did. This strategy turned out to be very powerful.

File Naming Tips

It might also be a good opportunity to advise some good habits with respect to naming files. In Linux you can make files with terrible and inconvenient names, but it will turn out to be a terrible and inconvenient idea. Since command lines are parsed looking for certain special things, it’s really not wise to have those special things as a part of your file’s name. Special things that should not go in your file names include:

  • space - This is what the interpreter uses to separate elements. Consider a file named "A", a file named "B" and a file named "A B". Imagine the confusion! Whenever you’re tempted to use a space in a file name, use an underscore, _, instead.

  • " - Any kind of quote, single, double, backtick, etc, is handled very specially by the shell so making it part of your file names is a disaster. This is a terrible filename: satan's data.xls

  • ()[]{} - Any kind of parentheses or brace or bracket is also special to the shell making it wise to avoid them too. A bad filename: trig_sin(x)_output

  • !#$&*?|\;>< - Severely problematic characters! These characters are all used by the shell and if used in filenames could cause very strange messes. A file like Tom & Jerry.avi or Yahoo!_results or #1priority.txt could make quite a mess in normal operation.

  • @%^:, - Somewhat problematic characters. You might get away with these but it will be messy and some programs may not like parsing names with these characters.
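
If you’re stuck with a badly named file that someone else made, quoting rescues you. A safe demonstration in a throwaway scratch directory:

```shell
# Work somewhere disposable so nothing real is at risk
cd "$(mktemp -d)"

# A file with a space in its name; the quotes hold the name together
touch "A B"
ls

# Unquoted, rm would look for two files named A and B; quoted, it works
rm "A B"
```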

Working With Files

To learn how to use files we’ll start with a somewhat obscure command. This command creates files from nothing. It’s obscure because there are many other more natural ways to do this. Nonetheless, try creating a file with:

$ touch ANewFile

Barring some kind of problem (disk full?), you’ve just created a new file. Note that almost everything in Unix is case sensitive. This file is empty right now. The only space on the hard drive it uses is to keep a record of its own existence. It has no contents.

We can change the file’s name with this command:

$ mv ANewFile CPUdata

This changes the name of ANewFile to CPUdata. The command here is short for "move". Notice that moving and renaming are essentially the same thing.

Similar to moving is copying which creates a completely new file and leaves the original intact. Here is an interesting way to use cp, the copy command.

$ cp /proc/cpuinfo CPUdata

This command populates the file we created and renamed by copying the contents of another file into it. Another way to think of it is that CPUdata gets obliterated by the contents of /proc/cpuinfo, something to be careful of if the previous contents of CPUdata were valuable to you.

It turns out that /proc/cpuinfo is an interesting example file. It is found on all Linux systems and what is interesting about it is that it is fake. All of the files that start with /proc/ are actually not files stored on the hard drive. They are messages supplied directly by the kernel in the disguise of files. In this case, we want to know what the CPU’s specifications are so we find out by checking the contents of this file. When we check, the kernel immediately synthesizes some contents. This content could change depending on the state of the kernel. But you can use normal file handling tools to interact with this information. This consistency is an example of the Unix philosophy of everything is a file.

To actually have a look at the contents of this new file we created or the source we got the information from, we need the following command:

$ cat CPUdata

This will cause the OS to look up the contents of the specified file and spew it out on the screen for you. Why cat? It is short for "concatenate" and this command’s nominal purpose is to join multiple files into one big long one.

$ cat /proc/meminfo /proc/cpuinfo

This command produces a long output of the contents of /proc/meminfo, another virtual file, followed by /proc/cpuinfo.

This output is probably longer than your terminal and when that happens, the output is pretty much useless if it runs off the top of the screen before you have a chance to see it. One thing you can do is hold the Shift key and PageUp to scroll up and see what happened. Another way which works better if you can anticipate more than a screen full of output is to use what is called a "pager". There are two generally used pagers, the classic more and the modern less. It turns out that less is more. More of those questionable Unix puns. This shows how to use both pagers on a rather big file that should exist on most systems:

$ more /etc/services
$ less /etc/services

You might notice that more only stops the output and waits for you to press Space or Enter every so often so you can read it, and once all of the lines have been shown, it ends. More useful is less, which allows you to use the arrow keys to go up or down and the Space bar to jump forward a page. At the end less is still active in case you want to go back to the beginning (pressing g goes to the beginning and G goes to the end). To quit using less press q.

Now you know how to create files, rename them, copy them, and look at them. The last major command to know about in the life cycle of a file is how to get rid of it when you no longer need it. The remove command, rm, deletes a file:

$ rm CPUdata

You can try to erase /proc/cpuinfo but it won’t work. It’s not really there physically anyway. Generally you need to be very, very careful with the rm command. There is no "Trash" or "Recycle bin" when using normal Unix command line tools. If you tell the OS to erase the whole drive, you can expect to have that done in the fastest most brutal way possible. It’s like a gun — a serious tool that can easily produce very serious accidents. Be super careful with rm. It’s good to use tab completion to make sure you’re really deleting what you think you are and not a similarly named typo file.

Directories

Computer science, just like nature, likes trees. Real trees are tree-shaped because that organization is often very efficient. Computer scientists noticed this and organized data into trees. We could call the components of this organization data nodes and branching nodes (notice the arboreal metaphor) but we have a different metaphor for these things, files and directories respectively. Files are the collections of 1s and 0s that you want to preserve. Directories are also collections of 1s and 0s, in fact, they’re files too. However, directories contain the structural elements of how you want your other files organized. Don’t get too concerned with the fact that directories are really files. That’s a technicality. In practice, directories have their own set of tools that are pretty simple. Just think of directories as containers for files.

The first and perhaps most important command of all is ls. This command lists the files contained in a directory. Here is the simplest example.

$ ls

By itself, the ls command just returns a list of the file names in the current directory. The idea of "current directory" is something the shell uses to make sensible assumptions about what you might want if you’re not in the mood to be explicit. It’s not preserved between different instances of shells you might be running concurrently. To find out what your working directory is, you can print it with this command, short for "print working directory".

$ pwd

This shows that the current directory is /home/xed which for user xed is a typical "home" directory. Home directories can be abbreviated with ~. This means that these two commands do the same thing.

$ ls /home/xed
$ ls ~

If there is no output, it is likely that your account and home directory are still empty.

In previous examples, "arguments" were given with some commands to tell the commands what the target of the action is. If the commands are like verbs and the arguments are like nouns, then the "options" are like adverbs. Options, sometimes called "command line options", tell the command how to do what it does. Here are some examples of the ls command using various options.

$ ls -l
$ ls -a
$ ls -al
$ ls -lrt

The first, -l, asks the ls command to produce a "long" listing of the files which includes other interesting properties of the file such as what kind of file it is, who owns it, when it was last modified, and how big it is. The second, -a, tells the command to show "all" files, even ones normally hidden for convenience. By default files that start with a period are not shown by the ls command unless there is a -a option. It’s good to get in the habit of using this option so you can avoid overlooking important files, usually configuration files, that start with a period. The third, -al, does both "all" and "long". You could have used -a -l; it’s the same thing. The final example tells ls to do a long listing, reversed, sorted by modification time (not alphabetically, which is the default). This is useful when trying to find a file you were recently working on.

Now that we can thoroughly inspect a directory let’s look at how to manage them. The first thing you might want to do is create a new directory. Here’s the command for that.

$ mkdir ~/unixlesson

This will create a directory called unixlesson in my home directory, ~. It can be checked with this.

$ ls ~/unixlesson/

Or this is the same thing.

$ ls /home/xed/unixlesson/

Don’t forget to use tab completion! This directory should be empty since it was just created. This is an example of using the ls command explicitly on an argument which is an absolute path. The path is simply the string of all a file’s parent directories. In this case there is the root directory which is the leading slash, /. Then comes home/ which contains all the user home directories including mine, xed/. Finally comes the directory we made inside of that, unixlesson/. When ls sees this is a directory it shows a list of the contents.

You can make the present working directory be this new one with one of the most important Unix commands, the "change directory" command, cd:

$ cd ~/unixlesson
$ pwd

Notice the forward slash separates path elements. Thus far I have shown absolute paths, but there is another kind, relative paths. This is a way to specify a path not with respect from the top root directory, but rather with respect to the directory that is your current working one now. There are two relative path elements that are important to learn. The first is . which means the current directory and the other is .. which means the parent directory. This allows constructions like this.

$ cd ~
$ cd ./unixlesson
$ cd ..

The first is an absolute path to the user’s home directory. Then the directory is changed from the current one, the ., to the one called unixlesson that is a sub directory of it. The final command changes back out of the subdirectory into the parent which is back to the home directory. Complicated constructions are valid using these.

$ cat ../../etc/issue
CentOS release 6.7 (Final)

This means from the home directory go up a level, go up another level, then come back to a subdirectory at that level called etc and show the file called issue.

A final thing to say about directories is that when you’re done with them you can remove them with rmdir.

$ cd ~
$ rmdir unixlesson

Note that rmdir only works if the directory is empty, i.e. does not contain any files or subdirectories. See the rm command if they’re not.
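
A sketch of the difference, again in a scratch directory (rm -r recursively deletes a directory and everything inside it, so treat it with the same care as rm itself):

```shell
cd "$(mktemp -d)"
mkdir -p demo/sub
touch demo/sub/afile

# rmdir refuses because demo still has contents
rmdir demo || echo "rmdir refused, as expected"

# rm -r removes demo AND everything in it -- be very careful
rm -r demo
```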

The Linux Filesystem Layout

When beginners first start using Linux or any new operating system, it’s often quite disorienting how everything is organized and why things are called what they are. Here is a quick look at the general layout of a normal Linux file system. This list comes from a command like this.

$ ls /
  • bin - Programs to run, traditionally often called binaries.

  • boot - Where the kernel lives and other things needed to boot at start up. The kernel is the single program that manages all other programs and resources; as such it is your operating system strictly speaking.

  • dev - A virtual directory where hardware and virtual devices can be accessed.

  • etc - Mostly system configuration files, look up lists, miscellaneous program default configurations, etc.

  • home - Where home directories of users are stored.

  • lib, lib64, /usr/lib - Where low level software libraries live.

  • lost+found - A place for the file system to do file housekeeping.

  • media - In modern Linux systems, a place to access mounted drives like USB flash drives, CDs, and other media.

  • mnt - In old Linux a mount point like /media.

  • opt - Often contains optional software from external sources which are not especially well integrated into the OS distribution.

  • proc - Another virtual directory tree which allows the user to use file and directory commands to interact directly with the kernel and its running processes.

  • root - The home directory for the "superuser", i.e. the administrator. Don’t confuse the root user account with the root directory which is /. It is confusing!

  • sbin - System binaries or programs generally not needed by ordinary users doing ordinary work.

  • sys - Another virtual directory like /proc for interacting with the system's kernel.

  • tmp - A temporary directory. Normal users can write things here, but you should not count on them being permanent.

  • usr - Contains software used by the users (as opposed to used autonomously by the system).

  • var - Contains variable state information to support various programs. Web sites, databases, print queues, logs, and mail queues are stored here.

Expansion (Wildcards)

When working with files and directories it’s often tedious to explicitly type out all of a long path name and especially tedious to type out multiple files that have a common theme. The shells provide many tricks to make this as efficient as possible.

The most common of these tricks is the wildcard expansion using *. When you use a * in constructing a command, the shell takes that star and replaces it with a list of files from the path specified. If no path other than * is specified, it substitutes the names of all of the files in the current directory. Here’s an example:

$ mkdir SomeTestDir
$ cd SomeTestDir/
$ touch Alpha Bravo Charlie
$ echo *
Alpha Bravo Charlie
$ rm Alpha Bravo Charlie
$ cd ..
$ rmdir SomeTestDir

The echo command just outputs the argument list that is provided with the command. It’s basically a "print" command. When the shell sees the * it removes it and replaces it with a listing of the current directory, basically similar to what ls would return.

I could have used rm * to delete the temp files, but I’ll take this opportunity to point out that rm * can be a disastrous command if you are in a directory you did not intend to be. Under certain circumstances, a small command like rm * can wipe out an entire system. So be super careful!
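
One defensive habit: put echo in front of a wildcard command first, so the shell shows you exactly what the wildcard expands to without executing anything destructive:

```shell
cd "$(mktemp -d)"
touch Alpha Bravo

# Prints "rm Alpha Bravo" -- nothing is deleted
echo rm *

# rm -i asks for a y/n confirmation on every file before deleting:
# rm -i *
```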

The * can be used to match parts of paths too. The second command only lists files ending in .html:

$ ls
contact.html icon.ico index.html logo.png privacy.html
$ ls *.html
contact.html index.html privacy.html

A lesser known but useful expansion feature which operates on single characters is the ?. It is useful for things like this:

$ ls
bike1.jpg bike2.jpg bus1.jpg bus2.jpg car1.jpg car2.jpg
$ ls bus?.jpg
bus1.jpg bus2.jpg

Redirecting Streams

One of the most powerful features of Unix is, not surprisingly, a bit weird for beginners to fully grasp at first. This is the use of pipes and stream operators. There are subtle complexities, but the normal ways of using these tricks can just be easily memorized and put to immediate use. It works like this:

$ ls > my_file_list

This takes the output of ls and sends it to a file called my_file_list. If the file does not exist, it creates one (in the current directory if no path is specified).

If the file does exist, it is overwritten! Any data that previously existed in the file will be gone forever. Make sure this is what you want!

If you do not want the redirect to overwrite you can use this syntax:

$ cal 6 2011 > summer2011
$ cal 7 2011 >> summer2011
$ cal 8 2011 >> summer2011

This will produce a file called summer2011 containing a calendar of June, July, and August. The >> operator is for "append".
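
The difference between > and >> is easy to verify with echo, which just prints its arguments:

```shell
cd "$(mktemp -d)"
echo "first"  > notes    # creates notes
echo "second" > notes    # > overwrites: "first" is gone
echo "third" >> notes    # >> appends below "second"
cat notes                # prints "second" then "third"
```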

One very strange but common and useful thing to do is to redirect output into a black hole thereby preventing that output from doing something annoying or unstable. Unix systems traditionally have such a black hole in the form of a special virtual file called /dev/null. If you write to this "file", the system pretends like it wrote it, but in reality, the data is just thrown away. This tends to be used with noisy programs which print status messages to the screen and write useful data to files. It is also used heavily in automating repetitive lists of shell commands into what are called scripts.
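
For instance (the 2> form redirects the separate error stream; the point here is just that /dev/null swallows whatever is sent to it):

```shell
# The message is discarded; nothing appears and nothing is stored
echo "noisy status chatter" > /dev/null

# Silence the complaint from a command that fails
ls /no/such/place 2> /dev/null
```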

Pipes

Not only can you take output and redirect it to a file, you can also send output of a program to be the input of another. This is called a pipe and it allows you to string together a sequence of simple commands into a very powerful complex one. Here’s a simple example.

$ ls --help | less

This takes the long help message of the ls command and pipes it to the less command which allows you to scroll through it at a comfortable reading pace (q to quit less).

You can string together many programs like so:

$ cat /proc/cpuinfo | grep name | wc -l

Here I’m dumping the contents of /proc/cpuinfo into the filter program grep which will output only the lines containing the word name. That output will go to a program called wc which is a "word counter"; it also counts lines if you give it the -l option. The result shows how many CPUs the system thinks it has to work with.
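
Another short chain in the same spirit, counting entries in /etc:

```shell
# ls makes the list, wc -l counts the lines
ls /etc | wc -l

# Chain three: list /etc, keep names containing "host", count those
ls /etc | grep host | wc -l
```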

Sometimes these pipe chains can get quite long and complex. A major aspect of the Unix philosophy, one which has held up well over the decades, is to focus on small tools that each do one specific task very well. Pipes are the glue that bind these high quality simple tools into high quality complex ones.
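For example, here is a slightly longer chain (assuming a standard /etc/passwd) that tallies which login shells are configured on the system, most common first:

```shell
# Field 7 of /etc/passwd is each account's login shell;
# cut extracts it, sort groups duplicates together,
# uniq -c counts each group, and sort -rn ranks them
cut -d: -f7 /etc/passwd | sort | uniq -c | sort -rn
```

Each program in the chain is trivial on its own, yet together they answer a question none of them could answer alone.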

Being useful

Here is a quick list of some of the major commands that Unix users should be aware of.

  • grep - Global Regular Expression Print, finds specific text or patterns hiding in files. Compare cat /proc/cpuinfo with grep name /proc/cpuinfo.

  • sed, awk - Powerful stream editing languages, very useful with piped data.

  • head, tail - Output the specified file, but only the first 10 lines (for head) or the last 10 lines (for tail). The number of lines can be set with the -n option. So head -n1 /proc/meminfo shows how much memory the system has (a single line from the top of that virtual file).

  • du, df - Check on disk usage or see how much disk free space is left.

  • wc - Word count, also counts lines and bytes.

  • chmod - Change the mode (permission) bits of files, controlling who can read, write, or execute them.

  • find - A very thorough and efficient file finding program.

  • passwd - The command to use if you ever need to change your password.

  • sleep, at - Pause for a specified amount of time (sleep) or schedule a command to run later (at).

  • tar - Stands for "tape archiver" but it’s still a useful generic way to bundle multiple files into a single one.

  • zip, unzip, gzip, gunzip, bzip2, bunzip2 - File compression programs. For normal Unix applications, gzip is recommended. It is conventional to have files end in .gz if compressed with gzip. Also common is .tgz which is a gzipped tar file.

  • w, who, lastlog - See who else is logged into this machine with you.

  • clear - Clears the screen of text. Ctrl-L does the same thing usually.

  • exit - Logs out of a shell. Ctrl-D also does the same thing usually.
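To make a few of these concrete, here is a sketch that exercises head, tail, wc, and tar on a throwaway file in a temporary directory:

```shell
cd "$(mktemp -d)"

# Make a small sample file to play with
printf 'alpha\nbeta\ngamma\ndelta\n' > sample.txt

head -n 2 sample.txt        # first two lines: alpha, beta
tail -n 2 sample.txt        # last two lines: gamma, delta
wc -l sample.txt            # counts 4 lines

# Bundle and compress it the conventional way (.tgz = gzipped tar)
tar -czf sample.tgz sample.txt
tar -tzf sample.tgz         # list the archive's contents
```

The -czf flags to tar mean create, compress with gzip, and write to the named file; swapping c for t lists an archive and for x extracts it.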


When you execute a command, the operating system (Linux) keeps track of it for as long as it runs. It assigns every running command a "process ID". You can manage these manually to some extent when necessary. To start with, you can use this to see what’s running on your system.

$ top

This shows a continuous display of the top resource intensive programs running on your system. Press q to quit top.

A similar but more fussy and precise command is ps. By default it shows the processes spawned by the current user in the current shell. I tend to always use the following to show all processes by all users on the system.

$ ps -ef

If you find that a job is stuck and will not complete, or maybe it’s causing some kind of other problem, you can kill it by doing this.

$ kill 1337

In this example, the process with an ID number of 1337 will be sent the default termination signal. Most of the time that kills it. Sometimes you have to get out the big guns to kill a problem job, the -9 option.

$ kill -9 1337

That definitely kills it dead or you have major problems.
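You can try the whole kill workflow safely on a disposable background job. A minimal sketch:

```shell
# Start a long-running background job; the shell variable $!
# holds the process ID of the most recent background job
sleep 300 &
pid=$!

# Politely ask it to terminate (kill sends SIGTERM by default)
kill "$pid"

# Reap the job; suppress the shell's "Terminated" chatter
wait "$pid" 2> /dev/null || true

# kill -0 sends no signal at all; it just checks whether the
# process still exists
kill -0 "$pid" 2> /dev/null || echo "process $pid is gone"
```

The `&` puts the job in the background so your shell stays free, which is also a preview of the job control topic mentioned later.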

There are other nice tricks for managing how your processes run. Here are two useful ones. The first is seeing how long a job takes to run.

$ time gzip -c /boot/vmlinuz > repacked_kernel.gz
real    0m4.533s
user    0m0.137s
sys 0m0.007s

This shows that it took about 4.5 seconds of wall-clock time (real) to compress the file containing the Linux kernel. The other two times, user and sys, measure how much CPU time the process actually used, in user space and in the kernel respectively.

A simple stopwatch can be implemented like this (press Enter to stop it and print the elapsed time).

$ time read

Another useful command modifier is watch, which reruns a program over and over every two seconds (by default), allowing you to watch any changes unfold in near real time. Here’s a crude clock.

$ watch date

This brings up a good question: what if a job gets stuck and you need to stop it? As mentioned, you can use kill if you can access another terminal. But if the job is running in the currently focused terminal, you can usually kill it by simply pressing Ctrl-C. This is also how you get out of watch.

Next Level of Unix Mastery

There is an entire universe of things to explore in Unix. After getting comfortable with the beginner topics, here are some other things you might want to explore:

  • vi/vim - Become omnipotent in all matters of text processing. Check out this cool game designed to help you learn about Vim.

  • regular expressions - The standard Unix method for specifying patterns for matching.

  • job control - Putting things in the background effectively.

  • permissions - Understand and work with file permissions.

  • screen - Manage multiple shells over one remote connection.

  • cvs / Mercurial - Manage your software development sensibly.

  • sed / awk - These do very powerful things to text streams.

  • xargs, parallel - Working with large and complex file lists.
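To give a taste of that last item, here is a small sketch of find feeding xargs, again in a throwaway directory:

```shell
cd "$(mktemp -d)"

# Create a few files to operate on
touch report1.txt report2.txt report3.txt

# find emits one matching file name per line; xargs packs those
# names onto the command line of a single wc invocation
find . -maxdepth 1 -name 'report*.txt' | xargs wc -l
```

This pattern shines when the file list is too long or too dynamic to type by hand; xargs takes care of splitting huge lists across multiple invocations automatically.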