Motivation
Did you just start a job where you now have a Linux account? Do you own a Mac? These are both good reasons for people who otherwise never expected to worry about Unix to learn a little bit about it. It turns out that the Unix way of doing things isn’t confined to Unix or Linux. Since Apple products don’t intentionally suck, there is a pretty complete collection of Unix tools pre-installed on every Apple machine, making it practically as functional as Linux. If you didn’t know this, it’s like discovering your car has a second engine that doubles its power.
It seems I’m not alone in being sad about the untapped power of your computer. Some guys at MIT have a very nice on-line resource called The Missing Semester of Your CS Education. They are implying that even (especially) computer science majors are graduating with a painfully deficient education. Their motivation for the class is very much aligned with mine.
It is not strictly necessary, but when exploring Unix on a Mac, the experience is enhanced by having all the available resources installed. To start with, it is good to install Apple’s developer tools, known as Xcode.
I found on Mavericks (OS X 10.9) in 2014, when it was the latest and greatest, that I was able to install Xcode by simply opening a terminal and typing gcc. Since the GNU Compiler Collection was missing, a fancy GUI box opened and asked if I wanted Xcode to automagically be installed. After Xcode is successfully installed, the next thing to have to make your Apple system really useful is Homebrew, which is a package manager for Macs that makes installing the things Apple neglected quite easy. Just go to the Homebrew page and cut and paste the line they show for "Install Homebrew". After providing your system password one or many times, it’s ready to go. Then you can do things like brew install wget to install the very useful program wget. Here are a few of my favorite free software packages that Apple should have included, but didn’t, which you can get with brew: mercurial, cvs, source-highlight, imagemagick, asciidoc.
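For example, on a reasonably modern macOS the whole setup might look something like this (xcode-select --install is another way to request just the command line developer tools; the package names are the ones mentioned above):

$ xcode-select --install
$ brew install wget
$ brew install mercurial cvs source-highlight imagemagick asciidoc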
Windows, unfortunately, does intentionally suck and that has forced clear thinking people to seek alternative solutions. Fortunately there are many.
The most traditional solution to the severe shortcomings of Windows is Cygwin. There are also some newer solutions such as GNUWin32 and Gnu On Windows and Babun. You could also just run genuine Linux in a virtual machine. Or run a genuine Linux on a USB memory stick.
Actually, these days Windows supports Bash natively through the Windows Subsystem for Linux, and a bunch of Unix tools can be easily installed using a Debian environment. That’s great news for anybody stuck with Windows.
From about the late 1960s to the mid 1990s, having the power of Unix at your command was generally rather expensive. However, that was 20 years ago. People have been getting comfortable with Unix for 40 years. By now, thanks to Linux, there is no excuse for it to cost, well, anything. This is why if you are not taking advantage of its power, you are missing out.
Some people might be skeptical that a 40+ year old computer technology could be useful in the modern world. Keep in mind that pretty much all the serious software you have ever used was written in a programming language whose general syntax was first formulated about the same time. The main principles of computer science have not radically changed since being established.
Unix methodology allows you to get beyond the clumsy wasteful interfaces designed to help people who know nothing about computers. If you know nothing about computers, maybe you’re not ready for Unix. But if your job or college major is in something involving data or computation (which today is most everything) then it is often valuable to know something about your most important tool and how to use it to best effect.
With Unix you can powerfully find things. You can organize things very unambiguously. Unix file features are very sophisticated and solve many difficult problems.
Even better, you can apply the maximum resources of your computer to the actual problem you’re trying to solve. This may seem unimportant unless you realize that in normal consumer computers most of the computing power, arguably the most expensive aspect, goes towards supporting interface infrastructure and not whatever it is that you wanted a computer to compute in the first place. If you’re just trying to log into Facebook, this is fine, but if you really want your computer to produce useful work, it can be problematic.
Unix also allows you to automate things so that the computer works when you don’t. Finally, you can communicate and share things easily; the defining hallmark that separated the original personal computers from expensive computers was the "feature" that they did not communicate with each other over a network; they were "personal". It’s not hard to see why that was a dead end.
Connecting
The first thing you might have to do to start using Linux is log into your account. The way Linux and Mac people log into other Linux and Mac machines is to simply open up a terminal and type:
$ ssh chris@xed.example.com
In this case chris is my username and xed.example.com is the machine I want to remotely use.
Windows users can install an ssh client that can do this too.
Shells
The ssh command stands for "Secure SHell". This means that you want a "shell" to a remote operating system that is securely transferred, i.e. no eavesdropping. But what is this "shell" business? In a human conversation your ears and mouth are like a "shell" to your brain. It’s called a shell because it’s on the outside and regulates what comes in and goes out. A telephone would be analogous to a remote ssh connection. Through the shell you tell the computer what you want and it tells you anything it thinks you should know.
Strictly speaking the shell is an abstract part of the system that mediates how the OS interacts with the user. In practice, shells are often run in a "terminal" also called a "console". This is a program that hosts a shell and actually draws something on your screen to interact with. Some very primitive terminals exist that just pass the shell output to the text screen and pass key presses to the shell. But fancy ones have colors, selectable fonts, scrollbars, and are resizable, etc.
It’s important to note here that if you’re not used to fixed width fonts, that is, fonts where the "W" is as wide as the "i", then it’s time to change that. The kind of feedback a computer wants to show you is far more likely to resemble regular matrices than the kinds of normal printing typeface idiosyncrasies inspired by human handwriting, such as kerning. The basic point is to make sure that if you type…
iiiiii
wwwwww
…they turn out to be the same length.
bash vs tcsh
Note: This was written when I was working for a lab which heavily used tcsh. Today the discussion is unchanged but it is more likely that you’ll be considering bash vs. zsh, which is what Apple recently switched to. See this blog post I wrote in April 2020 for the full story. Feel free to mentally substitute zsh for tcsh for the rest of this section.
The next issue is bash vs. tcsh. If your job puts you in a culture surrounded by people who are all happily using tcsh, then you may have to be comfortable with that and you certainly need to understand what the difference is. Everyone else can skip ahead and take bash for granted. The reason for this is that the whole world has pretty well standardized on bash as a default. Apple started out with tcsh but has since moved to bash.
Basically bash and tcsh are both programs that implement shells. They are quite similar but there are important differences. If you’re a tcsh user and find yourself using a bash shell, typing tcsh will often put you back into your favorite shell. An important tip is that this almost always works the other way: if someone’s given you a tcsh account, typing bash will get you a bash shell.
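A minimal sketch of hopping back and forth, assuming both shells are installed; note the conventional prompt character changing with each shell, and exit backing you out one shell at a time:

$ tcsh
> bash
$ exit
> exit
$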
To demystify it a bit, bash is a derivative or extension of the ancient Unix shell called simply "sh", sometimes referred to as the "Bourne shell". Turns out that sh is still a valid command and you can still run this primitive shell on pretty much all Unix systems. The derivative shell, bash, is short for "Bourne Again SHell". On the other side of the spectrum is tcsh, an enhanced descendant of csh, the "C SHell". Notice how both compete fiercely for snazziest pun. Unix is like that. Get used to it. Anyway, in theory, tcsh embodies more syntax elements borrowed from the C programming language.
If you’re very interested in the differences between bash and tcsh, this discussion highlighting the differences is quite detailed. Or maybe this classic discussion will be interesting to you.
Running Commands
Primarily what a shell does is allow a user to specify commands which it then causes the operating system to actually execute. When you first log in or start a terminal, the shell confirms its presence with a "prompt". This is a character that is supposed to be the user’s cue or prompt to do something. If you don’t see the prompt then it is not prompting you to do something and you probably need to wait until something is finished. The prompt is conventionally $ (or > in tcsh) but it could be whatever you set it to. At the prompt, you can type commands. Here’s a nice example of that in action:

$ cal 7 2011
     July 2011
Su Mo Tu We Th Fr Sa
                1  2
 3  4  5  6  7  8  9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30
31
$
The prompt, $, said to do something, so I entered the command cal 7 2011, whose function I hope is self-evident from the results which then appeared. The prompt appears again, inviting you to do something else.
Tab Completion
People new to working with their computers through a command line interface or CLI (as opposed to a GUI, graphic user interface) often panic about how tedious they imagine typing in long complex commands will be. When I was starting out learning Linux I asked myself how likely it was that the smartest computer scientists and programmers, all of whom use a CLI to some extent, would do something flagrantly tedious? The answer is, of course, not too likely. It turns out that entering commands by typing them is quite efficient thanks to a couple of key tricks.
If you’re familiar with the ancient DOS command line you may not know about these tricks as command line input with that system was indeed dreadful. But all modern full-featured shells allow the user to only partially type things and, if the thing is sufficiently unambiguous, magically fill in the rest. This slashes typing by quite a bit. What typing remains is highly specific and important information you are trying to impart to your computer. Since you’re touch typing (it is helpful to learn to type well!) you’ll find that in many situations, you can tell your computer what your intentions are much faster than if you had to fumble for a mouse and go through what is essentially a little target shooting video game with the mouse pointer.
In the previous example, cal is a pretty short command. What if the command were factor? There are lots of commands that start with f so that’s not enough to be unambiguous. There are several that start with fa, so that’s not enough. But there is only one that starts with fac (on my system). Because fac is unambiguous, I can type f then a then c and then tab and the command factor will appear. Then type space and then your argument, which is just a fancy way of saying the thing you want this program to think about when it runs. In the following arbitrary example, I’m showing that the fourth root of 81 is 3.
$ factor 81
81: 3 3 3 3
It turns out that the arguments can also often be "tab completed", generally when they are file names. Here’s an example:
$ file /usr/bin/eject
This command was specified by typing file then space then forward slash then u then tab then b then tab then ej followed by a final tab. This sequence specifies the parts of the location of this file named eject. That’s 13 keys instead of 19, or less than 5 seconds for a very slow typist. These parts are separated by forward slashes. If you’re used to DOS, you might think that something like C:\Trash contains a "slash". Not so. That is a "backslash". Get used to the terminology of "slash" meaning /. This is important in Unix which uses proper slashes to help organize files.
Getting Help With man
A very important Unix trick to know is the man command. This is short for "manual" and it is the key to not having to memorize all of the arcane details of every Unix command. Most commands and programs have a "man page" which is an explicit record of all the command’s acceptable syntax. This includes a reference list of all of the command’s options, a description of its output, and a description of what is required for input. An example:
$ man factor
This shows the man page for the factor command. Use arrows to navigate and q to quit viewing the man page.
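If you can’t remember a command’s name in the first place, man can also search the one-line descriptions of every man page using its -k option (equivalent to the apropos command):

$ man -k calendar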
Another trick for getting help is to try the -h option. Many commands adhere to the convention of interpreting the -h option to mean that the user wants some information, albeit very terse, about how the program is used. Here’s an example:
$ file -h
Usage: file [-bcikLhnNsvz] [-f namefile] [-F separator] [-m magicfiles] file...
file -C -m magicfiles
Try `file --help' for more information.
As it says, using the long form option, --help, produces even more help. This is very important in programs that use the -h option for something else (often "human readable", as in ls or df).
$ ls --help
This produces a pretty comprehensive summary of what options ls can take.
History
The other critical trick that the command line shell has is called "history". Even if forced to type a complicated command sequence, Unix experts would feel foolish doing that twice. The shells remember hundreds of the most recent command sequences you send to the computer and can reuse them. The easiest way to use history is to use the up and down arrows. The up arrow puts the most recent command on the command line. You can edit it and resubmit it or just resubmit it. Or you can keep pressing the up arrow until you get to the command from the past that you want. This is very useful and makes things, especially recovering from mistakes, much quicker.
You can also type the command history which will show you the history as the shell remembers it. There are much fancier history tricks for more advanced usage.
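As a small taste of the fancier tricks, these all work in bash: history 5 shows only the five most recent commands, !! reruns the previous command, and !cal reruns the most recent command that started with "cal".

$ history 5
$ !!
$ !cal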
Files
One fundamental idea of the Unix philosophy is that "everything is a file". Most people have heard that in actual fact, on a computer, everything is really a one or a zero. It turns out that scheme is too hard for humans to deal with. Unix represents the closest thing to that which humans can still easily understand: text and files. Binary digits, 1s and 0s, are coded into human letters like "a", "b", "c", and that’s pretty much it. You can even have big chunks of raw 1s and 0s, but keeping track of them still requires some human letters for organization. Since this is such a fundamental concept, the early Unix designers thought to go ahead and impart file-like properties to many other system features, since the thinking was that tools to work with files would become highly sophisticated. They did. This strategy turned out to be very powerful.
File Naming Tips
It might also be a good opportunity to advise some good habits with respect to naming files. In Linux you can make files with terrible and inconvenient names, but it will turn out to be a terrible and inconvenient idea. Since command lines are parsed looking for certain special things, it’s really not wise to have those special things as a part of your file’s name. Special things that should not go in your file names include:
- space - This is what the interpreter uses to separate elements. Consider a file named "A", a file named "B" and a file named "A B". Imagine the confusion! (See the demonstration just after this list.) Whenever you’re tempted to use a space in a file name, use an underscore, _, instead.
- " - Any kind of quote, single, double, backtick, etc, is handled very specially by the shell so making it part of your file names is a disaster. This is a terrible filename: satan's data.xls
- ()[]{} - Any kind of parentheses or brace or bracket is also special to the shell making it wise to avoid them too. A bad filename: trig_sin(x)_output
- !#$&*?|\;>< - Severely problematic characters! These characters are all used by the shell and if used in filenames could cause very strange messes. A file like Tom & Jerry.avi or Yahoo!_results or #1priority.txt could make quite a mess in normal operation.
- @%^:, - Somewhat problematic characters. You might get away with these but it will be messy and some programs may not like parsing names with these characters.
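To see the space problem in action, here is a quick demonstration on a typical Linux system using commands covered in the next section (touch creates a file, cat displays one, mv renames one):

$ touch "A B"
$ cat A B
cat: A: No such file or directory
cat: B: No such file or directory
$ mv "A B" A_B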
Working With Files
To start learning how to use files we’ll start with a somewhat obscure command. This command creates files from nothing. It’s obscure because there are many other more natural ways to do this. Nonetheless, try creating a file with:
$ touch ANewFile
Barring some kind of problem (disk full?), you’ve just created a new file. Note that almost everything in Unix is case sensitive. This file is empty right now. The only space on the hard drive it uses is to keep a record of its own existence. It has no contents.
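As a quick aside on that case sensitivity, this command creates a second, completely separate file rather than touching the existing one:

$ touch ANEWFILE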
We can change the file’s name with this command:
$ mv ANewFile CPUdata
This changes the name of ANewFile to CPUdata. The command here is short for "move". Notice that moving and renaming are essentially the same thing.
Similar to moving is copying, which creates a completely new file and leaves the original intact. Here is an interesting way to use cp, the copy command.
$ cp /proc/cpuinfo CPUdata
This command is populating the file we created and renamed by copying the contents of another file into it. Another way to think of it is that CPUdata gets obliterated with the contents of /proc/cpuinfo, something to be careful of if CPUdata contained anything valuable to you.
It turns out that /proc/cpuinfo is an interesting example file. It is found on all Linux systems and what is interesting about it is that it is fake. All of the files that start with /proc/ are actually not files stored on the hard drive. They are messages supplied directly by the kernel in the disguise of files. In this case, we want to know what the CPU’s specifications are so we find out by checking the contents of this file. When we check, the kernel immediately synthesizes some contents. This content could change depending on the state of the kernel. But you can use normal file handling tools to interact with this information. This consistency is an example of the Unix philosophy of everything is a file.
To actually have a look at the contents of this new file we created or the source we got the information from, we need the following command:
$ cat CPUdata
This will cause the OS to look up the contents of the specified file and spew it out on the screen for you. Why cat? It is short for "concatenate" and this command’s nominal purpose is to join multiple files into one big long one.
$ cat /proc/meminfo /proc/cpuinfo
This command produces a long output of the contents of /proc/meminfo, another virtual file, followed by /proc/cpuinfo. This output is probably longer than your terminal and when that happens, the output is pretty much useless if it runs off the top of the screen before you have a chance to see it. One thing you can do is hold the Shift key and press PageUp to scroll up and see what happened. Another way, which works better if you can anticipate more than a screen full of output, is to use what is called a "pager". There are two generally used pagers, the classic more and the modern less. It turns out that less is more. More of those questionable Unix puns. This shows how to use both pagers on a rather big file that should exist on most systems:
$ more /etc/services
$ less /etc/services
You might notice that more only stops the output and waits for you to press Space or Enter every so often so you can read it, and once all of the lines have been shown, it ends. More useful is less which allows you to use the arrow keys to go up or down and the Space to jump forward a page. At the end less is still active in case you want to go back to the beginning (pressing g goes to the beginning and G goes to the end). To quit using less press q.
Now you know how to create files, rename them, copy them, and look at them. The last major command to know about in the life cycle of a file is how to get rid of it when you no longer need it. The remove command, rm, deletes a file:
$ rm CPUdata
You can try to erase /proc/cpuinfo but it won’t work. It’s not really there physically anyway. Generally you need to be very, very careful with the rm command. There is no "Trash" or "Recycle bin" when using normal Unix command line tools. If you tell the OS to erase the whole drive, you can expect to have that done in the fastest, most brutal way possible. It’s like a gun: a serious tool that can easily produce very serious accidents. Be super careful with rm. It’s good to use tab completion to make sure you’re really deleting what you think you are and not a similarly named typo file.
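One defensive habit worth knowing is the -i option, which makes rm ask for confirmation before each deletion (SomeFile here is just a hypothetical stand-in for whatever you are about to delete):

$ rm -i SomeFile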
Directories
Computer science, just like nature, likes trees. Real trees are tree-shaped because that organization is often very efficient. Computer scientists noticed this and organized data into trees. We could call the components of this organization data nodes and branching nodes (notice the arboreal metaphor) but we have a different metaphor for these things, files and directories respectively. Files are the collections of 1s and 0s that you want to preserve. Directories are also collections of 1s and 0s, in fact, they’re files too. However, directories contain the structural elements of how you want your other files organized. Don’t get too concerned with the fact that directories are really files. That’s a technicality. In practice, directories have their own set of tools that are pretty simple. Just think of directories as containers for files.
The first and perhaps most important command of all is ls. This command lists the files contained in a directory. Here is the simplest example.
$ ls
By itself, the ls command just returns a list of the file names in the current directory. The idea of "current directory" is something the shell uses to make sensible assumptions about what you might want if you’re not in the mood to be explicit. It’s not preserved between different instances of shells you might be running concurrently. To find out what your working directory is, you can print it with this command, short for "print working directory".
$ pwd
/home/xed
This shows that the current directory is /home/xed which for user xed is a typical "home" directory. Home directories can be abbreviated with ~. This means that these two commands do the same thing.
$ ls /home/xed
$ ls ~
If there is no output, it is likely that your account and home directory are still empty.
In previous examples, "arguments" are given with some commands to tell the commands what the target of the action is. If the commands are like the verbs and the arguments are like the nouns, then the adverbs are the "options". Options, sometimes called "command line options", tell the command how to do what it does. Here are some examples of the ls command using various options.
$ ls -l
$ ls -a
$ ls -al
$ ls -lrt
The first, -l, asks the ls command to produce a "long" listing of the files which includes other interesting properties of the file such as what kind of file it is, who owns it, when it was last modified, and how big it is. The second, -a, tells the command to show "all" files, even ones normally hidden for convenience. By default files that start with a period are not shown by the ls command unless there is a -a option. It’s good to get in the habit of using this option so you can avoid overlooking important files, usually configuration files, that start with a period. The third, -al, does both "all" and "long". You could have used -a -l; it’s the same thing. The final example tells ls to do a long listing, reversed, sorted by modification time (not alphabetically, which is the default). This is useful when trying to find a file you were recently working on.
Now that we can thoroughly inspect a directory let’s look at how to manage them. The first thing you might want to do is create a new directory. Here’s the command for that.
$ mkdir ~/unixlesson
This will create a directory called unixlesson in my home directory, ~. It can be checked with this.
$ ls ~/unixlesson/
Or this is the same thing.
$ ls /home/xed/unixlesson/
Don’t forget to use tab completion! This directory should be empty since it was just created. This is an example of using the ls command explicitly on an argument which is an absolute path. The path is simply the string of all a file’s parent directories. In this case there is the root directory which is the leading slash, /. Then comes home/ which contains all the user home directories including mine, xed/. Finally comes the directory we made inside of that, unixlesson/. When ls sees this is a directory it shows a list of the contents.
You can make the present working directory be this new one with one of the most important Unix commands, the "change directory" command, cd:
$ cd ~/unixlesson
$ pwd
/home/xed/unixlesson
Notice the forward slash separates path elements. Thus far I have shown absolute paths, but there is another kind, relative paths. This is a way to specify a path not with respect to the top root directory, but rather with respect to the directory that is your current working one now. There are two relative path elements that are important to learn. The first is . which means the current directory and the other is .. which means the parent directory. This allows constructions like this.
$ cd ~
$ cd ./unixlesson
$ cd ..
The first is an absolute path to the user’s home directory. Then the directory is changed from the current one, the ., to the one called unixlesson that is a subdirectory of it. The final command changes back out of the subdirectory into the parent which is back to the home directory. Complicated constructions are valid using these.
$ cat ../../etc/issue
CentOS release 6.7 (Final)
This means from the home directory go up a level, go up another level, then come back down to a subdirectory at that level called etc and show the file called issue.
A final thing to say about directories is that when you’re done with them you can remove them with rmdir.
$ cd ~
$ rmdir unixlesson
Note that rmdir only works if the directory is empty, i.e. does not contain any files or subdirectories. See the rm command if they’re not.
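If a directory does still contain things, rm with its recursive option, -r, will remove the directory and everything inside it. This is exactly the sort of dangerous rm usage warned about earlier, so double check before pressing Enter:

$ rm -r unixlesson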
Unixland
When beginners first start using Linux or any new operating system, it’s often quite disorienting how everything is organized and why things are called what they are. Here is a quick look at the general layout of a normal Linux file system. This list comes from a command like this.
$ ls /
- bin - Programs to run, traditionally often called binaries.
- boot - Where the kernel lives and other things needed to boot at start up. The kernel is the single program that manages all other programs and resources; as such it is your operating system strictly speaking.
- dev - A virtual directory where hardware and virtual devices can be accessed.
- etc - Mostly system configuration files, look up lists, miscellaneous program default configurations, etc.
- home - Where home directories of users are stored.
- lib, lib64, /usr/lib - Where low level software libraries live.
- lost+found - A place for the file system to do file housekeeping.
- media - In modern Linux systems, a place to access mounted drives like USB flash drives, CDs, and other media.
- mnt - In old Linux a mount point like /media.
- opt - Often contains optional software from external sources which are not especially well integrated into the OS distribution.
- proc - Another virtual directory tree which allows the user to use file and directory commands to interact directly with the kernel process.
- root - The home directory for the "superuser", i.e. the administrator. Don’t confuse the root user account with the root directory which is /. It is confusing!
- sbin - System binaries or programs generally not needed by ordinary users doing ordinary work.
- sys - Another virtual directory like /proc for interacting with the system’s kernel.
- tmp - A temporary directory. Normal users can write things here, but you should not count on them being permanent.
- usr - Contains software used by the users (as opposed to used autonomously by the system).
- var - Contains variable state information to support various programs. Web sites, databases, print queues, logs, and mail queues are stored here.
Expansion (Wildcards)
When working with files and directories it’s often tedious to explicitly type out all of a long path name and especially tedious to type out multiple files that have a common theme. The shells provide many tricks to make this as efficient as possible.
The most common of these tricks is the wildcard expansion using *. When you use a * in constructing a command, the shell takes that star and replaces it with a list of files from the path specified. If no path other than * is specified, it substitutes the names of all of the files in the current directory. Here’s an example:
$ mkdir SomeTestDir
$ cd SomeTestDir/
$ touch Alpha Bravo Charlie
$ echo *
Alpha Bravo Charlie
$ rm Alpha Bravo Charlie
$ cd ..
$ rmdir SomeTestDir
The echo command just outputs the argument list that is provided with the command. It’s basically a "print" command. When the shell sees the * it removes it and replaces it with a listing of the current directory, basically similar to what ls would return.
Warning: I could have used rm * to delete the temp files, but I’ll take this opportunity to point out that rm * can be a disastrous command if you are in a directory you did not intend to be. Under certain circumstances, a small command like rm * can wipe out an entire system. So be super careful!
The * can be used to match parts of paths too. The second command only lists files ending in .html:
$ ls
contact.html icon.ico index.html logo.png privacy.html
$ ls *.html
contact.html index.html privacy.html
A lesser known but useful expansion feature which operates on single characters is the ?. It is useful for things like this:
$ ls
bike1.jpg bike2.jpg bus1.jpg bus2.jpg car1.jpg car2.jpg
$ ls bus?.jpg
bus1.jpg bus2.jpg
Redirecting Streams
One of the most powerful features of Unix is not surprisingly a bit weird for beginners to fully grasp at first. This is the use of pipes and stream operators. There are subtle complexities, but the normal ways of using these tricks can just be easily memorized and put to immediate use. It works like this:
$ ls > my_file_list
This takes the output of ls and sends it to a file called my_file_list. If the file does not exist, it creates one (in the current directory if no path is specified).
Warning: If the file does exist, it is overwritten! Any data that previously existed in the file will be gone forever. Make sure this is what you want!
If you do not want the redirect to overwrite you can use this syntax:
$ cal 6 2011 > summer2011
$ cal 7 2011 >> summer2011
$ cal 8 2011 >> summer2011
This will produce a file called summer2011 containing a calendar of June, July, and August. The >> operator is for "append".
One very strange but common and useful thing to do is to redirect output into a black hole thereby preventing that output from doing something annoying or unstable. Unix systems traditionally have such a black hole in the form of a special virtual file called /dev/null. If you write to this "file", the system pretends like it wrote it, but in reality, the data is just thrown away. This tends to be used with noisy programs which print status messages to the screen and write useful data to files. It is also used heavily in automating repetitive lists of shell commands into what are called scripts.
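For example, with some_noisy_program standing in for any hypothetical chatty command, this runs it while throwing away everything it prints to the screen:

$ some_noisy_program > /dev/null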
Pipes
Not only can you take output and redirect it to a file, you can also send output of a program to be the input of another. This is called a pipe and it allows you to string together a sequence of simple commands into a very powerful complex one. Here’s a simple example.
$ ls --help | less
This takes the long help message of the ls command and pipes it to the less command which allows you to scroll through it at a comfortable reading pace (q to quit less).
You can string together many programs like so:
$ cat /proc/cpuinfo | grep name | wc -l
4
Here I’m dumping the contents of /proc/cpuinfo into the filter program grep which will output only the lines containing the word name. That output will go to a program called wc which is a "word counter"; it also counts lines if you give it the -l option. The result shows how many CPUs the system thinks it has to work with.
Sometimes these pipe chains can get quite long and complex. A major aspect of the Unix philosophy which seems to have held up well over the decades is to try and focus on small tools that do one very small specific task very well. Pipes are the glue that bind these high quality simple tools into high quality complex ones.
Being useful
Here is a quick list of some of the major commands that Unix users should be aware of.
- grep - Global Regular Expression Print; finds specific text or patterns hiding in files. Compare cat /proc/cpuinfo with grep name /proc/cpuinfo.
- sed, awk - Powerful stream editing languages, very useful with piped data.
- head, tail - Outputs a specified file but only the first 10 lines (for head) or the last 10 lines (for tail). The number of lines can be set with the -n option. So head -n1 /proc/meminfo shows how much memory the system has (a single line from the top of that virtual file).
- du, df - Check on disk usage or see how much disk free space is left (see the short example after this list).
- wc - Word count, also counts lines and bytes.
- chmod - Change the modification privileges for files.
- find - A very thorough and efficient file finding program.
- passwd - The command to use if you ever need to change your password.
- sleep, at - Take a little break until the next thing.
- tar - Stands for "tape archiver" but it’s still actually a useful generic way to bundle multiple files into one single one.
- zip, unzip, gzip, gunzip, bzip2, bunzip2 - File compression programs. For normal Unix applications, gzip is recommended. It is conventional to have files end in .gz if compressed with gzip. Also common is .tgz which is a gzipped tar file.
- w, who, lastlog - See who else is logged into this machine with you.
- clear - Clears the screen of text. Ctrl-L does the same thing usually.
- exit - Logs out of a shell. Ctrl-D also does the same thing usually.
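As a quick taste of du and df from the list above, these are typical invocations; -h is the same "human readable" option mentioned earlier and -s asks du for a single summary total:

$ du -sh ~
$ df -h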
Processes
When you execute a command the operating system (Linux) keeps track of it throughout the time it runs. It assigns every running command a "process ID". You can manage these manually to some extent when necessary. To start with, you can use this to see what’s running on your system.
$ top
This shows a continuous display of the top resource intensive programs running on your system. Press q to quit top.
A similar but more fussy and precise command is ps. It shows the processes spawned by the current user in the current shell. I tend to always use the following to show all processes by all users on the system.
If you find that a job is stuck and will not complete, or maybe it’s causing some kind of other problem, you can kill it by doing this.
$ kill 1337
In this example the process with an ID number of 1337 will be sent the kill signal. Most times that does kill it. Sometimes you have to get out the big guns to kill a problem job, the -9 option.
$ kill -9 1337
That definitely kills it dead or you have major problems.
There are other nice tricks you can do to manage how your processes run. Here are two useful ones. First is to see how long your job takes to run.
$ time gzip -c /boot/vmlinuz > repacked_kernel.gz
real 0m4.533s
user 0m0.137s
sys 0m0.007s
This shows that it took about 4.5 seconds to compress the file containing the Linux kernel. The other two times are technical measures of how much time was used by this process actually in the CPU.
A simple stopwatch can be implemented like this (press Enter to stop it and see the elapsed time).
$ time read
Another useful command modifier is watch which reruns a program over and over every couple of seconds (by default), allowing you to watch any changes unfold in real time. Here’s a crude clock.
$ watch date
This brings up a good question: what if a job gets stuck and you need to stop it? As mentioned you can use kill if you can access another terminal. But if the job is running in the currently focused terminal, you might be able to kill it by simply pressing Ctrl-C. This is how you get out of watch.
Command Line
Unix commands have a reputation for being inscrutable and bewildering. The goal of brevity often does not help but there is another goal that is helpful: eliminating ambiguity. By understanding the common conventions of Unix commands as they are typically used, you can gain enormous insight and power. Behind what looks to be a complex monstrosity is often an elegant clear concise way to tell your computer what to do.
Command Options
Here is a simple command. (Try running it like this!)
date
Here is a command with an option - an option changes how the command behaves, sort of an adverb.
date -u
This makes the date come out in UTC. Useful! Note you can also achieve the same effect with this.
date --utc
And in this (well-chosen?) example you can even do this.
date --universal
The "--universal" and "--utc" variants are simply convenient synonyms. That’s actually not too common to see such redundancy. Ignore that.
But the "-u" and "-utc" are quite different strategies to achieve the same thing and both are quite common. The first is a classic Unix option. They are one letter. It was not envisioned that a program should have 8000 "options" that you’d control on the command line. And if you did need that, you’d do something like this:
vim -c "set textwidth=70" file_to_edit
Classic Unix one letter options tend to be shorter (obviously) and more cryptic. Both of those properties intensify when you consider multiple options. Compare these two ways of saying the same thing.
date --utc --rfc-email
date -u -R
It turns out that because the Unix one letter options are guaranteed to be, by definition, one letter, you can get away with either of these.
date -uR
date -Ru
This is why I have an alias called "rap" which I use a lot.
alias rap='rsync -aP '
Which is short for the long version which is this.
rsync --archive --partial --progress
These full double dash options are often called GNU long options because GNU popularized them in ancient Unix times. The benefit of long options is that they are more comprehensible. This is valid and valuable when scripting. Although I might type "-u" on the command line, if I put "--universal" in scripts, when troubleshooting my script years hence I won’t have to look up what the hell "-u" is. It’s also helpful when you need a "--rate" option and also "--recursive", "--relative", and "--read-only" options.
So that’s "options". The next thing to understand is the concept of "arguments". A command argument is simply something you want to program to operate on. Grammatically, if the program is a "verb", the arguments are the "direct objects".
Consider this.
date +%Y
The + looks like some kind of option but here it is actually an argument. It is the text string describing what kind of date format you’d like (year only in this case - "2020"). I use this somewhat confusing example because some programs do use "+" for a type of option. This is not exactly standard or common, but it does show up occasionally. For example, using the unruly maverick software ImageMagick, you can negate an option like this.
$ identify +verbose pic.jpg | wc -l # One line result.
1
$ identify -verbose pic.jpg | wc -l # 142 lines result.
142
Did you notice something else non-standard about that ImageMagick identify command? They didn’t use a double dash for a long option. In a normal Unix program it would be interpreted like this.
identify -v -e -r -b -o -s -e
But ImageMagick is not playing along. Pretty much everything else does! Remember, how the program handles options is up to the program. Although there are exceptions like this, there are important conventions.
So that’s complex options and arguments. What about option arguments? They are arguments not to the program itself, but rather to the option. Consider this example.
rsync /home/${USER}/ /mnt/backupvolume/backups2020/
Here rsync gets two ordinary arguments: the source and the destination. The destination is the second one, "/mnt/backupvolume/backups2020/" (since it is the last argument). But what about this?
rsync -e "ssh -p123" /home/${USER}/ /mnt/backupvolume/backups2020/
What is "ssh -p123"? That is an option argument; it is the help that the -e option needs to do the right thing. Obviously you can’t do something like this.
rsync -raeP "ssh -p127" src dest
Because "-P" does not take an option argument. But this unfortunate set of options would work.
rsync -raPe "ssh -p127" src dest
Here’s the actual command I tend to type in real life.
rsync -raP --rsh="ssh -p127" src dest
With a long option argument the equals sign is usually necessary (true in rsync). This style minimizes confusion. But if you type this so often that it hurts, go with the short option and reduce the command to the minimum.
That explanation goes a long way to unravelling the weird things you see on the command line. When programs break this tradition, they are doing it wrong. Programs like ffmpeg are so complex and baroque that they do break with tradition and their command lines are very difficult to use. ImageMagick does it wrong.
Ironically, both of those pieces of software are very hardcore and
the idiosyncratic architecture must be begrudgingly respected because
they do indeed have Reasons. Normal little people like us do not and
we should adhere to the conventions when creating software.
Understanding this goes a long way towards unravelling the whole mystery. After reading this, just check out some man pages to see how useful this is. Check specifically man date and man rsync. In the rsync man page - which is a lengthy beast - practice finding the thing you really want to know about by doing something like "/" (forward search) and then, for example, "--rsh".
Next Level of Unix Mastery
There is an entire universe of things to explore in Unix. After getting comfortable with the beginner topics, here are some other things you might want to explore:
- Vim - Become omnipotent in all matters of text processing. Check out this cool game designed to help you learn about Vim.
- regular expressions - The standard Unix method for specifying patterns for matching.
- job control - Putting things in the background effectively.
- permissions - Understand and work with file permissions.
- screen - Manage multiple shells over one remote connection.
- cvs / Mercurial / Git - Manage your software development sensibly.
- xargs, parallel - Working with large and complex file lists.