Most of my notes are for my reference, but this one is for you! Several people have asked me how to take the first (or next) steps on the path to Unix command line mastery. Here is a collection of commands that I feel are essential for a proper computer professional to know about. If you’re a regular person who would like to leverage the built-in power of your Unix operating system, it may be good to have a look at this list and see if there is anything that you might find useful.
I tried to check every listed command on FreeBSD and OSX. The entire list works on those platforms unless otherwise noted. Be advised, however, that outside of Linux, the exact syntax of the commands and their options may differ slightly (you know what to do!).
Beginners may safely skip the subsections labeled "ℱancy" which cover (even) more esoteric aspects of the topic.
Also you may find my old Unix Beginner notes helpful. Those notes cover more of the major concepts while this list looks specifically at the collection of commands that I personally use and feel that others will generally find useful.
All of my help notes live at http://xed.ch/help or use xed.ch/h which is the short version.
alias at awk bash bzip2 cal cat cd chgrp chmod chown clear convert cp crontab curl cut date dd df diff dmesg dmidecode df du echo exit export false feh file find fmt free grep gzip head help host htop identify ifconfig iftop import ip iptraf kill less ln locate lsblk ls lspci lsusb man md5sum mkdir more mount mv netstat nohup nslookup parallel passwd ping ps pwd rev rmdir rm rsync screen sed sensors set shutdown sleep smartctl sort split ssh ss stat sudo su tac tail tar time tmux top touch tr true truncate type umount uname uniq unzip vim wc wget whereis which xargs xxd uptime umask wall yes
Note that you can directly refer back to a specific command in this document with a URL in your browser formatted like this.
http://xed.ch/h/unix#cat
Or, once you get the hang of this stuff, you can read sections directly from a Unix command line like this.
$ CMD=cat && wget -qO- xed.ch/h/unix.txt | sed -n "/== ${CMD}/,/==/p"
Here I’m using wget and sed to show me the cat command’s section. That’s pretty handy. On second thought, I’ll be using these notes myself too!
+--------------+
| Start with |
| Unix problem |
+--------------+
|
Obviously you don't but
did you think you knew ---(no)---.
what you were doing? |
| |
(yes) |
| V
+----------------------+ +------------------------------+
| man relevant_command | | Think of way to |
| Search for options |<--| express the problem |
| involved. | | using many or none of the |
+----------------------+ | ambiguous keywords involved: |
| | mount, cat, cut, tail, kill |
Fix problem?----(no)---->| Search for that on the web. |
| +------------------------------+
(yes)
|
+-----------------------+
| You win 0 exit codes! |
+-----------------------+
man / whatis
This is the most important command because if you can remember it and
the existence of the other commands you need, you can instantly get a
thorough reminder of exactly what you need to know. man
is short for
"manual" and in the old days it was envisioned that the system would
help you typeset the documentation and many people probably did that.
But Unix people were some of the first to realize that actually
getting it onto paper didn’t really improve, well, anything. Using an
electronic version, you can keep it very up-to-date, do some brutally
fast searching (use the slash key "/"), and zero in on what you need
to know very quickly.
It is simple to use in normal cases. To see the documentation for the
cal program just do this.
man cal
When you’re done with man just hit "q" to quit.
The reason these notes are mostly for you and not me is that I use man
all
the time. The advantage these notes have for beginners is that I am
highlighting my favorite aspects of the various Unix tools while man
pages are usually scrupulously thorough about every mote of
functionality. It doesn’t help that man pages are written in an
idiosyncratic style that is extremely compact and efficient. At first
glance it looks about as useful and inviting as reading legalese in a
license agreement. This can be overwhelming for beginners. But trust
me, man pages are a very powerful resource. It’s good to bravely face
their style quirks until you can refer to them without fear. Remember
that the alternative is "dummy" "help" which says inane stuff like,
"The paste icon pastes." Grr. Better to know the answer is there but
your abilities are not than the other way around.
ℱancy man
If you’re a C programmer or doing some other serious business, you may
need to specify the section number of the manual. For example, man
stat
gets you the man page of the Unix stat
command. However if
you’re looking for the C programming system call you need this.
man 2 stat
Normal people never need to worry about this (by definition). If you
do, man man
has a nice list of what the sections are.
The whatis
command searches man pages for some word that you
specify as an argument. For example, doing whatis printf
shows that
there are two man pages for this command, a command (man section 1)
and a C function (man section 3).
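Here is roughly what that looks like; the exact descriptions will vary a bit by distribution.
$ whatis printf
printf (1)           - format and print data
printf (3)           - formatted output conversion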
exit
Nothing is more frustrating than being stuck in some program you want
to stop using. This is also true of a command line terminal. The
exit
command is really a feature of the shell. Beginners don’t need
to worry about what that means — just know that it will exit your
shell’s session, probably closing your terminal too.
ℱancy exit
I personally like to just type "Ctrl-D" at the beginning of a new
line. This exits the shell immediately. Note that as with stat, the
man page for exit can be misleading. There is a C function called
exit() which is what the man page talks about.
As you begin to truly understand all your options, you may find that
ordinary GUI operations become more and more tedious. Here’s a very
powerful trick involving the exit
command. This is how I start my
big clumsy web browser.
$ grep BROW ~/.bashrc
export BROWSER=/home/xed/bin/firefox/firefox
alias ff='bash -c "${BROWSER} --new-instance --P default &";exit'
The "normal" way to start a browser from the command line is pretty
good. I can just type "firefox" (and with command completion, that is
pretty easy — f - i - r - tab). But the problem with that is that the
terminal then stays open the whole time waiting for you to finish
using the browser. With the alias shown above, I type ff
and hit
enter — the browser starts and the terminal I launched it from
disappears, thanks to the exit
command.
What is so valuable about this approach? Well, if you use command line terminals a lot, as you can imagine I do, it’s likely that you have a very fast way to launch a terminal. I think that many systems come preconfigured with Ctrl+Alt+T as a way to launch one. I always define Ctrl+Shift+N to instantly launch a (new) terminal. This means that when I want to run Firefox, I press Ctrl+Shift+N, type "ff", and then enter. If you think there is a faster mouse-based way to reliably do that (and anything else that needs doing), I will happily and strongly disagree with you.
clear
Got a bunch of busy junk on the terminal screen and you want to clear
it? Just use the clear
command. Another way to achieve the same
effect is to just press Ctrl-L. This works in many text interactive
situations including the shell. The advantage is that you can clear
previous lines while working on a new command line. Why use
clear
as a command? It is helpful if you want to put it into a
script to make sure the screen is not cluttered before starting some
output. I find it very helpful in scripts that print out a small
report that you want to watch change.
This will keep the screen filled with a report of sensor readings updated every 10 minutes in a way that is not cluttered.
while sensors; do sleep 600 && clear; done
help
When dealing with shell built-in functions, for example exit
or
read
, you can use the help
command to get more information about
them. This is way easier than reading the 46k word man page for bash
which also incidentally contains that information.
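For example, to read about the exit and read built-ins mentioned here, ask the shell itself.
help exit
help read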
ℱancy help
You can just specify a part of the word you want help with. If it is unambiguous it will work.
help he # Help on "help"
help de # Help on "declare"
shutdown
Sometimes you have a shell and you don’t have a GUI thing and you want
to turn the system off in a polite way. The answer is to use the
shutdown
command. I do it like this.
sudo shutdown -h now
Or if you want it to shut down in 5 minutes, do it like this.
sudo shutdown -h +5
The -h
means "halt" as opposed to -r
which means "reboot" (it will
turn off, but then come back to life). You can also just use the
commands halt
or reboot
. I tend not to do that because they are
actually aliases for the systemctl
command, not that there’s
anything wrong with that. There are many ways to do the job. I think
it’s traditional for init 0
to also shut a system down, but I’d save
that until you know what you’re doing and why you’d choose that
option.
passwd
I have set up Unix accounts for users who only surf the web. The one
command line program they did need to use, as I stood there coaching
them, is the passwd
program so that they could change their
password. Simply type passwd
and it will ask you for your existing
password. (This prevents jokers from messing with you by changing your
password while you’re away from your desk.)
Here’s a thing that I find weird: normal people find it weird that when they type their password nothing happens. People have been so conditioned by GUI form boxes that if there are not dots showing up, they easily get confused. I’ve been amazed by this at least a dozen times and that’s dealing with PhD level researchers. So… when you type your private passwords that no one else is supposed to see they will not show up on the screen. Got it? Ok.
The next face palm opportunity for the person administering your account is what constitutes a smart password. We’re all familiar with password strength meters on web sites, etc. Linux also does some quick checking to see if your password is idiotic. I don’t know all the rules, but here’s a rough idea based on my experience.
- Don’t use your own username in a password.
- Don’t use a word sitting in the system’s spelling dictionary.
- Don’t make it very short (less than 7 characters).
- Don’t repeat characters or patterns too much.
- Don’t use keyboard patterns (QWERTY, etc.).
I’ve had people sit down to type their password into a system that I and their colleagues are all counting on them to keep secure with a decent password and they get rejection after rejection. Don’t let that be you! I’m talking to you ann123!
Note that if you want a terrible password, Linux is cool with that,
but you must run it as root. So maybe like this for user lax
.
sudo passwd lax
Note that it won’t ask you for your current password as root because it assumes that if you’re root, it can’t stop you from doing what you want anyway.
true / false
Some of the strangest commands in Unix don’t do anything at all. The
true
command just immediately exits and emits an exit code saying
that the run went fine. The false
command does almost the same
thing, but its exit code seems to complain that something wasn’t
right (but that’s what you asked for!). This is useful for scripting
more often than you’d expect. It’s like Python’s pass
command which
also does nothing — it allows you to fill blanks that need filling.
By the way, if you’re a normal person you probably don’t need to know about exit codes at all and you can use all of these nothing commands interchangeably.
One interesting use of these commands is to create a new blank file.
true > mynewblankfile
This will create an empty file called mynewblankfile
. This is
because true produces no output and directing that to a file (with
>
) is filling that file with the nothing. Use some caution because
if the file already existed and had something important in it, it will
be gone. In other words this technique is especially good at blanking
an existing file.
A similar thing is the shell builtin :
. It is basically a shell
level version of true
. You can even make empty files (or make files
empty) with it the same way as with true
. And if that’s too much,
you actually don’t need any command really. Just >mynewblankfile
and
nothing else will do the same.
I like to use :
to take random notes. Need to jot down a phone
number and you just have a Unix terminal? Type : 6191234567
and
nothing bad will happen. Don’t get too poetic doing that however,
since the shell still tries to make sense of what you typed. If you’ve
used complex syntax, you can invoke an error. A safer similar way to
do the same thing is to use the shell comment character like this.
$ # Most things are 'ok' to type here with no side effects.
Or if you like hard to remember Bash tricks, you can type something and then after the fact press "escape" then "#" for the same effect.
ℱancy :
Sometimes you want some command that normally produces output to not produce output. The traditional way people do this is by redirecting the output to the "null" device which gobbles it up and that’s that. It looks like this in practice.
$ ls > /dev/null
In that command, no output is produced. There is another technique
which I find easier to type — pipe your output to true
or :
.
$ ls|:
The colon command accepts all the output of the ls
command,
does nothing with it, and both processes end quietly. This works with
true
and false
too.
time
The time
command does not tell the time for you. If you want that,
see the date
command. The time
command is a verb and it times
durations of programs for you. This is very useful when quantifying
performance issues.
Let’s see how long it takes to look up the IP addresses of two domains.
$ time host -t A xed.ch
xed.ch has address 66.39.97.213
real 0m0.017s
user 0m0.012s
sys 0m0.000s
$ time host -t A google.com
google.com has address 172.217.3.110
real 0m0.028s
user 0m0.016s
sys 0m0.000s
Why is Google slower? Who knows. But is Google slower? Today the answer is yes.
One neat trick I really like is to use the time
like a stopwatch.
Type this on the command line and press enter when you’re ready
to start.
$ time read
It will then sit there waiting for you to create some input (as
described in help read
). Simply type enter as your only input when
you’re done timing. You’ll get a nice accurate report on the duration
of the interval between you pressing enter the first time and the
second.
ℱancy time
The "real", "user", and "sys" values get into some hairy internal details. Basically, the "real" is the wall clock time or how long you needed to wait from start to finish. This can be affected by what else is going on with your system and how long that process had to wait in any queues. The sum of the "sys" and "user" times reflects how much CPU time was actually used by this process (in kernel system mode and user mode). Here is all you ever wanted to know about that.
echo
The echo
command is another shell built-in. This means it’s part of
your shell rather than a stand-alone executable. Fortunately its
normal use is very easy. It is like a "print" statement in other
languages and it simply takes the things you specify — called the
arguments — and it sends them back out to you. It echoes them.
$ echo "Repeat this back to me."
Repeat this back to me.
Since the argument gets expanded you can do helpful things like this too.
$ echo /tmp/*jpg
/tmp/forest.jpg /tmp/flower.jpg
Or you can check the value of variables.
$ echo "User $USER uses this editor: $EDITOR"
User xed uses this editor: /usr/bin/vim
ℱancy echo
Remember just a few sentences ago when I said that it was not a stand-alone executable? Well, that’s not exactly true. On many Linux systems, not only will echo be a (very important) shell built-in command, but there will also be a stand-alone C program that does the same thing. Why? Good question. Probably some subtle performance benefits. For example, here’s Bash doing a million echo commands.
time seq 1000000 | while read N; do echo $N; done
...
real 0m16.858s
And here’s the C stand-alone.
time seq 1000000 | xargs /usr/bin/echo
...
real 0m0.959s
This shows that the shell is powerful but it’s not optimized for
performance. In real scripts, if you’re dumping something on the screen
it’s probably fine to always use the shell built-in. For more details
about these very similar commands see Bash’s help echo
or the
stand-alone’s man echo
.
One little trick you can do with both versions of the command is the
-n
option which suppresses the new line. This is handy for things
where you want to consolidate the output a bit. Here is a nice demo of
dots being printed, perhaps to indicate some subprocess is being
completed.
$ for N in {1..50}; do echo -n . ; sleep .2; done ; echo
..................................................
cal
I include cal
near the top of my list of commands to learn
because it is so innocuous, obvious, and useful. I pretty much always
use it to generate sample material to use in examples when I’m giving
demonstrations. However, it is genuinely useful!
I use the -3
option a lot because I’m often curious about what day
of the week a date next month lies on. So cal
is easy to use even if
calendars are inherently complicated. For example here is how to get a
calendar of September 1752.
$ cal 9 1752
   September 1752
Su Mo Tu We Th Fr Sa
       1  2 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30
The reason it looks broken is because the cal
command is not.
Enjoy
learning about that!
ℱancy cal
It seems that modern versions of cal have a bug. If you do a
which cal, you’ll find that it’s really a link to ncal. When run as
cal, it makes the normal calendar but it also tries to highlight the
current day, which can be annoying when directing the output into Vim
or doing a cut and paste style operation. In theory the escape codes
for the highlighting are supposed to be automatically suppressed when
piped, but I have found that is not always effective.
Anyway, the fix is easy enough. Use this.
ncal -hb
The interesting bug is that the -h suppresses the highlighting for the
normal ncal style of calendar. But when you just try cal -h, you get
the help page instead, which is an undocumented inconsistency. So, ya,
that’s not right. Still easy to work around. I’m sure they’ll sort
this out soon. Normal classic cal from ancient times is still a simple
cal and 99.9% of the time it does what you want anyway.
date
The date
command can be easy. Type it and you get the current date
and the current time. Easy and handy! Besides being good for a quick
time check, it can also be good to append its output to log files.
ℱancy date
The date
command seems easy, but there is immense depth in this
command. First of all, some very OCD people made sure it is as
accurate as can be. You can have it produce dates that are a certain
time from now (or any other time you specify) in the past or future.
And the formatting of the resulting date is extremely flexible. Whatever you need labeled with a date, this command can make it correct.
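Here is a small taste of what I mean, using GNU date on Linux (the BSD and Mac date commands use different options for this kind of date arithmetic).
date +%Y-%m-%d_%H%M%S      # a timestamp that is safe to embed in file names
date -d "next thursday"    # a date in the future
date -d "2 weeks ago" +%F  # a date in the past, formatted as YYYY-MM-DD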
less / more
The less
command is a true workhorse. It is a "pager" which is a
program that shows you text one page at a time. If, for example, you
have a list of all the hockey games Tim Horton played in, that will
most likely not fit on one screen. By using the less
command you can
scroll through the list as you need to.
You can use something like this.
$ less timhortons
And it will show you the first screenful of games from your file. Hit space and it will go on to the second page. Repeat until happy.
One thing helpful to know about less
that will help it stick in your
memory is that it is actually the second generation of pager. The
classic ancient pager was called by the sensible name more
as in
"show me more every time I hit the space bar". The deficiency of
more
(which is still included for old school people; try it out) is
that if you pass up your region of interest, too bad. And with more
,
at the end of the document, it exits.
With less
however, it just stops at the bottom of the document and
waits for you to further navigate arbitrarily. To go to the top of the
file, press "g" to go to the end "G". I think page up and page down
work but I use space and "b" for "back". I also heavily use "j" and
"k" (vi keys) to move up and down one line at a time.
Importantly, to quit using less, press "q". (Just like when using
man which usually is configured to actually use less
.)
ℱancy less
The less
command is so powerful that it often does more than you’d
like. If you send it an HTML file, some configurations of less
can
filter the HTML into readable text. If this is not what you
want you can pipe the file to less -
where the dash indicates you
want to page the standard input. Without a hint of what kind of file
it is, it won’t do anything overly ambitious with it.
I like to dress up console output with control characters to make
things have nice colors. To preserve this in less use the -r
(raw)
option. Easy!
top / htop
A very common useful task for a Unix command shell is to find out what
the hell is going wrong on the system. If the performance has come
grinding to a crawl, you can find out what is responsible. The top
command is universally available on all good operating systems and
shows an ongoing display of the "top" processes as measured by CPU
resource use. The ones near the eponymous top gobbling up 99% of your
CPU will likely be the ones you want to attend to.
The top
command is fine, but just as more was improved upon by less,
there is an improved version called htop. That is a nicer version in
many ways and the functionality is more powerful too. If you have it,
use it.
For people who really like to know what’s going on, consider
installing the dstat
package. I like to SSH into a machine I’m
playing a game on (from another computer) and then run dstat
. This
shows me in real time while I play/experiment how much network
traffic, disk activity, memory, and CPU usage is being consumed. This
is a great trick to get at the source of problems.
ℱancy top
Sometimes you don’t need an interactive display. Maybe you’re creating a log of activity and you want to record the top processes periodically. Maybe you just want to check CPU business before a job runs to make sure no one else is doing anything. Unfortunately top’s non-interactive mode (called the "batch" mode) is kind of sucky. But, hey, this is unix, we can deal with it.
This shows top’s very informative header and the top 3 processes.
$ top -bn1 | head -n10
If you really just need the top process and that’s it, you can use something like this.
$ top -bn1 | sed -n 7,8p # With header labeling everything.
$ top -bn1 | sed -n 8p # Just the top process' line.
Or if you need only the top CPU percent field.
$ top -bn1 -o %CPU | sed -n '8s@ *@\t@gp' | cut -f10
This is good for scripts and can be especially handy to see if something is running at 99% or 100%, i.e. very busy. You can become even better informed by having just the mean and standard deviation of the top 4 processes displayed.
$ top -bn1 -o %CPU | sed -n '8,11s@ *@\t@gp' | cut -f10 | \
awk '{X+=$1;Y+=$1^2}END{print X/NR, sqrt(Y/NR-(X/NR)^2)}'
This is useful on a 4 core machine to see if all of them are being used. As you can see, you have complete control of the infinite possibilities.
ps
The ps
command shows a list of your processes. It is a venerable
Unix command that professionals need to know about. But normal people
can stick with htop
pretty much always. I’ve never ever found the
ps
command useful without at least some "options". Mostly I use ps
-ef
but there are other arcane styles too.
It’s mostly handy for processes you do not need or want which are
running but quietly (escaping the notice of top
by being near the
"bottom"). For example check this out
$ ps -ef | grep Mode[m]
root 500 1 0 14:39 ? 00:00:00 /usr/sbin/ModemManager
I’m using the ps
command to find out if I have something called
"ModemManager" running. Yup. I do. Wow, even though it doesn’t take
much for resources, I am pretty sure I don’t have a SIM modem on board
and I’m even more sure I’m not going to get teleported to 1995.
pstree
Another way to view the processes. This highlights the relationships between parent and child processes by diagramming them as a tree. See also column — which can make this kind of tree diagram out of tables — for another way to achieve this effect.
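If you have it installed, it is as simple as this; the -p option adds the numeric process IDs, which is handy if you plan to follow up with kill.
pstree -p $USER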
kill
So you’ve found a process that needs to die. In the previous section, I
found ModemManager
running and found its process ID is 500. This
should kill it.
$ kill 500
ℱancy kill
Some processes ignore the default kill signal (SIGTERM, i.e. terminate). That’s super rude in my opinion and the next level of retribution is this.
$ kill -9 500
$ kill -SIGKILL 500
These are the same thing but one may be more explanatory if it’s in a
script. The -9
is the traditional common way to specify SIGKILL and
is a bit more emphatic than plain kill
.
If it’s not your process, you might need a sudo
. (And, of course,
next time you reboot, it will probably come back. But that’s
another topic.)
pwd
This simply Prints the Working Directory. I actually remember it as "present working directory" for some reason. I don’t use this command much because I have this reported in my shell prompt so I’m always seeing it on every command. If you are in a situation where that’s not the case, it can be useful.
ℱancy pwd
The shell usually has an environment variable that contains this
directory. You can use or check it in scripts, but here’s how to just
emulate the pwd
command.
$ echo $PWD
cd
This command Changes the current Directory to whatever you specify. For beginners the reasonable question is, what does it mean to be "in" a directory? Basically the shell just keeps track of some directory that we can pretend you’re "in" as if each node on the directory tree were a room. The end use of this is that you can specify actions with ambiguous locations and the system can assume you must be talking about the directory you are "in". It’s a decent system, but as you become more proficient, it’s good to start appreciating that it’s all kind of fake really.
ℱancy cd
Imagine you’re working on some project. Say in this directory.
[~]$ cd X/myspecialproject/
Then someone distracts you and wants you to acquire a bunch of files
to look at locally on your machine. You don’t want to muck up the
pristine beauty of myspecialproject
and you don’t want to keep these
files once you’ve taken a look at them; /tmp
is a good choice.
If you are only going to jump over to one distracting directory and
come right back, you can use the special cd
argument -
. That looks
like this.
[~/X/myspecialproject]$ cd /tmp
[/tmp]$ cd -
/home/xed/X/myspecialproject
[~/X/myspecialproject]$ echo "Returned to the previous directory."
However, if you are going to potentially do many things in many directories, simply preserving your last used directory is not sufficient. A Bash feature I often forget about but which can be very handy is the directory stack. This allows you to save directory paths on a stack. This sounds kind of overly technical but follow along with the typical use case and you can see how easy and useful it is — if you can remember to use it!
Instead of using the normal cd /tmp
to change to the /tmp
directory, do this.
[~/X/myspecialproject]$ pushd /tmp
/tmp ~/X/myspecialproject
[/tmp]$ mkdir -p a/b/c && cd a/b/c && touch the_distraction
Now you can do the distracting thing with those messy files in the
/tmp
directory (or any directory or sequence of directories) and
when you are done and ready to resume where you left off, you simply
do this.
[/tmp]$ popd
~/X/myspecialproject
[~/X/myspecialproject]$ echo "Back to myspecialproject directory"
In summary, if you want to cd
with the option of returning back to
your current directory, use -
to return from very simple diversions
and for complex diversions, bookmark your current directory by using
pushd
when changing to the distraction directory.
mkdir / rmdir
Like living things, computer science loves its trees! My understanding
of filesystem design is that this tree aspect of how your files are
organized is for your benefit only. Deeper down, they’re not really in
the trees you made up. But again, it’s an ok system and allows you to
pretty well keep things kind of organized. (To give you an idea of how
else it could be done, I believe that file organization interfaces
should be in Cartesian 3d space which our minds are naturally
optimized for, not tree topologies which computer science nerds can
eventually force us to get used to.) Anyway, if you need another
branch in your organizational scheme, create it with mkdir
. This
will make a directory.
$ mkdir cats
$ mkdir cats/tigers
Actually, it will specifically make a subdirectory since the top level (the trunk of the tree) should already be established when the system starts.
ℱancy mkdir
If one of the intermediate directories is not present, you can use the
-p
(for "parent") option and have it create them automatically.
$ mkdir -p mammals/placental/feline/tigers/bengal/cincinnati
The rm
in rmdir
is for "remove" and it does what you might expect
and removes a directory. The catch is that the directory must be
empty. This prevents serious catastrophes where you lose a lot of
important directory contents. If you really want to delete a directory
and contents there are ways to do that, but rmdir
is not one of
them. (See the very dangerous rm -r
for that.)
ls
You’re trying to remove a directory and it won’t let you because it is
not empty, how do you see what is there? Perhaps the most fundamental
and common Unix command is ls
which shows you the files a directory
contains. It can also list a particular file. That may seem pointless
at first glance — why would you do something like this?
$ ls myfile
myfile
Nothing you didn’t already know, right? But you can use "globbing" patterns which is a bit more interesting.
$ ls myfile*
myfile1 myfile2 myfile3
ℱancy ls
The ls
command is so critical to my day to day existence that I
personally must have an alias for it that optimizes it for my needs.
The alias I use is this.
alias v='/bin/ls -laFk --human-readable --color=auto'
Let’s break that down and look at each option.
- -l : Produces a "long" listing. This also includes owner, group, permissions, and timestamp. I always want to see this.
- -a : Shows all the files. Why would it not? Well, I guess it’s considered a "feature" but files that begin with a period are often hidden in Unix operations. So something like /home/xed/.bashrc (where I define this alias) will not show up when I run ls /home/xed/ — this drives me crazy! So the -a cures that and stops hiding the truth from me. That to me is the most Windows-like "feature" of Unix — and therefore obviously super annoying!
- -F : This is also called --classify and it will put a little symbol on the end of the file to indicate what kind of file it is (directory? device? named pipe?). I like it and it reduces confusion for me, but for others, maybe it wouldn’t.
- -k : The default of ls is to show sizes in "blocks" which to me is damn near useless. This makes it show kilobytes which is more in line with what I can understand.
- --human-readable : Not just kB but I also want it to convert to GB if that makes sense. I’d leave this off only if I’m using ls output to do some kind of calculation. And I don’t do that often.
- --color=auto : Colorizes the output if possible. A nice feature.
Remember, if you have a fancy alias like this, you can always just
type ls
to get default behavior.
(Why the "v"? Well, in ancient days,
Slackware used to come with
that defined and I got used to it and now can’t live without it. For
modern humans, I’d recommend using ll
as an alias since that’s very
commonly defined for modern distributions. Type alias ll
to see if
it is defined already on your system and to what if so.)
touch
I mentioned that the true
command can be used to create new and empty
files. The first thing most people who need this think to do is use
the touch
command. It will definitely do that job and that is that.
touch mynewfile
ℱancy touch
But to really appreciate touch
you need to understand why it is
named the way it is. In complicated build systems, it is common for
some component B to depend on component A. If A is modified, B must be
rebuilt. If neither A nor B has changed then you can mostly get away
with leaving them alone. But sometimes, for strange but strangely
common reasons, you want to pretend like A did get modified so that B
definitely does get rebuilt. The way systems like this figure it is by
looking at the timestamps. If B is newer than A, fine. But if A
becomes newer than B, then B needs the rebuild to catch up. The idea
is that we "touch" A but don’t really do anything to it. This just
means that its timestamp is now, which is presumably later than B’s
and B will be lined up for a rebuild.
That’s the historical proper use of that command, but it comes in
handy for any kind of hanky panky with timestamps. I’ve used it, for
example, when I’ve messed up my timestamps on photos I copied (i.e.
timestamps changed from when the photo was taken to the inappropriate
"now" of when I’m looking through them). If you want to tamper with
evidence in the filesystem’s metadata, the touch
command is
critical!
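For example, GNU touch lets you set a timestamp explicitly with -d or copy it from a reference file with -r. The file names here are just placeholders.
touch -d "2017-06-01 12:00" vacation_photo.jpg   # set an explicit timestamp
touch -r original_photo.jpg copied_photo.jpg     # copy the timestamp from another file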
truncate
The touch command is very handy but where I find it lacking
is when I need a test file that is not empty. Sure there are other
ways to do this but if you want to create a new file that is filled
with zeros, you can use the truncate
command with the -s
size
option. If the file does not exist, a new file will be created with
the specified size. It will be filled with zeros.
You can specify the size with units to make things easier.
truncate -s 10K 10240_zeros.bin
truncate -s 10MB 1000000_zeros.bin
ℱancy truncate
Note that if the file does exist then the truncate
command does what
its name implies and it truncates it to the specified size (again with
-s
). This is not usually a helpful operation except when doing
exotic stuff like preparing raw disk images.
chown / chgrp
If you’ve traditionally used bad operating systems, you may not have any experience with the concept of file "ownership". In Unix, files have "owners". They also have "groups". This allows the owner to limit who can do things (see chmod) with the files. In some cases the group property can allow multiple people to do things with the file but exclude non-members.
It is very common in modern Linux systems for every user to have their
own group. So if I am user xed
, it is common for there to exist a
group xed
. This is simple and obscures the point of groups. In fancy
systems that may be different. Here I’m showing the id
command for
my user account on a high-quality commercial web
host.
:-> [www666.pair.com][~]$ id
uid=71849(xed) gid=1000(users) groups=1000(users)
I am user 71849 but I am in the group 1000, "users". (And only that
group.) This means that I have proper control over files whose
ownership property is set to 71849. If I want to change the ownership
I could try to chown
command. It probably won’t work because if I
don’t already own the files, they’re probably off-limits. And if I do,
the target owner property that is not me is probably also off-limits.
In practice, this command is most always used with sudo
. A good use
case is when I copy files from a system like the one shown and those
files are owned by xed(71849). But on my systems, I like to be
xed(11111). Since 71849 won’t even be in my /etc/passwd
list of
valid users on my system this command might be useful.
sudo chown xed:xed file_from_webhost
Note the syntax of xed:xed
changes the user to the xed
user and
the group to the xed
group (from pair’s group 1000 which is wrong).
You can leave off the :xed
if the group is already fine. Now instead
of being owned by a nonexistent user, it is owned by my real local
account and I can now access it properly.
As you can see, I can change the group with the chown
command, but if
for some very strange reason you need to change only the group, you
can use chgrp
. In practice, I have almost never used this thanks to
chown
being usually sufficient.
ℱancy chown
The mischief ownership problems can cause! Most users can just stick with simple organizational strategies but if you need to dig deeper, check out my Unix permissions notes which take a deep look into some of the details of the topic.
Speaking of mischief, all of these "ch" commands take an option -R
which means "recursive". If you specify a directory in the argument
list, it will open that directory and change all of the contents.
Recursively. This is very useful for making major changes to big
archives but obviously it’s very easy to cause subtle problems with
your file metadata. Subtle problems and complete hosings!
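For example, something like this (with a made up path) claims an entire restored archive for my local user and group, recursively.
sudo chown -R xed:xed /home/xed/restored_archive/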
chmod
Focusing even more on permissions is the chmod
command.
People typically pronounce the "mod" part as they do in the word
"modify". I’ve heard "chuh-mod", "change-mod", and I personally say
"see-aych-mod". However the command really is there to change the
file mode bits. Most people think that this is about permissions
(true) and only permissions (not true).
To be extra confusing for beginners there are two totally different syntaxes that can be used to specify a file’s "mode". The way I like to do things is the "advanced" way but I think of it as the simple way because I’m used to it now. It might be better said that there is a computer friendly way and a human friendly way. I like the computer friendly way.
For normal people doing normal things there is a pretty limited palette of what you’d ever need to do with this. Let’s look at some of these. The commands are followed by explanatory comments after the # (i.e. not part of the required command).
chmod 644 myfile # Normal file read/write by owner, read by all
chmod 600 myfile # Normal file read/write by owner, locked to all others
chmod 755 myprogram # Program read/write/execute by owner, read/execute by all
chmod 700 myprogram # Program read/write/execute by owner, locked to all others
The only other quirky thing is that programs that need execute permission have the same "mode" as directories which need access permission.
chmod 755 mydir # Directory read/write/access by owner, read/access by all
chmod 700 mydir # Directory read/write/access by owner, locked to all others
If you can memorize that (or look it up here) that’s 99% of your
chmod
chores.
ℱancy chmod
If you’re a computer science person you should appreciate that these strange numbers (644,700, etc) are 3 digit octal numbers requiring 3 bits each. Each of the digits represent a class of user: 1st, the user herself, 2nd, the group members, and 3rd, everyone else. And each of the three bits in each of the digits is a switch controlling 1:executable, 2:writable, and 4:readable. Doing the math on the 644 example we see that the owner has 4(readable) plus 2(writable) and the group members and everyone else has just 4(readable).
That’s how it really works and if you’re a computer science major you should embrace this as a nice example of binary encoding — but if you are not, you can safely ignore this.
I won’t even go into the other scheme for specifying permissions. I know it but I hardly ever use it. There are some very weird cases where it does seem necessary when changing tons of files selectively. But deep down, all files have octal modes as described.
Ok, that’s not even true. I said they were 3 digit octal numbers, but the full mode is 4 digits. If you’re interested in that see my Unix permissions notes which covers a lot of very strange stuff.
rm
Got too much stuff? (You can check on that with df.) The rm
command will get rid of some (or all!) of it. Of course this is
exactly like saying that a chainsaw will get rid of some trees or a
stick of dynamite will get rid of some rocks or a scalpel will get rid
of a tumor, etc. Sure it will, if used correctly. If you bungle it
however,… Wait, that’s not right… When you bungle it! When you
bungle it, you want to be relieved that you have in place a good
system of backups. Seriously. Come up with and adhere to a good system
of backups! Do it.
Ok, your real stuff is backed up. You’re on some throwaway system. You’re a malicious psychopath who is actively trying to destroy someone’s life. How do Unix people get rid of files? Like this.
rm moribundfile
That’s it. It’s gone. If you’re nervous you can try this.
$ rm -iv moribundfile
rm: remove regular empty file 'moribundfile'? y
removed 'moribundfile'
The -v
, as with many Unix commands, stands for "verbose" which is
why it reported that the file is "removed". (Normally it kills
silently). The -i
is for "interactive" causing it to pause and ask
you to think first and then confirm your reckless ambitions. I find
this is kind of pointless and I don’t use it. To me it is as useless
as saying, "Are you sure? Are you sure you’re sure? Are you sure
you’re sure you’re sure?" until it gets annoying. But hey, create your
own illusion of safety. Many systems come with the rm
(and others)
aliased to rm -i
to force the issue. Again, I find it not helpful.
If you come from a computer environment designed for normal people, I
have an important fact to tell you about: Not only do files not go
in the "trash can" when using the rm
command, there is no such
thing! That "garbage can" or "recycle bin" stuff is a ridiculous
fiction created by an
office supply company
which naturally saw the world through a bizarre office supply lens.
If for some bad reason you think the "trash" is a legitimate and useful innovation, simply create a directory called "Trash" and mv your files to it. Done. That’s all that is going on with desktop "trashes". There are two massive problems with that approach of course. One, sometimes you need to free up space because your drive is full and pretending to delete data but not really doing it is absurd. And two, sometimes some files really, really, really need to be deleted (…or hey, let’s give that malware another chance!). A bonus problem is bungling the "emptying" of the trash and not even reaping the putative benefits of it.
Folks, let me stress it again: Have! Good! Backups! There is no substitute. Don’t fool yourself into false security and performance problems with a pointless illusory "trashcan".
If you want some kind of crutch to keep you from deleting things you
didn’t want to delete, a far better system (in addition to sensible
backups) is to always get in the habit of checking your rm
arguments
with ls
.
Imagine I had the notion to delete these parasitic files, i.e. all the ones in this directory.
rm .cache/mozilla/firefox/h1337c28.default/cache2/entries/*
It is very, very reasonable to run this command first.
ls .cache/mozilla/firefox/h1337c28.default/cache2/entries/*
If that looks good, pull the trigger with rm
.
ℱancy rm
You can pick off one file after another like a sniper as shown above. But sometimes you need a much bigger calamity to strike. Note that things can go from "oops" to "you’re fired!" very quickly when you start applying Unix’s power to getting rid of stuff.
If you need to get rid of an entire tree, including sub trees and sub
sub trees, etc., you need to use the very dangerous -r
"recursive"
option. This will get rid of the entire mozilla
tree.
rm -r .cache/mozilla/
Use -v
if you want to see what files it’s finding and purging. That
sounds safe, but with big deletions, that can get tedious. Having all
the millions of files of a big system you’re trying to clear go
scrolling by can slow things down a lot. Sometimes I start it with
-v
, check that it’s deleting what I intended, press Ctrl-C to
interrupt it, and then restart it silently. Use good judgement there.
By the way, if you’re cool enough to be reading this, you’re warmly invited to be my "friend" on Steam where I am known as rmdashrstar. Now you know why.
cp
The simple explanation is that the cp
command copies files. The
details can get somewhat complex but usually it’s as simple as doing
this.
cp myfile myduplicatefile
Now you have two files with identical contents (see md5sum to prove it).
As with other commands that can make a mess of your file system, it is
good to be very careful and to use the -v
verbose flag.
I think the most common use I have for cp
is when I’m about to mess
with some configuration. Here’s an example of what I mean.
sudo cp -v /etc/ssh/sshd_config /etc/ssh/sshd_config.orig
I’m basically preserving a copy of this important (SSH server) configuration file so that if the edit I’m planning goes awry, I can restore it to its original condition (e.g. with mv).
ℱancy cp
In fact, making a backup of files I’m about to mess with is about all
I ever do with cp
. This may seem strange to people who might
naturally assume that this command is one of the most important
cornerstones of Unix. Maybe for some people it is, but I actually
don’t use it much and here is why.
First of all, the whole idea of copying implies a duplication of
resource requirements. So right out of the gate, it is almost an
exemplar of inefficiency. You might say, what about backups?
Indeed, it is definitely not efficient to lose all your work to media
failure (or administration clumsiness). But for making backups I tend
to always use the more serious tool, rsync. Always.
After all, I’m just as likely to back up to a completely separate
machine and cp
just can’t keep up.
If it’s an intermediate scale between entire archives and single
configuration files, I’m often keeping things backed up and organized
with version control (hg, cvs, git). This leaves few jobs that need to
be done with cp
. When I reflect on what I’ve used cp
for I’m kind
of embarrassed at how clumsy the entire premises of those operations
were.
mv
What do you get when you combine the cp command and the
rm command? You get the mv
command! As far as I know it is
largely superfluous. You could copy the file and then remove the
original to create the same effect as mv
. But mv
is intuitive and
simple when that’s what you want to do.
The move operation is also pretty much the same thing as "rename". The syntax is simple enough for simple things.
mv oldname newname
Or relocating with the same name. This will put the oldname
file
into the directory oldfiles
.
mv oldname /home/xed/oldfiles/
It can be good to use the -v
(verbose) flag to output a report of what
got moved.
mv -v oldname newname
ℱancy mv
Normally the mv
command is lightning fast. After all, it’s just
relabeling things really. But sometimes those ones and zeros in a file
do need to actually move! If you’re changing which filesystem a file
lives on, it needs to actually go and occupy new disk space. In these
cases, it’s often better for large operations to use cp or even
rsync.
Another quirk of mv
is that it can move multiple files into a single
directory. In that case the directory must be at the end of the
argument list.
mv file1 file2 file3 dir4files/
Sometimes if you make a mistake and do something like mv *jpg
it
will complain that the last jpg
file it finds is not a directory and
the other files can’t move to it. A much sadder case is when it is
coincidentally a directory and you accidentally muddle everything up.
Been there; done that.
ln
In bad operating systems you can make "shortcuts" or "aliases" that look like files but point elsewhere. The proper name for such a thing is a "link". For most everybody most always, the link is a "symbolic link" or "symlink". To create a symlink the proper way use this syntax.
ln -s /tmp /home/xed/Downloads
What that example does is it creates a symlink called "Downloads" in
my home directory. This is not a file or a directory, it is a symlink.
However, it points to a directory, the universal /tmp
directory so
this symlink acts like a directory. What this does for me practically
is that when a garish clumsy program like Chrome downloads things into
the "Downloads" directory, heh heh, well, it goes to the /tmp
directory — and the next time I reboot my computer, all that cruft is
deleted.
My helpful way to remember the order of the arguments for ln -s
is:
the real thing comes first (/tmp
) and the fake thing comes second
(the symlink, Downloads
).
ℱancy ln
Note that the -s
is for "symbolic". It would seem there is another
kind and there is. The default type of link that the plain ln
command produces is called a "hard" link. What’s different about that
is that it is not different from normal files. Your filesystem will
see it as not just a file like the one it’s linked to but as the
actual file itself. In other words, you will have multiple names (and
complete file records) for the same exact blob of ones and zeros on
the filesystem. If you change the hard link you’re changing the
target too. Isn’t that what happens with symlinks though? Yes, but the
difference is that if you delete the source of the hard link, the
linked file will carry on as a regular file. If a symlink’s referent
is deleted, well, now it just causes errors (e.g. "No such file or
directory").
Normally you should stay away from hard links without a very good reason. And obviously you should stay away from circular symlink redirection cycles.
rsync
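I keep mentioning rsync as my serious backup tool (see cp), so here is a sketch of a typical invocation; the host and paths are placeholders. The -a preserves permissions, ownership, and timestamps, and the trailing slash on the source means "the contents of this directory" rather than the directory itself.
rsync -av ~/X/myspecialproject/ backuphost:/backups/myspecialproject/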
wget
This stands for "Web Get" and it gets things from the web. In reality it is a tiny web browser that just acquires the web resource and leaves you to look at it any way you like. Super useful. My wget notes.
Note that if you are a Mac user, you should have a look at the man
page for curl
which is a very similar program that is installed by
default on Macs.
grep
The grep
command is one of Unix’s most famous. Among serious
computer professionals the
word "grep" is
used in conversation as a verb that means to extract specific
information from a larger source. Its use can be very complex, but
mostly it’s quite simple conceptually and practically. Does a file
named "myfile" contain the phrase "searchterm"? Find out with this.
$ grep searchterm myfile
Here’s a simple useful example that I like. If you look at the Linux
file /proc/cpuinfo
you get a big dumping of stuff. But if you narrow
what you’re interested in with grep
, you get this.
$ grep processor /proc/cpuinfo
processor : 0
processor : 1
That would indicate to me that I’m on a 2 core machine. Let’s try a
different approach on a different machine. Combining grep
with other
tools shows this.
$ grep processor /proc/cpuinfo | wc -l
8
This one is an eight core machine. Now I have a good way to check how
many cores a machine has. Once you start using grep
for various
things, you’ll start to realize how powerful it is.
A neat trick you can do with grep is solve crossword puzzles. All you need besides grep is a list of all the words in your language. Since computers can check your spelling these days, this is usually already present. Using the normal Linux spelling word list here is how I can find a six letter word that starts with "c" and ends with "xed".
$ grep '^c..xed$' /usr/share/dict/words
coaxed
There are tons of options to grep
but there are two that I use way
more than the others. The first is -i
which makes the searching case
insensitive. That way you’ll find "PROCESSOR" and "Processor" as well
as "processor" (if they’re there). This is very useful when scanning
natural language texts.
The other very useful option is -v
which inVerts the sense of the
search. That is, it shows you all lines that do not contain the
search phrase. One example is if you’re trying to do some further
processing of another command which has a descriptive header. For
example, the df disk free command prints this at the top of its
output.
Filesystem 1K-blocks Used Available Use% Mounted on
That’s nice for standalone use, but in a script, I may be able to take that for granted and I want all the lines but this line that I know will always contain "Filesystem".
df | grep -v Filesystem
That does it. But since it’s always the first line maybe you could have just used tail.
df | tail -n +2
Fair enough. But what about if you want to exclude all the entries for
tmpfs
(I have 6)? This will cut off the header and exclude the
tmpfs
entries.
df | tail -n +2 | grep -v tmpfs
ℱancy grep
It turns out that there are usually many "grep" commands you can run on a typical system.
$ which grep egrep fgrep rgrep
/bin/grep
/bin/egrep
/bin/fgrep
/usr/bin/rgrep
These are mostly aliases (actually links) which automatically invoke
particular options. Most of the subtlety has to do with "regular
expressions". The search term that grep
uses is actually a very
powerful syntax called regular expressions. This ancient syntax is
easy to use in its simplest cases but can get very hairy quickly. If
you’re interested in learning more, you can check out my
2003 Regular Expression
Tutorial. Maybe I’ll update that some day.
Regular expressions are so fundamental to grep
that the name "grep"
itself stands for "global regular expression print".
find
The find
command is not grep. It does not find things in
files. What it does is finds files in the
filesystem tree. This may
not seem useful if you only have a couple of files. "Which pocket are
my keys in?" is not a difficult question. "Where in my house are my
lost keys?" could be. When your filesystem becomes a haystack, find
can quickly locate needles.
One simple but useful use of find
is to just dump out all the files
in a tree’s branch. This is different from ls because it
operates recursively looking through all sub directories too. We can
use this to see just how massive the haystack is.
$ find / | wc -l
511624
On my system running this (as root) shows that my entire filesystem
tree has over a half million files. If you have misplaced one of them,
you can see how find
can be useful.
The normal way to use it is to specify a starting directory and a search criteria, usually a name. Here is an example I have used in the past when I’m trying to test speakers.
$ find /usr -iname "*wav"
I need a sound file, any sound file! I don’t care where it comes from.
I know from experience that there will be some giant package stored
somewhere below the /usr
node in the tree which will include some
sound files. Running this command informs me that there is a file
called
/usr/lib/libreoffice/share/gallery/sounds/strom.wav
(and 47 others),
just like the (iname) pattern I specified asked for. I never would
have found that by casual file manager browsing. The "i" in iname
stands for case Insensitive so that file.WAV
will also be found. If
you don’t want that, just use -name
and then the pattern.
Remember, that the starting directory (the parents and ancestors of which you will not search) is specified first. Then comes the filter to narrow down what you’re looking for.
ℱancy find
Of course that simple usage is fine but the tip of the iceberg. The
find
command does a ton of other things. In fact, if it’s possible
to filter files by their location based on their metadata (timestamps,
ownership, etc) then it’s likely that the find
command can do it.
There is almost a full programming language of find
filter syntax to
make every thing plausible, possible.
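Here is a small taste of that filter language. This (made up) example looks below my home directory for regular files bigger than 100MB that were modified in the last 7 days.
find ~ -type f -size +100M -mtime -7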
My find notes have many more examples of find
including some more exotic situations.
mount / umount
To "mount" a disk means to have the operating system detect it and confirm that it’s organized in a compatible way and that it’s ready for business. That’s all. The important thing for beginners is to understand that it is possible for disks to be connected and not mounted. In bad operating systems it is often because they don’t understand how good operating systems organize their filesystems. But in good operating systems, it may be strategic to not mount a drive to ensure that it is left completely untouched. Mounting can also come with options so that a drive could be mounted, but only for reading for example. This allows you to do forensics without any chance of modifying or corrupting anything.
Another important thing to know is that in environments that normal
people use, the concept of "ejecting" a disk is really an unmounting
process. Unix can do this explicitly with a lot of very fine control
using the umount
command.
Other than that, modern systems mount things automagically and regular users can ignore the topic.
ℱancy mount
Professionals however should know how to do this explicitly. The general format is something like this.
sudo mount /dev/sdc2 /mnt/mybackupdrive
The first argument is the device and the second is the "mountpoint", that is, where in the tree will this new system graft on. I think of the mount command in the same way I think of the ln command: real thing first, fake thing second.
Another useful tip is to use UUIDs when possible to mount things. The
reason for this is that device names can change capriciously. Maybe
when you plug your device in you always get /dev/sdg
(I do). But
what if you buy another similar device and plug it in too? Now what?
Explicitly using the UUID removes any ambiguity and targets the
correct device with certainty. This is especially important and useful
with back up drives. Note that the UUID=
must be capitalized.
mount UUID=e52a8c89-c84e-4dab-b71f-68ecda5cc4ec /mnt/backup/
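If you don’t know a device’s UUID, you can discover it with blkid (run as root) or with lsblk -o NAME,UUID. For example (device name made up):
sudo blkid /dev/sdc2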
A common thing to mount is a USB drive or SD card that is used in some device like a camera. The camera will inevitably want to use a crappy VFAT file system. One of the ways it is crappy is that it doesn’t handle unix metadata such as ownerships and permissions very well. I’ve had success using an option to the mount command.
mount -v -t vfat -o uid=11111,gid=11111 /dev/sdg /mnt/sdg
One other tip is that when you’ve unmounted a volume with umount
there
still may be outstanding writes that need to finish up. To make sure
they are finished, issue the sync
command and wait for it to finish.
If you’ve done a umount
and then a sync
it is safe to remove the
drive (assuming it’s removable!).
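Something like this (mountpoint made up) finishes the job safely.
sudo umount /mnt/backup && sync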
stat
Files are ones and zeros on your disk, but the illusion of files
is stored by the filesystem that knows things about the file. This
can be very useful to query. The stat
command prints out all of a
file’s metadata that it can find. Here is an example.
$ stat /bin/true
File: /bin/true
Size: 31464 Blocks: 64 IO Block: 4096 regular file
Device: 811h/2065d Inode: 5767266 Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2019-01-10 09:23:01.105907652 -0500
Modify: 2017-02-22 07:23:45.000000000 -0500
Change: 2019-01-08 13:25:53.607888592 -0500
Birth: -
Note this is especially good for finding out more about timestamps.
df
One day on the path to being a serious computer user, you will get a
"No space left on device" error. Thereafter you will have a heightened
sense of vigilance for that problem being possible. How do you check
to see how much of your Disk is Free to use? The df
command. By
itself, it shows you all the filesystems it knows about and how full
they are. If you specify a mount point or a device (or more than one)
it shows you only that. Here I’m checking the space used (sort of the
opposite of "free" isn’t it?) on my main top level filesystem (/
).
$ df /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb1 117716872 17804248 93889868 16% /
I don’t know about you but I can’t understand that because it’s
written in 1k "blocks" which is hard for me to think about. Adding the
-h
option, for human-readable, cleans things up nicely. For humans
anyway.
$ df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 113G 17G 90G 16% /
So I have a 113 gigabyte drive that I’m using 17 gigabytes of. It’s really a 128GB drive but remember that the filesystem sets aside some of that to organize the metadata for the half million or so files on a typical system.
If you’re on a Mac, the df
command reports in nearly useless
512 byte blocks. You can use the -h
human readable option just fine
but to get comprehensible detailed output use the -k
flag which will
use the 1k blocks like Linux.
ℱancy df
Sometimes you need to see filesystems that are hidden. Use the -a
option to really see All of them. Sometimes you use symlinks
to add some more space to your setup and it’s easy to get confused
about where the data really is being written. You can give df
a
path to a file or directory and it will show just the location of
where it truly resides.
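For example, to see which filesystem actually holds a particular directory (path is just an example):
$ df -h /home/xed/Downloads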
du
The df command is great for big picture knowledge about the
entire disk but what if you want to know how much a particular branch
of your filesystem’s tree is taking? The du
command can add that up
to your specification. By itself, I find du
kind of useless. It will
show you the size (in hard to understand blocks) of each directory in the
current directory and below. That can get cumbersome. A better way is
to limit it with the -s
summary option and to use the human readable
option, -h
. Here’s what I generally do.
$ du -hs /home/xed/.mozilla
122M /home/xed/.mozilla
Here we can see that my stupid clumsy browser’s working directory is packed with 122MB of garbage (mostly cached web stuff I suppose).
I can also count up all the subdirectories this contains.
$ du /home/xed/.mozilla | wc -l
623
That’s 623 (including the top level one) which means that this directory is filled with a baroque maze that’s less of a tree and more like a bramble patch. When planning backups, this kind of targeted analysis can be handy.
ℱancy du
What if you want to see which directory trees are the big ones?
I use the --max-depth
option to only descend one level. Here are the
top 4 (tail -n4
) biggest directory trees in my home directory.
$ du --max-depth=1 /home/xed | sort -n | head -n-1 | tail -n4
47780 /home/xed/.config
125648 /home/xed/.mozilla
691912 /home/xed/Downloads
1905432 /home/xed/.cache
Note that they’re all related to browsers!
This technique can also be useful if you have a bunch of users and want to see who is hogging all the space.
free
Shows memory stats. As in "free memory". Use the -h
human readable
option so you don’t have to think as much about what it means. Memory
management on a Linux system is complex and baroque (though freakishly
effective) so don’t feel bad if it’s not all crystal clear. I don’t
exactly know everything it’s trying to tell me. But it’s a good quick
check of memory.
On Linux, you can also do cat /proc/meminfo
for similar information.
Probably Linux only. Not on FreeBSD or OSX.
lsblk
Shows block devices. Wonder what drives are on your system? Use this.
Wonder what device name that USB flash drive was given? Run lsblk
before inserting it and then run it again after and see which device
just showed up.
Not on FreeBSD. Probably Linux only.
ℱancy lsblk
I use this so often that I have this very helpful alias defined.
alias lsb='lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT,UUID'
This eliminates some of the LSBLK cruft like major and minor device numbers (that I have yet to ever care about) and replaces them with some interesting things like file system type and the UUID code which can uniquely and positively identify a device. Not 100% sure if you’re formatting that USB flash drive you just inserted or your main hard drive? Double check the UUIDs and you can’t easily choose the wrong one.
lsusb
Need to see a list of your USB devices? Good for finding out things like your USB hub is swallowing some of your peripherals.
Not on FreeBSD. Probably Linux only.
ℱancy lsusb
The main thing I do with this command is to see if the USB system is identifying a new USB device that I’ve just plugged in. Here is a smooth workflow for isolating just that.
$ lsusb | tee /tmp/usblist
$ diff /tmp/usblist <(lsusb)
Now you’ve conclusively isolated it and can be sure it’s present without hunting through the entire list. Also, don’t forget to check dmesg.
lspci
Need to see a list of your internal PCI peripherals? Even if it’s physically not on a card per se, this will show you what the PCI subsystem knows about. It’s most handy for figuring out which graphics adaptor chipset you have so you can target obtaining the correct driver.
It can also be useful for figuring out what other chipsets you have on various hardware (NIC, sound, etc).
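For example, a quick way to spot your graphics chipset looks like this.
$ lspci | grep -i vga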
Not on FreeBSD. Probably Linux only.
dmesg
Something bad happen at boot? Getting some strange hardware related
error? You can check the kernel’s "ring buffer" where it dumps
internal messages that it thinks you may want to know about. To do
this use the dmesg
command. On some systems, this is considered
privileged information and you’ll have to use sudo
.
sudo dmesg
Or if you want to know if some Linux thing is active you can grep for it. Here are some examples.
sudo dmesg | grep -i nvidia
sudo dmesg | grep -i ipv6
This checks to see if the Nvidia driver started ok. And the other checks to see if IPv6 is active.
What the kernel tells you is a bit arcane but it’s still good to know how to check it and do some web searching for any problems that you find reported that concern you.
The kernel has other ways of providing you information. For example check out this command.
cat /proc/cpuinfo /proc/meminfo
The kernel will quickly pretend that there are two files (cpuinfo and
meminfo) that contain a bunch of stats that the kernel knows. The
cat
command will dump them out for you. Very handy. Try it.
ℱancy dmesg
The timestamps dmesg produces are in time intervals since the computer
was powered up. I find that to be pretty useless. The -e
flag will
give you sensible timestamps that let you know if some logged event is
related to the problem you just had a minute ago.
Sometimes if you’re doing some robotics kind of thing with weird hardware and you want to see it get detected or whatever, you may need to monitor the kernel’s ring buffer in real time. Fortunately it’s simple to do. Check out these options.
sudo dmesg -wH
uname
This is supposed to tell you about the platform you’re running on.
This is often used in scripts so the script can know what kind of
system you’re on. For less boring use try the -a
option.
$ uname
Linux
$ uname -a
Linux ctrl-sb 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64 GNU/Linux
I tend never to use this. Instead, I usually just do this.
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
In the old days, I did this and it still mostly works.
$ cat /etc/issue
Debian GNU/Linux 9 \n \l
sensors
Start by installing this somehow if it’s not there. On Debian/Ubuntu type systems do this.
$ sudo apt-get install lm-sensors
Then have it figure out what sensors it can read (basically it’s autoconfiguring).
$ sudo sensors-detect
I just hit enter to go with the defaults of every question. Because I’m lazy.
Then you can put it to use and see what your computer knows about its sensors, usually temperature data.
$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +28.0°C (high = +84.0°C, crit = +100.0°C)
Core 0: +27.0°C (high = +84.0°C, crit = +100.0°C)
Core 1: +28.0°C (high = +84.0°C, crit = +100.0°C)
Core 2: +26.0°C (high = +84.0°C, crit = +100.0°C)
Core 3: +26.0°C (high = +84.0°C, crit = +100.0°C)
That’s pretty handy to know, especially if you are having intermittent shutdowns on hot summer days.
Not on FreeBSD or OSX. Probably Linux only.
smartctl
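This one comes from the smartmontools package and reads a drive’s SMART self-monitoring data. A minimal sketch of a quick health check looks something like this (device name made up).
$ sudo smartctl -H /dev/sda
$ sudo smartctl -a /dev/sda | less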
dmidecode
Produces a massive dumping of everything the system knows about your
hardware. You need to use sudo
and you probably need to pipe it to
something like less
or grep
to be useful at all. Still, good to
know when you need to find things out about your system. I think it is
a good choice for archiving a hardware profile of machines you manage
or care about.
By the way, DMI stands for Desktop Management Interface.
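For example, to pull out just the memory or BIOS details (these type keywords are standard dmidecode options):
sudo dmidecode -t memory | less
sudo dmidecode -t bios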
I found dmidecode
on FreeBSD but not OSX.
tr
The tr
command can be simple. It stands for "translate" but I think
of it as "transpose" too — something like that. As an example, if you
need all n’s converted to b’s and all a’s converted to e’s, this will
do it.
$ cal | head -n1 | tr na be
Jebuery 2019
ℱancy tr
Things can get more complicated with ranges of characters. Here is a technique to get ROT13 output.
$ echo "This is mildly obfuscated." | tr [A-Za-z] [N-ZA-Mn-za-m]
Guvf vf zvyqyl boshfpngrq.
$ echo "Guvf vf zvyqyl boshfpngrq." | tr [A-Za-z] [N-ZA-Mn-za-m]
This is mildly obfuscated.
Here I explain a cool application of tr that I came up with.
sed
cut
The cut command can be used to extract specific columns of data.
You pretty much always need two options for this command. First you
need to specify a delimiter, that is a character that will be the one
which separates your fields. For example, in comma separated values,
you’d say -d,
. The other option is the -f
which is the field
number you want. Here I’m taking the 3rd field when separated by
commas.
$ echo "one fish,two fish,red fish,blue fish" | cut -d, -f3
red fish
If I separate by spaces, I get a different result.
$ echo "one fish,two fish,red fish,blue fish" | cut -d' ' -f3
fish,red
awk
It turns out that a lot of people comfortably get away with not
knowing about the cut
command because they use awk
. If you only
know one thing about awk
it should be that it is very good at
easily extracting fields (i.e. columns) from lines of text. One nice
thing about awk
is that its default delimiter is space, so you often
can just do something like this.
$ echo "one fish,two fish,red fish,blue fish" | awk '{print $3}'
fish,red
And if you need a different delimiter, you can do this.
$ echo "one fish,two fish,red fish,blue fish" | awk -F, '{print $3}'
red fish
Or this.
$ echo "one fish,two fish,red fish,blue fish" | awk 'BEGIN{FS=","}{print $3}'
red fish
That’s really all normal people need to know about awk
.
ℱancy awk
But that’s just the tip of the iceberg! The reason that awk
is
generally better than cut
is that it is way more powerful than the
simple cut
(which is fine if you’re going for minimal).
Awk is one of Brian Kernighan’s favorite languages. This is surprising since he is the "K" in the original K&R C. But less surprising because he is also the "K" in awK which he helped write. It is in fact a complete and powerful programming language. I have written some amazingly powerful and effective programs in Awk and I encourage professionals to get familiar with its power and potential.
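Just as a taste of it being a real language, here is a little sketch that totals the first column of du output; on Linux du reports 1K blocks by default, so dividing by 1024 gives megabytes.
$ du /home/xed/.mozilla | awk '{total += $1} END {print total/1024 " MB"}'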
Here is an example of a Bash program I wrote that creates specialty Awk programs and then runs them to solve problems in a custom yet scalable way.
Here is an example of how I used Awk to create pie charts on the fly from the Unix command line.
Definitely check out my awk notes for other interesting applications and ideas.
cat / tac / rev
Short for "conCATenate". Although this is really to join multiple files together it is commonly used to just dump the contents of a file to standard output where it can be picked up by pipes and sent as input to other programs.
tac
is a little known but fun and somewhat useful command that
returns the lines of input starting with the last and ending with the
first. I found tac
on FreeBSD but not OSX.
rev
is another little known but fun and less useful command that
returns the lines of input reversed (right to left becomes left to
right). Let me know if you come up with a brilliant application of
this.
ℱancy cat
Don’t use cat
! Use file arguments and redirections instead where possible.
If you’re not concatenating, you probably don’t really need cat
.
That said, normal people shouldn’t feel bad for using it as a do-all
convenience function. It works. It’s just that if you want to get to
the next level of efficiency, it’s good to economize your scripts by
getting rid of cats when possible.
One helpful simple use of cat
is when a program tries too hard to
format data. For example, the ls command tries to figure out
how wide your terminal is and pack as many columns of output as
possible. This normally makes sense, but if for some reason you want
the display output in one continuous column, just pipe it to cat
.
split
Most people don’t know about this command because it is rarely needed. But when it is, it’s nice to solve your problem in a way that you can be sure is direct and efficient. It is one of the many programs written by Richard Stallman personally.
Basically, like it says in the name, it splits data. Into multiple files.
Here’s an example. If you have a giant list of a million ligands and
you have 100 computers that can check to see which ligands bind to a
certain protein, you could use the split
command to produce 100
files each containing 10000 ligands.
The split
command is the natural complement of the cat
command.
How would you reassemble files produced with split
? Simply
concatenate them with cat
.
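Here is a minimal sketch of that ligand scenario (file names are made up): split a big list into chunks of 10000 lines each, then reassemble later with cat.
$ split -l 10000 ligands.txt ligands_chunk_
$ cat ligands_chunk_* > ligands_reassembled.txt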
fmt
While generally obscure, the fmt
command is of special interest to
people who — like I am doing right now — write large amounts of
prose in a text editor. An important question arises: do you type away
hitting enter only after you’ve reached the end of a paragraph? Or do
you interject hard returns in your text as you type? The fmt
command
allows the best of both worlds. You can type away without worrying
about where exactly those hard returns should go (something people
had to think about when typing on an old manual typewriter), yet at
the end of your input, you can have your lines be a sensible 70 or 80
characters wide (maximum) or whatever you want with the -w WIDTH
option (default is 75).
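For example, to rewrap a file of long rambling lines to at most 70 characters (file name made up):
$ fmt -w 70 rambling.txt > wrapped.txt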
Interestingly I went for 25 years without knowing about this program
explicitly, and yet all of my prose is broken down into manageable line
lengths for easier reading; my default value is 70 characters and
longer as necessary for code comments. The way I make this happen is
Vim’s set textwidth=70
and, if needed, the gq
function. What I
never knew is that behind the scenes Vim does have a set formatprg=
setting which is described like this:
The name of an external program that will be used to format the lines selected with the |gq| operator. The program must take the input on stdin and produce the output on stdout. The Unix program "fmt" is such a program.
If you appreciate this great feature of Vim but would like to scale it
up or automate it in some way, the fmt
command is the answer. Or see
fold below.
fold
Another classic tool very similar to fmt
is fold
. You can specify
the -s
option and get breaks only at spaces. The -w 70
setting
allows you to specify the width to break at (or before). fold
is a
little more serious about cutting off long lines while fmt
will let
them run over if there’s no sensible place to break them.
One interesting and useful application is to break down a line of text into a column of characters. Here is an example showing this and adding line numbers to each character position.
$ echo "xed.ch" | fold -w1 | sed '=' | sed 'N;s/\n/ /'
1 x
2 e
3 d
4 .
5 c
6 h
diff
The diff
command simply finds the differences in two files. Well,
no, not simply. It actually is a very hard problem to even formulate
an efficient way for you to be apprised of what is different. diff
solves that hard problem and other subtleties related to the task. If
you think you have a better way to express file differences than
diff
, you’re probably wrong. Let’s look at an example.
Here I list block devices with the lsblk command and save the output to a file. I do this once before inserting a USB flash drive and once after.
$ lsblk > /tmp/before
$ # ... Now I insert a USB drive.
$ lsblk > /tmp/after
$ diff /tmp/before /tmp/after
7a8,10
> sdc 8:32 1 29.9G 0 disk
> ├─sdc1 8:33 1 1G 0 part /media/xed/ef6b577f-d3c3-4075-8da8-333d031b4515
> └─sdc2 8:34 1 28.9G 0 part
What diff shows me is only the part of the two output files that is different. Its format says that after line 7 you need to add the following lines (which start with ">"). On "diffs" (as they are called), deleted lines are indicated with the "<" symbol.
Note that the order of how the files are specified is important. The "diff" reflects what must be done to the first specified file to achieve the state found in the second.
ℱancy diff
That’s nice, you’re thinking, but maybe you don’t see yourself with a
huge diff
agenda. It turns out that this program is a cornerstone of
human civilization. This is because pretty much all version control
systems like RCS, CVS, Mercurial, and Git (and therefore all software)
use diffs to keep track of what changed.
There is another related Unix command called patch
(written by
Perl creator Larry Wall) which takes something like /tmp/before
plus
a diff and produces a /tmp/after
. If you’re wanting to tell Linus
Torvalds about some brilliant change you have in mind for the Linux
kernel, the normal way to do this is to post a diff (in email is fine)
which will effect your changes if it is "patched" into the code. This
is where the word "patch" in software contexts comes from.
Here
is a hardcore use of diffs where I created a script to apply numerous
custom patches to fix a terrible but important dataset. The point of
this example is that I am taking for granted that there is no better
system to explicitly record and apply what needs to be changed than
diff
and patch
.
sdiff
This shows two files side by side highlighting their differences. This
is the quick command line technique that might be able to replace
something like vimdiff
which does a fantastic job of such tasks.
cmp
There is also a unix command called cmp
which mostly just
checks to see if two files are the same. Unlike diff
which is line
based, cmp
works on bytes making it a good choice for hunting down
bytes that are different in a binary file. Getting an MD5 hash can
tell you that there are different bytes, but cmp
can tell you where
those bytes are. The bytewise approach may be slightly more efficient.
This might, for example, be useful if you suspect cosmic radiation has
disturbed a large binary file — this is a thing that happens!
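For example, the -l option lists every differing byte’s offset (and the differing octal values), which is handy for that kind of hunt (file names made up).
$ cmp -l good.bin suspect.bin | head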
comm
This is another file comparing utility that operates on lines like
diff
. Unlike diff
, its output is not optimized for creating a
patch
to reconcile the two files. Rather its output is designed to
allow further processing in Unix pipelines. The command outputs 3
columns by default — the lines that are only in the first file, the
lines that are only in the second file, and the lines that are in
both. You can use options to suppress any of these columns to get the
Boolean operation you need.
This command can be sensitive to ordering — here is a useful syntax to ensure the input files are ordered.
comm -12 <(sort fileA) <(sort fileB)
This shows only column 3, or the lines that both files have in
common. Or you could use -3
to show just the differing lines; this
might be useful, for example, when trying to compare file system trees
and figure out what the differences are.
md5sum
I have a lot of respect for diff
of course, but in my experience I
have more occasions where I need to know if something changed rather
than what about it changed. Make no mistake, diff
can do that job,
but there is a simpler way.
The MD5 message digest algorithm creates a "hash" of input data. This means that it makes a short (128 bits) numeric summary of input.
Think of it like some kind of inscrutable rhyming slang. Why would a "person from the United States of America" be called a "septic"? Well, "septic" from "septic tank" rhymes with "Yank" which is short for Yankee. You input "American" to a Cockney and you get "septic". WTF — go figure. Without a complex breakdown of the algorithm, you just have to accept that it is what it is. Hashes are similar.
If I feed md5sum
that phrase, its rhyming slang produces this.
$ echo "person from the United States of America" | md5sum
f3b811b934ee28ba9e55b29c6658c5b7 -
That 32 character nickname is no more or less understandable than "septic". What it is, however, is unique. If I send it anything else, that "nickname" will be reliably different. And not just a little different but completely different in a completely random looking way. Just like rhyming slang.
Where is this useful? Well, everywhere for starters! This kind of
thing is used heavily to make cryptographic keys. Something like this
(and formerly this exact thing) was used to record your password in
Unix /etc/passwd
files. That allowed the system to check if it was
really you without actually recording the secret word. It would just
check the secret word you enter at log in by running it through md5sum
and seeing if the hash on record matches.
Here is a simpler example that I run into a lot. Let’s say you have a
big music collection or a big photo collection. With multiple backups
and offload operations, it’s easy to get some files that are
duplicates. (If this hasn’t happened to you, you have not managed
enough files.) The question md5sum
is ideal to answer is, "Are these
files exactly the same?" Just run something like this.
$ md5sum /etc/resolv.conf /var/run/NetworkManager/resolv.conf
069bf43916faa9ee49158e58ac36704b /etc/resolv.conf
069bf43916faa9ee49158e58ac36704b /var/run/NetworkManager/resolv.conf
Here the MD5 hash for these two files is identical — these files contain the same exact 1s and 0s in the same exact order. (I cheated here a bit since the first is a symlink linking to the second.) If these are two distinct files, one of them is redundant.
On FreeBSD and OSX there is an equivalent command called simply md5
.
ℱancy md5sum
Ok, so how would one find all the duplicated files on a filesystem?
The md5sum
command will surely be at the heart of the solution.
I wrote this little one liner script which takes a starting top level path as input and searches the entire sub tree for files which are actually the same. It saves a bit of time by only looking at the beginning of the files instead of the whole thing. Note that apparently 1k of an mp3 is not enough to distinguish it from others reliably enough.
#!/bin/bash
# Hash only the first 100kB of every file under the given directory, sort by
# hash, then print adjacent entries whose hashes match (likely duplicates).
find "$1" -type f \
| while read X; do head -c 100000 "$X" | echo -n `md5sum | cut -b-32`; echo " $X"; done \
| sort -k1 \
| awk 's==$1{print p;p=$0;print p}{s=$1;p=$0}' | uniq
Run like this:
./undup /media/WDUSB500TB/musicvault > musicdups
Just to sprinkle a little confusion and philosophical doubt into the
topic, it turns out that it is possible that md5
for two different
inputs will cause the same hash to be output. This is called a
"collision" and it is very rare. Very. Rare. It is rare in the same
way that choosing two random drops of water on earth would find them
touching each other. Still for some life or death applications (mainly
cryptography), this is not rare enough. There are fancier (n.b. harder
to compute!) hash algorithms where the collision potential is
something you could comfortably bet your life on.
Here
are nerds in late 2018 debating whether MD5 is sufficient for
anything. To figure out which songs in your collection are dups, I say
it’s more than fine.
sum
Note that there is an old program called sum
that computes a very
simplistic checksum of the blocks in a file. I feel like unless you
have a very rudimentary check to perform and you have serious
performance objectives, it’s always better to favor md5sum
.
sort
One huge disconnect between the real world and computer science
education seems to be the emphasis on sorting algorithms. You know who
writes production code containing sorting algorithms? Nobody. Because
it has all been exhaustively implemented for all practical purposes.
In the Unix world that done deal is the sensibly named sort
command.
Need something sorted? It’s an off-the-shelf solution that’s probably
better than what the typical CS education was going to provide for.
Don’t worry too much about how this works, but the following code can produce helpful examples.
for N in {1..5}; do echo $(($RANDOM%10));done
That basically outputs 5 random numbers between (and including) 0 and 9. If that output is sent (with a pipe) to the sort command you get something like this.
$ for N in {1..5}; do echo $(($RANDOM%10));done | sort
1
3
4
6
9
Those random numbers are now sorted. However for humans there lurks some unintuitive behavior. Let’s try it with numbers from 0 to 99.
for N in {1..5}; do echo $(($RANDOM%100));done | sort
14
17
5
55
62
That 5 does not look sorted. But in fact, from a text standpoint, it
is. If you want the sort to be done from a numerical perspective, add
the -n
flag.
$ for N in {1..5}; do echo $(($RANDOM%100));done | sort -n
3
13
27
45
80
Now the single digit number does come first even though "3" is bigger than the "1" of the "13" and the "2" of the "27".
Need to reverse the sort? You could pipe it to tac or simply
use the -r
option. This produces the alphabet in reverse.
for N in {a..z}; do echo $N;done | sort -r
Need to "sort" things into a random order? The -R
option can do that.
This produces the alphabet in a random order. Run it multiple times.
for N in {a..z}; do echo $N;done | sort -R
ℱancy sort
Although that last example works, if you want to sort things randomly,
you need to be careful. The sort
command actually randomizes sets of
duplicates. This means if there are two entries that are the same,
they will still be stuck together after -R
and that is not really
quite random. You can see this by counting unique values with
something like this.
$ for N in {1..1000}; do echo $(( $RANDOM%20 )) ; done | sort -R | uniq -c | wc -l
20
How could there only be 20 unique (see uniq) sets of numbers if they were randomly distributed? They are not. If you want truly random distribution, try the shuf command which does the right thing.
$ for N in {1..1000}; do echo $(( $RANDOM%20 )) ; done | shuf | uniq -c | wc -l
944
Sometimes I need to sort on two fields in a special way. Consider the following competition results data.
Alice Smith 11.02 F
Bob Tio 9.32 M
Charlie Angel 10.08 M
Eve Ng 9.96 F
Imagine that you would like this sorted by gender and then numerically
by score, highest first. You could use sort -k4 -k3,3nr
to produce this.
Alice Smith 11.02 F
Eve Ng 9.96 F
Charlie Angel 10.08 M
Bob Tio 9.32 M
The trickiest bit for me is the -k3,3
which defines the "key". The
comma notation says that the sorting goes from field 3 to (in this
case) field 3. But you could also sort on ranges of fields too. If the
comma and second specifier are left off, the key is assumed to run to the end
of the line.
Another tip about sort shows that it’s good to read your man pages. I
just learned that sort now has a mode that can sort by human readable
sizes (-h
). This will come in handy when sorting output of the
du command and many other applications involving file sizes.
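For example, something like this shows my five biggest home directory items, smallest to largest.
$ du -hs /home/xed/* | sort -h | tail -n5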
shuf
As mentioned in sort, the shuf
command
is a way to "shuffle" lines of input into randomly ordered lines of
output. This is great for the obvious uses like shuffling cards or
your music or slideshow playlists. It’s also good for getting a subset
of the items you need; for example I have used it for machine learning
training where I have video footage but need to extract some
representative stills.
Here’s a simple demo.
$ seq 10 | shuf | tr '\n' ' '
9 3 6 4 5 2 1 10 8 7
But shuf
is actually more clever than just scrambling the input.
Instead of providing input on stdin yourself, if you need integer
numbers (as shown above using seq
) you can use the -i
option and
get an integer range (you can also use the long option
--input-range=LO-HI
).
$ for _ in . . . ; do shuf -i 1-10 | tr '\n' ' '; printf '\n'; done
4 8 3 10 7 6 9 1 5 2
2 5 7 9 6 8 3 1 10 4
9 8 10 4 3 2 5 7 1 6
With this technique you can see that it could be useful to get random
integers by piping this to head -n1
. But shuf
has you covered with
a -n
option of its own. For example, the following command will
simulate a 6-sided dice roll.
shuf -i 1-6 -n1
uniq
The uniq
command seems very weird at first. Why would anyone need a
special command that eliminates duplicates? Especially since the
sort
command has a -u
option that does this (kind of).
The typical usage is to find out "how many different things are there"? For example, you may have a log file and wonder how many different addresses made a connection to your web server (i.e. ignoring the many hits where the same customers are merely busy interacting with it).
Here’s a typical example. Which processes are logging things in syslog (which is just some Linux log thing)?
$ sudo cut -d' ' -f5 /var/log/syslog | sed 's/\[.*$//' | sort | uniq -c
14 anacron
6 CRON
2 dbus-daemon
39 kernel:
1 liblogging-stdlog:
4 mtp-probe:
20 systemd
2 udisksd
Here I cut out the service name and clipped off the process number.
The real action starts at sort
. That sort organizes the naturally
occurring interleaved list. By then sending it to uniq
I eliminate
the duplicates. The -c
option causes uniq
to count up how many
duplicates there are.
Of course when you look at output like that you might think to sort again and this is very common! Here I’m finding only the top process that generates log messages.
$ sudo cut -d' ' -f5 /var/log/syslog | sed 's/\[.*$//' | sort | uniq -c | sort -n | tail -n1
39 kernel:
As your Unix proficiency increases, piping results to something like | sort |
uniq -c | sort -n | tail -n1
becomes quite ordinary.
Since pipes work by running processes in parallel, this kind of workflow is also extremely high performance as a bonus.
ℱancy uniq
Still not convinced that the sort
command is insufficient for making
things unique? The problem with the unique option of sort
is that it
must actually sort the data first. Sometimes this is not what you
want. Compare the following.
$ for N in {1..1000}; do echo $(( $RANDOM%20 )) ; done | shuf | uniq | wc -l
958
$ for N in {1..1000}; do echo $(( $RANDOM%20 )) ; done | shuf | sort -u | wc -l
20
You can now see that there naturally occurred 42 cases where there
were duplicates that uniq
had to get rid of while the sort -u
got
rid of 980 cases of duplicates giving the answer to quite a
different question.
The reason to break uniq
out into its own stand alone program is
that it becomes more modular. The uniq
command itself is pretty
powerful on its own too. It can even skip a number of leading
whitespace-separated fields when deciding whether lines are duplicates
(the -f option).
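Here is a tiny demonstration: the first field (a line number, say) differs, but uniq -f1 ignores it when comparing.
$ printf '1 apple\n2 apple\n3 banana\n' | uniq -f1
1 apple
3 banana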
wc
The wc
command stands for "word count" and is one of the most useful
command line Unix tools. It is especially useful as a building block
in a complex pipeline allowing you to distill a lot of data into
something you can deal with. For example instead of seeing pages of
files go zooming by, I can do this.
$ ls *txt | wc -l
152
And find out that I have 152 help files in my
directory. The -l
is to count, not words, but lines. Specifically
I’m counting the lines returned by the ls
program. That will list
all of the files I’m interested in.
To see how many words are in all of my help files I simply do this.
$ cat *txt | wc -w
316522
Or if you do it like this, you get a breakdown of every file’s word count and a total. (I’ll just limit it to ones starting with "o" to illustrate.)
$ wc -w o*txt
10002 opencv.txt
3846 opengl.txt
13848 total
Without the -w
option it shows you lines, words, and bytes.
$ wc o*txt
1671 10002 71753 opencv.txt
731 3846 28402 opengl.txt
2402 13848 100155 total
If you’re a professional writer commissioned for a certain number
of words the -w
will be very useful to you. However, for normal Unix
command line usefulness, wc -l
is extremely useful. For example, if
I wondered how many times I used the word Linux in my notes, the
answer is immediately available.
$ grep -i Linux *txt | wc -l
503
ℱancy wc
In that last example, if I had a line with "linux" in it twice, it would only be counted once. To really do a proper job, I would want to break up every word into its own line and then find the lines containing the search target and then count them. Here is how I can do that.
$ cat *txt | tr ' ' '\n' | grep -i Linux | wc -l
526
Apparently I have (at most) 23 lines with multiple "Linux" mentions.
But as you can see, no matter what you must do to get the data to be
correct, when it comes time to count it all up, wc
is your friend.
which / whereis / type / file / locate
Before you can use man
to read about a command, you need to know
if that command is even present on your system. There are many ways to
do this. I prefer the which
command for finding where executables
really live (if they’re even installed). Let’s say I wanted to find
out where the mysql
executable is on my system. Here’s a comparison
of techniques.
$ which mysql
/usr/local/bin/mysql
$ whereis mysql
mysql: /usr/local/bin/mysql /usr/local/man/man1/mysql.1
$ type mysql
mysql is /usr/local/bin/mysql
$ locate mysql | wc -l
12550
A lot of people go right for locate
, but as you can see, it produces
a list of 12550 lines of stuff I’m not interested in reading.
I also find whereis
to be too verbose unless you’re looking for man
page paths for some reason. I tend to go with which
when I want to
find a command.
The type
command is not a separate program but a shell builtin. Its great
feature is that it can recognize shell built-ins for what they are.
$ type type
type is a shell builtin
The file
command is good to figure out what exactly the executable
is once you’ve found its location. (A shell script? 32 bit or 64bit?
Designed for a bad OS?)
$ file /usr/local/bin/mysql
/usr/local/bin/mysql: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 11.2, FreeBSD-style, with debug_info, not stripped
In fact, checking the /sbin/init
program is a good way to figure out
if your underlying system is (for some god-awful reason) 32 bit.
$ file /sbin/init
/sbin/init: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD),
statically linked, for FreeBSD 11.2, FreeBSD-style, stripped
This one is thankfully not.
ℱancy file
If file seems great but you’d like to know absurd levels of detail
about an executable, start with ldd
. This shows all the shared
library dependencies and can be a life saver when trying to resolve
dependencies. One very common thing I do is ldd ickyexecutable | grep
Found
— this will ironically catch all the items on the list that
are "Not Found". (You can search for "Not Found" if you feel like
including the quoting.) With this list of missing libraries, you now
have a great place to start tracking down what you need to get the
thing to run.
If you’re hungry to learn yet more about executables, check out the
man page for the readelf
command. Also the nm
command specializes
in extracting an executable’s symbols but readelf
can too.
export / alias / set
These are shell commands but they can tell you a lot of useful things.
If you run it with no arguments, export
will give you a list of
exported variables in your environment. That is useful to review
sometimes. The alias
command by itself shows all the defined aliases.
And set
by itself shows everything the shell knows about that is,
well, set. This includes shell functions, aliases, variables, and maybe
more.
These are super useful commands to keep in mind when troubleshooting or perfecting your shell environment.
An example is if I’m interested in knowing how my shell history settings are configured. I could do this.
$ set | grep HIST
HISTCONTROL=ignoredups:ignorespace
HISTFILE=/home/xed/.bash_history
HISTFILESIZE=1000000
HISTSIZE=100000
Those are good settings, BTW.
unzip
Normal people are used to "compression" taking the form of .zip
files. Unix can deal with this. To see what monsters may be lurking in
a zip file check with something like this.
unzip -l stupidly_packed_thing.zip
unzip -l normal_java_stupidity.jar
(Yes, Java jar files are just zip files in reality.)
To actually do the extraction do the same thing but without the -l
option.
That all seems fine, but I actually do not like zip files. To find out why I think you should not ever create them see this post I wrote called, Why Your Zip Files Irk Me.
gzip
If you have a file called elephants.gz
— how can you read that? It
is compressed with the gzip program. You can use your choice of these
equivalent commands.
$ gunzip elephants.gz
$ gzip -d elephants.gz
And you will be left with a (bigger, natch) file called simply
elephants
. You can now make ordinary changes to that file.
Normal people prefer the gunzip
way but I like to stick to the
gzip
way since that’s really the program that is run by gunzip
anyway.
To compress it back down into its smaller form, simply use the following obvious usage.
$ gzip elephants
Which will leave you with an elephants.gz
file. Easy.
ℱancy gzip
Let’s say elephants
is really big and I don’t have the disk space
to actually unpack it. But I want to see if that data contains
something of interest. I can do something like this.
$ gzip -cd elephants.gz | grep Babar
This will begin unpacking the file into its readable form and send it
to standard output. This is fed into the grep program, which looks for
lines containing the word "Babar" and prints only those (if any).
The entire unpacked data set never is stored on your computer (grep
just discards what you weren’t interested in). This is actually the
simplest form of this trick. You can use streams of compressed data
like this to send big things (e.g. streaming audio) over network
connections and other such magic.
zcat
If you just want to look at a text file that has been compressed with
gzip, the zcat
command will do that. It basically decompresses it
and sends the original out its standard output.
bzip2
It’s just like gzip
. There’s even a comically named bunzip2
alternate equivalent to bzip2 -d
. Files compressed this way are
usually saved as file.bz2
.
Basically bzip2
is more hard core than gzip
. It will work harder
to compress your stuff more intensely saving you disk space. The catch
is that it will work harder meaning it will use up more CPU resources.
You decide what you’d like to economize.
tar
The tar
command stands for "tape archive" and that sounds boring and
irrelevant for most modern homo sapiens. However, back in the ancient
times the programmers who came up with sensible methods to put entire
filesystems on tapes did a pretty solid job of it. So much so that
it is still very useful and very much used today.
The most common interaction with tar
files is needing to deal with
something you download from an open source project like
somegreatsoftware.tgz
.
The gz
means it is compressed and step one is to decompress it. Using
gunzip
on a .tgz
file as described above will leave you with
somegreatsoftware.tar
. From here you can see the contents of the
archive with this.
tar -tvf somegreatsoftware.tar
The -t
is for "test" I believe. The -v
is for verbose, i.e. show
anything it can show. And the -f
is getting the command ready for
the file to look at.
To actually extract the files from the archive, just change the -t
to -x
for eXtract. I usually like to run the -t
version first to
see if the archive creator included a polite containing top directory
for the contents. Some (bad) archive creators just cause tar
extractions to dump hundreds of files into your current working
directory which can be quite tedious to clean up.
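By the way, modern tar can decompress gzip on the fly with the z option, so the separate gunzip step is usually optional.
tar -tzvf somegreatsoftware.tgz    # list first, then swap -t for -x
tar -xzvf somegreatsoftware.tgz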
cpio
Note that there is an ancient utility called cpio
which does mostly
the same thing. The main important difference seems to be that
cpio
treats directories as file entries in the archive, while tar
preserves the directory structure. When extracting files with cpio,
directories are recreated as regular files with the directory
contents. Probably not what is wanted.
head / tail
This pair of commands can be quite intuitive and also extremely useful
in unintuitive ways. The head
command just shows the beginning of
the file (or input), by default the first 10 lines. The tail
command
shows the last 10 lines. This is extremely useful for seeing what’s in
a file. The tail
command is especially valuable for seeing what the
latest entries in a log file are (you are less likely to care about
stuff at the beginning that may have happened months ago).
To see some amount other than 10 lines you can use an option like this.
$ history | tail -n3
ℱancy head / tail
In ancient days these commands could take an option styled like this.
$ head -2 /proc/meminfo
MemTotal: 16319844 kB
MemFree: 15155016 kB
This shows the first two lines of the /proc/meminfo
(memory
information pseudo-) file that Linux creates. But this style is not
recommended. It is best to use the full -n2
syntax.
There are (at least) two reasons for this. For one, it is explicit about exactly what units you want. For example, you could do this.
$ head -c3 /proc/meminfo
Mem
This stops after the first 3 characters of the file, a very useful feature.
The other reason is that head
and tail
have a clever mode where
you can explicitly use positive and negative numbers for different
effects.
$ ABC=$(echo {a..z}|tr -d ' ') # Don't worry about how I did this.
$ echo $ABC
abcdefghijklmnopqrstuvwxyz
$ echo $ABC | head -c5 # Normal mode - first 5 items
abcde
$ echo $ABC | head -c-5 # All except items _after_ position 5 from end *
abcdefghijklmnopqrstuv
$ echo $ABC | head -c+5 # Plus is like normal head mode
abcde
$ echo $ABC | tail -c5 # Normal mode - 4 items _after_ position 5 from end *
wxyz
$ echo $ABC | tail -c+5 # All _except_ 4 items _before_ number's position *
efghijklmnopqrstuvwxyz
$ echo $ABC | tail -c-5 # Minus is like normal tail mode *
wxyz
I used the character mode of head
to make these examples compact;
this mostly works the same for lines too with -n
but it’s good to
check. Note the ones with asterisks are not producing or eliminating
the number of items listed, but seem to have an off-by-one issue.
I’m pretty sure this is caused by new line characters getting counted.
$ seq 10 | tail -n3
8
9
10
$ echo "123456789" | tail -c3
89
So pay close attention to that.
Yes, the sense of what plus and minus do can be confusing, but it’s
enough to know these tricks are possible. Simply look up the detail
with man
when you have a need and/or perform a quick little
experiment to verify it works how you want. (Or reference this very
explanation which is what I may start doing!)
Here is what the man page says about this important parameter for
head
:
print the first NUM bytes ...; with ... '-', print all but the last NUM bytes...
print the first NUM lines ...; with ... '-', print all but the last NUM lines...
And for tail
:
output the last NUM bytes; or use -c +NUM to output starting with byte NUM of each file
output the last NUM lines, instead of the last 10; or use -n +NUM to output starting with line NUM
One more extremely powerful feature of tail
specifically is that it
can actively watch a file and show you new additions as they arrive
and are appended. This uses the -f
option (for "follow"). This is
extremely useful for watching log files collect messages in real time.
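For example (the log path may differ on your system):
tail -f /var/log/syslog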
xxd / od
I love xxd
! This tool is not just really good at what it does and
useful for serious computer nerds, but it has a kind of refreshing
purity to me. Basically its function is to give a "hex dump" of the
input. Beginners need not worry about that. In fact beginners do not
need to use this command ever.
But what I love most about this command might appeal to beginners and
experts alike. I love how this command allows you to see the actual
ones and zeros that make up your data. Everyone has heard that
computers work with ones and zeros, but who has really seen direct
evidence of this? If you have a Unix system and xxd
you can! Check
it out.
$ echo -n "xed.ch" | xxd -b
00000000: 01111000 01100101 01100100 00101110 01100011 01101000 xed.ch
This shows the binary string for my website’s domain. These are the exact ones and zeros that encode "xed.ch". That’s pretty cool! Here’s more such low level obsessing.
ℱancy xxd
Beyond that brush with obsessive detail, xxd
is tremendously useful
for picking apart files that are not in ASCII text. Its usual output
is byte (not bit as above) oriented and it tends to be in hex. It can
however do a lot of very clever things. Here I’m pulling data right
off the first 512 positions of the disk and checking if there’s a GRUB
installation there. (BTW - be inspired by, but don’t try this example
unless you know what you’re doing.)
dd if=/dev/sda count=1 bs=512 | xxd | grep -B1 RUB
I could have grepped the weird hex values that make GRUB but this was
simply easier to let xxd
do that math.
The od
command is like xxd
but does an octal dump.
dd
If you’re a beginner, you probably should just know that dd
is damn
dangerous. Some people call it "disk destroyer".
ℱancy dd
As with using xxd
, there may come a time when you want to cut
through all nonsense and handle each one and each zero yourself
explicitly. No messing around! That is what dd
can do. The man page
doesn’t really say what "dd" stands for but I think of it as "direct
data" (transfer).
The way it works is you give it a source and a target and it takes the ones and zeros from the source and puts them on the target. Simple, right?
Here’s an example you should not try.
dd if=/usr/lib/extlinux/mbr.bin of=/dev/sdd
This will take the master boot record bits located in the binary file
specified by if=
(input file) and copy them to the of=
(output
file) specified as the disk device "/dev/sdd". The reason not to do
stuff like this willy-nilly is that this will hose some possibly
important stuff on /dev/sdd
. So make sure you’re very sure what
you’re doing with this.
Some common options can look like this.
dd if=wholedisk.img of=justtheboot.img bs=512 skip=2048 count=1024000
This pulls just a portion of the original "wholedisk.img" image when creating the new file, "justtheboot.img".
The bs
specifier is for block size. Think of this like a bucket that
the bits are transferred in. Besides the minimum raw transfer time it
takes a little while to load each bucket. If you specify a small
bucket, there will be many fillings of it. However, if you specify a
huge bucket and you change your mind while the command is running, you
may have to wait for the current huge bucket to finish processing
before the program will check in and see if you want to interrupt and
exit politely. The default is 512 (bytes) which is probably a bit
small for modern use. I usually find that 4M
is a good value for
most things and very little benefit comes from using a different value.
My favorite option is status=progress
which will give you a running
assessment of how much has been transferred so far.
sudo dd if=ubuntu-16.04.3.iso of=/dev/sdg bs=4M status=progress oflag=sync
Here I’m burning the ones and zeros of an OS iso image onto a flash
drive and the status allows me to keep an eye on how it’s doing. The
oflag=sync flag (on output only) just ensures that a physical sync is
performed to make sure that the device is truly updated and not
just presenting illusions from cached memory still queued for the
device. Don’t forget the block size bs=4M
(or some largish number)
so that the transfer doesn’t take 20 times longer while it processes
(default is 512B so 4 orders of magnitude!) more blocks.
Also if you’re trying to save data on a disk that’s going bad and it’s
giving you IO errors with dd
, you can investigate
ddrescue which can valiantly
overlook IO errors and press on with the job.
ssh
ping
Troubleshooting a network? You could do worse than starting with
ping
. I like to send 3 (ECHO_REQUEST) packets in case the first one
dies in a fluke traffic accident (but the connection is really up).
Any more than 3 and it just takes too long.
ping -c3 xed.ch
One thing to check right away with network troubleshooting is if it’s really the network connection to your target or simply the name lookup.
ping -c3 1.1.1.1
This pings Cloudflare’s public resolver directly by IP number (its domain is one.one.one.one — yes, really), so no name lookup is needed at all.
Or try Google’s name server which is almost as easy to remember.
ping -c3 8.8.8.8
If those don’t come back successfully, you’ve got a real network problem!
ℱancy ping
Also in the ping family is traceroute
and that classic program’s
modern fancy version mtr
. I like those tools, however, I’m finding
those to be broken now that I have a blazing fast fiber optic
connection. It’s like the "Time To Live" setting can’t get low enough
to find out anything meaningful.
host / nslookup
Often you can reach the internet but you can’t use names, only IP
numbers. If you need to troubleshoot that process or just find out
about how names and IP numbers relate, use host
and nslookup
.
$ host -t A xed.ch
xed.ch has address 66.39.97.213
$ host images.google.com
images.google.com is an alias for images.l.google.com.
images.l.google.com has address 172.217.7.14
images.l.google.com has IPv6 address 2607:f8b0:4006:819::200e
Note how you can find the IP address of a name and other useful information (like alias targets and IPv6 numbers).
The nslookup
command (normally) goes the other way, finding names
from numbers.
$ nslookup 66.39.97.213
Server: 192.168.1.1
Address: 192.168.1.1#53
Non-authoritative answer:
213.97.39.66.in-addr.arpa name = xed.ch.
Authoritative answers can be found from:
Who just tried to log into your SSH server 800,000 times? Find out by
looking up their connection IP address with nslookup
.
You can also use web based services like
iplocation.net for this and find out a
rough guess where that host is physically located in the world.
iptraf / iftop
Do you have some kind of server which provides data to many clients at
once and one of them is doing something uncool? The iptraf
program
can let you see the flows to each connected host.
Another possible tool that might serve this purpose is iftop
.
Or nethogs.
ℱancy iptraf
If you’re logged into your server with SSH, set the update interval to
1 sec so you do not overwhelm the network traffic with your own
iptraf
feedback loop.
It seems iptraf
is a Linux only program. The
FreeBSD community describes it
as having "Too many linuxisms" to port. How about nethogs
? That’s a
nice program too. Also iftop
. Maybe dstat
or slurm
.
ip addr / ifconfig
What IP number (or numbers!) are you using right now? Find out with
ip addr
(that’s the ip
command with the addr
option) or the
classic ifconfig
. The latter provides pretty much the same
information but is regarded as the old way to control (configure)
network devices (interfaces).
ip
is not on FreeBSD, but ifconfig
is.
Also on modern Linux distributions using Network Manager you can get
some excellent information with nmcli dev show
.
I also used to think that nmtui
(NetworkManager Text User Interface)
was limited to the Red Hat ecosystem, but I recently found it on
Ubuntu on an Arduino so that can definitely help tame the mess that
NetworkManager can make of connections, especially wifi.
ss / netstat
What network connections are currently active on your computer? Find
out with ss
. Its older brother, netstat
, is not used as much
these days, but it is a well-known classic that does the same thing.
ss
is probably Linux only while netstat
is also on FreeBSD and OSX.
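A typical invocation, if you want to see which TCP ports are listening and which process owns them (sudo helps it name processes that aren’t yours):
$ sudo ss -tlnp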
feh
Skilled command line practitioners do not avoid computer graphics per se — they avoid superfluous, wasteful, and misleading graphical interfaces. Sometimes, however, the job at hand is all about graphics. If you have a file that encodes an image, you may reasonably want to see what that image looks like. To illustrate the rule by its exception, the only time I feel like the normal GUI desktop claptrap is even slightly valuable is when I need to organize very disorganized photos. But if you’ve been organized and you know that things are where they should be, the command line approach performs efficiently again.
To see an image file as a human viewable image, I like the viewer program feh. Obviously you need some support for graphics (so no SSH terminals without X tunnelling or bare consoles) but most people have that these days.
The advantage of feh
is that it is lightning fast and skips a lot of
superfluous nonsense. If you just need to see an image (including
maybe scaling it to fit your screen) feh
will not be beat.
Just for completeness I’ll mention
display which is a command
line tool that comes with installing the
ImageMagick command line graphics tools. A
lot of serious Unix people use display
but I find it much slower
than feh
. However, it is usually present on systems where feh
might not be installed and is a good backup to know about. The
ImageMagick tools in general are extremely powerful and any
intelligent person who does anything with images whatsoever should
know about them.
xpdf
I use xpdf
so often that I have it aliased as simply o
. It opens
PDFs and is very efficient and fast doing so. People used to other PDF
readers may carp about functionality deficiencies or even the old
school Xlib interface. But xpdf
is your true friend.
ℱancy xpdf
If you’ve been reading this whole thing here’s a nice example of how this Unix stuff is commonly used.
$ URL=https://helpx.adobe.com/security/products/acrobat/apsb18-21.html
$ wget -qO- $URL \
| tr ' ,' '\n\n' | sed -e 's/CVE-/\nCVE-/g' | sed -e 's/[^0-9]*$//g' \
| sort | uniq | grep CVE | wc -l
104
I’ve broken this single command line into three physical lines (with
backslash). The first uses wget to get the URL
variable. I
set that variable to be Adobe’s security page for Acrobat. The next
line converts that HTML mess into a long list of words that must end
in numbers. The third line sorts all of these,
eliminates duplicates, throws away everything but the key
phrase I’m interested in — "CVE", and then counts the results.
Reviewing it now, I can spot some places I could optimize this
process, but I’ll leave it alone as an illustration of the kinds of
rough jobs that can be done with Unix extemporaneously, on-the-fly. I
quickly built that command line up piece by piece until running it
gave me the final answer I was looking for. Very powerful.
So what is it telling us? It is telling us that on Adobe’s own Acrobat
security page, they are mentioning no less than 104 unique registered
CVE
(Common
Vulnerabilities and Exposures) problems related to Adobe Acrobat. Not
impressed? If you take off the wc -l
and look at these, you’ll see
they all start with "2018". I’m no longer a professional security
researcher, but that makes me think that we’re talking about 104
registered vulnerabilities in 2018 alone. I have seen
Defcon
presentations talking about what a delicious attack surface
Acrobat/Reader is. It contains an absurd level of functionality — Wikipedia says "PDF files may
contain a variety of content besides flat text and graphics including
logical structuring elements, interactive elements such as annotations
and form-fields, layers, rich media (including video content) and
three dimensional objects using U3D or PRC, and various other data
formats." PDFs can contain not just one programming language, the
obvious PostScript, but also JavaScript code! And who knows what else!
What could possibly go wrong?
Let all that sink in and then ask yourself if Acrobat is part of a
secure computing environment. My answer is "no". xpdf
, with its
"limited" functionality, saves the day!
vim
crontab
at
A lot of people know about Cron but not as many know about at
.
Cron is for recurring jobs. While at
is for jobs that you want to
run once at some time in the future. A lot of times at isn’t
installed by default which is a shame but it’s always easy to get.
If you’re constantly setting alarms or timers, at
is perhaps a
better choice.
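For example, something like this should queue a one-off job for a week from now (assuming at is installed and you actually have a loudnoise.wav to play).
$ echo "aplay loudnoise.wav" | at now + 7 days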
You can simulate at
easily enough with something like this.
sleep $((7 * 24 * 60 * 60)) && aplay loudnoise.wav
That will play a loud noise in one week. Still, if you do a lot of
this, consider at
.
sleep
The sleep
command is a lot more useful than it seems like a command
that does nothing would be. For example, if you want to log something
every minute (not 800 times per second) simply add a sleep 60
to the
loop that does the logging. Easy.
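For example, here is a rough sketch of that kind of once-a-minute logger (recording the load average, purely as an illustration).
$ while true; do uptime >> load.log; sleep 60; done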
bash
ar
The ar
command manages files that are (or need to be) in an
archive file. It is an alternative to something like
tar (or zip
— without compression
clumsily added whether you want it or not). These are used for static
libraries (.a files) that get linked in by the ld
linker. Apparently Debian’s
.deb
files are ar
archives. Also the initramfs
file system used
to boot your real one in a Linux system is an ar
archive. Firmware
blobs can sometimes be in this format. Sometimes compilers put
debugging symbols into this format too. So definitely not dead!
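To poke at one of these archives, t lists the contents and x extracts them; for example (libfoo.a is a stand-in name, and the .deb is any Debian package you happen to have lying around).
$ ar t libfoo.a
$ ar x some_package.deb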
as
The unix assembler. Converts assembly language instructions to
executable machine object code. Note that there is a utility called
dis
(that can be installed) which disassembles object code.
basename
Strips off the path and, optionally, a suffix (with the -s option, e.g. -s .jpg)
from file names. Useful in scripts.
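For example (the path here is made up).
$ basename -s .jpg /data/photos/IMG001.jpg
IMG001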
dirname
Like basename but retains only the directory path part of the input.
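Same made-up path, other half.
$ dirname /data/photos/IMG001.jpg
/data/photos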
col
This command is of dubious value today now that printers are no longer
driven by control characters. But it still may have a use in
stripping out non-printable "characters" from text streams. By piping
to (and from) | col -b |
you can ensure that non-printing characters
(the -b
stands for "backspace") are stripped. If you really want
columns, check out the column command.
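As for col itself, the classic use is cleaning up man page output, which (on older systems at least) uses backspace overstriking for bold and underline; modern man often strips this itself when piped, so this may be a no-op for you.
$ man col | col -b > col_manual.txt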
column
This formats input into columns. For example, if I need to look over all of the four letter words that begin with "f" I could
grep '^f...$' /usr/share/dict/words | column -c 100
Without the column command, they all go flying by. Normally you’d page them with a pager like less, but when it’s handy to see everything on one screen, this does the trick.
ℱancy column
Another use for the column
command is to make tree diagrams. This
isn’t fun or easy but it can be reasonable if trying to come up with
an alias or function that solves a particular problem. For example,
the process list command ps produces really
unintelligible output without some serious care. Here is a way to have
your processes shown with the parent/child relationships made a bit
more obvious.
ps -u $USER -o pid,ppid,command| \
sed 's/^ *\([^ ][^ ]*\) *\([^ ][^ ]*\) *\(.*\)$/\1|\2|\3/'| \
column --tree-id 1 --tree-parent 2 --tree 3 -s'|' -W 3
Replace the -u $USER
with -e
to get "every" process. Note this is
like the pstree command but shows the PIDs a bit better.
Column can also convert some normal unix stdout into everybody’s favorite overly verbose format, JSON. In theory.
colrm
Need to remove columns from an output table? This command kind of does that. Note that it operates on character columns, not delimited columns (use cut for that).
$ yes "123456789"| head -n3 | colrm 3 6
12789
12789
12789
csplit
Though obscure, this interesting command can be quite useful. If you have a big file that you want to break up into smaller files, this can do it, breaking things down by some regular expression pattern. For example, if you have a log file that you want broken into multiple files by day, this command can do exactly that.
csplit server.log '/^20[0-9][0-9]-[01][0-9]-[0-3][0-9]/' '{*}'
Or, perhaps you have an SDF (structure data file) containing molecule
definitions and you want each one in their own file. These can be
glommed into one giant file with the weird $$$$ section delimiter
but this command can separate them into their own files.
csplit input.sdf '/^\$\$\$\$/' '{*}'
ed / ex
Note that the ex
editor is probably the one you want if you’re
thinking about ed
; it’s just the more modern version and should be
included.
This is an ancient editor that is the foundation of vi
(the Visual
Interface to ed
). It can still have some application as a command
line tool. (You can obtain it with apt install ed
.) It can be useful
for more complex scripting of large batch file edits. Of course vim
can also handily do such jobs. If you really have a big agenda though,
ex
can outperform vim
. Also consider ex
if you have an
absolutely gigantic file that you must reach into with precision and
make some changes; it will handle memory issues efficiently.
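For example, here is a sketch of a scripted batch edit (the -s keeps ex quiet, the e flag on the substitute tolerates lines with no match, x writes and quits, and bigfile.txt is just a stand-in name).
$ ex -s -c '%s/teh/the/ge' -c 'x' bigfile.txt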
expr
A lot of this command’s functionality is now built into modern Bash. But in the past the shell stayed quick and efficient by leaving this kind of work to other processes, and expr is the process that specifically calculates stuff. Here are some examples.
$ expr 10 + 5 # 15
$ expr 10 - 5 # 5
$ expr 10 \* 5 # 50
$ expr 10 / 5 # 2
$ expr 10 % 3 # 1 (remainder)
$ expr length "xed" # 3
$ expr length "$USER" # Length of the current user's username
$ expr substr "xed.ch" 1 3 # "xed" (from pos 1 with len 3)
$ expr index "xed.ch" "." # 4 (counts from 1 not 0)
$ expr match "xed.ch" "xed" # 3 (characters matched from start)
$ expr 10 = 10 # 1 (true)
$ expr 10 != 5 # 1 (true)
$ expr 10 \< 5 # 0 (false)
$ expr 10 \> 5 # 1 (true)
$ expr 0 \&\& 1 # 0 (logical AND)
$ expr 0 \|\| 1 # 1 (logical OR)
$ expr \( $counter = 5 \) : 1 # Conditional output
Very useful in scripting.
stat
Display "status" details about a file. This can often be more accurate and direct than pulling file metadata off of directory listings.
Here’s an example.
$ echo "ok" > ok && stat ok
This will produce output that looks like this.
File: ok
Size: 3 Blocks: 8 IO Block: 4096 regular file
Device: 802h/2050d Inode: 30543966 Links: 1
Access: (0640/-rw-r-----) Uid: (11111/ xed) Gid: (11111/ xed)
Access: 2023-06-28 11:23:32.668705500 -0400
Modify: 2023-06-28 11:23:50.280678400 -0400
Change: 2023-06-28 11:23:50.280678400 -0400
Birth: 2023-06-28 11:23:32.668705500 -0400
id / whoami / logname / who
The whoami
command is basically a synonym for echo $USER
showing
you whom you are logged in as. The id
(note, easier to type)
command also supplies you with numeric values for the user and also
detailed group information. The logname
command is exactly like
whoami
but longer to type and slightly more cryptic in my opinion.
who
shows all the users logged into the system.
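For example (the numbers and groups here are made up).
$ id
uid=11111(xed) gid=11111(xed) groups=11111(xed),27(sudo)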
seq / nl
The seq
command produces a sequence of numbers.
$ seq 4
1
2
3
4
This can be very helpful for many purposes. Maybe you have a lot of images and you want to thin them out by deleting every third one or something.
for N in $(seq 1 3 25); do rm -v IMG${N}.jpg ; done
This will get rid of IMG1.jpg
, IMG4.jpg
, IMG7.jpg
, and so on (seq 1 3 25 counts from 1 to 25 in steps of 3).
The nl
command is similar but it numbers lines that are supplied
as input.
$ nl /proc/meminfo | head -n3
1 MemTotal: 65652732 kB
2 MemFree: 59790048 kB
3 MemAvailable: 60952312 kB
join
There are many ways to maintain normalized databases but one of the
most spartan and efficient is using the unix join
command and saving
yourself the need to have a particular SQL engine installed. The join
command wants the inputs to be sorted, but it can then match up fields
from one file to another and output what is effectively an ordinary
SQL-like "join". See my SQL notes for an
example.
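Here’s a tiny sketch with two made-up files that are keyed (and already sorted) on the first column.
$ cat id_name
1 alice
2 bob
$ cat id_score
1 93
2 87
$ join id_name id_score
1 alice 93
2 bob 87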
paste
Paste is similar to join except that it just takes two files and merges a line from the first file with the same line (position) from the second. Here’s a quick example of what paste looks like.
$ paste <(seq 5) <(seq 5 -1 1)
1 5
2 4
3 3
4 2
5 1
pr
Not to be outdone by paste
the pr
command can also do what paste
does, more or less. The pr
command is for automatically formatting a
big block of text or lists of data, ostensibly, for printing. Here it
is doing a similar thing to the paste
example.
$ pr -tm <(seq 5) <(seq 5 -1 1)
1 5
2 4
3 3
4 2
5 1
There are tons of options and obviously this can be used for more than
printouts. It can be used to structure data in more readable sensible
ways. Note that it can break up long lists into columns. For example,
see what seq 200 | pr -3
does.
lp / lpr / lpq / lprm / lpstat
The lp
command submits jobs to the "line printer" — in other words
what normal modern people would think of as simply "print" to an
actual printer. The lpr
is a simplified version of lp
for normal
cases; lp
can take a zillion options that lpr
simplifies for you.
The lpq
command shows you the printer queue and is helpful for
figuring out why your print job disappeared into the void. And the
lprm
command will help you get rid of those zombie jobs in the print
queue. And lpstat
checks the status of the printer.
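For what it’s worth, a typical round trip looks something like this (the printer name is whatever lpstat -p reports on your system, and 123 is a hypothetical job number taken from lpq).
$ lp -d office_laser report.pdf
$ lpq
$ lprm 123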
I hardly ever use printers now, so I don’t really know how the details work these days. The whole printing functionality in Unix is provided by something called CUPS — the Common Unix Printing System. And despite this being developed by Apple, it is properly open source and publicly licensed and it works pretty well and is universal on all normal Linux systems that can print.
mail
This obviously does something with mail. It turns out to be a
fantastically useful command line scripting trick to send emails and
that’s what the mail
command does. A great example of when this is
useful is in a cronjob. The following entry in a crontab will send an
email containing "The message!" to myself@xed.ch
with the subject
"Daily Reminder" every day at midnight — 0 minute and 0 hour of every
(* * *) day.
0 0 * * * echo "The message!" | mail -a "From: REMINDER <x@xed.ch>" -s "Daily Reminder" myself@xed.ch
Or if you want to just send some output to someone or yourself, you can do something like this.
cal 2023 | mail -s "2023" myfriend@geemail.co
Note that you need to have some kind of mail handler configured. I
have been using apt install nullmailer
on Debian, but you can set up
much more elaborate mail handling.
mesg / write / talk
These commands relate to an old party trick of writing messages to the
terminal of some other user. This sounds kind of crazy and now that
Unix is actually important and security is a thing, it is kind of
crazy. The mesg
command configures whether other users are permitted to write to your terminal at all.
It’s helpful to know of the existence of the write command since it seems like it would be an important concept and one might naturally wonder what that word is up to. But since people don’t write on each others’ terminals much these days, that original command is somewhat vestigial.
In the early 1990s I used talk
as a kind of primitive Discord or
Slack. However, that tool is largely obsolete for me because I have a
technique for implementing a complete
chat utility using Bash’s named pipes. (Discord works mostly fine on
Linux today too)
printf
The printf
command is borrowed from C and (on my system) is both a
standalone executable (see /usr/bin/printf --version
) and
simultaneously a Bash shell builtin (see help printf
). It is used to
create formatted output templates.
It works by taking two things: the template, which has placeholders, and the values to drop into them. The values don’t have to come from variables but then there’s not much point. Here’s a simple example.
$ printf "Hello %s\n" $USER
Hello xed
Numbers follow the conventions for C printf templates.
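For numbers, the usual C suspects like %d and %f work, with optional width, zero padding, and precision.
$ printf "%03d items at %.2f each\n" 7 1.5
007 items at 1.50 each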
reset
Initializes the terminal. I find this very helpful after I make a
mistake and send a bunch of non-text data to the terminal. For
example, let’s say you accidentally tab completed badly and did this:
cat myproject.jpg
instead of .txt
. Well, that JPEG is filled with
binary data that is not well suited to being displayed on a text
terminal. Many of the 1s and 0s of the image will be interpreted by
the terminal as some kind of exotic control codes. Basically if you
send hundreds of thousands of random bytes to your terminal, things
will get very messy very fast. You will invariably end up modifying
terminal settings that cause much mischief. The reset
command fixes
all this and gets you back to a normal terminal again. I often type
this command blind because I’ve just accidentally turned off user
input echoing!
stty
So you know about reset and how to repair
your terminal settings if you mess them up. Well, stty
is one
terrific way to mess them up. It is the command you use to set
terminal settings. Note that "tty" shows up a lot in Unix; it is
short for "teletypewriter", and the devices are usually just called "TTYs". They were
electromechanical devices that combined a typewriter-like keyboard
with a printer. These were used before actual interactive displays
(that did not leave a permanent hard copy) were common.
This command has a bazillion options and it’s all quite confusing. So if you find it confusing, you’re not alone. Mostly this is used when following some instructions that eliminate some particular unusual problem for your unusual software or setup. As things have standardized over the decades, the need to do this has decreased a lot. Thankfully!
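If you do want to poke at it, the two safest things to know are -a, which dumps every current setting, and sane, which puts things back to reasonable defaults (similar in spirit to reset).
$ stty -a
$ stty sane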
tabs
If you want a more obvious way to make mischief than stty
then the
tabs
command is for you! It can mess with the default tab stop
value. Or something like that. I don’t know really because tabs are
evil and should always be avoided.
script
This is an interesting command that can create a transcript of an
entire terminal session. Let’s say you’re trying to write
documentation of how to set up a system or run a complex job or
something involved like that. You would like to show other people the
command line steps you took. Normally I would just cut and paste the
bits that I care about. But when the transcript is going to be quite
long, it can be very tedious to do this. The script
command will
create a subshell where every character used in the terminal is also
recorded in a specified file. You can also use the -c COMMAND
option
to have the subshell run some program of interest instead of a general
shell. For example, if you want to show a sequence of clever things in
Gnuplot, you could do script -c gnuplot clever_plots.log.
size
I’m sure this seemed very obvious and sensible back in the first days
of Unix when understanding the size of object files was perhaps more
important. You can do something like size
/usr/lib/xorg/modules/libwfb.so
to see obscure size information about
certain types of binary files. What you do not want to do is use
this keyword for your own purposes. Or if you do, just understand that
there may be a Unix command called that already present.
strings
Sometimes you have a binary file and you’re not quite sure what it is
or what should open it. The strings
command looks through the 1s and
0s of the binary file and tries to find plausible ASCII characters
that make some sense. It then prints these as a list.
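For example, something like this on a mystery blob (mystery.bin is just a stand-in name) will often reveal format markers, embedded paths, or version strings.
$ strings mystery.bin | less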
strip
Often object files are compiled containing debug symbols and other
helpful human oriented stuff that may not be strictly necessary to the
computer. The strip
command can get rid of these. The benefit of
course is you’ll have smaller files that will, in theory, load faster.
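For example, working on a copy so nothing important gets mangled (myprogram is a stand-in name).
$ cp myprogram myprogram.stripped
$ strip myprogram.stripped
$ ls -l myprogram myprogram.stripped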
tee
Duplicate the standard input sending one copy to standard output (like
normal) and one copy to a file. This command is used like a special
kind of pipe (e.g. | tee myfile |
taking the place of a simple pipe) and
is a way to intercept a pipeline at some point and leave a record of
what was going through it. This is an extremely useful trick. I often
use it where I both want to see what is happening and keep a record
for further analysis or for doing something else with later.
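For example, to watch a long build scroll by while also keeping a record to grep through later (the build script name here is just a stand-in).
$ ./long_build.sh 2>&1 | tee build.log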
test
This command is another one like printf
which has both a compiled
executable program and which is also a Bash builtin function. The
reason for this, by the way, is that in the old days before Bash,
there was just sh
and it didn’t have such fancy features built in.
The test
command is basically a way to check many things about files
and strings. For example, if I do test -r /home/xed
the exit code
will come back true (0, which in exit codes is the successful state — think of it like "was there an error?"). If I do test -r
/var/log/apt/term.log
it will be false (1 in this case). This means
that my own home directory is readable (the -r
) by me but not that
system log. There are a lot of things test can do. See help test
for
a list of them.
Also the test
word can be replaced by double brackets. This does the
same thing as the previous examples.
[[ -r /home/xed ]]
[[ -r /var/log/apt/term.log ]]
This construction is frequently used in Bash if
statements and other
conditionals. It’s good to keep in mind that test
is mostly going to
be a taken word in a Unix system and you’d best not name your own
things that.
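Here is the kind of Bash conditional where this construction usually shows up (using ~/.ssh/config purely as an example file).
if [[ -r "$HOME/.ssh/config" ]]; then
    echo "ssh config is readable"
else
    echo "no readable ssh config here"
fi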
tsort
This does a topological sort. This basically has the ability to unravel a directed acyclic graph. Check out the following example.
$ cat deps
taskB taskA
taskC taskB
taskD taskC
taskE taskA
$ tsort deps
taskD
taskE
taskC
taskB
taskA
Here the syntax of the dependencies file is taskB
depends on taskA
(as written in the first line) and so on. Note that with the pairs written this way, tsort lists the most dependent tasks first and the foundational taskA last; swap the columns if you want dependencies to come out first.
umask
This command displays the user file creation mask. What does that
mean? This has to do with the default permissions files have when you
create them. This is a shell built-in and this allows the shell to
know how to protect files you create with something like myprogram >
myoutput.txt
. The file creation mask can be set in a Bash
configuration file so it’s always active or you can change it. Note
that this is a mask, not the permission itself; the bits set in the mask
are the ones removed from the default. Here are some common reasonable settings.
- umask 002 - file permissions of 664 (rw-rw-r--) and directory permissions of 775 (rwxrwxr-x). Group members can work on group files.
- umask 022 - file permissions of 644 (rw-r--r--) and directory permissions of 755 (rwxr-xr-x). Good for sharing your documents with others but keeping them from modifying them.
- umask 027 - file permissions of 640 (rw-r-----) and directory permissions of 750 (rwxr-x---). More privacy; restricts group members from reading the owner’s files, but allows them to access directories.
- umask 077 - file permissions of 600 (rw-------) and directory permissions of 700 (rwx------). The most restrictive sensible setting. Used when you want to keep files completely private.
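You can watch the mask do its thing with a throwaway file (the owner, group, and date shown will obviously be your own).
$ umask 077
$ touch private.txt
$ ls -l private.txt
-rw------- 1 xed xed 0 Jul 21 11:35 private.txt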
uname
Prints the current unix OS, often just "Linux". I like to include the
-a
option for "all" information.
$ uname -a
Linux ra22 5.10.0-14-amd64 #1 SMP Debian 5.10.113-1 (2022-04-29) x86_64 GNU/Linux
If you’re interested in this, you might also wonder what distribution you’re using.
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
[... etc]
uptime / w
Did you have a power outage? How can you tell for sure? The uptime
command will tell you how long your computer has been running.
$ uptime
11:25:30 up 190 days, 22:39, 18 users, load average: 0.02, 0.02, 0.00
I have had over 1000 days, but note that you should be doing OS updates (with a reboot which resets this) way more often than that!
The single letter w
command is like a combination of uptime
and
who
.
wall
This will write a message on all of the terminals currently being used.
You can just type wall
and enter. Then type some stuff and end with [ctl+d].
It’s actually a bit confusing whether you can supply a message string directly or only a path
to a file containing the message. I did the former and got this put on
all my terminals, including all those in tmux
sessions.
Broadcast message from xed@ra22 (pts/17) (Fri Jul 21 11:35:33 2023):
Just a test of the wall command.
This is used by the shutdown
command.
yes
At first this command seems baffling and perhaps a bit daft. If you
just run it with no arguments it will print the word "yes" over and
over and over forever like some kind of nerd practical joke. What is
the point of that? To understand the motivation imagine some software
experience in the old days where a program is run in a text console
and it asks you a lot of pesky questions. Modern computer users are
certainly familiar with the experience of clicking "I agree" umpteen
times when trying to install something. The yes
command was
conceived to help text console users do similar things. Although it’s
not actually an optimal solution, just for example purposes you can
consider the rm
command asking too many pesky questions and using
yes
to move things along. Here is a classic usage setup for this
weird command.
yes | rm --interactive=always oops-*
But yes
has more tricks up its sleeve! If you don’t like "yes" being
repeatedly generated forever, you can supply an argument and that
will be the word that is spewed out.
yes no
Will produce "no" over and over forever. Sometimes you have a bunch of
repetition in your output that yes
can fill in for you.
$ yes place award | nl | head -n3
1 place award
2 place award
3 place award
ℱancy yes
A subtle point about a command as simplistic as this is that it really
is a surprisingly good way to fully load your CPU. It basically prints
its output and if there is no impediment, it immediately prints more.
It’s kind of the difference between lifting weights and waving your
empty hands. You can get out of breath lifting weights but you’ll get
out of breath faster waving your hands as fast as you can. How could
this property ever be useful? I like to use yes
when I’m testing
CPU loading and/or cooling capabilities. Here is a way to define a
function that can replace sleep
but instead of doing nothing and
leaving your CPU alone, this one manically does everything it can
allowing you to test loading multiple cores or prolonged CPU stress.
function hotsleep { timeout ${1}s yes > /dev/null ;}
hotsleep 5 # Full CPU usage for 5 seconds.
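Note that a single yes only saturates one core; to load several cores at once, launch one hotsleep per core, something like this.
for i in 1 2 3 4; do hotsleep 5 & done ; wait # Four cores at full tilt for 5 seconds.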