Unix Commands

Most of my notes are for my reference, but this one is for you! Several people have asked me how to take the first (or next) steps on the path to Unix command line mastery. Here is a collection of commands that I feel are essential for a proper computer professional to know about. If you’re a regular person who would like to leverage the built-in power of your Unix operating system, it may be good to have a look at this list and see if there is anything that you might find useful.

I tried to check every listed command on FreeBSD and OSX. The entire list works on those platforms unless otherwise noted. Be advised, however, that outside of Linux, the exact syntax of the commands and their options may differ slightly (you know what to do!).

Beginners may safely skip the subsections labeled "ℱancy" which cover (even) more esoteric aspects of the topic.

Also you may find my old Unix Beginner notes helpful. That covers more of the major concepts while this list looks specifically at the collection of commands that I personally use and feel that others will generally find useful.

All of my help notes live at http://xed.ch/help or use xed.ch/h which is the short version.

Table of Contents

alias at awk bash bzip2 cal cat cd chgrp chmod chown clear convert cp crontab curl cut date dd df diff dmesg dmidecode du echo exit export false feh file find fmt free grep gzip head help host htop identify ifconfig iftop import ip iptraf kill less ln locate lsblk ls lspci lsusb man md5sum mkdir more mount mv netstat nohup nslookup numfmt parallel passwd ping ps pwd rev rmdir rm rsync screen sed sensors set shutdown sleep smartctl sort split ssh ss stat sudo su tac tail tar time tmux top touch tr true truncate type umount uname uniq unzip vim wc wget whereis which xargs xxd uptime umask wall yes

Note that you can directly refer back to a specific command in this document with a URL in your browser formatted like this.

http://xed.ch/h/unix#cat

Or, once you get the hang of this stuff, you can read sections directly from a Unix command line like this.

$ CMD=cat && wget -qO- xed.ch/h/unix.txt | sed -n "/== ${CMD}/,/==/p"

Here I’m using wget and sed to show me the cat command’s section. That’s pretty handy. On second thought, I’ll be using these notes myself too!
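To see what the sed half of that pipeline is doing without touching the network, here is a minimal sketch run against a made-up sample. The sample text and the variable names notes and section are invented for illustration and are not the real contents of unix.txt; the only assumption carried over is that each section starts with a "== name" heading.

```shell
# Hypothetical stand-in for the downloaded notes: each section starts with "== name".
notes='== cal
Shows a calendar.
== cat
Concatenates files.
== cd
Changes directory.'

CMD=cat
# -n suppresses sed's normal printing; /A/,/B/p prints from the first line
# matching A through the next later line matching B.
section=$(printf '%s\n' "$notes" | sed -n "/== ${CMD}/,/==/p")
printf '%s\n' "$section"
```

The range ends at the next heading, so the output includes that heading line ("== cd" here) as its last line.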

HOWTO: Be The UNIX Expert

    +--------------+
    |  Start with  |
    | Unix problem |
    +--------------+
          |
    Obviously you don't but
    did you think you knew ---(no)---.
    what you were doing?             |
          |                          |
        (yes)                        |
          |                          V
    +----------------------+   +------------------------------+
    | man relevant_command |   | Think of way to              |
    | Search for options   |<--| express the problem          |
    | involved.            |   | using many or none of the    |
    +----------------------+   | ambiguous keywords involved: |
            |                  | mount, cat, cut, tail, kill  |
      Fix problem?----(no)---->| Search for that on the web.  |
            |                  +------------------------------+
          (yes)
            |
    +-----------------------+
    | You win 0 exit codes! |
    +-----------------------+

man / whatis

This is the most important command because if you can remember it and the existence of the other commands you need, you can instantly get a thorough reminder of exactly what you need to know. man is short for "manual" and in the old days it was envisioned that the system would help you typeset the documentation and many people probably did that. But Unix people were some of the first to realize that actually getting it onto paper didn’t really improve, well, anything. Using an electronic version, you can keep it very up-to-date, do some brutally fast searching (use the slash key "/"), and zero in on what you need to know very quickly.

It is simple to use in normal cases. To see the documentation for the cal program just do this.

man cal

When you’re done with man just hit "q" to quit.

The reason these notes are mostly for you and not me is that I use man all the time. The advantage these notes have for beginners is that I am highlighting my favorite aspects of the various Unix tools while man pages are usually scrupulously thorough about every mote of functionality. It doesn’t help that man pages are written in an idiosyncratic style that is extremely compact and efficient. At first glance it looks about as useful and inviting as reading legalese in a license agreement. This can be overwhelming for beginners. But trust me, man pages are a very powerful resource. It’s good to bravely face their style quirks until you can refer to them without fear. Remember that the alternative is "dummy" "help" which says inane stuff like, "The paste icon pastes." Grr. Better to know the answer is there but your abilities are not than the other way around.

ℱancy man

If you’re a C programmer or doing some other serious business, you may need to specify the section number of the manual. For example, man stat gets you the man page of the Unix stat command. However if you’re looking for the C programming system call you need this.

man 2 stat

Normal people never need to worry about this (by definition). If you do, man man has a nice list of what the sections are.

The whatis command searches the names of man pages for the word that you specify as an argument and prints a one-line description of each match. For example, doing whatis printf shows that there are two man pages by that name, a command (man section 1) and a C function (man section 3).

exit

Nothing is more frustrating than being stuck in some program you want to stop using. This is also true of a command line terminal. The exit command is really a feature of the shell. Beginners don’t need to worry about what that means — just know that it will exit your shell’s session, probably closing your terminal too.

ℱancy exit

I personally like to just type "Ctrl-D" at the beginning of a new line. This exits the shell immediately. Note that as with stat, the man page for exit can be misleading. There is a C function called exit() which is what the man page talks about.

As you begin to truly understand all your options, you may find that ordinary GUI operations become more and more tedious. Here’s a very powerful trick involving the exit command. This is how I start my big clumsy web browser.

$ grep BROW ~/.bashrc
export BROWSER=/home/xed/bin/firefox/firefox
alias ff='bash -c "${BROWSER} --new-instance --P default &";exit'

The "normal" way to start a browser from the command line is pretty good. I can just type "firefox" (and with command completion, that is pretty easy — f - i - r - tab). But the problem with that is that the terminal then stays open the whole time waiting for you to finish using the browser. With the alias shown above, I type ff and hit enter — the browser starts and the terminal I launched it from disappears, thanks to the exit command.

What is so valuable about this approach? Well, if you use command line terminals a lot, as you can imagine I do, it’s likely that you have a very fast way to launch a terminal. I think that many systems come preconfigured with Ctrl+Alt+T as a way to launch one. I always define Ctrl+Shift+N to instantly launch a (new) terminal. This means that when I want to run Firefox, I press Ctrl+Shift+N, type "ff", and then enter. If you think there is a faster mouse-based way to reliably do that (and anything else that needs doing), I will happily and strongly disagree with you.

clear

Got a bunch of busy junk on the terminal screen and you want to clear it? Just use the clear command. Another way to achieve the same effect is to just press Ctrl-L. This works in many text interactive situations including the shell. The advantage is that you can clear previous lines while working on a new command line. Why use clear as a command? It is helpful if you want to put it into a script to make sure the screen is not cluttered before starting some output. I find it very helpful in scripts that print out a small report that you want to watch change.

This will keep the screen filled with a report of sensor readings updated every 10 minutes in a way that is not cluttered.

while sensors; do sleep 600 && clear; done

help

When dealing with shell built-in functions, for example exit or read, you can use the help command to get more information about them. This is way easier than reading the 46k word man page for bash which also incidentally contains that information.

ℱancy help

You can just specify a part of the word you want help with. If it is unambiguous it will work.

help he  # Help on "help"
help de  # Help on "declare"

shutdown

Sometimes you have a shell and you don’t have a GUI thing and you want to turn the system off in a polite way. The answer is to use the shutdown command. I do it like this.

sudo shutdown -h now

Or if you want it to shut down in 5 minutes, do it like this.

sudo shutdown -h +5

The -h means "halt" as opposed to -r which means "reboot" (it will turn off, but then come back to life). The +5 means five minutes from now. You can also just use the commands halt or reboot. I tend not to do that because on systemd-based Linux systems they are actually symlinks to the systemctl command, not that there’s anything wrong with that. There are many ways to do the job. I think it’s traditional for init 0 to also shut a system down, but I’d save that until you know what you’re doing and why you’d choose that option.

passwd

I have set up Unix accounts for users who only surf the web. The one command line program they did need to use, as I stood there coaching them, is the passwd program so that they could change their password. Simply type passwd and it will ask you for your existing password. (This prevents jokers from messing with you by changing your password while you’re away from your desk.)

Here’s a thing that I find weird: normal people find it weird that when they type their password nothing happens. People have been so conditioned by GUI form boxes that if there are not dots showing up, they easily get confused. I’ve been amazed by this at least a dozen times and that’s dealing with PhD level researchers. So… when you type your private passwords that no one else is supposed to see they will not show up on the screen. Got it? Ok.

The next face palm opportunity for the person administering your account is what constitutes a smart password. We’re all familiar with password strength meters on web sites, etc. Linux also does some quick checking to see if your password is idiotic. I don’t know all the rules, but here’s a rough idea based on my experience.

Don’t use your own username in a password.
Don’t use a word sitting in the system’s spelling dictionary.
Don’t make it very short (fewer than 7 characters).
Don’t repeat characters or patterns too much.
Don’t use keyboard patterns (QWERTY, etc.).

I’ve had people sit down to type their password into a system that I and their colleagues are all counting on them to keep secure with a decent password and they get rejection after rejection. Don’t let that be you! I’m talking to you ann123!

Note that if you want a terrible password, Linux is cool with that, but you must run it as root. So maybe like this for user lax.

sudo passwd lax

Note that it won’t ask you for your current password as root because it assumes that if you’re root, it can’t stop you from doing what you want anyway.

true / false

Some of the strangest commands in Unix don’t do anything at all. The true command just immediately exits and emits an exit code saying that the run went fine. The false command does almost the same thing, but its exit code seems to complain that something wasn’t right (but that’s what you asked for!). This is useful for scripting more often than you’d expect. It’s like Python’s pass statement which also does nothing: it allows you to fill blanks that need filling.

By the way, if you’re a normal person you probably don’t need to know about exit codes at all and you can use all of these nothing commands interchangeably.
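For the curious, here is a small sketch of what those exit codes look like and how a do-nothing command fills a blank. The variable names (true_code, false_code, count) are just illustrative. The shell keeps the exit code of the most recent command in $?, where 0 means success and anything else means failure.

```shell
# Capture the exit code of each do-nothing command.
true
true_code=$?
false || false_code=$?   # the || also keeps strict shells from aborting on false
echo "true exited with $true_code, false exited with $false_code"

# Filling a blank: a loop needs a condition, and true always succeeds.
count=0
while true; do
    count=$((count + 1))
    if [ "$count" -ge 3 ]; then break; fi
done
echo "looped $count times"
```

The while true idiom is probably the most common place these commands show up in real scripts.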

One interesting use of these commands is to create a new blank file.

true > mynewblankfile

This will create an empty file called mynewblankfile. This is because true produces no output and directing that to a file (with >) is filling that file with the nothing. Use some caution because if the file already existed and had something important in it, it will be gone. In other words this technique is especially good at blanking an existing file.

A similar thing is the shell builtin :. It is basically a shell level version of true. You can even make empty files (or make files empty) with it the same way as with true. And if that’s too much, you actually don’t need any command really. Just >mynewblankfile and nothing else will do the same.
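Here is a cautious sketch of that blanking behavior, using a throwaway file from mktemp so nothing important is at risk. The variable names are made up for the example.

```shell
# A throwaway file (name generated by mktemp).
f=$(mktemp)
echo "something important" > "$f"
before=$(wc -c < "$f")

# Redirecting the (empty) output of : truncates the file to zero bytes.
: > "$f"
after=$(wc -c < "$f")

echo "before: $((before)) bytes, after: $((after)) bytes"
rm -f "$f"
```

The file still exists after the redirect; only its contents are gone, which is exactly why this is dangerous to aim at a file you care about.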

I like to use : to take random notes. Need to jot down a phone number and you just have a Unix terminal? Type : 6191234567 and nothing bad will happen. Don’t get too poetic doing that however, since the shell still tries to make sense of what you typed. If you’ve used complex syntax, you can invoke an error. A safer similar way to do the same thing is to use the shell comment character like this.

$ # Most things are 'ok' to type here with no side effects.

Or if you like hard to remember Bash tricks, you can type something and then after the fact press "escape" then "#" for the same effect.

ℱancy :

Sometimes you want some command that normally produces output to not produce output. The traditional way people do this is by redirecting the output to the "null" device which gobbles it up and that’s that. It looks like this in practice.

$ ls > /dev/null

In that command, no output is produced. There is another technique which I find easier to type — pipe your output to true or :.

$ ls|:

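The colon command does not actually read the output of the ls command; it exits right away, the output is discarded along with the pipe, and both processes end quietly. This works with true and false too. Some programs may print a "broken pipe" complaint when treated this way, in which case stick with /dev/null.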

time

The time command does not tell the time for you. If you want that, see the date command. The time command is a verb and it times durations of programs for you. This is very useful when quantifying performance issues.

Let’s see how long it takes to look up the IP addresses of two domains.

$ time host -t A xed.ch
xed.ch has address 66.39.97.213
real    0m0.017s
user    0m0.012s
sys     0m0.000s
$ time host -t A google.com
google.com has address 172.217.3.110
real    0m0.028s
user    0m0.016s
sys     0m0.000s

Why is Google slower? Who knows. But is Google slower? Today the answer is yes.

One neat trick I really like is to use the time like a stopwatch. Type this on the command line and press enter when you’re ready to start.

$ time read

It will then sit there waiting for you to create some input (as described in help read). Simply press enter as your only input when you’re done timing. You’ll get a nice accurate report on the duration of the interval between you pressing enter the first time and the second.

ℱancy time

The "real", "user", and "sys" values get into some hairy internal details. Basically, the "real" is the wall clock time or how long you needed to wait from start to finish. This can be affected by what else is going on with your system and how long that process had to wait in any queues. The sum of the "sys" and "user" times reflects how much CPU time was actually used by this process (in kernel system mode and user mode). Here is all you ever wanted to know about that.

echo

The echo command is another shell built-in. This means it’s part of your shell rather than a stand-alone executable. Fortunately its normal use is very easy. It is like a "print" statement in other languages and it simply takes the things you specify — called the arguments — and it sends them back out to you. It echoes them.

$ echo "Repeat this back to me."
Repeat this back to me.

Since the shell expands the arguments before echo ever sees them, you can do helpful things like this too.

$ echo /tmp/*jpg
/tmp/forest.jpg /tmp/flower.jpg

Or you can check the value of variables.

$ echo "User $USER uses this editor: $EDITOR"
User xed uses this editor: /usr/bin/vim

ℱancy echo

Remember just a few sentences ago when I said that it was not a stand-alone executable? Well, that’s not exactly true. On many Linux systems, not only will echo be a (very important) shell built-in command, but there will also be a stand-alone C program that does the same thing. Why? Good question. One reason is that programs that are not shells, such as xargs, cannot run a shell built-in and need a real executable. Performance can differ too. For example, here’s Bash doing a million echo commands.

time seq 1000000 | while read N; do echo $N; done
...
real    0m16.858s

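And here’s the C stand-alone, fed by xargs.

time seq 1000000 | xargs /usr/bin/echo
...
real    0m0.959s

This is not quite a fair fight: xargs packs as many numbers as it can onto each command line, so the stand-alone echo runs only a handful of times instead of a million. What it does show is that the shell is powerful but its loops are not optimized for performance. In real scripts, if you’re dumping something on the screen it’s probably fine to always use the shell built-in. For more details about these very similar commands see Bash’s help echo or the stand-alone’s man echo.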

One little trick you can do with both versions of the command is the -n option which suppresses the new line. This is handy for things where you want to consolidate the output a bit. Here is a nice demo of dots being printed, perhaps to indicate some subprocess is being completed.

$ for N in {1..50}; do echo -n . ; sleep .2; done ; echo
..................................................

cal

I include cal near the top of my list of commands to learn because it is so innocuous, obvious, and useful. I pretty much always use it to generate sample material to use in examples when I’m giving demonstrations. However, it is genuinely useful!

I use the -3 option a lot because I’m often curious about what day of the week a date next month lies on. So cal is easy to use even if calendars are inherently complicated. For example here is how to get a calendar of September 1752.

$ cal 9 1752
   September 1752
Su Mo Tu We Th Fr Sa
       1  2 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30

The reason it looks broken is because the cal command is not. Enjoy learning about that!

ℱancy cal

It seems that modern versions of cal have a bug. If you do a which cal, you’ll find that it’s really a link to ncal. When run as cal, it makes the normal calendar but it also tries to highlight the current day, which can be annoying when directing the output into Vim or doing a cut and paste style operation. In theory the escape codes for the highlighting are supposed to be automatically suppressed when piped, but that is not always effective I have found. Anyway, the fix is easy enough. Use this.

ncal -hb

The interesting part of the bug is that -h suppresses the highlighting for the normal ncal style of calendar, but when you just try cal -h, you get the help page instead, which is an undocumented feature. So, ya, that’s not right. Still easy to work around. I’m sure they’ll sort this out soon. Normal classic cal from ancient times is still a simple cal and 99.9% of the time it does what you want anyway.

Oh, and if you don’t have cal at all, they moved it from bsdmainutils to the ncal package in Debian.

date

The date command can be easy. Type it and you get the current date and the current time. Easy and handy! Besides being good for a quick time check, it can also be good to append its output to log files.

ℱancy date

The date command seems easy, but there is immense depth in this command. First of all, some very OCD people made sure it is as accurate as can be. You can have it produce dates that are a certain time from now (or any other time you specify) in the past or future.

And the formatting of the resulting date is extremely flexible. Whatever you need labeled with a date, this command can make it correct.
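As a taste of that flexibility, here is a hedged sketch of a few formats that come up often. The log message and variable names are invented for the example. The + format letters are standard, but relative dates are where platforms disagree: -d is a GNU (Linux) option and -v is the BSD/OSX equivalent, so the last command tries one and then the other.

```shell
# A sortable timestamp, the kind of thing worth prepending to log lines.
stamp=$(date +"%Y-%m-%d %H:%M:%S")
echo "$stamp backup finished"

# The same moment in UTC, handy when logs from several time zones meet.
date -u +"%Y-%m-%dT%H:%M:%SZ"

# A date relative to now: GNU syntax first, BSD/OSX syntax as the fallback.
tomorrow=$(date -d tomorrow +%A 2>/dev/null || date -v+1d +%A 2>/dev/null || echo "unknown")
echo "Tomorrow is $tomorrow"
```

Year-month-day order is worth the habit because plain text sorting then puts the lines in chronological order.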

My date notes.

less / more

The less command is a true workhorse. It is a "pager" which is a program that shows you text one page at a time. If, for example, you have a list of all the hockey games Tim Horton played in, that will most likely not fit on one screen. By using the less command you can scroll through the list as you need to.

You can use something like this.

$ less timhortons

And it will show you the first screenful of games from your file. Hit space and it will go on to the second page. Repeat until happy.

One thing helpful to know about less that will help it stick in your memory is that it is actually the second generation of pager. The classic ancient pager was called by the sensible name more as in "show me more every time I hit the space bar". The deficiency of more (which is still included for old school people; try it out) is that if you pass up your region of interest, too bad. And with more, at the end of the document, it exits.

With less however, it just stops at the bottom of the document and waits for you to further navigate arbitrarily. To go to the top of the file, press "g"; to go to the end, press "G". I think page up and page down work but I use space and "b" for "back". I also heavily use "j" and "k" (vi keys) to move down and up one line at a time.

Importantly, to quit using less, press "q". (Just like when using man which usually is configured to actually use less.)

ℱancy less

The less command is so powerful that it often does more than you’d like. If you send it an HTML file, some configurations of less can filter the HTML into readable text. If this is not what you want you can pipe the file to less - where the dash indicates you want to page the standard input. Without a hint of what kind of file it is, it won’t do anything overly ambitious with it.

I like to dress up console output with control characters to make things have nice colors. To preserve this in less use the -r (raw) option. Easy!

top / htop

A very common useful task for a Unix command shell is to find out what the hell is going wrong on the system. If the performance has come grinding to a crawl, you can find out what is responsible. The top command is universally available on all good operating systems and shows an ongoing display of the "top" processes as measured by CPU resource use. The ones near the eponymous top gobbling up 99% of your CPU will likely be the ones you want to attend to.

The top command is fine, but as with more to less there is an improved version called htop. That is a nicer version in many ways and the functionality is more powerful too. If you have it, use it.

For people who really like to know what’s going on, consider installing the dstat package. I like to SSH into a machine I’m playing a game on (from another computer) and then run dstat. This shows me in real time while I play/experiment how much network traffic, disk activity, memory, and CPU is being used. This is a great trick to get at the source of problems.

ℱancy top

Sometimes you don’t need an interactive display. Maybe you’re creating a log of activity and you want to record the top processes periodically. Maybe you just want to check how busy the CPU is before a job runs to make sure no one else is doing anything. Unfortunately top’s non-interactive mode (called the "batch" mode) is kind of sucky. But, hey, this is Unix, we can deal with it.

This shows top’s very informative header and the top 3 processes.

$ top -bn1 | head -n10

If you really just need the top process and that’s it, you can use something like this.

$ top -bn1 | sed -n 7,8p # With header labeling everything.
$ top -bn1 | sed -n 8p   # Just the top process' line.

Or if you need only the top CPU percent field.

$ top -bn1 -o %CPU | sed -n '8s@  *@\t@gp' | cut -f10

This is good for scripts and can be especially handy to see if something is running at 99% or 100%, i.e. very busy. You can become even better informed by having just the mean and standard deviation of the top 4 processes displayed.

$ top -bn1 -o %CPU | sed -n '8,11s@  *@\t@gp' | cut -f10 | \
  awk '{X+=$1;Y+=$1^2}END{print X/NR, sqrt(Y/NR-(X/NR)^2)}'

This is useful on a 4 core machine to see if all of them are being used. As you can see, you have complete control of the infinite possibilities.
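If the awk part looks like line noise, here it is on its own with made-up numbers standing in for the CPU column (the eight values and the variable name stats are purely illustrative). X accumulates the sum, Y accumulates the sum of squares, and at the end the mean is X/NR and the (population) standard deviation is the square root of the mean of the squares minus the square of the mean.

```shell
# Eight made-up percentages standing in for a column cut from top.
stats=$(printf '%s\n' 2 4 4 4 5 5 7 9 |
    awk '{X+=$1; Y+=$1^2} END {print X/NR, sqrt(Y/NR-(X/NR)^2)}')
echo "$stats"   # prints "5 2": mean 5, standard deviation 2
```

A small standard deviation alongside a high mean suggests every core is busy; a large one suggests a single process is hogging things.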

ps

The ps command shows a list of your processes. It is a venerable Unix command that professionals need to know about. But normal people can stick with htop pretty much always. I’ve never ever found the ps command useful without at least some "options". Mostly I use ps -ef but there are other arcane styles too.

It’s mostly handy for finding processes you do not need or want which are running quietly (escaping the notice of top by being near the "bottom"). For example, check this out.

$ ps -ef | grep 'Mode[m]'
root       500     1  0 14:39 ?   00:00:00 /usr/sbin/ModemManager

I’m using the ps command to find out if I have something called "ModemManager" running. Yup. I do. Wow, even though it doesn’t take much for resources, I am pretty sure I don’t have a SIM modem on board and I’m even more sure I’m not going to get teleported to 1995.
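The odd-looking [m] deserves a word. Without it, grep tends to find itself in the process list, because its own command line contains the word being searched for. The sketch below imitates that with two made-up lines of ps output (the second line, its PID, and the variable names are hypothetical) so it behaves the same on any machine.

```shell
# Two made-up lines imitating ps -ef output: the real daemon, and the grep
# process itself, whose command line contains the pattern as typed.
fake_ps='root   500   1  0 14:39 ?      00:00:00 /usr/sbin/ModemManager
xed   1234 999  0 14:40 pts/0  00:00:00 grep Mode[m]'

# Plain pattern: matches both lines, including grep's own entry.
plain=$(printf '%s\n' "$fake_ps" | grep -c 'Mode')

# Bracket pattern: [m] matches the letter m, so the regex means "Modem", but
# the literal text "Mode[m]" on grep's own line does not contain "Modem".
bracket=$(printf '%s\n' "$fake_ps" | grep -c 'Mode[m]')

echo "plain: $plain, bracket: $bracket"   # plain: 2, bracket: 1
```

Any single letter of the search word can go in the brackets; the last one is just a convenient habit.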

pstree

Another way to view the processes. This highlights the relationships between parent and child processes by diagramming them as a tree. See also column — which can make this kind of tree diagram out of tables — for another way to achieve this effect.

kill

So you’ve found a process that needs to die. In the previous section, I found ModemManager running and found its process ID is 500. This should kill it.

$ kill 500

ℱancy kill

Some processes ignore the default kill signal (SIGTERM, i.e. terminate). That’s super rude in my opinion and the next level of retribution is this.

$ kill -9 500
$ kill -SIGKILL 500

These are the same thing but one may be more explanatory if it’s in a script. The -9 is the traditional common way to specify SIGKILL and is a bit more emphatic than plain kill.

If it’s not your process, you might need a sudo. (And, of course, next time you reboot, it will probably come back. But that’s another topic.)
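To practice without hurting anything real, you can make your own victim. This is a rough sketch that starts a harmless sleep in the background, kills it both ways, and looks at the exit status the shell reports afterwards. The variable names are made up; the convention it relies on is that a process killed by a signal is reported as 128 plus the signal number.

```shell
# Start a harmless long-running process; $! is the PID of the last background job.
sleep 300 &
pid=$!
kill "$pid"                       # default signal: SIGTERM (15)
term_status=0
wait "$pid" || term_status=$?     # collect the status of the dead process

# Same again, but with the signal that cannot be ignored.
sleep 300 &
pid=$!
kill -9 "$pid"                    # SIGKILL (9)
kill_status=0
wait "$pid" || kill_status=$?

echo "after SIGTERM: $term_status, after SIGKILL: $kill_status"   # 143 and 137
```

Seeing 143 or 137 as an exit status in a log is therefore a good hint that something was killed rather than crashing on its own.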

pwd

This simply Prints the Working Directory. I actually remember it as "present working directory" for some reason. I don’t use this command much because I have this reported in my shell prompt so I’m always seeing it on every command. If you are in a situation where that’s not the case, it can be useful.

ℱancy pwd

The shell usually has an environment variable that contains this directory. You can use or check it in scripts, but here’s how to just emulate the pwd command.

$ echo $PWD

cd

This command Changes the current Directory to whatever you specify. For beginners the reasonable question is, what does it mean to be "in" a directory? Basically the shell just keeps track of some directory that we can pretend you’re "in" as if each node on the directory tree were a room. The end use of this is that you can specify actions with ambiguous locations and the system can assume you must be talking about the directory you are "in". It’s a decent system, but as you become more proficient, it’s good to start appreciating that it’s all kind of fake really.

ℱancy cd

Imagine you’re working on some project. Say in this directory.

[~]$ cd X/myspecialproject/

Then someone distracts you and wants you to acquire a bunch of files to look at locally on your machine. You don’t want to muck up the pristine beauty of myspecialproject and you don’t want to keep these files once you’ve taken a look at them; /tmp is a good choice.

If you are only going to jump over to one distracting directory and come right back, you can use the special cd argument -. That looks like this.

[~/X/myspecialproject]$ cd /tmp
[/tmp]$ cd -
/home/xed/X/myspecialproject
[~/X/myspecialproject]$ echo "Returned to the previous directory."

However, if you are going to potentially do many things in many directories, simply preserving your last used directory is not sufficient. A Bash feature I often forget about but which can be very handy is the directory stack. This allows you to save directory paths on a stack. This sounds kind of overly technical but follow along with the typical use case and you can see how easy and useful it is — if you can remember to use it!

Instead of using the normal cd /tmp to change to the /tmp directory, do this.

[~/X/myspecialproject]$ pushd /tmp
/tmp ~/X/myspecialproject
[/tmp]$ mkdir -p a/b/c && cd a/b/c && touch the_distraction

Now you can do the distracting thing with those messy files in the /tmp directory (or any directory or sequence of directories) and when you are done and ready to resume where you left off, you simply do this.

[/tmp/a/b/c]$ popd
~/X/myspecialproject
[~/X/myspecialproject]$ echo "Back to myspecialproject directory"

In summary, if you want to cd with the option of returning to your current directory, use cd - to come back from very simple diversions; for complex diversions, bookmark your current directory by using pushd when changing to the distraction directory and popd when you are ready to come back.

mkdir / rmdir

Like living things, computer science loves its trees! My understanding of filesystem design is that this tree aspect of how your files are organized is for your benefit only. Deeper down, they’re not really in the trees you made up. But again, it’s an ok system and allows you to pretty well keep things kind of organized. (To give you an idea of how else it could be done, I believe that file organization interfaces should be in Cartesian 3d space which our minds are naturally optimized for, not tree topologies which computer science nerds can eventually force us to get used to.) Anyway, if you need another branch in your organizational scheme, create it with mkdir. This will make a directory.

$ mkdir cats
$ mkdir cats/tigers

Actually, it will specifically make a subdirectory since the top level (the trunk of the tree) should already be established when the system starts.

ℱancy mkdir

If one of the intermediate directories is not present, you can use the -p (for "parent") option and have it create them automatically.

$ mkdir -p mammals/placental/feline/tigers/bengal/cincinnati

The rm in rmdir is for "remove" and it does what you might expect and removes a directory. The catch is that the directory must be empty. This prevents serious catastrophes where you lose a lot of important directory contents. If you really want to delete a directory and contents there are ways to do that, but rmdir is not one of them. (See the very dangerous rm -r for that.)
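Here is a small sketch of that safety catch in action, done inside a throwaway directory from mktemp so nothing real is touched. The directory names and the variables work and result are made up for the example.

```shell
# Work in a throwaway directory.
work=$(mktemp -d)
mkdir -p "$work/cats/tigers/bengal"

# rmdir refuses because cats still contains tigers.
if rmdir "$work/cats" 2>/dev/null; then
    result="removed"
else
    result="refused"
fi
echo "rmdir on a non-empty directory: $result"

# Emptying from the deepest directory upward works fine.
rmdir "$work/cats/tigers/bengal" "$work/cats/tigers" "$work/cats"
rmdir "$work"
```

Listing the directories deepest first matters because rmdir handles its arguments in the order given.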

ls

You’re trying to remove a directory and it won’t let you because it is not empty. How do you see what is there? Perhaps the most fundamental and common Unix command is ls which shows you the files a directory contains. It can also list a particular file. That may seem pointless at first glance. Why would you do something like this?

$ ls myfile
myfile

Nothing you didn’t already know, right? But you can use "globbing" patterns which is a bit more interesting.

$ ls myfile*
myfile1 myfile2 myfile3

ℱancy ls

The ls command is so critical to my day to day existence that I personally must have an alias for it that optimizes it for my needs. The alias I use is this.

alias v='/bin/ls -laFk --human-readable --color=auto'

Let’s break that down and look at each option.

-l Produces a "long" listing. This also includes owner, group, permissions, and timestamp. I always want to see this.
-a Shows all the files. Why would it not? Well, I guess it’s considered a "feature" but files that begin with a period are often hidden in Unix operations. So something like /home/xed/.bashrc (where I define this alias) will not show up when I run ls /home/xed/ — this drives me crazy! So the -a cures that and stops hiding the truth from me. That to me is the most Windows-like "feature" of Unix — and therefore obviously super annoying!
-F This is also called --classify and it will put a little symbol on the end of the file to indicate what kind of file it is (directory? device? named pipe?). I like it and it reduces confusion for me, but for others, maybe it wouldn’t.
-k By default ls counts disk usage (the "total" line and the -s column) in "blocks" whose size varies by system, which to me is damn near useless. This makes it count in kilobytes which is more in line with what I can understand.
--human-readable Not just kB but I also want it to convert to GB if that makes sense. I’d leave this off only if I’m using ls output to do some kind of calculation. And I don’t do that often.
--color=auto Colorizes the output if possible. A nice feature.

Remember, if you have a fancy alias like this, you can always just type ls to get default behavior.

(Why the "v"? Well, in ancient days, Slackware used to come with that defined and I got used to it and now can’t live without it. For modern humans, I’d recommend using ll as an alias since that’s very commonly defined for modern distributions. Type alias ll to see if it is defined already on your system and to what if so.)

touch

I mentioned that the true command can be used to create new and empty files. The first thing most people who need this think to do is use the touch command. It will definitely do that job and that is that.

touch mynewfile

ℱancy touch

But to really appreciate touch you need to understand why it is named the way it is. In complicated build systems, it is common for some component B to depend on component A. If A is modified, B must be rebuilt. If neither A nor B has changed then you can mostly get away with leaving them alone. But sometimes, for strange but strangely common reasons, you want to pretend like A did get modified so that B definitely does get rebuilt. The way systems like this figure it is by looking at the timestamps. If B is newer than A, fine. But if A becomes newer than B, then B needs the rebuild to catch up. The idea is that we "touch" A but don’t really do anything to it. This just means that its timestamp is now, which is presumably later than B’s and B will be lined up for a rebuild.

That’s the historical proper use of that command, but it comes in handy for any kind of hanky panky with timestamps. I’ve used it, for example, when I’ve messed up my timestamps on photos I copied (i.e. timestamps changed from when the photo was taken to the inappropriate "now" of when I’m looking through them). If you want to tamper with evidence in the filesystem’s metadata, the touch command is critical!
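A rough sketch of the timestamp game, assuming two made-up files A and B in a scratch directory. The -t option sets an explicit time (format YYYYMMDDhhmm) and -r copies the timestamp from a reference file. The check function is just my own helper for the demo.

```shell
demo=$(mktemp -d)                 # hypothetical scratch directory
touch -t 202001010000 "$demo/A"   # the source, stamped 2020-01-01 00:00
touch -t 202001020000 "$demo/B"   # the product, built a day later

# Helper for this demo only: is A newer than B?
check() {
    if [ "$demo/A" -nt "$demo/B" ]; then echo "rebuild B"; else echo "B is current"; fi
}

before=$(check)                   # A is older than B
touch "$demo/A"                   # pretend A was just modified
after=$(check)                    # A is now newer than B
echo "$before -> $after"

touch -r "$demo/B" "$demo/A"      # copy B's timestamp back onto A
```

That last -r form is the one I would reach for when repairing photo timestamps from a file that still has the right one.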

numfmt

If you were impressed with the ability of the ls to do human-readable output, you’re in luck because numfmt is the general purpose tool for making weird big computer numbers into something comfortable to read. Really this doesn’t get used much because normal commands like ls and lsblk have built in human-readable options. But if you are generating your own interesting log files or data and want to have big numbers simplified to kilo and mega, etc. then this is the way.

echo "123456789" | numfmt --to=iec --suffix=B
118MB
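It also works in the other direction, which is handy for checking what a size suffix really means. This is a small sketch assuming the GNU coreutils numfmt; the variable names are mine.

```shell
bytes_iec=$(numfmt --from=iec 10K)   # binary units: 10 * 1024
bytes_si=$(numfmt --from=si 10M)     # decimal units: 10 * 1000 * 1000
echo "$bytes_iec $bytes_si"          # prints: 10240 10000000
```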

truncate

The touch command is very handy but where I find it lacking is when I need a test file that is not empty. Sure there are other ways to do this but if you want to create a new file that is filled with zeros, you can use the truncate command with the -s size option. If the file does not exist, a new file will be created with the specified size. It will be filled with zeros.

You can specify the size with units to make things easier.

truncate -s 10K 10240_zeros.bin
truncate -s 10MB 10000000_zeros.bin

ℱancy truncate

Note that if the file does exist then the truncate command does what its name implies and it truncates it to the specified size (again with -s). This is not usually a helpful operation except when doing exotic stuff like preparing raw disk images.
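If you want to convince yourself of both behaviors, here is a quick sketch using a made-up file in a scratch directory. It creates a 10K file, confirms it holds nothing but zero bytes, and then cuts the same file down to 100 bytes.

```shell
demo=$(mktemp -d)                                   # hypothetical scratch directory
truncate -s 10K "$demo/zeros.bin"                   # new file, 10 * 1024 bytes
grown=$(wc -c < "$demo/zeros.bin")
nonzero=$(tr -d '\0' < "$demo/zeros.bin" | wc -c)   # count bytes that are not zero

truncate -s 100 "$demo/zeros.bin"                   # existing file gets cut to 100 bytes
shrunk=$(wc -c < "$demo/zeros.bin")
echo "$grown $nonzero $shrunk"
```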

chown / chgrp

If you’ve traditionally used bad operating systems, you may not have any experience with the concept of file "ownership". In Unix, files have "owners". They also have "groups". This allows the owner to limit who can do things (see chmod) with the files. In some cases the group property can allow multiple people to do things with the file but exclude non-members.

It is very common in modern Linux systems for every user to have their own group. So if I am user xed, it is common for there to exist a group xed. This is simple and obscures the point of groups. In fancy systems that may be different. Here I’m showing the id command for my user account on a high-quality commercial web host.

:-> [www666.pair.com][~]$ id
uid=71849(xed) gid=1000(users) groups=1000(users)

I am user 71849 but I am in the group 1000, "users". (And only that group.) This means that I have proper control over files whose ownership property is set to 71849. If I want to change the ownership I could try the chown command. It probably won’t work because if I don’t already own the files, they’re probably off-limits. And if I do, the target owner property that is not me is probably also off-limits.

In practice, this command is almost always used with sudo. A good use case is when I copy files from a system like the one shown and those files are owned by xed(71849). But on my systems, I like to be xed(11111). Since 71849 won’t even be in my /etc/passwd list of valid users on my system this command might be useful.

sudo chown xed:xed file_from_webhost

Note the syntax of xed:xed changes the user to the xed user and the group to the xed group (from pair’s group 1000 which is wrong). You can leave off the :xed if the group is already fine. Now instead of being owned by a nonexistent user, it is owned by my real local account and I can now access it properly.

As you can see, I can change the group with the chown command, but if for some very strange reason you need to change only the group, you can use chgrp. In practice, I have almost never used this thanks to chown being usually sufficient.
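To see the raw numbers behind ownership without needing sudo, something like this sketch works. It assumes a made-up file in a scratch directory and uses ls -ln, which prints numeric user and group IDs in the third and fourth columns.

```shell
demo=$(mktemp -d)                                     # hypothetical scratch directory
touch "$demo/mine"
owner_uid=$(ls -ln "$demo/mine" | awk '{print $3}')   # numeric owner
owner_gid=$(ls -ln "$demo/mine" | awk '{print $4}')   # numeric group
echo "file is $owner_uid:$owner_gid, I am $(id -u):$(id -g)"

chgrp "$(id -g)" "$demo/mine"    # allowed without sudo: my own file, my own primary group
```

A file you just created is owned by you, so the first number should match your id -u.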

ℱancy chown

The mischief ownership problems can cause! Most users can just stick with simple organizational strategies but if you need to dig deeper, check out my Unix permissions notes which take a deep look into some of the details of the topic.

Speaking of mischief, all of these "ch" commands take an option -R which means "recursive". If you specify a directory in the argument list, it will open that directory and change all of the contents. Recursively. This is very useful for making major changes to big archives but obviously it’s very easy to cause subtle problems with your file metadata. Subtle problems and complete hosings!

chmod

Focusing even more on permissions is the chmod command. People typically pronounce the "mod" part as they do in the word "modify". I’ve heard "chuh-mod", "change-mod", and I personally say "see-aych-mod". However the command really is there to change the file mode bits. Most people think that this is about permissions (true) and only permissions (not true).

To be extra confusing for beginners there are two totally different syntaxes that can be used to specify a file’s "mode". The way I like to do things is the "advanced" way but I think of it as the simple way because I’m used to it now. It might be better said that there is a computer friendly way and a human friendly way. I like the computer friendly way.

For normal people doing normal things there is a pretty limited palette of what you’d ever need to do with this. Let’s look at some of these. The commands are followed by explanatory comments after the # (i.e. not part of the required command).

chmod 644 myfile # Normal file read/write by owner, read by all
chmod 600 myfile # Normal file read/write by owner, locked to all others
chmod 755 myprogram # Program read/write/execute by owner, read/execute by all
chmod 700 myprogram # Program read/write/execute by owner, locked to all others

The only other quirky thing is that programs that need execute permission have the same "mode" as directories which need access permission.

chmod 755 mydir # Directory read/write/access by owner, read/access by all
chmod 700 mydir # Directory read/write/access by owner, locked to all others

If you can memorize that (or look it up here) that’s 99% of your chmod chores.

ℱancy chmod

If you’re a computer science person you should appreciate that these strange numbers (644, 700, etc.) are 3 digit octal numbers requiring 3 bits each. Each of the digits represents a class of user: 1st, the user herself, 2nd, the group members, and 3rd, everyone else. And each of the three bits in each of the digits is a switch controlling 1:executable, 2:writable, and 4:readable. Doing the math on the 644 example we see that the owner has 4(readable) plus 2(writable) and the group members and everyone else has just 4(readable).

That’s how it really works and if you’re a computer science major you should embrace this as a nice example of binary encoding — but if you are not, you can safely ignore this.
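The arithmetic can be checked directly. This is a small sketch on a made-up file in a scratch directory; the first ten characters of an ls -l line show the mode the way humans usually read it.

```shell
demo=$(mktemp -d)                   # hypothetical scratch directory
touch "$demo/myfile"

owner=$(( 4 + 2 ))                  # readable + writable
group=4                             # readable
other=4                             # readable
chmod "$owner$group$other" "$demo/myfile"         # same as chmod 644
mode_644=$(ls -l "$demo/myfile" | cut -c1-10)     # -rw-r--r--

chmod 700 "$demo/myfile"                          # 4+2+1 for the owner, nothing for others
mode_700=$(ls -l "$demo/myfile" | cut -c1-10)     # -rwx------
echo "$mode_644 $mode_700"
```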

I won’t even go into the other scheme for specifying permissions. I know it but I hardly ever use it. There are some very weird cases where it does seem necessary when changing tons of files selectively. But deep down, all files have octal modes as described.

Ok, that’s not even true. I said they were 3 digit octal numbers, but the full mode is 4 digits. If you’re interested in that see my Unix permissions notes which cover a lot of very strange stuff.

rm

Got too much stuff? (You can check on that with df.) The rm command will get rid of some (or all!) of it. Of course this is exactly like saying that a chainsaw will get rid of some trees or a stick of dynamite will get rid of some rocks or a scalpel will get rid of a tumor, etc. Sure it will, if used correctly. If you bungle it however,… Wait, that’s not right… When you bungle it! When you bungle it, you want to be relieved that you have in place a good system of backups. Seriously. Come up with and adhere to a good system of backups! Do it.

Ok, your real stuff is backed up. You’re on some throwaway system. You’re a malicious psychopath who is actively trying to destroy someone’s life. How do Unix people get rid of files? Like this.

rm moribundfile

That’s it. It’s gone. If you’re nervous you can try this.

$ rm -iv moribundfile
rm: remove regular empty file 'moribundfile'? y
removed 'moribundfile'

The -v, as with many Unix commands, stands for "verbose" which is why it reported that the file is "removed". (Normally it kills silently). The -i is for "interactive" causing it to pause and ask you to think first and then confirm your reckless ambitions. I find this is kind of pointless and I don’t use it. To me it is as useless as saying, "Are you sure? Are you sure you’re sure? Are you sure you’re sure you’re sure?" until it gets annoying. But hey, create your own illusion of safety. Many systems come with the rm (and others) aliased to rm -i to force the issue. Again, I find it not helpful.

If you come from a computer environment designed for normal people, I have an important fact to tell you about: Not only do files not go in the "trash can" when using the rm command, there is no such thing! That "garbage can" or "recycle bin" stuff is a ridiculous fiction created by an office supply company which naturally saw the world through a bizarre office supply lens.

If for some bad reason you think the "trash" is a legitimate and useful innovation, simply create a directory called "Trash" and mv your files to it. Done. That’s all that is going on with desktop "trashes". There are two massive problems with that approach of course. One, sometimes you need to free up space because your drive is full and pretending to delete data but not really doing it is absurd. And two, sometimes some files really, really, really need to be deleted (…or hey, let’s give that malware another chance!). A bonus problem is bungling the "emptying" of the trash and not even reaping the putative benefits of it.

Folks, let me stress it again: Have! Good! Backups! There is no substitute. Don’t fool yourself into false security and performance problems with a pointless illusory "trashcan".

If you want some kind of crutch to keep you from deleting things you didn’t want to delete, a far better system (in addition to sensible backups) is to always get in the habit of checking your rm arguments with ls.

Imagine I had the notion to delete these parasitic files, i.e. all the ones in this directory.

rm .cache/mozilla/firefox/h1337c28.default/cache2/entries/*

It is very, very reasonable to run this command first.

ls .cache/mozilla/firefox/h1337c28.default/cache2/entries/*

If that looks good, pull the trigger with rm.
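Here is the same habit as a harmless sketch you can actually run. The file names are invented and everything lives in a scratch directory.

```shell
demo=$(mktemp -d)                  # hypothetical scratch directory
touch "$demo/cache1" "$demo/cache2" "$demo/important.txt"

ls "$demo"/cache*                  # preview exactly what the glob matches
rm -v "$demo"/cache*               # same glob, now for real
left=$(ls "$demo")                 # only important.txt should survive
echo "$left"
```

In an interactive shell the easy way to do this is to run the ls, press the up arrow, and change ls to rm so the pattern cannot drift between the two commands.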

ℱancy rm

You can pick off one file after another like a sniper as shown above. But sometimes you need a much bigger calamity to strike. Note that things can go from "oops" to "you’re fired!" very quickly when you start applying Unix’s power to getting rid of stuff.

If you need to get rid of an entire tree, including sub trees and sub sub trees, etc., you need to use the very dangerous -r "recursive" option. This will get rid of the entire mozilla tree.

rm -r .cache/mozilla/

Use -v if you want to see what files it’s finding and purging. That sounds safe, but with big deletions, that can get tedious. Having all the millions of files of a big system you’re trying to clear go scrolling by can slow things down a lot. Sometimes I start it with -v, check that it’s deleting what I intended, press Ctrl-C to interrupt it, and then restart it silently. Use good judgement there.

By the way, if you’re cool enough to be reading this, you’re warmly invited to be my "friend" on Steam where I am known as rmdashrstar. Now you know why.

cp

The simple explanation is that the cp command copies files. The details can get somewhat complex but usually it’s as simple as doing this.

cp myfile myduplicatefile

Now you have two files with identical contents (see md5sum to prove it).

As with other commands that can make a mess of your file system, it is good to be very careful and to use the -v verbose flag.

I think the most common use I have for cp is when I’m about to mess with some configuration. Here’s an example of what I mean.

sudo cp -v /etc/ssh/sshd_config /etc/ssh/sshd_config.orig

I’m basically preserving a copy of this important (SSH server) configuration file so that if the edit I’m planning goes awry, I can restore it to its original condition (e.g. with mv).

ℱancy cp

In fact, making a backup of files I’m about to mess with is about all I ever do with cp. This may seem strange to people who might naturally assume that this command is one of the most important cornerstones of Unix. Maybe for some people it is, but I actually don’t use it much and here is why.

First of all, the whole idea of copying implies a duplication of resource requirements. So right out of the gate, it is almost an exemplar of inefficiency. You might say, what about backups? Indeed, it is definitely not efficient to lose all your work to media failure (or administration clumsiness). But for making backups I tend to always use the more serious tool, rsync. Always. After all, I’m just as likely to back up to a completely separate machine and cp just can’t keep up.

If it’s an intermediate scale between entire archives and single configuration files, I’m often keeping things backed up and organized with version control (hg, cvs, git). This leaves few jobs that need to be done with cp. When I reflect on what I’ve used cp for I’m kind of embarrassed at how clumsy the entire premises of those operations were.

One rare case where cp is redundant but perhaps easier is when you have a symlink A pointing to a real file R and you want a second symlink B pointing to R. Yes, you could just use the ln command but it seems like cp should work and it does. The slight advantage is you don’t have to think about R. These are the same.

cp -P A B
cp --no-dereference A B
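A quick sketch to confirm that B really ends up as a second symlink and not a copy of the data. The names A, B and R match the description above; the scratch directory is my own assumption.

```shell
demo=$(mktemp -d)                  # hypothetical scratch directory
echo "real data" > "$demo/R"
ln -s R "$demo/A"                  # A -> R (relative target)
cp -P "$demo/A" "$demo/B"          # copies the link itself, not the data

target=$(readlink "$demo/B")       # what B points at
contents=$(cat "$demo/B")          # reading B follows the link to R
echo "B -> $target ($contents)"
```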

mv

What do you get when you combine the cp command and the rm command? You get the mv command! As far as I know it is largely superfluous. You could copy the file and then remove the original to create the same effect as mv. But mv is intuitive and simple when that’s what you want to do.

The move operation is also pretty much the same thing as "rename". The syntax is simple enough for simple things.

mv oldname newname

Or relocating with the same name. This will put the oldname file into the directory oldfiles.

mv oldname /home/xed/oldfiles/

It can be good to use the -v (verbose) flag to output a report of what got moved.

mv -v oldname newname

ℱancy mv

Normally the mv command is lightning fast. After all, it’s just relabeling things really. But sometimes those ones and zeros in a file do need to actually move! If you’re changing which filesystem a file lives on, it needs to actually go and occupy new disk space. In these cases, it’s often better for large operations to use cp or even rsync.

Another quirk of mv is that it can move multiple files into a single directory. In that case the directory must be at the end of the argument list.

mv file1 file2 file3 dir4files/

Sometimes if you make a mistake and do something like mv *jpg it will complain that the last jpg file it finds is not a directory and the other files can’t move to it. A much sadder case is when it is coincidentally a directory and you accidentally muddle everything up. Been there; done that.

ln

In bad operating systems you can make "shortcuts" or "aliases" that look like files but point elsewhere. The proper name for such a thing is a "link". For most everybody most always, the link is a "symbolic link" or "symlink". To create a symlink the proper way use this syntax.

ln -s /tmp /home/xed/Downloads

What that example does is it creates a symlink called "Downloads" in my home directory. This is not a file or a directory, it is a symlink. However, it points to a directory, the universal /tmp directory so this symlink acts like a directory. What this does for me practically is that when a garish clumsy program like Chrome downloads things into the "Downloads" directory, heh heh, well, it goes to the /tmp directory — and the next time I reboot my computer, all that cruft is deleted.

My helpful way to remember the order of the arguments for ln -s is: the real thing comes first (/tmp) and the fake thing comes second (the symlink, Downloads).

ℱancy ln

Note that the -s is for "symbolic". It would seem there is another kind and there is. The default type of link that the plain ln command produces is called a "hard" link. What’s different about that is that it is not different from normal files. Your filesystem will see it as not just a file like the one it’s linked to but as the actual file itself. In other words, you will have multiple names (and complete file records) for the same exact blob of ones and zeros on the filesystem. If you change the hard link you’re changing the target too. Isn’t that what happens with symlinks though? Yes, but the difference is that if you delete the source of the hard link, the linked file will carry on as a regular file. If a symlink’s referent is deleted, well, now it just causes errors (e.g. "No such file or directory").
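The difference is easiest to see by deleting the original name and checking what each kind of link does afterwards. A minimal sketch, with made-up file names in a scratch directory:

```shell
demo=$(mktemp -d)                        # hypothetical scratch directory
echo "ones and zeros" > "$demo/original"
ln    "$demo/original" "$demo/hard"      # a second name for the same data
ln -s "$demo/original" "$demo/soft"      # a pointer to the first name

rm "$demo/original"
hard_says=$(cat "$demo/hard")                               # the data survives
soft_says=$(cat "$demo/soft" 2>/dev/null || echo "broken")  # the pointer dangles
echo "hard: $hard_says / soft: $soft_says"
```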

Normally you should stay away from hard links without a very good reason. And obviously you should stay away from circular symlink redirection cycles.

rsync

My rsync notes.

wget

This stands for "Web Get" and it gets things from the web. In reality it is a tiny web browser that just acquires the web resource and leaves you to look at it any way you like. Super useful. My wget notes.

Note that if you are a Mac user, you should have a look at the man page for curl which is a very similar program that is installed by default on Macs.

grep

The grep command is one of Unix’s most famous. Among serious computer professionals the word "grep" is used in conversation as a verb that means to extract specific information from a larger source. Its use can be very complex, but mostly it’s quite simple conceptually and practically. Does a file named "myfile" contain the phrase "searchterm"? Find out with this.

$ grep searchterm myfile

Here’s a simple useful example that I like. If you look at the Linux file /proc/cpuinfo you get a big dumping of stuff. But if you narrow what you’re interested in with grep, you get this.

$ grep processor /proc/cpuinfo
processor   : 0
processor   : 1

That would indicate to me that I’m on a 2 core machine. Let’s try a different approach on a different machine. Combining grep with other tools shows this.

$ grep processor /proc/cpuinfo | wc -l
8

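This one is an eight core machine (strictly speaking, eight logical processors, which can be more than the physical core count when hyperthreading is on). Now I have a good way to check how many cores a machine has. Once you start using grep for various things, you’ll start to realize how powerful it is.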

A neat trick you can do with grep is solve crossword puzzles. All you need besides grep is a list of all the words in your language. Since computers can check your spelling these days, this is usually already present. Using the normal Linux spelling word list here is how I can find a six letter word that starts with "c" and ends with "xed".

$ grep '^c..xed$' /usr/share/dict/words
coaxed

There are tons of options to grep but there are two that I use way more than the others. The first is -i which makes the searching case insensitive. That way you’ll find "PROCESSOR" and "Processor" as well as "processor" (if they’re there). This is very useful when scanning natural language texts.

The other very useful option is -v which inVerts the sense of the search. That is, it shows you all lines that do not contain the search phrase. One example is if you’re trying to do some further processing of another command which has a descriptive header. For example, the df disk free command prints this at the top of its output.

Filesystem     1K-blocks     Used Available Use% Mounted on

That’s nice for standalone use, but in a script, I may be able to take that for granted and I want all the lines but this line that I know will always contain "Filesystem".

df | grep -v Filesystem

That does it. But since it’s always the first line maybe you could have just used tail.

df | tail -n +2

Fair enough. But what about if you want to exclude all the entries for tmpfs (I have 6)? This will cut off the header and exclude the tmpfs entries.

df | tail -n +2 | grep -v tmpfs
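Here is a small sketch of those two options on invented sample text, so you can see exactly what each one matches. It also uses -c, which counts matching lines and saves you the pipe to wc -l.

```shell
sample=$(mktemp)                   # hypothetical sample file
printf '%s\n' 'processor : 0' 'model name : Example CPU' \
              'Processor : 1' 'PROCESSOR : 2' > "$sample"

exact=$(grep -c processor "$sample")       # case sensitive: 1 line
anycase=$(grep -ci processor "$sample")    # case insensitive: 3 lines
others=$(grep -vi processor "$sample")     # inverted: everything else
echo "$exact $anycase [$others]"
```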

ℱancy grep

It turns out that there are usually many "grep" commands you can run on a typical system.

$ which grep egrep fgrep rgrep
/bin/grep
/bin/egrep
/bin/fgrep
/usr/bin/rgrep

These are mostly aliases (actually links) which automatically invoke particular options. Most of the subtlety has to do with "regular expressions". The search term that grep uses is actually a very powerful syntax called regular expressions. This ancient syntax is easy to use in its simplest cases but can get very hairy quickly. If you’re interested in learning more, you can check out my 2003 Regular Expression Tutorial. Maybe I’ll update that some day.

Regular expressions are so fundamental to grep that the name "grep" itself comes from the old ed editor command g/re/p, for "globally search for a regular expression and print".

find

The find command is not grep. It does not find things in files. What it does is finds files in the filesystem tree. This may not seem useful if you only have a couple of files. "Which pocket are my keys in?" is not a difficult question. "Where in my house are my lost keys?" could be. When your filesystem becomes a haystack, find can quickly locate needles.

One simple but useful use of find is to just dump out all the files in a tree’s branch. This is different from ls because it operates recursively looking through all sub directories too. We can use this to see just how massive the haystack is.

$ find / | wc -l
511624

On my system running this (as root) shows that my entire filesystem tree has over a half million files. If you have misplaced one of them, you can see how find can be useful.

The normal way to use it is to specify a starting directory and a search criterion, usually a name. Here is an example I have used in the past when I’m trying to test speakers.

$ find /usr -iname "*wav"

I need a sound file, any sound file! I don’t care where it comes from. I know from experience that there will be some giant package stored somewhere below the /usr node in the tree which will include some sound files. Running this command informs me that there is a file called /usr/lib/libreoffice/share/gallery/sounds/strom.wav (and 47 others), just like the (iname) pattern I specified asked for. I never would have found that by casual file manager browsing. The "i" in iname stands for case Insensitive so that file.WAV will also be found. If you don’t want that, just use -name and then the pattern.

Remember that the starting directory (the parents and ancestors of which you will not search) is specified first. Then comes the filter to narrow down what you’re looking for.
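To see the difference between -name and -iname without trawling /usr, here is a sketch on a tiny invented tree. Note that a bare find counts directories as well as files.

```shell
demo=$(mktemp -d)                  # hypothetical scratch tree
mkdir -p "$demo/a/b/c"
touch "$demo/a/one.wav" "$demo/a/b/c/TWO.WAV" "$demo/a/b/notes.txt"

strict=$(find "$demo" -name '*.wav' | wc -l)    # only one.wav
loose=$(find "$demo" -iname '*.wav' | wc -l)    # one.wav and TWO.WAV
total=$(find "$demo" | wc -l)                   # 4 directories + 3 files
echo "$strict $loose $total"
```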

ℱancy find

Of course that simple usage is fine but the tip of the iceberg. The find command does a ton of other things. In fact, if it’s possible to filter files by their location based on their metadata (timestamps, ownership, etc) then it’s likely that the find command can do it. There is almost a full programming language of find filter syntax to make everything plausible, possible.
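A taste of that metadata filtering, sketched on a made-up scratch tree. The filters shown (-type, -mtime, -size) are a small sample chosen by me, not a complete tour.

```shell
demo=$(mktemp -d)                            # hypothetical scratch tree
mkdir "$demo/sub"
touch -t 202001010000 "$demo/old.log"        # stamped 2020-01-01
touch "$demo/new.log"                        # stamped now
head -c 2048 /dev/zero > "$demo/sub/big.bin" # 2048 bytes of zeros

dirs=$(find "$demo" -type d | wc -l)         # directories only
old=$(find "$demo" -type f -mtime +30)       # files modified more than 30 days ago
big=$(find "$demo" -type f -size +1k)        # files bigger than 1 kilobyte
echo "$dirs [$old] [$big]"
```

Filters placed one after another are combined with an implied "and", so -type f -size +1k means regular files that are also bigger than 1k.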

My find notes have many more examples of find including some more exotic situations.

mount / umount

To "mount" a disk means to have the operating system detect it and confirm that it’s organized in a compatible way and that it’s ready for business. That’s all. The important thing for beginners is to understand that it is possible for disks to be connected and not mounted. In bad operating systems it is often because they don’t understand how good operating systems organize their filesystems. But in good operating systems, it may be strategic to not mount a drive to ensure that it is left completely untouched. Mounting can also come with options so that a drive could be mounted, but only for reading for example. This allows you to do forensics without any chance of modifying or corrupting anything.

Another important thing to know is that in environments that normal people use, the concept of "ejecting" a disk is really an unmounting process. Unix can do this explicitly with a lot of very fine control using the umount command.

Other than that, modern systems mount things automagically and regular users can ignore the topic.

ℱancy mount

Professionals however should know how to do this explicitly. The general format is something like this.

sudo mount /dev/sdc2 /mnt/mybackupdrive

The first argument is the device and the second is the "mountpoint", that is, where in the tree will this new system graft on. I think of the mount command in the same way I think of the ln command: real thing first, fake thing second.

Another useful tip is to use UUIDs when possible to mount things. The reason for this is that device names can change capriciously. Maybe when you plug your device in you always get /dev/sdg (I do). But what if you buy another similar device and plug it in too? Now what? Explicitly using the UUID removes any ambiguity and targets the correct device with certainty. This is especially important and useful with backup drives. Note that the UUID= must be capitalized.

mount UUID=e52a8c89-c84e-4dab-b71f-68ecda5cc4ec /mnt/backup/

A common thing to mount is a USB drive or SD card that is used in some device like a camera. The camera will inevitably want to use a crappy VFAT file system. One of the ways it is crappy is that it doesn’t handle unix metadata such as ownerships and permissions very well. I’ve had success using an option to the mount command.

mount -v -t vfat -o uid=11111,gid=11111 /dev/sdg /mnt/sdg

One other tip is that when you’ve unmounted a volume with umount there still may be outstanding writes that need to finish up. To make sure they are finished, issue the sync command and wait for it to finish. If you’ve done a umount and then a sync it is safe to remove the drive (assuming it’s removable!).

stat

Files are ones and zeros on your disk, but the illusion of files is stored by the filesystem that knows things about the file. This can be very useful to query. The stat command prints out all of a file’s metadata that it can find. Here is an example.

$ stat /bin/true
  File: /bin/true
  Size: 31464       Blocks: 64         IO Block: 4096   regular file
Device: 811h/2065d  Inode: 5767266     Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-01-10 09:23:01.105907652 -0500
Modify: 2017-02-22 07:23:45.000000000 -0500
Change: 2019-01-08 13:25:53.607888592 -0500
 Birth: -

Note this is especially good for finding out more about timestamps.
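When I only want one or two of those fields, for example in a script, the GNU version of stat takes a -c format string. This sketch assumes GNU coreutils (BSD and macOS stat use a different -f format syntax) and a made-up file.

```shell
demo=$(mktemp -d)                        # hypothetical scratch directory
printf 'hello\n' > "$demo/f"             # 6 bytes
chmod 640 "$demo/f"
touch -t 201901100923.01 "$demo/f"       # set the modify time explicitly

stat -c '%n: %s bytes, mode %a, modified %y' "$demo/f"
size=$(stat -c %s "$demo/f")             # size in bytes
mode=$(stat -c %a "$demo/f")             # octal permissions
```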

df

One day on the path to being a serious computer user, you will get a "No space left on device" error. Thereafter you will have a heightened sense of vigilance for that problem being possible. How do you check to see how much of your Disk is Free to use? The df command. By itself, it shows you all the filesystems it knows about and how full they are. If you specify a mount point or a device (or more than one) it shows you only that. Here I’m checking the space used (sort of the opposite of "free" isn’t it?) on my main top level filesystem (/).

$ df /
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/sdb1      117716872 17804248  93889868  16% /

I don’t know about you but I can’t understand that because it’s written in 1k "blocks" which is hard for me to think about. Adding the -h option, for human-readable, cleans things up nicely. For humans anyway.

$ df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       113G   17G   90G  16% /

So I have a 113 gigabyte drive that I’m using 17 gigabytes of. It’s really a 128GB drive but remember that the filesystem sets aside some of that to organize the metadata for the half million or so files on a typical system.

If you’re on a Mac, the df command reports in nearly useless 512 byte blocks. You can use the -h human readable option just fine but to get comprehensible detailed output use the -k flag which will use the 1k blocks like Linux.

ℱancy df

Sometimes you need to see filesystems that are hidden. Use the -a option to really see All of them. Sometimes you use symlinks to add some more space to your setup and it’s easy to get confused about where the data really is being written. You can give df a path to a file or directory and it will show just the location of where it truly resides.
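A sketch of that last trick. The -P option asks for the portable one-line-per-filesystem format, which is easier to pick apart in a script. Pulling the last field out with awk is my own shortcut and it assumes the mount point has no spaces in it.

```shell
df -P .                                               # which filesystem holds the current directory?
lines=$(df -P . | wc -l)                              # header plus exactly one filesystem
mount_point=$(df -P /tmp | awk 'NR==2 {print $NF}')   # where /tmp really lives
echo "/tmp is on the filesystem mounted at $mount_point"
```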

du

The df command is great for big picture knowledge about the entire disk but what if you want to know how much a particular branch of your filesystem’s tree is taking? The du command can add that up to your specification. By itself, I find du kind of useless. It will show you the size (in hard to understand blocks) of each directory in the current directory and below. That can get cumbersome. A better way is to limit it with the -s summary option and to use the human readable option, -h. Here’s what I generally do.

$ du -hs /home/xed/.mozilla
122M    /home/xed/.mozilla

Here we can see that my stupid clumsy browser’s working directory is packed with 122MB of garbage (mostly cached web stuff I suppose).

I can also count up all the subdirectories this contains.

$ du /home/xed/.mozilla | wc -l
623

That’s 623 (including the top level one) which means that this directory is filled with a baroque maze that’s less of a tree and more like a bramble patch. When planning backups, this kind of targeted analysis can be handy.

ℱancy du

What if you want to see which directory trees are the big ones? I use the --max-depth option to only descend one level. Here are the top 4 (tail -n4) biggest directory trees in my home directory.

$ du --max-depth=1 /home/xed | sort -n | head -n-1 | tail -n4
47780   /home/xed/.config
125648  /home/xed/.mozilla
691912  /home/xed/Downloads
1905432 /home/xed/.cache

Note that they’re all related to browsers!

This technique can also be useful if you have a bunch of users and want to see who is hogging all the space.
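If you would rather read the sizes in human units, GNU sort has a -h option that understands them. A sketch on an invented scratch tree, assuming GNU du, sort and head:

```shell
demo=$(mktemp -d)                                 # hypothetical scratch tree
mkdir "$demo/small" "$demo/big"
head -c 4096 /dev/urandom > "$demo/small/f"       # 4 kB of junk
head -c 1048576 /dev/urandom > "$demo/big/f"      # 1 MB of junk

du -h --max-depth=1 "$demo" | sort -h             # human readable, smallest first

# Same idea in plain kilobytes: drop the grand total, keep the largest subdirectory.
biggest=$(du -k --max-depth=1 "$demo" | sort -n | head -n -1 | tail -n 1 | cut -f2)
echo "$biggest"
```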

free

Shows memory stats. As in "free memory". Use the -h human readable option so you don’t have to think as much about what it means. Memory management on a Linux system is complex and baroque (though freakishly effective) so don’t feel bad if it’s not all crystal clear. I don’t exactly know everything it’s trying to tell me. But it’s a good quick check of memory.

On Linux, you can also do cat /proc/meminfo for similar information.

Probably Linux only. Not on FreeBSD or OSX.

lsblk

Shows block devices. Wonder what drives are on your system? Use this. Wonder what device name that USB flash drive was given? Run lsblk before inserting it and then run it again after and see which device just showed up.

Not on FreeBSD. Probably Linux only.

ℱancy lsblk

I use this so often that I have this very helpful alias defined.

alias lsb='lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT,UUID'

This eliminates some of the lsblk cruft like major and minor device numbers (that I have yet to ever care about) and replaces them with some interesting things like file system type and the UUID code which can uniquely and positively identify a device. Not 100% sure if you’re formatting that USB flash drive you just inserted or your main hard drive? Double check the UUIDs and you can’t easily choose the wrong one.

lsusb

Need to see a list of your USB devices? Good for finding out things like your USB hub is swallowing some of your peripherals.

Not on FreeBSD. Probably Linux only.

ℱancy lsusb

The main thing I do with this command is to see if the USB system is identifying a new USB device that I’ve just plugged in. Here is a smooth workflow for isolating just that.

$ lsusb | tee /tmp/usblist
$    # ... Now plug in the new device.
$ diff /tmp/usblist <(lsusb)

Now you’ve conclusively isolated it and can be sure it’s present without hunting through the entire list. Also, don’t forget to check dmesg.

lspci

Need to see a list of your internal PCI peripherals? Even if it’s physically not on a card per se, this will show you what the PCI subsystem knows about. It’s most handy for figuring out which graphics adaptor chipset you have so you can target obtaining the correct driver.

It can also be useful for figuring out what other chipsets you have on various hardware (NIC, sound, etc).

Not on FreeBSD. Probably Linux only.

dmesg

Something bad happen at boot? Getting some strange hardware related error? You can check the kernel’s "ring buffer" where it dumps internal messages that it thinks you may want to know about. To do this use the dmesg command. On some systems, this is considered privileged information and you’ll have to use sudo.

sudo dmesg

Or if you want to know if some Linux thing is active you can grep for it. Here are some examples.

sudo dmesg | grep -i nvidia
sudo dmesg | grep -i ipv6

This checks to see if the Nvidia driver started ok. And the other checks to see if IPv6 is active.

What the kernel tells you is a bit arcane but it’s still good to know how to check it and do some web searching for any problems that you find reported that concern you.

The kernel has other ways of providing you information. For example check out this command.

cat /proc/cpuinfo /proc/meminfo

The kernel will quickly pretend that there are two files (cpuinfo and meminfo) that contain a bunch of stats that the kernel knows. The cat command will dump them out for you. Very handy. Try it.

ℱancy dmesg

The timestamps dmesg produces are in time intervals since the computer was powered up. I find that to be pretty useless. The -e flag will give you sensible timestamps that let you know if some logged event is related to the problem you just had a minute ago.

Sometimes if you’re doing some robotics kind of thing with weird hardware and you want to see it get detected or whatever, you may need to monitor the kernel’s ring buffer in real time. Fortunately it’s simple to do. Check out these options.

sudo dmesg -wH

uname

This is supposed to tell you about the platform you’re running on. This is often used in scripts so the script can know what kind of system you’re on. For less boring use try the -a option.

$ uname
Linux
$ uname -a
Linux ctrl-sb 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64 GNU/Linux
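As a rough sketch of that scripting use, here is how a script might branch on what uname reports. The function name md5_tool is made up for illustration, and the fallback branch is just an assumption.

```shell
# md5_tool: print the name of the MD5 command expected on this platform.
# Hypothetical helper for illustration; extend the case list as needed.
md5_tool() {
    case "$(uname)" in
        Linux)          echo md5sum ;;
        FreeBSD|Darwin) echo md5 ;;      # OSX reports itself as Darwin
        *)              echo md5sum ;;   # assumption: fall back to the GNU name
    esac
}

md5_tool
```

A script can then call "$(md5_tool)" instead of hard coding one name.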

I tend never to use this. Instead, I usually just do this.

$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

In the old days, I did this and it still mostly works.

$ cat /etc/issue
Debian GNU/Linux 9 \n \l

sensors

Start by installing this somehow if it’s not there. On Debian/Ubuntu type systems do this.

$ sudo apt-get install lm-sensors

Then have it figure out what sensors it can read (basically it’s autoconfiguring).

$ sudo sensors-detect

I just hit enter to go with the defaults of every question. Because I’m lazy.

Then you can put it to use and see what your computer knows about its sensors, usually temperature data.

$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +28.0°C  (high = +84.0°C, crit = +100.0°C)
Core 0:         +27.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:         +28.0°C  (high = +84.0°C, crit = +100.0°C)
Core 2:         +26.0°C  (high = +84.0°C, crit = +100.0°C)
Core 3:         +26.0°C  (high = +84.0°C, crit = +100.0°C)

That’s pretty handy to know, especially if you are having intermittent shutdowns on hot summer days.

Not on FreeBSD or OSX. Probably Linux only.

smartctl

My harddrive diagnostics notes. The smartmontools official website.

dmidecode

Produces a massive dumping of everything the system knows about your hardware. You need to use sudo and you probably need to pipe it to something like less or grep to be useful at all. Still, good to know when you need to find things out about your system. I think it is a good choice for archiving a hardware profile of machines you manage or care about.

By the way, DMI stands for Desktop Management Interface.

I found dmidecode on FreeBSD but not OSX.

tr

The tr command can be simple. It stands for "translate" but I think of it as "transpose" too — something like that. As an example, if you need all n’s converted to b’s and all a’s converted to e’s, this will do it.

$ cal | head -n1 | tr na be
Jebuery 2019

ℱancy tr

Things can get more complicated with ranges of characters. Here is a technique to get ROT13 output.

$ echo "This is mildly obfuscated." | tr 'A-Za-z' 'N-ZA-Mn-za-m'
Guvf vf zvyqyl boshfpngrq.
$ echo "Guvf vf zvyqyl boshfpngrq." | tr 'A-Za-z' 'N-ZA-Mn-za-m'
This is mildly obfuscated.
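A few more everyday tr forms, as a minimal sketch. The sample strings are arbitrary, and the sets are quoted so the shell leaves them alone.

```shell
# Upper-case everything by mapping one range onto another.
echo "xed.ch" | tr 'a-z' 'A-Z'            # XED.CH

# -d deletes every character in the set instead of translating.
echo "r2d2 c3po" | tr -d '0-9'            # rd cpo

# -s squeezes runs of a repeated character down to a single one.
echo "too    many   spaces" | tr -s ' '   # too many spaces
```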

Here I explain a cool application of tr that I came up with.

sed

My sed notes.

cut

The cut command can be used to extract specific columns of data. You pretty much always need two options for this command. First you need to specify a delimiter, that is a character that will be the one which separates your fields. For example, in comma separated values, you’d say -d,. The other option is the -f which is the field number you want. Here I’m taking the 3rd field when separated by commas.

$ echo "one fish,two fish,red fish,blue fish" | cut -d, -f3
red fish

If I separate by spaces, I get a different result.

$ echo "one fish,two fish,red fish,blue fish" | cut -d' ' -f3
fish,red
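The -f option also accepts lists and ranges, and -c picks character positions instead of fields. A short sketch using the same sample line:

```shell
line="one fish,two fish,red fish,blue fish"

# Several fields at once: a list (1,3) or an open-ended range (2-).
echo "$line" | cut -d, -f1,3      # one fish,red fish
echo "$line" | cut -d, -f2-       # two fish,red fish,blue fish

# -c selects character positions rather than delimited fields.
echo "$line" | cut -c1-8          # one fish
```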

awk

It turns out that a lot of people comfortably get away with not knowing about the cut command because they use awk. If you only know one thing about awk it should be that it is very good at easily extracting fields (i.e. columns) from lines of text. One nice thing about awk is that its default delimiter is whitespace (any run of spaces or tabs), so you often can just do something like this.

$ echo "one fish,two fish,red fish,blue fish" | awk '{print $3}'
fish,red

And if you need a different delimiter, you can do this.

$ echo "one fish,two fish,red fish,blue fish" | awk -F, '{print $3}'
red fish

Or this.

$ echo "one fish,two fish,red fish,blue fish" | awk 'BEGIN{FS=","}{print $3}'
red fish

That’s really all normal people need to know about awk.

ℱancy awk

But that’s just the tip of the iceberg! The reason that awk is generally better than cut is that it is way more powerful than the simple cut (which is fine if you’re going for minimal).

Awk is one of Brian Kernighan’s favorite languages. This is surprising since he is the "K" in the original K&R C. But less surprising because he is also the "K" in awK which he helped write. It is in fact a complete and powerful programming language. I have written some amazingly powerful and effective programs in Awk and I encourage professionals to get familiar with its power and potential.
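To give a small taste of awk as a language rather than a column picker, here is a sketch with made-up data (a name and a score per line). It shows a pattern that filters lines and a variable that accumulates across them.

```shell
# Hypothetical data: name, score.
results="alice 11.02
bob 9.32
charlie 10.08
eve 9.96"

# A pattern in front of the braces filters lines: only scores above 10.
echo "$results" | awk '$2 > 10 {print $1}'                    # alice, charlie

# Variables persist across lines; the END block runs after the last line.
echo "$results" | awk '{sum += $2} END {printf "%.2f\n", sum}'   # 40.38
```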

Here is an example of a Bash program I wrote that creates specialty Awk programs and then runs them to solve problems in a custom yet scalable way.

Here is an example of how I used Awk to create pie charts on the fly from the Unix command line.

Definitely check out my awk notes for other interesting applications and ideas.

cat / tac / rev

Short for "conCATenate". Although this is really to join multiple files together it is commonly used to just dump the contents of a file to standard output where it can be picked up by pipes and sent as input to other programs.

tac is a little known but fun and somewhat useful command that returns the lines of input starting with the last and ending with the first. I found tac on FreeBSD but not OSX.

rev is another little known but fun and less useful command that returns the lines of input reversed (right to left becomes left to right). Let me know if you come up with a brilliant application of this.

ℱancy cat

Don’t use cat! Use file arguments and redirections instead where possible. If you’re not concatenating, you probably don’t really need cat. That said, normal people shouldn’t feel bad for using it as a do-all convenience function. It works. It’s just that if you want to get to the next level of efficiency, it’s good to economize your scripts by getting rid of cats when possible.

One helpful simple use of cat is when a program tries too hard to format data. For example, the ls command tries to figure out how wide your terminal is and pack as many columns of output as possible. This normally makes sense, but if for some reason you want the display output in one continuous column, just pipe it to cat.
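Here is a small sketch of what trading cat for a redirection or a file argument looks like. The temporary file is just a stand-in for whatever you are working on.

```shell
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"

cat "$tmp" | wc -l     # works, but starts an extra process
wc -l < "$tmp"         # same count; the shell connects the file to stdin
grep -c t "$tmp"       # many commands simply take the file as an argument
```

All three read the same file. Only the first one needs cat to do it.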

split

Most people don’t know about this command because it is rarely needed. But when it is, it’s nice to solve your problem in a way that you can be sure is direct and efficient. It is one of the many programs written by Richard Stallman personally.

Basically, like it says in the name, it splits data. Into multiple files.

Here’s an example. If you have a giant list of a million ligands and you have 100 computers that can check to see which ligands bind to a certain protein, you could use the split command to produce 100 files each containing 10000 ligands.

The split command is the natural complement of the cat command. How would you reassemble files produced with split? Simply concatenate them with cat.
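The ligand example could be sketched like this, assuming the list has one ligand per line. Here seq stands in for the real list, and the part- prefix and temporary directory are arbitrary choices.

```shell
work=$(mktemp -d)

# Stand-in for the ligand list: one entry per line, a million lines.
seq 1000000 > "$work/ligands.txt"

# -l sets the number of lines per output file (part-aa, part-ab, ...).
split -l 10000 "$work/ligands.txt" "$work/part-"

ls "$work"/part-* | wc -l                                    # 100
cat "$work"/part-* | cmp - "$work/ligands.txt" && echo "reassembled OK"
```

The shell expands part-* in alphabetical order, which is why a plain cat puts the pieces back in the right sequence.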

A very good example of when I have used this is when trying to manage video files that need to be stored on SD cards formatted with vfat which has a 4GB file size limit. For example you might record 8 files with a GoPro; you copy them over to a proper file system and pick your favorites and concatenate them into an 8GB video. Now on the SD card you want to replace all the original videos with this one but 8GB is invalid on vfat. The split command to the rescue! This will break the video into 3 files (highlights.mp4-00, highlights.mp4-01, highlights.mp4-02).

split -n 3 -d highlights.mp4 highlights.mp4-

Note that these files aren’t video files and can’t be played as videos, but if you do the following, you can reconstruct (somewhere that supports the 8GB file size) the playable video.

cat highlights.mp4-* > highlights.mp4

fmt

While generally obscure, the fmt command is of special interest to people who — like I am doing right now — write large amounts of prose in a text editor. An important question arises: do you type away hitting enter only after you’ve reached the end of a paragraph? Or do you interject hard returns in your text as you type? The fmt command allows the best of both worlds. You can type away without worrying about where exactly those hard returns should go (something people had to think about when typing on an old manual typewriter), yet at the end of your input, you can have your lines be a sensible 70 or 80 characters wide (maximum) or whatever you want with the -w WIDTH option (default is 75).

Interestingly I went for 25 years without knowing about this program explicitly, and yet all of my prose is broken down into manageable line lengths for easier reading; my default value is 70 characters and longer as necessary for code comments. The way I make this happen is Vim’s set textwidth=70 and, if needed, the gq function. What I never knew is that behind the scenes Vim does have a set formatprg= setting which is described like this:

The name of an external program that will be used to format the lines selected with the |gq| operator. The program must take the input on stdin and produce the output on stdout. The Unix program "fmt" is such a program.

If you appreciate this great feature of Vim but would like to scale it up or automate it in some way, the fmt command is the answer. Or see fold below.

fold

Another classic tool very similar to fmt is fold. You can specify the -s option and get breaks only at spaces. The -w 70 setting allows you to specify the width to break at (or before). fold is a little more serious about cutting off long lines while fmt will let them run over if there’s no sensible place to break them.
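That difference in strictness is easy to see with a single long word that has no sensible break point. A minimal sketch, with an arbitrary width of 10:

```shell
long="supercalifragilisticexpialidocious"   # 34 characters, no spaces

echo "$long" | fmt -w 10    # one line: fmt will not break inside a word
echo "$long" | fold -w 10   # four lines: fold cuts at exactly 10 columns
```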

One interesting and useful application is to break down a line of text into a column of characters. Here is an example showing this and adding line numbers to each character position.

$ echo "xed.ch" | fold -w1 | sed '=' | sed 'N;s/\n/ /'
1 x
2 e
3 d
4 .
5 c
6 h

diff

The diff command simply finds the differences in two files. Well, no, not simply. It actually is a very hard problem to even formulate an efficient way for you to be apprised of what is different. diff solves that hard problem and other subtleties related to the task. If you think you have a better way to express file differences than diff, you’re probably wrong. Let’s look at an example.

Here I list block devices with the lsblk command and save the output to a file. I do this once before inserting a USB flash drive and once after.

$ lsblk > /tmp/before
$    # ... Now I insert a USB drive.
$ lsblk > /tmp/after
$ diff /tmp/before /tmp/after
7a8,10
> sdc      8:32   1  29.9G  0 disk
> ├─sdc1   8:33   1     1G  0 part /media/xed/ef6b577f-d3c3-4075-8da8-333d031b4515
> └─sdc2   8:34   1  28.9G  0 part

What diff shows me is only the part of the two output files that is different. Its format (7a8,10) says that after line 7 of the first file you need to add the following lines (which start with ">") to get lines 8 through 10 of the second file. In "diffs" (as they are called), deleted lines are indicated with the symbol "<".

Note that the order of how the files are specified is important. The "diff" reflects what must be done to the first specified file to achieve the state found in the second.
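Here is a tiny sketch of how the order of the arguments flips the result. The two files are throwaway examples that differ by one trailing line.

```shell
work=$(mktemp -d)
printf 'one\ntwo\nthree\n'       > "$work/before"
printf 'one\ntwo\nthree\nfour\n' > "$work/after"

# before -> after: "3a4" (after line 3, add line 4) and "> four"
diff "$work/before" "$work/after" || echo "exit status $? means they differ"

# after -> before: "4d3" (delete line 4) and "< four"
diff "$work/after" "$work/before" || echo "exit status $? means they differ"
```

Note that diff exits with status 1 when the files differ, which is handy in scripts that only need a yes or no answer.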

ℱancy diff

That’s nice, you’re thinking, but maybe you don’t see yourself with a huge diff agenda. It turns out that this program is a cornerstone of human civilization. This is because pretty much all version control systems like RCS, CVS, Mercurial, and Git (and therefore all software) use diffs to keep track of what changed.

There is another related Unix command called patch (written by Perl creator Larry Wall) which takes something like /tmp/before plus a diff and produces a /tmp/after. If you’re wanting to tell Linus Torvalds about some brilliant change you have in mind for the Linux kernel, the normal way to do this is to post a diff (in email is fine) which will effect your changes if it is "patched" into the code. This is where the word "patch" in software contexts comes from.

Here is a hardcore use of diffs where I created a script to apply numerous custom patches to fix a terrible but important dataset. The point of this example is that I am taking for granted that there is no better system to explicitly record and apply what needs to be changed than diff and patch.

sdiff

This shows two files side by side highlighting their differences. This is the quick command line technique that might be able to replace something like vimdiff which does a fantastic job of such tasks.

cmp

There is also a Unix command called cmp which mostly just checks to see if two files are the same. Unlike diff which is line based, cmp works on bytes making it a good choice for hunting down bytes that are different in a binary file. Getting an MD5 hash can tell you that there are different bytes, but cmp can tell you where those bytes are. The bytewise approach may be slightly more efficient. This might, for example, be useful if you suspect cosmic radiation has disturbed a large binary file. This is a thing that happens!
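A minimal sketch of finding where two files differ, using two made-up six byte files that disagree only at the fourth byte:

```shell
work=$(mktemp -d)
printf 'abcdef' > "$work/a"
printf 'abcXef' > "$work/b"

# -s is silent: only the exit status says whether the files match.
cmp -s "$work/a" "$work/b" || echo "files differ"

# -l lists each differing byte: its position, then both values in octal.
cmp -l "$work/a" "$work/b" || true     # 4 144 130
```

Octal 144 is "d" and octal 130 is "X", so the fourth byte is the one that changed.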

comm

This is another file comparing utility that operates on lines like diff. Unlike diff, its output is not optimized for creating a patch to reconcile the two files. Rather its output is designed to allow further processing in Unix pipelines. The command outputs 3 columns by default — the lines that are only in the first file, the lines that are only in the second file, and the lines that are in both. You can use options to suppress any of these columns to get the Boolean operation you need.

This command can be sensitive to ordering — here is a useful syntax to ensure the input files are ordered.

comm -12 <(sort fileA) <(sort fileB)

This shows only column 3, or the lines that both files have in common. Or you could use -3 to show just the differing lines; this might be useful, for example, when trying to compare file system trees and figure out what the differences are.
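Here is a small sketch of the three most common column choices. The file names A and B and their fruity contents are made up, and I sort into temporary files so it also runs in a plain POSIX shell.

```shell
work=$(mktemp -d)
printf 'cherry\napple\nbanana\n' | sort > "$work/A"
printf 'banana\ndate\ncherry\n'  | sort > "$work/B"

comm -12 "$work/A" "$work/B"   # in both:    banana, cherry
comm -23 "$work/A" "$work/B"   # only in A:  apple
comm -13 "$work/A" "$work/B"   # only in B:  date
```

The digits name the columns to suppress, so -12 leaves only column 3.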

diff3

Pretty much like diff but for three files. It is mainly used to merge two sets of changes made to a common original.

md5sum

I have a lot of respect for diff of course, but in my experience I have more occasions where I need to know if something changed rather than what about it changed. Make no mistake, diff can do that job, but there is a simpler way.

The MD5 message digest algorithm creates a "hash" of input data. This means that it makes a short (128 bits) numeric summary of input.

Think of it like some kind of inscrutable rhyming slang. Why would a "person from the United States of America" be called a "septic"? Well, "septic" from "septic tank" rhymes with "Yank" which is short for Yankee. You input "American" to a Cockney and you get "septic". WTF — go figure. Without a complex breakdown of the algorithm, you just have to accept that it is what it is. Hashes are similar.

If I feed md5sum that phrase, its rhyming slang produces this.

$ echo "person from the United States of America" | md5sum
f3b811b934ee28ba9e55b29c6658c5b7  -

That 32 character nickname is no more or less understandable than "septic". What it is, however, is unique. If I send it anything else, that "nickname" will be reliably different. And not just a little different but completely different in a completely random looking way. Just like rhyming slang.

Where is this useful? Well, everywhere for starters! This kind of thing is used heavily to make cryptographic keys. Something like this (and formerly this exact thing) was used to record your password in Unix /etc/passwd files. That allowed the system to check if it was really you without actually recording the secret word. It would just check the secret word you enter at log in by running it through md5sum and seeing if the hash on record matches.

Here is a simpler example that I run into a lot. Let’s say you have a big music collection or a big photo collection. With multiple backups and offload operations, it’s easy to get some files that are duplicates. (If this hasn’t happened to you, you have not managed enough files.) The question md5sum is ideal to answer is, "Are these files exactly the same?" Just run something like this.

$ md5sum /etc/resolv.conf /var/run/NetworkManager/resolv.conf
069bf43916faa9ee49158e58ac36704b  /etc/resolv.conf
069bf43916faa9ee49158e58ac36704b  /var/run/NetworkManager/resolv.conf

Here the MD5 hash for these two files is identical — these files contain the same exact 1s and 0s in the same exact order. (I cheated here a bit since the first is a symlink linking to the second.) If these are two distinct files, one of them is redundant.

On FreeBSD and OSX there is an equivalent command called simply md5.
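For scripts, the "are these the same?" question can be wrapped in a tiny helper. This is only a sketch: the name same_file is made up and it assumes the GNU md5sum command (on FreeBSD and OSX you would swap in md5 -q).

```shell
# same_file A B: succeed if both files have the same MD5 hash.
# Hypothetical helper; assumes GNU md5sum is installed.
same_file() {
    [ "$(md5sum < "$1" | cut -d' ' -f1)" = "$(md5sum < "$2" | cut -d' ' -f1)" ]
}

work=$(mktemp -d)
printf 'same bytes\n'  > "$work/one"
printf 'same bytes\n'  > "$work/two"
printf 'same bytes!\n' > "$work/three"

same_file "$work/one" "$work/two"   && echo "one and two are duplicates"
same_file "$work/one" "$work/three" || echo "one and three differ"
```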

ℱancy md5sum

Ok, so how would one find all the duplicated files on a filesystem? The md5sum command will surely be at the heart of the solution.

I wrote this little one liner script which takes a starting top level path as input and searches the entire sub tree for files which are actually the same. It saves a bit of time by only looking at the beginning of the files instead of the whole thing. Note that apparently 1k of an mp3 is not enough to distinguish it from others reliably enough.

undup

#!/bin/bash
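# Usage: undup TOPDIR
# Hash only the first 100000 bytes of each file, sort by hash, and print
# every file whose hash matches that of a neighbor in the sorted list.
find "$1" -type f \
    | while IFS= read -r X; do echo "$(head -c 100000 "$X" | md5sum | cut -b-32) $X"; done \
    | sort -k1 \
    | awk 's==$1{print p;p=$0;print p}{s=$1;p=$0}' | uniq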

Run like this:

./undup /media/WDUSB500TB/musicvault > musicdups

Just to sprinkle a little confusion and philosophical doubt into the topic, it turns out that it is possible that md5 for two different inputs will cause the same hash to be output. This is called a "collision" and it is very rare. Very. Rare. It is rare in the same way that choosing two random drops of water on earth would find them touching each other. Still for some life or death applications (mainly cryptography), this is not rare enough. There are fancier (n.b. harder to compute!) hash algorithms where the collision potential is something you could comfortably bet your life on. Here are nerds in late 2018 debating whether MD5 is sufficient for anything. To figure out which songs in your collection are dups, I say it’s more than fine.

sum

Note that there is an old program called sum that computes a very simplistic checksum of the blocks in a file. I feel like unless you have a very rudimentary check to perform and you have serious performance objectives, it’s always better to favor md5sum.

sort

One huge disconnect between the real world and computer science education seems to be the emphasis on sorting algorithms. You know who writes production code containing sorting algorithms? Nobody. Because it has all been exhaustively implemented for all practical purposes. In the Unix world that done deal is the sensibly named sort command. Need something sorted? It’s an off-the-shelf solution that’s probably better than what the typical CS education was going to provide for.

Don’t worry too much about how this works, but the following code can produce helpful examples.

for N in {1..5}; do echo $(($RANDOM%10));done

That basically outputs 5 random numbers between (and including) 0 and 9. If that output is sent (with a pipe) to the sort command you get something like this.

$ for N in {1..5}; do echo $(($RANDOM%10));done | sort
1
3
4
6
9

Those random numbers are now sorted. However for humans there lurks some unintuitive behavior. Let’s try it with numbers from 0 to 99.

$ for N in {1..5}; do echo $(($RANDOM%100));done | sort
14
17
5
55
62

That 5 does not look sorted. But in fact, from a text standpoint, it is. If you want the sort to be done from a numerical perspective, add the -n flag.

$ for N in {1..5}; do echo $(($RANDOM%100));done | sort -n
3
13
27
45
80

Now the single digit number does come first even though "3" is bigger than the "1" of the "13" and the "2" of the "27".

Need to reverse the sort? You could pipe it to tac or simply use the -r option. This produces the alphabet in reverse.

for N in {a..z}; do echo $N;done | sort -r

Need to "sort" things into a random order? The -R option can do that. This produces the alphabet in a random order. Run it multiple times.

for N in {a..z}; do echo $N;done | sort -R

ℱancy sort

Although that last example works, if you want to sort things randomly, you need to be careful. The sort command actually orders lines by a random hash, so identical lines get the same hash and land next to each other. This means if there are two entries that are the same, they will still be stuck together after -R and that is not really quite random. You can see this by counting unique values with something like this.

$ for N in {1..1000}; do echo $(( $RANDOM%20 )) ; done | sort -R | uniq -c | wc -l
20

How could there only be 20 unique (see uniq) sets of numbers if they were randomly distributed? They are not. If you want truly random distribution, try the shuf command which does the right thing.

$ for N in {1..1000}; do echo $(( $RANDOM%20 )) ; done | shuf | uniq -c | wc -l
944

Sometimes I need to sort on two fields in a special way. Consider the following competition results data.

Alice Smith   11.02 F
Bob Tio        9.32 M
Charlie Angel 10.08 M
Eve Ng         9.96 F

Imagine that you would like this sorted by gender and then numerically by score, highest first. You could use sort -k4 -k3,3nr to produce this.

Alice Smith   11.02 F
Eve Ng         9.96 F
Charlie Angel 10.08 M
Bob Tio        9.32 M

The trickiest bit for me is the -k3,3 which defines the "key". The comma notation says that the sorting goes from field 3 to (in this case) field 3. You could also sort on a range of fields. If the comma and second specifier are left off, the key runs to the end of the line.
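Here is the same data as a runnable sketch. I have written the first key as -k4,4 so that it is limited to the gender column, which is a slightly stricter spelling of the -k4 used above and gives the same result on this data.

```shell
results="Alice Smith   11.02 F
Bob Tio        9.32 M
Charlie Angel 10.08 M
Eve Ng         9.96 F"

# -k4,4   : the first key is field 4 only (the gender letter)
# -k3,3nr : ties are broken by field 3 only, numeric (n) and reversed (r)
echo "$results" | sort -k4,4 -k3,3nr
```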

Another tip about sort shows that it’s good to read your man pages. I just learned that sort now has a mode that can sort by human readable sizes (-h). This will come in handy when sorting output of the du command and many other applications involving file sizes.
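A quick sketch of what -h buys you. The printf line is a stand-in for du -h output with made-up sizes and names, so the result is predictable.

```shell
# Stand-in for `du -h` output: a human readable size, a tab, then a path.
printf '122M\t.mozilla\n1.9G\t.cache\n47K\t.ssh\n676M\tDownloads\n' | sort -h
# 47K     .ssh
# 122M    .mozilla
# 676M    Downloads
# 1.9G    .cache
```

A plain sort -n would put 1.9G first because it only sees the number 1.9. With real data this would look something like du -h --max-depth=1 /home/xed | sort -h.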

shuf

As mentioned in sort, the shuf command is a way to "shuffle" lines of input into randomly ordered lines of output. This is great for the obvious uses like shuffling cards or your music or slideshow playlists. It’s also good for getting a subset of the items you need; for example I have used it for machine learning training where I have video footage but need to extract some representative stills.

Here’s a simple demo.

$ seq 10 | shuf | tr '\n' ' '
9 3 6 4 5 2 1 10 8 7

But shuf is actually more clever than just scrambling the input. Instead of providing input on stdin yourself, if you need integer numbers (as shown above using seq) you can use the -i option and get an integer range (you can also use the long option --input-range=LO-HI).

$ for _ in . . . ; do shuf -i 1-10 | tr '\n' ' '; printf '\n'; done
4 8 3 10 7 6 9 1 5 2
2 5 7 9 6 8 3 1 10 4
9 8 10 4 3 2 5 7 1 6

With this technique you can see that it could be useful to get random integers by piping this to head -n1. But shuf has you covered with a -n option of its own. For example, the following command will simulate rolling a 6-sided die.

shuf -i 1-6 -n1
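The same -n option handles the "representative subset" job mentioned earlier. A minimal sketch, where the numbers 1 to 100 stand in for video frame numbers:

```shell
# Pick 3 of 100 hypothetical frame numbers, without repeats.
seq 100 | shuf -n 3

# -e shuffles the arguments themselves instead of reading input lines.
shuf -n 1 -e heads tails
```

The output changes on every run, which is the whole point.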

uniq

The uniq command seems very weird at first. Why would anyone need a special command that eliminates duplicates? Especially since the sort command has a -u option that does this (kind of).

The typical usage is to find out "how many different things are there"? For example, you may have a log file and wonder how many different addresses made a connection to your web server (i.e. ignoring the many hits where the same customers are merely busy interacting with it).

Here’s a typical example. Which processes are logging things in syslog (which is just some Linux log thing)?

$ sudo cut -d' ' -f5 /var/log/syslog | sed 's/\[.*$//' | sort | uniq -c
     14 anacron
      6 CRON
      2 dbus-daemon
     39 kernel:
      1 liblogging-stdlog:
      4 mtp-probe:
     20 systemd
      2 udisksd

Here I cut out the service name and clipped off the process number. The real action starts at sort. That sort organizes the naturally occurring interleaved list. By then sending it to uniq I eliminate the duplicates. The -c option causes uniq to count up how many duplicates there are.

Of course when you look at output like that you might think to sort again and this is very common! Here I’m finding only the top process that generates log messages.

$ sudo cut -d' ' -f5 /var/log/syslog | sed 's/\[.*$//' | sort | uniq -c | sort -n | tail -n1
     39 kernel:

As your Unix proficiency increases, piping results to something like | sort | uniq -c | sort -n | tail -n1 becomes quite ordinary.

Since pipes work by running processes in parallel, this kind of workflow is also extremely high performance as a bonus.
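Here is that pattern on a small made-up list of client addresses, one hit per line, so you can try it without needing access to any logs.

```shell
# Hypothetical log of client addresses, one hit per line.
hits="10.0.0.7
10.0.0.9
10.0.0.7
10.0.0.3
10.0.0.7
10.0.0.9"

# How many different clients are there?
echo "$hits" | sort | uniq | wc -l                    # 3

# Which client is the busiest, and with how many hits?
echo "$hits" | sort | uniq -c | sort -n | tail -n1    # 3 10.0.0.7
```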

ℱancy uniq

Still not convinced that the sort command is insufficient for making things unique? The problem with the unique option of sort is that it must actually sort the data first. Sometimes this is not what you want. Compare the following.

$ for N in {1..1000}; do echo $(( $RANDOM%20 )) ; done | shuf | uniq | wc -l
958
$ for N in {1..1000}; do echo $(( $RANDOM%20 )) ; done | shuf | sort -u | wc -l
20

You can now see that there naturally occurred 42 cases where there were duplicates that uniq had to get rid of while the sort -u got rid of 980 cases of duplicates giving the answer to quite a different question.

The reason to break uniq out into its own standalone program is that it becomes more modular. The uniq command is pretty powerful on its own too. It can even skip a number of leading whitespace-separated fields when deciding whether lines are duplicates (-f).
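A small sketch of the -f option, using made-up log lines where a timestamp in the first field would otherwise make every line look different:

```shell
# Hypothetical log lines: a timestamp field, then a message.
log="09:01 disk full
09:02 disk full
09:03 disk ok
09:04 disk full"

# -f 1 skips the first field when comparing, so adjacent lines with
# the same message collapse even though their timestamps differ.
echo "$log" | uniq -f 1
# 09:01 disk full
# 09:03 disk ok
# 09:04 disk full
```

The last line survives because uniq only compares neighbors, and its neighbor says "disk ok".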

wc

The wc command stands for "word count" and is one of the most useful command line Unix tools. It is especially useful as a building block in a complex pipeline allowing you to distill a lot of data into something you can deal with. For example instead of seeing pages of files go zooming by, I can do this.

$ ls *txt | wc -l
152

And find out that I have 152 help files in my directory. The -l is to count, not words, but lines. Specifically I’m counting the lines returned by the ls program. That will list all of the files I’m interested in.

To see how many words are in all of my help files I simply do this.

$ cat *txt | wc -w
316522

Or if you do it like this, you get a breakdown of every file’s word count and a total. (I’ll just limit it to ones starting with "o" to illustrate.)

$ wc -w o*txt
10002 opencv.txt
3846 opengl.txt
13848 total

Without the -w option it shows you lines, words, and bytes.

$ wc o*txt
  1671  10002  71753 opencv.txt
   731   3846  28402 opengl.txt
  2402  13848 100155 total

If you’re a professional writer commissioned for a certain number of words the -w will be very useful to you. However, for normal Unix command line usefulness, wc -l is extremely useful. For example, if I wondered how many times I used the word Linux in my notes, the answer is immediately available.

$ grep -i Linux *txt | wc -l
503

ℱancy wc

In that last example, if I had a line with "linux" in it twice, it would only be counted once. To really do a proper job, I would want to break up every word into its own line and then find the lines containing the search target and then count them. Here is how I can do that.

$ cat *txt | tr ' ' '\n' | grep -i Linux | wc -l
526

Apparently I have (at most) 23 lines with multiple "Linux" mentions. But as you can see, no matter what you must do to get the data to be correct, when it comes time to count it all up, wc is your friend.
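Another way to count every mention, as a sketch: the -o option of grep (available in GNU and BSD grep, though not required by POSIX) prints each match on its own line, which wc -l can then count. The sample text is made up.

```shell
text="Linux and more linux
no match here
LINUX"

echo "$text" | grep -i linux | wc -l      # 2 matching lines
echo "$text" | grep -o -i linux | wc -l   # 3 individual matches
```

Unlike splitting on spaces with tr, this also catches two mentions that sit inside the same whitespace-separated word.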

which / whereis / type / file / locate

Before you can use man to read about a command, you need to know if that command is even present on your system. There are many ways to do this. I prefer the which command for finding where executables really live (if they’re even installed). Let’s say I wanted to find out where the mysql executable is on my system. Here’s a comparison of techniques.

$ which mysql
/usr/local/bin/mysql
$ whereis mysql
mysql: /usr/local/bin/mysql /usr/local/man/man1/mysql.1
$ type mysql
mysql is /usr/local/bin/mysql
$ locate mysql | wc -l
   12550

A lot of people go right for locate, but as you can see, it produces a list of 12550 lines of stuff I’m not interested in reading. I also find whereis to be too verbose unless you’re looking for man page paths for some reason. I tend to go with which when I want to find a command.

The type command is not a standalone program but a shell builtin. Its great feature is that it can recognize shell builtins for what they are.

$ type type
type is a shell builtin
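
In scripts, a portable way to ask the same question is the shell’s command -v, which prints what would be run and sets its exit status to match. Here is a minimal sketch; the tool names checked are only examples and the last one is deliberately bogus.

```shell
# Report whether each named tool can be found by the shell.
for tool in sh ls surely-not-installed-xyz; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not installed"
    fi
done
```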

The file command is good to figure out what exactly the executable is once you’ve found its location. (A shell script? 32 bit or 64bit? Designed for a bad OS?)

$ file /usr/local/bin/mysql
/usr/local/bin/mysql: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 11.2, FreeBSD-style, with debug_info, not stripped

In fact, checking the /sbin/init program is a good way to figure out if your underlying system is (for some god-awful reason) 32 bit.

$ file /sbin/init
/sbin/init: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD),
statically linked, for FreeBSD 11.2, FreeBSD-style, stripped

This one is thankfully not.

ℱancy file

If file seems great but you’d like to know absurd levels of detail about an executable, start with ldd. This shows all the shared library dependencies and can be a life saver when trying to resolve dependencies. One very common thing I do is ldd ickyexecutable | grep found, which will ironically catch all the items on the list that are "not found". (You can search for "not found" if you feel like including the quoting.) With this list of missing libraries, you now have a great place to start tracking down what you need to get the thing to run.

If you’re hungry to learn yet more about executables, check out the man page for the readelf command. Also the nm command specializes in extracting an executable’s symbols but readelf can too.

export / alias / set

These are shell commands but they can tell you a lot of useful things. Run with nothing else, export will give you a list of exported variables in your environment. That is useful to review sometimes. The alias command by itself shows all the defined aliases. And set by itself shows everything the shell knows about that is, well, set. This includes shell functions, aliases, variables, and maybe more.

These are super useful commands to keep in mind when troubleshooting or perfecting your shell environment.

An example is if I’m interested in knowing how my shell history settings are configured. I could do this.

$ set | grep HIST
HISTCONTROL=ignoredups:ignorespace
HISTFILE=/home/xed/.bash_history
HISTFILESIZE=1000000
HISTSIZE=100000

Those are good settings, BTW.
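
If the difference between a variable that is merely set and one that is exported seems fuzzy, a quick experiment can help. This is a small sketch with made-up variable names; it assumes PLAIN_VAR is not already in your environment.

```shell
PLAIN_VAR="only in this shell"      # set, but not exported
export SHARED_VAR="passed along"    # exported to child processes

# A child shell sees only the exported variable.
sh -c 'echo "child sees: [$PLAIN_VAR] [$SHARED_VAR]"'
```

The first pair of brackets comes back empty because the child process never received that variable.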

xargs / parallel

My GNU Parallel notes.

unzip

Normal people are used to "compression" taking the form of .zip files. Unix can deal with this. To see what monsters may be lurking in a zip file check with something like this.

unzip -l stupidly_packed_thing.zip
unzip -l normal_java_stupidity.jar

(Yes, Java jar files are just zip files in reality.)

To actually do the extraction do the same thing but without the -l option.

That all seems fine, but I actually do not like zip files. To find out why I think you should not ever create them see this post I wrote called, Why Your Zip Files Irk Me.

gzip

If you have a file called elephants.gz — how can you read that? It is compressed with the gzip program. You can use your choice of these equivalent commands.

$ gunzip elephants.gz
$ gzip -d elephants.gz

And you will be left with a (bigger, natch) file called simply elephants. You can now make ordinary changes to that file. Normal people prefer the gunzip way but I like to stick to the gzip way since that’s really the program that is run by gunzip anyway.

To compress it back down into its smaller form, simply use the following obvious usage.

$ gzip elephants

Which will leave you with an elephants.gz file. Easy.

ℱancy gzip

Let’s say elephants is really big and I don’t have the disk space to actually unpack it. But I want to see if that data contains something of interest. I can do something like this.

$ gzip -cd elephants.gz | grep Babar

This will begin unpacking the file into its readable form and send it to standard output. This is fed into the grep program which looks for, and reports only, lines containing the word "Babar". The entire unpacked data set is never stored on your computer (grep just discards what you weren’t interested in). This is actually the simplest form of this trick. You can use streams of compressed data like this to send big things (e.g. streaming audio) over network connections and other such magic.
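
To try this without a real giant file, here is a small self-contained sketch. The file contents are made up; the point is that -c sends the result to standard output in both directions.

```shell
# Make a small stand-in for the elephants data (contents are made up).
printf 'Babar\nCeleste\nArthur\n' > elephants

# -c writes the compressed stream to stdout, leaving the original file alone.
gzip -c elephants > elephants.gz

# Decompress to stdout and search without ever writing the unpacked file.
gzip -cd elephants.gz | grep Babar
```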

zcat

If you just want to look at a text file that has been compressed with gzip, the zcat command will do that. It basically decompresses it and sends the original out its standard output.

bzip2

It’s just like gzip. There’s even a comically named bunzip2 alternate equivalent to bzip2 -d. Files compressed this way are usually saved as file.bz2.

Basically bzip2 is more hard core than gzip. It will work harder to compress your stuff more intensely saving you disk space. The catch is that it will work harder meaning it will use up more CPU resources. You decide what you’d like to economize.

tar

The tar command stands for "tape archive" and that sounds boring and irrelevant for most modern homo sapiens. However, back in the ancient times the programmers who came up with sensible methods to put entire filesystems on tapes did a pretty solid job of it. So much so that it is still very useful and very much used today.

The most common interaction with tar files is needing to deal with something you download from an open source project like somegreatsoftware.tgz.

The .tgz suffix is short for .tar.gz. The gz means it is compressed and step one is to decompress it. Using gunzip on a .tgz file as described above will leave you with somegreatsoftware.tar. From here you can see the contents of the archive with this.

tar -tvf somegreatsoftware.tar

The -t lists the archive’s table of contents. The -v is for verbose, i.e. show anything it can show. And the -f is getting the command ready for the file to look at.

To actually extract the files from the archive, just change the -t to -x for eXtract. I usually like to run the -t version first to see if the archive creator included a polite containing top directory for the contents. Some (bad) archive creators just cause tar extractions to dump hundreds of files into your current working directory which can be quite tedious to clean up.
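
Most current tar programs (GNU tar and BSD tar at least) can also handle the gzip step themselves with -z, which saves the separate gunzip. Here is a minimal sketch using a throwaway archive; the directory and file names are made up for illustration.

```shell
# Build a tiny archive with a polite top-level directory.
mkdir -p somegreatsoftware
echo "hello" > somegreatsoftware/README
tar -czf somegreatsoftware.tgz somegreatsoftware

# List the contents without unpacking anything.
tar -tzvf somegreatsoftware.tgz

# Extract into a separate directory (-C) to keep the current one tidy.
mkdir -p unpacked
tar -xzf somegreatsoftware.tgz -C unpacked
cat unpacked/somegreatsoftware/README
```

Extracting into a scratch directory with -C is also a cheap defense against archives that lack a containing top directory.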

My tar notes.

cpio

Note that there is an ancient utility called cpio which does mostly the same thing. The main practical difference seems to be that cpio does not walk directories itself: it reads the list of files to archive from standard input (usually supplied by find), while tar takes directory names as arguments and recurses into them. When extracting files with cpio, missing directories are only created if you ask for that with the -d option; without it, extraction can fail for files whose directories do not already exist. Probably not what is wanted.

head / tail

This pair of commands can be quite intuitive and also extremely useful in unintuitive ways. The head command just shows the beginning of the file (or input), by default the first 10 lines. The tail command shows the last 10 lines. This is extremely useful for seeing what’s in a file. The tail command is especially valuable for seeing what the latest entries in a log file are (you are less likely to care about stuff at the beginning that may have happened months ago).

To see some amount other than 10 lines you can use an option like this.

$ history | tail -n3

ℱancy head / tail

In ancient days these commands could take an option styled like this.

$ head -2 /proc/meminfo
MemTotal:       16319844 kB
MemFree:        15155016 kB

This shows the first two lines of the /proc/meminfo (memory information pseudo-) file that Linux creates. But this style is not recommended. It is best to use the full -n2 syntax.

There are (at least) two reasons for this. For one, it is explicit about exactly what units you want. For example, you could do this.

$ head -c3 /proc/meminfo
Mem

This stops after the first 3 bytes of the file (3 characters for plain ASCII text), a very useful feature.

The other reason is that head and tail have a clever mode where you can explicitly use positive and negative numbers for different effects.

$ ABC=$(echo {a..z}|tr -d ' ') # Don't worry about how I did this.
$ echo $ABC
abcdefghijklmnopqrstuvwxyz
$ echo $ABC | head -c5    # Normal mode - first 5 items
abcde
$ echo $ABC | head -c-5   # All except items _after_ position 5 from end *
abcdefghijklmnopqrstuv
$ echo $ABC | head -c+5   # Plus is like normal head mode
abcde
$ echo $ABC | tail -c5    # Normal mode - 4 items _after_ position 5 from end *
wxyz
$ echo $ABC | tail -c+5   # All _except_ 4 items _before_ number's position *
efghijklmnopqrstuvwxyz
$ echo $ABC | tail -c-5   # Minus is like normal tail mode *
wxyz

I used the character mode of head to make these examples compact; this mostly works the same for lines too with -n but it’s good to check. Note the ones with asterisks are not producing or eliminating the number of items listed, but seem to have an off-by-one issue. For most of them this is caused by the newline character that echo adds at the end, which counts as a byte. For tail -c+5 it is because +NUM names the starting position rather than a count of items to skip.

$ seq 10 | tail -n3
8
9
10
$ echo "123456789" | tail -c3
89

So pay close attention to that.
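
A quick experiment makes the newline effect visible. This sketch uses printf, which (unlike echo) adds no trailing newline unless you ask for one.

```shell
# No trailing newline in the input, so -c3 yields three visible characters.
printf '123456789' | tail -c3; echo

# With echo, ask for one extra byte to cover the newline it appends.
echo '123456789' | tail -c4
```

Both commands print 789.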

Yes, the sense of what plus and minus do can be confusing, but it’s enough to know these tricks are possible. Simply look up the detail with man when you have a need and/or perform a quick little experiment to verify it works how you want. (Or reference this very explanation which is what I may start doing!)

Here is what the man page says about this important parameter for head:

print the first NUM bytes ...; with ... '-', print all but the last NUM bytes...
print the first NUM lines ...; with ... '-', print all but the last NUM lines...

And for tail:

output the last NUM bytes; or use -c +NUM to output starting with byte NUM of each file
output the last NUM lines, instead of the last 10; or use -n +NUM to output starting with line NUM
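
A common practical use of the +NUM form is skipping a header line. Here is a small sketch with made-up data.

```shell
# Start output at line 2, dropping the one-line header.
printf 'name,size\nopencv.txt,71753\nopengl.txt,28402\n' | tail -n +2
```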

One more extremely powerful feature of tail specifically is that it can actively watch a file and show you new additions as they arrive and are appended. This uses the -f option (for "follow"). This is extremely useful for watching log files collect messages in real time.

xxd / od

I love xxd! This tool is not just really good at what it does and useful for serious computer nerds, but it has a kind of refreshing purity to me. Basically its function is to give a "hex dump" of the input. Beginners need not worry about that. In fact beginners do not need to use this command ever.

But what I love most about this command might appeal to beginners and experts alike. I love how this command allows you to see the actual ones and zeros that make up your data. Everyone has heard that computers work with ones and zeros, but who has really seen direct evidence of this? If you have a Unix system and xxd you can! Check it out.

echo -n "xed.ch" | xxd -b
00000000: 01111000 01100101 01100100 00101110 01100011 01101000 xed.ch

This shows the binary string for my website’s domain. These are the exact ones and zeros that encode "xed.ch". That’s pretty cool! Here’s more such low level obsessing.

If you don’t have it, the Debian package is simply xxd.

ℱancy xxd

Beyond that brush with obsessive detail, xxd is tremendously useful for picking apart files that are not in ASCII text. Its usual output is byte (not bit as above) oriented and it tends to be in hex. It can however do a lot of very clever things. Here I’m pulling data right off the first 512 bytes of the disk and checking if there’s a GRUB installation there. (BTW - be inspired by, but don’t try this example unless you know what you’re doing.)

dd if=/dev/sda count=1 bs=512 | xxd | grep -B1 RUB

I could have grepped for the weird hex values that make up "GRUB", but it was simply easier to let xxd do that math.

The od command is like xxd but does an octal dump.
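
Since od is part of POSIX it tends to be present even where xxd is not. Here is a small sketch comparing its octal dump with its hex mode; the -An option just hides the offset column, and exact spacing may vary between implementations.

```shell
# Octal dump of three bytes: x=170, e=145, d=144
printf 'xed' | od -An -to1

# od can produce hex as well: 78 65 64
printf 'xed' | od -An -tx1
```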

dd

If you’re a beginner, you probably should just know that dd is damn dangerous. Some people call it "disk destroyer".

ℱancy dd

As with using xxd, there may come a time when you want to cut through all nonsense and handle each one and each zero yourself explicitly. No messing around! That is what dd can do. The man page doesn’t really say what "dd" stands for but I think of it as "direct data" (transfer).

The way it works is you give it a source and a target and it takes the ones and zeros from the source and puts them on the target. Simple, right?

Here’s an example you should not try.

dd if=/usr/lib/extlinux/mbr.bin of=/dev/sdd

This will take the master boot record bits located in the binary file specified by if= (input file) and copy them to the of= (output file) specified as the disk device "/dev/sdd". The reason not to do stuff like this willy-nilly is that this will hose some possibly important stuff on /dev/sdd. So make sure you’re very sure what you’re doing with this.

Some common options can look like this.

dd if=wholedisk.img of=justtheboot.img bs=512 skip=2048 count=1024000

This pulls just a portion of the original "wholedisk.img" image when creating the new file, "justtheboot.img".
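
In that command, with bs=512, the skip=2048 jumps over the first 2048 blocks (1 MiB) of the input and count=1024000 then copies 1024000 blocks (500 MiB). To get a feel for skip and count without going anywhere near a disk device, here is a harmless sketch on an ordinary file. The file names are made up.

```shell
# Stand-in for a disk image: an ordinary text file a few kilobytes long.
seq 1 2000 > wholefile.txt

# Skip the first 512-byte block, then copy the next two blocks (1024 bytes).
dd if=wholefile.txt of=piece.bin bs=512 skip=1 count=2 2>/dev/null

# Confirm the size of the extracted piece.
wc -c < piece.bin
```

The 2>/dev/null just hides dd’s transfer summary, which it prints to standard error.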

The bs specifier is for block size. Think of this like a bucket that the bits are transferred in. Besides the minimum raw transfer time it takes a little while to load each bucket. If you specify a small bucket, there will be many fillings of it. However, if you specify a huge bucket and you change your mind while the command is running, you may have to wait for the current huge bucket to finish processing before the program will check in and see if you want to interrupt and exit politely. The default is 512 (bytes) which is probably a bit small for modern use. I usually find that 4M is a good value for most things and very little benefit comes from using a different value.

My favorite option is status=progress which will give you a running assessment of how much has been transferred so far.

sudo dd if=ubuntu-16.04.3.iso of=/dev/sdg bs=4M status=progress oflag=sync

Here I’m burning the ones and zeros of an OS iso image onto a flash drive and the status allows me to keep an eye on how it’s doing. The oflag=sync setting (an output flag) requests synchronized writes, which ensures that the device is truly updated and not just presenting illusions from cached memory still queued for the device. Don’t forget the block size bs=4M (or some largish number) so that the transfer doesn’t take 20 times longer while it processes (default is 512B so 4 orders of magnitude!) more blocks.

Also if you’re trying to save data on a disk that’s going bad and it’s giving you IO errors with dd, you can investigate ddrescue which can valiantly overlook IO errors and press on with the job.

sudo / su

My sudo notes.

ssh

My SSH notes.

ping

Troubleshooting a network? You could do worse than starting with ping. I like to send 3 (ECHO_REQUEST) packets in case the first one dies in a fluke traffic accident (but the connection is really up). Any more than 3 and it just takes too long.

ping -c3 xed.ch

One thing to check right away with network troubleshooting is if it’s really the network connection to your target or simply the name lookup.

ping -c3 1.1.1.1

This will see if you can reach Cloudflare’s name server by its IP number, with no name lookup involved. Its name is one.one.one.one (yes, that is their domain).

Or try Google’s name server which is almost as easy to remember.

ping -c3 8.8.8.8

If those don’t come back successfully, you’ve got a real network problem!

ℱancy ping

Also in the ping family is traceroute and that classic program’s modern fancy version mtr. I like those tools, however, I’m finding those to be broken now that I have a blazing fast fiber optic connection. It’s like the "Time To Live" setting can’t get low enough to find out anything meaningful.

host / nslookup

Often you can reach the internet but you can’t use names, only IP numbers. If you need to troubleshoot that process or just find out about how names and IP numbers relate, use host and nslookup.

$ host -t A xed.ch
xed.ch has address 66.39.97.213
$ host images.google.com
images.google.com is an alias for images.l.google.com.
images.l.google.com has address 172.217.7.14
images.l.google.com has IPv6 address 2607:f8b0:4006:819::200e

Note how you can find the IP address of a name and other useful information (like alias targets and IPv6 numbers).

The nslookup command does similar lookups and I (normally) use it to go the other way, finding names from numbers.

$ nslookup 66.39.97.213
Server:     192.168.1.1
Address:    192.168.1.1#53

Non-authoritative answer:
213.97.39.66.in-addr.arpa   name = xed.ch.

Authoritative answers can be found from:

Who just tried to log into your SSH server 800,000 times? Find out by looking up their connection IP address with nslookup. You can also use web based services like iplocation.net for this and find out a rough guess where that host is physically located in the world.

iptraf / iftop

Do you have some kind of server which provides data to many clients at once and one of them is doing something uncool? The iptraf program can let you see the flows to each connected host.

Another possible tool that might serve this purpose is iftop. Or nethogs.

ℱancy iptraf

If you’re logged into your server with SSH, set the update interval to 1 sec so you do not overwhelm the network traffic with your own iptraf feedback loop.

It seems iptraf is a Linux only program. The FreeBSD community describes it as having "Too many linuxisms" to port. How about nethogs? That’s a nice program too. Also iftop. Maybe dstat or slurm.

ip addr / ifconfig

What IP number (or numbers!) are you using right now? Find out with ip addr (that’s the ip command with the addr option) or the classic ifconfig. The latter provides pretty much the same information but is regarded as the old way to control (configure) network devices (interfaces).

ip is not on FreeBSD, but ifconfig is.

Also on modern Linux distributions using Network Manager you can get some excellent information with nmcli dev show. I also used to think that nmtui (NetworkManager Text User Interface) was limited to the Red Hat ecosystem, but I recently found it on Ubuntu on an Arduino so that can definitely help tame the mess that NetworkManager can make of connections, especially wifi.

ss / netstat

What network connections are currently active on your computer? Find out with ss. Its older brother, netstat, is not used as much these days, but it is a well-known classic that does the same thing.

ss is probably Linux only while netstat is also on FreeBSD and OSX.

screen / tmux / nohup

My screen notes.

feh

Skilled command line practitioners do not avoid computer graphics per se — they avoid superfluous, wasteful, and misleading graphical interfaces. Sometimes, however, the job at hand is all about graphics. If you have a file that encodes an image, you may reasonably want to see what that image looks like. To illustrate the rule by its exception, the only time I feel like the normal GUI desktop claptrap is even slightly valuable is when I need to organize very disorganized photos. But if you’ve been organized and you know that things are where they should be, the command line approach performs efficiently again.

To see an image file as a human viewable image, I like the viewer program feh. Obviously you need some support for graphics (so no SSH terminals without X tunnelling or bare consoles) but most people have that these days.

The advantage of feh is that it is lightning fast and skips a lot of superfluous nonsense. If you just need to see an image (including maybe scaling it to fit your screen) feh will not be beat.

Just for completeness I’ll mention display which is a command line tool that comes with installing the ImageMagick command line graphics tools. A lot of serious Unix people use display but I find it much slower than feh. However, it is usually present on systems where feh might not be installed and is a good backup to know about. The ImageMagick tools in general are extremely powerful and any intelligent person who does anything with images whatsoever should know about them.

xpdf

I use xpdf so often that I have it aliased as simply o. It opens PDFs and is very efficient and fast doing so. People used to other PDF readers may carp about functionality deficiencies or even the old school Xlib interface. But xpdf is your true friend.

ℱancy xpdf

If you’ve been reading this whole thing here’s a nice example of how this Unix stuff is commonly used.

$ URL=https://helpx.adobe.com/security/products/acrobat/apsb18-21.html
$ wget -qO- "$URL" \
| tr ' ,' '\n\n' | sed -e 's/CVE-/\nCVE-/g' | sed -e 's/[^0-9]*$//g' \
| sort | uniq | grep CVE | wc -l
104

I’ve broken this single command line into three physical lines (with backslash). The first uses wget to fetch the page named in the URL variable. I set that variable to be Adobe’s security page for Acrobat. The next line converts that HTML mess into a long list of words that must end in numbers. The third line sorts all of these, eliminates duplicates, throws away everything but the key phrase I’m interested in ("CVE"), and then counts the results. Reviewing it now, I can spot some places I could optimize this process, but I’ll leave it alone as an illustration of the kinds of rough jobs that can be done with Unix extemporaneously, on-the-fly. I quickly built that command line up piece by piece until running it gave me the final answer I was looking for. Very powerful.

So what is it telling us? It is telling us that on Adobe’s own Acrobat security page, they are mentioning no less than 104 unique registered CVE (Common Vulnerabilities and Exposures) problems related to Adobe Acrobat. Not impressed? If you take off the wc -l and look at these, you’ll see they all start with "2018". I’m no longer a professional security researcher, but that makes me think that we’re talking about 104 registered vulnerabilities in 2018 alone. I have seen Defcon presentations talking about what a delicious attack surface Acrobat/Reader is. It contains an absurd level of functionality — Wikipedia says "PDF files may contain a variety of content besides flat text and graphics including logical structuring elements, interactive elements such as annotations and form-fields, layers, rich media (including video content) and three dimensional objects using U3D or PRC, and various other data formats." PDFs can contain not just one programming language, the obvious PostScript, but also JavaScript code! And who knows what else! What could possibly go wrong?

Let all that sink in and then ask yourself if Acrobat is part of a secure computing environment. My answer is "no". xpdf, with its "limited" functionality, saves the day!

identify / import / convert

My Imagemagick notes.

vim

My Vim notes.

crontab

My cron notes.

at

A lot of people know about Cron but not as many know about at. Cron is for recurring jobs, while at is for jobs that you want to run once at some time in the future. A lot of times at isn’t installed by default which is a shame but it’s always easy to get.

If you’re constantly setting alarms or timers, at is perhaps a better choice.

You can simulate at easily enough with something like this.

sleep $((7 * 24 * 60 * 60)) && aplay loudnoise.wav

That will play a loud noise in one week. Still, if you do a lot of this, consider at.

sleep

The sleep command is a lot more useful than it seems like a command that does nothing would be. For example, if you want to log something every minute (not 800 times per second) simply add a sleep 60 to the loop that does the logging. Easy.
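
Here is a minimal sketch of that kind of loop. The log file name is made up, and I use a one second pause and only three passes so the demonstration finishes quickly.

```shell
# Start with an empty log (hypothetical file name).
: > heartbeat.log

# Append a timestamp three times, pausing between entries.
# Use sleep 60 instead for once a minute.
for i in 1 2 3; do
    date >> heartbeat.log
    sleep 1
done

wc -l < heartbeat.log
```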

bash

My Bash notes.

ar

The ar command manages files that are (or need to be) in an archive file. It is an alternative to something like tar (or zip, without compression clumsily added whether you want it or not). These are used for static libraries (.a files) that are linked with the ld linker. Apparently Debian’s .deb files are ar archives. (The initramfs file system used to boot your real one in a Linux system is a cpio archive, not ar.) Firmware blobs can sometimes be in this format. Sometimes compilers put debugging symbols into this format too. So definitely not dead!

as

The unix assembler. Converts assembly language instructions to executable machine object code. Note that there is a utility called dis (that can be installed) which disassembles object code.

basename

Strips off the path and, optionally, any suffixes (e.g. -s.jpg) from file names. Useful in scripts.

dirname

Like basename but retains only the directory path part of the input.
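
Here is a quick sketch of the pair working together, using a made-up path. The suffix is given in the traditional way, as a second argument to basename.

```shell
P=/home/xed/photos/IMG42.jpg    # hypothetical path

basename "$P"         # IMG42.jpg
basename "$P" .jpg    # IMG42
dirname "$P"          # /home/xed/photos

# Typical script use: build a new name next to the original.
echo "$(dirname "$P")/$(basename "$P" .jpg).png"
```

The last line prints /home/xed/photos/IMG42.png.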

col

This command is of dubious value today without printers being controlled by control characters. But it still may have a use in stripping out non-printable "characters" from text streams. By piping to (and from) | col -b | you can ensure that non-printing characters (the -b stands for "backspace") are stripped. If you really want columns, check out the column command.

column

This formats input into columns. For example, if I need to look over all of the four letter words that begin with "f" I could

grep '^f...$' /usr/share/dict/words | column -c 100

Without the column command, they all go flying by. Normally you’d page them with a pager like less but if it is helpful to see everything on the same page, this can be helpful.

ℱancy column

Another use for the column command is to make tree diagrams. This isn’t fun or easy but it can be reasonable if trying to come up with an alias or function that solves a particular problem. For example, the process list command ps produces really unintelligible output without some serious care. Here is a way to have your processes shown with the parent/child relationships made a bit more obvious.

ps  -u $USER -o pid,ppid,command| \
sed 's/^ *\([^ ][^ ]*\)  *\([^ ][^ ]*\)  *\(.*\)$/\1|\2|\3/'| \
column --tree-id 1 --tree-parent 2 --tree 3 -s'|' -W 3

Replace the -u $USER with -e to get "every" process. Note this is like the pstree command but shows the PIDs a bit better.

Column can also convert some normal unix stdout into everybody’s favorite overly verbose format, JSON. In theory.

colrm

Need to remove columns from an output table? This command kind of does that. Note that it operates on character columns, not delimited columns (use cut for that).

$ yes "123456789"| head -n3 | colrm 3 6
12789
12789
12789
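
For comparison, here is a small sketch of cut handling both cases. The sample strings are made up.

```shell
# Delimited columns: keep only the second colon-separated field.
echo "alpha:beta:gamma" | cut -d: -f2     # beta

# Character columns: keep characters 1-2 and 7 onward, like colrm 3 6.
echo "123456789" | cut -c1-2,7-           # 12789
```

Note that colrm says what to remove while cut says what to keep.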

csplit

Though obscure, this interesting command can be quite useful. If you have a big file that you want to break up into smaller files, this can do it, splitting wherever some regular expression pattern matches. For example, if you have a log file that you want broken into multiple files by day, this command can do exactly that.

csplit server.log '/^20[0-9][0-9]-[01][0-9]-[0-3][0-9]/' '{*}'

Or, perhaps you have an SDF (structure data file) containing molecule definitions and you want each one in their own file. These can be glommed into one giant file with the weird section delimiter but this command can separate them into their own files.

csplit input.sdf '/^\$\$\$\$/' '{*}'
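
To see what actually comes out, here is a small sketch of the log file case with a made-up two day log. It assumes GNU csplit; the '{*}' repeat and the -z option are GNU features and other versions may differ.

```shell
# A made-up log with two days of entries.
printf '2023-06-27 boot\nmore\n2023-06-28 login\nmore\n' > server.log

# -s silences the byte counts; -z skips the empty piece before the first match.
csplit -s -z server.log '/^20[0-9][0-9]-[01][0-9]-[0-3][0-9]/' '{*}'

# Pieces are named xx00, xx01, ... by default.
head -n1 xx00 xx01
```

Each output file starts with the line that matched the pattern.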

ed / ex

Note that the ex editor is probably the one you want if you’re thinking about ed; it’s just the more modern version and should be included.

This is an ancient editor that is the foundation of vi (the Visual Interface to ed). It can still have some application as a command line tool. (You can obtain it with apt install ed.) It can be useful for more complex scripting of large batch file edits. Of course vim can also handily do such jobs. If you really have a big agenda though, ex can outperform vim. Also consider ex if you have an absolutely gigantic file that you must reach into with precision and make some changes; it will handle memory issues efficiently.

expr

A lot of this command’s functionality is now in modern Bash. But in the past Bash was quick and efficient because it left this functionality to other processes. This process specifically calculates stuff. Here are some examples.

$ expr 10 + 5      # 15
$ expr 10 - 5      # 5
$ expr 10 \* 5     # 50
$ expr 10 / 5      # 2
$ expr 10 % 3      # 1 (remainder)
$ expr length "xed"  # 3
$ expr length "$USER"  # Length of the current user's username
$ expr substr "xed.ch" 1 3  # "xed" (from pos 1 with len 3)
$ expr index "xed.ch" "."  # 4 (counts from 1 not 0)
$ expr match "xed.ch" "xed"  # 3 (characters matched from start)
$ expr 10 = 10    # 1 (true)
$ expr 10 != 5    # 1 (true)
$ expr 10 \< 5    # 0 (false)
$ expr 10 \> 5    # 1 (true)
$ expr 0 \& 1     # 0 (logical AND)
$ expr 0 \| 1     # 1 (logical OR)
$ expr \( $counter = 5 \) : 1 # Conditional output

Very useful in scripting.
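
For comparison, here is a sketch of how a few of those look using the shell’s own built-in features, with no extra process started. These particular forms work in Bash and other POSIX style shells; the variable name is made up.

```shell
S="xed.ch"

echo $((10 * 5))     # 50, no backslash needed for the star
echo $((10 % 3))     # 1
echo ${#S}           # 6, the length of the string
echo "${S%%.*}"      # xed, everything before the first dot
```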

stat

Display "status" details about a file. This can often be more accurate and direct than pulling file metadata off of directory listings.

Here’s an example.

$ echo "ok" > ok && stat ok

This will produce output that looks like this.

  File: ok
  Size: 3             Blocks: 8          IO Block: 4096   regular file
Device: 802h/2050d    Inode: 30543966    Links: 1
Access: (0640/-rw-r-----)  Uid: (11111/     xed)   Gid: (11111/     xed)
Access: 2023-06-28 11:23:32.668705500 -0400
Modify: 2023-06-28 11:23:50.280678400 -0400
Change: 2023-06-28 11:23:50.280678400 -0400
 Birth: 2023-06-28 11:23:32.668705500 -0400
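
In scripts you usually want just one of those values. With the GNU version of stat (the one on Linux) a format string can pick them out. This is a sketch assuming GNU stat; the BSD and OSX versions use -f with different format codes.

```shell
echo "ok" > ok

# %s is the size in bytes, %a the permissions in octal, %n the file name.
stat -c '%n: %s bytes, mode %a' ok
```

The size is 3 because echo adds a newline after the two letters.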

id / whoami / logname / who

The whoami command is basically a synonym for echo $USER showing you whom you are logged in as. The id (note, easier to type) command also supplies you with numeric values for the user and also detailed group information. The logname command is much like whoami (though it reports the name you originally logged in with, even after a su) but longer to type and slightly more cryptic in my opinion. who shows all the users logged into the system.

seq / nl

The seq command produces a sequence of numbers.

$ seq 4
1
2
3
4

This can be very helpful for many purposes. Maybe you have a lot of images and you want to get rid of every third one or something.

for N in $(seq 3 3 25); do rm -v IMG${N}.jpg ; done

This will get rid of IMG3.jpg, IMG6.jpg, IMG9.jpg, etc.

The nl command is similar but it numbers lines that are supplied as input.

$ nl /proc/meminfo | head -n3
1    MemTotal:       65652732 kB
2    MemFree:        59790048 kB
3    MemAvailable:   60952312 kB
$ nl -v 2023 yearly_data
2023       55.3
2024       62.2
2025       64.4
2026       123.4

Another common way to get line numbers is with cat -n. A rough (and never-ending) synonym for seq is yes '' | cat -n.

join

There are many ways to maintain normalized databases but one of the most spartan and efficient is using the unix join command and saving yourself the need to have a particular SQL engine installed. The join command wants the inputs to be sorted, but it can then match up fields from one file to another and output what is effectively an ordinary SQL-like "join". See my SQL notes for an example.
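
Here is a tiny sketch of the idea with two made-up files that share a key in the first field and are already sorted on it.

```shell
# Two small "tables" keyed on the first field (contents are made up).
printf '1 apple\n2 banana\n3 cherry\n' > names.txt
printf '1 red\n2 yellow\n3 dark-red\n' > colors.txt

# Match lines with the same key and merge their remaining fields.
join names.txt colors.txt
```

The first line of output is 1 apple red.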

paste

Paste is similar to join except that it just takes two files and merges a line from the first file with the same line (position) from the second. Here’s a quick example of what paste looks like.

$ paste <(seq 5) <(seq 5 -1 1)
1   5
2   4
3   3
4   2
5   1
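
One more paste trick worth knowing is the -s option, which joins all the lines of a single input into one line. A small sketch:

```shell
# -s serializes the lines; -d sets the delimiter; - means standard input.
seq 5 | paste -sd, -
```

This prints 1,2,3,4,5 which is handy for building comma separated lists.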

pr

Not to be outdone by paste the pr command can also do what paste does, more or less. The pr command is for automatically formatting a big block of text or lists of data, ostensibly, for printing. Here it is doing a similar thing to the paste example.

$ pr -tm <(seq 5) <(seq 5 -1 1)
1                    5
2                    4
3                    3
4                    2
5                    1

There are tons of options and obviously this can be used for more than printouts. It can be used to structure data in more readable sensible ways. Note that it can break up long lists into columns. For example, see what seq 200 | pr -3 does.

lp / lpr / lpq / lprm / lpstat

The lp command submits jobs to the "line printer" — in other words what normal modern people would think of as simply "print" to an actual printer. The lpr is a simplified version of lp for normal cases; lp can take a zillion options that lpr simplifies for you. The lpq command shows you the printer queue and is helpful for figuring out why your print job disappeared into the void. And the lprm command will help you get rid of those zombie jobs in the print queue. And lpstat checks the status of the printer.

I hardly ever use printers now, so I don’t really know how the details work these days. The whole printing functionality in Unix is provided by something called CUPS — the Common Unix Printing System. And despite this being developed by Apple, it is properly open source and publicly licensed and it works pretty well and is universal on all normal Linux systems that can print.

mail

This obviously does something with mail. Sending emails from the command line turns out to be a fantastically useful scripting trick and that’s what the mail command does. A great example of when this is useful is in a cron job. The following entry in a crontab will send an email containing "The message!" to tomyself@xed.ch with the subject "Daily Reminder" every day at midnight: minute 0 and hour 0 of every (* * *) day.

0 0 * * * echo "The message!" | mail -a "From: REMINDER <x@xed.ch>" -s "Daily Reminder" tomyself@xed.ch

Or if you want to just send some output to someone or yourself, you can do something like this.

cal 2023 | mail -s "2023" myfriend@geemail.co

Note that you need to have some kind of mail handler configured. I have been using apt install nullmailer on Debian, but you can set up much more elaborate mail handling.

mesg / write / talk

These commands relate to an old party trick of writing messages to the terminal of some other user. This sounds kind of crazy and now that Unix is actually important and security is a thing, it is kind of crazy. The mesg command configures permissions for this: mesg n stops other users from writing to your terminal and mesg y allows it.

It’s helpful to know of the existence of the write command since it seems like it would be an important concept and one might naturally wonder what that word is up to. But since people don’t write on each other’s terminals much these days, that original command is somewhat vestigial.

In the early 1990s I used talk as a kind of primitive Discord or Slack. However, that tool is largely obsolete for me because I have a technique for implementing a complete chat utility using Bash’s named pipes. (Discord works mostly fine on Linux today too.)

printf

The printf command is borrowed from C and (on my system) is both a standalone executable (see /usr/bin/printf --version) and simultaneously a Bash shell builtin (see help printf). It is used to create formatted output from templates.

It works by taking two things: the template, which has placeholders for the variables, and the variables themselves. The variables don’t have to be variable but then there’s not much point. Here’s a simple example.

$ printf "Hello %s\n" "$USER"
Hello xed

Numbers follow the conventions for C printf templates.
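A few of those conventions in action, as a small sketch with made-up values. The file name pattern is only an example.

```shell
# %d is an integer; %5.2f is a float in a field 5 wide with 2 decimal places.
printf "%d items cost %5.2f\n" 3 9.5
# %03d zero-pads an integer to 3 digits, handy for numbered file names.
printf "frame_%03d.png\n" 7
# %x prints an integer in hexadecimal.
printf "%d is %x in hex\n" 255 255
```

The second line of output is frame_007.png and the third is 255 is ff in hex.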

reset

Initializes the terminal. I find this very helpful after I make a mistake and send a bunch of non-text data to the terminal. For example, let’s say you accidentally tab completed badly and did this: cat myproject.jpg instead of .txt. Well, that JPEG is filled with binary data that is not well suited to being displayed on a text terminal. Many of the 1s and 0s of the image will be interpreted by the terminal as some kind of exotic control codes. Basically if you send hundreds of thousands of random bytes to your terminal, things will get very messy very fast. You will invariably end up modifying terminal settings, which causes much mischief. The reset command fixes all this and gets you back to a normal terminal again. I often type this command blind because I’ve just accidentally turned off user input echoing!

stty

So you know about reset and how to repair your terminal settings if you mess them up. Well, stty is one terrific way to mess them up. It is the command you use to set terminal settings. Note that "tty" shows up a lot in Unix; it stands for "teletypewriter". Teletypewriters (often abbreviated as "TTYs") were electromechanical devices that combined a typewriter-like keyboard with a printer. These were used before actual interactive displays (that did not leave a permanent hard copy) were common.

This command has a bazillion options and it’s all quite confusing. So if you find it confusing, you’re not alone. Mostly this is used when following some instructions that eliminate some particular unusual problem for your unusual software or setup. As things have standardized over the decades, the need to do this has decreased a lot. Thankfully!

tabs

If you want a more obvious way to make mischief than stty then the tabs command is for you! It can mess with the default tab stop value. Or something like that. I don’t know really because tabs are evil and should always be avoided.

script

This is an interesting command that can create a transcript of an entire terminal session. Let’s say you’re trying to write documentation of how to set up a system or run a complex job or something like that. You would like to show other people the command line steps you took. Normally I would just cut and paste the bits that I care about. But when the transcript is going to be quite long, it can be very tedious to do this. The script command will create a subshell where every character used in the terminal is also recorded in a specified file. You can also use the -c COMMAND option to have the subshell run some program of interest instead of a general shell. For example, if you want to show a sequence of clever things in Gnuplot, you could do script -c gnuplot clever_plots.log.

size

I’m sure this seemed very obvious and sensible back in the first days of Unix when understanding the size of object files was perhaps more important. You can do something like size /usr/lib/xorg/modules/libwfb.so to see obscure size information about certain types of binary files. What you do not want to do is use this keyword for your own purposes. Or if you do, just understand that there may already be a Unix command with that name present.

strings

Sometimes you have a binary file and you’re not quite sure what it is or what should open it. The strings command looks through the 1s and 0s of the binary file and tries to find plausible ASCII characters that make some sense. It then prints these as a list.

strip

Often object files are compiled containing debug symbols and other helpful human oriented stuff that may not be strictly necessary to the computer. The strip command can get rid of these. The benefit of course is you’ll have smaller files that will, in theory, load faster.

tee

Duplicates the standard input, sending one copy to standard output (like normal) and one copy to a file. This command is used like a special kind of pipe (e.g. | tee myfile | taking the place of a simple |) as a way to intercept a pipeline at some point and leave a record of what was going through it. This is an extremely useful trick. I often use it where I want to both see what is happening and save a record for further analysis or for doing something else with.
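Here is a minimal sketch of that interception. The file name listing.txt is just an example, and a temporary directory is used so nothing is left behind.

```shell
# Count the lines of some output while also keeping a copy of the output itself.
dir=$(mktemp -d)
seq 10 | tee "$dir/listing.txt" | wc -l
# The intercepted data is now sitting in the file for later analysis.
tail -n 2 "$dir/listing.txt"
rm -r "$dir"
```

The pipeline still reports 10 lines as if tee were not there, and the file holds the full list. Use tee -a if you would rather append to an existing file than overwrite it.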

test

This command is another one like printf which has both a compiled executable program and which is also a Bash builtin function. The reason for this, by the way, is that in the old days before Bash, there was just sh and it didn’t have such fancy features built in.

The test command is basically a way to check many things about files and strings. For example, if I do test -r /home/xed the exit code will come back true (0, which in exit codes is the successful state — think of it like "was there an error?"). If I do test -r /var/log/apt/term.log it will be false (1 in this case). This means that my own home directory is readable (the -r) by me but not that system log. There are a lot of things test can do. See help test for a list of them.

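Also the test word can be replaced by brackets. Single brackets, as in [ -r /home/xed ], are exactly equivalent to test. Bash also has double brackets, a shell keyword that supports the same checks with more forgiving parsing. This does the same thing as the previous examples.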

[[ -r /home/xed ]]
[[ -r /var/log/apt/term.log ]]

This construction is frequently used in Bash if statements and other conditionals. It’s good to keep in mind that test is mostly going to be a taken word in a Unix system and you’d best not name your own things that.
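As a minimal sketch of what that looks like inside an if statement in Bash, here is a helper function. The name check_readable is hypothetical and only for illustration.

```shell
# Hypothetical helper: report whether a path is readable by the current user.
check_readable() {
  if [[ -r "$1" ]]; then
    echo "$1 is readable"
  else
    echo "$1 is NOT readable"
  fi
}
check_readable /
check_readable /no/such/path

# The same kind of check as a one-liner that uses the exit code directly.
test -d /tmp && echo "/tmp is a directory"
```

The if statement simply runs the check and branches on whether its exit code was 0 (true) or not.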

tsort

This does a topological sort. This basically has the ability to unravel a directed acyclic graph. Check out the following example.

$ cat deps
taskB taskA
taskC taskB
taskD taskC
taskE taskA
$ tsort deps
taskD
taskE
taskC
taskB
taskA

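Here the syntax of the dependencies file is taskB depends on taskA (as written in the first line) and so on. In the output, tsort always puts the first item of each pair somewhere before the second, so here every task is listed before the tasks it depends on and taskA, which depends on nothing, comes last.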
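To get an order in which the tasks could actually be run, one option (a sketch, assuming tac is available as it is on Linux) is to reverse the list so that every task comes after the tasks it depends on. The same pairs are fed in on standard input here instead of from the deps file.

```shell
# Same pairs as the deps file, one pair per line on standard input.
# tac reverses tsort's output so each task follows whatever it depends on.
printf '%s\n' 'taskB taskA' 'taskC taskB' 'taskD taskC' 'taskE taskA' | tsort | tac
```

Now taskA comes first. The relative order of unrelated tasks, such as taskE and taskB, is up to tsort.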

umask

This command displays the user file creation mask. What does that mean? This has to do with the default permissions files have when you create them. This is a shell built-in and this allows the shell to know how to protect files you create with something like myprogram > myoutput.txt. The file creation mask can be set in a Bash configuration file so it’s always active or you can change it at any time. Note that this is a mask, not the permission itself: the bits set in the mask are removed from the starting permissions of 666 for files and 777 for directories. These are common reasonable settings.

umask 002 - file permissions of 664 (rw-rw-r--) and directory permissions of 775 (rwxrwxr-x). Group members can work on group files.
umask 022 - file permissions of 644 (rw-r--r--) and directory permissions of 755 (rwxr-xr-x). Good for sharing your documents with others but keeping them from modifying them.
umask 027 - file permissions of 640 (rw-r-----) and directory permissions of 750 (rwxr-x---). More privacy. Group members can read files and enter directories but cannot modify them, and everyone else is locked out.
umask 077 - file permissions of 600 (rw-------) and directory permissions of 700 (rwx------). The most restrictive sensible setting. Used when you want to keep files completely private.
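The arithmetic behind that list can be sketched in the shell itself. This assumes the usual starting permissions of 666 for new files and 777 for new directories, and simply clears the bits named in the mask.

```shell
# For each mask, clear its bits from the starting modes (666 files, 777 directories).
# A leading 0 makes the shell read the numbers as octal.
for mask in 002 022 027 077; do
  printf 'umask %s -> files %03o, directories %03o\n' \
    "$mask" "$(( 0666 & ~0$mask ))" "$(( 0777 & ~0$mask ))"
done
```

The four lines printed match the settings listed above, for example umask 027 -> files 640, directories 750.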

uname

Prints the name of the current Unix OS, often just "Linux". I like to include the -a option for "all" information.

$ uname -a
Linux ra22 5.10.0-14-amd64 #1 SMP Debian 5.10.113-1 (2022-04-29) x86_64 GNU/Linux

If you’re interested in this, you might also wonder what distribution you’re using.

$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
[... etc]

uptime / w

Did you have a power outage? How can you tell for sure? The uptime command will tell you how long your computer has been running.

$ uptime
11:25:30 up 190 days, 22:39, 18 users,  load average: 0.02, 0.02, 0.00

I have had over 1000 days, but note that you should be doing OS updates (with a reboot which resets this) way more often than that!

The single letter w command is like a combination of uptime and who.

wall

This will write a message on all of the terminals currently being used. You can just type wall and enter. Then type some stuff and end with [ctrl+d]. It’s actually confusing whether an argument is taken as the message string itself or as the path to a file containing the message; this depends on which version of wall you have, so check man wall. I supplied the message directly and got this put on all my terminals, including all those in tmux sessions.

Broadcast message from xed@ra22 (pts/17) (Fri Jul 21 11:35:33 2023):
Just a test of the wall command.

This is used by the shutdown command.

yes

At first this command seems baffling and perhaps a bit daft. If you just run it with no arguments it will print the word "yes" over and over and over forever like some kind of nerd practical joke. What is the point of that? To understand the motivation imagine some software experience in the old days where a program is run in a text console and it asks you a lot of pesky questions. Modern computer users are certainly familiar with the experience of clicking "I agree" umpteen times when trying to install something. The yes command was conceived to help text console users do similar things. Although it’s not actually an optimal solution, just for example purposes you can consider the rm command asking too many pesky questions and using yes to move things along. Here is a classic usage setup for this weird command.

yes | rm --interactive=always oops-*

But yes has more tricks up its sleeve! If you don’t like "yes" being repeatedly generated forever, you can supply an argument and that will be the word that is spewed out.

yes no

Will produce "no" over and over forever. Sometimes you have a bunch of repetition in your output that yes can fill in for you.

$ yes place award | nl | head -n3
     1 place award
     2 place award
     3 place award

ℱancy yes

A subtle point about a command as simplistic as this is that it really is a surprisingly good way to fully load your CPU. It basically prints its output and if there is no impediment, it immediately prints more. It’s kind of the difference between lifting weights and waving your empty hands. You can get out of breath lifting weights but you’ll get out of breath faster waving your hands as fast as you can. How could this property ever be useful? I like to use yes when I’m testing CPU loading and/or cooling capabilities. Here is a way to define a function that can replace sleep but instead of doing nothing and leaving your CPU alone, this one manically does everything it can allowing you to test loading multiple cores or prolonged CPU stress.

function hotsleep { timeout ${1}s yes > /dev/null ;}
hotsleep 5 # Full CPU usage for 5 seconds.
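Since one yes only keeps one core busy, loading several cores means running several copies at once. Here is a minimal sketch of that; the helper name hotcores is hypothetical, and hotsleep is redefined so the example is self-contained. Like the original, it assumes the timeout command is available.

```shell
# Same idea as hotsleep above, repeated here so this example stands alone.
hotsleep() { timeout "${1}s" yes > /dev/null; }

# Hypothetical helper: load N cores for S seconds by running N copies at once.
hotcores() {
  seconds=$1
  cores=$2
  i=0
  while [ "$i" -lt "$cores" ]; do
    hotsleep "$seconds" &   # each background copy keeps one core busy
    i=$(( i + 1 ))
  done
  wait                      # return only when every copy has timed out
}

hotcores 2 2   # Two cores fully busy for 2 seconds.
```

On Linux, something like hotcores 60 "$(nproc)" would load every core for a minute, since nproc prints the number of available cores.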