The Unix sed command is an awesome piece of work. If you are really good at it, it can pretty much single-handedly replace many other standard Unix commands.

A very, very good list of these tricks can be found here and here. The text version is nice too.

For example, the Unix nl (number lines) command can be approximated with sed. This prints each line number on a line of its own, followed by the line itself:

sed = filename
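
To get closer to what nl prints, with the number and the text on the same line, the output can be passed through a second sed that joins each pair of lines. This is a minimal sketch; the function name number_lines is made up for illustration.

```shell
# number_lines is a hypothetical helper name, not a standard command.
number_lines() {
    # First sed: emit the line number on its own line before each input line.
    # Second sed: N appends the next line to the pattern space, then the
    # embedded newline is replaced with a single space.
    sed = "$@" | sed 'N;s/\n/ /'
}

printf 'alpha\nbeta\n' | number_lines
# prints:
# 1 alpha
# 2 beta
```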

The Unix head command can be done nicely with sed:

sed 10q filename
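
The line count is easy to parameterize. A small sketch, where sed_head is an invented wrapper name: because q quits as soon as line n has been printed, sed stops reading its input early, much as head does. It assumes n is at least 1.

```shell
# sed_head is a hypothetical wrapper: sed_head N [file...]
sed_head() {
    n=$1
    shift
    # "${n}q" expands to e.g. 3q: print lines as usual, quit after line 3.
    sed "${n}q" "$@"
}

seq 100 | sed_head 3
```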

Multiple Sed Operations

If you need to use more than one sed command at a time, give each one its own -e option. This skips the first line (perhaps a table heading row) and replaces all commas with pipes.

sed -e1d -e's/,/|/g'

GNU sed may let you get away with just separating the commands with ; inside a single script, but then you need to quote the script so the shell does not interpret the ; (or the |) itself.
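
As a sketch of the semicolon form, here are the same two commands in one single-quoted script. The function name csv_to_pipes and the sample data are invented for the example.

```shell
# csv_to_pipes is a hypothetical name. The single quotes keep the shell
# from treating ; as a command separator and | as a pipe.
csv_to_pipes() {
    sed '1d;s/,/|/g' "$@"
}

printf 'name,qty\nbolt,4\nnut,9\n' | csv_to_pipes
# prints:
# bolt|4
# nut|9
```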

One I use a lot combines some processing with the equivalent of a Unix head -n 1. Here I just want the date printed, so everything from the first double quote to the end of the line is removed, and the -eq (i.e. "-e q", "execute quit") makes sed stop after the first line.

sed -e 's:".*$::' -eq logfile
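
To see what that does, here is a sketch run against two made-up log lines. The log layout (a date followed by a quoted request) is only an assumption for the demonstration, and first_date is an invented name.

```shell
# first_date is a hypothetical helper wrapping the command above.
first_date() {
    # s:".*$:: uses : as the delimiter and removes everything from the
    # first double quote to the end of the line; q then quits, so only
    # the first line is ever printed.
    sed -e 's:".*$::' -e q "$@"
}

printf '2024-01-15 "GET /index.html" 200\n2024-01-16 "GET /about" 404\n' | first_date
```

Whatever precedes the first quote is kept as-is, so in this sample the space after the date survives.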

Inserting Stuff Into Templates

Often I like to make a template file that has changeable content. The template mostly stays the same, but the content is new every time. I do this with my web pages which have a constant header and footer but changing content. Also I use it for quickly dumping geometry into an SVG file with the correct XML wrapping.

Here’s how it is done. In this example, I’m making a template file called T that just contains 5 numbers. If I want the number 3 to be replaced with some custom content (here I use "Three") this will do it.

seq 5 > T ; cat <(sed '/3/,$d' T) <(echo Three) <(sed '0,/3/d' T)

Or more generally…

cat <(sed '/CONTENT_GOES_HERE/,$d' template_file) \
    <(make_content_program) \
    <(sed '0,/CONTENT_GOES_HERE/d' template_file)

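The line in template_file containing CONTENT_GOES_HERE will be replaced with whatever make_content_program produces when run. Note that the <( ) process substitution needs Bash, and the 0,/regex/ address form is a GNU sed extension.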
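
Where Bash or GNU sed is not available, a similar effect can probably be had with sed's r (read file) command. This is a different technique from the cat approach above, sketched under a few assumptions: the new content is already in a file, the marker contains no / characters, and fill_template is an invented name. Unlike the version above, it replaces every line that matches the marker, not just the first.

```shell
# fill_template is a hypothetical helper:
#   fill_template TEMPLATE_FILE MARKER_REGEX CONTENT_FILE
fill_template() {
    # r queues CONTENT_FILE to be written out at the end of the current
    # cycle; d then drops the marker line itself.
    sed -e "/$2/r $3" -e "/$2/d" "$1"
}

tmpl=$(mktemp)
content=$(mktemp)
seq 5 > "$tmpl"
echo Three > "$content"
fill_template "$tmpl" 3 "$content"
rm -f "$tmpl" "$content"
```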

Break Up Files Respecting Content

Breaking up files into parts can be done with the Unix split command, but what if you can’t break the file at arbitrary points? Here I write out a section of a file starting at a known break point and running until the start of the next break point, then cut off that last line (the next start point) with the second sed command.

sed -n '/RECORD 100/,/RECORD 201/p' records.xml | sed -n '$!p' > records100-200.xml
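
The same idea as a reusable sketch, with a tiny invented input so the result is easy to check. The name extract_range and the sample records are hypothetical. One caveat: if the end pattern never matches, the range runs to the end of the file and the final sed still removes the last line.

```shell
# extract_range is a hypothetical helper:
#   extract_range FILE START_REGEX NEXT_START_REGEX
extract_range() {
    # Print from the first START match through the next NEXT_START match,
    # then delete the last line (the start of the following section).
    sed -n "/$2/,/$3/p" "$1" | sed '$d'
}

f=$(mktemp)
printf 'RECORD 1\na\nRECORD 2\nb\nRECORD 3\nc\n' > "$f"
extract_range "$f" 'RECORD 2' 'RECORD 3'
# prints:
# RECORD 2
# b
rm -f "$f"
```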

Escaping What XML Is Sensitive To

This can be put in a Bash script just like this, because newlines inside the quoted sed script are fine as separators.

sed 's/\&/\&amp;/g;
     s/</\&lt;/g; s/>/\&gt;/g'
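
A fuller sketch of the escaping, wrapped in a function for reuse. Only the ampersand rule comes from the listing above; covering < and > as well, and the name xml_escape, are assumptions on my part. The ampersand rule has to run first, otherwise it would re-escape the & that the other entities introduce.

```shell
# xml_escape is a hypothetical helper; it filters stdin or the named files.
xml_escape() {
    # In the replacement, & means "the matched text", so a literal
    # ampersand there must be written as \&.
    sed 's/&/\&amp;/g;
         s/</\&lt;/g;
         s/>/\&gt;/g' "$@"
}

printf '%s\n' 'a < b && c > d' | xml_escape
# prints: a &lt; b &amp;&amp; c &gt; d
```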

An Example of Sed

Here is a quick script I wrote to create a concordance (really a word-frequency list) from a Wikipedia dump. As written, the head command limits it to the first two million lines.

#usage: $0 enwiki-latest-pages-articles.xml.bz2 concordance
bzip2 -cd "$1"                          \
| head -n 2000000                       \
| sed -e '/^ *</d'                      \
      -e '/^ *|/d'                      \
      -e 's/[| ][a-z][a-z]* \?=//g'     \
      -e 's@https\?://[^ ][^ ]*[ ,]@@g' \
      -e 's@[{()}]@ @g'                 \
      -e 's@&quot;@@g'                  \
      -e 's@&[gl]t;@@g'                 \
      -e "s@[/_–']@ @g"                 \
      -e '/^=/d'                        \
      -e 's@[^a-zA-Z ]@@g'              \
| tr ' [:upper:]' '\n[:lower:]'         \
| grep .                                \
| sort                                  \
| uniq -c                               \
| sort -n                               \
| tee "$2"
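
The counting stage at the end of that pipeline can be tried on its own with a short string. A sketch, with word_counts as an invented name; it uses two tr calls instead of one for readability and pins the sort order with LC_ALL=C, neither of which is in the script above.

```shell
# word_counts is a hypothetical helper: reads text on stdin and prints
# "count word" lines, least frequent first.
word_counts() {
    tr '[:upper:]' '[:lower:]' |   # fold case
    tr -s ' ' '\n' |               # one word per line
    grep . |                       # drop empty lines
    LC_ALL=C sort |                # group identical words together
    uniq -c |                      # count each group
    LC_ALL=C sort -n               # order by count
}

printf 'The cat the Dog\n' | word_counts
```

uniq -c pads the counts with leading spaces, so pipe the result through something like awk '{print $1, $2}' if exact columns matter.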