The Unix sed command is an awesome piece of work. If you are really good at it, it can pretty much single-handedly replace many other standard Unix commands.

A very, very good list of these tricks can be found here and here. The text version is nice too.

For example, the Unix nl (number lines) command can be approximated with sed. This prints each line number on a line of its own, followed by the line itself:

sed = filename
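To get the number and the text onto the same line, closer to what nl prints, a second sed can join each pair. This is only a sketch; the function name number_lines is made up for the example.

```shell
# number_lines is a hypothetical helper name, not a standard command.
# The first sed prints each line number on its own line before the line.
# The second sed uses N to pull the following line into the pattern space,
# then replaces the embedded newline with a single space.
number_lines() {
  sed = "$@" | sed 'N;s/\n/ /'
}

# Example: number two lines read from standard input.
printf 'alpha\nbeta\n' | number_lines
```

Unlike nl with its default options, this numbers every line, including empty ones.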

The Unix head command can be done nicely with sed:

sed 10q filename
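The q command quits as soon as the addressed line has been printed, so sed does not read the rest of the file. A small wrapper, with the hypothetical name sed_head, makes the line count a parameter:

```shell
# sed_head is a hypothetical name chosen for this sketch.
# Usage: sed_head COUNT [FILE...]   (COUNT is assumed to be 1 or more)
sed_head() {
  count=$1
  shift
  # "${count}q" prints lines as usual and quits once line COUNT is printed.
  sed "${count}q" "$@"
}

# Example: keep only the first two of four lines.
printf '%s\n' one two three four | sed_head 2
```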

If you need more than one sed command at a time, you can give each one its own -e option. This skips the first line (perhaps a table heading row) and replaces all commas with pipes:

sed -e '1d' -e 's/,/|/g' filename
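Here is the same idea on a tiny made-up table, written two ways: once with an -e per command and once with the commands separated by a semicolon in a single script. The helper names are invented for the example.

```shell
# Both helpers are hypothetical names used only for this illustration.
# Variant 1: one -e option per command.
csv_to_pipes() {
  sed -e '1d' -e 's/,/|/g' "$@"
}

# Variant 2: the same two commands in one script, separated by ';'.
csv_to_pipes_semicolon() {
  sed '1d;s/,/|/g' "$@"
}

# Example input: a heading row followed by two data rows.
printf 'id,name\n1,ann\n2,bob\n' | csv_to_pipes
```

Note that this replaces every comma, including any that sit inside quoted CSV fields.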

Breaking up files into parts can be done with the Unix split command, but what if you can't break the file at arbitrary points? Here I wrote out a section of a file starting at a known break point and running until the start of the next break point, then cut off that last line (the next start point) with the second sed command.

sed -n '/RECORD 100/,/RECORD 201/p' records.xml | sed -n '$!p' > records100-200.xml
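The same cut can also be sketched with a single sed process: inside the range, print every line except the one that matches the closing pattern. The function name and the sample data below are invented for the illustration.

```shell
# records_100_to_200 is a hypothetical helper name.
# -n suppresses automatic printing; within the range from the first
# "RECORD 100" line to the next "RECORD 201" line, every line that does
# not match "RECORD 201" is printed.
records_100_to_200() {
  sed -n '/RECORD 100/,/RECORD 201/{/RECORD 201/!p;}' "$@"
}

# Example on made-up data: prints the four lines from "RECORD 100" to "c".
printf '%s\n' 'RECORD 99' a 'RECORD 100' b 'RECORD 200' c 'RECORD 201' d \
  | records_100_to_200
```

One difference worth knowing: if the closing pattern never appears, the two-sed version still drops the last line of the output, while this version keeps everything to the end of the file.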

Escaping What XML Is Sensitive To

This can be put in a Bash script exactly as written, because newlines inside the quoted sed script are fine.

sed 's/\&/\&amp;/g;
     s/</\&lt;/g; s/>/\&gt;/g'
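As a fuller sketch, assuming the goal is to cover the five characters XML predefines entities for (&, <, >, " and '), the substitutions can be wrapped in a function. The name xml_escape is made up for this example. The ampersand has to be handled first; otherwise the & in an already inserted &lt; would be escaped a second time.

```shell
# xml_escape is a hypothetical helper name, not part of any standard tool.
xml_escape() {
  # In the replacement, \& is a literal ampersand (a bare & would mean
  # "the matched text"). The last expression is double-quoted only so
  # that it can contain a single quote.
  sed -e 's/&/\&amp;/g' \
      -e 's/</\&lt;/g' \
      -e 's/>/\&gt;/g' \
      -e 's/"/\&quot;/g' \
      -e "s/'/\&apos;/g" "$@"
}

# Example: escape one line read from standard input.
printf '%s\n' 'a < b && "c"' | xml_escape
```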

An Example of Sed

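Here is a quick script I wrote to create a concordance (a count of how often each word occurs) from a Wikipedia dump. As written, head limits it to the first 2,000,000 lines of the dump; drop that stage to process the whole thing.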

#usage: $0 enwiki-latest-pages-articles.xml.bz2 concordance
bzip2 -cd "$1"                          \
| head -n 2000000                       \
| sed -e '/^ *</d'                      \
      -e '/^ *|/d'                      \
      -e 's/[| ][a-z][a-z]* \?=//g'     \
      -e 's@https\?://[^ ][^ ]*[ ,]@@g' \
      -e 's@[{()}]@ @g'                 \
      -e 's@&quot;@@g'                  \
      -e 's@&[gl]t;@@g'                 \
      -e "s@[/_–']@ @g"                 \
      -e '/^=/d'                        \
      -e 's@[^a-zA-Z ]@@g'              \
| tr ' [:upper:]' '\n[:lower:]'         \
| grep .                                \
| sort                                  \
| uniq -c                               \
| sort -n                               \
| tee "$2"
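The counting stages at the end of the pipeline can be tried on a tiny input without downloading anything. This sketch uses an invented helper name, word_counts, and splits the single tr call into two for readability; the effect on plain ASCII text should be the same.

```shell
# word_counts is a hypothetical helper used only for this illustration.
word_counts() {
  tr '[:upper:]' '[:lower:]' |  # fold everything to lower case
  tr ' ' '\n' |                 # put one word on each line
  grep . |                      # drop empty lines
  sort |                        # bring identical words together
  uniq -c |                     # prefix each distinct word with its count
  sort -n                       # least frequent words first
}

# Example: "the" appears twice, every other word once.
printf 'The cat saw the dog\n' | word_counts
```

The first sort is needed because uniq only collapses adjacent duplicate lines.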