The Unix sed command is an awesome piece of work. If you are really good at it, it can pretty much single-handedly replace many other standard Unix commands.
A very, very good list of these tricks can be found here and here. The text version is nice too.
For example, the Unix nl (number lines) command can be approximated with sed. This prints each line number on a line of its own, followed by the line itself:
sed = filename
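To get closer to nl's layout, where the number and the text share one line, the output can be piped through a second sed. This is a minimal sketch, assuming GNU sed (for \t in the replacement) and a throwaway sample file created just for the illustration:

```shell
# Sample input (hypothetical data for illustration).
sample=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$sample"

# "sed =" emits each line number on its own line; the second sed pairs each
# number with the line that follows it (N) and turns the newline between
# them into a tab.
numbered=$(sed = "$sample" | sed 'N;s/\n/\t/')
printf '%s\n' "$numbered"
rm -f "$sample"
```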
The Unix head command can be done nicely with sed:
sed 10q filename
Multiple Sed Operations
If you need to use more than one sed command at a time you can use this format. This skips the first line (perhaps a table heading row) and replaces all commas with pipes.
sed -e1d -e's/,/|/g'
GNU sed may let you get away with just separating the commands with ; but you then need to quote the script to protect the ; from shell interpretation.
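As a quick check that the two forms behave the same, here is a small sketch run against a made-up CSV (the data and variable names are only for illustration):

```shell
# Sample CSV with a heading row (hypothetical data).
csv=$(printf 'name,qty\napple,3\npear,7\n')

# Two -e expressions, as in the form shown above.
with_e=$(printf '%s\n' "$csv" | sed -e1d -e's/,/|/g')

# One quoted script with the commands separated by ";".
with_semicolon=$(printf '%s\n' "$csv" | sed '1d;s/,/|/g')

printf '%s\n' "$with_semicolon"
```

The single quotes keep the shell from treating ; as a command separator and | as a pipe.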
One I use a lot is to combine some processing with the equivalent of a Unix head. Here I just want the date printed, so everything from the first quote to the end of the line is deleted, and everything after the first line is skipped by the -eq, i.e. -e q, the quit command given as a second expression.
sed -e 's:".*$::' -eq logfile
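To see what that produces, here is a sketch against a made-up log whose lines are a timestamp followed by a quoted message. The log format is an assumption; the real logfile may look different.

```shell
# Hypothetical log: a timestamp followed by a quoted message.
log=$(mktemp)
printf '2024-05-01 12:00:00 "service started"\n2024-05-01 12:00:05 "listening"\n' > "$log"

# Strip from the first double quote to end of line, then quit after line 1.
first_stamp=$(sed -e 's:".*$::' -eq "$log")

# Brackets make the leftover trailing space visible.
printf '[%s]\n' "$first_stamp"
rm -f "$log"
```

Note that the space before the quote survives, so the result has a trailing blank.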
Show Lines N Lines Past Search Term
A common situation is to have some messy status output where there are sections and you want a piece from each of these sections. For example, pactl produces a big jumbled mess of what PulseAudio knows. But what if you want just the "Name" line in each "Source" section? You can look for Source and, since the Name is the line after the next line, you can do this.
pactl list | sed -n '/^Source/{n;n;p;}'
This also shows another way to put multiple commands together. Specifically, it’s skipping the found line (n) then the next one (n) and finally printing the one that is wanted.
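Since not every machine has PulseAudio running, the same command can be tried on canned text. The function below is a hypothetical stand-in for pactl list with far fewer fields than the real thing:

```shell
# Hypothetical stand-in for "pactl list" output; real output has more fields.
fake_pactl() {
cat <<'EOF'
Sink #0
  State: SUSPENDED
  Name: speakers
Source #0
  State: SUSPENDED
  Name: speakers.monitor
  Description: Monitor of speakers
Source #1
  State: RUNNING
  Name: microphone
EOF
}

# On each line starting with "Source": step forward twice, then print.
source_names=$(fake_pactl | sed -n '/^Source/{n;n;p;}')
printf '%s\n' "$source_names"
```

The Name line under Sink is not printed because only lines starting with Source trigger the block.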
Specific Range Of Line Numbers
Often I need every other line or every third or every 14th, etc. The basic format for this is the GNU sed address first~step, which matches every step-th line starting at line first. This prints every fifth line (5, 10, 15, and so on).
sed -n 0~5p
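A short sketch of how the two numbers interact, using seq for input. The first~step address is a GNU sed extension, so this assumes GNU sed:

```shell
# 0~5: lines whose number is a multiple of 5.
every_fifth=$(seq 20 | sed -n '0~5p' | paste -sd' ' -)

# 1~3: line 1, then every third line after it.
from_first_every_third=$(seq 10 | sed -n '1~3p' | paste -sd' ' -)

echo "$every_fifth"
echo "$from_first_every_third"
```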
Maybe you have a huge file full of line-delimited stuff and you want to look at some small section of the data. This prints lines 1000 through 2000 and then quits without reading the rest of the file.
sed -n '1000,$p;2000q' data
Here’s a way to wrap it up nicely with proper shell quoting so that a script or something can dole out sensible chunks. This specifically prints the millionth line (S) and the 1000 lines (L) after it. Change just those variables as needed.
$ S=1000000; L=1000; E=$(($L+$S))
$ time sed -n "${S},\$p;${E}q" data > /dev/null
real 0m0.059s
Note that this blazes through 1 million records quite quickly. If you’re seeing much slower performance, it probably isn’t sed’s contribution. Also note that you can simply specify a range with a start and an end, but for large files where you want a slice near the top, you do not want sed continuing to check past the last possible line you care about.
$ S=1000; L=1000; E=$(($L+$S))
$ time sed -n "${S},\$p;${E}q" data > /dev/null
real 0m0.016s
$ time sed -n "${S},${E}p" data > /dev/null
real 0m0.054s
Here you can see the difference in speed when looking at 1k-2k in a 1M record file.
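The same idea can be wrapped in a small function. This is only a sketch: slice is a hypothetical name, and unlike the snippet above it sets the end to START+COUNT-1 so that exactly COUNT lines come out.

```shell
# slice START COUNT FILE: print COUNT lines beginning at line START.
# (Hypothetical helper; START and COUNT must be positive integers.)
slice() {
    start=$1
    end=$(( $1 + $2 - 1 ))
    # Print from START onward, but quit as soon as END has been printed.
    sed -n "${start},\$p;${end}q" "$3"
}

data=$(mktemp)
seq 1000 > "$data"
chunk=$(slice 10 3 "$data" | paste -sd' ' -)
echo "$chunk"
rm -f "$data"
```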
Inserting Stuff Into Templates
Often I like to make a template file that has changeable content. The template mostly stays the same, but the content is new every time. I do this with my web pages which have a constant header and footer but changing content. Also I use it for quickly dumping geometry into an SVG file with the correct XML wrapping.
Here’s how it is done. In this example, I’m making a template file called T that just contains 5 numbers. If I want the number 3 to be replaced with some custom content (here I use "Three") this will do it.
seq 5 > T ; cat <(sed '/3/,$d' T) <(echo Three) <(sed '0,/3/d' T)
Or more generally…
cat <(sed '/CONTENT_GOES_HERE/,$d' template_file) \
<(make_content_program) \
<(sed '0,/CONTENT_GOES_HERE/d' template_file)
The line in template_file containing CONTENT_GOES_HERE will be replaced with whatever make_content_program produces when run.
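For scripts that cannot rely on Bash process substitution, the same three pieces can be emitted one after another from a function. This is a sketch under assumptions: fill_template is a hypothetical helper, the new content arrives on stdin, and the 0,/regex/ address still requires GNU sed.

```shell
# fill_template TEMPLATE MARKER: copy TEMPLATE to stdout, replacing the line
# that contains MARKER with whatever arrives on stdin.
# (Hypothetical helper; MARKER is used as a sed regex and must not contain "/".)
fill_template() {
    sed "/$2/,\$d" "$1"    # everything before the marker line
    cat                    # the new content, read from stdin
    sed "0,/$2/d" "$1"     # everything after the marker line (GNU 0,/re/)
}

template=$(mktemp)
printf '<html>\nCONTENT_GOES_HERE\n</html>\n' > "$template"
page=$(echo '<p>hello</p>' | fill_template "$template" CONTENT_GOES_HERE)
printf '%s\n' "$page"
rm -f "$template"
```

Starting the second range at 0 rather than 1 matters when the marker sits on the very first line of the template.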
Break Up Files Respecting Content
Breaking up files into parts can be done with the Unix split command, but what if you can’t break the file arbitrarily? Here I wrote out a section of a file starting at a known break point and running until the start of the next break point, then I cut off that last line (the next start point) with the second sed command.
sed -n '/RECORD 100/,/RECORD 201/p' records.xml | sed -n '$!p' > records100-200.xml
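Here is a sketch of that two-stage cut on generated data, so the boundaries can be checked. The one-line "RECORD n" input is a made-up stand-in for the real records.xml:

```shell
# Hypothetical input: 300 one-line "records".
records=$(mktemp)
seq 1 300 | sed 's/^/RECORD /' > "$records"

# Print from the first "RECORD 100" through "RECORD 201", then drop the last
# line (the start of the next section).
part=$(sed -n '/RECORD 100/,/RECORD 201/p' "$records" | sed -n '$!p')

# Show the first and last line of the extract.
printf '%s\n' "$part" | sed -n '1p;$p'
rm -f "$records"
```

The patterns are quoted because they contain a space; unquoted, the shell would split them into separate arguments.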
Escaping What XML Is Sensitive To
This can be put in a Bash script just like this because the newlines inside the quoted sed script are fine as whitespace between commands.
sed 's/\&/\&amp;/g;
s/"/\&quot;/g;
s/</\&lt;/g;
s/>/\&gt;/g;
s/\x27/\&apos;/g'
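A sketch of the escaping filter as a reusable function, with one test string that contains all five sensitive characters. The name xml_escape is hypothetical, the replacements assume the standard XML entity names, and \x27 for the apostrophe needs GNU sed:

```shell
# xml_escape: filter stdin, replacing the five characters XML treats specially.
# The ampersand must be handled first so later entities are not re-escaped.
xml_escape() {
    sed 's/&/\&amp;/g;
         s/"/\&quot;/g;
         s/</\&lt;/g;
         s/>/\&gt;/g;
         s/\x27/\&apos;/g'
}

escaped=$(printf '%s\n' "Tom & Jerry's <\"best\"> hits" | xml_escape)
printf '%s\n' "$escaped"
```

In the replacement half of each s command the & is written as \& because a bare & there means "the matched text".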
An Example of Sed
Here is a quick script I wrote to create a concordance (a word list with counts) from a Wikipedia dump. As written, head limits it to the first 2,000,000 lines of the dump.
#!/bin/bash
#usage: $0 enwiki-latest-pages-articles.xml.bz2 concordance
bzip2 -cd "$1" \
| head -n 2000000 \
| sed -e '/^ *</d' \
-e '/^ *|/d' \
-e 's/[| ][a-z][a-z]* \?=//g' \
-e 's@https\?://[^ ][^ ]*[ ,]@@g' \
-e 's@[{()}]@ @g' \
-e 's@"@@g' \
-e 's@&[gl]t;@@g' \
-e "s@[/_–']@ @g" \
-e '/^=/d' \
-e 's@[^a-zA-Z ]@@g' \
| tr ' [:upper:]' '\n[:lower:]' \
| grep . \
| sort \
| uniq -c \
| sort -n \
| tee "$2"
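The counting half of that pipeline can be tried without downloading anything. Below is a sketch on one made-up sentence; it reuses the final cleanup expression and the tr, sort and uniq stages, and assumes GNU tr for the combined space-to-newline and case mapping:

```shell
# Hypothetical sample text standing in for cleaned-up article lines.
text='The cat and the hat. The end!'

counts=$(printf '%s\n' "$text" \
    | sed -e 's@[^a-zA-Z ]@@g' \
    | tr ' [:upper:]' '\n[:lower:]' \
    | grep . \
    | sort \
    | uniq -c \
    | sort -n)

printf '%s\n' "$counts"

# The most frequent word ends up on the last line; awk trims uniq's padding.
top=$(printf '%s\n' "$counts" | tail -n 1 | awk '{print $1, $2}')
echo "$top"
```

The grep . stage drops the empty lines that appear wherever two spaces were adjacent.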