I have heard that Scalable Vector Graphics support was finally made decent because of the efficiency requirements of mobile devices. Those devices are so laden with profligate inefficiency that such an explanation is hard for me to reconcile, but we need not question the gods. SVG is 100% fantastic and I’m delighted with it!

Since SVG is just plain text I find that it works extremely well with the razor sharp wit of Unix. Today I have a very cool trick to demonstrate which nicely shows off the synergy between Unix and SVG. For a long time I’ve envisioned being able to pipe streams of data to very simple (or short) Unix commands which would produce SVG output that would render as plots. This post demonstrates how to convert such text streams of numeric data into SVG pie charts using standard Unix tools that are present on every Mac and Linux computer.

The heart of the magic trick is this line of Awk.

awk '{L[NR]=$1;S=S+$1}END{for(i=1;i<=NR;i++){T+=L[i]/S;print L[i],T}}'

You can pipe a list of numbers to this command and it will print each number followed by its cumulative share of the total. Here’s an example.

printf "35\n15\n5\n25\n45\n" \
  | awk '{L[NR]=$1;S=S+$1}END{for(i=1;i<=NR;i++){T+=L[i]/S;print L[i],T}}'
35 0.28
15 0.4
5 0.44
25 0.64
45 1

In this example, 35 is 28% of the total of all the numbers. The sum of 35 and 15 is 40% of the total, and so on until the running sum reaches 100%, the final 1. This is precisely what is needed to make a pie chart. Think of each cumulative value as the fraction of the circle that is covered once this input has been added to the previous ones. It’s a way to build the pie chart sector by sector until exactly 1 complete revolution has been swept out.
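
To make the link between cumulative ratios and geometry concrete, here is a minimal sketch that also prints the angle, in degrees, at which each sector ends. The function name `cumulative_angles` is made up for this illustration and is not part of the pie script.

```shell
# Sketch only: cumulative_angles is a hypothetical helper.
# Reads one number per line (first field) and prints:
#   value  cumulative-fraction  end-angle-in-degrees
cumulative_angles() {
    awk '{ L[NR] = $1; S += $1 }
         END { for (i = 1; i <= NR; i++) {
                   T += L[i] / S                 # running share of the total
                   printf "%s %.2f %.1f\n", L[i], T, T * 360
               } }'
}

printf "35\n15\n5\n25\n45\n" | cumulative_angles
# 35 0.28 100.8
# 15 0.40 144.0
# 5 0.44 158.4
# 25 0.64 230.4
# 45 1.00 360.0
```

The last sector always ends at 360 degrees, which is the full revolution described above.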

The interesting thing about this, and the reason pie charts seem a bit harder than, say, bar charts, is that to know how big a slice each item needs, the size of every item must be known. This precludes a direct immediate conversion from the data to graphics. That line of Awk reads all the inputs into memory and then iterates over them a second time to compute the proper percentages. Computer scientists might fret that if gazillions of items are piped to this kind of two pass strategy, it might suffer memory problems. Of course the saving trick is with the use case; if this is a pie chart, billions of slices would be pointless anyway. In practice, this strategy works perfectly well on all realistic computers for all sensible pie charts.

Of course that’s just the core. There is yet another devil hiding in the details of full implementation. Here is my shell script which does the complete job of generating pie charts from Unix streams.

pie
#!/bin/bash

cat <<"EOHD"
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="500" height="1000" >
<g transform="translate(120,120)">
EOHD

awk '
function rc(){r=rand()*255;g=rand()*255;b=rand()*255;}
BEGIN{srand(6);X=-90;Y=120;}
{   L[NR]=$1;S=S+$1;
    rc();
    printf("<rect x=\"%f\" y=\"%f\" width=\"20\" height=\"20\" fill=\"#%02x%02x%02x\" />\n",
           X,Y,r,g,b);
    #$1="";
    printf("<text x=\"%f\" y=\"%f\">%s</text>\n",X+30,Y+15,$0);
    Y+=25; }
END{srand(6);
    R=100;PX=R;PY=0;
    #PROCINFO["sorted_in"]="@ind_num_asc";
    #for(i in L){       # <--- See: http://xed.ch/blog/2016/1219.html
    for(i=1;i<=length(L);i++){
        T+=L[i]/S;
        A=T*6.283185307;
        L[i]>S/2?B=1:B=0;
        X=R*cos(A);Y=R*sin(A); rc()
        printf("<path d=\"M 0 0 %.6f %.6f A 100 100 0 %d 1 %.6f %.6f z\" fill=\"#%02x%02x%02x\" />\n",
               PX,PY,B,X,Y,r,g,b);
        PX=X;PY=Y;
        }
    }' -

echo "</g></svg>"

There are three parts: a simple dump of an SVG beginning, an Awk script to provide the geometry we care about, and a simple SVG ending. Here is an example of how it can be used. Suppose we want a visual indication of the relative amount of vowels contained in the Awk man page. This Unix command will do the trick.

man awk | sed 's/\(.\)/\1 /g' | tr 'A-Z ' 'a-z\n' | grep '[aeiou]' \
  | sort | uniq -c | ./pie > awkman.svg

That gives the following plot, the textual contents of which I have directly cut and pasted here. Check the source (or "Inspect Element") of this page if you want to see what that looks like.

[Pie chart of vowel counts in the Awk man page. Legend: 4888 a, 7060 e, 4605 i, 3941 o, 1751 u]

One great property of SVG is that modern browsers effortlessly render it, as <svg> tags in HTML or as entire stand alone SVG XML documents. If you ran this command yourself, you could just tell your web browser to go to

file:///home/${USER}/the/path/awkman.svg

and you would see the same image.
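
If you would rather embed the chart in a web page than open it as a stand alone file, something like the following might do. It is only a sketch: `svg_to_html` is a hypothetical helper, and it assumes the XML declaration sits on its own line, as it does in the output of the pie script.

```shell
# Sketch only: svg_to_html is a hypothetical helper.
# Reads a stand alone SVG document on stdin and writes a minimal HTML
# page with the <svg> element inline. The XML declaration line is
# dropped because it does not belong in the middle of an HTML body.
svg_to_html() {
    printf '<!DOCTYPE html>\n<html><body>\n'
    grep -v '^<?xml'
    printf '</body></html>\n'
}

# Possible usage, assuming ./pie exists as shown above:
#   printf "3 cats\n5 dogs\n" | ./pie | svg_to_html > pets.html
```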

If you don’t want the values printed in the legend, just uncomment this line in the script which will clear them from output.

#$1="";
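
To see what that assignment does in isolation, here is a tiny sketch. The name `strip_count` is made up for this illustration. Assigning to `$1` makes Awk rebuild `$0` from its fields, so the count disappears and only the label is left, preceded by the field separator.

```shell
# Sketch only: strip_count is a hypothetical helper showing the effect
# of the commented-out line. Emptying $1 rebuilds $0 without the count.
strip_count() {
    awk '{ $1 = ""; print }'
}

printf "   4888 a\n" | strip_count
# prints " a" (with one leading space where the count used to be)
```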

One of the more mysterious lines of the Awk script is this.

L[i]>S/2?B=1:B=0;

This handles a "feature" of SVG arc paths: between two points with a given radius, an arc can take either the short way or the long way around, and a flag in the path (the large-arc flag) selects which. In the case of pie charts, only an item that represents more than half of the pie needs the long arc, and there can be at most one such item. This is demonstrated with the following chart of the results of California’s recent Proposition 64, which legalized recreational marijuana.

printf "7979041 Legalization\n5987020 Prohibition\n" | ./pie > prop64.svg
[Pie chart of the Proposition 64 result. Legend: 7979041 Legalization, 5987020 Prohibition]

Both sectors connect the same two points, but one takes the long way around and one the short way.
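
The flag logic can be checked on its own with a short sketch. The helper name `arc_flags` is hypothetical; it applies the same "more than half of the total" test as the script and prints each value, its share of the total, and the resulting flag.

```shell
# Sketch only: arc_flags is a hypothetical helper.
# Prints: value  share-of-total  large-arc-flag
# The flag is 1 only for a value that is more than half of the total.
arc_flags() {
    awk '{ L[NR] = $1 + 0; S += $1 }
         END { for (i = 1; i <= NR; i++)
                   printf "%d %.4f %d\n", L[i], L[i] / S, (L[i] > S / 2) ? 1 : 0 }'
}

printf "7979041 Legalization\n5987020 Prohibition\n" | arc_flags
# 7979041 0.5713 1
# 5987020 0.4287 0
```

With the five numbers from the first example no value exceeds half of the total, so every flag comes out 0.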

Another tricky obstacle that had to be overcome was the choice of colors. I definitely started to overthink this but eventually I realized that choosing distinct colors would have to involve what would amount to a bad pseudo random number generator. I decided to just use Awk’s rand() function with a well chosen seed. As you see in the script I am using "6" because it seemed to produce a lot of distinct colors, but you can try your luck with other seeds if you want different (random) colors. Or you can edit the SVG yourself and replace the colors with ones of your specific choosing. Lots of possibilities. Although it could be refactored to make only one pass, the random seed is set twice, once for the legend boxes and once again for the pie slices. This gives both the same colors in the same order. Here’s an example showing more colors. This is the distribution of the lengths of top level domains these days, including the weird impractical new ones.

wget -qO- https://www.iana.org/domains/root/db \
| sed -n '/domain tld/s@^.*>\.\(.*\)</a.*$@\1@p' \
| sed 's/./#/g' \
| sort \
| uniq -c \
| ./pie > tld.svg
[Pie chart of top level domain lengths. Legend: 317 ##, 248 ###, 233 ####, 174 #####, 175 ######, 143 #######, 92 ########, 54 #########, 37 ##########, 24 ###########, 5 ############, 6 #############, 7 ##############, 3 ###############, 1 ################, 1 #################, 2 ##################]

This shows at a glance that roughly half of all top level domains are themselves longer than my entire domain and about a third are impractically long depending on your requirements for typing economy. Notice also how this data is not from a static source. This data is loaded directly from http://www.iana.org and extracted and turned into a pie chart in one smooth stroke. The advantage to such a system is that by just rerunning the line, the resulting pie chart will be up to date even if the source has been changing. This would also obviously be helpful if you needed to make millions of custom pie charts.
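
Back on the subject of colors: if you want to audition other seeds before committing to one, a sketch like this prints the first few colors a seed produces. The name `preview_colors` is made up for this illustration. It draws the red, green and blue values in the same order as the script's rc() function, so with the same Awk implementation the colors should match what the pie script would use. Different Awk implementations have different generators, so no particular output is promised here.

```shell
# Sketch only: preview_colors is a hypothetical helper.
# Usage: preview_colors SEED COUNT
# Prints COUNT hex colors generated the way the pie script does it.
preview_colors() {
    awk -v seed="$1" -v n="$2" '
    function rc() { r = rand() * 255; g = rand() * 255; b = rand() * 255 }
    BEGIN {
        srand(seed)                      # same seed, same sequence
        for (i = 1; i <= n; i++) {
            rc()
            printf "#%02x%02x%02x\n", r, g, b
        }
    }'
}

preview_colors 6 5
```

Running it twice with the same seed gives the same list, which is the property the script relies on when it reseeds for the second pass.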

Since the result is very open structured ASCII XML SVG, there is tremendous flexibility with what can be done with this. You can load the resulting SVG file right into Inkscape and make all the fancy edits you like. You can play with the SVG header and resize things or try any of the crazy pranks SVG allows you to do. It should also be noted that the output is extremely compact: the prop64 pie chart is 653 bytes, 335 gzipped.
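
If you want to check sizes like these for your own charts, a small sketch along these lines should work wherever `wc` and `gzip` are available. The function name `svg_sizes` is hypothetical, and the exact compressed size will vary a little between gzip versions and settings.

```shell
# Sketch only: svg_sizes is a hypothetical helper.
# Usage: svg_sizes FILE
# Prints the size of FILE in bytes, raw and after gzip compression.
svg_sizes() {
    # The arithmetic expansion strips the padding some wc versions add.
    printf 'raw %d\n' "$(( $(wc -c < "$1") ))"
    printf 'gzipped %d\n' "$(( $(gzip -c "$1" | wc -c) ))"
}

# Possible usage, assuming prop64.svg was generated as shown earlier:
#   svg_sizes prop64.svg
```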

I used to think of excuses not to use pie charts but now I’m thinking up excuses to use them!

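UPDATE: I fixed this program so that the slices are always processed in input order. The script above now loops over the array by numeric index, with a PROCINFO setting left in as a commented-out alternative. To find out why, see my next blog post which covers it in detail.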