I was proudly showing off my Unix to pie chart trick when I noticed something was not right. I was showing my boss how it worked, and he showed me why he’s my boss by instantly getting it and coming up with a very nice example using our data. Unfortunately, the pie slices that should have been big were small and vice versa. Although horrifying, it was, of course, a typical software demo.

I slunk off back to my lair and had a good look at this. It was definitely plotting things exactly backwards. Huh. Then I tried my original examples and they were messed up too. Then I tried his example on my computer and it worked fine. Ah ha. It works 100% fine on my computer and 100% wrong on his. That’s very weird but now the puzzle had its edge pieces in place.

After conclusively proving to myself that the exact same action with the exact same code produced two wildly different results on two different machines, I started hunting for anything that could explain it. And I found it.

This section of the gawk manual (well, info page I guess) is called Using Predefined Array Scanning Order with gawk. Ah ha. Does this mean the scanning order is not predefined? Turns out, yup. That is what it means.

By default, when a for loop traverses an array, the order is undefined, meaning that the awk implementation determines the order in which the array is traversed. This order is usually based on the internal implementation of arrays and will vary from one version of awk to the next.

Dudes… That is extremely uncool!

Suppose I have an array with the following values.

L[1]= 10
L[2]= 15
L[3]= 25
L[4]= 50

What do you think should happen if I do this?

for (i in L) print L[i]

I would (and did) guess that it would print "10, 15, 25, 50", one value per line. But no. On some installations, it does exactly that. But on some other machines, like my boss's, it printed "50, 25, 15, 10". Wow. That is, in my opinion, very ugly.

Turns out that the fix is not too onerous. Either do the counting yourself with an old C-style for loop.

for (i=1;i<=length(L);i++) print L[i];
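Here is a minimal sketch of that counting approach as a complete command. It assumes the array was filled one element per input line, so NR already holds the element count. Using NR instead of length(L) is my own swap, since some older awks reject length() on an array. The function name print_in_order is just made up for illustration.

```shell
# print_in_order: hypothetical helper name; reads lines on stdin.
print_in_order() {
    awk '
        { L[NR] = $0 }                  # store each input line under its line number
        END {
            for (i = 1; i <= NR; i++)   # walk the indices ourselves, 1 through NR
                print i, L[i]
        }'
}

seq 5 3 20 | print_in_order
```

Because the loop never asks awk for its own idea of array order, this should print indices 1 through 6 in ascending order on any awk.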

Or, and here’s where it gets freaky, add one of these before the for/in statement (not inside the loop).

PROCINFO["sorted_in"]="@ind_num_asc";
for (i in L) print L[i];
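As a rough sketch of what else that setting can do, here is the same idea sorting by value instead of by index, using the "@val_num_desc" order from the gawk manual. It assumes gawk 4.0 or later is installed under the name gawk, which is why the call is guarded. The function name by_value_desc is hypothetical.

```shell
# by_value_desc: hypothetical helper name; needs gawk 4.0 or later.
by_value_desc() {
    gawk '
        { L[NR] = $0 }                                # one element per input line
        END {
            PROCINFO["sorted_in"] = "@val_num_desc"   # largest numeric value first
            for (i in L) print i, L[i]
        }'
}

# Only run the demo when a gawk binary is actually present.
if command -v gawk >/dev/null 2>&1; then
    seq 5 3 20 | by_value_desc
fi
```

On a gawk that honors the setting, the biggest value (20, at index 6) should come out first and the smallest (5, at index 1) last.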

It turns out there are all kinds of ways to iterate through your arrays. The bug, in my opinion, was leaving the default behavior undefined. Just pick one! Oh well. Live and learn. Now that I understand what the problem was, I don’t feel like my demo today was an unlucky failure. It was an extremely lucky failure!

UPDATE: Double checking this on other computers, I notice that the PROCINFO["sorted_in"] setting does not always work! Compare the same command run on two different systems.

CentOS 7.2.1511 Machine
$ awk --version | head -n1
GNU Awk 4.0.2
$ seq 5 3 20 | awk '{L[NR]=$0}
> END{for(i in L)print i,L[i];print PROCINFO["sorted_in"]}'
4 14
5 17
6 20
1 5
2 8
3 11

$ seq 5 3 20 | awk '{L[NR]=$0}
> END{PROCINFO["sorted_in"]="@ind_str_asc";
> for(i in L)print i,L[i];print PROCINFO["sorted_in"]}'
1 5
2 8
3 11
4 14
5 17
6 20
@ind_str_asc

CentOS 6.7 Machine
$ awk --version | head -n1
GNU Awk 3.1.7
$ seq 5 3 20 | awk '{L[NR]=$0}
> END{for(i in L)print i,L[i];print PROCINFO["sorted_in"]}'
4 14
5 17
6 20
1 5
2 8
3 11

$ seq 5 3 20 | awk '{L[NR]=$0}
> END{PROCINFO["sorted_in"]="@ind_str_asc";
> for(i in L)print i,L[i];print PROCINFO["sorted_in"]}'
4 14
5 17
6 20
1 5
2 8
3 11
@ind_str_asc

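The older machine silently ignores the setting, which makes sense in hindsight: PROCINFO["sorted_in"] only arrived in gawk 4.0. The clear lesson here is do not use Awk’s for/in syntax when the order matters! This is not Python!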
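For the cases where the indices are not a tidy 1 to N (counting how often each label shows up, say), one way to stay clear of for/in altogether is to keep a second array that remembers the order keys first appeared in. This is only a sketch under that assumption; the names count, order, n and count_in_first_seen_order are all made up for illustration.

```shell
# count_in_first_seen_order: hypothetical helper name; reads one key per line.
count_in_first_seen_order() {
    awk '
        !($0 in count) { order[++n] = $0 }   # first sighting: remember the key position
        { count[$0]++ }                      # tally every occurrence
        END {
            for (i = 1; i <= n; i++)         # replay keys in first-seen order
                print order[i], count[order[i]]
        }'
}

printf '%s\n' pear apple pear fig apple pear | count_in_first_seen_order
```

The only loop here is a plain counting loop over order, so the output order does not depend on which awk happens to be installed.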