The SWIG website has a lot of useful material, including an explanation of what SWIG is and a nice tutorial.
SWIG is a way for software developers to take core code written in C or C++ and make it natively available to other, more convenient languages.
SWIG can automatically generate modules for the following programming languages:
- AllegroCL
- C# Mono
- C# .NET
- CFFI
- CHICKEN
- CLISP
- D
- Go language
- Guile
- Java
- Lua
- MzScheme/Racket
- Ocaml
- Octave
- Perl
- PHP
- Python
- R
- Ruby
- Tcl/Tk
An Example
Python Version
First let’s imagine a computationally expensive problem. Say we have a large amount of text that we need to find instances of alliteration in. For our simplified purposes, this means we want to give a file name and get back the number of times a word is followed by another word which starts with the same letter (and if you look closely I actually give bonus points for multiple consecutive instances). The first thing I might try is to write a Python program:
#!/usr/bin/python
import sys

def filesalliteration(filename):
    state= 0
    old= 0
    n= 0
    t= 0
    f= open(filename,'r')
    while 1:
        c= f.read(1)
        if not c:
            break
        if c == '\n':
            state= 0
        elif c == ' ':
            state= 0
        elif c == "\t":
            state= 0
        else:
            if state == 0:
                state= 1
                if c == old:
                    n+=1
                    t+=n
                else:
                    old= c
                    n= 0
    return t

if __name__ == '__main__':
    for fn in sys.argv[1:]:
        a= filesalliteration(fn)
        print "[%s]:%d"%(fn,a)
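To make the scoring rule concrete, here is a minimal sketch in Python 3 (the program above is Python 2) that applies the same rule to a string in memory instead of a file. The function name `alliteration_score` is made up for this illustration. Like the program above, it treats only spaces, tabs and newlines as separators and compares first letters case-sensitively.

```python
import re


def alliteration_score(text):
    """Score alliteration in a string using the rule described above."""
    old = None  # first letter of the previous word
    n = 0       # length of the current run of matching first letters
    t = 0       # running total, including the bonus for longer runs
    # Words are maximal runs of characters other than space, tab and newline.
    for word in re.findall(r"[^ \t\n]+", text):
        first = word[0]
        if first == old:
            n += 1
            t += n  # first match adds 1, the next consecutive match adds 2, ...
        else:
            old = first
            n = 0
    return t


if __name__ == "__main__":
    print(alliteration_score("big bad wolf"))               # one match: 1
    print(alliteration_score("big bad bear"))               # 1 + 2 = 3
    print(alliteration_score("peter piper picked a peck"))  # 1 + 2 = 3
```

The bonus shows up in the second call: the third "b" word is worth two points because it extends an existing run.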
This is what this program looks like when run:
$ ./alliteration.py samples/159*
[samples/1591.txt.utf8]:1693
[samples/1598.txt.utf8]:1278
Running this on multiple files looks like this:
$ time ./alliteration.py samples/* | awk 'BEGIN{FS=":"}{T+=$2;N++}END{printf "Total %d in %d files. Time:",T,N}'
Total 83440 in 27 files. Time:6.683
Note that I have summarized the output with Awk. The real thing to notice here is the time (in seconds).
C Version
Let’s say the Python version is just unacceptably slow. I can now write a C version of the program. Here’s what that looks like:
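/* alliteration.c */
#include <stdio.h>

/* Declared before main() so the call below has a visible prototype. */
int filesalliteration(char *fn);

int main(int argc, char * argv[]) {
    int count;
    int allis;
    if (argc > 1) {
        for (count = 1; count < argc; count++) {
            allis= filesalliteration(argv[count]);
            printf("[%s]:%d\n", argv[count],allis);
        }
    }
    return 0;
}

/* Count alliteration in one file; returns 0 if the file cannot be opened. */
int filesalliteration(char *fn){
    int c; /* int rather than char, so EOF is distinct from any data byte */
    int old=0, n=0, t=0, state=0;
    FILE *thefile = fopen( fn, "r" );
    if (thefile == NULL) {
        return 0;
    }
    while ((c= getc(thefile)) != EOF) {
        switch(c) {
            case '\n' : state= 0; break;
            case ' '  : state= 0; break;
            case '\t' : state= 0; break;
            default:
                if (state == 0) { /* first character of a word */
                    state= 1;
                    if (c == old) {
                        n++;
                        t+=n; /* longer runs score more */
                    }
                    else {
                        old= c;
                        n= 0;
                    }
                }
                break;
        }
    }
    fclose(thefile);
    return t;
}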
Notice it’s very similar to the Python program (most C programs aren’t so lucky!). We have to compile this program before running it, which looks like this:
$ gcc -o alliteration alliteration.c
$ time ./alliteration samples/* | awk 'BEGIN{FS=":"}{T+=$2;N++}END{printf "Total %d in %d files. Time:",T,N}'
Total 83440 in 27 files. Time:0.184
Notice that the time dropped from about 6.7 seconds for the Python program to about 0.2 seconds for the C version. But C is a fussy language and not always fun or quick when it comes to developing bigger bodies of code.
SWIG
Now let’s explore what SWIG can do.
Interface File
The first thing we must do is to compose an interface file for SWIG to know what it should be working on. In this case that would look like this:
%module allit
%{
extern int filesalliteration(char *fn);
%}
extern int filesalliteration(char *fn);
The details are only complex if you have complex requirements, but the simple explanation is that the list of functions to expect must be declared here. Apparently it is even possible to use C header files for this. For a proper explanation of interface files, check out the SWIG documentation.
Running SWIG
Now run SWIG itself to prepare the module code and the wrapper code.
$ swig -python alliteration.i
Compiling The Wrapper
The wrapper code is created as alliteration_wrap.c and must be compiled. You need to specify the development headers (make sure they’re present: yum install python-devel, apt-get install python-dev, etc). Now compile the wrapper:
$ gcc -fPIC -c alliteration.c alliteration_wrap.c -I/usr/include/python2.4
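The include path above is hard-coded for Python 2.4 and will differ on other systems. As a hedged sketch, assuming a Python 3 interpreter with its development headers installed, the standard `sysconfig` module can report where that interpreter's headers live, so the compile command can be assembled rather than guessed. The helper name `compile_command` is hypothetical.

```python
import sysconfig


def compile_command(sources=("alliteration.c", "alliteration_wrap.c")):
    """Build the gcc argument list for compiling the module's sources."""
    # Directory where this interpreter expects Python.h to be installed.
    include_dir = sysconfig.get_paths()["include"]
    return ["gcc", "-fPIC", "-c", *sources, "-I" + include_dir]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(compile_command()))
```

Run it with the same interpreter that will later import the module, since each Python installation reports its own header directory.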
Linking The Module Into A Library
Now there is an object file for your C program (alliteration.o) and for the wrapper (alliteration_wrap.o). These must be put into a shared object library (_allit.so) that the Python module will link to. Here’s how to use the linker to assemble this shared library:
$ ld -shared alliteration.o alliteration_wrap.o -o _allit.so
Testing in Python
If all went well, you should now be able to use a functional Python module. Let’s try it out directly:
$ python
Python 2.4.3 (#1, Sep 21 2011, 19:55:41)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-51)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import allit
>>> allit.filesalliteration('samples/1598.txt.utf8')
1278
Seems to work. Now our Python program can look like this:
#!/usr/bin/python
import allit
import sys

if __name__ == '__main__':
    for fn in sys.argv[1:]:
        a= allit.filesalliteration(fn)
        print "[%s]:%d"%(fn,a)
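One pattern that fits here, offered as a sketch rather than something from the original setup, is to try importing the SWIG-built `allit` module and fall back to a pure Python implementation when it has not been built. This version is written for Python 3. The fallback reads bytes so that it compares the same values the C function does.

```python
import sys

try:
    # Fast path: the SWIG-generated module, if it has been built.
    from allit import filesalliteration
except ImportError:
    def filesalliteration(filename):
        """Pure Python fallback with the same logic as the C function."""
        state = 0
        old = None
        n = 0
        t = 0
        with open(filename, "rb") as f:
            while True:
                c = f.read(1)
                if not c:
                    break  # end of file
                if c in b"\n \t":
                    state = 0  # whitespace: the next byte starts a word
                elif state == 0:
                    state = 1
                    if c == old:
                        n += 1
                        t += n
                    else:
                        old = c
                        n = 0
        return t


if __name__ == "__main__":
    for fn in sys.argv[1:]:
        print("[%s]:%d" % (fn, filesalliteration(fn)))
```

The calling code stays the same either way; only the speed differs depending on which implementation was imported.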
Results
Now we can compare the performance of these various strategies:
$ time ./alliteration.py samples/* | awk 'BEGIN{FS=":"}{T+=$2;N++}END{printf "Total %d in %d files. Time:",T,N}'
Total 83440 in 27 files. Time:6.871
$ time ./alliteration samples/* | awk 'BEGIN{FS=":"}{T+=$2;N++}END{printf "Total %d in %d files. Time:",T,N}'
Total 83440 in 27 files. Time:0.177
$ time ./alliteration+c.py samples/* | awk 'BEGIN{FS=":"}{T+=$2;N++}END{printf "Total %d in %d files. Time:",T,N}'
Total 83440 in 27 files. Time:0.193
You can see that the native C is only slightly faster than the Python code that calls the SWIG-wrapped C function. That is a huge improvement over the pure Python version. The nice thing about SWIG is that if you have some fundamentally effective C code, you can make a useful module for many languages pretty much automatically and get a lot more people to use and take an interest in your software.
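As a quick worked calculation from the timings above (a single run each, so treat the ratios as rough), the speedups can be computed directly:

```python
# Wall-clock seconds copied from the three runs above.
pure_python = 6.871
native_c = 0.177
python_with_swig = 0.193

# How many times faster each approach is than pure Python.
print("C vs Python:    %.1fx" % (pure_python / native_c))
print("SWIG vs Python: %.1fx" % (pure_python / python_with_swig))

# Extra cost of going through the Python wrapper instead of native C.
overhead = (python_with_swig - native_c) / native_c
print("SWIG overhead over C: %.0f%%" % (overhead * 100))
```

That works out to a speedup of roughly 36 times for the SWIG version, which is about 9% behind native C on this sample.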
System Call Alternative
One question that arises for simple cases like the example shown is why not just use system calls from Python to run the C code? That would be a Python program that looks like this:
#!/usr/bin/python
import sys
import os
if __name__ == '__main__':
    for fn in sys.argv[1:]:
        for out in os.popen('./alliteration '+fn).readlines():
            print out.strip()
Besides turning into a potential security issue with possibly untrusted input controlling a system call, the performance is not very competitive (in this example at least):
$ time ./alliteration+sh.py samples/* | awk 'BEGIN{FS=":"}{T+=$2;N++}END{printf "Total %d in %d files. Time:",T,N}'
Total 83440 in 27 files. Time:0.323
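If shelling out is still the preferred route, one way to reduce the injection risk mentioned above is to skip the shell entirely. The following is a sketch in Python 3 (3.7 or later, for `capture_output`), not the script that was timed. The function name `run_alliteration` and its `program` parameter are invented for illustration.

```python
import subprocess
import sys


def run_alliteration(filename, program=("./alliteration",)):
    """Run the compiled program on one file and return its output lines."""
    # Passing a list (and no shell=True) hands the file name to the program
    # as a single argument; it is never interpreted by a shell.
    result = subprocess.run(
        [*program, filename],
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    for fn in sys.argv[1:]:
        for line in run_alliteration(fn):
            print(line.strip())
```

This still starts one process per file, so it addresses the security concern rather than the performance gap.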
Unix One Liner!
I wondered how doing this operation in a single line with classic Unix tools would compare. Here is the entire process succinctly expressed as a single Unix pipeline:
$ time cat samples/* | tr " " "\n" | grep -v "^$" | sed "s/^\(.\).*/\1/" | awk '{if(L==$1){N++;C+=N}else{L=$1;N=0}}END{printf"Total %d. Time:",C}'
Total 83440. Time:5.440