This is a collection of notes that I use for reference. It is pretty complete and generally has most of the stuff I need to use, but it is deliberately not absolutely complete. There is too much obscure weird stuff in Python to include it all. This is my attempt at a good compromise for a solid collection of reference material. The emphasis is on practical usages and I try to include examples where I can to get projects up and running.

For people who aren’t sure if Python is really good or the best thing ever, this fine article makes it clear. The same author has a thorough article on porting from 2 to 3. Python 3 looks grand but my notes will, unless specifically noted, be for classic Python 2 syntax.

For people who aren’t sure if Python 3 is right for them, this absurdly good article explains all the differences.

Contents

Things I Commonly Forget

is None

I used to do things like if cool: but that seems to have become uncool. The problem is that it only tests truthiness, so legitimate values like 0 or an empty string fail the test too. When what you really mean is "is this set to something at all?", the correct way is to use is.

if cool is not None:
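
A quick illustration of the difference (the variable is just for the example): a value like 0 or an empty string is perfectly valid but still falsy, so truthiness testing and None testing answer different questions.

>>> cool= 0              # A legitimate value that happens to be falsy.
>>> bool(cool)
False
>>> cool is not None
True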

main

Python has an odd but sensible idiom where a program checks to see if it was run as a stand-alone program. The idea is that if it was not, then perhaps nothing should actually happen. This is useful for creating modules and other subcomponents of larger projects. This way you can define a library of functions, import it into any code that needs it, and nothing will run unless explicitly called. However, if you run the module as a stand-alone program, then you can have a function that tests the functions of interest. This helps in development.

The specific technique is to always put your default code which should be run as a standalone program at the end after a construction like this:

if __name__ == '__main__':
    do_default_stuff()

split and join

I can never remember which of the two strings is the object and which is the argument.

some_nice_separator_string .join( some_sequence )

>>> ' and '.join(['romeo','juliet'])
'romeo and juliet'

some_joined_string .split( divide_at_string )

>>> 'brad and jennifer'.split(' and ')
['brad', 'jennifer']

I am getting better about this and what finally helped it sink in is that both split and join are string functions. Even though join really smells like a function concerned with sequences, it is a string function.

any

The any() function seems totally superfluous to me, but there it is. Feed it an iterable and it will return true if any of the items are true.
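
A quick demonstration; there is also a companion all() which returns True only if every item is true.

>>> any([0, '', None, 3])
True
>>> any([0, '', None])
False
>>> all([1, 'yes', [0]])
True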

Help

Python has a lot of clever help facilities that make in line documentation relatively easy. Here’s an example:

def cube(x):
    """This is the help for the cube function."""
    return x*x*x
print(cube(4))     # Outputs: 64
print(cube.__doc__) # Outputs: This is the help for the cube function.

These documentation strings can be multi-line when using the triple quoting.

Function Arguments

Sometimes you want to send a bunch of arguments to a function encapsulated in a list. This uses the myfun(*theargs) syntax. To use a dictionary, myfun(**theargdict). See details in this section about unpacking argument lists in the official documentation.
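
Here is a minimal sketch of both forms together; the names come from the description above and the comments show roughly what comes out (dictionary order may vary).

def myfun(*args, **kwargs):
    print("positional: %s" % (args,))
    print("keyword: %s" % (kwargs,))

theargs= [1, 2, 3]
theargdict= {'color': 'red', 'size': 9}
myfun(*theargs, **theargdict)
# positional: (1, 2, 3)
# keyword: {'color': 'red', 'size': 9}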

Lambda

The lambda calculus of computer science is pretty wacky. It can be handy in the real world, however, and Python delivers. If you’re a beginner, skip this. Here’s how it’s done:

variable_now_a_function= lambda x,y: x + y
print(variable_now_a_function(3,2))  # Would return 5.

There’s another syntax which I like better because it looks more like def and works just the same (note that this parenthesized parameter form relies on Python 2 tuple parameter unpacking, which was removed in Python 3):

f= lambda((x,y)):x+y
print(f((5,3))) # Would return 8

This doesn’t look particularly simplified from the first syntax, but it shows what’s going on better and the simplification is more apparent with forms like:

plus1= lambda(x):x+1
print(plus1(9)) # Would return 10

Lambda is often used to define variables that can be used as functions, the precise functionality of which is, well, variable. This can be useful to pass some contingent behavior along to a function or to set up operational templates that process any number of functionalities.
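
For example, a function can take a bit of behavior as a parameter; apply_twice here is a hypothetical function made up to show that.

def apply_twice(f, x):
    return f(f(x))

print(apply_twice(lambda n: n*10, 3))      # 300
print(apply_twice(lambda s: s+'!', 'hi'))  # hi!!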

Slices

For sequence objects like strings and lists (and many others), Python has an absurdly elegant and powerful way to specify an exact subsequence. The general format for slices is this.

[start:end:stride]

The best tip about slices I’ve seen is to consider these values as numbering the "fence posts".

 0   1   2   3   4   5   6
 | x | e | d | . | c | h |
-6  -5  -4  -3  -2  -1

So to get just "xed" you do this.

>>> x='xed.ch'
>>> x[0:3]
'xed'

Note that x[3:4] is the same as x[3]. If you want to specify the end as the last position, you can just leave it empty. Same with the beginning; you don’t need to use 0.

>>> x[3:]
'.ch'
>>> x[:3]
'xed'

A negative first value counts the start position from the end. A negative second value ends the slice that many positions before the end (rather than counting the end position from the start as a positive value does).

>>> x[-2:]
'ch'
>>> x[:-2]
'xed.'

A negative stride reverses the list.

>>> x[::-1]
'hc.dex'

Of course things can get weird.

>>> x[4:1:-1]
'c.d'

Slice Objects

It is possible to name slice objects, perhaps to improve clarity.

>>> zerototen= list(range(11))
>>> evens= slice(None,None,2)
>>> odds= slice(1,None,2)
>>> (zerototen[evens],zerototen[odds])
([0, 2, 4, 6, 8, 10], [1, 3, 5, 7, 9])

Or obfuscation!

>>> O= slice(1,2)
>>> zerototen[O]
[1]

Strings

  • "Double" ' or single quotes are ok.'

  • "Adjacent " "strings " "are " "concatentated" "."

  • Raw string: r’All \\\\ are retained except at the end.'

  • R’Same as with "r"?'

  • u’A unicode string is like this'

  • \x40 is an "@" and \x41 is "A".

  • \u1234 is a unicode 16-bit value (4 hex digits).

  • \U12345678 is a unicode 32-bit value (8 hex digits).

Template Formatting

General String Format:

%[(name)][flags][width][.precision]code
Table 1. String Formatting Codes

Code  Use
s     String (with str)
r     String (with repr)
c     Character
d     Decimal Integer
i     Integer
u     Unsigned Integer
o     Octal Number
x     Hex Number
X     Hex Number (uppercase letters)
e     Floating Point Exponent
E     Like e but uppercase
f     Floating Point Decimal
F     Like f but uppercase
g     Floating Point e or f (whichever is shorter)
G     Floating Point E or F (whichever is shorter)
%     Literal %

Example:

"%(n).5G is %(t)s." % {"n":6.0221415e+23, "t":"a very big number"}
'6.0221E+23 is a very big number.'

Sequence Converters

s.join(sequence)

Join sequence with the s as separator.

s.split(separator[,maxcount])

Separate string s at separator. Don’t forget you can limit the number of splits with maxcount.

s.rsplit(separator[,maxcount])

Like split but starting from the right. Probably not too useful without maxcount. Example: "First M. Last".rsplit(None,1) is ["First M.", "Last"].

s.splitlines([keepNL])

Breaks a string by line. Keep the new lines if keepNL is present and True.

Padding and Stripping

s.expandtabs([tabsize])

Converts tabs to space. If tabsize is missing, default is 8.

s.strip([char2strip])

Strips leading and trailing whitespace (space, tab, newlines). If char2strip is present, then strip that character instead. "$99.99".strip("$") is 99.99.

s.lstrip([char2strip])

See strip. However, note that the char2strip string is not searched for as a whole, but rather each character that comprises it is stripped if it is found (in any order). If you want to strip off a prefix or a suffix, probably best to use replace or a slice.
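
A quick demonstration of that gotcha and the slice alternative:

>>> "innate".lstrip("in")   # Strips any of 'i' and 'n', which is more than the prefix "in".
'ate'
>>> s= "innate"
>>> s[len("in"):] if s.startswith("in") else s  # Slice off a known prefix instead.
'nate'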

s.partition(separator)

Example: "rock and roll".partition(" and ") produces ("rock", " and ", "roll")

s.rpartition(separator)

Like partition but from the right side.

s.rstrip([char2strip])

See strip. Also see the note at lstrip about the char2strip.

s.zfill(width)

Pads a number string on the left with zeros to the given width. Example: print("James Bond is %s." % "7".zfill(3)) prints James Bond is 007.

s.ljust(w[,char])

Makes a string of width w padding the right with char (whose length must be 1) or spaces. If s is longer than w then s is returned unmodified.

s.rjust(w[,char])

See ljust.

s.center(w[,char])

See ljust.

Search and Replace

s.find(stringtofind[,start[,end]])

Returns -1 when stringtofind is not in s (see index). If found, returns first position where found. The start and end parameters are like doing find on s[start:end].

s.rfind(stringtofind[,start[,end]])

Similar to find but searching for stringtofind from the right to left.

s.index(stringtofind[,start[,end]])

Basically like find but raising a ValueError if the substring is not found.

s.rindex(stringtofind[,start[,end]])

See index.

s.count(stringtocount[,start[,end]])

Counts non-overlapping occurrences of stringtocount. The other parameters behave like they do in find.

s.replace(old,new[,maxsubs])

New string with old replaced by new. Use maxsubs to limit the number of substitutions.

s.startswith(stringtofind[,start][,end])

True if the stringtofind is the beginning of s (or some other point if start is given).

s.endswith(stringtofind[,start][,end])

Like startswith.

Unicode and Translating

s.decode
s.encode([encoding,[errors]])

Encoding can be ascii, utf-32, utf-8, iso-8859-1, latin-1. Errors can be strict, ignore, replace, xmlcharrefreplace.

s.format(*args,**kwargs)

The newer style of template formatting (an alternative to the % operator). Example: "{0} has {1} legs".format("a spider",8) produces 'a spider has 8 legs'.

s.translate(table[,delchars])

Replace characters in s with corresponding characters in table which must be a string of 256 characters. The delchars string contains characters which are just dropped. Note from string import maketrans is handy for making table.

s.title()

Capitalizes the first letter of every word. Note that apostrophes will do things like It'S So Hard For Me To Believe By Otis Rush.

s.swapcase()

Switch upper case to lower and vice versa.

s.capitalize()

Only the first character of the string is capitalized and the rest are lowercased (see upper).

s.lower()

Make string all lower case. Useful for normalizing silly user input.

s.upper()

Like lower.

Boolean Checks

s.isalnum()

Alphanumeric.

s.isalpha()

Is a letter.

s.isdigit()

Is a digit.

s.islower()

Is lowercase.

s.isspace()

Is a string with a length of at least 1 with all whitespace.

s.istitle()

TitleCaseWantsUpperToOnlyFollowLowerAndViceVersa.

s.isupper()

Is uppercase.

Here are some more obscure attributes:

s.__add__, s.__class__, s.__contains__, s.__delattr__, s.__doc__,
s.__eq__, s.__format__, s.__ge__, s.__getattribute__,
s.__getitem__, s.__getnewargs__, s.__getslice__, s.__gt__,
s.__hash__, s.__init__, s.__le__, s.__len__, s.__lt__, s.__mod__,
s.__mul__, s.__ne__, s.__new__, s.__reduce__, s.__reduce_ex__,
s.__repr__, s.__rmod__, s.__rmul__, s.__setattr__, s.__sizeof__,
s.__str__, s.__subclasshook__, s._formatter_field_name_split,
s._formatter_parser

For example:

$ python -c "print('a string'.__doc__)"
str(object) -> string

Return a nice string representation of the object.
If the argument is a string, the return value is the same object.

Regular Expressions

I use regular expressions a lot and I really quite like them. For shell scripting they are essential. When I used to be a strong Perl programmer, I used Perl’s excellent regular expression libraries all the time. But as I switched to Python, I found that I really just hardly ever need to use them. For example this normal shell code…

cal | grep September

…can be done in Python like this…

[X for X in os.popen('cal') if 'September' in X]

…which may not look great, but it is the Python way and if you’re cool with that, it can actually be an improvement. Note that the modern Python way now uses the subprocess module. Details.

For simple matching, I find I use the Python in and is (and not) operators a lot. Instead of regular expressions one can use Python functions like split, join, replace, find, startswith, endswith, swapcase, upper, lower, isalnum, isspace, etc. Also "slices" and string template substitution really make regular expressions seem kind of backward and inelegant in the Python idiom.

But as the saying goes, sometimes you have a problem that really needs regular expressions; now you have two problems.

Python handles regular expressions in a rather object oriented way. No simple Perl or sed implied syntax. Here’s a small example that shows how you could go through a bunch of eclectic data looking for social security numbers.

#!/usr/bin/python
# Don't name this test program re.py! Because of...
import re

D= ["William Poned","SS:456-90-9876","3425 Ponzi Dr."]

pattern_object= re.compile('(\d\d\d)-(\d\d)-(\d\d\d\d)')

for d in D:
    #Note that "search" is satisfied to find the pattern within the string.
    match_object= pattern_object.search(d)
    if match_object:
        print(match_object.re)
        print(match_object.groups())
        print(match_object.span())
        print("Using the search method of the pattern object:")
        print(match_object.group())

#The "match" method requires the pattern to match at the start of the string.
pattern_object= re.compile('.*(\d\d\d)-(\d\d)-(\d\d\d\d).*')
match_object= pattern_object.match(D[1])
print("Using the match method of the pattern object:")
print(match_object.group())

Here’s what this program outputs.

<_sre.SRE_Pattern object at 0xf4c1e0>
('456', '90', '9876')
(3, 14)
Using the search method of the pattern object:
456-90-9876
Using the match method of the pattern object:
SS:456-90-9876

Note the difference between the search and the match methods of the pattern object. The latter requires the pattern to match starting at the very beginning of the string (which is why the example pads the pattern with .*) while the former simply needs to find the pattern somewhere in the string.

Substitution

Here is a comparative example of a simple substitution using core Python functions and regular expressions.

Here’s a string containing the characters "JUNK" followed by 4 unknown characters all of which must be removed.

In [1]: x="This is a long string JUNK1234with some unwanted stuff in it."

There are two major ways.

#1: Use find() or index() to figure out where in the string the thing is and then use slices:

In [2]: n= x.index('JUNK')
In [3]: print(x[0:n]+x[n+8:])
This is a long string with some unwanted stuff in it.

#2: Use regular expressions. Just match with "JUNK....".

In [4]: import re
In [5]: print(re.sub('JUNK....','',x))
This is a long string with some unwanted stuff in it.

Despite being a regular expression pro, in Python I tend to minimize their use and, unlike other environments, that’s easy to do (as shown here).

For more information, check the official gory details.

Lists and Sequence Types

Functions available to lists:

l.append(object)

Simply append object in place to the end of the list.

l.count(value)

Count the occurrence of value in the list.

l.extend(iterable)

Extend the list in place with all of the items supplied by iterable.

l.index(value[,start[,stop]])

Return index of first occurrence of value. The other parameters act as a slice.

l.insert(index,object)

Insert object in place immediately before index.

l.pop([index])

Remove (in place) and return item at index (or last item). Raise IndexError if index is out of range or the list is empty.

l.remove(value)

Removes in place first occurrence of value or raise ValueError if not found. Note that [v for v in l if v != value] can get rid of all value occurrences from a list (makes a new list this way).

l.reverse()

Reverses the list in place.

l.sort([cmp=None][,key=None][,reverse=False])

Sorts a list in place. The cmp(x,y) function returns -1, 0, or 1 for less than, equal, or greater than respectively. See Complex Object Sorting for how to use key. Also note that there is a sorted(mylist) function that will return a new sorted list if you want to preserve the original list.
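
A minimal sketch of key in action (made-up data):

>>> pairs= [('ed',3),('ann',1),('bob',2)]
>>> pairs.sort(key=lambda p: p[1])   # Sort by the second element.
>>> pairs
[('ann', 1), ('bob', 2), ('ed', 3)]
>>> sorted(['b','A','c'], key=str.lower)  # Case-insensitive; original untouched.
['A', 'b', 'c']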

Other attributes of lists:

l.__add__, l.__class__, l.__contains__, l.__delattr__,
l.__delitem__, l.__delslice__, l.__doc__, l.__eq__, l.__format__,
l.__ge__, l.__getattribute__ l.__getitem__, l.__getslice__,
l.__gt__, l.__hash__, l.__iadd__, l.__imul__, l.__init__,
l.__iter__, l.__le__, l.__len__, l.__lt__, l.__mul__, l.__ne__,
l.__new__, l.__reduce__, l.__reduce_ex__, l.__repr__,
l.__reversed__, l.__rmul__, l.__setattr__, l.__setitem__,
l.__setslice__, l.__sizeof__, l.__str__, l.__subclasshook__

List Comprehension

List comprehensions are a nice way to apply some action to a list in such a way that a new list is generated. The syntax is a bit odd at first, but it’s actually pretty reasonable and compact. Note that the functionality is comparable to the map function.

>>> [pow(2,y) for y in range(8)]
[1, 2, 4, 8, 16, 32, 64, 128]
>>> map(lambda y:pow(2,y),range(8))
[1, 2, 4, 8, 16, 32, 64, 128]

The list comprehension can be conditional which basically operates like a filter function.

>>> [x for x in range(1000) if not x%333]
[0, 333, 666, 999]
>>> filter(lambda x:not x%333,range(1000))
[0, 333, 666, 999]

reduce

While we’re covering wacky functions that go well with the lambda construction, here’s a use of the reduce function. I really don’t think this function is very useful, but this was my best attempt to do something more exotic with it than the normal adding up of a bunch of things.

>>> reduce(lambda hold,next:hold+chr(((ord(next.upper())-65)+13)%26+65),'jjjkrqpu','')
'WWWXEDCH'

I think this is as good as it gets judging by this.

If you never understood reduce’s utility you’re in luck! Python 3 removed it. You can still use the reduce function found in functools. But really, probably best to consider it dead to Python.

Generators

Generators are very much like list comprehensions except they don’t synthesize the entire list into memory at their location. Instead they produce a generator object which can be iterated, generally with a next() function. Each time it is iterated, the next item in the sequence is generated at that time until the specified objects are exhausted.

>>> a=9
>>> g=(x+a for x in range(10))
>>> g
<generator object <genexpr> at 0x7f41c0b63af0>
>>> next(g)
9
>>> for x in g: print(x,end=' ')
...
10 11 12 13 14 15 16 17 18

The generator syntax is a shorthand for a more verbose style involving the yield keyword. The yield keyword hands back a value just like return, but the function’s state is preserved and the next call to it resumes where it left off. The generator is finished off by a return statement or just a natural end to the function.

The following example illustrates the usage with a function that provides unique incrementing ID numbers.

#!/usr/bin/python
def numberer(id=0):
    while True:
        id += 1
        yield id

if __name__ == '__main__':
    ID_set_1= numberer()
    ID_set_2= numberer(10)
    for n in range(3):
        print(next(ID_set_1), next(ID_set_2))

This produces:

1 11
2 12
3 13

Dictionaries

In Python, dictionaries are collections of items which store a value indexed by a key (as opposed to a list, which indexes by position, a number). The order of items in a dictionary is unreliable since order is not needed for its management.

Generally dictionaries can be created like this:

>>> d=dict({'akey':'avalue','bkey':'bvalue'})
>>> d
{'akey': 'avalue', 'bkey': 'bvalue'}
>>> d['bkey']
'bvalue'

There are actually many ways to create dictionaries but why be complicated?

Here are methods that can be applied to dictionaries:

d.clear()

Remove all items in the dictionary.

d.copy()

Returns a shallow copy of d.

dict.fromkeys(sequence[,value])

Creates a new dictionary with items that have keys found in sequence. The value, if present, is applied to all new items. I don’t think this function sensibly acts on an existing dictionary but it is a dictionary method. For this reason it seems cool to just apply it to dict. This dict.fromkeys("xyz",0) produces {"y": 0, "x": 0, "z": 0}.

d.get(key[,elsevalue])

Same as d[key] except that if elsevalue is present and the key is not, then elsevalue is returned. Since elsevalue defaults to None then no KeyError is raised with this function.

d.has_key(key)

If the dictionary has an item with a key of key then returns True. Otherwise False. Same as k in d.

d.items()

Returns list of key,value tuples. Order is unreliable.

d.iteritems()

Produces an iteration object that can take .next() methods producing key,value tuples of all the items until a StopIteration exception.

d.iterkeys()

See iteritems() but with just the keys.

d.itervalues()

See iteritems() but with just the values.

d.keys()

Returns a list of keys. Order is unreliable.

d.pop(key[,elsevalue])

Like get but removes item in addition to returning its value. Unlike get if no elsevalue is provided and key isn’t in d then a KeyError is raised. This is a way to try to remove an item whether it exists or not; just make sure to specify an elsevalue.

d.popitem()

Not quite like pop! Removes and returns an arbitrary key,value tuple or, if no items are present, raises a KeyError.

d.setdefault(key[,elsevalue])

Almost exactly like get but in addition to returning elsevalue, it sets the specified key to it, leaving that item subsequently defined. If no elsevalue is specified and key isn’t in d then an item with a value of None is created.

d.update(d2)

Ok, this one’s a serious messy pile of function. Merges the items of d2 into d. It can also take an explicit dictionary like d.update({"m":13,"n":14}), keyword arguments like d.update(m=13,n=14), or an iterable of key,value pairs.

d.values()

Returns values in a list. Order is unreliable.

Other dictionary attributes:

d.__class__, d.__cmp__, d.__contains__, d.__delattr__,
d.__delitem__, d.__doc__, d.__eq__, d.__format__, d.__ge__,
d.__getattribute__, d.__getitem__, d.__gt__, d.__hash__,
d.__init__, d.__iter__, d.__le__, d.__len__, d.__lt__, d.__ne__,
d.__new__, d.__reduce__, d.__reduce_ex__, d.__repr__,
d.__setattr__, d.__setitem__, d.__sizeof__, d.__str__,
d.__subclasshook__

Tuples

A tuple is a type that gets its name (I think) from the idea of "multiple" or "quintuple". Its two most important aspects are that it is immutable and that it is a collection of references to other objects. This makes tuples ideal for passing around between functions because you know that the order of the arguments will not change and also because you don’t have to copy ("by value") all the argument data into another memory location to make it available to the function.

Tuples can be "unpacked" in the following way:

>>> origin= (0,0)
>>> x,y= origin
>>> print("X:%d Y:%d" % (x,y))
X:0 Y:0

Tuples do not have many idiosyncratic methods that can be called on them. Here they are:

t.count(value)

Returns the number of time value is found in t.

t.index(value[,start[,stop]])

Returns the position of the first occurrence of value. If the other parameters are supplied, it searches on a slice.

The Python built-in function zip is notable for returning a list of tuples composed of other lists.

>>> zip([1,2,3],['a','b','c'])
[(1, 'a'), (2, 'b'), (3, 'c')]

Output is only as long as the shortest list.

One very useful thing that can be done with this is listwise operations. Here, for example, I’m calculating a perceptron value by taking the sum of each input value times each corresponding weight, and then adding the bias. Here inputs is a list of input values and weights are the corresponding weights for each input position. Bias is just a constant.

value= sum([i*w for i,w in zip(inputs,weights)],bias)

The map command can serve for zip if the lists are the same length (Python 2 only; map no longer accepts None as the function in Python 3).

>>> map(None,[1,2,3],['a','b','c'])
[(1, 'a'), (2, 'b'), (3, 'c')]

Other attributes of tuple types:

t.__add__, t.__class__, t.__contains__, t.__delattr__, t.__doc__,
t.__eq__, t.__format__, t.__ge__, t.__getattribute__, t.__getitem__,
t.__getnewargs__, t.__getslice__, t.__gt__, t.__hash__, t.__init__,
t.__iter__, t.__le__, t.__len__, t.__lt__, t.__mul__, t.__ne__,
t.__new__, t.__reduce__, t.__reduce_ex__, t.__repr__, t.__rmul__,
t.__setattr__, t.__sizeof__, t.__str__, t.__subclasshook__

Sets

Are sets real Python objects? I think they must be:

>>> s= set([1,2,3,4])
>>> type(s)
<type 'set'>

They are certainly one of the more obscure and unused primary types in Python. The big performance win is membership testing (x in s), which is hash based and much faster than scanning a list for the same thing.

The main points about sets are that they are unordered and they contain no duplicate elements.

Here’s a good overview of how sets are used:

>>> set1
set([0, 1, 2, 3, 4, 5, 6])
>>> set2
set([3, 4, 5, 6, 7, 8, 9])
>>> set1-set2
set([0, 1, 2])
>>> set2-set1
set([8, 9, 7])
>>> set1 & set2
set([3, 4, 5, 6])
>>> set1 | set2
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> 4 in set2
True

Note that sets do not have a + operator. If you need something like that look more into the | operator which is a set union. The & is a set intersection and there are explicit named functions for this too.

Here are some other examples:

>>> seta= set(['ann','bob','carl','doug','ed','frank'])
>>> setb= set(['ann','carl','doug','frank','gary','harry'])
>>> seta - setb
set(['ed', 'bob'])
>>> seta.difference(setb)
set(['ed', 'bob'])
>>> setb.difference(seta)
set(['gary', 'harry'])
>>> seta | setb
set(['ed', 'frank', 'ann', 'harry', 'gary', 'carl', 'doug', 'bob'])
>>> seta.union(setb)
set(['ed', 'frank', 'ann', 'harry', 'gary', 'carl', 'doug', 'bob'])
>>> seta & setb
set(['frank', 'ann', 'carl', 'doug'])
>>> seta.intersection(setb)
set(['frank', 'ann', 'carl', 'doug'])
>>> seta.symmetric_difference(setb)
set(['gary', 'harry', 'ed', 'bob'])
>>> setb.symmetric_difference(seta)
set(['gary', 'ed', 'harry', 'bob'])

Sets can be used to remove duplicates from a list. Here’s what that would look like:

thelist= list(set(thelist))

Note that this does not preserve order which may have been important to you.
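
If the order did matter, here is a minimal sketch that keeps the first occurrence of each item:

seen= set()
deduped= []
for x in thelist:
    if x not in seen:
        seen.add(x)
        deduped.append(x)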

Set Element Removal Functions

s.difference(badset)

Same as s - badset. This example, set("abcd").difference(set("cd")), produces a new set(["a","b"]). Note that the other way around is different. In the example, it produces an empty set since a and b are ignored as not present and c and d are removed.

s.difference_update(badset)

Same as difference but just take out the badset from s.

s.symmetric_difference(s2)

Returns a new set with the elements that are in one set or the other but not in both (the opposite of the intersection).

s.symmetric_difference_update(s2)

Same as symmetric_difference but operates in place.

s.intersection(s2)

Same as the s & s2 as applied to sets. Returns the elements in common between s and s2.

s.intersection_update(s2)

Instead of returning the elements in common, it simply changes s in place.

s.pop()

Takes no arguments and returns an arbitrary element which is then removed from the set. If the set is empty, a KeyError is raised.

s.discard(object)

Remove object from s if it is a member. Very similar to difference except only for one object and modifies in place instead of returning a new one.

s.remove(object)

Remove an object from a set in place or KeyError if it’s not there. Except for the exception, pretty much like discard.

s.clear()

Make the set completely empty.

Set Augmentation Functions

s.add(object)

Adds a single object to s (silently ignores it if already present).

s.union(s2)

Same as s | s2. Returns a new set containing all the elements that were in s and all the new elements in s2.

s.update(s2)

Incorporates elements of s2 into s. If all of s2 is already present, then nothing happens. It’s pretty much a union in place.

s.copy()

Shallow copy of the set.

Set Test Functions

s.isdisjoint(s2)

Returns True if s and s2 have no elements in common. Basically the same as not s & s2.

s.issubset(s2)

Note the order is as implied by the function name, True if s is a subset of s2 and not the other way around.

s.issuperset(s2)

Returns True if s contains s2. Pretty much the same as issubset but with the object and argument switched. A set is both a subset and a superset of itself, so s.issuperset(s) is True.

Other attributes of set objects:

s.__and__, s.__class__, s.__cmp__, s.__contains__, s.__delattr__,
s.__doc__, s.__eq__, s.__format__, s.__ge__, s.__getattribute__,
s.__gt__, s.__hash__, s.__iand__, s.__init__, s.__ior__, s.__isub__,
s.__iter__, s.__ixor__, s.__le__, s.__len__, s.__lt__, s.__ne__,
s.__new__, s.__or__, s.__rand__, s.__reduce__, s.__reduce_ex__,
s.__repr__, s.__ror__, s.__rsub__, s.__rxor__, s.__setattr__,
s.__sizeof__, s.__str__, s.__sub__, s.__subclasshook__, s.__xor__

Classes and Object Oriented Stuff (OOP)

Python isn’t just capable of using object-oriented features. It has been designed with that as a primary aspect of the language. One nice thing about the design, however, is that unlike Java, you can safely ignore all object oriented features and get quite a bit of useful programming done. But when object oriented features make a lot of sense and would actually reduce complexity, Python is there to make it quite simple.

The best way to remind myself how it all works is to look at some good code I have written that aptly uses the traditional OO programming style. This excerpt of my Geogad language shows a base class representing geometric entities and subclasses derived from it. It shows the definition of member data items and member functions. It also shows the idea of a class variable which in this case is useful to keep a unique ID number for each entity whatever its subtype.

class Entity:
    """Base class for functionality common to all entities."""
    lastIDused= 0 # Class static variable, new id pool.
    master_VList= vector.VectorList()
    def __init__(self):
        Entity.lastIDused += 1 # Increment last ID used...
        self.p_id= Entity.lastIDused #...which becomes the ID.
        self.vectors= dict() # Vectors that define entity's geometry.
        self.attribs= dict() # Properties of this entity.
    def __eq__(self, comparee): # Overload == operator.
        try:
            for p in self.vectors.keys():
                if not self.vectors[p] == comparee.vectors[p]:
                    return False
            else:
                return True
        except KeyError: # If the entities are totally mismatched.
            return False # E.g. A has end1 and end2, B has cen & rad.
    def __repr__(self):
        return 'A generic entity'
    def rotate(self, ang, basepoint=None): # On xy plane (%)
        pass
    def scale(self, fact, basepoint= None):
        if basepoint: # Move it to origin.
            self.translate(-basepoint)
        for p in self.vectors.keys():
            temp= self.vectors[p]
            newv= temp * fact
            Entity.master_VList.append( newv )
            Entity.master_VList.remove( temp )
            self.vectors[p]= newv
        if basepoint:
            self.translate(basepoint) # Put it back.
    def centroid(self):
        pass
    def boundingbox(self):
        pass
    def translate(self, offset):
        for p in self.vectors.keys():
            temp= self.vectors[p]
            newv= temp + offset
            Entity.master_VList.append( newv )
            Entity.master_VList.remove( temp )
            self.vectors[p]= newv

class Entity_Point(Entity):
    def __init__(self, P):
        Entity.__init__(self)
        self.vectors['A']= P
        Entity.master_VList.append(P) # Check in with the master list.
    def copy(self, offset=None):
        cp= Entity_Point(self.vectors['A'])
        if offset:
            cp.translate(offset)
        return cp
    def __repr__(self):
        return 'POINT:'+ str(self.vectors['A'])

class Entity_Line(Entity):
    def __init__(self, Pa, Pb):
        Entity.__init__(self)
        if Pa < Pb: # Here the vectors are sorted for predictability.
            self.vectors['A'], self.vectors['B']= Pa, Pb
        else: # The __lt__ is a bit arbitrary.
            self.vectors['A'], self.vectors['B']= Pb, Pa
        Entity.master_VList.append(Pa) # Check in with the master list.
        Entity.master_VList.append(Pb) # Check in with the master list.
    def copy(self, offset=None):
        cp= Entity_Line(self.vectors['A'], self.vectors['B'])
        if offset:
            cp.translate(offset)
        return cp
    def __repr__(self):
        return 'LINE:'+ str(self.vectors['A'])+str(self.vectors['B'])

There are a lot of built-in functions that can be overloaded to give your objects a more natural functionality. For example, whatever your object is, there is probably some sense of how big it is. Overloading the Python __len__() method for the class can make len(MyObject) do the right thing, whatever that is. This is a pretty good resource for figuring out what your options are.
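
Here is a minimal made-up sketch of that idea, overloading __len__ and __repr__:

class Deck(object):
    """A hypothetical container that cooperates with built-in functions."""
    def __init__(self, cards):
        self.cards= list(cards)
    def __len__(self):        # Makes len(d) do the right thing.
        return len(self.cards)
    def __repr__(self):       # Makes print(d) and the REPL informative.
        return 'Deck of %d cards' % len(self.cards)

d= Deck(['AS','KH','2C'])
print(len(d))  # 3
print(d)       # Deck of 3 cards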

Decorators

Decorators seem kind of lame to me. They basically add no fundamental functionality as far as I can tell. They seem to only turn this…

def f(x):
    return x
f = d(f)

…into this:

@d
def f(x):
    return x

I can’t say I’m super impressed by that. It seems like it’s for people who don’t know how to handle functions as objects, but what do I know? I find it weird that this syntax refers to something that is not yet defined and that’s not how Python should work. If it did, we could have main at the top of our programs. It is worth noting the first syntax as an alternative to decorators since it provides a clearer way to selectively activate them.

Nonetheless, some uses for decorators:

  • Timing something out so that it does not hang indefinitely.

  • Profiling something to see how long various parts of your code take.

  • Type checking questionable input parameters.

  • Checking the security context of a function.

  • Tests.

  • Logging that a function actually got run.

  • Counting the number of times it got run.

Function Timer

Here’s a decorator example that times a function:

#!/usr/bin/python
import time

def timethis(f):
    """A decorator function to time things."""
    def timed_function(*args):
        start= time.time()
        f(*args)
        print('Time was %f' % (time.time()-start))
    return timed_function

@timethis
def example_fun(s):
    print("Sleeping for %.6f seconds." % s)
    time.sleep(s)

if __name__ == '__main__':
    example_fun(1.23)

This program outputs something like this:

Sleeping for 1.230000 seconds.
Time was 1.231388

Function Timeout

Sometimes you’re expecting something to happen and you’re not sure how long it will take. You do know that if it goes beyond a certain threshold, you would rather just abort. An example of this is if you are scanning for fast internet mirrors from which to download something. In this case, by definition, there would exist slow mirrors and they may be so slow that they bog down the operation quite a lot. With the following decorator, you can give each mirror a certain amount of time to attempt its operation before cutting your losses and pulling the plug.

#!/usr/bin/python
"""See: `man 2 alarm`
         http://docs.python.org/library/signal.html"""
import signal
LIMIT= 2 #seconds

def nohang(f):
    """A decorator function to cancel a function that takes too long."""
    def raiseerror(signum, frame): # Handlers take two args.
        raise IOError
    def time_limited_function(*args):
        orig= signal.signal(signal.SIGALRM, raiseerror) # Arm the alarm
        signal.alarm(LIMIT)                             # for each call.
        try:
            f(*args)
        except IOError:
            print("Timed out!")
        signal.alarm(0)                       # Disarm and restore the
        signal.signal(signal.SIGALRM, orig)   # original handler.
    return time_limited_function

@nohang
def wait_this_long(t):
    import time
    time.sleep(t)
    print('Finished OK in %d seconds' % t)

if __name__ == '__main__':
    wait_this_long(3)
    wait_this_long(1)

Produces:

Timed out!
Finished OK in 1 seconds

Decorator With Arguments

I think that a situation like this:

@d(a)
def f(x):
    return x

…is the same as this:

def f(x):
    return x
i= d(a)  # i is an intermediate function which produces a function.
f= i(f)

I could be wrong though. Here’s a working example of a decorator that can be adjusted with an argument.

#!/usr/bin/python

def Decorator(DecoArg):
    def DecoArgWrapFunc(FuncPassed2Deco):
        print('Decorator argument: %s' % DecoArg)
        def DecoratedFunction(*args):
            print('Start decoration(%s)...' % DecoArg)
            RetValOfFuncPassed2Deco= FuncPassed2Deco(*args)
            print('End decoration(%s)...' % DecoArg)
            return RetValOfFuncPassed2Deco # To simulate UserFunc.
        return DecoratedFunction
    return DecoArgWrapFunc

@Decorator('DA')
def UserFunc(UserArg):
    print('In user function with user argument: %s' % UserArg)
    return pow(2,UserArg)

print('Value of call to the user function: %s' % UserFunc(8))

This program produces the following output.

Decorator argument: DA
Start decoration(DA)...
In user function with user argument: 8
End decoration(DA)...
Value of call to the user function: 256

Exceptions

Philosophy

Error handling in Python is quite powerful, but it can be a bit complex too. In Python an "exception" is an event that accompanies an error situation and it is valuable to realize that all errors in Python use the exception mechanism. Because of this, it’s pretty useful to know how to deal with them. For example, python doesn’t have a simple "exit" keyword and the ultimate way programs stop is when this effectively happens:

raise SystemExit

Although understanding exceptions is important, I have a bit of philosophical unease with the idea of planning for things you didn’t plan for. My thinking is that if you expect an exceptional situation, you should take precautions to make it not exceptional. In the Python world this seems to be divided into the "Easier to Ask for Forgiveness than Permission" (EAFP) and "Look Before You Leap" (LBYL) factions.

The classic case is a division by zero error. What is the functional difference between letting a division operator raise a very specific ZeroDivisionError and just checking to see if the denominator is zero before proceeding? I think that sometimes you want such explicit control and sometimes you can tolerate a certain amount of ambiguity. For example, if you’re going to do 100 divisions and the whole set are invalid if any one has a zero denominator, then the code might be easier to write and later understand if you use exception facilities. However, if you know that a certain attribute might be missing from an object, it is very reasonable to do a if hasattr(object,propstr): rather than try: ... except AttributeError:. In theory these are very similar approaches as hasattr basically calls getattr and catches exceptions. However, if using exceptions directly, looking at the code later will tell you nothing about why you thought to catch an error there, i.e. something about object and propstr. For all your future self knows, you were just covering the "unknown unknowns".

I like to explicitly check for all the things I can think of which will go wrong (LBYL) and then use exceptions to stagger away still breathing if something truly unforeseeable happens. To me it’s like working on a roof while wearing a safety harness - falling off the roof is still to be avoided and jumping off the roof seems to be seriously full of the wrong attitude.

There are other cases, however, where exceptions are strategically preferred. One example is explained on the OS module documentation. It points out that checking to see if a file is readable and then reading it is not as robust as just trying to open it and then catching an exception when it doesn’t work. The thinking is that in the former case, an attacker could devise a way to change the state of the thing being checked between the check and the action.
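
Here is a rough sketch of the two styles for that file case (the filename is just a placeholder):

import os
fname= 'somefile.txt'

# LBYL: check first. There is a window between the check and the open
# where the file's state could change.
if os.access(fname, os.R_OK):
    data= open(fname).read()

# EAFP: just try it and handle the failure.
try:
    data= open(fname).read()
except IOError:
    data= None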

Implementation

Code to be monitored for an exception is "tried" with the try keyword. An "exception" is "raised" (not "thrown" as in C++ and Java). An exception is, uh, excepted with except and not "caught" with catch.

The basic syntax looks like:

try:
    AttemptSomething()
except LoneException:
    HandleThisBadThing()
except (ExceptionOne, ExceptionTwo, ExceptionN):
    HandleAnyOfTheseBadThings()
else:
    SomethingRanClean() # Executes if no exceptions raised.
finally:
    SomethingWasAttempted() # Runs if exceptions were raised or not.

Standard Exceptions

What exceptions are there to be raised? Here’s an abridged diagram of the exception hierarchy. Note that if you except an EnvironmentError exception then it will catch IOError and OSError since those are subclasses, or types, of that exception.

BaseException
 +-- SystemExit
 +-- KeyboardInterrupt
 +-- GeneratorExit
 +-- Exception
      +-- StopIteration
      +-- StandardError
      |    +-- BufferError
      |    +-- ArithmeticError
      |    |    +-- FloatingPointError
      |    |    +-- OverflowError
      |    |    +-- ZeroDivisionError
      |    +-- AssertionError
      |    +-- AttributeError
      |    +-- EnvironmentError
      |    |    +-- IOError
      |    |    +-- OSError
      |    +-- EOFError
      |    +-- ImportError
      |    +-- LookupError
      |    |    +-- IndexError
      |    |    +-- KeyError
      |    +-- MemoryError
      |    +-- NameError
      |    +-- ReferenceError
      |    +-- RuntimeError
      |    |    +-- NotImplementedError
      |    +-- SyntaxError
      |    |    +-- IndentationError
      |    +-- SystemError
      |    +-- TypeError
      |    +-- ValueError
      |         +-- UnicodeError
      +-- Warning
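
For example, one handler can cover both file and OS level problems:

try:
    open('/no/such/file')
except EnvironmentError as e:  # Catches IOError and OSError alike.
    print("Could not get at the file: %s" % e)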

You can find out about exceptions with help:

$ python -c "help(EOFError)" | sed -n '/^class/,/ |  /p'
class EOFError(StandardError)
 |  Read beyond end of file.

Or this:

python -c "import exceptions; help(exceptions)"

Exception Data

Looks like there has been a change in the syntax used to manage exception information. Formerly it was something like:

except ValueError, exception_instance:

And now it is something like:

except ValueError as exception_instance:

What this means is that in your except clause you can use the exception instance that is generated with the raise (perhaps as part of a system error). This exception instance contains some handy stuff relating to the error (usually). It seems that the various built in exceptions have a variety of attributes. For example, an IOError will have a filename attribute that you can access. Here is an example of the basic attribute system showing the generic data stored by the exception object and how to find out what exactly the exception object can tell you.

#!/usr/bin/python
try:
    raise Exception('arg1','arg2','arg3')
except Exception as exception_instance:
    print("dir(exception_instance):")
    print(dir(exception_instance))
    print("type(exception_instance):")
    print(type(exception_instance))
    print("exception_instance:")
    print(exception_instance)
    print("exception_instance.args:")
    print(exception_instance.args)

Produces:

dir(exception_instance):
['__class__', '__delattr__', '__dict__', '__doc__', '__format__',
'__getattribute__', '__getitem__', '__getslice__', '__hash__',
'__init__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__setstate__', '__sizeof__',
'__str__', '__subclasshook__', '__unicode__', 'args', 'message']
type(exception_instance):
<type 'exceptions.Exception'>
exception_instance:
('arg1', 'arg2', 'arg3')
exception_instance.args:
('arg1', 'arg2', 'arg3')
Note
The exception_instance.message attribute seems like it has been deprecated. Best not to rely on it!

Custom Exceptions

If none of the built in exception classes seem appropriate or if they lack the necessary attributes, you can create your own exception classes which more specifically do what you need. You can inherit from any exception, but it’s normal to use Exception as an uncluttered base class. Here’s an example of what methods and structure such a user defined exception class entails.

#!/usr/bin/python

class UserDefinedException(Exception):
    """Best to inherit from Exception class."""
    def __init__(self, ex_att_param):
        self.user_attribute= ex_att_param
    def __str__(self):
        return repr(self.user_attribute)

try:
    raise UserDefinedException('Idiot programmer alert!')
except UserDefinedException as InstanceOfUserDefinedException:
    print(InstanceOfUserDefinedException.user_attribute)

This example program produces simply Idiot programmer alert!.

File operations

Checking

Use the os.access() function to check if the file you are interested in is in the condition you expect.

if not os.access(f,os.F_OK):
    print("Nonexistent file problem: %s"%f)
if not os.access(f,os.R_OK):
    print("Unreadable file problem: %s"%f)

You can also use os.W_OK and os.X_OK to test for writable and executable.

Also consider os.path for checking on directories.

os.path.isdir(path)

Simple Reading

To read an entire file into a string:

entire_file_contents= open(filename).read()

To read each line of a file use something like:

f= open(filename,'r')
for l in f:
    print(l)

Note that this comes with newlines from the file and from the print. Use sys.stdout.write(l) to avoid this problem.

Other streams besides sys.stdout are sys.stdin and sys.stderr.

with + as

It looks like since version 2.5, the fanciest (and best?) way to open files and read through them is to use the new with and as reserved words. Here’s an example that counts the lines in a file:

count= 0
with open('filename','r') as f:
    for l in f:
        count += 1
print(count)

The advantage here, apparently, is that the file gets neatly closed even if it’s rudely interrupted and never makes it to your file closing statement. Or something like that.

A better way to think of the with as syntax is that it seems to set up a "context". An object which has an __enter__() and __exit__() method can be used with with as such that the enter method gets called upon entering the block and the exit method gets called, unsurprisingly, upon exit. This is why it’s such a reasonable way to handle file opening because things can be done with the file and the exit function makes sure that, whatever weirdness transpires, the file will get closed.

Another example of when this would be appropriate would be some kind of routine that outputs some SVG. You may want to adjust the view parameters with a <g transform=...> block. You might start with the opening tag and then do a bunch of stuff and then print a </g> tag from the exit method of an object. This will allow for nested objects and the opening and closing tags will always exist and be correct.

#!/usr/bin/python
class SVGsettings(object):
    def __init__(self,p):
        self.property= p
    def __enter__(self):
        print('<g %s>'%self.property)
    def __exit__(self, type, value, traceback):
        print("</g><!--%s-->"%self.property)

with SVGsettings('stroke-width="1.5"'):
    with SVGsettings('stroke="red"'):
        print('<line x1="3" y1="0" x2="25" y2="44"/>')
Output
<g stroke-width="1.5">
<g stroke="red">
<line x1="3" y1="0" x2="25" y2="44"/>
</g><!--stroke="red"-->
</g><!--stroke-width="1.5"-->

It might be smart to rewrite my HTML tagger with this style.

Another example is one of a transaction "lock". If you need to do something like update a database and it must be locked to prevent access from other actors, you can have the lock be in the __enter__() method and the release be in the __exit__() method. This way, even if bad things happen, the lock will get released properly.
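
Here is a minimal sketch of that idea using threading.Lock; the Locked class and db_lock name are just made up for illustration (and, incidentally, a threading.Lock can also be used directly as a context manager).

import threading

class Locked(object):
    """Hold a lock for the duration of a with block."""
    def __init__(self, lock):
        self.lock= lock
    def __enter__(self):
        self.lock.acquire()
    def __exit__(self, type, value, traceback):
        self.lock.release()  # Released even if the block raised.

db_lock= threading.Lock()
with Locked(db_lock):
    pass  # Update the database here.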

I think the core of the syntax is that this code (with as VAR optional):

with EXPR as VAR:
    BLOCK

Is equivalent to:

mgr = (EXPR)
exit = type(mgr).__exit__  # Not calling it yet
value = type(mgr).__enter__(mgr)
exc = True
try:
    try:
        VAR = value  # Only if "as VAR" is present
        BLOCK
    except:
        # The exceptional case is handled here
        exc = False
        if not exit(mgr, *sys.exc_info()):
            raise
        # The exception is swallowed if exit() returns true
finally:
    # The normal and non-local-goto cases are handled here
    if exc:
        exit(mgr, None, None, None)

The __exit__() function needs to take 4 values, self, type, value, and traceback. I think these are the arguments of a raise statement.

Also newer Pythons (2.7 and 3+) support stuff like this.

with open("customers") as f1, open("transactions") as f2:
    # do stuff with multiple files

Complete mind boggling details can be found here.

Temporary File Names

Here is a sample where a temporary file is created and then the program turns control over to Vim for the user to compose something and when the user quits, the Python program has all of the input. This is what you would do if you wanted to, for example, recreate the functionality of a mail client like Mutt.

#!/usr/bin/python

import tempfile
import subprocess
# Only the name is used; the temporary file object itself is deleted when
# garbage collected, and vim then creates the real file at that path.
tmpfile= tempfile.NamedTemporaryFile(dir='/tmp').name
vimcmd= '/usr/bin/vim'
subprocess.call([vimcmd,tmpfile])

with open(tmpfile,'r') as fob:
    f= fob.readlines()
    if f[1].strip() == '='*len(f[0].strip()):
        print("Title: %s" % f[0])
    else:
        print(f[1])
        print('='*len(f[0]))
    print("%d lines entered." % len(f))

Buffering Issues

Although normally not something to worry about, sometimes it’s important to remember that Python tends to politely buffer output as a general rule. You can have unbuffered output by invoking Python with the -u run time option. I’ve found this to be important when using tee, named pipes, and other fancy stream situations.
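
You can also flush explicitly from within the program when a particular write needs to appear immediately:

import sys
sys.stdout.write('progress: 42%\r')
sys.stdout.flush()  # Push it out now rather than waiting for the buffer to fill.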

Compression

Python can deal with compressed files just fine. There are the modules gzip and bz2 which are very similar but not identical. Here’s how to read a gzip compressed file.

import gzip
f= gzip.open(filename)

Then work with f as a normal file handle. This example is illustrative for writing compressed files:

How to compress an existing file
import gzip
f_in = open('file.txt', 'rb')
f_out = gzip.open('file.txt.gz', 'wb')
f_out.writelines(f_in)
f_out.close()
f_in.close()

Here’s an example of how to use bz2. This little program is a (rough) wc program for text files with bzip2 compression.

#!/usr/bin/python
import bz2
import sys
fn= sys.argv[1]
b=0;w=0;l=0
f= bz2.BZ2File(fn, 'r')
for line in f.readlines():
    b+= len(line); w+= len(line.split(' ')); l+= 1
print("%d bytes, %d words, %d lines" % (b,w,l))

File Input and Standard Input

The fileinput module really simplifies getting things from files or standard input.

input.py
#!/usr/bin/python
import fileinput
for l in fileinput.input():
    print(l.strip().title())

This program produces this output.

$ ./input.py myfile
This File Can Be Sent As Input Both As
A File Argument And As Standard Input.
$ ./input.py <myfile
This File Can Be Sent As Input Both As
A File Argument And As Standard Input.
$ ./input.py myfile - <myfile
This File Can Be Sent As Input Both As
A File Argument And As Standard Input.
This File Can Be Sent As Input Both As
A File Argument And As Standard Input.

Interactive Input

When writing menu-driven features or other interactive programs that wait for a user to input things, the following can be useful. This example shows how to suppress echoing for applications like passwords or where the keypress value is not relevant.

import os
os.system('stty -echo')
passwd= raw_input('Password:')
os.system('stty echo')
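
For passwords specifically, the standard library getpass module handles the echo suppression for you; a minimal equivalent of the above:

import getpass
passwd= getpass.getpass('Password:')  # Prompts without echoing.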

Note that Python3 took out raw_input. Now it’s just input. An answer here has a clever comprehensive solution.

Works in 2 and 3
try: input= raw_input
except NameError: pass
print("You entered: " + input("Prompt: "))

CSV

Now and again some lackwit sends you a file in a popular spreadsheet format. You use something like this to try and decrypt it.

libreoffice --headless --convert-to csv --outdir ./csvdir/ yuck.xls

But then you have crazy business like this.

a,b,"c1,c2,c3",d,"e1,e2",f

Which is extremely tedious to parse. But not with Python!

csv2bsv.py
#!/usr/bin/python
import sys
import csv
with open(sys.argv[1],'r') as f:
    for r in csv.reader(f):
        print('|'.join(r))

Binary Data

Sometimes clever people put data into very efficient binary containers. Using C is the preferred way to deal with this, but if you’re lazy, Python does a great job of decoding binary data too.

>>> import struct
>>> packformat='>cHcHIccccccccccccccccHHIcHccHcHIcHccccccccccccccccccccc'
>>> struct.unpack(packformat,open('/tmp/mybinary.sbd','rb').read())
('\x01', 69, '\x01', 28, 2200337468, '3', '0', '0', '2', '3', '4', '0', '6', '2',
'9', '5', '9', '9', '6', '0', '\x00', 11682, 0, 1456502452, '\x03', 11,
'\x01', ' ', 49763, 'u', 12901, 0, '\x02', 21, '\x00', ' ', 'M', '@',
'\x00', '\x01', 'P', '\xef', '\xf0', ' ', '\x08', 'J', '\x00', 'Y', '_',
'\xcc', '&', 'L', '\x91', '\xe7', '}')
Unpack Codes
  • c = char 1 byte 0-255 (256 values)

  • H = unsigned short 2 byte 0-65535 (65,536 values)

  • I = unsigned int 4 bytes 0-4294967295 (4,294,967,296 values)

Full unpack codes can be found in the official struct module documentation.
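
Going the other way, struct.pack builds the binary data. Here is a small made-up round trip using the same kinds of codes (the format string and values are just for illustration).

>>> import struct
>>> packed= struct.pack('>cHI', 'x', 515, 70000)
>>> packed
'x\x02\x03\x00\x01\x11p'
>>> struct.unpack('>cHI', packed)
('x', 515, 70000)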

Formatting binary stuff can be done with something like this.

>>> '{0:02d}'.format(0b00011110)
'30'

Fullish details on that can be found here.

Pickle

Although there are often better and more secure ways to save Python objects (see JSON below for example), an old classic is Python’s pickle. This object serialization basically just takes any Python object and makes it into a thing that can be written into a file. The end result of this trick is that you can dump some memory state (to a file, across a network, etc) and load it back into memory at another time and place.

import pickle
my_object= My_Object(1,2,3)
# ===== Save Object =====
with open('my_object.p','wb') as pickle_file:
    pickle.dump(my_object,pickle_file)
# ===== Clear Object =====
my_object= None
# ===== Restore Object =====
with open('my_object.p','rb') as pickle_file:
    my_object= pickle.load(pickle_file)

Pickle can serialize any objects you dream up. If your objects don’t involve homemade classes, i.e. they only use Python native types, consider the marshal module.

I think the shelve module provides a key/value style interface to pickle, if you like that kind of thing.

JSON

There are more Pythonic ways of serializing objects (marshal, pickle, cPickle) but in 2013, the way that makes the most people happiest across platforms and languages is JSON. Serendipitously, JSON looks almost identical to a Python dictionary’s __repr__() output. Here’s a sample of how to deal with JSON in a simple case.

json_sample.py
#!/usr/bin/python
import json
import sys
pfile= open(sys.argv[1],'r')
P= json.load(pfile)
for p in P.keys():
    P[p]+= 1
json.dump(P,sys.stdout) # Put some writable file object here.
sys.stdout.flush()

This might produce this result:

$ cat test.json
{"a": 1.5, "b": 1.5707963, "c": 0.95, "d": 0.55, "e": 10.0}
$ ./json_sample.py test.json
{"a": 2.5, "c": 1.95, "b": 2.5707963, "e": 11.0, "d": 1.55}

System Control

Python has several methods to allow arbitrary execution of system commands (exiting to a temporary shell). Obviously this is powerful and dangerous where security is an issue. It’s also often clumsy as the proper Python way of doing things is usually better than the shell way when you factor in the spawning of the shell.

This stuff has gone through a lot of changes over the years, but as of 2014, the consensus is to use the subprocess module.

Here is a nice overview of this kind of stuff.

Here are some methods:

os.listdir('./path') # Produces a list. `~` doesn't work. Includes dotfiles but not `.` or `..`.
os.system('ls ./path') # Just does the thing.
os.popen('ls .','r').read() # Captures the output into a string.
for f in os.popen('ls .','r').readlines(): print(f)# Deal with each.

If this doesn’t do what you need, you can investigate the fancier functions of os like popen2, popen3, popen4, fork, spawn, and execv. See the official os help for more details.

Note
It seems that popen and friends are now deprecated since version 2.6. This is a real moving target. Looks like the new way is the subprocess module.

Subprocess

Here’s the recommended way for executing shell commands as of 2013.

Start with getting the lines of output from the simplest kind of command to fill a Python list.

''.join(map(chr,subprocess.check_output(['cal']))).split('\n')

The reason for all that guff is that this check_output command produces a bytes object. Another way to untangle a byte stream object is to decode it.

>>> b'Line one\nLine two'.decode('utf-8').split('\n')
['Line one', 'Line two']

Here are some more examples.

>>> import subprocess
>>> n= subprocess.Popen(['df','-h','/media/WDUSB500TB'],stdout=subprocess.PIPE)
>>> o= n.stdout.read()
>>> o
'Filesystem            Size  Used Avail Use% Mounted on\n/dev/sdb              459G  350G   86G  81% /media/WDUSB500TB\n'

Note the stdout=subprocess.PIPE value to the Popen constructor. This is required to keep the command from immediately dumping its results to the terminal. The command does run as soon as the constructor runs. So if you run the date command, for example, and there’s a lag between the constructor and the n.stdout.read(), the time will reflect when the constructor ran.

Proper Python documentation suggests that it’s good to use the supplied convenience functions when possible. These are call, check_call, and check_output. Here’s how the latter works:

import subprocess
findcmd= ['find', '/home/xed/', '-name', '*pdf']
for PDF in subprocess.check_output(findcmd).strip().split('\n'):
    print("PDF: %s" % PDF)
# Might output list like:
#    PDF: /home/xed/SlaughterhouseFive.pdf
#    PDF: /home/xed/gpcard.pdf

Here’s an example where an outside command is run, data that the Python program knows about is piped to it, and the command’s standard output is captured back into the program.

>>> import subprocess
>>> pro= subprocess.Popen(['/usr/bin/tr','a-z','A-Z'], shell=False, stdin=subprocess.PIPE,stdout=subprocess.PIPE)
>>> pro.stdin.write("It might get loud.\n")
>>> pro.communicate()
('IT MIGHT GET LOUD.\n', None)

Note that the second item (pro.communicate()[1]) is the standard error.
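
If you also want to capture standard error rather than letting it spill onto the terminal, pass stderr=subprocess.PIPE as well. A minimal sketch (the exact ls complaint will vary by system):

>>> p= subprocess.Popen(['ls','/nonexistent'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>>> out,err= p.communicate()
>>> err     # Something like: 'ls: cannot access /nonexistent: No such file or directory\n'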

Environment Variables

To access environment variables from Python use this technique:

>>> import os
>>> os.environ['USER']
'xed'
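
If the variable might not be set, os.environ.get() lets you supply a default instead of getting a KeyError; assignment works too and is visible to child processes. (MYSETTING here is just a made-up example name.)

>>> os.environ.get('EDITOR','vi')   # Falls back to 'vi' if EDITOR is unset.
>>> os.environ['MYSETTING']= 'on'   # Visible to this process and anything it spawns.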

Time And Date

Working with times and dates can be tricky in Python. There are a lot of seemingly overlapping modules (time, datetime, calendar) and everything is done very fastidiously. This can make simple things seem complex. Here are some common usage cases dealing with times.

import datetime
print(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

This produces '2012-10-18 16:50:37' and is typical of timestamps found in logging situations.

If you have some kind of Unixy tool giving you seconds from the epoch, you can tidy that up with this.

>>> datetime.datetime.fromtimestamp(1456502452).strftime('%Y%m%d %H:%M:%S')
'20160226 08:00:52'

The timedelta objects can be useful for relative dates. Again note the datetime.datetime.SOMEFUNCTION syntax which is not entirely obvious in the Python documentation.

today= datetime.datetime.strptime('2016-05-20','%Y-%m-%d')
aday= datetime.timedelta(1)
yesterday= today-aday
lastweek= today-(7*aday)

Another common requirement involving time is to profile code or for some other reason find out how long something took.

import time
start= time.time()
do_some_lengthy_thing()
print('Elapsed time: %f' % (time.time()-start))

See the function timing decorator for this kind of application implemented in a general way.
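
If that decorator isn’t handy, here is a minimal sketch of the idea (the names timed and do_some_lengthy_thing are just placeholders):

import time
def timed(f):
    """Report how long the wrapped function took to run."""
    def wrapper(*args, **kwargs):
        start= time.time()
        result= f(*args, **kwargs)
        print('%s took %f seconds' % (f.__name__, time.time()-start))
        return result
    return wrapper

@timed
def do_some_lengthy_thing():
    time.sleep(1)

do_some_lengthy_thing()   # Prints something like: do_some_lengthy_thing took 1.001 seconds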

Here’s a simple example that calculates the days between two events.

dayselapsed.py
#!/usr/bin/python
""" Usage: dateelapsed.py 2012-01-09 2014-03-13 """
import datetime,sys
s,e = sys.argv[1],sys.argv[2]
sd= datetime.datetime.strptime(s,'%Y-%m-%d')
ed= datetime.datetime.strptime(e,'%Y-%m-%d')
dd= ed-sd
print(dd.days)

Random Numbers

Here’s how to use random numbers in normal usage.

import random
random.seed([any_hashable_object])  # Default is the system time or os.urandom
random.randint(a,b)                 # a <= N <= b
random.choice(sequence)             # Pick one
random.shuffle(sequence)            # Same list scrambled - in place!
random.random()                     # floating point [0.0,1.0)
'%06x' % random.randint(0,0xFFFFFF) # random hex color for HTML, etc.

Synchronized Shuffling
>>> A,B=[1,2,3,4,5],['a','b','c','d','e']
>>> import random
>>> Z=(list(zip(A,B))) ; random.shuffle(Z)
>>> A,B=[n[0] for n in Z],[n[1] for n in Z]
>>> A,B
([3, 2, 1, 5, 4], ['c', 'b', 'a', 'e', 'd'])

Also see from sklearn.utils import shuffle.

More Entropy Please

Also, os.urandom(n) returns n random bytes from /dev/urandom or some other OS specific collection of high quality entropy. This is slower and depends on actual random events having occurred on the system.

Hashing

There is a built in function called hash which can return the numerical hash of a hashable object.

>>> print(hash('xed'))
-2056704

But that is probably not what you need. This is the more common application and technique for hashes:

import hashlib
md5sum= hashlib.md5(open(fn,'rb').read()).hexdigest()

Here’s a tip that might be helpful.

fn= 'my_critical_data.tgz'
goodmd5= '5d3c7e653e63471c88df796156a9dfa9'
actualmd5= hashlib.md5(open(fn,'rb').read()).hexdigest()
assert actualmd5 == goodmd5, '{} is corrupted!'.format(fn)

Besides md5, hashlib also supports SHA1, SHA224, SHA256, SHA384, and SHA512 hashing.
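
The other algorithms work the same way. For big files it’s friendlier to hash in chunks rather than slurping the whole thing into memory; here is a sketch of that (the 64KB chunk size is arbitrary):

import hashlib
h= hashlib.sha256()
with open(fn,'rb') as f:
    for chunk in iter(lambda: f.read(65536), b''):   # Read 64KB at a time.
        h.update(chunk)
print(h.hexdigest())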

Note
It used to be import md5 but that is apparently deprecated. If you’re using a very old Python and hashlib doesn’t work, give it a try.

Math

The math module does all the normal stuff, usually as expected.

Some Math Functions
  • pi - Constant ready to use.

  • e - Constant ready to use.

  • ceil - Next whole number float.

  • floor - Previous whole number float.

  • sqrt - Use import cmath for negative values and fun with complex numbers.

  • atan - Returns in radians.

  • atan2 - Takes two arguments, a numerator and a denominator, so that the correct quadrant can be returned.

  • sin - Trig functions take radians.

  • degrees - Convert from radians.

  • radians - Convert to radians.

  • log - Don’t get caught by int(math.log(1000,10)) being equal to 2 due to float roundoff.

  • log10 - Use int(math.log10(1000)) instead. See the quick demonstration after this list.

  • gamma - Fancy float capable way to do factorials. Maybe supply n+1 if you want math.factorial or just use that function.
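
A quick demonstration of that log pitfall (the exact float you get may vary by platform, but the truncation surprise is typical):

>>> import math
>>> math.log(1000,10)        # 2.9999999999999996 on many systems.
>>> int(math.log(1000,10))   # Truncates to 2. Ouch.
>>> int(math.log10(1000))    # 3, as expected.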

NumPy

Can of worms! But super powerful. The key trick of NumPy is that it has an array object that makes arrays more like C arrays (with strides) but with all the accounting done. This allows the performance to be much better than native Python objects, especially for large numeric data sets. It’s also quite good at linear algebra. See my TensorFlow notes for an example of that.

This is a nice tip for getting help strings out of NumPy syntax.

np.info(np.npcommand)

NumPy Types And Object Attributes

Types
  • np.int, np.uint8, np.int64, np.uint16, np.int8, np.int16, np.intc (C sized int)

  • np.float, np.float32, np.float64, np.float16

  • np.complex (same as 128), np.complex128, np.complex64

  • np.bool

  • np.object

  • np.string_

  • np.unicode_

Information about your array
  • mynparr.shape

  • mynparr.flags

  • len(mynparr)

  • mynparr.ndim

  • mynparr.size

  • mynparr.nbytes

  • mynparr.dtype

  • mynparr.dtype.name

  • a[x] - element x

  • a[x,y] - element x,y, similar, perhaps the same as a[x][y].

  • a[0:3,0:3,0:3] - full slice options for each dimension.

  • a[1,...] - same as a[1,:,:, etc. ,:]; the ... (Ellipsis) fills in all the remaining dimensions.

  • mynparr.view(<type>)

Casting
  • mynparr.astype(np.uint8)

  • mynparr.tolist() - convert back out of NumPy to regular Python.

Creating Arrays

  • np.loadtxt('file.txt'[,skiprows=1][,delimiter='|']) - load from a text file. Also, see np.savetxt('file.txt',a,delimiter='|') and np.save()

  • np.array( [ (1,2,3),(4,5,6) ], dtype=float)

  • np.ones( (x,y,z…), dtype=np.int16 )

  • np.zeros( (x,y) )

  • np.zeros_like(template) - I think this makes a zero array in the shape of another one. Like your other array all zeroed out.

  • np.empty( (x,y,z…) ) - similar to zeros but in reality the values are never set.

  • np.arange(2,101,2) = 2,4,6…98,100 - Third arg is tick interval.

  • np.linspace(start,end,[qty]) - qty defaults to 50. Evenly spaced values; the third arg is the number of ticks.

  • np.full( (x,y), val ) - fills an array of specified size with val.

  • np.eye(n) - Identity matrix (array really) of given size. Same as np.identity(n).

  • np.random.seed(whatever) - allows controlled repeats of random experiments.

  • np.random.random( (x,y…) ) - makes an array of specified size with random values.

Arithmetic

Element by element
  • np.add(a,b) - two arguments only. Adds corresponding elements of two arrays. Also has the + operator overloaded.

  • np.subtract(a,b) - similar to np.add. Also the - operator.

  • np.multiply(a,b) - Simple multiplication of each corresponding element. *

  • np.divide(a,b) - Same as multiply. /

  • np.remainder(a,b) - Same as mod. Maybe %.

  • np.exp(n) - Raise e (2.71828182846) to the n power for each element.

  • np.sqrt(n)

  • np.sin(rad) - Sine of radians for each element.

  • np.arctan2(y,x) - Normal trig angle finding.

  • np.cos(rad) - Cosine of radians for each element.

  • np.log(n) - Natural log for each element.

  • np.dot(a,b) - Dot product of two arrays. Also the a.dot(b) format.

Listwise
  • np.sum(a) or a.sum() - Adds up the contents. It sums across all dimensions unless an axis=n argument is included to restrict the summation to one axis.

  • np.min(a)

  • np.max(a)

  • np.histogram(a,bins=b,range=(0,255)) - range defaults to a.min() and a.max(). Returns counts in bins and bin edges (fenceposts).

    np.histogram(np.array([1,2,2,3,3,3]),bins=4, range=np.array([0,4]))
    (array([0, 1, 2, 3]), array([ 0.,  1.,  2.,  3.,  4.]))
  • np.argmax(a) - returns the index of the maximum value’s position. Good for finding the peak locations of a histogram, for example.

  • np.argmin(a) - Similar to argmax.

  • np.nonzero(a) - returns indices (or array of locations) where nonzero values occur. a=np.hstack( (np.arange(3),np.arange(3) ) ) ; (a==1).nonzero() produces (array([1, 4]),).

  • np.mean(a) - average

  • np.median(a)

  • np.std(a) - standard deviation.

  • np.corrcoef(a,y=b) - Pearson correlation coefficient. a and b must have the same length.

  • np.logical_or(a,b) - Or.

  • np.logical_and(a,b) - And.

  • np.logical_not(a) - Not. (Takes a single array.)

  • np.equal(a,b) - == - listwise, returns array of bools.

  • np.array_equal(a,b) - True or False if the whole arrays are identical.

Also, a[a<2] uses a boolean mask to return just the elements of a that are less than 2, as sketched below.
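
A tiny sketch of that boolean masking:

>>> a= np.arange(5)
>>> a < 2          # array([ True,  True, False, False, False])
>>> a[a < 2]       # array([0, 1]) - just the elements that satisfy the condition.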

Fancy
  • np.polyfit(x,y,degree) - So with x 0 to 5 and y being x^2, like this np.polyfit(np.arange(6),np.arange(6)*np.arange(6),2), produces array([1,0,0]) since this is y= 1*x^2 + 0*x + 0.

  • np.cumsum(a) - changes (1,2,3,4) to (1,3,6,10). Cumulative sum.

  • np.convolve(a,b) - a is longer than b (or they’re automatically switched). This is complicated, but I think of it as a function C(t) where t is an offset (time, or whatever) between the two inputs. At t=0, a and b are lined up in the same place, the values are multiplied where they align, the products are summed, and that’s C(0). At t=10, b is sampled 10 units ahead (or behind?) of a, the products are found, everything is summed, and that’s C(10). mode="same" means the output is the same size as the longer input. mode="valid" returns only the fully overlapping region.

    np.convolve(np.ones(5),np.ones(5))
    array([ 1.,  2.,  3.,  4.,  5.,  4.,  3.,  2.,  1.])
    np.convolve(np.ones(5),np.ones(5),mode="same")
    array([ 3.,  4.,  5.,  4.,  3.])
    np.convolve(np.ones(3),np.ones(2),mode="valid")
    array([ 2.,  2.])

    Here’s an example demonstrating that it’s a type of sum of products function. Imagine three $100 purchases in different states.

    tax,cost=np.array([1.08,1.03,1.04]),np.array([100,100,100])

    The total spent can be computed like this.

    np.convolve(cost,tax,mode="valid")
    array([ 315.])

    See np.polymul() too. If you dare.

  • np.sort(a) - Returns a sorted copy; a.sort() sorts in place (and returns None). Use axis where needed.

  • np.flip(a,axis) - flipud is the same with axis=0, fliplr is the same with axis=1.

  • np.flipud(a) - Flips the array up for down (mirrors on a horizontal axis).

  • np.fliplr(a) - Flips the array left for right (mirrors on a vertical axis).

  • np.rot90(a) - Rotates matrix values. Seems CCW. np.rot90( np.arange(4).reshape(2,2) ) = array([ [1, 3], [0, 2] ])

  • np.copy(a) - Copies the array data (same as a.copy()); not a deep copy for object arrays.

  • np.transpose(a) - or a.T, transpose - makes 3x2 into 2x3.

  • np.ravel(a) - flattens. Not to be confused with tensorflow.contrib.layers.flatten.

  • np.reshape(a,(newx,newy)) - rearranges dimensions but keeps data.

  • np.resize(a,(newx,newy)) - repeats (recycles) the data as needed to fill the new shape.

  • np.mgrid= Fills multi dimensional arrays with puzzling sequences related to the arrays' dimensions. "dense mesh grid" np.mgrid[:2,:2] = array([ [ [0, 0], [1, 1] ], [ [0, 1], [0, 1] ] ]) Use with transpose to get coords for a grid pattern.

  • np.ogrid= Similar to mgrid but even weirder. "open mesh grid" np.ogrid[:2,:2] - [array([ [0], [1] ]), array([ [0, 1] ])]

  • np.unique(a) - removes duplicate items.

  • np.append(a,b) - Almost identical to concatenate but with syntax differences. Note that you don’t append [ [*] [*] [*] ] with a [*]. You need a [ [*] ]. See Growing Arrays below.

  • np.insert(a,pos,item) - inserts item at position of array a.

  • np.delete(a,[n]) - delete item n from array a. Not in place!

  • np.concatenate( (a,b) ) - [1,2,3] and [4,5] become [1,2,3,4,5].

  • np.c_[a,b] - stack by columns

  • np.column_stack( (a,b) ) - seems the same as np.c_

  • np.r_[a,b] - very similar to concatenate for some simple arrays. The r is for stacking by rows.

  • np.vstack( (a,b) ) - vertical stack. If a and b have shape (3,2) then (6,2) results. np.vstack( (np.arange(3),np.arange(3) ) ) produces array( [ [0, 1, 2], [0, 1, 2] ] ).

  • np.hstack( (a,b) ) - horizontal stack. If a and b have shape (3,2) then (3,4) results. np.hstack( (np.arange(3),np.arange(3) ) ) produces array([0, 1, 2, 0, 1, 2]).

  • np.hsplit(a,n) - makes a list of arrays broken as specified.

  • np.vsplit(a,n) - similar to hsplit but with different axis perspective.

  • np.dstack( (a,b) ) - if a and b’s shape is (3,2), this makes a shape (3,2,2). Imagine multiple 2d images now in an array (stack) indexable with another dimension.

Growing Arrays

The traditional idea with arrays is that you reserve the memory you need and that’s that. But sometimes you need to build an array up from smaller parts and it’s more convenient to increase its size than replace parts of it (e.g. you may not know the final size). This happened to me where I needed to read in a sequence of images and store the whole collection as an array (holding each image) of an array (holding each image’s row) of an array (holding each row’s column) of an array (holding each pixel’s RGB). Assume a collection of three 2x2 grayscale images.

ims= np.reshape(np.random.random(12),(3,2,2)) * 255
ims= ims.astype(np.uint8)
array([[[ 99,   5], [137,  73]], [[145, 124], [ 14,  36]],
       [[183,  78], [ 88,  82]]     ], dtype=uint8)

Now suppose you have a new image that you want to add.

i= (np.reshape(np.random.random(4),(2,2)) * 255).astype(np.uint8)
array([[155, 237], [160,  27]], dtype=uint8)

You might think that having something like [*] would be what you need to add to something like [ [*] [*] [*] [*] ] but in fact, you need something like [ [*] ]. So here’s what works.

i.shape
(2, 2)
i= i.reshape(1,2,2)
i.shape
(1, 2, 2)
np.append(ims,i,axis=0) # Not in place!
array([[[ 99,   5], [137,  73]], [[145, 124], [ 14,  36]],
       [[183,  78], [ 88,  82]], [[155, 237], [160,  27]]], dtype=uint8)
np.vstack((ims,i)) # Does the same thing. Note extra paren.
np.concatenate((ims,i)) # Does the same thing.
np.r_[ims,i] # Unbelievably, same thing.

Sorting

Python sorting used to be kind of tricky since the sort function was something that was attached to a list object and sorted in place. That is still true. For example:

>>> a=[3,4,1,2,0]
>>> a.sort()
>>> a
[0, 1, 2, 3, 4]

This caused so much confusion that a new function was added to return a sorted version of the original list. This produces a new list and leaves the original one alone.

>>> a=[3,4,1,2,0]
>>> sorted(a)
[0, 1, 2, 3, 4]
>>> a
[3, 4, 1, 2, 0]

Complex Object Sorting

There are many fancy ways of sorting things. Often you have a list of lists and you want to sort by some item in the list. Here’s a list of tuples representing (model_number,score) which need to be sorted so that the top 5 scoring models are displayed.

for top5 in sorted(score_list,key=lambda x:x[1],reverse=True)[0:5]:
    print('#{}={:.3f}'.format(top5[0],top5[1]))

Strangely I haven’t found a cleaner way to do this. Here’s another more complicated example of a two level sort.

m=[ ['Cho Oyu',8188,1954], ['Everest',8848,1953], ['Kangchenjunga',8586,1955],
['K2',8611,1954], ['Lhotse',8516,1956], ['Makalu',8485,1955] ]
ms2= sorted(m, key=lambda x:x[1], reverse=True ) # Secondary key
ms1= sorted(ms2, key=lambda x:x[2]) # Primary key

Here the result ms1 is sorted by date of ascent (earliest first) and then, if that is the same, by mountain height (highest first). The results look like this:

[['Everest', 8848, 1953], ['K2', 8611, 1954], ['Cho Oyu', 8188, 1954],
['Kangchenjunga', 8586, 1955], ['Makalu', 8485, 1955], ['Lhotse', 8516, 1956]]
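
One marginally tidier option for the key functions is operator.itemgetter, which does the same job as the lambdas above:

from operator import itemgetter
ms2= sorted(m, key=itemgetter(1), reverse=True)   # Secondary key (height)
ms1= sorted(ms2, key=itemgetter(2))               # Primary key (year of ascent)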

Graphics

There are many options for getting Python to draw arbitrary things graphically.

Table 2. Python Modules Useful For Creating Graphics

Tool      Import    Package [1]        Notes
Tkinter   Tkinter   python-tk          De facto standard.
pyCairo   cairo     python-cairo [2]   Not the easiest to use.
PyX       pyx       python-pyx         Specializes in PostScript.
pyglet    pyglet    python-pyglet
pygame    pygame    python-pygame
wxPython  wx        python-wxtools     Major window tool kit.
PyQt      qt        python-qt3         Major window tool kit.
PyGTK     gtk       python-gtk2        Major window tool kit.
PIL       PIL       python-imaging     Format filters mostly.

1. On Ubuntu 12.04.

2. Already installed on Ubuntu and CentOS.

I tend to often just write directly into PostScript.

Tkinter

Although Tkinter is not installed by default on many Linux systems, the rumor is that it is included with Python on other platforms. It is the official graphics toolkit for Python and is blessed by the language maintainers. If you just need to open a window on your screen and draw some stuff, say to plot some data, it is probably the easiest option. Here is a working example that does the minimum useful thing:

from Tkinter import *
c= Canvas(bg='white', height=1000, width=1000)
c.pack()
c.create_line(100,100,200,200)  # X1,Y1,X2,Y2
mainloop()                      # Keeps the window open when run as a script.

Plotting Data Visualization Graphs

If you need to "graph" some data, Python can help. The main technique is to use matplotlib. Although a bit overly fancy and likely to spontaneously burst into a GUI, it is powerful and, in some modes, easy:

from pylab import *
x= [1,2,3]; y= [1,4,9]
plot(x,y)
# show()   # Use this for interactive goofing off.
savefig('./filename.png')

For more details, check out my complete notes on matplotlib.

Also, check out Pychart.

Plotting Graph Theory Graphs

See pydot which is the Python interface to the mighty Graphviz package.

Command Line Parsing

getopt

The original way to parse options draws stylistic inspiration from the C version. Many languages (Bash, Perl) have such a thing and if you’re used to one of them, the Python version won’t be too complicated.

getopt_example.py
#!/usr/bin/python
# An example of how to parse options with the 'getopt' module.
import sys
import getopt

# Initialize help messages
options=           'Options:\n'
options= options + '  -a <alpha>   Set option alpha to a string. Default is "one".\n'
options= options + '  -b <beta>    Set option beta to a number. Default is 2.\n'
options= options + '  -h           Show this help.\n'
options= options + '  -v           Show current version.'
usage = 'Usage: %s [options] arguments\n' % sys.argv[0]
usage = usage + options

# Initialize defaults
alpha= "one"
beta= 2
version="v0.0-pre-alpha"

# Parse options
try:
    (opts, args) = getopt.getopt(sys.argv[1:], 'ha:b:v', ['help','alpha=','beta=','version'])
except getopt.error, why:
    print('getopt error: %s\n%s' % (why, usage))
    sys.exit(-1)

try:
    for opt in opts:
        if opt[0] == '-h' or opt[0] == '--help':
            print(usage)
            sys.exit(0)
        if opt[0] == '-a' or opt[0] == '--alpha':
            alpha= opt[1]
        if opt[0] == '-b' or opt[0] == '--beta':
            beta= int(opt[1])
        if opt[0] == '-v' or opt[0] == '--version':
            print('%s %s' % (sys.argv[0], version))
            sys.exit(0)
except ValueError, why:
    print('Bad parameter \'%s\' for option %s: %s\n%s' % (opt[1], opt[0], why, usage))
    sys.exit(-1)

if len(args) < 1:
    print('Insufficient number of arguments supplied\n%s' % usage)
    sys.exit(-1)

print('alpha=%s beta=%s' % (alpha, beta))
for (n,a) in enumerate(args):
    print('Argument %d: %s' % (n,a))

argparse

There is a module called optparse which has been deprecated since Python version 2.7. In its place is the newer and pretty awesome argparse module. Before getting excited about this module, check to see if it’s on the system you intend to use.

Here’s an example of how to use it. This example should be pretty much functionally equivalent to the getopt example above.

argparse_example.py
#!/usr/bin/python
import argparse

parser = argparse.ArgumentParser(description='A demonstration of argparse.')

parser.add_argument('-a', '--alpha', default='one', help= 'Set option alpha to a string.')
parser.add_argument('-b', '--beta', default=2, type=int, choices=[0,1], help= 'Set option beta to a binary digit.')
parser.add_argument('-v', '--version', action='version', help= 'Print the version.', version="v0.0-pre-alpha")
parser.add_argument('the_rest', metavar='file', type=str, nargs='+', help='One or more filenames.')

args= parser.parse_args()

print('alpha=%s beta=%d' % (args.alpha, args.beta))
print('Specified files: %s' % ', '.join(args.the_rest))

Here’s another example that might be simpler. It shows how one would collect the necessary options for a simple multiplication drill. It wants to know what number you are interested in practicing and whether the drill should include all lower numbers (cumulative).

A simpler argparse example
import argparse
parser = argparse.ArgumentParser(description='Multiplication drills.')

parser.add_argument('N', type=int, help='Which number to drill.')
parser.add_argument('-c','--cumulative',
            help='Makes the drill cumulative.', action='store_true')

args= parser.parse_args()
if args.cumulative:
    print("Cumulative Test of %ds" % args.N)

Here’s a case where I needed two classes of arguments with unknown quantities. One or more files needs to be supplied for each type of file.

    parser= argparse.ArgumentParser(description='Vehicles and non-vehicles.')
    parser.add_argument('-V','--vehicle',dest='V',required=True,
                        nargs='+',metavar="Vlist", type=str,
                        help='CSV list of vehicle directories')
    parser.add_argument('-N','--nonvehicle',dest='N',required=True,
                        nargs='+',metavar="NVlist", type=str,
                        help='CSV list of non-vehicle directories')
    args= parser.parse_args()

Run with something like this.

./vehicle_classify.py -V ../data/vehicles/v? -N ../data/non-vehicles/nv?

This produces something like this for args.V and args.N respectively.

['../data/vehicles/v1', '../data/vehicles/v2', '../data/vehicles/v3',
'../data/vehicles/v4', '../data/vehicles/v5']
['../data/non-vehicles/nv1', '../data/non-vehicles/nv2']

Web Programming

Python is one of the premier languages for web-based programs. Here are some helpful techniques for web projects.

cgitb Module

One of the best reasons to use Python for web projects is the cgitb module. This stands for CGI TraceBack and is a diagnostic tool to help you understand what might be going wrong with your Python script run over the web. The nice thing is that this is super easy to use and super useful when activated. Here’s an example showing how to use it (simply import and enable it) and some faulty code which takes advantage of it:

cgitb Example
#!/usr/bin/python
import cgitb
cgitb.enable()
idontexist()

Putting this in a cgi-bin directory and typing its URL in a browser produces this very cool diagnostic (which in this case correctly notices that the function idontexist does not exist):

--> --> -->
 
 
<type 'exceptions.NameError'>
Python 2.7.2: /usr/bin/python2.7
Sat Jun 30 12:50:21 2012

A problem occurred in a Python script. Here is the sequence of function calls leading up to the error, in the order they occurred.

 /var/www/fs/users/xed/cgi-bin/cgitest.py in ()
      2 import cgitb
      3 cgitb.enable()
=>    4 idontexist()
      5 
      6 #content_type= 'Content-type: text/html\n\n'
idontexist undefined

<type 'exceptions.NameError'>: name 'idontexist' is not defined
      args = ("name 'idontexist' is not defined",)
      message = "name 'idontexist' is not defined"

If you’re looking at the output without HTML rendering, you’ll also notice that this is tacked on to the previous HTML message for maximum intelligent utility:

<!-- The above is a description of an error in a Python program, formatted
     for a Web browser because the 'cgitb' module was enabled.  In case you
     are not reading this in a Web browser, here is the original traceback:

Traceback (most recent call last):
  File "/var/www/fs/users/xed/cgi-bin/cgitest.py", line 4, in <module>
    idontexist()
NameError: name 'idontexist' is not defined
-->

Note that this works on any Python program run over the web, not just ones that use CGI per se. It is advisable to comment out the enable line when your program is served live to the public to avoid any leaking of sensitive information such as how your code works. But other than that, use this early and often.

Content Type

Before generating any HTML, every web program will most likely need to send back the HTTP content type. It’s often useful to make a global variable of it.

Content Type Global Variable
content_type= 'Content-type: text/html\n\n'

HTML Generation

I personally hate Python code that is filled with HTML. HTML should be in HTML documents and Python should be programming. But sometimes they mix annoyingly. This throws off syntax highlighting and the wholesome goodness of Python’s formatting and style. Here is a technique I use in my Python code to completely obviate the need for any HTML.

This function can be imported into programs requiring the generation of HTML. It allows you to not put HTML in python code. It’s easier to type, easier to think about, and it doesn’t break syntax highlighting. When run as a standalone program, it prints a complete HTML document as a demonstration.

html_tagger.py
#!/usr/bin/python
def tag(tag, contents=None, attlist=None):
    """No HTML in my programs! This function functionalizes HTML tags.
    Example: tag('a','click here', {'href':'http://www.xed.ch'})
    Produces: <a href="http://www.xed.ch">click here</a>
     Param1= name of tag (table, img, body, etc)
     Param2= contents of tag <tag>This text</tag>
     Param3= dictionary of attributes {'alt':'[bullet]','height':'100'}
    """
    tagstring= "<"+tag
    if attlist:
        for A in attlist:
            V= attlist[A].replace('"','&quot;')
            attstring= ' '+A+'="'+V+'"'
            tagstring += attstring
    if contents:
        tagstring += ">\n"+contents.rstrip()+"\n</"+tag+">\n"
    else:
        tagstring += "/>\n"
    return tagstring

if __name__ == '__main__':
     Title= tag('head', tag('title', "A Test"))
     Text= tag('body', tag('p', "No html here. Just sensible code."))
     print(tag('html', Title + Text))
Output of html_tagger.py test routine
<html>
<head>
<title>
A Test
</title>
</head>
<body>
<p>
No html here. Just sensible code.
</p>
</body>
</html>

Web Programming Environment

The technique above is useful for generating web-based output. To process web sourced input, the cgi module is helpful. This module is very helpful but it is not magical. I think the most helpful way to illustrate what it does is to not use it and see what that looks like.

Assuming the helpers such as the tag() function defined above are in place, the following code is very illustrative:

Print CGI Program’s Entire Environment
#!/usr/bin/python
import os
vars= ''.join([tag('dt',k)+tag('dd',os.environ[k]) for k in sorted(os.environ.keys())])
print(content_type + (tag('html',tag('body',tag('dl',vars)))))
Note
Now that you see how to do it yourself, don’t forget about import cgi; cgi.test() which when run as a single line program over a web interface produces similar and somewhat more comprehensive data about what’s going on.

When run you get a list of environment variables that your CGI program knows about. This sample list may or may not include some of the following you would see:

DOCUMENT_ROOT          /var/www
GATEWAY_INTERFACE      CGI/1.1
HTTP_ACCEPT            text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
HTTP_ACCEPT_CHARSET    ISO-8859-1,utf-8;q=0.7,*;q=0.3
HTTP_ACCEPT_ENCODING   gzip,deflate,sdch
HTTP_ACCEPT_LANGUAGE   en-US,en;q=0.8,de;q=0.6,es;q=0.4
HTTP_CONNECTION        Keep-Alive
HTTP_COOKIE            v1=keyvaluepairs;v2=ofany;v3=cookiesthat;v4=yourbrowser;v5=offersthisdomain
HTTP_HOST              xed.ch
HTTP_USER_AGENT        Wget/ (linux-gnu)
PATH                   /bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
QUERY_STRING           a=a&b=simple&c=test
REMOTE_ADDR            192.168.0.10
REMOTE_PORT            48731
REQUEST_METHOD         GET
REQUEST_URI            /~xed/cgi-bin/cgitest.py?a=a&b=simple&c=test
SCRIPT_FILENAME        /var/www/fs/users/xed/cgi-bin/cgitest.py
SCRIPT_NAME            /~xed/cgi-bin/cgitest.py
SERVER_ADDR            192.168.0.99
SERVER_ADMIN           wwwadmin@xed.ch
SERVER_NAME            www.xed.ch
SERVER_PORT            80
SERVER_PROTOCOL        HTTP/1.1
SERVER_SIGNATURE       Apache Server at www.xed.ch Port 80
SERVER_SOFTWARE        Apache
UNIQUE_ID              T@9vm6nkPz0AFBbDuW0FAFAM

Plus any special variables your web server sets using Apache’s SetEnv directive will also be present.

Obviously if your program can print this stuff out, you have quite a bit of control over what is going on. This particular little program is quite useful to track down problems with path and environment issues as well as debugging more complicated or annoying details such as user agent settings for stupid web sites.

Two important variables to note for CGI programming are REQUEST_URI and QUERY_STRING. The first contains the entire URL used to effect this response while the second contains just the part intended to serve as input for this program. You can parse this directly yourself and for very simple applications, I think it is reasonable to do so.
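
For the very simple case, parsing it yourself might look like this sketch (no URL-decoding and no repeated keys handled here; the standard urlparse.parse_qs does that properly):

import os
qs= os.environ.get('QUERY_STRING','')
params= dict( kv.split('=',1) for kv in qs.split('&') if '=' in kv )
# With QUERY_STRING='a=a&b=simple&c=test' this gives {'a': 'a', 'b': 'simple', 'c': 'test'}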

When the number and complexity of the variables your program wishes to define from the QUERY_STRING becomes more involved, then it is sensible to use the cgi module. The point of showing how things would work without it is to illustrate that it’s not absolutely critical (and sometimes not even especially helpful) to use it.

This exercise also indicates how one might test CGI programs without using a web server at all. Since all that is really going on is that the web server is simply setting some variables, you can explicitly set them on the command line to test things. Here’s an example:

$ QUERY_STRING='a=a&b=simple&c=test' mycgiprogram.cgi

cgi Module

Here is an example of a complete form processing program showing many different kinds of form elements. This program shows a form and if submitted shows the data submitted and a new form to repeat the process.

Full CGI Form Example
#!/usr/bin/python
import cgi
from html_tagger import tag

content_type= 'Content-type: text/html\n\n'
br= '<br/>'

def generate_form():
    f= list()
    f.append( 'Username:' + tag('input', '', {'type':'text', 'name':'uid'}) +br )
    f.append( 'Password:' + tag('input', '', {'type':'password', 'name':'pwd'}) +br )
    f.append( tag('input',None, {'type':'radio','name':'hyp','value':'1'}) + 'True' )
    f.append( tag('input',None, {'type':'radio','name':'hyp','value':'0'}) + 'False' +br )
    f.append( tag('input',None, {'type':'checkbox','name':'metal','value':'cu'}) + 'copper' )
    f.append( tag('input',None, {'type':'checkbox','name':'metal','value':'fe'}) + 'iron' +br )
    f.append( tag('select',
                       tag('option','chromium',{'value':'cr'})+
                       tag('option','manganese',{'value':'mn'})+
                       tag('option','nickel',{'value':'ni'})+
                       tag('option','zinc',{'value':'zn'})
                   ,{'name':'alloy'}) +br )
    f.append( tag('textarea','Edit this text!',{'rows':'5','columns':'40','name':'essay'}) +br )
    f.append( tag('input',None,{'type':'submit','value':'Do This Form'}) )
    return tag('form', ''.join(f),
                 {'name':'input','action':'./cgitest.py','method':'get'})

def display_data(myf):
    c=   tag('tr',tag('td',"Name:")+tag('td', myf["uid"].value))
    c += tag('tr',tag('td',"Password:")+tag('td', myf["pwd"].value))
    c += tag('tr',tag('td',"Hypothesis:")+tag('td', myf["hyp"].value))
    c += tag('tr',tag('td',"Metal:")+tag('td', ','.join(myf.getlist('metal'))))
    c += tag('tr',tag('td',"Alloy:")+tag('td', ','.join(myf.getlist('alloy'))))
    c += tag('tr',tag('td',"Essay:")+tag('td', ','.join(myf.getlist('essay'))))
    return tag('table',c,{'border':'1'})

form= cgi.FieldStorage()
if 'uid' not in form or 'pwd' not in form or 'hyp' not in form:
    content= tag('h4','A Form') + "Please fill in the user and password fields." + generate_form()
else:
    content= display_data(form) + br + generate_form()

print(content_type)
print(tag('html',tag('body',content)))
Note
For composing HTML output this program uses the tag function defined above. Also, include cgitb as described above if there are problems you wish to debug.

The output of this CGI programming example is the following:

A Form

Please fill in the user and password fields.
Username:
Password:
True False
copper iron


Note
If you’re seeing this in a web browser, it will look functional, but obviously it’s not. It’s just the HTML that the previous program generated (minus html and body tags).

One Program Executable On The Command Line And Over The Web

Here’s a technique I’ve used for programs that I want to work with a text menu at the console and also to automatically support a web interface when run remotely from a web browser.

if __name__ == "__main__":
    if os.getuid() == 48: # apache:x:48:48:Apache:/var/www:/sbin/nologin
        html_version()
    else:
        while True:
            text_version_menu()
Note
There may be better indicators of whether we’re coming from a browser or not. See the cgi.test() above for possibilities. Perhaps REQUEST_URI.

Upload A File

Here’s a short routine that does nothing but allow one to upload a file to the server it’s run on. I found this handy to allow me to simply upload photos off my stupid Android phone to my own server. It nicely demonstrates how to handle POST methods and file uploads using the cgi module.

up.py
#!/usr/bin/python
import os
import cgi
from html_tagger import tag

content_type= 'Content-type: text/html\n\n'
form = cgi.FieldStorage()

if not form:
    acturl= "./up.py"
    ff= tag('input','',{'type':'file','name':'filename'}) + tag('input','',{'type':'submit'})
    f= tag('form',ff, {'action':acturl, 'method':'POST', 'enctype':'multipart/form-data'})
    H= tag('head', tag('title', "Uploader"))
    B= tag('body', tag('p', f))
    print(content_type + tag('html', H + B))
elif form.has_key("filename"):
    item= form["filename"]
    if item.file:
        data= item.file.read()
        t= os.path.basename(item.filename)
        FILE= open("/home/xed/www/up/"+t,'wb')   # Binary mode so photos aren't mangled.
        FILE.write(data)
        FILE.close()
        msg= "Success! "
    else:
        msg= "Fail."

    H= tag('head', tag('title', "Uploader"))
    B= tag('body', tag('p', msg + tag('a','Another?',{'href':'./up.py'})))
    print(content_type + tag('html', H + B))
Note
The html tagging function defined above is assumed here.
Warning
This program would be best limited to personal use and is not especially secure.

Beyond Python

Tools to help Python be even more awesome than it normally is:

  • Jython - Run Python in the theoretically ubiquitous and annoyingly powerful JVM.

  • PyPy - Or use a different implementation with its own JIT compiler. Might use less memory.

  • Nuitka is a straight up Python compiler.

  • SWIG - Wrap C code into Python modules.

Socket Programming

Creating client/server connections with internet sockets is pretty easy with Python. A good example of a full practical socket server is my ISBD server. Here is a generic TCP server that covers the main functionality one would require from the network in order to implement something like a web server.

Python Socket Server
#!/usr/bin/python
# A Sample TCP server demonstrating simple socket programming.
# This simply echoes what is sent back to the client.

# Usage: Run this program and then connect with
#     echo "The Message" | nc localhost 6660
# What PID is listening?
#     lsof -i :6660

# Official Socket Documentation -
#    * https://docs.python.org/2/library/socket.html
# Notes about backlog parameter of `listen()` function.
#    * http://irrlab.com/2015/03/02/how-tcp-backlog-works-in-linux/
#    * https://veithen.github.io/2014/01/01/how-tcp-backlog-works-in-linux.html

import socket
import sys
from thread import *

HOST= '' # Server interface to bind to. Blank is `INADDR_ANY`.
PORT= 6660
BACKLOG= 200 # Max connections on accept queue. See notes.

# AF_INET is Address Family IPv4
# SOCK_STREAM is TCP protocol (SOCK_DGRAM for UDP)
s= socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print('Socket Creation OK')

# == Connection Handling ==
def servicethread(connection):
    READ_BYTES= 24
    connection.send('This is the server. Type something now.\n')
    while True:
        data= connection.recv(READ_BYTES)
        reply= 'You said: %s' % data
        if not data:
            break
        connection.sendall(reply)
    connection.close()

# == Binding ==
try:
    s.bind((HOST,PORT))
except socket.error as msg:
    print('ERROR: Bind failed! %s (error #%s)' % (msg[1],str(msg[0])))
    sys.exit()
print('Socket Binding OK')

# == Listening ==
s.listen(BACKLOG)
print('Socket Listening OK')

# == Handle Client Transactions ==
while True:
    conn,addr= s.accept()
    print('Connected to %s:%d' % addr)
    start_new_thread(servicethread, (conn,) )
s.close()

Running the program starts the server listening.

$ ./sockettest.py
Socket Creation OK
Socket Binding OK
Socket Listening OK

From another terminal (or another computer if you like) you can check up on it.

$ nmap localhost -p 6660 | sed -n /PORT/,+1p
PORT     STATE SERVICE
6660/tcp open  unknown

Using nmap has consequences for the server. Here are the server’s resulting messages.

Connected to 127.0.0.1:51783
Unhandled exception in thread started by <function servicethread at 0x7f62f336e5f0>
Traceback (most recent call last):
  File "./sockettest.py", line 32, in servicethread
    connection.send('This is the server. Type something now.\n')
socket.error: [Errno 104] Connection reset by peer

You could handle this error (when the client strangely aborts) more smoothly if you like but it does continue to function just fine.

Additional activity from the client, a classic netcat test, looks like this.

$ echo testmessage | nc localhost 6660
This is the server. Type something now.
You said: testmessage

Or back to Python, this is the simplest socket client.

Python Socket Client
import socket
con= ('isbdserver.example.edu',10800)
try:
    s= socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(con)
    s.send(entire_message)
    s.close()
except socket.error as msg:
    log( 'ERROR: ISBD socket client problem! %s (error #%s)' % (msg[1],str(msg[0])) )

Packaging And Distribution

Python packaging is a nightmare. This is mostly due to so many competing ways to do the job. I personally avoid the topic to the greatest extent possible. I usually rely on my Linux distributions to do the proper thing or I put things in the PYTHONPATH explicitly myself.

This enumeration of project summaries is depressing and helpful. This basic package install guide is wholesome and official if you have to go messing with packages outside your OS distribution. Here’s a popular Stack Overflow discussion about Python packaging. Here are some official best practices for packaging and installation.

Here are some details that might be interesting.

distutils

Original Python packaging and distribution system. In the standard library for me (CentOS7).

Uses this.

python setup.py

Can be tar.gz.

from distutils.core import setup

setuptools

Third party system (not part of Python per se) built using distutils.

from setuptools import setup

Includes easy_install which is widely used.

Includes support for eggs which are a package format for distributing binary packages. This seems a bit mental since Python is good at compiling itself on the fly, but apparently waiting a minute for that to happen for monstrous overblown projects is too much for people. Somebody figured spending hours fussing with a fussy binary package was better.

sudo yum upgrade python-setuptools

distribute

A fork of setuptools and is actually called setuptools which isn’t confusing at all! Replaces an existing genuine setuptools if one is already present. Apparently has better support for v2 to v3 issues. Probably more traction than its ancestor. Also includes an easy_install.

Some people believe that this project has been merged back into the original setuptools. Here is evidence that this was intended, and it appears to be true: rumor has it that Distribute was merged back into Setuptools 0.7. It is probably safe to ignore this now. Let’s just say that setuptools and distribute are very close relatives and maybe a case of dissociative identity disorder.

pip

This does not create packages. It is a system for downloading and installing Python distributions. This seems to replace easy_install. It can roll back a failed install attempt if it determines dependencies can’t be met. It can uninstall things. It does not use or install eggs. It doesn’t automatically update things so they won’t randomly break; apparently some of the other systems try to do that. Requires a packager (distutils,setuptools,distribute) because this is just an installer. The packages themselves are not its thing. Here is a justification for pip. Apparently this can try to compile things with a C compiler as part of package installation. This would of course be likely to fail on a bad OS like Windows. This might explain some bias for using easy_install on Windows. However, with wheel, these issues may be historical.

Normally used like this.

pip install <package>

It requires setuptools (which requires distutils).

This quote from the pip documentation about sums it up for me.

Be cautious if you’re using a Python install that’s managed by your operating system or another package manager. get-pip.py does not coordinate with those tools, and may leave your system in an inconsistent state.

And to contradict that.

sudo yum install python-pip

Or install with get-pip.py which can try to not be so dependent with --no-wheel and --no-setuptools. Apparently pip should be included with a clean Python install from python.org (not Linux).

wheel

Installing pip will also install wheel which is a zip based archive (extension .whl) which is like an egg but with subtle differences. Apparently this is somewhat of a modern (2015+) thing.

Of the name, they say "because newegg was taken" and "a container of cheese".

sudo yum install python-wheel python3-wheel

distutils2

This topic is such a mess, why not scrap it all and start over with yet another attempt?

Does not use setup.py scripts. Instead it uses setup.cfg. Also uses the pysetup command which seems to try to replace pip.

If you see import packaging, that is synonymous with distutils2.

The latest release was March 2012 so this project is dead. Anything referring to it is hopelessly out of fashion.

Buildout

Buildout is yet another way to assemble and deploy complex Python applications. It may include non-Python components. Used by Zope, Plone, and Django. Nuff said.

Distlib

This is a new experimental thing (as late as October 2016) which, according to their docs is trying to do this.

Basically, Distlib contains the implementations of the packaging PEPs and other low-level features which relate to packaging, distribution and deployment of Python software. If Distlib can be made genuinely useful, then it is possible for third-party packaging tools to transition to using it. Their developers and users then benefit from standardised implementation of low-level functions, time saved by not having to reinvent wheels, and improved interoperability between tools.

Virtualenv

This is not a packaging or distribution system but it can be very important in solving related problems. Here is a good explanation of what this is from the documentation.

The basic problem being addressed is one of dependencies and versions, and indirectly permissions. Imagine you have an application that needs version 1 of LibFoo, but another application requires version 2. How can you use both these applications? If you install everything into /usr/lib/python2.7/site-packages (or whatever your platform’s standard location is), it’s easy to end up in a situation where you unintentionally upgrade an application that shouldn’t be upgraded.

Or more generally, what if you want to install an application and leave it be? If an application works, any change in its libraries or the versions of those libraries can break the application.

Also, what if you can’t install packages into the global site-packages directory? For instance, on a shared host.

In all these cases, virtualenv can help you. It creates an environment that has its own installation directories, that doesn’t share libraries with other virtualenv environments (and optionally doesn’t access the globally installed libraries either).
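
In practice, the basic workflow (assuming the virtualenv command is installed, e.g. from the packages below; requests is just an example package) looks something like this.

$ virtualenv myenv                # Create an isolated environment in ./myenv.
$ source myenv/bin/activate       # python and pip now point inside myenv.
(myenv)$ pip install requests     # Installs only into myenv, not system-wide.
(myenv)$ deactivate               # Back to the system Python.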

CentOS Packages
python-virtualenv.noarch

Tool to create isolated Python environments

python-virtualenv-clone.noarch

Script to clone virtualenvs

python-virtualenvwrapper.noarch

Enhancements to virtualenv

python-tox.noarch

Virtualenv-based automation of test activities

Anaconda And Conda And Miniconda

Apparently "Anaconda" (not to be confused with the Red Hat installer - good name guys) is a special distribution of Python that includes stuff like R and other fancy pants components (but not too fancy to program it themselves in C). Wikipedia calls it "freemium". It has its own package management system (facepalm) called Conda or maybe miniconda, hard to say. This Conda is not restricted to Python and apparently can be used with other programming languages.

Conda is an open source package management system and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them. … But if you need a package that requires a different version of Python, there is no need to switch to a different environment manager, because conda is both a package manager and an environment manager.

Installation details. Installer is 34MB but it did seem to come with Python 3.6 and install as a non-root user. I created a separate Linux account to keep it from doing anything unpleasant but it seems well behaved so far. The executables live here.

export PATH=~/miniconda3/bin:$PATH

Use something like this.

~/miniconda3/bin/conda update conda

They do have instructions for a polite and sensible uninstall.

rm -rv ~/miniconda3 ~/.condarc ~/.conda ~/.continuum

Non-Root Custom Python Environments

Let’s say you need to run some very fancy cool hipster dude Python program which was, for example, written in UTF-8 emojis in Python 3. Unfortunately the account you were given by a mean sys admin has CentOS 6.8 which, while secure and up to date for the series, is so yesterday. It is possible to set up your own fancy Python environments which can include the Python version you require and your own copies of all modules and dependencies.

Here is an example of a procedure to achieve that.

D=/tmp/${USER}
mkdir ${D}; cd ${D}
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash ${D}/Miniconda3-latest-Linux-x86_64.sh -b -p $D/miniconda3
export PATH=${D}/miniconda3/bin/:${PATH}
conda update conda  # Answer prompt "y".
conda create -n mycoolproject python=3.6 anaconda
source ${D}/miniconda3/bin/activate mycoolproject
conda search h5py
conda install -n mycoolproject h5py
conda install -n mycoolproject matplotlib seaborn pandas HDF5

Or something like this.

miniconda3/bin/conda create -n my_proj python=2.7 pandas seaborn HDF5 matplotlib h5py

And to pretend like none of this ever happened.

source ${D}/miniconda3/bin/deactivate
${D}/miniconda3/bin/conda remove -n mycoolproject --all

A lot of help with this can be found here.

Python 3.x

Python 3 is a bit of a different animal from Python 2. The most useful treatment of the differences between 2 and 3 are by eev.ee here and here.

Here are some features I commonly deal with.

  • print() now requires full function style. Seems legit.

  • print x, to suppress automatic new lines doesn’t work. Use print(x, end="") instead.

  • range() produces an iterable range object. xrange() is therefore superfluous and gone.

  • map(ord,"xed")+[0] This will now produce an error. Before it was fine. Because map produces a special object type you need this: list(map(ord,"xed")+[0]).

One interesting tidbit I encountered was that source code in Python3 can use extended characters as variable names. Normally (maybe universally!) this is asking for trouble, but you can imagine a function where an angle is called alpha but using a real alpha (α). To get this to work, I had to add a special second line like this.

Special Characters In Source Code
#!/usr/bin/python3
# vim: set fileencoding=utf-8 :
def greeks(α=0.8, β=1., λ=0.):
    return (α,β,λ)

The full description of this is in PEP0263.

Troubleshooting

Python is pretty solid and most sensible systems take great care with it since it’s often essential to a functional OS (e.g. emerge, yum). But sometimes things happen. Here’s a very nasty situation I had with CentOS 7.

:->[centos7-][~]$ python --version
Python 2.7.12
:->[centos7-][~]$ python -c 'print("ok")'
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
ImportError: No module named site
:-<[centos7-][~]$ export PYTHONPATH=/usr/lib64/python2.7/
:->[centos7-][~]$ export PYTHONHOME=/usr/lib64/python2.7/
:->[centos7-][~]$ python -c 'print("ok")'
ok

Module Search Path

How the module search path is constructed is almost described here. Unfortunately the final item there is "the installation-dependent default". You can mentally substitute the word "voodoo" for "default" for all the good that description does.

To really find out what’s going on you must go to the source code. Here is the authoritative description of how this works for Linux (Windows and Mac are different).
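
You can at least inspect the result of that construction on any given interpreter by printing sys.path.

>>> import sys
>>> sys.path    # The actual module search path, in the order it will be searched.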

Linking

This was a very tricky problem to diagnose. Here are two systems which appear to have the exact same Python, but when run, they clearly are not the same.

Clean working
:->[goodhost][~]$ md5sum /usr/bin/python
49623a632cb4bf3c501f603af80103c4  /usr/bin/python
:->[goodhost.example.edu][~]$ /usr/bin/python --version
Python 2.7.5
Messed up
:->[messedup][/etc/ld.so.conf.d]# md5sum /usr/bin/python2.7
49623a632cb4bf3c501f603af80103c4  /usr/bin/python2.7
:->[messedup-new][/etc/ld.so.conf.d]# /usr/bin/python2.7 --version
Python 2.7.12

How can this be? After checking all possibilities, path and symbolic link issues were not relevant. This problem is why simply reinstalling Python may not fix its incorrect behavior. The answer, it turns out, is that the shared library linking was messed up on the non-working machine.

Clean ldd
:->[goodhost][~]$ ldd /bin/python2.7
        linux-vdso.so.1 =>  (0x00007fff494c5000)
        libpython2.7.so.1.0 => /lib64/libpython2.7.so.1.0 (0x00007f03e28f1000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f03e26d5000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f03e24d0000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007f03e22cd000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f03e1fcb000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f03e1c08000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f03e2cdf000)
Messed up ldd
:->[messedup][/etc/ld.so.conf.d]# ldd /usr/bin/python2.7
        linux-vdso.so.1 =>  (0x00007ffca97df000)
        libpython2.7.so.1.0 => /public/apps/coot-0.8.5/lib/libpython2.7.so.1.0 (0x00007f1554076000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1553e59000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f1553c55000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007f1553a52000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f1553750000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f155338d000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f155449e000)

I had originally assumed that if a core component like Python was reinstalled from clean packages, libraries and all, it would have to behave like a clean installation. But this is not true. If the ldcache links Python to some spurious installation then it might not work. Or worse, barely work, giving a maddening situation to troubleshoot. This problem arose when a program (coot) tried to allow for its own separate version of Python to be linked to. The moral of the story is to use ldd to check the Python executable before trying to diagnose things like PYTHONPATH and PYTHONHOME which, if the linking is bad, may not be able to help no matter what they’re set to.