This is a collection of notes that I use for reference. It is pretty complete and generally has most of the stuff I need to use, but it is deliberately not absolutely complete. There is too much obscure weird stuff in Python to include it all. This is my attempt at a good compromise for a solid collection of reference material. The emphasis is on practical usages and I try to include examples where I can to get projects up and running.

For people who aren’t sure if Python is really good or the best thing ever, this fine article makes it clear. The same author has a thorough article on porting from 2 to 3. These notes started eons ago with Python 2, but they’re mostly sensible with respect to Python 3 these days.

For people who aren’t sure if Python 3 is right for them, this absurdly good article explains all the differences.


Things I Commonly Forget

is None

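I used to do things like if cool: and that still works, but it is a truth test, so 0, an empty string, or an empty list count as false just like None does. When what you actually mean is "has this been set to something?", the correct way is to use is. PEP 8 recommends is / is not, not ==, for comparisons with None.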

if cool is not None:
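This small sketch shows why the distinction matters. A plain truth test lumps 0, "" and [] in with None, while is not None only rejects None itself. The function name describe is just for illustration.

```python
def describe(cool):
    """Compare a truthiness test with an explicit None check."""
    truthy = "truthy" if cool else "falsy"
    present = "set" if cool is not None else "missing"
    return f"{truthy}/{present}"

for value in (None, 0, "", [], 42):
    print(repr(value), describe(value))
# None falsy/missing
# 0 falsy/set
# '' falsy/set
# [] falsy/set
# 42 truthy/set
```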

main

Python has an odd but sensible idiom where a program checks whether it was run directly as a program or imported as a module by something else. If it was imported, then perhaps nothing extra should happen. This is useful for creating modules and other subcomponents of larger projects. You can define a library of functions and import them into any code that needs them without triggering any test or demo code. However, if you run the module as a standalone program, then you can have it exercise the functions of interest. This helps in development.

The specific technique is to always put your default code which should be run as a standalone program at the end after a construction like this:

if __name__ == '__main__':
    do_default_stuff()
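Here is a minimal sketch of a module laid out this way. The function names and the self-test are hypothetical. Importing the file only defines cube(). Running it directly (python mymodule.py) also runs the quick checks.

```python
def cube(x):
    """Return x cubed."""
    return x * x * x

def run_self_test():
    """Quick sanity checks, only run when the file is executed directly."""
    assert cube(2) == 8
    assert cube(-3) == -27
    print("cube() self-test passed")

if __name__ == '__main__':
    run_self_test()
```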

split and join

I can never remember the order of the object/function argument.

some_nice_separator_string.join(some_sequence)

>>> ' and '.join(['romeo','juliet'])
'romeo and juliet'

some_joined_string.split(divide_at_string)

>>> 'brad and jennifer'.split(' and ')
['brad', 'jennifer']

I am getting better about this. What finally helped it sink in is that both split and join are string methods. Even though join really smells like a function concerned with sequences, it is a method of the separator string.

Another handy note is that split() with no arguments splits on runs of whitespace and ignores leading and trailing whitespace. You don’t get silly empty-string list entries from multiple consecutive spaces (e.g. for something like "a     b").
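Here is a quick comparison of the two behaviors on a sample string:

```python
text = 'a    b'              # Four spaces between a and b.
print(text.split())          # Runs of whitespace: ['a', 'b']
print(text.split(' '))       # Every single space: ['a', '', '', '', 'b']
print('  padded  '.split())  # Leading/trailing whitespace ignored: ['padded']
```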

any

The any() function seems totally superfluous to me, but there it is. Feed it an iterable and it will return True if any of the items are truthy. For an empty iterable it returns False.
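Here is a short sketch on made-up scores, including any()’s companion all(). Both accept a generator expression and stop as soon as the answer is known.

```python
scores = [3, 8, 1]

print(any(s > 5 for s in scores))   # True: 8 is greater than 5
print(all(s > 0 for s in scores))   # True: every score is positive
print(any([]), all([]))             # False True: the empty-iterable cases

# any() short-circuits: it stops at the first truthy item.
def noisy(x):
    print("checking", x)
    return x

any(noisy(x) for x in [0, 7, 9])    # Prints "checking 0" and "checking 7" only
```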

Help

Python has a lot of clever help facilities that make inline documentation relatively easy. Here’s an example:

def cube(x):
    """This is the help for the cube function."""
    return x*x*x
print(cube(4))     # Outputs: 64
print(cube.__doc__) # Outputs: This is the help for the cube function.

These documentation strings can span multiple lines when you use triple quotes.
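Here is a sketch of a multi-line docstring. The function area is just an example. inspect.getdoc() returns the text with the common indentation cleaned up, and help(area) would display it interactively.

```python
import inspect

def area(width, height):
    """Return the area of a rectangle.

    Both arguments should be numbers in the same units.
    """
    return width * height

print(area.__doc__)          # The docstring as stored on the function.
print(inspect.getdoc(area))  # Same text with the indentation cleaned up.
```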

Functions And Arguments

There are lots of subtle details (officially described here) involved in fancy function definitions. Here is how to define functions to accept arbitrary numbers of arguments.

# *l collects extra positional arguments in a tuple; **d collects extra keyword arguments in a dict.
def f1(x,*l,**d): print(x,','.join(l),','.join(d.keys()))
f1('fixed','l1','l2','l3',d1=1,d2=2,d3=3) # Prints: fixed l1,l2,l3 d1,d2,d3

Python 3.8 added a "/" marker for positional-only parameters (PEP 570). Parameters before the / can only be passed by position. Parameters after a bare * can only be passed by keyword.

def slasher(positional_only, /, standard, *, keyword_only): pass

Here "standard" means positional or keyword, i.e. you can say slasher(3,standard="ok",keyword_only=True) or slasher(3,"ok",keyword_only=True). Since keyword_only has no default, it must always be supplied, and by name. On Python 3.7 and earlier the / is a SyntaxError, so avoid it if you need to support older interpreters.
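Here is a sketch of which calls succeed and which fail, assuming Python 3.8 or later. The function just returns its arguments so the results are easy to check.

```python
def slasher(positional_only, /, standard, *, keyword_only):
    return (positional_only, standard, keyword_only)

print(slasher(3, "ok", keyword_only=True))           # standard passed by position
print(slasher(3, standard="ok", keyword_only=True))  # standard passed by keyword

bad_calls = [
    lambda: slasher(positional_only=3, standard="ok", keyword_only=True),  # name before /
    lambda: slasher(3, "ok", True),                                        # keyword_only by position
    lambda: slasher(3, standard="ok"),                                     # keyword_only missing
]
for call in bad_calls:
    try:
        call()
    except TypeError as err:
        print("TypeError:", err)
```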

Sometimes you want to send a bunch of arguments to a function packed into a list or tuple. This is sometimes called a starred expression. It uses the myfun(*theargs) syntax. To use a dictionary of keyword arguments, use myfun(**theargdict). See details in this section about unpacking argument lists in the official documentation. Here’s an example.

def f2(x,y): print(x,y)     # Normal function.
f2(1,2)                     # Prints "1 2" as expected.
l=(3,4)                     # Define a tuple (a list works too).
f2(*l)                      # "3 4"
f2(*(5,6))                  # "5 6"
d={'x':7,'y':8}             # Define a dict. Keys must match the parameter names.
f2(**d)                     # "7 8"

Is this a good idea? Not sure. Doesn’t smell great though.

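Also note the following syntax, which uses function annotations. These are completely optional and the interpreter does not enforce them. They are stored on the function for tools such as type checkers. They are shown here just to help identify this weird stuff when found in the wild.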

def fa(x:int,y:float,alpha:str="messy")->str: return f'{x} {y} {alpha}'
fa(1.111,True)   # Returns '1.111 True messy'

Note that the call did not supply an int and a float as implied by the definition’s annotations. None of this stuff is binding, even the return type could have been specified falsely. To me this is a great way to obfuscate code by lying to your colleagues with Python going along with it.
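The annotations are not thrown away, though. They live in the function’s __annotations__ dictionary, which is where external type checkers (mypy, for example) look for them. Here is a quick look:

```python
def fa(x: int, y: float, alpha: str = "messy") -> str:
    return f'{x} {y} {alpha}'

print(fa.__annotations__)
# {'x': <class 'int'>, 'y': <class 'float'>, 'alpha': <class 'str'>, 'return': <class 'str'>}
print(fa(1.111, True))   # Still runs: 1.111 True messy
```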

Lambda

The lambda calculus of computer science is pretty wacky. It can be handy in the real world, however, and Python delivers. If you’re a beginner, skip this. Here’s how it’s done:

variable_now_a_function= lambda x,y: x + y
print(variable_now_a_function(3,2))  # Prints 5.

Python 2 also allowed parentheses around lambda parameters, which looked more like def. This included tuple unpacking like lambda (x,y): x+y. Python 3 removed that (PEP 3113), and parenthesized lambda parameters are now a SyntaxError. In Python 3, take the tuple as one parameter and index it, or unpack it when calling:

f= lambda xy: xy[0]+xy[1]
print(f((5,3)))       # Prints 8
g= lambda x,y: x+y
print(g(*(5,3)))      # Prints 8

A one-parameter lambda simply drops the parentheses:

plus1= lambda x: x+1
print(plus1(9))       # Prints 10

Lambda is often used to define variables that can be used as functions, the precise functionality of which is, well, variable. This can be useful to pass some contingent behavior along to a function or to set up operational templates that process any number of functionalities.
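Here are two common ways of passing behavior around with lambda. One is as the key for sorted(); the other is as an argument to your own function. The helper apply_to_all is hypothetical.

```python
words = ['banana', 'Cherry', 'apple']

print(sorted(words))                           # ['Cherry', 'apple', 'banana']: uppercase sorts first
print(sorted(words, key=lambda w: w.lower()))  # ['apple', 'banana', 'Cherry']

def apply_to_all(fn, items):
    """Apply whatever function we are given to each item."""
    return [fn(item) for item in items]

print(apply_to_all(lambda x: x * 10, [1, 2, 3]))  # [10, 20, 30]
print(apply_to_all(lambda x: -x, [1, 2, 3]))      # [-1, -2, -3]
```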

Slices

For sequence objects like strings and lists (and many others), Python has an absurdly elegant and powerful way to specify an exact subsequence. The general format for slices is this.

[start:end:stride]

The best tip about slices I’ve seen is to consider these values as numbering the "fence posts".

 0   1   2   3   4   5   6
 | x | e | d | . | c | h |
-6  -5  -4  -3  -2  -1

So to get just "xed" you do this.

>>> x='xed.ch'
>>> x[0:3]
'xed'

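For strings, x[3:4] gives the same result as x[3] (the one-character string '.'). For a list, however, the slice gives a one-item list while the index gives the item itself. If you want the slice to run to the end, just leave the end value empty. Same with the beginning; you don’t need to use 0.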

>>> x[3:]
'.ch'
>>> x[:3]
'xed'

A negative start counts back from the end. A negative end stops that many positions short of the end, whereas a positive end counts from the beginning.

>>> x[-2:]
'ch'
>>> x[:-2]
'xed.'

A negative stride steps backward, so a stride of -1 reverses the sequence.

>>> x[::-1]
'hc.dex'

Of course things can get weird.

>>> x[4:1:-1]
'c.d'
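Here are a few checks of the rules above. The main surprise is that a one-item slice of a list is still a list, while indexing returns the item itself.

```python
x = 'xed.ch'
nums = [10, 20, 30, 40]

print(x[3:4], x[3])        # . .  (same for strings)
print(nums[1:2], nums[1])  # [20] 20  (slice vs. item for lists)
print(x[4:1:-1])           # c.d: start at index 4, walk backward, stop before 1
print(x[::2])              # xdc: every second character
print(nums[10:])           # []: out-of-range slices are simply empty
```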

Slice Objects

It is possible to name slice objects, perhaps to improve clarity.

>>> zerototen= list(range(11))
>>> evens= slice(None,None,2)
>>> odds= slice(1,None,2)
>>> (zerototen[evens],zerototen[odds])
([0, 2, 4, 6, 8, 10], [1, 3, 5, 7, 9])

Or obfuscation!

>>> O= slice(1,2)
>>> zerototen[O]
[1]
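Slice objects also have an indices() method. It resolves None and negative values against a given length, which shows exactly what a slice will cover.

```python
zerototen = list(range(11))
evens = slice(None, None, 2)
last_three = slice(-3, None)

print(evens.indices(len(zerototen)))       # (0, 11, 2)
print(last_three.indices(len(zerototen)))  # (8, 11, 1)
print(zerototen[last_three])               # [8, 9, 10]
print(list(range(*last_three.indices(len(zerototen)))))  # [8, 9, 10]
```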

Strings

  • "Double" ' or single quotes are ok.'

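  • "Adjacent " "strings " "are " "concatenated" "."

  • Raw string: r'All \ backslashes are retained, but a raw string cannot end with a single backslash.'

  • R'Same as with "r".'

  • u'A unicode string is like this' (in Python 3 every str is Unicode, so the u prefix is accepted but changes nothing)

  • V='PEP 498'; print(f'Explanation:{V}') shows a formatted string literal (f-string). See below.

  • \x40 is an "@" and \x41 is "A".

  • \u1234 is a unicode 16-bit value (4 hex digits).

  • \U0001F600 is a unicode 32-bit value (8 hex digits). The value must be a valid code point, no higher than \U0010FFFF.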
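Here are a few quick checks of the literal forms listed above. The \U value is just an arbitrary valid code point. The code prints numbers rather than the characters themselves, so it works on any terminal.

```python
print('\x40', '\x41')          # @ A
print(ord('\u00e9'))           # 233: \u takes exactly 4 hex digits
print(len('\U0001F600'))       # 1: \U takes 8 hex digits, still one character
print(len(r'\n'), len('\n'))   # 2 1: raw strings keep the backslash
print("Adjacent " "strings")   # Adjacent strings
print(u'same' == 'same')       # True: the u prefix changes nothing in Python 3
V = 'PEP 498'
print(f'Explanation: {V}')     # Explanation: PEP 498
```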

Template Formatting (Classic)

General String Format:

%[(name)][flags][width][.precision]code

Table 1. String Formatting Codes

Code  Use
s     String (with str)
r     String (with repr)
c     Character
d     Decimal integer
i     Integer
u     Unsigned integer (obsolete; identical to d in Python 3)
o     Octal number
x     Hex number (lowercase letters)
X     Hex number (uppercase letters)
e     Floating point exponent
E     Like e, but with uppercase E
f     Floating point decimal
F     Like f, but inf/nan print as INF/NAN
g     Floating point, e or f form depending on magnitude
G     Like g, but with uppercase E
%     Literal %

Example:

"%(n).5G is %(t)s." % {"n":6.0221415e+23, "t":"a very big number"}
'6.0221E+23 is a very big number.'

Note that to always get plus signs, use the + flag. To get leading zeros, put a 0 flag in front of the width. So if you want a number like this "+001.300" you need something like %+08.3f. The 8 is needed because that is the full length you want reserved, sign and decimal point included. The .3 is how to deal with the fraction, the f makes it a floating point decimal, and everything else is just decoration.
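Here are some quick checks of the codes and flags above. The values are arbitrary.

```python
print('%+08.3f' % 1.3)                 # +001.300: sign, zero fill, width 8, 3 decimals
print('%05d' % 42)                     # 00042
print('%x %X %o' % (255, 255, 8))      # ff FF 10
print('%r' % 'hi')                     # 'hi'  (repr keeps the quotes)
print('%e' % 12345.678)                # 1.234568e+04
print('%g %g' % (1234.5, 0.00001234))  # 1234.5 1.234e-05
print('%(n).5G is %(t)s.' % {"n": 6.0221415e+23, "t": "a very big number"})
print('%d%% sure' % 100)               # 100% sure
```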

Template Formatting (Modern)

Of course Python3 had to go and reinvent how this is done. It is more complex in some ways, but simpler in other ways. For example, it is no longer strictly necessary to specify in the template what kinds of types will be showing up; it can just deduce them.

>>> 'int: {} float: {} str: {}'.format(99,3.141,'fun')
'int: 99 float: 3.141 str: fun'

You can scramble the order.

>>> 'int: {2} float: {0} str: {1}'.format(3.141,'fun',99)
'int: 99 float: 3.141 str: fun'

Need full times with leading zeros and milliseconds?

>>> '{:02d}:{:02d}:{:06.3f} {}'.format(5,0,7,'ok')
'05:00:07.000 ok'

Note that the ints really need to be ints! The d code raises a ValueError if you hand it a float such as 5.0.

So that’s all fine and annoyingly different but not obviously way better. But hang on, it’s all about to sink in why this new system is much better. The important trick is that you can use the format() machinery without writing it out explicitly. You can specify a formatted "f-string" (Python 3.6+) where the formatting operations are implied. The braces are not just placeholders; they contain the data itself. Take a look.

>>> f'{5:02d}:{0:02d}:{7:06.3f} {"ok"}'
'05:00:07.000 ok'

This is most useful when used to print variables.

>>> h,m,s,note= 3,9,39,"marathon PR"
>>> f'{h:02d}:{m:02d}:{s:06.3f} {note}'
'03:09:39.000 marathon PR'
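The format spec after the colon also handles alignment, thousands separators, percentages, and alternate forms. Here are a few examples with arbitrary values. The = specifier in the last line needs Python 3.8 or later.

```python
n = 1234567
print(f'{n:,}')          # 1,234,567
print(f'[{"x":>5}]')     # [    x]  right-align in a width of 5
print(f'[{"x":^5}]')     # [  x  ]  center
print(f'{0.25:.1%}')     # 25.0%
print(f'{255:#x}')       # 0xff
print(f'{n=}')           # n=1234567  (Python 3.8+)
```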

This webpage has a very nice catalog of most of the cool things this format system can do, helpfully contrasted with the old way of doing things.

Sequence Converters

s.join(sequence)

Join sequence with the s as separator.

s.split(separator[,maxcount])

Separate string s at separator. Don’t forget you can limit the number of splits with maxcount.

s.rsplit(separator[,maxcount])

Like split but starting from the right. Probably not too useful without maxcount. Example: "First M. Last".rsplit(None,1) is ["First M.", "Last"].

s.splitlines([keepNL])

Breaks a string by line. Keep the new lines if keepNL is present and True.
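Here is a quick comparison of splitting on newlines versus splitlines(), plus the maxcount argument. The sample strings are arbitrary.

```python
text = 'first line\nsecond line\n'

print(text.split('\n'))       # ['first line', 'second line', '']  trailing empty entry
print(text.splitlines())      # ['first line', 'second line']
print(text.splitlines(True))  # ['first line\n', 'second line\n']
print('a,b,c'.split(',', 1))  # ['a', 'b,c']  maxcount limits the splits
print('First M. Last'.rsplit(None, 1))  # ['First M.', 'Last']
```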

Padding and Stripping

s.expandtabs([tabsize])

Converts tabs to spaces, padding out to the next tab stop. If tabsize is missing, tab stops are every 8 columns.

s.strip([char2strip])

Strips leading and trailing whitespace (spaces, tabs, newlines). If char2strip is present, then strip any of those characters instead. "$99.99".strip("$") is the string "99.99".

s.lstrip([char2strip])

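See strip. However, note that the char2strip string is not searched for as a whole. Rather, each character that comprises it is stripped if it is found (in any order). If you want to strip off a prefix or a suffix, probably best to use a slice. replace works too, but it also hits matches elsewhere in the string. In Python 3.9+ there are also s.removeprefix() and s.removesuffix().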
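The character-set behavior is easy to trip over. In this sketch, rstrip('.txt') strips any of the characters ".", "t" and "x", so it also eats the final t of "test". removesuffix() (Python 3.9 and later) removes the exact suffix only.

```python
name = 'test.txt'

print(name.rstrip('.txt'))        # tes: strips any '.', 't', or 'x' from the right
print(name.removesuffix('.txt'))  # test: exact suffix only (Python 3.9+)
print(name[:-len('.txt')] if name.endswith('.txt') else name)  # test: slice approach
print('$99.99'.strip('$'))        # 99.99 (still a string)
```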

s.partition(separator)

Example: "rock and roll".partition(" and ") produces ("rock", " and ", "roll")

s.rpartition(separator)

Like partition but from the right side.

s.rstrip([char2strip])

See strip. Also see the note at lstrip about the char2strip.

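s.zfill(w)

Pads s on the left with zeros to width w (a leading sign stays in front). Example: print("James Bond is %s." % "7".zfill(3)) prints James Bond is 007.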

s.ljust(w[,char])

Makes a string of width w padding the right with char (whose length must be 1) or spaces. If s is longer than w then s is returned unmodified.

s.rjust(w[,char])

See ljust.

s.center(w[,char])

See ljust.
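Here are a few padding calls side by side. The brackets and repr() just make the spaces visible.

```python
print('[' + 'ab'.ljust(5, '*') + ']')   # [ab***]
print('[' + 'ab'.rjust(5) + ']')        # [   ab]
print('[' + 'ab'.center(6, '-') + ']')  # [--ab--]
print('7'.zfill(3), '-7'.zfill(4))      # 007 -007  (zfill keeps the sign in front)
print('toolong'.ljust(3))               # toolong  (never truncates)
print(repr('a\tb'.expandtabs(4)))       # 'a   b'  (tab advances to column 4)
```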

Search and Replace

s.find(stringtofind[,start[,end]])

Returns -1 when stringtofind is not in s (see index). If found, returns first position where found. The start and end parameters are like doing find on s[start:end].

s.rfind(stringtofind[,start[,end]])

Similar to find but searching for stringtofind from the right to left.

s.index(stringtofind[,start[,end]])

Basically like find but raising a ValueError if the substring is not found.

s.rindex(stringtofind[,start[,end]])

See index.

s.count(stringtocount[,start[,end]])

Counts non-overlapping occurrences of stringtocount. The other parameters behave like they do in find.

s.replace(old,new[,maxsubs])

Returns a new string with old replaced by new. By default all occurrences are replaced; use maxsubs to limit the number of substitutions. Does not modify the string in place!

s.startswith(stringtofind[,start[,end]])

True if s begins with stringtofind (or, if start is given, if s[start:] does). stringtofind can also be a tuple of strings, in which case any one of them will do.

s.endswith(stringtofind[,start[,end]])

Like startswith.
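Here is a quick tour of the search methods on one short string, including the tuple form of startswith(), which also works for endswith().

```python
s = 'banana'

print(s.find('na'), s.rfind('na'))    # 2 4
print(s.find('x'))                    # -1
try:
    s.index('x')
except ValueError as err:
    print('index raised:', err)       # index raised: substring not found
print(s.count('an'), s.count('ana'))  # 2 1  (non-overlapping)
print(s.replace('a', 'o', 2))         # bonona (only the first two)
print(s.startswith(('ba', 'ca')))     # True: a tuple checks several prefixes
```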

Unicode and Translating

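s.encode([encoding[,errors]])
b.decode([encoding[,errors]])

In Python 3, encode() turns a str into bytes, and decode() is a method of bytes that turns them back into a str (str no longer has a decode method). Encoding can be ascii, utf-32, utf-8 (the default), iso-8859-1, latin-1, and so on. Errors can be strict (the default), ignore, replace, xmlcharrefreplace.

s.format

The template formatting method covered in Template Formatting (Modern).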

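s.translate(table)

Replace characters in s according to table. In Python 3 the table is a mapping from Unicode code points (ints) to replacement strings, code points, or None (None deletes the character). There is no separate delchars argument anymore. Instead, str.maketrans(from, to[, delete]) builds the table, and characters in its optional third argument are deleted. The old from string import maketrans is Python 2 only.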
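Here is a sketch of the Python 3 versions on arbitrary sample strings. encode() turns a str into bytes and bytes.decode() turns it back. str.maketrans() builds the translate table, and its optional third argument lists characters to delete.

```python
word = 'caf\u00e9'   # 'café', written with an escape to keep this file ASCII

data = word.encode('utf-8')
print(data)                                       # b'caf\xc3\xa9'
print(data.decode('utf-8') == word)               # True
print(word.encode('ascii', 'replace'))            # b'caf?'
print(word.encode('ascii', 'xmlcharrefreplace'))  # b'caf&#233;'

# Python 3 translate: map 'a'->'x', 'b'->'y', and delete every 'd'.
table = str.maketrans('ab', 'xy', 'd')
print('aabbdd'.translate(table))                  # xxyy
```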

s.title()

Capitalizes the first letter of every word. Note that apostrophes confuse it: "it's so hard for me to believe by otis rush".title() gives It'S So Hard For Me To Believe By Otis Rush.

s.swapcase()

Switch upper case to lower and vice versa.

s.capitalize()

Only the first character of the string is uppercased, and the rest is lowercased. It does not capitalize every word (see title) or the whole thing (see upper).

s.lower()

Make string all lower case. Useful for normalizing silly user input.

s.upper()

Like lower.

ord(c)

Inverse of chr(n), where c is a single character. Python 2 also had unichr(n); in Python 3, chr() covers all of Unicode.
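Here are the case methods side by side on one sample string, plus a round trip through ord() and chr():

```python
print('hello WORLD'.capitalize())  # Hello world
print('hello WORLD'.swapcase())    # HELLO world
print('hello WORLD'.title())       # Hello World
print(ord('A'), chr(65))           # 65 A
print(ord('\u00e9'), chr(233) == '\u00e9')  # 233 True
```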

Boolean Checks

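s.isalnum()

Every character is a letter or a digit.

s.isalpha()

Every character is a letter.

s.isdigit()

Every character is a digit.

s.islower()

Every cased character is lowercase.

s.isspace()

Is a string with a length of at least 1 with all whitespace.

s.istitle()

TitleCaseWantsUpperToOnlyFollowLowerAndViceVersa.

s.isupper()

Every cased character is uppercase. All of these checks return False for an empty string, and islower() and isupper() also need at least one cased character.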
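Here are a few edge cases of these checks:

```python
print(''.isdigit())       # False: empty strings fail these checks
print('123'.isdigit())    # True
print('12.5'.isdigit())   # False: '.' is not a digit
print('abc1'.islower())   # True: only cased characters are considered
print('   '.isspace())    # True
print('Hello World'.istitle(), 'Hello world'.istitle())  # True False
```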

Here are some more obscure attributes (dir(s) shows the full list for your Python version):

s.__add__, s.__class__, s.__contains__, s.__delattr__, s.__doc__,
s.__eq__, s.__format__, s.__ge__, s.__getattribute__,
s.__getitem__, s.__getnewargs__, s.__gt__,
s.__hash__, s.__init__, s.__le__, s.__len__, s.__lt__, s.__mod__,
s.__mul__, s.__ne__, s.__new__, s.__reduce__, s.__reduce_ex__,
s.__repr__, s.__rmod__, s.__rmul__, s.__setattr__, s.__sizeof__,
s.__str__, s.__subclasshook__

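For example (this is the Python 2 output; Python 3 prints a longer docstring that begins str(object='') -> str):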

$ python -c "print('a string'.__doc__)"
str(object) -> string

Return a nice string representation of the object.
If the argument is a string, the return value is the same object.

Regular Expressions

I use regular expressions a lot and I really quite like them. For shell scripting they are essential. When I used to be a strong Perl programmer, I used Perl’s excellent regular expression libraries all the time. But as I switched to Python, I found that I really just hardly ever need to use them. For example this normal shell code…

cal | grep September

…can be done in Python like this…

[X for X in os.popen('cal') if 'September' in X]

…which may not look great, but it is the Python way and if you’re cool with that, it can actually be an improvement. Note that the modern Python way now uses the subprocess module. Details.
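Here is a sketch of the subprocess version, assuming Python 3.7+ for the capture_output and text arguments. The helper name grep_output is made up. The demo command just runs Python itself so it works anywhere; on a Unix system you could pass ['cal'] instead.

```python
import subprocess
import sys

def grep_output(cmd, word):
    """Run cmd and return the output lines that contain word (like cmd | grep word)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return [line for line in result.stdout.splitlines() if word in line]

# Portable stand-in for `cal`: a tiny Python program that prints three lines.
demo = [sys.executable, '-c', "print('August'); print('September'); print('October')"]
print(grep_output(demo, 'September'))   # ['September']
```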

For simple matching, I find I use the Python in and not in operators (and plain ==) a lot. Don’t use is to compare strings, since it tests identity, not equality. Instead of regular expressions one can use Python functions like split, join, replace, find, startswith, endswith, swapcase, upper, lower, isalnum, isspace, etc. Also "slices" and string template substitution really make regular expressions seem kind of backward and inelegant in the Python idiom.

But as the saying goes, sometimes you have a problem that really needs regular expressions; now you have two problems.

Python handles regular expressions in a rather object oriented way, with no built-in Perl- or sed-style syntax. Here’s a small example that shows how you could go through a bunch of eclectic data looking for social security numbers.

#!/usr/bin/python
# Don't name this test program re.py! It would shadow the standard re module.
import re

D= ["William Poned","SS:456-90-9876","3425 Ponzi Dr."]

pattern_object= re.compile(r'(\d\d\d)-(\d\d)-(\d\d\d\d)')  # Raw string keeps \d intact.

for d in D:
    # Note that "search" is satisfied to find the pattern anywhere within the string.
    match_object= pattern_object.search(d)
    if match_object:
        print(match_object.re.pattern)  # The pattern that produced this match.
        print(match_object.groups())    # Each parenthesized group.
        print(match_object.span())      # (start, end) positions of the match.
        print("Using the `search` method of the pattern object:")
        print(match_object.group())     # The whole matched text.

# The "match" method only succeeds if the pattern matches at the start of the
# string, hence the leading .* here.
pattern_object= re.compile(r'.*(\d\d\d)-(\d\d)-(\d\d\d\d).*')
match_object= pattern_object.match(D[1])
print("Using the `match` method of the pattern object:")
print(match_object.group())

Here’s what this program outputs.

(\d\d\d)-(\d\d)-(\d\d\d\d)
('456', '90', '9876')
(3, 14)
Using the `search` method of the pattern object:
456-90-9876
Using the `match` method of the pattern object:
SS:456-90-9876

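Note the difference between the search and the match methods of the pattern object. The latter only succeeds if the pattern matches at the beginning of the string (which is why the second pattern starts with .*), while the former simply needs to find the pattern in the string somewhere. To require the pattern to match the entire string, use fullmatch().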
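Here is a side-by-side sketch of the three approaches. The module-level functions shown behave like the pattern-object methods of the same names.

```python
import re

print(re.search(r'\d+', 'ab12cd').group())  # 12: found anywhere
print(re.match(r'\d+', 'ab12cd'))           # None: not at the start
print(re.match(r'\d+', '12cd').group())     # 12: match only anchors the start
print(re.fullmatch(r'\d+', '12cd'))         # None: fullmatch needs the whole string
print(re.fullmatch(r'\d+', '12').group())   # 12
```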

Here’s another example. This one uses the "raw" string type.

import re
target,regexp= '==== Sub-Heading', r'^==* .*$'  # Note raw string prefix.
match= re.search(regexp,target)                 # A <class 're.Match'> object, or None.
if match:                                       # Which can be queried directly for matching success.
    print('Heading detected:', match.group())   # What exact 'str' was matched.

Substitution

Here is a comparative example of a simple substitution using core Python functions and regular expressions.

Here’s a string containing the characters "JUNK" followed by 4 unknown characters all of which must be removed.

In [1]: x="This is a long string JUNK1234with some unwanted stuff in it."

There are two main ways.

#1: Use find() or index() to figure out where in the string the thing is and then use slices:

In [2]: n= x.index('JUNK')
In [3]: print(x[0:n]+x[n+8:])
This is a long string with some unwanted stuff in it.

#2: Use regular expressions. Just match with "JUNK...." (each . matches any one character).

In [4]: import re
In [5]: print(re.sub('JUNK....','',x))
This is a long string with some unwanted stuff in it.

Despite being a regular expression pro, in Python, I tend to minimize it and, unlike other environments, that’s easy to do (as shown here).

For more information, check the official gory details.
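re.sub() can also reuse captured groups in the replacement via \1, \2, and so on. It can also limit the number of substitutions with count. The date strings here are just sample data.

```python
import re

dates = 'start 2024-01-31, end 2024-02-29'
# Reorder YYYY-MM-DD into DD/MM/YYYY using backreferences to the three groups.
print(re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', dates))
# start 31/01/2024, end 29/02/2024
print(re.sub(r'\d{4}', 'YYYY', dates, count=1))
# start YYYY-01-31, end 2024-02-29
```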

Lists and Sequence Types

Functions available to lists:

l.append(object)

Simply append object in place to the end of the list.

l.count(value)

Count the occurrence of value in the list.

l.extend(iterable)

Appends all the items supplied by iterable to the end of the list, in place.

l.index(value[,start[,stop]])

Return index of first occurrence of value. The other parameters act as a slice.

l.insert(index,object)

Insert object in place immediately before index.

l.pop([index])

Remove (in place) and return item at index (or last item). Raise IndexError if index is out of range or the list is empty.

l.remove(value)

Removes in place first occurrence of value or raise ValueError if not found. Note that [v for v in l if v != value] can get rid of all value occurrences from a list (makes a new list this way).

l.reverse()

Reverses the list in place.

l.sort(key=None, reverse=False)

Sorts a list in place. Python 2 also took a cmp argument: a cmp(x,y) function returning -1, 0, or 1 for less than, equal, or greater than respectively. Python 3 dropped it, but functools.cmp_to_key() can wrap such a function for use as key. See Complex Object Sorting for how to use key. Also note that there is a sorted(mylist) function that will return a new sorted list if you want to preserve the original list.
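Here is a sketch exercising the methods above on throwaway data. It also shows the Python 3 replacement for a cmp function: functools.cmp_to_key() wraps an old-style comparison so it can be passed as key.

```python
from functools import cmp_to_key

l = [3, 1, 2]
l.append([4, 5])   # Adds one item (the whole list): [3, 1, 2, [4, 5]]
l.pop()            # Removes and returns that last item.
l.extend([4, 5])   # Adds each item: [3, 1, 2, 4, 5]
l.insert(0, 9)     # [9, 3, 1, 2, 4, 5]
l.remove(4)        # First 4 removed: [9, 3, 1, 2, 5]
print(l)

print(sorted(l), l)   # [1, 2, 3, 5, 9] [9, 3, 1, 2, 5]: l is unchanged
l.sort(reverse=True)  # In place: [9, 5, 3, 2, 1]
print(l)

words = ['pear', 'fig', 'banana']
words.sort(key=len)   # ['fig', 'pear', 'banana']
print(words)

# An old Python 2 style cmp function, adapted for Python 3.
def by_last_letter(a, b):
    return (a[-1] > b[-1]) - (a[-1] < b[-1])   # -1, 0, or 1

print(sorted(words, key=cmp_to_key(by_last_letter)))  # ['banana', 'fig', 'pear']
```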

Other attributes of lists:

l.__add__, l.__class__, l.__contains__, l.__delattr__,
l.__delitem__, l.__doc__, l.__eq__, l.__format__,
l.__ge__, l.__getattribute__, l.__getitem__,
l.__gt__, l.__hash__, l.__iadd__, l.__imul__, l.__init__,
l.__iter__, l.__le__, l.__len__, l.__lt__, l.__mul__, l.__ne__,
l.__new__, l.__reduce__, l.__reduce_ex__, l.__repr__,
l.__reversed__, l.__rmul__, l.__setattr__, l.__setitem__,
l.__sizeof__, l.__str__, l.__subclasshook__

List Comprehension

List comprehensions are a nice way to apply some action to a list in such a way that a new list is generated. The syntax is a bit odd at first, but it’s actually pretty reasonable and compact. Note that the functionality is comparable to the map function.

>>> [pow(2,y) for y in range(8)]
[1, 2, 4, 8, 16, 32, 64, 128]
>>> list(map(lambda y:pow(2,y),range(8)))
[1, 2, 4, 8, 16, 32, 64, 128]

Conditional filtering works too, including a form with else.

[myfn(x) for x in mylist if mycondition]
[myfn(x) if mycondition else myelsefn(x) for x in mylist]

Note these are subtly different. The first is a filter: it throws out any items that do not satisfy the if condition. The second keeps every item and uses a conditional expression to choose which function to apply. This example divides by two but never wants to see a value split in half, so it rounds up in the case of odd numbers.

>>> [(x+1)/2 if x%2 else x/2 for x in range(20)]
[0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0, 6.0, 6.0, 7.0,
7.0, 8.0, 8.0, 9.0, 9.0, 10.0]

There the if x%2 else x/2 acts as if it wraps the myfn() clause. Note that in this construction you must have an else; if you don’t really want a fancy myelsefn(), just use else x.

This similar example is simply filtering out the source list based on how each list item interacts with the mycondition.

>>> [(x+1)/2 for x in range(20) if x%2]
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

My general rule for the whole list comprehension syntax is that I use it only where it reads to humans quite naturally. This occurs surprisingly often but generally precludes overly complex logic.

That said, sometimes you want to be aware of performance benefits, which can be substantial. For example, on my machine this program took only 0.100s.

Y= [x for x in range(1000000)]

Compare with this equivalent code which took 0.171 seconds.

Y= list()
for x in range(1000000):
    Y.append(x)

Note that Y+=[x] takes even longer (0.200s) than the append(x).
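If you want to repeat the comparison on your own machine, the standard timeit module is the usual tool. Your absolute numbers will differ from the ones above, so this sketch only prints them.

```python
import timeit

def with_comprehension(n=1_000_000):
    return [x for x in range(n)]

def with_append(n=1_000_000):
    Y = list()
    for x in range(n):
        Y.append(x)
    return Y

for fn in (with_comprehension, with_append):
    seconds = timeit.timeit(fn, number=3) / 3   # Average of 3 runs.
    print(f'{fn.__name__}: {seconds:.3f}s')
```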

Filter

The list comprehension can be conditional which basically operates like a filter function.

>>> [x for x in range(1000) if not x%333]
[0, 333, 666, 999]
>>> list(filter(lambda x:not x%333,range(1000)))
[0, 333, 666, 999]

To be clear, the official Python description of filter says that these are equivalent.

  • filter(fn, iterable)

  • [x for x in iterable if fn(x)]

And so are these.

  • filter(None, iterable)

  • [x for x in iterable if x]

Note that in Python 3+ the filter command produces an iterator instead of a list. This may be helpful for large data sets that need to be processed. Otherwise, you should probably use list comprehensions.

For example, this kind of thing is not cool in Python 3, because a filter object is always truthy, even when it will produce nothing.

if not filter(lambda x:x.whatami!="VAL",thevaluelist):

But this is and it’s pretty obviously an improvement.

if all([x.whatami=="VAL" for x in thevaluelist]):
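Here is a sketch of why the old test breaks. The Item namedtuple is a hypothetical stand-in for whatever objects thevaluelist really holds. Passing all() a generator expression instead of a list also lets it stop at the first failure.

```python
from collections import namedtuple

Item = namedtuple('Item', 'whatami')   # Hypothetical stand-in objects.
thevaluelist = [Item('VAL'), Item('VAL')]

leftovers = filter(lambda x: x.whatami != "VAL", thevaluelist)
print(bool(leftovers))   # True, even though nothing is left over!
print(list(leftovers))   # []

print(all(x.whatami == "VAL" for x in thevaluelist))   # True
```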

reduce

While we’re covering wacky functions that like the lambda construction, here’s a use of the reduce function. I really don’t think this function is very useful, but this was my best attempt to do something more exotic with it than the normal adding a bunch of things.

>>> from functools import reduce
>>> reduce(lambda hold,nxt:hold+chr(((ord(nxt.upper())-65)+13)%26+65),'jjjkrqpu','')
'WWWXEDCH'

I think this is as good as it gets judging by this.

If you never understood reduce’s utility you’re in luck! Python 3 demoted it from a built-in, so it is gone unless you ask for it with from functools import reduce. But really, for most jobs it’s probably best to consider it dead to Python.
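In Python 3 the import is required. Here is a sketch of a classic use, multiplying everything together, next to math.prod() (Python 3.8+), which does the same job without reduce:

```python
from functools import reduce
import math
import operator

nums = [1, 2, 3, 4]
print(reduce(operator.mul, nums, 1))   # 24: (((1*1)*2)*3)*4
print(math.prod(nums))                 # 24
print(reduce(lambda hold, nxt: hold + chr((ord(nxt.upper()) - 65 + 13) % 26 + 65), 'jjjkrqpu', ''))
# WWWXEDCH
```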

Generators

Generator expressions are very much like list comprehensions except they don’t build the entire list in memory at their location. Instead they produce a generator object which can be iterated, generally with a for loop or the next() function. Each time it is iterated, the next item in the sequence is generated at that time until the specified objects are exhausted.

>>> a=9
>>> g=(x+a for x in range(10))
>>> g
<generator object <genexpr> at 0x7f41c0b63af0>
>>> next(g)
9
>>> for x in g: print(x,end=' ')
...
10 11 12 13 14 15 16 17 18

The generator expression syntax is a shorthand for a more verbose style involving the yield keyword. A yield hands a value back to the caller much like return does (a bare yield hands back None). Then the function’s state is preserved, and the next call to next() on the generator resumes where it left off. A return statement, or simply reaching the end of the function, finishes the generator, after which next() raises StopIteration.

The following example illustrates the usage with a function that provides unique incrementing ID numbers.

#!/usr/bin/python
def numberer(id=0):
    while True:
        id += 1
        yield id

if __name__ == '__main__':
    ID_set_1= numberer()
    ID_set_2= numberer(10)
    for n in range(3):
        print(next(ID_set_1), next(ID_set_2))

This produces:

1 11
2 12
3 13
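A finite generator behaves differently. When the function body finishes, the generator is exhausted, so list() collects everything and further next() calls raise StopIteration (unless you give next() a default). itertools.islice is one way to take a few values from an endless generator like numberer. The function countdown is just an example.

```python
from itertools import islice

def countdown(n):
    while n > 0:
        yield n
        n -= 1
    # Falling off the end here finishes the generator.

def numberer(id=0):
    while True:
        id += 1
        yield id

print(list(countdown(3)))             # [3, 2, 1]
g = countdown(1)
print(next(g))                        # 1
print(next(g, 'done'))                # done: the default avoids StopIteration
print(list(islice(numberer(10), 3)))  # [11, 12, 13]
```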

Dictionaries

In Python, dictionaries are collections of items, each storing a value indexed by a key (as opposed to a list, which indexes by position, a number). Older Python made no promises about the order of items in a dictionary. Since Python 3.7, dictionaries are guaranteed to preserve insertion order (CPython 3.6 already did this as an implementation detail).

Generally dictionaries can be created like this:

>>> d=dict({'akey':'avalue','bkey':'bvalue'})
>>> d
{'akey': 'avalue', 'bkey': 'bvalue'}
>>> d['bkey']
'bvalue'

There are actually many ways to create dictionaries but why be complicated?

Here are methods that can be applied to dictionaries:

d.clear()

Remove all items in the dictionary.

d.copy()

Returns a shallow copy of d.

dict.fromkeys(sequence[,value])

Creates a new dictionary with items that have keys found in sequence. The value, if present, is applied to all new items (otherwise they get None). It is a class method, so it doesn’t act on an existing dictionary’s contents. For this reason it seems cool to just apply it to dict. This dict.fromkeys("xyz",0) produces {"x": 0, "y": 0, "z": 0}.

d.get(key[,elsevalue])

Same as d[key] except that if elsevalue is present and the key is not, then elsevalue is returned. Since elsevalue defaults to None then no KeyError is raised with this function.

d.has_key(key)

Python 2 only (removed in Python 3). If the dictionary has an item with a key of key then returns True. Otherwise False. Same as key in d, which is what you should always use in Python 3.

d.items()

Returns the key,value tuples. In Python 3 this is a dict_items view in insertion order rather than a list; wrap it in list() if you need a real list.

d.iteritems()

Python 2 only. Produces an iteration object that can take .next() methods producing key,value tuples of all the items until a StopIteration exception. Removed in Python 3, where you just iterate over d.items().

d.iterkeys()

See iteritems() but with just the keys (use d.keys(), or just d, in Python 3).

d.itervalues()

See iteritems() but with just the values (use d.values() in Python 3).

d.keys()

Returns a list of keys in Python 2. In Python 3 it returns a dict_keys view object, in insertion order.

d.pop(key[,elsevalue])

Like get but removes item in addition to returning its value. Unlike get if no elsevalue is provided and key isn’t in d then a KeyError is raised. This is a way to try to remove an item whether it exists or not; just make sure to specify an elsevalue.

d.popitem()

Not like pop! It takes no key. It removes and returns some item’s key,value tuple (in Python 3.7+, the most recently inserted one) or, if no items are present, raises a KeyError.

d.setdefault(key[,elsevalue])

Almost exactly like get, but in addition to returning elsevalue, it sets the specified key to it, leaving that item subsequently defined. If no elsevalue is specified and key isn’t in d, then an item key: None is created.

d.update(d2)

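Ok, this one’s a serious messy pile of function. Merges the items from d2 into d, in place, overwriting values for keys that already exist. d2 can be another dictionary like d.update({"m":13,"n":14}), an iterable of key,value pairs like d.update([("m",13),("n",14)]), or keyword arguments like d.update(m=13,n=14). If you wanted to do dictA + dictB this is probably what you want. In Python 3.9+, dictA | dictB returns a new merged dictionary instead.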

d.values()

Returns the values (a list in Python 2, a dict_values view in Python 3), in the same order as keys().
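Here is a sketch of the methods above on a throwaway dictionary. It assumes Python 3.7+ so that the insertion order shown in the comments holds.

```python
d = {'a': 1, 'b': 2}

print(d.get('z'), d.get('z', 0))  # None 0
print(d.setdefault('c', 3), d)    # 3 {'a': 1, 'b': 2, 'c': 3}
print(d.pop('z', 'absent'))       # absent (no KeyError)
print(d.popitem())                # ('c', 3): last inserted item (Python 3.7+)
d.update([('m', 13)], n=14)       # Pairs and keyword arguments both work.
print(d)                          # {'a': 1, 'b': 2, 'm': 13, 'n': 14}
print(dict.fromkeys('xyz', 0))    # {'x': 0, 'y': 0, 'z': 0}
print('a' in d, list(d.keys()))   # True ['a', 'b', 'm', 'n']
```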

Other dictionary attributes:

d.__class__, d.__contains__, d.__delattr__,
d.__delitem__, d.__doc__, d.__eq__, d.__format__, d.__ge__,
d.__getattribute__, d.__getitem__, d.__gt__, d.__hash__,
d.__init__, d.__iter__, d.__le__, d.__len__, d.__lt__, d.__ne__,
d.__new__, d.__reduce__, d.__reduce_ex__, d.__repr__,
d.__setattr__, d.__setitem__, d.__sizeof__, d.__str__,
d.__subclasshook__

Tuples

A tuple is a type that gets its name (I think) from the idea of "multiple" or "quintuple". Its two most important aspects are that it is immutable and that it is a collection of references to other objects. This makes tuples ideal for passing around between functions, because you know that the contents and their order will not change. Python never copies ("by value") the argument data when calling a function anyway; every argument is passed as a reference to the object, so immutability is the real guarantee here.

Tuples can be "unpacked" in the following way:

>>> origin= (0,0)
>>> x,y= origin
>>> print("X:%d Y:%d" % (x,y))
X:0 Y:0

Tuples do not have many idiosyncratic methods that can be called on them. Here they are:

t.count(value)

Returns the number of times value is found in t.

t.index(value[,start[,stop]])

Returns the position of the first occurrence of value. If the other parameters are supplied, it searches on a slice.

The Python built-in function zip is notable for pairing up items from several sequences into tuples. In Python 3 it returns an iterator, so wrap it in list() to see all the tuples at once.

>>> list(zip([1,2,3],['a','b','c']))
[(1, 'a'), (2, 'b'), (3, 'c')]

Output is only as long as the shortest list.

One very useful thing that can be done with this is listwise operations. Here, for example, I’m calculating a perceptron value by taking the sum of each input value times each corresponding weight, and then adding the bias. Here inputs is a list of input values and weights are the corresponding weights for each input position. Bias is just a constant.

value= sum([i*w for i,w in zip(inputs,weights)],bias)

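In Python 2, the map command could serve for zip (map(None,...) even padded the shorter lists with None):

>>> map(None,[1,2,3],['a','b','c'])   # Python 2 only!
[(1, 'a'), (2, 'b'), (3, 'c')]

In Python 3 this no longer works: None isn’t callable, so you get a TypeError as soon as you iterate over the result. Use zip(), or itertools.zip_longest() for the padding behavior.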
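Here is a sketch of the perceptron line with made-up inputs, weights, and bias. It also shows itertools.zip_longest() for when you want padding instead of truncation.

```python
from itertools import zip_longest

inputs = [1.0, 0.5, -1.0]
weights = [0.2, 0.4, 0.1]
bias = 0.5
value = sum([i * w for i, w in zip(inputs, weights)], bias)
print(value)   # 0.2 + 0.2 - 0.1 + 0.5, roughly 0.8 (floating point)

print(list(zip([1, 2, 3], ['a'])))          # [(1, 'a')]: stops at the shortest
print(list(zip_longest([1, 2, 3], ['a'])))  # [(1, 'a'), (2, None), (3, None)]
```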

Other attributes of tuple types:

t.__add__, t.__class__, t.__contains__, t.__delattr__, t.__doc__,
t.__eq__, t.__format__, t.__ge__, t.__getattribute__, t.__getitem__,
t.__getnewargs__, t.__gt__, t.__hash__, t.__init__,
t.__iter__, t.__le__, t.__len__, t.__lt__, t.__mul__, t.__ne__,
t.__new__, t.__reduce__, t.__reduce_ex__, t.__repr__, t.__rmul__,
t.__setattr__, t.__sizeof__, t.__str__, t.__subclasshook__

Sets

Are sets real Python objects? They are; set is a built-in type:

>>> s= set([1,2,3,4])
>>> type(s)
<class 'set'>

They are certainly one of the more obscure and underused primary types in Python. Their main performance advantage is membership testing. x in some_set is a hash lookup that takes roughly constant time on average, while x in some_list has to scan the list.

The main points about sets are that they are unordered and they contain no duplicate elements.

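Here’s a good overview of how sets are used. This session and the examples after it are from Python 2, which displays sets as set([...]). Python 3 displays them with braces, like {0, 1, 2}, and the element order may differ.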

>>> set1
set([0, 1, 2, 3, 4, 5, 6])
>>> set2
set([3, 4, 5, 6, 7, 8, 9])
>>> set1-set2
set([0, 1, 2])
>>> set2-set1
set([8, 9, 7])
>>> set1 & set2
set([3, 4, 5, 6])
>>> set1 | set2
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> 4 in set2
True

Note that sets do not have a + operator. If you need something like that look more into the | operator which is a set union. The & is a set intersection and there are explicit named functions for this too.

Here are some other examples:

>>> seta= set(['ann','bob','carl','doug','ed','frank'])
>>> setb= set(['ann','carl','doug','frank','gary','harry'])
>>> seta - setb
set(['ed', 'bob'])
>>> seta.difference(setb)
set(['ed', 'bob'])
>>> setb.difference(seta)
set(['gary', 'harry'])
>>> seta | setb
set(['ed', 'frank', 'ann', 'harry', 'gary', 'carl', 'doug', 'bob'])
>>> seta.union(setb)
set(['ed', 'frank', 'ann', 'harry', 'gary', 'carl', 'doug', 'bob'])
>>> seta & setb
set(['frank', 'ann', 'carl', 'doug'])
>>> seta.intersection(setb)
set(['frank', 'ann', 'carl', 'doug'])
>>> seta.symmetric_difference(setb)
set(['gary', 'harry', 'ed', 'bob'])
>>> setb.symmetric_difference(seta)
set(['gary', 'ed', 'harry', 'bob'])

Sets can be used to remove duplicates from a list. Here’s what that would look like:

thelist= list(set(thelist))

Note that this does not preserve the original order, which may have been important to you, and it only works if the list items are hashable.
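If order matters, a common idiom in Python 3.7+ leans on dicts keeping insertion order: dict.fromkeys() keeps the first occurrence of each item where it was. A minimal sketch (the list contents are made up):

```python
thelist = ['b', 'a', 'c', 'a', 'b', 'd']

# set() drops duplicates, but the resulting order is arbitrary.
unordered = list(set(thelist))

# Dict keys are unique and (since Python 3.7) keep insertion order,
# so this keeps the first occurrence of each item in its original position.
ordered = list(dict.fromkeys(thelist))
print(ordered)  # ['b', 'a', 'c', 'd']
```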

Set Element Removal Functions

s.difference(badset)

Same as s - badset. This example, set("abcd").difference(set("cd")), produces the new set {'a', 'b'}. Note that the other way around is different: set("cd").difference(set("abcd")) produces an empty set, since c and d are removed and a and b were never in the first set anyway. Unlike the - operator, the method accepts any iterable, e.g. set("abcd").difference("cd").

s.difference_update(badset)

Same as difference but takes badset out of s in place (like s -= badset) and returns None.

s.symmetric_difference(s2)

Same as s ^ s2. Similar to difference but the order is not important: any objects that are in both sets (that intersect) are removed and everything else is kept. This returns a new set.

s.symmetric_difference_update(s2)

Same as symmetric_difference but operates in place (like s ^= s2).

s.intersection(s2)

Same as s & s2. Returns the elements in common between s and s2.

s.intersection_update(s2)

Instead of returning the elements in common, it changes s in place to keep only those (like s &= s2).

s.pop()

Takes no arguments, and removes and returns an arbitrary element from the set; don't count on which one you get. If the set is empty, a KeyError is raised.

s.discard(object)

Remove object from s if it is a member. Very similar to difference except only for one object and modifies in place instead of returning a new one.

s.remove(object)

Remove an object from a set in place, or raise KeyError if it’s not there. Except for the exception, pretty much like discard.

s.clear()

Make the set completely empty.
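A quick sketch contrasting the removal functions above on a throwaway set of letters; the comments note which ones return a new set and which change s in place:

```python
s = set('abcde')

print(s.difference('de'))   # New set {'a', 'b', 'c'}; s is untouched.
s.difference_update('de')   # In place; returns None. s is now {'a', 'b', 'c'}.
s.discard('z')              # Not present: silently does nothing.
try:
    s.remove('z')           # Not present: raises KeyError.
except KeyError as e:
    print('remove() raised KeyError:', e)
item = s.pop()              # Removes and returns an arbitrary element.
print(item in 'abc', len(s))  # True 2
s.clear()
print(s)                    # set()
```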

Set Augmentation Functions

s.add(object)

Adds a single object to s (silently ignores it if already present).

s.union(s2)

Same as s | s2. Returns a new set containing all the elements that were in s and all the new elements in s2.

s.update(s2)

Incorporates elements of s2 into s. If all of s2 is already present, then nothing happens. It’s pretty much a union in place.

s.copy()

Shallow copy of the set.
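A small sketch of the augmentation functions; note that update() and union() take any iterables (several at once, even), while the |= operator wants a set on both sides:

```python
s = {1, 2}
s.add(3)                    # Single element.
s.add(3)                    # Already present: nothing happens.
s.update([4, 5], (6,))      # update() accepts any number of iterables.
s |= {7}                    # Operator form of update(); both sides must be sets.
t = s.union(range(8, 10))   # New set; s is unchanged.
print(sorted(s))  # [1, 2, 3, 4, 5, 6, 7]
print(sorted(t))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

u = s.copy()                # Independent set holding the same elements.
u.discard(1)
print(1 in s, 1 in u)       # True False
```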

Set Test Functions

s.isdisjoint(s2)

Returns True if s and s2 have no elements in common. Basically the same as not (s & s2).

s.issubset(s2)

True if s is a subset of s2 (every element of s is also in s2), like s <= s2. Note the order is as implied by the function name and not the other way around.

s.issuperset(s2)

Returns True if s contains every element of s2, like s >= s2. It is issubset with the object and argument switched: s.issuperset(s2) equals s2.issubset(s). A set is both a subset and a superset of itself, so s.issuperset(s) is True.
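The test functions side by side with their operator forms, using three made-up sets. The < operator is the strict ("proper") version, which is False for a set compared with itself:

```python
a = {1, 2}
b = {1, 2, 3}
c = {7, 8}

print(a.issubset(b), a <= b)            # True True
print(b.issuperset(a), b >= a)          # True True
print(a < b, a < a)                     # True False  (proper subset)
print(a.issubset(a), a.issuperset(a))   # True True
print(a.isdisjoint(c), not (a & c))     # True True
```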

Other attributes of set objects:

s.__and__, s.__class__, s.__cmp__, s.__contains__, s.__delattr__,
s.__doc__, s.__eq__, s.__format__, s.__ge__, s.__getattribute__,
s.__gt__, s.__hash__, s.__iand__, s.__init__, s.__ior__, s.__isub__,
s.__iter__, s.__ixor__, s.__le__, s.__len__, s.__lt__, s.__ne__,
s.__new__, s.__or__, s.__rand__, s.__reduce__, s.__reduce_ex__,
s.__repr__, s.__ror__, s.__rsub__, s.__rxor__, s.__setattr__,
s.__sizeof__, s.__str__, s.__sub__, s.__subclasshook__, s.__xor__

Classes and Object Oriented Stuff (OOP)

Python isn’t just capable of using object-oriented features. It has been designed with that as a primary aspect of the language. One nice thing about the design, however, is that unlike Java, you can safely ignore all object oriented features and get quite a bit of useful programming done. But when object oriented features make a lot of sense and would actually reduce complexity, Python is there to make it quite simple.

Here is a quick example showing a simple class and one technique for making loading of the attributes optional at instantiation.

#!/usr/bin/env python3
class Plan:
    def __init__(self,premium=0,deductible=0,HSA=0):
        self.loadinfo(premium,deductible,HSA)
    def loadinfo(self,p,d,H):
        self.premium= p
        self.deductible= d
        self.HSA= H
    def __repr__(self):
        return '$%0.2f, $%0.2f, $%0.2f'% (self.premium,self.deductible,self.HSA)
if __name__ == '__main__':
    Gold= Plan()                       # Defer loading data values.
    Gold.loadinfo(1027.90,3900,350)    # Load values explicitly when ready.
    Silver= Plan(936.12,1900,100)      # Load values during creation.
    print(Silver)
    print(Gold)

The best way to remind myself how it all works is to look at some good code I have written that aptly uses the traditional OO programming style. This excerpt of my Geogad language shows a base class representing geometric entities and subclasses derived from it. It shows the definition of member data items and member functions. It also shows the idea of a class variable, which in this case is useful to keep a unique ID number for each entity whatever its subtype.

import vector # Geogad's own vector module (not shown here).

class Entity:
    """Base class for functionality common to all entities."""
    lastIDused= 0 # Class static variable, new id pool.
    master_VList= vector.VectorList()
    def __init__(self):
        Entity.lastIDused += 1 # Increment last ID used...
        self.p_id= Entity.lastIDused #...which becomes the ID.
        self.vectors= dict() # Vectors that define entity's geometry.
        self.attribs= dict() # Properties of this entity.
    def __eq__(self, comparee): # Overload == operator.
        try:
            for p in self.vectors.keys():
                if not self.vectors[p] == comparee.vectors[p]:
                    return False
            else:
                return True
        except KeyError: # If the entities are totally mismatched.
            return False # E.g. A has end1 and end2, B has cen & rad.
    def __repr__(self):
        return 'A generic entity'
    def rotate(self, ang, basepoint=None): # On xy plane (%)
        pass
    def scale(self, fact, basepoint= None):
        if basepoint: # Move it to origin.
            self.translate(-basepoint)
        for p in self.vectors.keys():
            temp= self.vectors[p]
            newv= temp * fact
            Entity.master_VList.append( newv )
            Entity.master_VList.remove( temp )
            self.vectors[p]= newv
        if basepoint:
            self.translate(basepoint) # Put it back.
    def centroid(self):
        pass
    def boundingbox(self):
        pass
    def translate(self, offset):
        for p in self.vectors.keys():
            temp= self.vectors[p]
            newv= temp + offset
            Entity.master_VList.append( newv )
            Entity.master_VList.remove( temp )
            self.vectors[p]= newv

class Entity_Point(Entity):
    def __init__(self, P):
        Entity.__init__(self)
        self.vectors['A']= P
        Entity.master_VList.append(P) # Check in with the master list.
    def copy(self, offset=None):
        cp= Entity_Point(self.vectors['A'])
        if offset:
            cp.translate(offset)
        return cp
    def __repr__(self):
        return 'POINT:'+ str(self.vectors['A'])

class Entity_Line(Entity):
    def __init__(self, Pa, Pb):
        Entity.__init__(self)
        if Pa < Pb: # Here the vectors are sorted for predictability.
            self.vectors['A'], self.vectors['B']= Pa, Pb
        else: # The __lt__ is a bit arbitrary.
            self.vectors['A'], self.vectors['B']= Pb, Pa
        Entity.master_VList.append(Pa) # Check in with the master list.
        Entity.master_VList.append(Pb) # Both endpoints check in.
    def copy(self, offset=None):
        cp= Entity_Line(self.vectors['A'], self.vectors['B'])
        if offset:
            cp.translate(offset)
        return cp
    def __repr__(self):
        return 'LINE:'+ str(self.vectors['A'])+str(self.vectors['B'])

There are a lot of special methods (the double underscore ones) that you can define so that built-in functions and operators work naturally with your objects. For example, whatever your object is, there is probably some sense of how big it is. Defining a __len__() method for the class makes len(MyObject) do the right thing, whatever that is. This is a pretty good resource for figuring out what your options are.
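A minimal sketch of the idea; Polyline is a hypothetical class made up for this example, where "size" is taken to mean the number of vertices:

```python
class Polyline:
    """Hypothetical example class: a list of (x, y) points."""
    def __init__(self, points):
        self.points = list(points)

    def __len__(self):
        # Makes len(polyline) work; here "size" is the number of vertices.
        return len(self.points)

    def __contains__(self, point):
        # Makes `point in polyline` work.
        return point in self.points

    def __iter__(self):
        # Makes `for p in polyline` work.
        return iter(self.points)

p = Polyline([(0, 0), (3, 4), (6, 0)])
print(len(p))              # 3
print((3, 4) in p)         # True
print(bool(Polyline([])))  # False: with no __bool__, truth testing falls back to __len__.
```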

Decorators

Decorators seem kind of lame to me. They basically add no fundamental functionality as far as I can tell. They seem to only turn this…

def f(x):
    return x
f = d(f)

…into this:

@d
def f(x):
    return x

I can’t say I’m super impressed by that. It seems like it’s for people who don’t know how to handle functions as objects, but what do I know? I find it weird that this syntax refers to something that is not yet defined and that’s not how Python should work. If it did, we could have main at the top of our programs. It is worth noting the first syntax as an alternative to decorators since it provides a clearer way to selectively activate them.

Nonetheless, some uses for decorators:

  • Timing something out so that it does not hang indefinitely. See Function Timeout section.

  • Profiling something to see how long various parts of your code take. See Function Timer section.

  • Type checking questionable input parameters.

  • Checking the security context of a function.

  • Tests.

  • Logging that a function actually got run.

  • Counting the number of times it got run.

One notable Python builtin decorator is @staticmethod. This can be used to include a method function in a class namespace when it really doesn’t or can’t use the class. Imagine a "member" function with no (self,... argument. This reasonable sounding person has misgivings about the whole idea. Google’s Python Style Guide seems to prohibit it. And Guido himself seems to regret it. But it is noted in case it pops up again.
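For reference, here is what @staticmethod looks like next to its cousin @classmethod, in a hypothetical Temperature class made up for this sketch. The static method gets neither self nor the class; the usual alternative is simply a plain module-level function.

```python
class Temperature:
    """Hypothetical class for illustration."""
    def __init__(self, celsius):
        self.celsius = celsius

    @staticmethod
    def c_to_f(c):
        # No self or cls: just a plain function living in the class namespace.
        return c * 9 / 5 + 32

    @classmethod
    def from_fahrenheit(cls, f):
        # Receives the class itself; handy for alternative constructors.
        return cls((f - 32) * 5 / 9)

    def fahrenheit(self):
        return Temperature.c_to_f(self.celsius)

print(Temperature.c_to_f(100))                   # 212.0
print(Temperature.from_fahrenheit(212).celsius)  # 100.0
print(Temperature(0).fahrenheit())               # 32.0
```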

Attribute Management

Python has a hasattr() function.

TheObject.hasattr('color') # WRONG - not used like this at all!
hasattr(TheObject,'color') # Right - returns True if TheObject.color is valid.

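This guy has some good reasons to think that it’s better to except on AttributeError than do checks this way. However, this can mask "unknown unknown" attribute errors: an AttributeError raised by a bug deep inside, say, a property getter looks exactly like a missing attribute.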

He also points out that this works and is a bit more efficient too.

getattr(TheObject,'color',None)

If there is no color attribute for TheObject, the None object is returned.

Python also has setattr() and delattr() if that’s what you need.
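A quick tour of the four functions on a throwaway object (Widget is a hypothetical empty class). They are most useful when the attribute name is held in a variable:

```python
class Widget:
    pass  # Hypothetical empty class for illustration.

w = Widget()
print(hasattr(w, 'color'))          # False
print(getattr(w, 'color', None))    # None (the default)
setattr(w, 'color', 'red')          # Same as w.color = 'red'
print(w.color)                      # red
name = 'size'
setattr(w, name, 10)                # Attribute name comes from a variable.
print(getattr(w, name))             # 10
delattr(w, 'color')                 # Same as del w.color
print(hasattr(w, 'color'))          # False
```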

Function Timer

Here’s a decorator example that times a function:

#!/usr/bin/python
import functools
import time

def timethis(f):
    """A decorator function to time things."""
    @functools.wraps(f) # Keep f's name and docstring on the wrapper.
    def timed_function(*args,**kw):
        start= time.perf_counter() # Clock meant for measuring intervals.
        result= f(*args,**kw)
        print('Time was %.3fs' % (time.perf_counter()-start))
        return result # Important to pass along any function results.
    return timed_function

@timethis
def example_fun(s):
    print("Sleeping for %.6f seconds." % s)
    time.sleep(s)

if __name__ == '__main__':
    example_fun(1.23)

This program outputs something like this:

Sleeping for 1.230000 seconds.
Time was 1.231s

Note that I’m trying for minimal dependency and maximal comprehensibility but there are plenty of official ways to do this like the timeit module.
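For the record, a minimal timeit sketch. It runs the statement many times, repeats that whole batch a few times, and the minimum is usually the most representative number since it is the run least disturbed by other activity. The statement and setup here are arbitrary examples. From the shell, python3 -m timeit "sorted(range(1000))" does something similar.

```python
import timeit

# Run the statement `number` times per batch, and do `repeat` batches.
times = timeit.repeat("sorted(data)",
                      setup="data = list(range(1000, 0, -1))",
                      repeat=5, number=1000)
per_call = min(times) / 1000          # Seconds per call in the best batch.
print('Best of 5: %.3f ms per call' % (per_call * 1e3))
```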

Function Timeout

Sometimes you’re expecting something to happen and you’re not sure how long it will take. You do know that if it goes beyond a certain threshold, you would rather just abort. An example of this is if you are scanning for fast internet mirrors from which to download something. In this case, by definition, there would exist slow mirrors and they may be so slow that they bog down the operation quite a lot. With the following decorator, you can give each mirror a certain amount of time to attempt its operation before cutting your losses and pulling the plug. It relies on SIGALRM, so it only works on Unix and only in the main thread.

#!/usr/bin/python
"""See: `man 2 alarm`
         http://docs.python.org/library/signal.html"""
import functools
import signal
LIMIT= 2 #seconds

def nohang(f):
    """A decorator function to cancel a function that takes too long."""
    def raiseerror(signum, frame): # Handlers take two args.
        raise TimeoutError
    @functools.wraps(f)
    def time_limited_function(*args,**kw):
        orig= signal.signal(signal.SIGALRM, raiseerror) # Arm at call time,...
        signal.alarm(LIMIT)                              # ...not decoration time.
        try:
            return f(*args,**kw)
        except TimeoutError:
            print("Timed out!")
        finally:
            signal.alarm(0)                      # Cancel any pending alarm.
            signal.signal(signal.SIGALRM, orig)  # Restore the old handler.
    return time_limited_function

@nohang
def wait_this_long(t):
    import time
    time.sleep(t)
    print('Finished OK in %d seconds' % t)

if __name__ == '__main__':
    wait_this_long(3)
    wait_this_long(1)

Produces:

Timed out!
Finished OK in 1 seconds

Decorator With Arguments

I think that a situation like this:

@d(a)
def f(x):
    return x

…is the same as this:

def f(x):
    return x
i= d(a)  # i is an intermediate function which produces a function.
f= i(f)

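That is indeed how it works: the expression after the @ is evaluated first, and whatever it returns is the actual decorator that gets applied to f. Here’s a working example of a decorator that can be adjusted with an argument.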

#!/usr/bin/python

def Decorator(DecoArg):
    def DecoArgWrapFunc(FuncPassed2Deco):
        print('Decorator argument: %s' % DecoArg)
        def DecoratedFunction(*args):
            print('Start decoration(%s)...' % DecoArg)
            RetValOfFuncPassed2Deco= FuncPassed2Deco(*args)
            print('End decoration(%s)...' % DecoArg)
            return RetValOfFuncPassed2Deco # To simulate UserFunc.
        return DecoratedFunction
    return DecoArgWrapFunc

@Decorator('DA')
def UserFunc(UserArg):
    print('In user function with user argument: %s' % UserArg)
    return pow(2,UserArg)

print('Value of call to the user function: %s' % UserFunc(8))

This program produces the following output.

Decorator argument: DA
Start decoration(DA)...
In user function with user argument: 8
End decoration(DA)...
Value of call to the user function: 256
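One wrinkle with the example above is that the decorated UserFunc reports its name as DecoratedFunction. Here is a minimal sketch of the same three-layer pattern using functools.wraps to fix that; repeat and greet are hypothetical names made up for this example:

```python
import functools

def repeat(times):
    """Decorator factory: repeat(3) returns the actual decorator."""
    def decorator(func):
        @functools.wraps(func)  # Copy __name__, __doc__, etc. onto the wrapper.
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(times):
                result = func(*args, **kwargs)
            return result       # Pass along the last call's result.
        return wrapper
    return decorator

@repeat(3)
def greet(name):
    """Say hello."""
    print('Hello, %s' % name)
    return name.upper()

print(greet('Ann'))     # "Hello, Ann" three times, then ANN
print(greet.__name__)   # greet (it would be 'wrapper' without functools.wraps)
```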

Exceptions

Philosophy

Error handling in Python is quite powerful, but it can be a bit complex too. In Python an "exception" is an event that accompanies an error situation and it is valuable to realize that all errors in Python use the exception mechanism. Because of this, it’s pretty useful to know how to deal with them. For example, Python doesn’t have an "exit" keyword; even sys.exit() works by raising an exception, so the ultimate way programs stop is effectively this:

raise SystemExit
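To see that sys.exit() really is just an exception, you can catch it. If nothing catches it, the interpreter exits using the code as the process exit status. Note that SystemExit is not a subclass of Exception, so a bare except Exception won't accidentally stop an exit:

```python
import sys

try:
    sys.exit(3)                 # Just raises SystemExit(3).
except SystemExit as e:
    exit_code = e.code
    print('Caught SystemExit, code =', exit_code)   # Caught SystemExit, code = 3

print(issubclass(SystemExit, Exception))      # False
print(issubclass(SystemExit, BaseException))  # True
```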

Although understanding exceptions is important, I have a bit of philosophical unease with the idea of planning for things you didn’t plan for. My thinking is that if you expect an exceptional situation, you should take precautions to make it not exceptional. In the Python world this seems to be divided into the "Easier to Ask for Forgiveness than Permission" (EAFP) and "Look Before You Leap" (LBYL) factions.

The classic case is a division by zero error. What is the functional difference between letting a division operator raise a very specific ZeroDivisionError and just checking to see if the denominator is zero before proceeding? I think that sometimes you want such explicit control and sometimes you can tolerate a certain amount of ambiguity. For example, if you’re going to do 100 divisions and the whole set is invalid if any one has a zero denominator, then the code might be easier to write and later understand if you use exception facilities. However, if you know that a certain attribute might be missing from an object, it is very reasonable to use if hasattr(object,propstr): rather than try: ... except AttributeError:. In theory these are very similar approaches, since hasattr basically calls getattr and catches the AttributeError (in Python 3 it catches only AttributeError). However, if using exceptions directly, looking at the code later will tell you nothing about why you thought to catch an error there, i.e. something about object and propstr. For all your future self knows, you were just covering the "unknown unknowns".

I like to explicitly check for all the things I can think of which will go wrong (LBYL) and then use exceptions to stagger away still breathing if something truly unforeseeable happens. To me it’s like working on a roof while wearing a safety harness - falling off the roof is still to be avoided and jumping off the roof seems to be seriously full of the wrong attitude.

There are other cases, however, where exceptions are strategically preferred. One example is explained in the os module documentation for os.access(). It points out that checking to see if a file is readable and then reading it is not as robust as just trying to open it and then catching an exception when it doesn’t work. The thinking is that in the former case, an attacker could devise a way to change the state of the thing being checked between the check and the action.
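Here are the two styles side by side for reading a file. The function names are made up for this sketch. The LBYL version has the check-then-use gap described above; the EAFP version has no gap because the open itself is the check:

```python
import os

def read_config_lbyl(path):
    # Look Before You Leap: the file could vanish or change permissions
    # between the check and the open (a time-of-check/time-of-use race).
    if os.access(path, os.R_OK):
        with open(path) as f:
            return f.read()
    return None

def read_config_eafp(path):
    # Easier to Ask Forgiveness: just try it and handle the specific failure.
    try:
        with open(path) as f:
            return f.read()
    except OSError:  # Covers FileNotFoundError, PermissionError, etc.
        return None

print(read_config_eafp('/nonexistent/file.conf'))  # None
```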

Implementation

Code to be monitored for an exception is "tried" with the try keyword. An "exception" is "raised" (not "thrown" as in C++ and Java). An exception is, uh, excepted with except and not "caught" with catch.

The basic syntax looks like:

try:
    AttemptSomething()
except LoneException:
    HandleThisBadThing()
except (ExceptionOne, ExceptionTwo, ExceptionN):
    HandleAnyOfTheseBadThings()
except Exception as error:
    print(error) # Show error while handling. Also see below.
else:
    SomethingRanClean() # Executes if no exceptions raised.
finally:
    SomethingWasAttempted() # Runs if exceptions were raised or not.

In the above code the Exception exception catches all ordinary errors, but not SystemExit, KeyboardInterrupt, or GeneratorExit, which inherit directly from BaseException (the stuff you need to stop the program). I find I use this for overlooking a few anomalous corrupt input records while processing large quantities of mostly good ones.

import fileinput
debug= False                 # Set True to stop on the first bad record.
for l in fileinput.input():  # Read each line from external data source.
    try:                     # Unexpected bad things can happen because...
        process_line(l)      # ...I don't know how random data will react (your own function).
    except Exception as e:   # Catch anything interesting.
        if debug:            # Else, better luck next line and quietly skip.
            print(l.rstrip()+" ERROR:"+repr(e)) # Make fuss.
            raise SystemExit # Stop now to address this while debugging.

Also don’t forget about assert which can raise an AssertionError. This kind of thing can be useful.

import sys
assert('linux' in sys.platform), "Ask Chris how to fix that..."

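That raises an AssertionError with that message if the script is run on the wrong kind of platform. Basically, consider assert when you want to raise an exception conditionally, but keep in mind that assert statements are skipped entirely when Python runs with the -O option, so don't rely on them for checks that must always happen.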

Standard Exceptions

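What exceptions are there to be raised? Here’s an abridged diagram of the exception hierarchy. Excepting a class also catches its subclasses, so if you except an EnvironmentError exception then it will catch IOError and OSError too. Note that this is the Python 2 hierarchy: in Python 3, StandardError is gone (its children hang directly off Exception), and EnvironmentError and IOError are just aliases of OSError.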

BaseException
 +-- SystemExit
 +-- KeyboardInterrupt
 +-- GeneratorExit
 +-- Exception
      +-- StopIteration
      +-- StandardError
      |    +-- BufferError
      |    +-- ArithmeticError
      |    |    +-- FloatingPointError
      |    |    +-- OverflowError
      |    |    +-- ZeroDivisionError
      |    +-- AssertionError
      |    +-- AttributeError
      |    +-- EnvironmentError
      |    |    +-- IOError
      |    |    +-- OSError
      |    +-- EOFError
      |    +-- ImportError
      |    +-- LookupError
      |    |    +-- IndexError
      |    |    +-- KeyError
      |    +-- MemoryError
      |    +-- NameError
      |    +-- ReferenceError
      |    +-- RuntimeError
      |    |    +-- NotImplementedError
      |    +-- SyntaxError
      |    |    +-- IndentationError
      |    +-- SystemError
      |    +-- TypeError
      |    +-- ValueError
      |         +-- UnicodeError
      +-- Warning
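To see what your own interpreter actually has, a short sketch can walk BaseException.__subclasses__() recursively. The name exception_tree is made up for this example, and classes defined outside the builtins module (e.g. by imported libraries) are skipped to keep the output readable:

```python
def exception_tree(cls=BaseException, indent=0):
    """Return lines describing cls and its built-in subclasses, indented by depth."""
    lines = [' ' * indent + '+-- ' + cls.__name__]
    for sub in cls.__subclasses__():
        # Only follow exceptions defined in the builtins module.
        if sub.__module__ == 'builtins':
            lines.extend(exception_tree(sub, indent + 4))
    return lines

print('\n'.join(exception_tree()))
```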

You can find out about exceptions with help:

$ python -c "help(EOFError)" | sed -n '/^class/,/ |  /p'
class EOFError(StandardError)
 |  Read beyond end of file.

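Or this, in Python 2:

python -c "import exceptions; help(exceptions)"

Python 3 dropped the exceptions module; the built-in exceptions are documented along with everything else by python3 -c "import builtins; help(builtins)".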

Exception Data

Looks like there has been a change in the syntax used to manage exception information. Formerly (Python 2 only) it was something like:

except ValueError, exception_instance:

Since Python 2.6 it is this, and Python 3 accepts only this form:

except ValueError as exception_instance:

What this means is that in your except clause you can use the exception instance that is generated with the raise (perhaps as part of a system error). This exception instance contains some handy stuff relating to the error (usually). It seems that the various built in exceptions have a variety of attributes. For example, an IOError will have a filename attribute that you can access. Here is an example of the basic attribute system showing the generic data stored by the exception object and how to find out what exactly the exception object can tell you.

#!/usr/bin/python
try:
    raise Exception('arg1','arg2','arg3')
except Exception as exception_instance:
    print("dir(exception_instance):")
    print(dir(exception_instance))
    print("type(exception_instance):")
    print(type(exception_instance))
    print("exception_instance:")
    print(exception_instance)
    print("exception_instance.args:")
    print(exception_instance.args)

Produces:

dir(exception_instance):
['__class__', '__delattr__', '__dict__', '__doc__', '__format__',
'__getattribute__', '__getitem__', '__getslice__', '__hash__',
'__init__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__setstate__', '__sizeof__',
'__str__', '__subclasshook__', '__unicode__', 'args', 'message']
type(exception_instance):
<type 'exceptions.Exception'>
exception_instance:
('arg1', 'arg2', 'arg3')
exception_instance.args:
('arg1', 'arg2', 'arg3')
Note
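The exception_instance.message attribute was deprecated in Python 2.6 and is gone in Python 3, so don't rely on it; use args or str(exception_instance) instead. The output above is from Python 2. In Python 3, the type shows as <class 'Exception'> and the dir() list differs (it includes with_traceback, for example).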

Excepting Exception catches any ordinary exception. To also catch KeyboardInterrupt, SystemExit, and the other exceptions that inherit directly from BaseException, except BaseException instead.

Custom Exceptions

If none of the built in exception classes seem appropriate or if they lack the necessary attributes, you can create your own exception classes which more specifically do what you need. You can inherit from any exception, but it’s normal to use Exception as an uncluttered base class. Here’s an example of what methods and structure such a user defined exception class entails.

#!/usr/bin/python

class UserDefinedException(Exception):
    """Best to inherit from Exception class."""
    def __init__(self, ex_att_param):
        self.user_attribute= ex_att_param
    def __str__(self):
        return repr(self.user_attribute)

try:
    raise UserDefinedException('Idiot programmer alert!')
except UserDefinedException as InstanceOfUserDefinedException:
    print(InstanceOfUserDefinedException.user_attribute)

This example program produces simply Idiot programmer alert!.

File operations

Checking

Use the os.access() function to check if the file you are interested in is in the condition you expect.

import os
if not os.access(f,os.F_OK):
    print("Nonexistent file problem: %s"%f)
if not os.access(f,os.R_OK):
    print("Unreadable file problem: %s"%f)

You can also use os.W_OK and os.X_OK to test for writable and executable.

Also consider os.path for checking on directories.

os.path.isdir(path)

Simple Reading

To read an entire file into a string:

entire_file_contents= open(filename).read()

To read each line of a file use something like:

f= open(filename,'r')
for l in f:
    print(l)

Note that this comes with newlines from the file and from the print, so the output is double spaced. Use sys.stdout.write(l) or print(l, end='') to avoid this problem.

Other streams besides sys.stdout are sys.stdin and sys.stderr.

Binary File Operations

Sometimes you need very specific things done with your file writes. If, for example, you need to aggressively prevent buffering, use the third argument of 0 in the open function. But this only works with binary (e.g. wb) modes. When using binary modes, you have to make sure any string types are converted — or encoded — into something at a lower level. Here’s an example that finally did what I wanted it to do.

data= '05F10D66 00 F8 B4 01 00 00 FF FF'
with open('/dev/ttyACM1','wb',0) as f:
    f.write((data+'\r\n').encode('ASCII'))

Note the with/as syntax is described below.

with + as

It looks like since version 2.5, the fanciest (and best?) way to open files and read through them is to use the new with and as reserved words.

This creates a list of lines in a file.

with open(filename, 'r', encoding='utf-8') as f:
    listofallfilelines = f.readlines()

Here’s an example that counts the lines in a file.

count= 0
with open('filename','r') as f:
    for l in f:
        count += 1
print(count)

The advantage here is that the file gets closed as soon as the block ends, even if it’s rudely interrupted by an exception and never makes it to your file closing statement.

A better way to think of the with as syntax is that it seems to set up a "context". An object which has an __enter__() and __exit__() method can be used with with as such that the enter method gets called upon entering the block and the exit method gets called, unsurprisingly, upon exit. This is why it’s such a reasonable way to handle file opening because things can be done with the file and the exit function makes sure that, whatever weirdness transpires, the file will get closed.

Another example of when this would be appropriate would be some kind of routine that outputs some SVG. You may want to adjust the view parameters with a <g transform=...> block. You might start with the opening tag and then do a bunch of stuff and then print a </g> tag from the exit method of an object. This will allow for nested objects and the opening and closing tags will always exist and be correct.

#!/usr/bin/python
class SVGsettings(object):
    def __init__(self,p):
        self.property= p
    def __enter__(self):
        print('<g %s>'%self.property)
    def __exit__(self, type, value, traceback):
        print("</g><!--%s-->"%self.property)

with SVGsettings('stroke-width="1.5"'):
    with SVGsettings('stroke="red"'):
        print('<line x1="3" y1="0" x2="25" y2="44"/>')
Output
<g stroke-width="1.5">
<g stroke="red">
<line x1="3" y1="0" x2="25" y2="44"/>
</g><!--stroke="red"-->
</g><!--stroke-width="1.5"-->

It might be smart to rewrite my HTML tagger with this style.

Another example is one of a transaction "lock". If you need to do something like update a database and it must be locked to prevent access from other actors, you can have the lock be in the __enter__() method and the release be in the __exit__() method. This way, even if bad things happen, the lock will get released properly.
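The same enter/exit pairing can also be written as a generator with the standard contextlib.contextmanager decorator, which is often shorter than a class. This sketch redoes the SVG example; svg_group is a name made up here. Code before the yield plays the role of __enter__, and the finally clause plays __exit__, so the closing tag is printed even if the block raises:

```python
from contextlib import contextmanager

@contextmanager
def svg_group(prop):
    """Generator-based version of the SVGsettings class."""
    print('<g %s>' % prop)              # Runs on entering the with block.
    try:
        yield                           # The body of the with block runs here.
    finally:
        print('</g><!--%s-->' % prop)   # Runs on exit, even after an exception.

with svg_group('stroke-width="1.5"'):
    with svg_group('stroke="red"'):
        print('<line x1="3" y1="0" x2="25" y2="44"/>')
```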

PEP 343, which introduced with, spells out the core of the semantics: this code (with as VAR optional):

with EXPR as VAR:
    BLOCK

Is equivalent to:

mgr = (EXPR)
exit = type(mgr).__exit__  # Not calling it yet
value = type(mgr).__enter__(mgr)
exc = True
try:
    try:
        VAR = value  # Only if "as VAR" is present
        BLOCK
    except:
        # The exceptional case is handled here
        exc = False
        if not exit(mgr, *sys.exc_info()):
            raise
        # The exception is swallowed if exit() returns true
finally:
    # The normal and non-local-goto cases are handled here
    if exc:
        exit(mgr, None, None, None)

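The __exit__() method needs to take 4 arguments: self, type, value, and traceback. The last three describe the exception that ended the block (the same triple that sys.exc_info() returns) and are all None if the block finished normally. If __exit__() returns a true value, the exception is swallowed.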
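A small sketch showing those arguments and the swallowing behavior; Suppress is a hypothetical class made up for this example (the standard library has a ready-made contextlib.suppress that does the same job):

```python
class Suppress:
    """Hypothetical context manager that swallows the exception types it's given."""
    def __init__(self, *exc_types):
        self.exc_types = exc_types
        self.caught = None

    def __enter__(self):
        return self   # This is what "as VAR" receives.

    def __exit__(self, exc_type, exc_value, traceback):
        # All three are None if the block finished normally.
        self.caught = exc_value
        # Returning True tells Python the exception was handled.
        return exc_type is not None and issubclass(exc_type, self.exc_types)

with Suppress(ZeroDivisionError) as s:
    1 / 0
print('Still running; caught:', repr(s.caught))
```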

Also newer Pythons (2.7 and 3.1+) support stuff like this.

with open("customers") as f1, open("transactions") as f2:
    pass # Do stuff with multiple files here.

Complete mind boggling details can be found here.

Temporary File Names

Here is a sample where a temporary file is created and then the program turns control over to Vim for the user to compose something and when the user quits, the Python program has all of the input. This is what you would do if you wanted to, for example, recreate the functionality of a mail client like Mutt.

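#!/usr/bin/python

import os
import subprocess
import tempfile
# delete=False keeps the file after our handle closes, so Vim can open it by name.
with tempfile.NamedTemporaryFile(suffix='.txt', delete=False) as tmp:
    tmpfile= tmp.name
vimcmd= '/usr/bin/vim'
subprocess.call([vimcmd,tmpfile])

with open(tmpfile,'r') as fob:
    f= fob.readlines()
os.remove(tmpfile) # Our job to clean up since delete=False.
if len(f) > 1 and f[1].strip() == '='*len(f[0].strip()):
    print("Title: %s" % f[0].strip())
elif f:
    print(f[0].strip())
    print('='*len(f[0].strip()))
print("%d lines entered." % len(f))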

Buffering Issues

Although normally not something to worry about, sometimes it’s important to remember that Python tends to politely buffer output as a general rule. You can have unbuffered output by invoking Python with the -u run time option. I’ve found this to be important when using tee, named pipes, and other fancy stream situations.

Also note that you can use something like print(x,flush=True). Sometimes I do print('.',end='',flush=True) to print a line of status dots and I don’t want the status to be incorrect for what is actually going on.

Compression

Python can deal with compressed files just fine. There are the modules gzip and bz2 which are very similar but not identical. Here’s how to read a gzip compressed file.

import gzip
f= gzip.open(filename,'rt') # 'rt' gives text (str); the default 'rb' gives bytes.

Then work with f as a normal file handle. This example is illustrative for writing compressed files:

How to compress an existing file
import gzip
import shutil
with open('file.txt','rb') as f_in, gzip.open('file.txt.gz','wb') as f_out:
    shutil.copyfileobj(f_in, f_out) # Copies in chunks; both files closed afterward.
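For text you can skip the bytes business entirely: gzip.open() accepts text modes 'wt' and 'rt' plus an encoding. A round-trip sketch, using a throwaway file in a temporary directory:

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'notes.txt.gz')  # Throwaway location.

# 'wt'/'rt' are text modes (str in, str out); plain 'w'/'r' mean binary (bytes).
with gzip.open(path, 'wt', encoding='utf-8') as f:
    f.write('first line\nsecond line\n')

with gzip.open(path, 'rt', encoding='utf-8') as f:
    lines = [line.rstrip('\n') for line in f]
print(lines)  # ['first line', 'second line']
```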

Here’s an example of how to use bz2. This little program is a (rough) wc program for text files with bzip2 compression.

#!/usr/bin/python
import bz2
import sys
fn= sys.argv[1]
b=0;w=0;l=0
f= bz2.BZ2File(fn, 'r') # Binary mode, so each line is bytes.
for line in f:
    b+= len(line); w+=len(line.split()); l+=1 # split() on any whitespace, like wc.
print("%d bytes, %d words, %d lines" % (b,w,l))

File Input and Standard Input

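The fileinput module really simplifies getting things from files or standard input. It reads the files named on the command line, or standard input if there are none (or where a file name is -).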

input.py
#!/usr/bin/python
import fileinput
for l in fileinput.input():
    print(l.strip().title())

This program produces this output.

$ ./input.py myfile
This File Can Be Sent As Input Both As
A File Argument And As Standard Input.
$ ./input.py <myfile
This File Can Be Sent As Input Both As
A File Argument And As Standard Input.
$ ./input.py myfile - <myfile
This File Can Be Sent As Input Both As
A File Argument And As Standard Input.
This File Can Be Sent As Input Both As
A File Argument And As Standard Input.

Interactive Input

When writing menu-driven features or other interactive programs that wait for a user to input things the following can be useful. This example shows how to suppress echoing for applications like passwords or where the key press' value is not relevant.

import os
os.system('stty -echo')
passwd= raw_input('Password:')
os.system('stty echo')

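Note that Python3 took out raw_input; now it’s just input. An answer here has a clever comprehensive solution. For passwords specifically, the standard getpass module's getpass.getpass('Password:') suppresses echoing for you.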

Works in 2 and 3
try: input= raw_input
except NameError: pass
print("You entered: " + input("Prompt: "))

CSV

Now and again some lackwit sends you a file in a popular spreadsheet format. You use something like this to try and decrypt it.

libreoffice --headless --convert-to csv --outdir ./csvdir/ yuck.xls

But then you have crazy business like this.

a,b,"c1,c2,c3",d,"e1,e2",f

Which is extremely tedious to parse. But not with Python!

csv2bsv.py
#!/usr/bin/python
import sys
import csv
with open(sys.argv[1],'r',newline='') as f: # newline='' as the csv docs recommend.
    for r in csv.reader(f):
        print('|'.join(r))
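To see what csv.reader makes of the troublesome sample line without needing a file, you can wrap the text in io.StringIO:

```python
import csv
import io

# The sample line from above, wrapped in a file-like object.
sample = io.StringIO('a,b,"c1,c2,c3",d,"e1,e2",f\n')
row = next(csv.reader(sample))
print(row)            # ['a', 'b', 'c1,c2,c3', 'd', 'e1,e2', 'f']
print('|'.join(row))  # a|b|c1,c2,c3|d|e1,e2|f
```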

Bitwise Operators

The official guide is good.

  • x<<y = x’s bits shifted left y places

  • x>>y = x’s bits shifted right y places

  • x&y = bitwise AND - also overloaded by set classes (and similar) for intersection

  • x|y = bitwise OR - also overloaded for union

  • x^y = bitwise XOR

  • ~x = complement - this is supposed to change 1s to 0s and 0s to 1s but it can sometimes get tricky. See below.

Note that the complement turns values negative: ~x is always equal to -x-1.

>>> x=64
>>> bin(x)
'0b1000000'
>>> bin(~x)
'-0b1000001'

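Python’s bitwise operators treat integers as if they were two's complement with an unlimited number of sign bits, which is why ~64 is -65. When decoding fixed-width data you have to apply the two's complement conversion yourself. Here is an example of decoding a list of 4 bytes (LSB to MSB order) with 2s complement to handle negative values properly.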

def bytes_to_int32(b):
    """Decode 4 bytes (LSB first) as a signed 32-bit integer."""
    # This simple method works for positive values only:
    #return (b[3]*256**3 + b[2]*256**2 + b[1]*256 + b[0])
    # But 2's complement must be handled for negative values.
    x= 0
    for n,B in enumerate(b): # n is the byte index and B is that data byte.
        x |=  B<<(n*8) # Slide first LSB byte over 0, then 8 for next, then 16, finally 24 for MSB.
    if x>>31: # If leftmost (MSb) is 1, then negative conversion is necessary.
        x -= 4294967296 # Subtract off 1<<32 from val for 2's complement.
    return x

Note that besides using the bin() function, you can visualize binary with a format specifier like this: "{0:b}".format(x)
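In Python 3 the standard library can do the shifting and the two's complement fix-up for you, via int.from_bytes() or struct. A sketch with a made-up 4-byte value that encodes -2:

```python
import struct

b = [0xFE, 0xFF, 0xFF, 0xFF]   # -2 as a little-endian signed 32-bit value.

# int.from_bytes handles byte order and sign in one call.
print(int.from_bytes(bytes(b), byteorder='little', signed=True))  # -2

# struct does the same; '<i' means little-endian signed 4-byte int.
print(struct.unpack('<i', bytes(b))[0])                          # -2

# And the reverse direction:
print(list((-2).to_bytes(4, byteorder='little', signed=True)))   # [254, 255, 255, 255]
```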

Binary Data

Sometimes clever people put data into very efficient binary containers. Using C is the preferred way to deal with this, but if you’re lazy, Python does a great job of decoding binary data too.

>>> import struct
>>> packformat='>cHcHIccccccccccccccccHHIcHccHcHIcHccccccccccccccccccccc'
>>> struct.unpack(packformat,open('/tmp/mybinary.sbd','rb').read())
('\x01', 69, '\x01', 28, 2200337468, '3', '0', '0', '2', '3', '4', '0', '6', '2',
'9', '5', '9', '9', '6', '0', '\x00', 11682, 0, 1456502452, '\x03', 11,
'\x01', ' ', 49763, 'u', 12901, 0, '\x02', 21, '\x00', ' ', 'M', '@',
'\x00', '\x01', 'P', '\xef', '\xf0', ' ', '\x08', 'J', '\x00', 'Y', '_',
'\xcc', '&', 'L', '\x91', '\xe7', '}')
Unpack Codes
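  • c = char, 1 byte. Python 2 returns a 1-character str as shown above; Python 3 returns a length-1 bytes object such as b'\x01'. Use B to get the 0-255 value as an int.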

  • H = unsigned short 2 byte 0-65535 (65,536 values)

  • I = unsigned int 4 bytes 0-4294967295 (4,294,967,296 values)

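The leading > in the format string means big-endian byte order with standard sizes and no alignment padding (< is little-endian). Full unpack codes can be found in the official struct module documentation.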
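A small round trip may make the format codes clearer. This sketch uses a made-up three-field record (one char, one unsigned short, one unsigned int) rather than the real .sbd layout above; note that unpack() needs exactly calcsize() bytes:

```python
import struct

fmt = '>cHI'   # big-endian: 1-byte char, 2-byte unsigned short, 4-byte unsigned int
size = struct.calcsize(fmt)   # 7: no padding when a byte order is given

packed = struct.pack(fmt, b'\x01', 69, 2200337468)
unpacked = struct.unpack(fmt, packed)
print(size, unpacked)   # 7 (b'\x01', 69, 2200337468)
```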

If you’re using binary data, you might need to convert bases which is described here.

Pickle

Although there are often better and more secure ways to save Python objects (see JSON below for example), an old classic is Python’s pickle. This object serialization basically just takes any Python object and makes it into a thing that can be written into a file. The end result of this trick is that you can dump some memory state (to a file, across a network, etc) and load it back into memory at another time and place. Never unpickle data from an untrusted source, since loading a pickle can execute arbitrary code.

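import pickle
class My_Object: # A stand-in class; any picklable object works the same way.
    def __init__(self,a,b,c):
        self.a,self.b,self.c= a,b,c
my_object= My_Object(1,2,3)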
# ===== Save Object =====
with open('my_object.p','wb') as pickle_file:
    pickle.dump(my_object,pickle_file)
# ===== Clear Object =====
my_object= None
# ===== Restore Object =====
with open('my_object.p','rb') as pickle_file:
    my_object= pickle.load(pickle_file)

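Pickle can serialize most objects you dream up, though not things like open files, sockets, or lambdas. If your objects don’t involve homemade classes, i.e. they only use Python native types, the marshal module can also serialize them, but its documentation says it exists mainly to support .pyc files and recommends pickle for general persistence.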

The shelve module provides a persistent, dictionary-like key/value interface built on pickle, if you like that kind of thing.
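A minimal shelve sketch; the file location is a throwaway temporary directory and the keys are made up. Keys must be strings, and values can be anything pickle handles:

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'my_shelf')  # throwaway location

# Write: a shelf acts like a dict whose values are pickled to disk.
with shelve.open(path) as db:
    db['scores'] = [1, 2, 3]
    db['config'] = {'alpha': 'one', 'beta': 2}

# Read it back in a later session.
with shelve.open(path) as db:
    beta = db['config']['beta']
    keys = sorted(db.keys())
print(beta, keys)   # 2 ['config', 'scores']
```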

This example compares pickle with json.

#!/usr/bin/env python3
import json
import pickle

class MyOwnClass:
    def __init__(self):
        self.oblist= [1,2,3,4,'ok']
        self.obA= self.calcA()
        self.obB= 1
    def __repr__(self):
        return f'{self.obA:d} and {self.obB:d}'
    def calcA(self):
        return 5

def pickling(A):
    print("== Pickling ==")
    with open('/tmp/my_object.p','wb') as pickle_file: # Note 'wb' mode.
        pickle.dump(A,pickle_file)       # Understands and records full object.
    print("Unpickling... creates: <class '__main__.MyOwnClass'>")
    with open('/tmp/my_object.p','rb') as pickle_file: # Note 'rb' mode.
        P= pickle.load(pickle_file)
    print(P)

def JSONing(A):
    print("== JSONing ==")
    with open('/tmp/my_object.json','w') as json_file: # Note 'w' mode.
        json.dump(A.__dict__,json_file)  # Note `.__dict__` attribute.
    print("UnJSONing... creates: <class 'dict'>")
    with open('/tmp/my_object.json','r') as json_file: # Note 'r' mode.
        J= json.load(json_file)
    print(J)

A= MyOwnClass()
print(A)
pickling(A)
JSONing(A)

Running it produces:

5 and 1
== Pickling ==
Unpickling... creates: <class '__main__.MyOwnClass'>
5 and 1
== JSONing ==
UnJSONing... creates: <class 'dict'>
{'oblist': [1, 2, 3, 4, 'ok'], 'obA': 5, 'obB': 1}

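As you can see, the unpickled object is a real MyOwnClass instance again. Pickle doesn’t actually store the methods; it records the instance’s data plus a reference to its class by name, so the class definition must be importable when the pickle is loaded. JSON needed to use the __dict__ attribute to just get the data. What is read back in is not the object, but just the data as a dict. For more details on the json module, see the next section.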

JSON

There are more Pythonic ways of serializing objects (marshal, pickle, Python 2’s cPickle) but in 2013, the way that makes the most people happy across platforms and languages is JSON. Serendipitously, JSON looks almost identical to a Python dictionary’s __repr__() output. Here’s a sample of how to deal with JSON in a simple case.

json_sample.py
#!/usr/bin/python
import json
import sys
with open(sys.argv[1],'r') as pfile: # File name comes from the command line.
    P= json.load(pfile)
for p in P.keys():
    P[p]+= 1
json.dump(P,sys.stdout) # Put some writable file object here.
sys.stdout.flush()

This might produce this result:

$ cat test.json
{"a": 1.5, "b": 1.5707963, "c": 0.95, "d": 0.55, "e": 10.0}
$ ./json_sample.py test.json
{"a": 2.5, "c": 1.95, "b": 2.5707963, "e": 11.0, "d": 1.55}
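The json module also works on strings with dumps() and loads(), which is handy for pretty printing. A short sketch with made-up data:

```python
import json

data = {'b': 1.5, 'a': [1, 2, 3], 'ok': True, 'nothing': None}

# indent pretty prints; sort_keys gives a stable key order.
text = json.dumps(data, indent=2, sort_keys=True)
print(text)   # True becomes true, None becomes null

back = json.loads(text)
print(back == data)   # True: dicts, lists, numbers, bools and None round trip
```

Be aware that tuples come back as lists and non-string dict keys come back as strings.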

System Control

Python has several methods to allow arbitrary execution of system commands (exiting to a temporary shell). Obviously this is powerful and dangerous where security is an issue. It’s also often clumsy as the proper Python way of doing things is usually better than the shell way when you factor in the spawning of the shell.

This stuff has gone through a lot of changes over the years, but as of 2014, the consensus is to use the subprocess module.

Here is a nice overview of this kind of stuff.

Here are some methods:

import os
os.listdir('./path') # Produces a list. `~` isn't expanded (see os.path.expanduser). Includes hidden files, but not . and ..
os.system('ls ./path') # Just does the thing.
os.popen('ls .','r').read() # Captures the output into a string.
for f in os.popen('ls .','r').readlines(): print(f) # Deal with each.

If this doesn’t do what you need, you can investigate the fancier functions of os like fork, the spawn family (os.spawnv and friends), and the exec family (os.execv and friends), plus popen2, popen3, and popen4 on Python 2 only. See the official os help for more details.

Note
os.popen2, os.popen3, and os.popen4 were deprecated in version 2.6 and are gone in Python 3, where os.popen survives as a thin wrapper around subprocess.Popen. This is a real moving target; the new way is the subprocess module.

Subprocess

Here’s the recommended way for executing shell commands as of 2013.

Start with getting the lines of output from the simplest kind of command to fill a Python list.

''.join(map(chr,subprocess.check_output(['cal']))).split('\n')

The reason for all that guff is that this check_output command produces a bytes object. Another way to untangle a byte stream object is to decode it.

>>> b'Line one\nLine two'.decode('utf-8').split('\n')
['Line one', 'Line two']

Here are some more examples.

>>> import subprocess
>>> n= subprocess.Popen(['df','-h','/media/WDUSB500TB'],stdout=subprocess.PIPE)
>>> o= n.stdout.read()
>>> o
b'Filesystem            Size  Used Avail Use% Mounted on\n/dev/sdb              459G  350G   86G  81% /media/WDUSB500TB\n'

Note the stdout=subprocess.PIPE value to the Popen constructor. Without it, the child inherits Python’s standard output and dumps its results on the spot; with it, the output waits in a pipe until you read it. The command starts running as soon as the constructor runs, not when you read. So if you run date, for example, and there’s a lag between the constructor and the n.stdout.read(), the time will reflect when the command started.

Proper Python documentation suggests that it’s good to use the supplied convenience functions when possible. These are call, check_call, and check_output (and, since Python 3.5, subprocess.run, which is now the recommended general-purpose one). Here’s how check_output works:

import subprocess
findcmd= ['find', '/home/xed/', '-name', '*pdf']
for PDF in subprocess.check_output(findcmd).decode().strip().split('\n'): # decode() turns bytes into str.
    print("PDF: %s" % PDF)
# Might output list like:
#    PDF: /home/xed/SlaughterhouseFive.pdf
#    PDF: /home/xed/gpcard.pdf

I started having trouble reading lines of standard output from a process with Python 3. Here’s a way that worked.

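import io
import subprocess
CMD= ['ls','-l'] # Any command, as a list of arguments.
proc= subprocess.Popen(CMD,stdout=subprocess.PIPE)
for oo in io.TextIOWrapper(proc.stdout, encoding="utf-8"):
    o= oo.strip() # Alternatively pass text=True to Popen and loop over proc.stdout directly.
    print(o)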

Here’s an example of an outside command that gets run with some data the Python program knows about being piped to the command and the standard output being captured back into the program.

>>> import subprocess
>>> pro= subprocess.Popen(['/usr/bin/tr','a-z','A-Z'], shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
>>> pro.communicate("It might get loud.\n")
('IT MIGHT GET LOUD.\n', None)

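Note that the second item of the tuple returned by communicate() is the standard error. It is None here because stderr was not set to subprocess.PIPE. The text=True argument (Python 3.7+) makes the pipes carry str instead of bytes.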
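For most one-shot commands, subprocess.run (Python 3.7+ for capture_output and text) wraps all of the above in one call. A sketch; it runs sys.executable instead of a shell tool only so the example works anywhere Python does:

```python
import subprocess
import sys

# Capture stdout/stderr as text and raise CalledProcessError on a non-zero exit.
result = subprocess.run(
    [sys.executable, '-c', 'print("line one"); print("line two")'],
    capture_output=True, text=True, check=True)
lines = result.stdout.splitlines()
print(lines, result.returncode)   # ['line one', 'line two'] 0

# input= feeds standard input, like the tr example above.
upper = subprocess.run(
    [sys.executable, '-c', 'import sys; print(sys.stdin.read().upper(), end="")'],
    input='It might get loud.\n', capture_output=True, text=True).stdout
print(upper)   # IT MIGHT GET LOUD.
```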

Environment Variables

To access environment variables from python use this technique:

>>> import os
>>> os.environ['USER']
'xed'
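A couple of related patterns, sketched with MY_SETTING as a made-up variable name: .get() supplies a default instead of raising KeyError, and assignments to os.environ are inherited by child processes started afterwards:

```python
import os
import subprocess
import sys

# No KeyError if EDITOR is unset; 'vi' is just an example default.
editor = os.environ.get('EDITOR', 'vi')

# MY_SETTING is a hypothetical name used only for this demo.
os.environ['MY_SETTING'] = 'on'
child = subprocess.run(
    [sys.executable, '-c', 'import os; print(os.environ["MY_SETTING"])'],
    capture_output=True, text=True)
print(child.stdout.strip())   # on
```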

Console Colors

I came up with this approach to tagging text with ANSI escape codes.

#!/usr/bin/python3
color= { # B___=Background, L___=Light, LB___=Light/Background
'BLD':'\33[1m', 'ITL':'\33[3m', 'UNL':'\33[4m', 'BNK':'\33[5m', 'INV':'\33[7m',
'BLK':'\33[30m', 'RED':'\33[31m', 'GRN':'\33[32m', 'YEL':'\33[33m', 'BLU':'\33[34m',
'VIO':'\33[35m', 'CYN':'\33[36m', 'WHT':'\33[37m', 'GRY':'\33[90m',
'LRED':'\33[91m', 'LGRN':'\33[92m', 'LYEL':'\33[93m', 'LBLU':'\33[94m',
'LVIO':'\33[95m', 'LCYN':'\33[96m', 'LWHT':'\33[97m',
'BBLK':'\33[40m', 'BRED':'\33[41m', 'BGRN':'\33[42m', 'BYEL':'\33[43m', 'BBLU':'\33[44m',
'BVIO':'\33[45m', 'BCYN':'\33[46m', 'BWHT':'\33[47m', 'BGRY':'\33[100m',
'LBRED':'\33[101m', 'LBGRN':'\33[102m', 'LBYEL':'\33[103m', 'LBBLU':'\33[104m',
'LBVIO':'\33[105m', 'LBCYN':'\33[106m', 'LBWHT':'\33[107m' }
def colorfn(v): return lambda t:v+t+'\033[0m'
for k,v in color.items(): color[k]= colorfn(v)

if __name__ == '__main__':
    print(12*'='+ color['RED'](' Color Examples ') +'='*12)       # General usage.
    for k in color: print(color[k]('Console color testing: '+k))  # Full test.

It seems to work in normal graphical terminals and in the console. YMMV on a bad OS.

Time And Date

Working with times and dates can be tricky in Python. There are a lot of seemingly overlapping modules and classes (time, datetime, datetime.date, datetime.time) and everything is done very fastidiously. This can make simple things seem complex. Here are some common usage cases dealing with times.

import datetime
print(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))

This produces something like '2012-10-18 16:50:37' (the current local time) and is typical of timestamps found in logging situations.

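If you have some kind of Unixy tool giving you seconds from the epoch, you can tidy that up with this. Note that fromtimestamp() converts to the machine’s local time zone, so the hour you get depends on where you run it.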

>>> datetime.datetime.fromtimestamp(1456502452).strftime('%Y%m%d %H:%M:%S')
'20160226 08:00:52'
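The output above is eight hours behind UTC, so it came from a machine set to US Pacific time. Passing an explicit time zone gives a result that doesn’t depend on the machine; a sketch using UTC:

```python
import datetime

t = 1456502452
utc = datetime.datetime.fromtimestamp(t, tz=datetime.timezone.utc)
print(utc.strftime('%Y%m%d %H:%M:%S'))   # 20160226 16:00:52
print(utc.isoformat())                   # 2016-02-26T16:00:52+00:00
```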

The timedelta objects can be useful for relative dates. Again note the datetime.datetime.SOMEFUNCTION syntax which is not entirely obvious in the Python documentation.

today= datetime.datetime.strptime('2016-05-20','%Y-%m-%d')
aday= datetime.timedelta(1)
yesterday= today-aday
lastweek= today-(7*aday)

Another common requirement involving time is to profile code or for some other reason find out how long something took.

import time
start= time.time()
do_some_lengthy_thing()
print('Elapsed time: %f' % (time.time()-start))

It might be better to use time.perf_counter() or time.perf_counter_ns() instead of time.time(). These specialize in performance timing at the highest resolution available, and they are monotonic, so they can’t jump backwards if the system clock is adjusted. The "ns" version returns an integer count of nanoseconds, which avoids floating point precision loss. Note that the values themselves have an undefined reference point, so only the difference between two calls is meaningful.
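The same timing pattern with perf_counter; do_some_lengthy_thing here is a hypothetical stand-in for the real work:

```python
import time

def do_some_lengthy_thing():   # hypothetical placeholder workload
    return sum(i * i for i in range(200_000))

start = time.perf_counter()
do_some_lengthy_thing()
elapsed = time.perf_counter() - start    # seconds, as a float
print(f'Elapsed time: {elapsed:.6f} s')

start_ns = time.perf_counter_ns()
do_some_lengthy_thing()
elapsed_ns = time.perf_counter_ns() - start_ns   # an int, in nanoseconds
```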

See the function timing decorator for this kind of application implemented in a general way.

Here’s a simple example that calculates days between two events.

dayselapsed.py
#!/usr/bin/python
""" Usage: dayselapsed.py 2012-01-09 2014-03-13 """
import datetime,sys
s,e = sys.argv[1],sys.argv[2]
sd= datetime.datetime.strptime(s,'%Y-%m-%d')
ed= datetime.datetime.strptime(e,'%Y-%m-%d')
dd= ed-sd
print(dd.days)

Need to just slow things down or wait in a polling loop?

while still_working():
    time.sleep(.1)

Random Numbers

Anyone who attempts to generate random numbers by deterministic means is, of course, living in a state of sin.

— John von Neumann

Oh well. Here’s how to use random numbers in normal usage.

import random
random.seed([a])                    # a: None, int, float, str, or bytes. None uses OS randomness or the time.
random.randint(a,b)                 # a <= N <= b
random.choice(sequence)             # Pick one
random.shuffle(sequence)            # Same list scrambled - in place!
random.random()                     # floating point [0.0,1.0)
'%06x' % random.randint(0,0xFFFFFF) # random hex color for HTML, etc.
Synchronized Shuffling
>>> A,B=[1,2,3,4,5],['a','b','c','d','e']
>>> import random
>>> Z=(list(zip(A,B))) ; random.shuffle(Z)
>>> A,B=[n[0] for n in Z],[n[1] for n in Z]
>>> A,B
([3, 2, 1, 5, 4], ['c', 'b', 'a', 'e', 'd'])

Also see from sklearn.utils import shuffle.

More Entropy Please

Also, os.urandom(n) returns n random bytes from /dev/urandom or some other OS specific source of high quality entropy. It is slower than the random module but suitable for cryptographic use, which random is not.
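For security-sensitive values like tokens, the secrets module (Python 3.6+) draws on the same OS entropy source. A short sketch with made-up choices:

```python
import os
import secrets

raw = os.urandom(16)                    # 16 bytes from the OS entropy source
token = secrets.token_hex(16)           # 16 random bytes as 32 hex characters
pick = secrets.choice(['red', 'green', 'blue'])
print(len(raw), token, pick)
```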

Hashing

There is a built in function called hash which can return the numerical hash of a hashable object.

>>> print(hash('xed'))
-2056704

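But that is probably not what you need. It exists for dictionary lookups, and since Python 3.3 string hashes are randomized per process, so the number above will differ from run to run. This is the more common application and technique for hashes: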

import hashlib
md5sum= hashlib.md5(open(fn,'rb').read()).hexdigest()

Here’s a tip that might be helpful.

fn= 'my_critical_data.tgz'
goodmd5= '5d3c7e653e63471c88df796156a9dfa9'
actualmd5= hashlib.md5(open(fn,'rb').read()).hexdigest()
assert actualmd5 == goodmd5, '{} is corrupted!'.format(fn)

Besides md5, hashlib also supports SHA1, SHA224, SHA256, SHA384, and SHA512 hashing (and, since Python 3.6, SHA-3 and BLAKE2).
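Reading a whole file with .read() is fine for small files, but big ones can be hashed in chunks. A sketch; hash_file is a hypothetical helper name, and Python 3.11+ also ships hashlib.file_digest for this job:

```python
import hashlib
import os
import tempfile

def hash_file(fn, algorithm='sha256', chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so it never has to fit in memory."""
    h = hashlib.new(algorithm)
    with open(fn, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

# Demo on a throwaway file.
fn = os.path.join(tempfile.mkdtemp(), 'demo.bin')
with open(fn, 'wb') as f:
    f.write(b'hello world')
print(hash_file(fn, 'md5'))   # 5eb63bbbe01eeed093cb22bb8f5acdc3
```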

Note
It used to be import md5, but that module was deprecated in Python 2.5 (when hashlib arrived) and is gone in Python 3. If you’re using a very old Python and hashlib doesn’t work, give it a try.

Base Conversion

Converting whole numbers written in common bases to integers is pretty easy.

>>> int("CE",16)
206
>>> int("1000000",2)
64

Or just use native type syntax for common bases.

>>> 0o744; 0xCE; 0b100
484
206
4

Going the other way can be done with special built-in functions.

>>> oct(484); bin(512); hex(206)
'0o744'
'0b1000000000'
'0xce'

If you don’t like the prefix there are formatting tricks.

>>> "{:o}".format(484); "{:b}".format(512); "{:x}".format(206); "{0:02d}".format(0b00011110)
'744'
'1000000000'
'ce'
'30'

Since Python 3.6, f-strings let you skip "str".format(args...) by putting the expression right in the string. This is probably the cleanest looking way to do this.

>>> import math
>>> f"{math.pi:0.3f} are squared."
'3.142 are squared.'

Old style can do some too.

>>> '%x %o' % (206,484)
'ce 744'

Fullish details on formatting tricks can be found here.

Unfortunately that only works for the commonly used bases. Here’s the simple function I came up with to solve this problem.

def dec2N(n,base=16):
    anint,o,s= type(int()),str(),str()
    if (type(n) is not anint) or (type(base) is not anint) or base < 1:
        return "Undefined"
    if base == 1: return "|"*n
    if n == 0: return "0"
    if n<0: s,n= '-',n*-1
    def digit_str(x):
        if x > 15: return '(%s)'%x
        if x > 9: return chr(55+x)
        return str(x)
    while n:
        o= digit_str(n%base)+o
        n= n//base
    return s+o

This handles hex letters (and could handle higher letters in higher bases by changing the > 15 value). It doesn’t have stack depth recursion issues and should work fine for most normal things I can think of.
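Since int() can parse any base from 2 to 36, a digit-string converter limited to that range gives a free round trip check. A sketch of a variant (to_base is a hypothetical name) that uses letters up to Z instead of the parenthesized digits above:

```python
import string

DIGITS = string.digits + string.ascii_uppercase   # 0-9 then A-Z: enough for base 36

def to_base(n, base=16):
    """Convert an int to a digit string in base 2..36."""
    if not 2 <= base <= 36:
        raise ValueError('base must be 2..36')
    if n == 0:
        return '0'
    sign, n = ('-', -n) if n < 0 else ('', n)
    out = []
    while n:
        n, r = divmod(n, base)   # peel off the lowest digit
        out.append(DIGITS[r])
    return sign + ''.join(reversed(out))

print(to_base(206), to_base(1295, 36))   # CE ZZ
print(int(to_base(123456789, 36), 36))   # 123456789
```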

If you need very exotic fractional hex conversion, investigate something like this.

>>> float.hex(2.25)
'0x1.2000000000000p+1'

Math

The math module does all the normal stuff, usually as expected.

Some Math Functions
  • pi - Constant ready to use.

  • e - Constant ready to use.

  • ceil - Smallest whole number >= x (an int in Python 3, a float in Python 2).

  • floor - Largest whole number <= x (an int in Python 3, a float in Python 2).

  • sqrt - Use import cmath for negative values and fun with complex numbers.

  • atan - Returns in radians.

  • atan2 - Takes two arguments, atan2(y,x), a numerator and a denominator, so that the correct quadrant can be returned.

  • sin - Trig functions take radians.

  • degrees - Convert from radians.

  • radians - Convert to radians.

  • log - Don’t get caught by int(math.log(1000,10)) being equal to 2 (the float result is just under 3).

  • log10 - Use int(math.log10(1000)) instead.

  • gamma - Fancy float capable way to do factorials: math.gamma(n+1) equals n! for whole numbers n. Or just use math.factorial for exact integer results.
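A few of those gotchas in action:

```python
import math

print(math.log(1000, 10))          # 2.9999999999999996, so int() of it gives 2
print(int(math.log10(1000)))       # 3
print(math.atan(1 / -1), math.atan2(1, -1))   # -0.785... vs 2.356...: atan2 keeps the quadrant
print(math.gamma(5), math.factorial(4))       # 24.0 24
print(math.ceil(2.1), math.floor(-2.1))       # 3 -3
```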

NumPy

Can of worms! But super powerful. The key trick of NumPy is that it has an array object that makes arrays more like C arrays (with strides) but with all the accounting done. This allows the performance to be much better than native Python objects, especially for large numeric data sets. It’s also quite good at linear algebra. See my TensorFlow notes for an example of that.

This is a nice tip for getting help strings out of NumPy syntax.

np.info(np.convolve) # Works for any NumPy function, e.g. np.info(np.where).

Numpy uses an idea called "broadcasting" to help work with arrays of different dimensions. For example, A * B where A is 10x1 and B is 1 results in a 10x1 array. And a 4x1 array and a 1x3 array results in a 4x3. More gory details and a decent explanation can be found here.
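A quick way to see broadcasting is to check result shapes. A sketch of the two cases above (the array contents are arbitrary):

```python
import numpy as np

A = np.arange(10).reshape(10, 1)   # shape (10, 1)
print((A * 2).shape)               # (10, 1): a scalar broadcasts everywhere

col = np.arange(4).reshape(4, 1)   # shape (4, 1)
row = np.arange(3).reshape(1, 3)   # shape (1, 3)
table = col * row                  # size-1 axes are stretched to match: (4, 3)
print(table.shape)
print(table)
```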

A quick note about some common optional arguments.

  • keepdims=True - will cause something like sum(array([[1,2],[1,1],[1,0]])) to be an array([[6]]) while False will just produce 6.

  • axis=1 - takes sum(array([[1,2],[1,1],[1,0]])) and returns array([3,2,1]) (summing across each row). axis=0 gives array([3,3]) (summing down each column).

  • dtype=? - allows a specific numpy type to be specified.
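Those options on the same small array:

```python
import numpy as np

a = np.array([[1, 2], [1, 1], [1, 0]])
print(np.sum(a))                                # 6
print(np.sum(a, keepdims=True))                 # [[6]]
print(np.sum(a, axis=1))                        # [3 2 1]  (across each row)
print(np.sum(a, axis=0))                        # [3 3]    (down each column)
print(np.sum(a, axis=1, keepdims=True).shape)   # (3, 1)
print(np.sum(a, dtype=np.float32))              # 6.0
```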

And when printing these objects out, there are some formatting options that can come in handy. See numpy.set_printoptions.

  • linewidth - Defaults to 75 which is very often annoying.

  • sign - Prints "+" for positive values when set to sign='+'.

  • threshold - Total number of array elements that triggers summarizing with ... instead of printing everything (default 1000).

  • edgeitems - How many summary items in representations like [1,2,...3,4].

  • precision - How many decimals to display.

  • floatmode - Display all decimals including trailing zeros (fixed) or perhaps as needed (maxprec_equal the default).

Just set the option somewhere before using the output methods.

np.set_printoptions(linewidth=200)

NumPy Types And Object Attributes

Types
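  • np.int_, np.uint8, np.int64, np.uint16, np.int8, np.int16, np.intc (C sized int)

  • np.float64, np.float32, np.float16

  • np.complex128, np.complex64

  • np.bool_

  • np.object_

  • np.bytes_ (called np.string_ before NumPy 2.0)

  • np.str_ (called np.unicode_ before NumPy 2.0)

Note that the old bare aliases np.int, np.float, np.complex, np.bool, and np.object were just Python’s built-in int, float, complex, bool, and object; they were deprecated in NumPy 1.20 and removed in 1.24 (NumPy 2.0 brought back np.bool as another name for np.bool_). Passing the built-in types as a dtype still works.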

Information about your array
  • mynparr.shape

  • mynparr.flags

  • len(mynparr)

  • mynparr.ndim

  • mynparr.size

  • mynparr.nbytes

  • mynparr.dtype

  • mynparr.dtype.name

  • a[x] - element x

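  • a[x,y] - element x,y. Same result as a[x][y] for a single element, but a[x][y] builds an intermediate array first, and with slices the two differ (a[:,0] is a column, a[:][0] is a row).

  • a[0:3,0:3,0:3] - full slices options for each dimension.

  • a[1,...] - same as a[1,:,:,...,:]; the ... (Ellipsis) fills in the remaining dimensions.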

  • mynparr.view(<type>)

Casting
  • np.array(my_normal_list) - convert to proper numpy array which is technically an ndarray (n-dimensional array) object.

  • mynparr.astype(np.uint8) - cast as unsigned 8-bit integers (not the default floats).

  • (mynparr.astype(np.float32) - 127.5) / 127.5 - cast from uint8s 0 to 255 into -1 to +1.

  • mynparr.tolist() - convert back out of NumPy to regular Python.
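A sketch of the attributes and casts above on a made-up 2x2 uint8 "image":

```python
import numpy as np

pixels = np.array([[0, 128], [200, 255]], dtype=np.uint8)
print(pixels.shape, pixels.ndim, pixels.size, pixels.nbytes, pixels.dtype.name)
# (2, 2) 2 4 4 uint8

scaled = (pixels.astype(np.float32) - 127.5) / 127.5   # 0..255 mapped to -1..+1
print(scaled.min(), scaled.max())                        # -1.0 1.0

truncated = np.array([1.9, -1.9]).astype(np.int64)      # float to int truncates toward zero
print(truncated)                                         # [ 1 -1]
print(pixels.tolist())                                   # plain nested Python lists
```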

Creating Arrays

  • np.loadtxt('file.txt'[,skiprows=1][,delimiter='|']) - load from text file. Also, see np.savetxt('file.txt',a,delimiter='|') and np.save().

  • np.array( [ (1,2,3),(4,5,6) ], dtype=float)

  • np.ones( (x,y,z…), dtype=np.int16 )

  • np.zeros( (x,y) )

  • np.zeros_like(template) - makes a zero array with the same shape and dtype as another one. Like your other array all zeroed out. Also there’s a np.ones_like() that’s about the same but for ones.

  • np.empty( (x,y,z…) ) - similar to zeros but the values are never initialized, so they are whatever happened to be in memory.

  • np.arange(2,101,2) = 2,4,6…98,100 - Third arg is the step.

  • np.linspace(start,end,[qty]) - evenly spaced values, including both ends. Third arg is the number of values; qty defaults to 50.

  • np.full( (x,y), val ) - fills an array of specified size with val. Also if full is not there, try fill like this: a=np.empty(n);a.fill(val)

  • np.eye(n) - Identity matrix (array really) of given size. Same as np.identity(n).

  • np.random.seed(whatever) - allows controlled repeats of random experiments.

  • np.random.random( (x,y…) ) - makes an array of specified size with random values.

  • np.random.binomial(k,p,size=n) - Create a set from a binomial distribution. A handy one is np.random.binomial(1,.5,size=20) which makes 20 random 1s or 0s. Good for things like dropout layers.
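Newer NumPy (1.17+) also offers a Generator object from np.random.default_rng(), which keeps its own seeded state instead of the global one. A sketch of similar operations with it (the seed 42 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)          # seeded: reruns give the same numbers
x = rng.random((2, 3))                   # floats in [0.0, 1.0), shape (2, 3)
mask = rng.binomial(1, 0.5, size=20)     # twenty random 0s and 1s
ints = rng.integers(0, 10, size=5)       # 0 <= n < 10

# Same seed, same stream.
again = np.random.default_rng(42).random((2, 3))
print(np.array_equal(x, again))          # True
```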

Arithmetic

Element by element
  • np.add(a,b) - two arguments only. Adds corresponding elements of two arrays. Also has the + operator overloaded.

  • np.subtract(a,b) - similar to np.add. -

  • np.multiply(a,b) - Simple multiplication of each corresponding element. *

  • np.divide(a,b) - Like multiply, but dividing. /

  • np.remainder(a,b) - Same as np.mod. Also %.

  • np.exp(n) - Raise e (2.71828182846) to the power of each element in n.

  • np.sqrt(n)

  • np.sin(rad) - Sine of radians for each element.

  • np.arctan2(y,x) - Normal trig angle finding.

  • np.cos(rad) - Cosine of radians for each element.

  • np.log(n) - Natural log for each element.

  • np.dot(a,b) - Dot product (a matrix product for 2-D arrays). Also a.dot(b) format, and a @ b for matrix multiplication.

Listwise
  • np.sum(a) or a.sum() - Adds up contents. Even adds nested values unless an axis=n is included to sum along just that dimension.

  • np.min(a)

  • np.max(a) - Same as np.amax(), both of which return the maximum element of a single array. Can supply an optional axis=0 which gets confusing fast. See this.

  • np.maximum(a,b) - Unlike np.max() mentioned above, this one takes two np.array arguments and does a listwise max check. np.maximum(np.array([1,2,3,4,5]),np.array([5,4,3,2,1])) produces an array of 5, 4, 3, 4, 5. It does accept a 3rd argument, but that is out, an array to write the results into, which is why it seemed to be ignored in my experiments. np.max() is essentially np.maximum.reduce().

  • np.histogram(a,bins=b,range=(0,255)) - range defaults to a.min() and a.max(). Returns counts in bins and bin edges (fenceposts).

    np.histogram(np.array([1,2,2,3,3,3]),bins=4, range=np.array([0,4]))
    (array([0, 1, 2, 3]), array([0., 1., 2., 3., 4.]))
  • np.argmax(a) - returns the index of the maximum value’s position (the first one if there is a tie). Good for finding the peak locations of a histogram, for example.

  • np.argmin(a) - Similar to argmax.

  • np.nonzero(a) - returns a tuple of index arrays (one per dimension) where nonzero values occur. a=np.hstack( (np.arange(3),np.arange(3) ) ) ; (a==1).nonzero() produces (array([1, 4]),).

  • np.mean(a) - average

  • np.median(a)

  • np.std(a) - standard deviation. Or without np: math.sqrt(sum((x-mean)**2 for x in l)/len(l))

  • np.corrcoef(a,y=b) - Pearson correlation coefficient. Returns a 2x2 correlation matrix; the coefficient between a and b is at [0,1]. a and b must have the same length.

  • np.logical_or(a,b) - Or.

  • np.logical_and(a,b) - And.

  • np.logical_not(a) - Not. Takes a single array.

  • np.equal(a,b) - == - listwise, returns array of bools.

  • np.array_equal(a,b) - True or False if the whole arrays are identical.

Also a[a<2] gives the elements of a that are less than 2 (indexing with a boolean mask).
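A few of those listwise functions on a small made-up array:

```python
import numpy as np

a = np.array([3, 7, 1, 7, 2])
print(np.max(a), np.argmax(a))   # 7 1  (argmax reports the first peak)
print(np.maximum(a, 4))          # [4 7 4 7 4]: element-wise, the scalar broadcasts
print(np.nonzero(a == 7))        # (array([1, 3]),)
print(a[a < 3])                  # [1 2]: boolean mask selects elements

counts, edges = np.histogram([1, 2, 2, 3, 3, 3], bins=4, range=(0, 4))
print(counts, edges)             # [0 1 2 3] [0. 1. 2. 3. 4.]

r = np.corrcoef([1, 2, 3], [2, 4, 7])[0, 1]   # pull the coefficient out of the 2x2 matrix
```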

Fancy
  • np.polyfit(x,y,degree) - So with x 0 to 5 and y being x^2, like this np.polyfit(np.arange(6),np.arange(6)*np.arange(6),2), produces approximately array([1., 0., 0.]) (tiny floating point noise aside) since this is y= 1*x^2 + 0*x + 0. This is also very useful to calculate slopes of arbitrary data (using a degree of 1).

  • np.cumsum(a) - changes (1,2,3,4) to (1,3,6,10). Cumulative sum. The result has the same length as the input.

  • np.convolve(a,b) - Discrete convolution. If b is longer than a, the two are swapped automatically. Think of a function C(t) where t is an offset (time, but whatever): b is reversed and slid along a by t places, the values are multiplied wherever the two overlap, the products are summed, and that sum is C(t). The mode sets which offsets come back: "full" (the default) gives every offset with any overlap, "same" gives an output the size of the longer input, and "valid" returns only the offsets where the two fully overlap.

    np.convolve(np.ones(5),np.ones(5))
    array([1., 2., 3., 4., 5., 4., 3., 2., 1.])
    np.convolve(np.ones(5),np.ones(5),mode="same")
    array([3., 4., 5., 4., 3.])
    np.convolve(np.ones(3),np.ones(2),mode="valid")
    array([2., 2.])

    Here’s an example demonstrating that it’s a type of sum of products function. Imagine three $100 purchases in different states.

    tax,cost=np.array([1.08,1.03,1.04]),np.array([100,100,100])

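    The total spent can be computed like this. (Because convolution reverses one array, each cost is actually paired with a different state’s tax; the answer matches np.dot(cost,tax) here only because all the costs are equal.)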

    np.convolve(cost,tax,mode="valid")
    array([ 315.])

    See np.polymul() too. If you dare.

  • np.clip(a,min,max) - Ensures no element in a is less than min nor greater than max, resetting values to min and max as needed.

  • np.sort(a) - Returns a sorted copy and leaves a alone; a.sort() sorts in place and returns None. Use axis where needed.

  • np.flip(a,axis) - flipud is the same with axis=0, fliplr is the same with axis=1.

  • np.flipud(a) - Flips the array upside down (mirrors on a horizontal axis).

  • np.fliplr(a) - Flips the array left to right (mirrors on a vertical axis).

  • np.rot90(a) - Rotates matrix values counterclockwise by default. np.rot90( np.arange(4).reshape(2,2) ) = array([ [1, 3], [0, 2] ])

  • np.copy(a) - Copies the array data, so changes to the copy don’t affect a. For arrays of Python objects the copy is shallow.

  • np.transpose(a) - or a.T, transpose - makes 3x2 into 2x3.

  • np.ravel(a) - flattens. Not to be confused with tensorflow.contrib.layers.flatten.

  • np.reshape(a,(newx,newy)) - rearranges dimensions but keeps data. This example makes 3 sets of 2: np.arange(6).reshape(3,2) gives array([[0,1],[2,3],[4,5]]). One dimension argument can be -1, which tells NumPy to work out that dimension’s size from the array’s length and the other dimensions. Useful for unwrapping x,y images into vectors, e.g. np.array([[1,2],[3,4]]).reshape(-1) gives array([1,2,3,4]), same as reshape(4) but for when you don’t know the 4.

  • np.resize(a,(newx,newy)) - returns an array of the new shape, repeating a’s data cyclically if it needs more values. (The a.resize() method works in place and pads with zeros instead.)

  • np.mgrid= Fills multi dimensional arrays with puzzling sequences related to the arrays' dimensions. "dense mesh grid" np.mgrid[:2,:2] = array([ [ [0, 0], [1, 1] ], [ [0, 1], [0, 1] ] ]) Use with transpose to get coords for a grid pattern.

  • np.ogrid= Similar to mgrid but even weirder. "open mesh grid" np.ogrid[:2,:2] - [array([ [0], [1] ]), array([ [0, 1] ])]

  • np.unique(a) - returns the sorted unique elements, removing duplicates.

  • np.append(a,b) - Almost identical to concatenate but with syntax differences; without an axis argument both arrays are flattened first. Note that you don’t append [ [*] [*] [*] ] with a [*]. You need a [ [*] ]. See Growing Arrays below.

  • np.insert(a,pos,item) - inserts item at position of array a.

  • np.delete(a,[n]) - delete item n from array a. Not in place!

  • np.concatenate( (a,b) ) - [1,2,3] and [4,5] become [1,2,3,4,5].

  • np.c_[a,b] - stack by columns

  • np.column_stack( (a,b) ) - seems the same as np.c_

  • np.r_[a,b] - very similar to concatenate for some simple arrays. The r is for stacking by rows.

  • np.vstack( (a,b) ) - vertical stack. If a and b have shape (3,2) then (6,2) results. np.vstack( (np.arange(3),np.arange(3) ) ) produces array( [ [0, 1, 2], [0, 1, 2] ] ).

  • np.hstack( (a,b) ) - horizontal stack. If a and b have shape (3,2) then (3,4) results. np.hstack( (np.arange(3),np.arange(3) ) ) produces array([0, 1, 2, 0, 1, 2]).

  • np.hsplit(a,n) - makes a list of arrays broken as specified.

  • np.vsplit(a,n) - similar to hsplit but with different axis perspective.

  • np.dstack( (a,b) ) - if a and b’s shape is (3,2), this makes a shape (3,2,2). Imagine multiple 2d images now in an array (stack) indexable with another dimension.

  • np.where(x<128,x+1,-1) - returns a new array with x+1 wherever x is less than 128 and -1 everywhere else. x itself is unchanged.
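A few of the trickier ones above checked on tiny arrays:

```python
import numpy as np

a = np.array([3, 1, 2])
s = np.sort(a)          # sorted copy; a is still [3 1 2] here
a.sort()                # the method sorts in place and returns None

m = np.arange(6).reshape(3, -1)   # -1 is inferred as 2: shape (3, 2)
flat = m.reshape(-1)              # [0 1 2 3 4 5]
print(np.vstack((m, m)).shape, np.hstack((m, m)).shape, np.dstack((m, m)).shape)
# (6, 2) (3, 4) (3, 2, 2)

x = np.array([10, 200, 127, 128])
w = np.where(x < 128, x + 1, -1)   # [ 11  -1 128  -1]; x is unchanged
print(np.cumsum([1, 2, 3, 4]))     # [ 1  3  6 10]
print(np.convolve([1, 2, 3], [0, 1]))   # [0 1 2 3]: b reversed and slid along a
```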

Growing Arrays

The traditional idea with arrays is that you reserve the memory you need and that’s that. But sometimes you need to build an array up from smaller parts and it’s more convenient to increase its size than replace parts of it (e.g. you may not know the final size). This happened to me where I needed to read in a sequence of images and store the whole collection as an array (holding each image) of an array (holding each image’s row) of an array (holding each row’s column) of an array (holding each pixel’s RGB). Assume a collection of three 2x2 grayscale images.

ims= np.reshape(np.random.random(12),(3,2,2)) * 255
ims= ims.astype(np.uint8)
array([[[ 99,   5], [137,  73]], [[145, 124], [ 14,  36]],
       [[183,  78], [ 88,  82]]     ], dtype=uint8)

Now suppose you have a new image that you want to add.

i= (np.reshape(np.random.random(4),(2,2)) * 255).astype(np.uint8)
array([[155, 237], [160,  27]], dtype=uint8)

You might think that having something like [*] would be what you need to add to something like [ [*] [*] [*] [*] ] but in fact, you need something like [ [*] ]. So here’s what works.

i.shape
(2, 2)
i= i.reshape(1,2,2)
i.shape
(1, 2, 2)
np.append(ims,i,axis=0) # Not in place!
array([[[ 99,   5], [137,  73]], [[145, 124], [ 14,  36]],
       [[183,  78], [ 88,  82]], [[155, 237], [160,  27]]], dtype=uint8)
np.vstack((ims,i)) # Does the same thing. Note extra paren.
np.concatenate((ims,i)) # Does the same thing.
np.r_[ims,i] # Unbelievably, same thing.
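Each np.append (or vstack, concatenate) call copies the whole array, so adding one image at a time in a loop gets slower as the collection grows. A common alternative, sketched here with fake random images, is to collect the pieces in a plain Python list and build the array once with np.stack, which adds the new leading axis itself:

```python
import numpy as np

rng = np.random.default_rng(0)
images = []                                  # a plain list grows cheaply
for _ in range(4):
    im = (rng.random((2, 2)) * 255).astype(np.uint8)   # a fake 2x2 grayscale image
    images.append(im)

ims = np.stack(images)    # shape (4, 2, 2); no reshape(1,2,2) needed
print(ims.shape, ims.dtype)
```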

Sorting

Python sorting used to be kind of tricky since the sort function was something that was attached to a list object and sorted in place. That is still true. For example:

>>> a=[3,4,1,2,0]
>>> a.sort()
>>> a
[0, 1, 2, 3, 4]

This caused so much confusion that a new function was added to return a sorted version of the original list. This produces a new list and leaves the original one alone.

>>> a=[3,4,1,2,0]
>>> sorted(a)
[0, 1, 2, 3, 4]
>>> a
[3, 4, 1, 2, 0]

Complex Object Sorting

There are many fancy ways of sorting things. Often you have a list of lists and you want to sort by some item in the list. Here’s a list of tuples representing (model_number,score) which need to be sorted so that the top 5 scoring models are displayed.

for top5 in sorted(score_list,key=lambda x:x[1],reverse=True)[0:5]:
    print('#{}={:.3f}'.format(top5[0],top5[1]))

Strangely I haven’t found a cleaner way to do this. Here’s another more complicated example of a two level sort.

m=[ ['Cho Oyu',8188,1954], ['Everest',8848,1953], ['Kangchenjunga',8586,1955],
['K2',8611,1954], ['Lhotse',8516,1956], ['Makalu',8485,1955] ]
ms2= sorted(m, key=lambda x:x[1], reverse=True ) # Secondary key
ms1= sorted(ms2, key=lambda x:x[2]) # Primary key

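Here the result ms1 is sorted by date of ascent (earliest first) and then, if that is the same, by mountain height (highest first). This works because Python’s sort is stable: items with equal primary keys keep the order the secondary sort gave them. The results look like this: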

[['Everest', 8848, 1953], ['K2', 8611, 1954], ['Cho Oyu', 8188, 1954],
['Kangchenjunga', 8586, 1955], ['Makalu', 8485, 1955], ['Lhotse', 8516, 1956]]
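The same two-level sort can also be done in one pass with a tuple key (negating the number flips that level to descending), and heapq.nlargest finds a top N without sorting everything. A sketch on the same mountain data:

```python
import heapq
from operator import itemgetter

m = [['Cho Oyu', 8188, 1954], ['Everest', 8848, 1953], ['Kangchenjunga', 8586, 1955],
     ['K2', 8611, 1954], ['Lhotse', 8516, 1956], ['Makalu', 8485, 1955]]

# Year ascending, then height descending.
by_ascent = sorted(m, key=lambda x: (x[2], -x[1]))
print([x[0] for x in by_ascent])
# ['Everest', 'K2', 'Cho Oyu', 'Kangchenjunga', 'Makalu', 'Lhotse']

# Top 3 by height.
tallest = heapq.nlargest(3, m, key=itemgetter(1))
print([x[0] for x in tallest])   # ['Everest', 'K2', 'Kangchenjunga']
```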

Graphics

There are many options for getting Python to draw arbitrary things graphically.

Table 2. Python Modules Useful For Creating Graphics

Tool        Import    Package (1)          Notes
Tkinter     Tkinter   python-tk            De facto standard.
pyCairo     cairo     python-cairo (2)     Not the easiest to use.
PyX         pyx       python-pyx           Specializes in PostScript.
pyglet      pyglet    python-pyglet
pygame      pygame    python-pygame
wxPython    wx        python-wxtools       Major window tool kit.
PyQt        qt        python-qt3           Major window tool kit.
Pyside      qt        libpyside-dev        Another Qt binding lib.
PyGTK       gtk       python-gtk2          Major window tool kit.
PIL         PIL       python-imaging       Format filters mostly.

1. On Ubuntu 12.04.

2. Already installed on Ubuntu and CentOS.

I tend to often just write directly into PostScript.

Tkinter

Although Tkinter is not installed by default on many Linux systems, the rumor is that it is included with Python on other platforms. It is the official graphics toolkit for Python and is blessed by the language maintainers. If you just need to open a window on your screen and draw some stuff, say to plot some data, it is probably the easiest option (well, besides simple SVG). Here is a working example that does the minimum useful thing:

from tkinter import *  # Python 2 spelled it Tkinter.
c= Canvas(bg='white', height=1000, width=1000)
c.pack()
c.create_line(100,100,200,200)  # X1,Y1,X2,Y2
c.mainloop()  # Keeps the window open when run as a script.

Plotting Data Visualization Graphs

If you need to "graph" some data, Python can help. The main technique is to use matplotlib. Although a bit overly fancy and likely to spontaneously burst into a GUI, it is powerful and, in some modes, easy:

from pylab import *
x= [1,2,3]; y= [1,4,9]
plot(x,y)
# show()   # Use this for interactive goofing off.
savefig('./filename.png')

For more details, check out my complete notes on matplotlib.

Also, check out Pychart.

Plotting Graph Theory Graphs

See pydot which is the Python interface to the mighty Graphviz package.

Command Line Parsing

Best to use argparse.

getopt

The original way to parse options draws stylistic inspiration from the C version. Many languages (Bash, Perl) have such a thing and if you’re used to one of them, the Python version won’t be too complicated.

getopt_example.py
#!/usr/bin/python
# An example of how to parse options with the 'getopt' module.
import sys
import getopt

# Initialize help messages
options=           'Options:\n'
options= options + '  -a <alpha>   Set option alpha to a string. Default is "one".\n'
options= options + '  -b <beta>    Set option beta to a number. Default is 2.\n'
options= options + '  -h           Show this help.\n'
options= options + '  -v           Show current version.'
usage = 'Usage: %s [options] arguments\n' % sys.argv[0]
usage = usage + options

# Initialize defaults
alpha= "one"
beta= 2
version="v0.0-pre-alpha"

# Parse options
try:
    (opts, args) = getopt.getopt(sys.argv[1:], 'ha:b:v', ['help','alpha=','beta=','version'])
except getopt.GetoptError as why: # Python 3 syntax; the old comma form is Python 2 only.
    print('getopt error: %s\n%s' % (why, usage))
    sys.exit(-1)

try:
    for opt in opts:
        if opt[0] == '-h' or opt[0] == '--help':
            print(usage)
            sys.exit(0)
        if opt[0] == '-a' or opt[0] == '--alpha':
            alpha= opt[1]
        if opt[0] == '-b' or opt[0] == '--beta':
            beta= int(opt[1])
        if opt[0] == '-v' or opt[0] == '--version':
            print('%s %s' % (sys.argv[0], version))
            sys.exit(0)
except ValueError as why: # Raised by int() when -b is not a number.
    print('Bad parameter \'%s\' for option %s: %s\n%s' % (opt[1], opt[0], why, usage))
    sys.exit(-1)

if len(args) < 1:
    print('Insufficient number of arguments supplied\n%s' % usage)
    sys.exit(-1)

print('alpha=%s beta=%s' % (alpha, beta))
for (n,a) in enumerate(args):
    print('Argument %d: %s' % (n,a))
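To see exactly what getopt.getopt() hands back, it can be fed a hand-made argument list instead of sys.argv[1:]. The argument list below is made up for illustration; the option specification is the same one used in getopt_example.py.

```python
import getopt

# A made-up command line standing in for sys.argv[1:].
argv = ['-a', 'three', '--beta=5', 'file1', 'file2']

# Same specs as getopt_example.py: a trailing ':' (short) or '=' (long)
# marks an option that takes a value.
opts, args = getopt.getopt(argv, 'ha:b:v', ['help', 'alpha=', 'beta=', 'version'])
print(opts)   # [('-a', 'three'), ('--beta', '5')]
print(args)   # ['file1', 'file2']

# Unknown options raise getopt.GetoptError, which is what the example catches.
try:
    getopt.getopt(['-z'], 'ha:b:v')
except getopt.GetoptError as why:
    err_msg = str(why)
    print('getopt error:', err_msg)
```

Note that getopt.getopt() stops at the first non-option argument, so options have to come before the filenames; getopt.gnu_getopt() allows them to be mixed.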

argparse

There is a module called optparse which has been deprecated since Python version 2.7. In its place is the newer and pretty awesome argparse module, which has thorough official documentation. argparse has been in the standard library since Python 2.7 and 3.2, so unless you’re using an ancient system it is always available.

These are the main steps to using this module.

  • Import module.

  • Define a parser object.

  • Add arguments to the parser object.

  • Parse the command line with the parser object’s parse_args() method.

  • Use the parsed result.
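The five steps map onto code roughly like this. It is a minimal sketch; the name and count arguments are invented for the demo, and the argument list is passed explicitly so the result is predictable.

```python
import argparse                                                  # 1. Import module.

parser = argparse.ArgumentParser(description='Five step demo.')  # 2. Define a parser object.
parser.add_argument('name')                                      # 3. Add a positional argument...
parser.add_argument('-n', '--count', type=int, default=1)        #    ...and an optional one.

# 4. Parse. Passing a list is handy for experiments; with no argument,
#    parse_args() reads sys.argv[1:] instead.
args = parser.parse_args(['world', '-n', '3'])

for _ in range(args.count):                                      # 5. Use the parsed result.
    print('hello', args.name)
```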

When defining a parser object, you can use the following optional parameters.

  • description= - Shows up in automatically composed usage message.

  • prog= - Program name to show in the usage message instead of inferring it from sys.argv[0].

  • epilog= - Text at end of usage message.

When adding arguments you want the parser to look out for, start with either a name of the positional argument you want or a list of option strings. Then you can add some of these optional parameters to get the exact behavior you want.

  • name or flags - Either a name or a list of option strings, e.g. foo or -f, --foo.

  • action - The basic type of action to be taken when this argument is encountered at the command line.

    • store - The default action is to store the argument’s value.

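    • store_const - Store the value given by const=.

    • store_true - Store True (defaults to False). A special case of store_const.

    • store_false - Store False (defaults to True).

    • append - Append each occurrence’s value to a list, e.g. -I a -I b gives ['a', 'b'].

    • append_const - Append the value given by const= to a list.

    • count - Count occurrences. Useful for things like -vvv verbose levels.

    • help - Print the help message and exit. Usually automatic with -h.

    • version - Print the version and exit. Needs a version= keyword too.

    • extend - Like append, but each occurrence can bring several values (e.g. with nargs='+') and they all land in one flat list, so -f a b -f c gives ['a', 'b', 'c']. Python 3.8 and later.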

  • nargs - The number of command-line arguments that should be consumed.

    • N - Exact number of option arguments. Note that nargs=1 makes a list of one item.

    • ? - One or zero items. If absent, default is used; for an option given with no value, const is used.

    • * - All arguments are put into a list. List can be empty.

    • + - Same as * but with an error for none. Will even greedily pull from a previous optional argument if that avoids an error.

    • argparse.REMAINDER - All remaining arguments put in a list.

  • const - A constant value required by some action and nargs selections.

  • default - The value produced if the argument is absent from the command line.

  • type - The type to which the command-line argument should be converted.

  • choices - A container of the allowable values for the argument.

  • required - Whether or not the command-line option may be omitted (optionals only).

  • help - A brief description of what the argument does.

  • metavar - A name for the option argument in usage messages. So metavar="run" produces --x run instead of the default --x X.

  • dest - The name of the attribute to be added to the object returned by parse_args(). I feel like this one is very useful to properly organize variable names.
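Several of these action and nargs settings are easiest to understand by feeding a parser a made-up argument list and looking at the result. Every option name here is invented for the demo, and extend needs Python 3.8 or later.

```python
import argparse

parser = argparse.ArgumentParser(prog='demo')
parser.add_argument('-v', action='count', default=0)              # -vvv counts to 3
parser.add_argument('-I', action='append', dest='includes')        # repeatable option
parser.add_argument('--fast', action='store_const', const='turbo', default='normal')
parser.add_argument('--quiet', action='store_true')                # absent -> False
parser.add_argument('-t', '--tag', action='extend', nargs='+', default=[])
parser.add_argument('pair', nargs=2)                                # exactly two, as a list
parser.add_argument('maybe', nargs='?', default='none-given')       # zero or one

# Positionals first here, because nargs='+' on -t would otherwise
# greedily swallow any trailing words.
p = parser.parse_args(['p1', 'p2', '-vvv', '-I', 'a', '-I', 'b', '--fast',
                       '-t', 'x', 'y', '-t', 'z'])
print(p)
```

The greedy behavior mentioned in the comment is worth remembering: an option with nargs='+' or '*' keeps consuming words until it hits the next option string.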

argtest.py
#!/usr/bin/python3
import argparse
parser= argparse.ArgumentParser(prog="xyz",description="A demo of argparse.",epilog="Final notes.")
parser.add_argument('-e', '--easy',action="store_true") # Optional argument.
parser.add_argument('-x','--normal-value',type=int,metavar="X1",default=10,help="A number.",dest="norm_num")
parser.add_argument('A') # Positional argument. Required (because storing the arg is default, and must exist).
parser.add_argument('B',nargs="?") # Positional argument. Not required.
p= parser.parse_args()
print([p.easy, p.norm_num, p.A, p.B])

Here’s how that can be used. Note that the usage message shows the program name as xyz, not argtest.py (which is how it was really run), thanks to the prog= parameter when defining the parser.

$ ./argtest.py --help
usage: xyz [-h] [-e] [-x X1] A [B]

A demo of argparse.

positional arguments:
  A
  B

optional arguments:
  -h, --help            show this help message and exit
  -e, --easy
  -x X1, --normal-value X1
                        A number.

Final notes.
$ ./argtest.py
usage: xyz [-h] [-e] [-x X1] A [B]
xyz: error: the following arguments are required: A
$ ./argtest.py red
[False, 10, 'red', None]
$ ./argtest.py red blue
[False, 10, 'red', 'blue']
$ ./argtest.py -e red blue
[True, 10, 'red', 'blue']
$ ./argtest.py -e -x 99 red blue
[True, 99, 'red', 'blue']

Here’s another example showing how to use the parse object as a global variable neatly containing all the user’s preferences. All this stuff is appropriate for a global variable since the sys.argv input itself is global to any executed process.

My Typical Argument Parsing Design
import fileinput # Used by Do_The_Main_Thing() below.
Args= None # Global argument object containing user preferences.
def parse_options(): # Function to isolate this option parsing stuff.
    import argparse # Might as well import this here in case this never gets called.
    parser= argparse.ArgumentParser(description='sampleprog - Shows off argparse.')
    parser.add_argument('-q','--quiet',default=True,dest="VERBOSE",action="store_false",
        help='Suppress printing of published readings on stdout.') # Invert for normally quiet.
    parser.add_argument('-d','--debug',type=float,default=0,dest="DEBUG",
        help='Debug features. Look into `action="count"` too.')
    parser.add_argument('-R', '--red-dev',type=int,default=10,dest="RED_DEV",metavar="ID",
        help= 'the red device [10].')
    parser.add_argument('-H', '--hold-value',type=int,default=1,dest="HOLD_VALUE",metavar="VAL",
        help= 'Hold value. Careful not to conflict with "h" for "help".')
    parser.add_argument('--messy',type=str,default='/dev/shm/messy',dest="MESSY",
        help=argparse.SUPPRESS) # A hidden global option that the user doesn't usually care about.
    parser.add_argument('filelist',type=str,nargs="*",default=[],metavar="INPUT",
        help='Optional list of files or - for standard input. Empty also reads standard input.')
    return parser.parse_args()
def Do_The_Main_Thing():
    # Which can now use things like Args.DEBUG, Args.MESSY, etc.
    for l in fileinput.input(Args.filelist):
        pass # Work on each line of all files and standard input.
if __name__ == '__main__':
    Args= parse_options()
    Do_The_Main_Thing()

This technique also allows the user to read in N files, or standard input in the case of none, while keeping the ability to parse complex options.
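fileinput.input() is what makes the "N files or standard input" behavior work: it chains the named files together line by line, and an empty list (or a - entry) means standard input. A small sketch, with two throwaway temp files standing in for the user’s INPUT arguments:

```python
import fileinput
import os
import tempfile

# Two throwaway files standing in for the INPUT arguments a user might give.
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [('a.txt', 'one\ntwo\n'), ('b.txt', 'three\n')]:
    path = os.path.join(tmpdir, name)
    with open(path, 'w') as f:
        f.write(text)
    paths.append(path)

# Lines from all files arrive as one stream; filename() and filelineno()
# report where the current line came from.
seen = []
with fileinput.input(paths) as lines:
    for line in lines:
        seen.append((os.path.basename(lines.filename()), lines.filelineno(), line.rstrip('\n')))
        print(seen[-1])
```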

Here’s another example of how to use it. This example should be pretty much functionally equivalent to the getopt example above.

argparse_example.py
#!/usr/bin/python
import argparse

parser = argparse.ArgumentParser(description='A demonstration of argparse.')

parser.add_argument('-a', '--alpha', default='one', help= 'Set option alpha to a string.')
parser.add_argument('-b', '--beta', default=2, type=int, choices=[0,1], help= 'Set option beta to a binary digit.')
parser.add_argument('-v', '--version', action='version', help= 'Print the version.', version="v0.0-pre-alpha")
parser.add_argument('the_rest', metavar='file', type=str, nargs='+', help='One or more filenames.')

args= parser.parse_args()

print('alpha=%s beta=%d' % (args.alpha, args.beta))
print('Specified files: %s' % ', '.join(args.the_rest))

Argparse is powerful and can do weird things too. Here’s a stranger case where I needed two classes of arguments, each with an unknown number of values. At least one directory must be supplied for each type.

import argparse
parser= argparse.ArgumentParser(description='Vehicles and non-vehicles.')
parser.add_argument('-V','--vehicle',dest='V',required=True,
                    nargs='+',metavar="Vlist", type=str,
                    help='CSV list of vehicle directories')
parser.add_argument('-N','--nonvehicle',dest='N',required=True,
                    nargs='+',metavar="NVlist", type=str,
                    help='CSV list of non-vehicle directories')
args= parser.parse_args()

Run with something like this.

./vehicle_classify.py -V ../data/vehicles/v? -N ../data/non-vehicles/nv?

This produces something like this for args.V and args.N respectively.

['../data/vehicles/v1', '../data/vehicles/v2', '../data/vehicles/v3',
'../data/vehicles/v4', '../data/vehicles/v5']
['../data/non-vehicles/nv1', '../data/non-vehicles/nv2']

Sometimes your program is not misbehaving and the arguments parse just fine, but some later processing in your code suggests that the user is an idiot who needs to read the instructions. How can you print the automatically generated usage message on demand?

parser.print_help()
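Two related tools are worth knowing: format_help() returns the same text as a string, and parser.error() prints the short usage line plus your message and exits with status 2, just like a real argument error. A sketch; the --size option and its rule are made up, and SystemExit is caught only so the demo keeps running.

```python
import argparse

parser = argparse.ArgumentParser(prog='demo', description='Shows help on demand.')
parser.add_argument('--size', type=int, default=3)
args = parser.parse_args([])          # Empty list: pretend no options were given.

# format_help() returns what print_help() would print, as a string.
help_text = parser.format_help()

if args.size < 5:                     # Some later check the user flunked.
    try:
        # Prints usage and the message to stderr, then exits with status 2.
        parser.error('size must be at least 5')
    except SystemExit as e:           # Caught only so this demo keeps going.
        exit_code = e.code
        print('exit status would have been', exit_code)
```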

And here’s another tip for when you write carefully laid out descriptions and argparse overly helpfully reflows them. Here’s how to cure that.

usage= """This is a multi-line description.

   ./example.py [options]

This will not all be jumbled together if you use the following trick.
"""
parser= argparse.ArgumentParser(description=usage,formatter_class=argparse.RawTextHelpFormatter)
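The difference is easy to see side by side. This sketch compares the default formatter with argparse.RawDescriptionHelpFormatter, a sibling of RawTextHelpFormatter that keeps only the description and epilog raw while still wrapping per-argument help strings. The prog name and text are made up.

```python
import argparse

usage = """This is a multi-line description.

   ./example.py [options]

Keep this layout.
"""

# Default formatter: runs of whitespace are collapsed and the text re-wrapped.
plain = argparse.ArgumentParser(prog='example', description=usage)

# RawDescriptionHelpFormatter: description and epilog are printed as written.
raw = argparse.ArgumentParser(prog='example', description=usage,
                              formatter_class=argparse.RawDescriptionHelpFormatter)

print(plain.format_help())
print(raw.format_help())
```

Use RawTextHelpFormatter when the individual help= strings also need their line breaks preserved.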

Web Programming

Python is one of the premier languages for web-based programs. Here are some helpful techniques for web projects.

Simple Web Client

I often use wget; Apple people like curl. Python unsurprisingly has a perfectly good way to do simple web client downloads too.

import os
import urllib.request
URL,FILE= 'http://xed.ch/h/python.txt','/tmp/pyhelp'
if not os.path.isfile(FILE):
    print(f'Downloading {URL} and saving as {FILE}...')
    urllib.request.urlretrieve(URL, FILE) # Fetch URL and save it as FILE.
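If you want the page in memory rather than in a file, urllib.request.urlopen() returns a file-like response whose read() gives bytes. A sketch; fetch_text() is a made-up helper name, and to keep it runnable offline the demo reads a file:// URL, which urlopen also accepts.

```python
import pathlib
import tempfile
import urllib.request

def fetch_text(url, encoding='utf-8'):
    """Return the body at url as a string instead of saving it to a file."""
    with urllib.request.urlopen(url) as response:  # File-like response object.
        return response.read().decode(encoding)    # read() gives bytes.

# Offline demo: urlopen understands file:// URLs too.
# For real use, pass something like 'http://xed.ch/h/python.txt'.
tmp = pathlib.Path(tempfile.mkdtemp()) / 'page.txt'
tmp.write_text('hello from a file URL\n')
text = fetch_text(tmp.as_uri())
print(text, end='')
```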

cgitb Module

One of the best reasons to use Python for web projects is the cgitb module. This stands for CGI TraceBack and is a diagnostic tool to help you understand what might be going wrong with your Python script run over the web. The nice thing is that this is super easy to use and super useful when activated. Be aware that cgitb, along with the cgi module used below, was deprecated in Python 3.11 and removed in Python 3.13 (PEP 594), so these CGI examples need Python 3.12 or older. Here’s an example showing how to use it (simply import and enable it) and some faulty code which takes advantage of it:

cgitb Example
#!/usr/bin/python
import cgitb
cgitb.enable()
idontexist()

Putting this in a cgi-bin directory and typing its URL in a browser produces this very cool diagnostic (which in this case correctly notices that the function idontexist does not exist):

<type 'exceptions.NameError'>
Python 2.7.2: /usr/bin/python2.7
Sat Jun 30 12:50:21 2012

A problem occurred in a Python script. Here is the sequence of function calls leading up to the error, in the order they occurred.

 /var/www/fs/users/xed/cgi-bin/cgitest.py in ()
      2 import cgitb
      3 cgitb.enable()
=>    4 idontexist()
      5 
      6 #content_type= 'Content-type: text/html\n\n'
idontexist undefined

<type 'exceptions.NameError'>: name 'idontexist' is not defined
      args = ("name 'idontexist' is not defined",)
      message = "name 'idontexist' is not defined"

If you’re looking at the output without HTML rendering, you’ll also notice that this is tacked on to the previous HTML message for maximum intelligent utility:

<!-- The above is a description of an error in a Python program, formatted
     for a Web browser because the 'cgitb' module was enabled.  In case you
     are not reading this in a Web browser, here is the original traceback:

Traceback (most recent call last):
  File "/var/www/fs/users/xed/cgi-bin/cgitest.py", line 4, in <module>
    idontexist()
NameError: name 'idontexist' is not defined
-->

Note that this works on any Python program run over the web, not just ones that use CGI per se. It is advisable to comment out the enable line when your program is served live to the public to avoid any leaking of sensitive information such as how your code works. But other than that, use this early and often.

Content Type

Before generating any HTML, every web program will most likely need to send back the HTTP content type. It’s often useful to make a global variable of it.

Content Type Global Variable
content_type= 'Content-type: text/html\n\n'

HTML Generation

I personally hate Python code that is filled with HTML. HTML should be in HTML documents and Python should be programming. But sometimes they mix annoyingly. This throws off syntax highlighting and the wholesome goodness of Python’s formatting and style. Here is a technique I use in my Python code to completely obviate the need for any HTML.

This function can be imported into programs requiring the generation of HTML. It keeps HTML out of your Python code. It’s easier to type, easier to think about, and it doesn’t break syntax highlighting. When run as a standalone program, it prints a complete HTML document as a demonstration.

html_tagger.py
#!/usr/bin/python
def tag(tag, contents=None, attlist=None):
    """No HTML in my programs! This function functionalizes HTML tags.
    Example: tag('a','click here', {'href':'http://www.xed.ch'})
    Produces: <a href="http://www.xed.ch">click here</a>
     Param1= name of tag (table, img, body, etc)
     Param2= contents of tag <tag>This text</tag>
     Param3= dictionary of attributes {'alt':'[bullet]','height':'100'}
    """
    tagstring= "<"+tag
    if attlist:
        for A in attlist:
            V= attlist[A].replace('"','&quot;')
            attstring= ' '+A+'="'+V+'"'
            tagstring += attstring
    if contents:
        tagstring += ">\n"+contents.rstrip()+"\n</"+tag+">\n"
    else:
        tagstring += "/>\n"
    return tagstring

if __name__ == '__main__':
     Title= tag('head', tag('title', "A Test"))
     Text= tag('body', tag('p', "No html here. Just sensible code."))
     print(tag('html', Title + Text))
Output of html_tagger.py test routine
<html>
<head>
<title>
A Test
</title>
</head>
<body>
<p>
No html here. Just sensible code.
</p>
</body>
</html>

Web Programming Environment

The technique above is useful for generating web-based output. To process input coming from the web, the cgi module is helpful, but it is not magical. I think the best way to illustrate what it does is to not use it and see what that looks like.

Assuming helpers such as the tag() function defined above are in place, the following code is very illustrative:

Print CGI Program’s Entire Environment
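#!/usr/bin/python
import os
from html_tagger import tag # The tag() helper defined above.
content_type= 'Content-type: text/html\n\n'
# One <dt>/<dd> pair per environment variable, sorted by name.
vars= ''.join([tag('dt',k)+tag('dd',os.environ[k]) for k in sorted(os.environ.keys())])
print(content_type + (tag('html',tag('body',tag('dl',vars)))))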
Note
Now that you see how to do it yourself, don’t forget about import cgi; cgi.test() which when run as a single line program over a web interface produces similar and somewhat more comprehensive data about what’s going on.

When run, you get a list of the environment variables your CGI program knows about. Yours will differ, but it will likely include some of these:

DOCUMENT_ROOT         /var/www
GATEWAY_INTERFACE     CGI/1.1
HTTP_ACCEPT           text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
HTTP_ACCEPT_CHARSET   ISO-8859-1,utf-8;q=0.7,*;q=0.3
HTTP_ACCEPT_ENCODING  gzip,deflate,sdch
HTTP_ACCEPT_LANGUAGE  en-US,en;q=0.8,de;q=0.6,es;q=0.4
HTTP_CONNECTION       Keep-Alive
HTTP_COOKIE           v1=keyvaluepairs;v2=ofany;v3=cookiesthat;v4=yourbrowser;v5=offersthisdomain
HTTP_HOST             xed.ch
HTTP_USER_AGENT       Wget/ (linux-gnu)
PATH                  /bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
QUERY_STRING          a=a&b=simple&c=test
REMOTE_ADDR           192.168.0.10
REMOTE_PORT           48731
REQUEST_METHOD        GET
REQUEST_URI           /~xed/cgi-bin/cgitest.py?a=a&b=simple&c=test
SCRIPT_FILENAME       /var/www/fs/users/xed/cgi-bin/cgitest.py
SCRIPT_NAME           /~xed/cgi-bin/cgitest.py
SERVER_ADDR           192.168.0.99
SERVER_ADMIN          wwwadmin@xed.ch
SERVER_NAME           www.xed.ch
SERVER_PORT           80
SERVER_PROTOCOL       HTTP/1.1
SERVER_SIGNATURE      Apache Server at www.xed.ch Port 80
SERVER_SOFTWARE       Apache
UNIQUE_ID             T@9vm6nkPz0AFBbDuW0FAFAM

Plus any special variables your web server sets using Apache’s SetEnv directive will also be present.

Obviously if your program can print this stuff out, you have quite a bit of control over what is going on. This particular little program is quite useful to track down problems with path and environment issues as well as debugging more complicated or annoying details such as user agent settings for stupid web sites.

Two important variables to note for CGI programming are REQUEST_URI and QUERY_STRING. The first contains the path and query string of the request (everything in the URL after the host name), while the second contains just the part after the ? that is intended as input for this program. You can parse this directly yourself, and for very simple applications I think it is reasonable to do so.
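For the "parse it yourself" route, here is a sketch of both a do-it-yourself split and the standard library’s urllib.parse.parse_qs(), which also handles repeated keys and %xx/+ decoding. The helper names and the fake environment dict are made up for illustration.

```python
import os
from urllib.parse import parse_qs

def query_params(environ=os.environ):
    """QUERY_STRING like 'a=a&b=simple&c=test' -> dict of lists (repeats kept)."""
    return parse_qs(environ.get('QUERY_STRING', ''))

def query_params_by_hand(environ=os.environ):
    """Do-it-yourself version: fine for simple input, but no %xx decoding
    and a repeated key silently keeps only its last value."""
    pairs = [p.split('=', 1) for p in environ.get('QUERY_STRING', '').split('&') if '=' in p]
    return {k: v for k, v in pairs}

fake_env = {'QUERY_STRING': 'a=a&b=simple&c=test&b=again'}
print(query_params(fake_env))          # {'a': ['a'], 'b': ['simple', 'again'], 'c': ['test']}
print(query_params_by_hand(fake_env))  # {'a': 'a', 'b': 'again', 'c': 'test'}
```

Since urllib.parse is not part of the removed cgi module, this approach keeps working on current Python versions.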

When the number and complexity of the variables your program wishes to define from the QUERY_STRING becomes more involved, then it is sensible to use the cgi module. The point of showing how things would work without it is to illustrate that it’s not absolutely critical (and sometimes not even especially helpful) to use it.

This exercise also indicates how one might test CGI programs without using a web server at all. Since all that is really going on is that the web server is simply setting some variables, you can explicitly set them on the command line to test things. Here’s an example:

$ QUERY_STRING='a=a&b=simple&c=test' mycgiprogram.cgi
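The same trick can be automated, say in a test script, by running the CGI program through subprocess with a hand-built environment. A sketch; the stand-in script and its name mycgiprogram.py are hypothetical, so substitute your own program.

```python
import os
import subprocess
import sys
import tempfile

# A tiny stand-in CGI program (hypothetical; use your real script instead).
cgi_source = '''
import os
print("Content-type: text/plain")
print()
print("QUERY_STRING=" + os.environ.get("QUERY_STRING", ""))
'''
script = os.path.join(tempfile.mkdtemp(), 'mycgiprogram.py')
with open(script, 'w') as f:
    f.write(cgi_source)

# Play the web server: set the CGI variables, run the program, capture stdout.
env = dict(os.environ, QUERY_STRING='a=a&b=simple&c=test',
           REQUEST_METHOD='GET', GATEWAY_INTERFACE='CGI/1.1')
result = subprocess.run([sys.executable, script], env=env,
                        capture_output=True, text=True, check=True)
print(result.stdout)
```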

cgi Module

Here is an example of a complete form processing program showing many different kinds of form elements. This program shows a form and if submitted shows the data submitted and a new form to repeat the process.

Full CGI Form Example
#!/usr/bin/python
import cgi
from html_tagger import tag

content_type= 'Content-type: text/html\n\n'
br= '<br/>'

def generate_form():
    f= list()
    f.append( 'Username:' + tag('input', '', {'type':'text', 'name':'uid'}) +br )
    f.append( 'Password:' + tag('input', '', {'type':'password', 'name':'pwd'}) +br )
    f.append( tag('input',None, {'type':'radio','name':'hyp','value':'1'}) + 'True' )
    f.append( tag('input',None, {'type':'radio','name':'hyp','value':'0'}) + 'False' +br )
    f.append( tag('input',None, {'type':'checkbox','name':'metal','value':'cu'}) + 'copper' )
    f.append( tag('input',None, {'type':'checkbox','name':'metal','value':'fe'}) + 'iron' +br )
    f.append( tag('select',
                       tag('option','chromium',{'value':'cr'})+
                       tag('option','manganese',{'value':'mn'})+
                       tag('option','nickel',{'value':'ni'})+
                       tag('option','zinc',{'value':'zn'})
                   ,{'name':'alloy'}) +br )
    f.append( tag('textarea','Edit this text!',{'rows':'5','cols':'40','name':'essay'}) +br )
    f.append( tag('input',None,{'type':'submit','value':'Do This Form'}) )
    return tag('form', ''.join(f),
                 {'name':'input','action':'./cgitest.py','method':'get'})

def display_data(myf):
    c=   tag('tr',tag('td',"Name:")+tag('td', myf["uid"].value))
    c += tag('tr',tag('td',"Password:")+tag('td', myf["pwd"].value))
    c += tag('tr',tag('td',"Hypothesis:")+tag('td', myf["hyp"].value))
    c += tag('tr',tag('td',"Metal:")+tag('td', ','.join(myf.getlist('metal'))))
    c += tag('tr',tag('td',"Alloy:")+tag('td', ','.join(myf.getlist('alloy'))))
    c += tag('tr',tag('td',"Essay:")+tag('td', ','.join(myf.getlist('essay'))))
    return tag('table',c,{'border':'1'})

form= cgi.FieldStorage()
if 'uid' not in form or 'pwd' not in form or 'hyp' not in form:
    content= tag('h4','A Form') + "Please fill in the user and password fields." + generate_form()
else:
    content= display_data(form) + br + generate_form()

print(content_type)
print(tag('html',tag('body',content)))
Note
For composing HTML output this program uses the tag function defined above. Also, include cgitb as described above if there are problems you wish to debug.

The output of this CGI programming example is the following:

A Form

Please fill in the user and password fields.
Username:
Password:
True False
copper iron


Note
If you’re seeing this in a web browser, it will look functional, but obviously it’s not. It’s just the HTML that the previous program generated (minus html and body tags).

One Program Executable On The Command Line And Over The Web

Here’s a technique I’ve used for programs that I want to work with a text menu at the console and also to automatically support a web interface when run remotely from a web browser.

if __name__ == "__main__":
    if os.getuid() == 48: # apache:x:48:48:Apache:/var/www:/sbin/nologin
        html_version()
    else:
        while True:
            text_version_menu()
Note
There may be better indicators of whether we’re coming from a browser or not. See the cgi.test() above for possibilities. Perhaps REQUEST_URI.
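One candidate indicator: the CGI/1.1 specification (RFC 3875) has the server set GATEWAY_INTERFACE for every CGI request, so its presence is a reasonable hint that a web server started the program. A sketch under that assumption; html_version() and text_version_menu() are stubs standing in for the functions in the snippet above.

```python
import os

def running_under_cgi(environ=os.environ):
    """Guess whether a web server launched us as a CGI program."""
    # CGI/1.1 servers set GATEWAY_INTERFACE (e.g. 'CGI/1.1'); a shell normally doesn't.
    return environ.get('GATEWAY_INTERFACE', '').startswith('CGI/')

def html_version():        # Stub standing in for the real web interface.
    return 'web'

def text_version_menu():   # Stub standing in for the real console menu.
    return 'console'

def main(environ=os.environ):
    return html_version() if running_under_cgi(environ) else text_version_menu()

if __name__ == '__main__':
    print(main())
```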

Upload A File

Here’s a short routine that does nothing but allow one to upload a file to the server it’s run on. I found this handy to allow me to simply upload photos off my stupid Android phone to my own server. It nicely demonstrates how to handle POST methods and file uploads using the cgi module.

up.py
#!/usr/bin/python
import os
import cgi
from html_tagger import tag

content_type= 'Content-type: text/html\n\n'
form = cgi.FieldStorage()

if not form:
    acturl= "./up.py"
    ff= tag('input','',{'type':'file','name':'filename'}) + tag('input','',{'type':'submit'})
    f= tag('form',ff, {'action':acturl, 'method':'POST', 'enctype':'multipart/form-data'})
    H= tag('head', tag('title', "Uploader"))
    B= tag('body', tag('p', f))
    print(content_type + tag('html', H + B))
#elif form.has_key("filename"):
elif 'filename' in form:
    item= form["filename"]
    if item.file:
        data= item.file.read()
        t= os.path.basename(item.filename)
        with open("/home/xed/www/up/"+t,'wb') as FILE: # Binary mode: uploads arrive as bytes.
            FILE.write(data)
        msg= "Success! "
    else:
        msg= "Fail."

    H= tag('head', tag('title', "Uploader"))
    B= tag('body', tag('p', msg + tag('a','Another?',{'href':'./up.py'})))
    print(content_type + tag('html', H + B))
Note
The html tagging function defined above is assumed here.
Warning
This program would be best limited to personal use and is not especially secure.

Output Non-Text

Often you want your CGI program not just to compose some HTML for your web clients but also to produce custom graphics, for example a plot of something that is very up to date. The naive way to do this is to have the program generate a plot file, store it somewhere, and then send out HTML that can find it. But this lets bad guys fill up your drive with such plots. Better to never have the plot stick around.

Here is an example of how to have a plot dynamically sent out to a web client.

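#!/usr/bin/python3
import sys
import matplotlib
matplotlib.use('Agg') # No display on a web server, so render off-screen.
import matplotlib.pyplot as plt
print("Content-type: image/png\n")
sys.stdout.flush() # Send the header before switching to binary output.
plt.plot([1,2,4])
plt.savefig(sys.stdout.buffer,format='png') # PNG is bytes; Python 3 needs the binary buffer.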

Then on the client I can do this.

$ wget -qO test.png http://xed.ch/cgi-bin/mkpng.py
$ identify test.png
test.png PNG 800x600 800x600+0+0 8-bit sRGB 19.1KB 0.000u 0:00.000

This means that you can use the following in any subsequent HTML code that features the image.

<img src="http://xed.ch/cgi-bin/mkpng.py">

Of course the program that generates the image could be the same one that generates the HTML, choosing which part to generate based on CGI variables.

HTTP Server

Obviously Python serves web pages. Of course it does. And guides spacecraft and composes hit songs. Python does everything! But I try to strongly avoid import mysteriousmagicmodule. Amazingly, Python’s ability to run an HTTP server is not a fancy external module — it is now a core part of Python! The module is http.server. Check out the official documentation.

In the past, Python was a perfectly good way to do things over the web — you simply let your web server know which files were CGI scripts. But now, you can do all of that as before, but you don’t need the web server. So is Apache configured properly? Who cares? Skip it entirely!

Here’s a demonstration program that listens to port 8888 for web connections and adjusts a variable. Again I’m assuming that my HTML tag() function — described previously — is present. This program will allow you to control something over a web interface.

from http.server import HTTPServer, BaseHTTPRequestHandler
from html_tagger import tag # The tag() helper described above.
V= 0 # Global value of interest that this web interface exists to adjust.
URL,PORT= 'http://192.168.1.251',8888
def form_response(): # == Create An HTML Form Response
    hp= tag('input',None,{'type':'hidden','name':'r','value':'-.5'}) # Note: Submit actions can't have CGI parameters!
    hs= tag('input',None,{'type':'hidden','name':'r','value':'.5'})  #       Hidden form elements send such intent.
    bp= tag('input',None,{'style':'width:200px;height:50px;font-size:40px;background:red','type':'submit','value':'Left'})
    bs= tag('input',None,{'style':'width:200px;height:50px;font-size:40px;background:green','type':'submit','value':'Right'})
    fp= tag('form', hp+bp, {'style':'display:inline;','name':'input','action':f'{URL}:{PORT}','method':'get'})
    fs= tag('form', hs+bs, {'style':'display:inline;','name':'input','action':f'{URL}:{PORT}','method':'get'})
    stats= tag('div',f'v:{V}',{'style':'font-size:40px'})
    phonefix= '<meta name="viewport" content="width=device-width, initial-scale=1.0">'
    return tag('html',tag('body',phonefix+fp+fs+stats)).encode('ascii')
class SimpleHTTPRequestHandler(BaseHTTPRequestHandler): # == Handle HTTP Requests
    def do_GET(self):
        global V
        self.send_response(200) # Tell client a good status code.
        # `self.requestline` produces something like: 'GET / HTTP/1.1' or 'GET /?r=3 HTTP/1.1'
        cgistuff= self.requestline.split(' ')[1] # Custom input handling; using `import cgi` is sane too.
        if "r=" in cgistuff[2:]: # r is the CGI variable in the URL and hidden form elements.
            V+= float(cgistuff[4:]) # Adjust the value of interest.
        self.send_header('Content-type','text/html') # With the HTTPServer, better to not DIY.
        self.end_headers() # Probably whatever weird double returns are needed after headers.
        self.wfile.write(form_response()) # Send this program's actual content.
httpd= HTTPServer(('',PORT),SimpleHTTPRequestHandler) # First arg can be address to listen on.
httpd.serve_forever() # == Start The Server

When you run this program it will sit there, waiting for connection attempts — printing to the console as they occur.

Then a browser will see something like this.

$ wget -qO- 127.0.0.1:8888
<html>
<body>
<meta name="viewport" content="width=device-width, initial-scale=1.0"><form style="display:inline;" name="input" action="http://192.168.1.251:8888" method="get">
<input type="hidden" name="r" value="-.5"/>
<input style="width:200px;height:50px;font-size:40px;background:red" type="submit" value="Left"/>
</form>
<form style="display:inline;" name="input" action="http://192.168.1.251:8888" method="get">
<input type="hidden" name="r" value=".5"/>
<input style="width:200px;height:50px;font-size:40px;background:green" type="submit" value="Right"/>
</form>
<div style="font-size:40px">
v:12.5
</div>
</body>
</html>

This renders as two big submit buttons for lowering and raising the value of the variable the program is interested in. They’re labeled Left/Right because I used something like this to make a robotic control that I could steer with my phone. You can also just request something like http://127.0.0.1:8888?r=-5, which will subtract 5 from the important value.
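For poking at this pattern without a browser or the tag() helper, here is a stripped-down sketch. The handler parses the query with urllib.parse rather than slicing requestline, the server runs in a background thread on a port the OS picks, and urllib plays the part of the phone. AdjustHandler and the plain-text v: response are simplifications made up for the demo, not the robot program.

```python
import threading
import urllib.request
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.parse import urlsplit, parse_qs

V = 0.0  # Value the web interface adjusts.

class AdjustHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global V
        query = parse_qs(urlsplit(self.path).query)   # '/?r=-5' -> {'r': ['-5']}
        for r in query.get('r', []):
            V += float(r)
        body = f'v:{V}\n'.encode('ascii')
        self.send_response(200)
        self.send_header('Content-type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # Keep the console quiet for this demo.
        pass

httpd = HTTPServer(('127.0.0.1', 0), AdjustHandler)   # Port 0: let the OS choose.
port = httpd.server_address[1]
threading.Thread(target=httpd.serve_forever, daemon=True).start()

for q in ('r=.5', 'r=.5', 'r=-5'):
    with urllib.request.urlopen(f'http://127.0.0.1:{port}/?{q}') as resp:
        print(resp.read().decode().strip())   # v:0.5, then v:1.0, then v:-4.0

httpd.shutdown()
httpd.server_close()
```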

Beyond Python

Tools to help Python be even more awesome than it normally is:

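  • Jython - Run Python in the theoretically ubiquitous and annoyingly powerful JVM. (Still Python 2.7 only.)

  • PyPy - Or use a different implementation with its own JIT compiler. Often much faster for long-running pure-Python code, and it might use less memory.

  • Nuitka - A straight up Python compiler (it translates Python to C).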

  • SWIG - Wrap C code into Python modules.

Socket Programming

Creating client/server connections with internet sockets is pretty easy with Python. A good example of a full practical socket server is my ISBD server. Here is a generic TCP server that covers the main functionality one would require from the network in order to implement something like a web server.

Python Socket Server
#!/usr/bin/python3
# A Sample TCP server demonstrating simple socket programming.
# This simply echoes what is sent back to the client.

# Usage: Run this program and then connect with
#     echo "The Message" | nc localhost 6660
# What PID is listening?
#     lsof -i :6660

# Official Socket Documentation -
#    * https://docs.python.org/3/library/socket.html
# Notes about backlog parameter of `listen()` function.
#    * http://irrlab.com/2015/03/02/how-tcp-backlog-works-in-linux/
#    * https://veithen.github.io/2014/01/01/how-tcp-backlog-works-in-linux.html

import socket
import sys
import threading

HOST= '' # Server interface to bind to. Blank is `INADDR_ANY`.
PORT= 6660
BACKLOG= 200 # Max connections on accept queue. See notes.

# AF_INET is Address Family IPv4
# SOCK_STREAM is TCP protocol (SOCK_DGRAM for UDP)
s= socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print('Socket Creation OK')

# == Connection Handling ==
def servicethread(connection):
    READ_BYTES= 24
    connection.send(b'This is the server. Type something now.\n') # Sockets send bytes in Python 3.
    while True:
        data= connection.recv(READ_BYTES)
        if not data: # Empty read means the client closed the connection.
            break
        connection.sendall(b'You said: ' + data)
    connection.close()

# == Binding ==
try:
    s.bind((HOST,PORT))
except OSError as msg: # socket.error is an alias of OSError in Python 3.
    print('ERROR: Bind failed! %s (error #%s)' % (msg.strerror, msg.errno))
    sys.exit(1)
print('Socket Binding OK')

# == Listening ==
s.listen(BACKLOG)
print('Socket Listening OK')

# == Handle Client Transactions ==
while True:
    conn,addr= s.accept()
    print('Connected to %s:%d' % addr)
    # One thread per client; daemon threads don't keep the process alive on exit.
    threading.Thread(target=servicethread, args=(conn,), daemon=True).start()
s.close()

Running the program starts the server listening.

$ ./sockettest.py
Socket Creation OK
Socket Binding OK
Socket Listening OK

From another terminal (or another computer if you like) you can check up on it.

$ nmap localhost -p 6660 | sed -n /PORT/,+1p
PORT     STATE SERVICE
6660/tcp open  unknown

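Using nmap has consequences for the server. Here are the server’s resulting messages. (This output comes from running the program under Python 2; under Python 3 the traceback is worded a little differently.)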

Connected to 127.0.0.1:51783
Unhandled exception in thread started by <function servicethread at 0x7f62f336e5f0>
Traceback (most recent call last):
  File "./sockettest.py", line 32, in servicethread
    connection.send('This is the server. Type something now.\n')
socket.error: [Errno 104] Connection reset by peer

You could handle this error (when the client abruptly aborts) more smoothly if you like, but the server keeps running just fine regardless.

Additional activity from the client, a classic netcat test, looks like this.

$ echo testmessage | nc localhost 6660
This is the server. Type something now.
You said: testmessage

Or back to Python, this is the simplest socket client.

Python Socket Client
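import socket
con= ('isbdserver.example.edu',10800)
entire_message= b'Whatever payload the server expects.\n' # Placeholder; must be bytes in Python 3.
try:
    s= socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(con)
    s.sendall(entire_message) # Unlike send(), sendall() keeps going until every byte is out.
    s.close()
except socket.error as msg:
    # print() stands in for the original program's log(); use attributes, not msg[0]/msg[1].
    print('ERROR: ISBD socket client problem! %s (error #%s)' % (msg.strerror, msg.errno))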
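To reproduce the netcat test from Python against the server above, a client has to read as well as write. In this sketch (talk() is a hypothetical helper, and localhost port 6660 is assumed), shutdown(SHUT_WR) plays the role of nc reaching the end of its input. It tells the server no more data is coming, so the server's loop ends and the client can read until the connection closes.

```python
import socket

def talk(message, host='localhost', port=6660, timeout=5.0):
    """Send one bytes message to the demo server and return (greeting, reply)."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        greeting = s.recv(1024)
        s.sendall(message)
        s.shutdown(socket.SHUT_WR)   # Like nc at end of input: we are done sending.
        chunks = []
        while True:
            chunk = s.recv(1024)
            if not chunk:            # Server closed its side.
                break
            chunks.append(chunk)
    return greeting, b''.join(chunks)
```

Usage: greeting, reply = talk(b'testmessage\n') should give back the same two lines the nc test printed.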

Here’s another example of the socket library used to trigger a Wake On LAN (WOL) feature. Note that it probably doesn’t work, but it’s a good starting point for further research into the topic.

Wake On LAN Example
#!/usr/bin/python
# Wake-On-LAN
# From Wikipedia: Magic packet
#  The magic packet is a broadcast frame containing anywhere within
#  its payload 6 bytes of 255 (all bits set to the on position) which
#  has the decimal representation of: 255 255 255 255 255 255 (also FF
#  FF FF FF FF FF in hexadecimal or 11111111 11111111 11111111
#  11111111 11111111 11111111 in binary), followed by sixteen
#  repetitions of the target computer's 48-bit MAC address. Since the
#  magic packet is only scanned for the string above, and not actually
#  parsed by a full protocol stack, it may be sent as a broadcast
#  packet of any network- and transport-layer protocol. It is
#  typically sent as a UDP datagram to port 0, 7 or 9, or, in former
#  times, as an IPX packet.

import struct, socket

def WakeOnLan(ethernet_address):
    # Construct a six-byte hardware address from 'aa:bb:cc:dd:ee:ff'.
    addr_byte = ethernet_address.split(':')
    hw_addr = struct.pack('BBBBBB', *[int(b, 16) for b in addr_byte])
    # Build the Wake-On-LAN "Magic Packet" (bytes, not str, so it works in Python 3)...
    msg = b'\xff' * 6 + hw_addr * 16
    # ...and send it to the broadcast address using UDP
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    s.sendto(msg, ('<broadcast>', 9))
    s.close()

WakeOnLan('e0:cb:4e:56:74:73')  # The MAC of the host to wake.
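If you want to check the packet without a sleeping machine handy, it helps to separate building the packet from sending it. Here is a Python 3 sketch along those lines. The names magic_packet() and wake() are hypothetical, and it assumes the plain 255.255.255.255 broadcast address and UDP port 9 reach the target on your LAN.

```python
import socket

def magic_packet(mac):
    """Return the 102-byte WOL payload for a MAC like 'e0:cb:4e:56:74:73'."""
    hw_addr = bytes.fromhex(mac.replace(':', '').replace('-', ''))
    if len(hw_addr) != 6:
        raise ValueError('expected a 48-bit MAC address, got %r' % mac)
    # Six 0xFF bytes followed by sixteen copies of the hardware address.
    return b'\xff' * 6 + hw_addr * 16

def wake(mac, broadcast='255.255.255.255', port=9):
    """Broadcast the magic packet as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))
```

Usage: wake('e0:cb:4e:56:74:73'). Dash-separated MACs such as E0-CB-4E-56-74-73 work too, since bytes.fromhex() ignores case.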

Importing Modules

Importing modules can sometimes be slightly tricky if what you need isn’t already taken care of for you. (Usually it is, which is why this isn’t knowledge you use everyday.)

Here is a way to use modules in another directory by setting the PYTHONPATH environment variable.

$ mkdir A B
$ echo "print('This is Python - dir B')" > B/mymodule.py
$ echo "import mymodule; print('This is Python - dir A')" > A/myprog.py
$ python A/myprog.py
Traceback (most recent call last):
  File "A/myprog.py", line 1, in <module>
    import mymodule; print('This is Python - dir A')
ImportError: No module named mymodule
$ PYTHONPATH=./B:$PYTHONPATH python A/myprog.py
This is Python - dir B
This is Python - dir A

Here is another method which works without requiring an intervention from the execution environment.

import sys
sys.path.append('/tmp/B')
import mymodule
print('This is Python - dir A')

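The problem with this method is that a relative path such as '../B' is resolved against the current working directory, not the script’s own directory, so it only works when you happen to launch the program from the right place. You could have Python work out the path relative to the script before importing. For example, although this is straying into ugly territory, this works.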

import os
pwd= os.path.dirname(os.path.realpath(__file__))
parent= os.path.split(os.path.abspath(pwd))[0]
import sys
sys.path.append(parent+'/B')
import mymodule
print('This is Python - dir A')
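The same trick reads a little less ugly with pathlib (Python 3.4+). This is a sketch; add_sibling_dir() is a hypothetical helper, and 'B' is assumed to be a sibling of the script’s directory as in the example above.

```python
import sys
from pathlib import Path

def add_sibling_dir(script_file, sibling='B'):
    """Append <parent of the script's directory>/<sibling> to sys.path.

    The location is worked out from the script file itself, so it does not
    matter which directory the program is launched from.
    """
    target = Path(script_file).resolve().parent.parent / sibling
    if str(target) not in sys.path:
        sys.path.append(str(target))
    return target
```

In A/myprog.py you would call add_sibling_dir(__file__) and then import mymodule.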

If you just want to know if a module is available, this is a way to do a quick check. It also shows you exactly where the module really lives, which can be quite informative too. (__path__ only exists for packages; for a single-file module, print its __file__ instead.)

$ python3 -c 'import caffe; print(caffe.__path__)'
['/usr/lib/python3/dist-packages/caffe']
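From inside a program, importlib.util.find_spec() answers the same question without actually importing a top-level module. A small sketch; module_location() is a hypothetical helper name.

```python
import importlib.util

def module_location(name):
    """Return where `name` would be imported from, or None if it isn't importable."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Packages report their directories; plain modules have a single origin.
    if spec.submodule_search_locations:
        return list(spec.submodule_search_locations)
    return spec.origin   # A file path, or e.g. 'built-in' for sys.
```

For example, module_location('caffe') should print the same directory as the one-liner above on that machine, and None where caffe is missing.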

Packaging And Distribution

Python packaging is a nightmare. This is mostly due to so many competing ways to do the job. I personally avoid the topic to the greatest extent possible. I usually rely on my Linux distributions to do the proper thing or I put things in the PYTHONPATH explicitly myself.

This enumeration of project summaries is depressing and helpful. This basic package install guide is wholesome and official if you have to go messing with packages outside your OS distribution. Here’s a popular Stack Overflow discussion about Python packaging. Here are some official best practices for packaging and installation.

Here are some details that might be interesting.

Ok, it’s not just me! Here’s XKCD calling it like it is! Wow. Too true.

XKCD-1987

Hopefully this gets Pythoneers working on this mess. To be fair though, Python is a victim of its deserved success.

distutils

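Original Python packaging and distribution system. In the standard library for me (CentOS7). Note that distutils was deprecated in Python 3.10 and removed from the standard library in Python 3.12 (PEP 632).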

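Projects are built and installed by running a setup.py script with a command, for example:

python setup.py install

The source distributions it produces (python setup.py sdist) are typically .tar.gz files.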

from distutils.core import setup

setuptools

Third party system (not part of Python per se) built using distutils.

from setuptools import setup

Includes easy_install, which was widely used; it has since been deprecated and removed from recent setuptools releases in favor of pip.

Includes support for eggs which are a package format for distributing binary packages. This seems a bit mental since Python is good at compiling itself on the fly, but apparently waiting a minute for that to happen for monstrous overblown projects is too much for people. Somebody figured spending hours fussing with a fussy binary package was better.

sudo yum upgrade python-setuptools

distribute

A fork of setuptools that is actually installed under the name setuptools, which isn’t confusing at all! It replaces an existing genuine setuptools if one is already present. Apparently it had better support for v2 to v3 issues and probably more traction than its ancestor. It also includes an easy_install.

Some people believe that this project has been merged back into the original setuptools. Here is evidence that this was intended, and it did happen: Distribute was merged back into setuptools 0.7 in 2013. It is safe to ignore it now. Let’s just say that setuptools and distribute are very close relatives and maybe a case of dissociative identity disorder.

pip

This does not create packages. It is a system for downloading and installing Python distributions. This seems to replace easy_install. It can roll back a failed install attempt if it determines dependencies can’t be met. It can uninstall things. It does not use or install eggs. It doesn’t automatically update things so they won’t randomly break; apparently some of the other systems try to do that. Requires a packager (distutils,setuptools,distribute) because this is just an installer. The packages themselves are not its thing. Here is a justification for pip. Apparently this can try to compile things with a C compiler as part of package installation. This would of course be likely to fail on a bad OS like Windows. This might explain some bias for using easy_install on Windows. However, with wheel, these issues may be historical.

Normally used like this.

pip install <package>

It requires setuptools (which requires distutils).

This quote from the pip documentation just about sums it up for me.

Be cautious if you’re using a Python install that’s managed by your operating system or another package manager. get-pip.py does not coordinate with those tools, and may leave your system in an inconsistent state.

And to contradict that.

sudo yum install python-pip

In early 2020 I got this to work easily with Debian.

sudo apt install python-pip
pip install --upgrade pip

After that, plain pip seems to be pip3, with pip2 left as the strange way to reach Python 2 these days. If pip still has trouble (ImportError: No module named pip), try this:

sudo apt install python3-pip

Or install with get-pip.py, which can skip installing its usual companions with --no-wheel and --no-setuptools. A clean Python install from python.org (as opposed to a Linux distribution package) should already include pip, via the ensurepip module that ships with Python 3.4+ and 2.7.9+.

Once pip is happy, you can generally install things without too much fuss. For example.

pip install mingus
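Once things are installed, a script can ask which version of a distribution it is actually seeing. This sketch assumes Python 3.8+ for importlib.metadata; installed_version() is a hypothetical helper name. It takes the distribution name as pip knows it (what you passed to pip install), which is not always the same as the import name.

```python
from importlib import metadata  # Python 3.8+

def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Usage: installed_version('mingus') returns something like a version string after the install above, or None if pip put it somewhere this interpreter doesn’t look.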

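Sometimes you don’t want to "install" whatever it is but you do want the code; pip download <package> fetches the distribution files into the current directory without installing them. You can also see all zillion packages available to pip by looking at this URL.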

wget -qO- https://pypi.org/simple/ | grep href | wc -l
270894

Each package is a link and you can search for specific ones with grep. Here’s a demonstration with pymidi.

$ wget -qO- https://pypi.org/simple/ | grep pymidi
    <a href="/simple/ipymidicontrols/">ipymidicontrols</a>
    <a href="/simple/pymidi/">pymidi</a>

Then put https://pypi.org/simple/pymidi into the browser.
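The same grep can be done in Python. The simple index is a plain HTML page of links (the format is defined in PEP 503), so a regular expression over the link text is enough for a rough search. A sketch with hypothetical function names; downloading the full index pulls hundreds of thousands of links, so expect it to be slow.

```python
import re
import urllib.request

SIMPLE_INDEX = 'https://pypi.org/simple/'

def matching_projects(html, term):
    """Return project names from a simple-index page whose name contains `term`."""
    names = re.findall(r'<a href="[^"]*">([^<]+)</a>', html)
    return [name for name in names if term in name]

def fetch_index(url=SIMPLE_INDEX):
    """Download the whole index page as text (large!)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode('utf-8')
```

Usage: matching_projects(fetch_index(), 'pymidi') should list the same two names as the grep above.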

wheel

Installing pip will also install wheel, the tool for building wheels: zip-based archives (extension .whl) that are like eggs but with subtle differences. Apparently this is somewhat of a modern (2015+) thing, although the format itself was specified in PEP 427 back in 2012.

Of the name, they say "because newegg was taken" and "a container of cheese".

sudo yum install python-wheel python3-wheel
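Since a wheel is just a zip file, the standard zipfile module can peek inside one without installing it. A sketch, with wheel_summary() as a hypothetical name; it assumes the usual layout where the metadata lives in a <name>-<version>.dist-info/METADATA member.

```python
import zipfile

def wheel_summary(path):
    """Return (member names, METADATA text or None) for a .whl file."""
    with zipfile.ZipFile(path) as whl:
        names = whl.namelist()
        meta_name = next((n for n in names if n.endswith('.dist-info/METADATA')), None)
        meta = whl.read(meta_name).decode('utf-8') if meta_name else None
    return names, meta
```

Usage: names, meta = wheel_summary('some_package-1.0-py3-none-any.whl'), then print(meta) to see the name, version, and declared dependencies.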

distutils2

This topic is such a mess, why not scrap it all and start over with yet another attempt?

Does not use setup.py scripts. Instead it uses setup.cfg. Also uses the pysetup command which seems to try to replace pip.

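You may see distutils2 referred to as packaging: it was slated to join the Python 3.3 standard library under that name but was pulled before release. The import packaging you see in modern code is an unrelated PyPA utility library (version parsing, requirement specifiers and the like).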

The latest release was March 2012 so this project is dead. Anything referring to it is hopelessly out of fashion.

Buildout

Buildout is yet another way to assemble and deploy complex Python applications. It may include non-Python components. Used by Zope, Plone, and Django. Nuff said.

Distlib

This is a new experimental thing (as late as October 2016) which, according to its docs, is trying to do this.

Basically, Distlib contains the implementations of the packaging PEPs and other low-level features which relate to packaging, distribution and deployment of Python software. If Distlib can be made genuinely useful, then it is possible for third-party packaging tools to transition to using it. Their developers and users then benefit from standardised implementation of low-level functions, time saved by not having to reinvent wheels, and improved interoperability between tools.

Virtualenv

This is not a packaging or distribution system but it can be very important in solving related problems. Here is a good explanation of what this is from the documentation.

The basic problem being addressed is one of dependencies and versions, and indirectly permissions. Imagine you have an application that needs version 1 of LibFoo, but another application requires version 2. How can you use both these applications? If you install everything into /usr/lib/python2.7/site-packages (or whatever your platform’s standard location is), it’s easy to end up in a situation where you unintentionally upgrade an application that shouldn’t be upgraded.

Or more generally, what if you want to install an application and leave it be? If an application works, any change in its libraries or the versions of those libraries can break the application.

Also, what if you can’t install packages into the global site-packages directory? For instance, on a shared host.

In all these cases, virtualenv can help you. It creates an environment that has its own installation directories, that doesn’t share libraries with other virtualenv environments (and optionally doesn’t access the globally installed libraries either).

Install Virtualenv

CentOS

  • python-virtualenv.noarch - Tool to create isolated Python environments

  • python-virtualenv-clone.noarch - Script to clone virtualenvs

  • python-virtualenvwrapper.noarch - Enhancements to virtualenv

  • python-tox.noarch - Virtualenv-based automation of test activities

Debian

Install as expected.

sudo apt install python3-virtualenv virtualenv

Usage

Start by creating a virtual environment. Simply pick a point on your filesystem tree to put all the cruft your mission requires and it will dutifully be confined to it.

VEPATH=/home/xed/.virtualenvs
ENV=funproject
cd $VEPATH
virtualenv --python=$(which python3) $ENV

This will create a virtual environment called funproject in the directory /home/xed/.virtualenvs/funproject. Pretty much everything related to this mess will be in there. The --python option is optional; it just shows how to force Python3 if your system otherwise likes to stick with Python2.

Once you have the dumpster ready, you need to activate it so that it is the center of attention with respect to Python package wrangling.

source $VEPATH/$ENV/bin/activate

This basically just sets your $PATH so that the virtual environment directory’s bin/ is found first. It also defines a deactivate shell function, so to undo all this you simply type deactivate. This also implies that removing a virtual environment is as simple as rm -r $VEPATH/$ENV.
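Python 3.3+ also ships a standard-library venv module that does much the same job as the virtualenv tool; this sketch swaps it in to show the mechanics from Python itself. make_env() and in_virtualenv() are hypothetical names, bin/activate assumes Linux or another POSIX system, and pip is deliberately not installed to keep creation fast. The check compares sys.prefix with sys.base_prefix, and also looks for sys.real_prefix, which older virtualenv releases set instead.

```python
import sys
import venv
from pathlib import Path

def in_virtualenv():
    """True when the running interpreter belongs to a venv/virtualenv environment."""
    return hasattr(sys, 'real_prefix') or sys.prefix != getattr(sys, 'base_prefix', sys.prefix)

def make_env(path):
    """Create a bare environment (no pip) and return the path of its activate script."""
    venv.create(path, with_pip=False)
    return Path(path) / 'bin' / 'activate'
```

Usage: make_env('/home/xed/.virtualenvs/funproject'), then source the returned activate script from the shell as above.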

Normally people recommend that the $VEPATH be ~/.virtualenvs.

virtualenvwrapper

The path, ~/.virtualenvs, is designed to work with virtualenvwrapper. Here is the official documentation for that. It is a virtual environment manager composed of shell tricks. It basically boils down to this.

mkvirtualenv funproject
workon funproject

I personally don’t think I need this kind of shell obfuscation (that I didn’t create myself) but it’s helpful to know that mkvirtualenv and workon are commands from that system and can be skipped.

Conda And Miniconda And Anaconda

First — what are they?

  • Conda - a dual purpose package management system and an environment management system for installing multiple versions of software packages and their dependencies and switching easily between them.

  • Miniconda - A distribution of packages, managed by the Conda package manager, that provides minimal Conda functionality. This includes a particular Python of your choosing which may not be the same as your OS’s Python. The critical ability Miniconda provides is a mostly blank starting point from which you can install the software you need to use and (automatically) its dependencies.

  • Anaconda (not to be confused with the Red Hat installer - good name guys) is a distribution of packages managed by Conda which provides multifarious functions that many people find useful. Think of it as a full featured distribution so that you don’t, for example, have to go installing modules all the time while doing Python work. It is heavy and requires plenty of disk and initial installation time. This could possibly be helpful on systems that will be sent into service where internet connectivity is poor. Presumably everything you’ll need would be present, just don’t trigger any updates!

Installing Conda/Miniconda

Installation details. Installer is 34MB but it did seem to come with Python 3.6 and install as a non-root user. I created a separate Linux account to keep it from doing anything unpleasant but it seems well behaved so far. The executables live here.

export PATH=~/miniconda3/bin:$PATH

Use something like this.

~/miniconda3/bin/conda update conda

They do have instructions for a polite and sensible uninstall.

rm -rv ~/miniconda3 ~/.condarc ~/.conda ~/.continuum

Here’s the procedure I used most recently. Compare to the very similar procedure below and choose what makes most sense.

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh ./Miniconda3-latest-Linux-x86_64.sh
cp .bashrc .bashrc-conda # Fix .bashrc if you don't like the meddling.
vi .bashrc .bashrc-conda
source .bashrc-conda
conda list
conda config --add channels conda-forge
conda create -n xedopencv
conda activate xedopencv
conda search tensorflow-gpu
conda install tensorflow-gpu=1.13.1
conda install opencv

Non-Root Custom Python Environments

Let’s say you need to run some very fancy cool hipster dude Python program which was, for example, written in UTF-8 emojis in Python 3. Unfortunately the account you were given by a mean sys admin has CentOS 6.8 which, while secure and up to date for the series, is so yesterday. It is possible to set up your own fancy Python environments which can include the Python version you require and your own copies of all modules and dependencies.

Here is an example of a procedure to achieve that.

D=/tmp/${USER}
mkdir -p ${D} && cd ${D}
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash ${D}/Miniconda3-latest-Linux-x86_64.sh -b -p $D/miniconda3
export PATH=${D}/miniconda3/bin/:${PATH}
conda update conda  # Answer prompt "y".
conda create -n mycoolproject python=3.6 anaconda
source ${D}/miniconda3/bin/activate mycoolproject
conda search h5py
conda install -n mycoolproject h5py
conda install -n mycoolproject opencv
conda install -n mycoolproject matplotlib seaborn pandas HDF5 keras tensorflow theano
conda install -n mycoolproject tensorflow-gpu # If you have GPUs.

Or something like this.

miniconda3/bin/conda create -n my_proj python=2.7 pandas seaborn HDF5 matplotlib h5py

If you are using a machine that needs to use a proxy, you need to set up a configuration file to specify the proxy (nope, it doesn’t use the standard environment variables that wget uses, which is lame).

Save this as ${HOME}/.condarc, the per-user configuration file that conda operations read.

# Proxy settings: http://[username]:[password]@[server]:[port]
proxy_servers:
    http: http://user:pass@corp.com:8080
    https: https://user:pass@corp.com:8080

If the sysadmin set everything up for you, running activate is enough to get going. Here’s a complete demonstration of using a prepared miniconda setup at /pro/python/miniconda3. The "environment" is called py17, which is a naming convention I’m using to indicate modern Python circa 2017.

[~]$ echo $0
-tcsh
[~]$ bash  # Bash may not be required, but it makes things easier.
:->[ws8-ablab.ucsd.edu][~]$ python --version
Python 2.6.6
:->[ws8-ablab.ucsd.edu][~]$ source /pro/python/miniconda3/bin/activate py17
(py17) :->[ws8-ablab.ucsd.edu][~]$ python --version
Python 3.6.1 :: Anaconda custom (64-bit)
(py17) :->[ws8-ablab.ucsd.edu][~]$ python
Python 3.6.1 |Anaconda custom (64-bit)| (default, May 11 2017, 13:09:58)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import keras
Using TensorFlow backend.
>>> raise SystemExit
(py17) :->[ws8-ablab.ucsd.edu][~]$ source /pro/python/miniconda3/bin/deactivate
:->[ws8-ablab.ucsd.edu][~]$ python --version
Python 2.6.6

And to pretend like none of this ever happened.

source ${D}/miniconda3/bin/deactivate
${D}/miniconda3/bin/conda remove -n mycoolproject --all

Moving A Miniconda Installation

God help you if you need to move the miniconda installation, say to put it on a NAS. This article has some rough ideas that get you started. I tried this basic sed brute force technique and I got it working. I had to search through 3GB of Python installation and change 762 files.
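The sed approach boils down to replacing the old installation prefix with the new one in every file that mentions it. Here is a hedged Python sketch of that same brute force, not a tested relocation tool. rewrite_prefix() is a hypothetical name, it defaults to a dry run that only lists the files, and it deliberately skips files containing NUL bytes on the assumption that they are binaries, where a length-changing replacement would corrupt them. Those binaries may still embed the old path, so test the moved installation before trusting it.

```python
import os

def rewrite_prefix(root, old, new, dry_run=True):
    """Replace `old` with `new` in text files under `root`; return the files touched."""
    old_b, new_b = old.encode(), new.encode()
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue                  # Leave symlinks alone.
            with open(path, 'rb') as f:
                data = f.read()
            if old_b not in data or b'\0' in data:
                continue                  # Untouched, or (probably) binary.
            changed.append(path)
            if not dry_run:
                with open(path, 'wb') as f:
                    f.write(data.replace(old_b, new_b))
    return changed
```

Run it once with the default dry_run=True to see how many files would change (762 in my case with sed) before letting it write anything.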

A lot of general help with conda can be found here.

Python 3.x

Python 3 is a bit of a different animal from Python 2. The most useful treatment of the differences between 2 and 3 is by eev.ee here and here.

Here is an interesting technique to see which kind of Python you’re dealing with if you don’t know which interpreter will be used.

import sys
if sys.version_info[0] < 3: # Indexing also works on Python 2.6, which lacks .major.
    print("Requires Python 3! Dude, this isn't 2008.")
    raise SystemExit

Here are some features I commonly deal with.

  • print() now requires full function style. Seems legit.

  • print x, to suppress automatic new lines doesn’t work. Use print(x, end="") instead. Also if you don’t want that to be buffered use print(x,end='',flush=True).

  • range() produces an iterable range object. xrange() is therefore superfluous and gone. Consider list(range(3)) for old behavior; it is a clearer form of [i for i in range(3)]. And for padding out lists with the same value, [0]*3 is much better yet.

  • / is now true division (7/2 is 3.5), as one expects everywhere but Python 2, while x // y is floor division (7//2 is 3). In Python 2, / on two integers did floor division.

  • map(ord,"xed")+[0] will now produce an error, though it was fine before. Because map produces a lazy map object rather than a list, you need this: list(map(ord,"xed"))+[0].

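  • raw_input() has been replaced with input(). Python 3’s input() behaves like Python 2’s raw_input(), returning the typed line as a string; Python 2’s old input(), which evaluated the text as a Python expression, is gone. Just ditching the raw_ worked fine in my application.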

  • Exceptions.

    • Was: except (Exception1,Exception2),target:

    • Now: except (Exception1,Exception2) as target:

  • Tuple parameters are scrapped (PEP 3113). This means that def f(a,(b,c)): no longer works. This is not really needed anyway, since you can just pass the tuple b_c=(b,c) as one argument and unpack it inside, with very little conceptual difference. But lambdas, especially my favorite way to write them, don’t work. So lambda (x,y): x+y must now be something like lambda xy: xy[0]+xy[1] (called with one tuple), or lambda x,y: x+y called as f(*xy).

  • filter() produces an iterator instead of a list.

  • Dictionaries now preserve insertion order (guaranteed as of Python 3.7).

  • mydict.keys() now produces an iterable, not a list. Use list(mydict.keys()) if you need that.

  • dict.iteritems() is gone; dict.items() now returns an iterable view instead of a list.

  • mydict.has_key('key') is no longer present. Use 'key' in mydict.

  • There is a bytes object that is like a string but not like a string. It is defined with something like b'myencodedbytes'. A str is a sequence of Unicode characters, the abstract text humans think about, while a bytes object is a sequence of raw byte values (0 to 255), for example that text encoded (with an "encoding" such as UTF-8) into the ones and zeros a computer can store or send. Convert between them with mystr.encode('utf-8') and mybytes.decode('utf-8').
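A quick sketch that exercises several of the items above, so you can confirm them in your own interpreter. The variable names are arbitrary, and the commented values are what Python 3.7+ produces.

```python
# / is true division, // is floor division.
half = 7 / 2                                         # 3.5
floored = 7 // 2                                     # 3

# map() and filter() are lazy; wrap them in list() before using list operations.
codes = list(map(ord, "xed")) + [0]                  # [120, 101, 100, 0]
evens = list(filter(lambda n: n % 2 == 0, range(6))) # [0, 2, 4]

# No tuple parameters (PEP 3113): index into the tuple, or take two args and splat.
add_pair = lambda xy: xy[0] + xy[1]
add_two = lambda x, y: x + y
point = (3, 4)
same = add_pair(point) == add_two(*point) == 7       # True

# Dicts keep insertion order; keys() and items() are views, not lists.
d = {'romeo': 1, 'juliet': 2}
names = list(d.keys())                               # ['romeo', 'juliet']
has_romeo = 'romeo' in d                             # replaces d.has_key('romeo')
pairs = list(d.items())                              # replaces d.iteritems()

# str holds characters; bytes holds encoded byte values.
text = 'naïve'
raw = text.encode('utf-8')                           # b'na\xc3\xafve'
```

Note that len(text) is 5 but len(raw) is 6, because ï takes two bytes in UTF-8.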

Here’s a decent way to use Vim to change old print commands into Python 3 functions. Find the first print that needs parentheses and do this.

:s/print \([^ ].*\) *$/print(\1)/

Then you can just find the second one with n and do @:. After that, you can find them with n and repeat the change with @@.
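For a batch of files, the same substitution can be expressed with Python’s re module. This is a sketch mirroring the Vim pattern, not a real converter (convert_print() is a hypothetical name): like the Vim version, it mangles lines with trailing comments, print >>f redirection, and print x, (which becomes print(x,) and so loses its no-newline meaning).

```python
import re

# Same idea as the Vim substitution: wrap whatever follows "print " in parentheses.
PRINT_STMT = re.compile(r'^(\s*)print (\S.*?)\s*$')

def convert_print(line):
    """Turn a Python 2 print statement line into a print() call; other lines pass through."""
    return PRINT_STMT.sub(r'\1print(\2)', line, count=1)

source = ['print "hello"', '    print x  ', 'print("already fine")']
converted = [convert_print(line) for line in source]
```

Indentation is preserved, trailing spaces are dropped, and lines that already call print() are left alone.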

One interesting tidbit I encountered was that source code in Python3 can use extended characters as variable names. Normally (maybe universally!) this is asking for trouble, but you can imagine a function where an angle is called alpha but using a real alpha (α). To get this to work, I had to add a special comment in the second line of the program like this.

Special Characters In Source Code
#!/usr/bin/python3
# vim: set fileencoding=utf-8 :
def greeks(α=0.8, β=1., λ=0.):
    return (α,β,λ)

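The full description of the encoding comment is in PEP 263. Since PEP 3120 made UTF-8 the default source encoding in Python 3, the comment should not strictly be needed there, but it does no harm and keeps Vim in agreement about the file’s encoding.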

Troubleshooting

Debug

Python has a built-in constant called __debug__ which is normally True.

$ python -c "print(__debug__)"
True

Note that the __debug__ constant is immutable within a program. If you want to strip the debugging, perhaps to improve performance in some way, you can use the -O option at run time.

$ python -O -c "print(__debug__)"
False

Assert statements

The main use of the __debug__ variable is to control the execution of assert statements.

if __debug__ and not expression1: raise AssertionError
assert expression1                 # Same as previous line.

if __debug__ and not expression1: raise AssertionError(expression2)
assert expression1,expression2     # Same as previous line.
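To watch -O actually strip an assert, this sketch runs the same two-line program in fresh interpreters with and without -O. It uses sys.executable so it picks the interpreter you are running; SNIPPET and run() are illustrative names. It assumes Python 3.7+ (for capture_output) and that PYTHONOPTIMIZE isn’t already set in your environment.

```python
import subprocess
import sys

SNIPPET = "assert 1 + 1 == 3, 'math is broken'\nprint('assert skipped')"

def run(*flags):
    """Run SNIPPET in a new interpreter with the given command line flags."""
    return subprocess.run([sys.executable, *flags, '-c', SNIPPET],
                          capture_output=True, text=True)

normal = run()        # The assert fires: exit status 1, AssertionError on stderr.
optimized = run('-O') # The assert is compiled away: the print runs, exit status 0.
print(normal.returncode, normal.stderr.strip().splitlines()[-1])
print(optimized.returncode, optimized.stdout.strip())
```

This is also a reminder not to put anything your program depends on inside an assert, since -O removes the whole statement.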

Problems With PYTHONPATH

Python is pretty solid and most sensible systems take great care with it since it’s often essential to a functional OS (e.g. emerge, yum). But sometimes things happen. Here’s a very nasty situation I had with CentOS 7.

:->[centos7-][~]$ python --version
Python 2.7.12
:->[centos7-][~]$ python -c 'print("ok")'
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
ImportError: No module named site
:-<[centos7-][~]$ export PYTHONPATH=/usr/lib64/python2.7/
:->[centos7-][~]$ export PYTHONHOME=/usr/lib64/python2.7/
:->[centos7-][~]$ python -c 'print("ok")'
ok

Module Search Path

How the module search path is constructed is almost described here. Unfortunately the final item there is "the installation-dependent default". You can mentally substitute the word "voodoo" for "default" for all the good that description does.

To really find out what’s going on you must go to the source code. Here is the authoritative description of how this works for Linux (Windows and Mac are different).
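Short of reading the source, you can at least ask a running interpreter what it ended up with (python -m site prints a similar summary from the command line). This sketch, with search_path_report() as a hypothetical name, gathers the pieces that matter when PYTHONHOME or PYTHONPATH are suspected: the prefixes, the standard library and site-packages locations sysconfig reports, and the final sys.path. Of course it only helps once the interpreter starts at all, unlike the CentOS case above.

```python
import sys
import sysconfig

def search_path_report():
    """Gather the settings that decide where this interpreter looks for modules."""
    paths = sysconfig.get_paths()
    return {
        'executable': sys.executable,
        'prefix': sys.prefix,
        'exec_prefix': sys.exec_prefix,
        'stdlib': paths['stdlib'],
        'site-packages': paths['purelib'],
        'sys.path': list(sys.path),
    }

for key, value in search_path_report().items():
    print('%-14s %s' % (key, value))
```

Comparing this output between a good machine and a bad one is often quicker than guessing at environment variables.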

Linking

This was a very tricky problem to diagnose. Here are two systems which appear to have the exact same Python, but when run, they clearly are not the same.

Clean working
:->[goodhost][~]$ md5sum /usr/bin/python
49623a632cb4bf3c501f603af80103c4  /usr/bin/python
:->[goodhost.example.edu][~]$ /usr/bin/python --version
Python 2.7.5
Messed up
:->[messedup][/etc/ld.so.conf.d]# md5sum /usr/bin/python2.7
49623a632cb4bf3c501f603af80103c4  /usr/bin/python2.7
:->[messedup-new][/etc/ld.so.conf.d]# /usr/bin/python2.7 --version
Python 2.7.12

How can this be? After checking all the possibilities, I found that path and symbolic link issues were not the cause. This problem is why simply reinstalling Python may not fix its incorrect behavior. The answer, it turns out, is that the shared library linking was messed up on the non-working machine.

Clean ldd
:->[goodhost][~]$ ldd /bin/python2.7
        linux-vdso.so.1 =>  (0x00007fff494c5000)
        libpython2.7.so.1.0 => /lib64/libpython2.7.so.1.0 (0x00007f03e28f1000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f03e26d5000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f03e24d0000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007f03e22cd000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f03e1fcb000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f03e1c08000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f03e2cdf000)
Messed up ldd
:->[messedup][/etc/ld.so.conf.d]# ldd /usr/bin/python2.7
        linux-vdso.so.1 =>  (0x00007ffca97df000)
        libpython2.7.so.1.0 => /public/apps/coot-0.8.5/lib/libpython2.7.so.1.0 (0x00007f1554076000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1553e59000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f1553c55000)
        libutil.so.1 => /lib64/libutil.so.1 (0x00007f1553a52000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f1553750000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f155338d000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f155449e000)

I had originally assumed that if a core component like Python was reinstalled from clean packages, libraries and all, it would have to behave like a clean installation. But this is not true. If the dynamic linker’s cache (built by ldconfig from /etc/ld.so.conf and /etc/ld.so.conf.d/) resolves libpython to some spurious installation, then Python might not work. Or worse, it might barely work, giving a maddening situation to troubleshoot. This problem arose when a program (coot) tried to allow for its own separate version of Python to be linked to. The moral of the story is to use ldd to check the Python executable before trying to diagnose things like PYTHONPATH and PYTHONHOME which, if the linking is bad, may not be able to help no matter what they’re set to.
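Here is a small sketch for scripting that ldd check across machines. It runs ldd on an interpreter and pulls out the path where libpython resolves. The function names are hypothetical, and it assumes Linux with ldd on the PATH. A None result just means the executable doesn’t link libpython dynamically (many builds link it statically), which is not itself a problem; a result of not found is.

```python
import subprocess

def libpython_from_ldd(ldd_output):
    """Return the path libpython resolves to in `ldd` output, or None if not linked."""
    for line in ldd_output.splitlines():
        name, sep, rest = line.strip().partition(' => ')
        if sep and name.startswith('libpython'):
            return rest.split(' (')[0].strip()   # Drop the load address.
    return None

def check_python(executable='/usr/bin/python3'):
    """Run ldd on a Python executable and report which libpython it will load."""
    out = subprocess.run(['ldd', executable], capture_output=True, text=True).stdout
    return libpython_from_ldd(out)
```

On the messed up machine above, this would have reported /public/apps/coot-0.8.5/lib/libpython2.7.so.1.0 instead of /lib64/libpython2.7.so.1.0, which is the whole story in one line.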