This is a collection of notes that I use for reference. It is pretty complete and generally has most of the stuff I need to use, but it is deliberately not absolutely complete. There is too much obscure weird stuff in Python to include it all. This is my attempt at a good compromise for a solid collection of reference material. The emphasis is on practical usages and I try to include examples where I can to get projects up and running.
For people who aren’t sure if Python is really good or the best thing ever, this fine article makes it clear. The same author has a thorough article on porting from 2 to 3. These notes started eons ago with Python 2, but they’re mostly sensible with respect to Python 3 these days.
For people who aren’t sure if Python 3 is right for them, this absurdly good article explains all the differences.
Things I Commonly Forget
is None
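I used to do things like if cool: but that is uncool when what I really mean is "has this been set?" A plain if tests truthiness, so 0, an empty string or an empty list would also fail the test. The correct way to check for None is to use is.
if cool is not None: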
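To see why the distinction matters, here is a small sketch with a hypothetical check() helper that applies both tests to a few values. Only None fails the is not None test, while 0, '' and [] all count as false in a plain if.

```python
def check(cool):
    """Hypothetical helper: compare a truthiness test with an explicit None test."""
    truthy = "truthy" if cool else "falsy"
    present = "set" if cool is not None else "None"
    return truthy, present

# 0, '' and [] are all falsy, but only None is None.
for value in (None, 0, "", [], "yes"):
    print(repr(value), check(value))
```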
main
Python has an odd but sensible idiom where a program checks whether it was run as a real program. The idea is that if it was not (i.e. it was imported), then perhaps nothing should really happen. This is useful for creating modules and other subcomponents of larger projects. You can define a library of functions, import it into any code that needs them, and nothing will run unless explicitly called. However, if you run the module as a standalone program, you can have it call a function that tests the functions of interest. This helps in development.
The specific technique is to always put your default code which should be run as a standalone program at the end after a construction like this:
if __name__ == '__main__':
    do_default_stuff()
split and join
I can never remember the order of the object/function argument.
some_nice_separator_string.join(some_sequence)
>>> ' and '.join(['romeo','juliet'])
'romeo and juliet'
some_joined_string.split(divide_at_string)
>>> 'brad and jennifer'.split(' and ')
['brad', 'jennifer']
I am getting better about this, and what finally helped it sink in is that both split and join are string methods. Even though join really smells like a function concerned with sequences, it is a string method.
Another handy note is that split() with no arguments splits on runs of whitespace. It does this nicely, without giving you silly empty-string entries for multiple consecutive spaces (e.g. for something like a   many   spaces   b).
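A quick sketch of the difference between split() with and without a separator, using a made-up sample string. An explicit single-space separator leaves empty strings wherever spaces repeat.

```python
text = "a   many   spaces   b"

# No argument: split on runs of whitespace and drop empty strings.
words = text.split()

# An explicit separator: every single space is a split point,
# so runs of spaces leave empty strings behind.
pieces = text.split(" ")

# join is called on the separator string, not on the list.
rejoined = " ".join(words)

print(words, pieces, rejoined, sep="\n")
```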
any
The any() function seems totally superfluous to me, but there it is. Feed it an iterable and it will return True if any of the items are true.
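any() has a sibling, all(). Both accept a generator expression and stop early: any() stops at the first true item and all() at the first false one. The file names below are made up for illustration.

```python
filenames = ["notes.txt", "data.csv", "photo.jpg"]

# True if at least one item is truthy.
has_csv = any(name.endswith(".csv") for name in filenames)

# True only if every item is truthy.
all_lower = all(name == name.lower() for name in filenames)

# Edge cases: any() of an empty iterable is False, all() is True.
empty_any, empty_all = any([]), all([])
```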
Help
Python has a lot of clever help facilities that make inline documentation relatively easy. Here’s an example:
def cube(x):
    """This is the help for the cube function."""
    return x*x*x

print(cube(4))       # Outputs: 64
print(cube.__doc__)  # Outputs: This is the help for the cube function.
These documentation strings can be multi-line when using the triple quoting. Here’s an official resource for good docstring practices.
SB proposes doing it like this.
Concisely describes what this function does.

If necessary, expands on that functionality to describe how to use it,
like defining its calling syntax/semantics, but not necessarily its
implementation details unless those are relevant to how the function
is used.

Args:
    name (str): The name of a thing.
    things (dict): A dictionary of things and their locations.

Returns:
    location (tuple): The coordinates of the thing.

Raises:
    KeyError: The name was not found amongst the things.
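Here is that docstring layout applied to a real function, as a sketch. The function find_thing() is hypothetical, made up to fit the Args/Returns/Raises text above. help(find_thing) would display the same text that __doc__ holds.

```python
def find_thing(name, things):
    """Look up where a named thing is.

    Args:
        name (str): The name of a thing.
        things (dict): A dictionary of things and their locations.

    Returns:
        location (tuple): The coordinates of the thing.

    Raises:
        KeyError: The name was not found amongst the things.
    """
    return things[name]

# The first line of the docstring is the one-line summary.
print(find_thing.__doc__.splitlines()[0])
```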
Functions And Arguments
There are lots of subtle details (officially described here) involved in fancy function definitions. Here is how to define functions to accept arbitrary numbers of arguments.
def f1(x,*l,**d):
    print(x,','.join(l),','.join(d.keys()))

f1('fixed','l1','l2','l3',d1=1,d2=2,d3=3)  # Prints: fixed l1,l2,l3 d1,d2,d3
Python 3.8 added a "/" marker in parameter lists (PEP 570). Parameters before it are positional-only. A bare "*" does the reverse: everything after it is keyword-only.
def slasher(positional_only, /, standard, *, keyword_only):
    return (positional_only, standard, keyword_only)
Here "standard" means positional or keyword, i.e. you can say slasher(3,standard="ok",keyword_only=True) or slasher(3,"ok",keyword_only=True). keyword_only must always be passed by name, and positional_only never can be. This syntax is a SyntaxError in Python 3.7 (e.g. 3.7.3) and earlier, so avoid it if your code has to run there.
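A small sketch of what the "/" and "*" markers allow and refuse (Python 3.8+). The calls that break the rules raise TypeError.

```python
def slasher(positional_only, /, standard, *, keyword_only):
    return (positional_only, standard, keyword_only)

ok1 = slasher(3, standard="ok", keyword_only=True)  # standard passed by keyword
ok2 = slasher(3, "ok", keyword_only=True)           # standard passed by position

try:
    slasher(positional_only=3, standard="ok", keyword_only=True)
except TypeError as err:
    print("positional_only cannot be a keyword:", err)

try:
    slasher(3, "ok", True)
except TypeError as err:
    print("keyword_only must be named:", err)
```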
Sometimes you want to send a bunch of arguments to a function packed up in a list or tuple. This is called argument unpacking, and it uses the myfun(*theargs) syntax. To pass a dictionary of keyword arguments, use myfun(**theargdict). See details in this section about unpacking argument lists in the official documentation. Here’s an example.
def f2(x,y):       # Normal function.
    print(x,y)

f2(1,2)            # Prints "1 2" as expected.
l=(3,4)            # Define a tuple (a list works too).
f2(*l)             # "3 4"
f2(*(5,6))         # "5 6"
d={'x':7,'y':8}    # Define a dict. Keys must match the parameter names.
f2(**d)            # "7 8"
Is this a good idea? Not sure. Doesn’t smell great though.
Also note the following syntax, which uses function annotations. These are completely optional and are never checked when the code runs. They are shown just to help identify this weird stuff when found in the wild.
>>> def fa(x:int,y:float,alpha:str="messy")->str:
...     return f'{x} {y} {alpha}'
...
>>> fa(1.111,True)
'1.111 True messy'
Note that the call did not supply an int and a float as implied by the definition’s annotations. None of this stuff is binding, even the return type could have been specified falsely. To me this is a great way to obfuscate code by lying to your colleagues with Python going along with it.
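Python does keep the annotations: it stores them in the function's __annotations__ dictionary and then ignores them. A separate static checker such as mypy is what actually uses them, and it does so without running the code. A sketch:

```python
def fa(x: int, y: float, alpha: str = "messy") -> str:
    return f'{x} {y} {alpha}'

# Python stores the annotations but never checks them.
print(fa.__annotations__)
print(fa(1.111, True))   # runs fine despite the "wrong" types
```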
Lambda
The lambda calculus of computer science is pretty wacky. It can be handy in the real world, however, and Python delivers. If you’re a beginner, skip this. Here’s how it’s done:
variable_now_a_function= lambda x,y: x + y
print(variable_now_a_function(3,2))  # Prints 5.
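Python 2 had another syntax which I liked better because it looks more like def and works just the same:
f= lambda((x,y)):x+y
print(f((5,3)))  # Printed 8 in Python 2.
This doesn’t look particularly simplified from the first syntax, but the simplification was more apparent with forms like plus1= lambda(x):x+1. Both forms are SyntaxErrors in Python 3. It no longer allows parentheses around lambda parameters, and it removed tuple parameter unpacking (PEP 3113). In Python 3 write:
f= lambda p: p[0]+p[1]
print(f((5,3)))  # Prints 8.
plus1= lambda x: x+1
print(plus1(9))  # Prints 10.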
Lambda is often used to define variables that can be used as functions, the precise functionality of which is, well, variable. This can be useful to pass some contingent behavior along to a function or to set up operational templates that process any number of functionalities.
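A sketch of passing behavior along with a lambda. The word list is made up, and apply_to_all() is a hypothetical helper whose behavior is whatever function you hand it. For the sort, key=len on its own would work just as well.

```python
words = ["pear", "fig", "banana"]

# A lambda as a throwaway key function: sort by length.
by_length = sorted(words, key=lambda w: len(w))

def apply_to_all(fn, items):
    """Hypothetical helper that takes its behavior as an argument."""
    return [fn(item) for item in items]

shouted = apply_to_all(lambda w: w.upper() + "!", words)
```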
Slices
For sequence objects like strings and lists (and many others), Python has an absurdly elegant and powerful way to specify an exact subsequence. The general format for slices is this.
[start:end:stride]
The best tip about slices I’ve seen is to consider these values as numbering the "fence posts".
 0   1   2   3   4   5   6
 | x | e | d | . | c | h |
-6  -5  -4  -3  -2  -1
So to get just "xed" you do this.
>>> x='xed.ch'
>>> x[0:3]
'xed'
Note that for a string x[3:4] gives the same result as x[3]. For a list, x[3:4] would instead be a one-element list. If you want the slice to run to the end, you can just leave the end empty. Same with the beginning; you don’t need to use 0.
>>> x[3:]
'.ch'
>>> x[:3]
'xed'
A negative start counts positions back from the end. A negative end stops that many positions short of the end, instead of counting from the start the way a positive end does.
>>> x[-2:]
'ch'
>>> x[:-2]
'xed.'
A negative stride steps backward, so [::-1] reverses the sequence.
>>> x[::-1]
'hc.dex'
Of course things can get weird.
>>> x[4:1:-1]
'c.d'
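A few more slice behaviors on the same 'xed.ch' string, as a sketch. It also shows the string-versus-list difference for x[3:4] and the fact that slice bounds past the end are clamped rather than raising IndexError.

```python
x = 'xed.ch'
letters = list(x)

# For a string, a one-wide slice and an index give equal results...
same_for_str = x[3:4] == x[3]
# ...but for a list the slice is a one-element list.
list_slice, list_item = letters[3:4], letters[3]

# Slices clamp out-of-range bounds instead of raising IndexError.
clamped = x[2:100]

# Every other character, and the whole thing backwards.
every_other, backwards = x[::2], x[::-1]
```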
Slice Objects
It is possible to name slice objects, perhaps to improve clarity.
>>> zerototen= list(range(11))
>>> evens= slice(None,None,2)
>>> odds= slice(1,None,2)
>>> (zerototen[evens],zerototen[odds])
([0, 2, 4, 6, 8, 10], [1, 3, 5, 7, 9])
Or obfuscation!
>>> O= slice(1,2)
>>> zerototen[O]
[1]
Strings
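- "Double" ' or single quotes are ok.'
- "Adjacent " "strings " "are " "concatenated" "."
- Raw string: r'All \\ are retained.' Backslashes are kept literally, but a raw string cannot end with an odd number of backslashes.
- R'Same as with "r".'
- u'A unicode string is like this' (in Python 3 every str is Unicode, so the u prefix is accepted but redundant).
- V='PEP498'; print(f'Explanation:{V}') shows a formatted string literal (f-string). See Template Formatting (Modern) below.
- \x40 is an "@" and \x41 is "A".
- \u1234 is a Unicode 16-bit value (4 hex digits).
- \U0001F600 is a Unicode 32-bit value (8 hex digits; the largest valid one is \U0010FFFF).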
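A sketch that checks the literal forms above. The Windows-style path is just a convenient sample: in the raw version, \n and \t stay as backslash plus letter instead of becoming a newline and a tab.

```python
adjacent = "Adjacent " "strings " "are " "concatenated" "."
raw = r'C:\new\table'       # backslashes kept: no newline or tab here
cooked = 'C:\\new\\table'   # the same text written without r
at_and_a = '\x40\x41'
snowman = '\u2603'           # 4 hex digits
grin = '\U0001F600'          # 8 hex digits
V = 'PEP 498'
fstring = f'Explanation: {V}'
```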
Template Formatting (Classic)
General String Format:
%[(name)][flags][width][.precision]code
Code | Use |
---|---|
s | String (with str()) |
r | String (with repr()) |
c | Character |
d | Decimal Integer |
i | Integer |
u | Unsigned Integer (obsolete; same as d in Python 3) |
o | Octal Number |
x | Hex Number |
X | Hex with uppercase |
e | Floating Point Exponent |
E | e with uppercase |
f | Floating Point Decimal |
F | f with uppercase |
g | Floating Point e or f |
G | Floating Point E or F |
% | Literal % |
Example:
"%(n).5G is %(t)s." % {"n":6.0221415e+23, "t":"a very big number"}
'6.0221E+23 is a very big number.'
Note that to always get a plus sign, put + at the beginning. To get leading zeros, put a 0 before the width. Note, however, that if you want a number like "+001.300" you need something like %+08.3f. The 8 is needed because that is the full width you want reserved, sign and decimal point included. The .3 sets three digits after the decimal point, the f makes it a float, and everything else is just decoration.
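A sketch exercising several codes and flags from the table above, including the %+08.3f case. The sample numbers are arbitrary.

```python
examples = {
    "plus_and_zeros": '%+08.3f' % 1.3,    # width 8, 3 decimals, sign, zero-pad
    "hex": '%x / %X' % (255, 255),
    "octal": '%o' % 8,
    "str_vs_repr": '%s vs %r' % ('hi', 'hi'),
    "named": '%(n)05d' % {'n': 42},
    "percent": '%d%%' % 50,
}
for name, text in examples.items():
    print('%-15s %s' % (name, text))
```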
Template Formatting (Modern)
Of course Python3 had to go and reinvent how this is done. It is more complex in some ways, but simpler in other ways. For example, it is no longer strictly necessary to specify in the template what kinds of types will be showing up; it can just deduce them.
>>> 'int: {} float: {} str: {}'.format(99,3.141,'fun')
'int: 99 float: 3.141 str: fun'
You can scramble the order.
>>> 'int: {2} float: {0} str: {1}'.format(3.141,'fun',99)
'int: 99 float: 3.141 str: fun'
Need full times with leading zeros and milliseconds?
>>> '{:02d}:{:02d}:{:06.3f} {}'.format(5,0,7,'ok')
'05:00:07.000 ok'
Note that the ints really need to be ints! Giving a float to the d format code raises a ValueError.
So that’s all fine and annoyingly different but not obviously way better. But hang on, it’s all about to sink in why this new system is much better. The important trick (Python 3.6+) is that you can use the format() machinery without writing it out explicitly. You can specify a formatted "f-string" where the formatting operations are implied. The braces are not just placeholders; they contain the data itself, as any Python expression. Take a look.
>>> f'{5:02d}:{0:02d}:{7:06.3f} {"ok"}'
'05:00:07.000 ok'
This is most useful when used to print variables.
>>> h,m,s,note= 3,9,39,"marathon PR"
>>> f'{h:02d}:{m:02d}:{s:06.3f} {note}'
'03:09:39.000 marathon PR'
This nice webpage has a very nice catalog of most of the cool things this format system can do helpfully contrasted with the old way of doing things.
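A few more format specs inside f-strings, as a sketch with made-up values. Any expression can go inside the braces, followed by the same kind of format spec that format() uses.

```python
name, count, ratio = "Bob", 1234567, 0.5

examples = [
    f'{name!r}',              # repr() instead of str()
    f'{name:>6}|',            # right-align in a 6-wide field
    f'{count:,}',             # thousands separators
    f'{ratio:.1%}',           # percentage with one decimal
    f'{count / 1000:.2f}k',   # any expression works inside the braces
]
print(*examples, sep="\n")
```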
Sequence Converters
- s.join(sequence): Join sequence with s as the separator.
- s.split(separator[,maxcount]): Separate string s at separator. Don’t forget you can limit the number of splits with maxcount.
- s.rsplit(separator[,maxcount]): Like split but starting from the right. Probably not too useful without maxcount. Example: "First M. Last".rsplit(None,1) is ["First M.", "Last"].
- s.splitlines([keepNL]): Breaks a string by line. Keeps the newlines if keepNL is present and True.
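A sketch of maxcount, rsplit and splitlines from the list above, using made-up sample strings.

```python
full = "First M. Last"
first_split = "a,b,c".split(",", 1)     # maxcount=1: only the first comma
last_name = full.rsplit(None, 1)[-1]    # split once, from the right
lines = "one\ntwo\n".splitlines()
lines_kept = "one\ntwo\n".splitlines(True)
```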
Padding and Stripping
- s.expandtabs([tabsize]): Converts tabs to spaces, out to the next tab stop. If tabsize is missing, the default is 8.
- s.strip([char2strip]): Strips leading and trailing whitespace (spaces, tabs, newlines). If char2strip is present, then strip those characters instead. "$99.99".strip("$") is "99.99".
- s.lstrip([char2strip]): Like strip, but only on the left. However, note that the char2strip string is not searched for as a whole. Instead, each character that comprises it is stripped if it is found (in any order). If you want to strip off a prefix or a suffix, probably best to use a slice, or s.removeprefix() and s.removesuffix() in Python 3.9+. replace also works but removes every occurrence, not just the prefix.
- s.partition(separator): Example: "rock and roll".partition(" and ") produces ("rock", " and ", "roll").
- s.rpartition(separator): Like partition but from the right side.
- s.rstrip([char2strip]): See strip. Also see the note at lstrip about char2strip.
- s.zfill(w): Pads s with leading zeros to width w. Example: print("James Bond is %s." % "7".zfill(3)) prints James Bond is 007.
- s.ljust(w[,char]): Makes a string of width w, padding the right with char (whose length must be 1) or spaces. If s is longer than w then s is returned unmodified.
- s.rjust(w[,char]): See ljust (pads on the left).
- s.center(w[,char]): See ljust (pads on both sides).
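The lstrip character-set gotcha is easier to see than to describe. Here is a sketch with a made-up string. It assumes Python 3.9+ for removeprefix(), and it also shows a few of the padding methods.

```python
# strip() and friends treat their argument as a set of characters, not a prefix.
stripped = "banana_split".lstrip("ban")            # removes every leading b, a, n
prefix_gone = "banana_split".removeprefix("ban")   # Python 3.9+: exact prefix only
price = "$99.99".strip("$")

padded = ["7".zfill(3), "-7".zfill(3), "hi".ljust(5, "."), "hi".rjust(5), "hi".center(6, "*")]
```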
Search and Replace
- s.find(stringtofind[,start[,end]]): Returns -1 when stringtofind is not in s (see index). If found, returns the first position where it was found. The start and end parameters are like doing find on s[start:end], except that the position returned is still counted from the start of s.
- s.rfind(stringtofind[,start[,end]]): Similar to find but returns the last (rightmost) position of stringtofind.
- s.index(stringtofind[,start[,end]]): Basically like find but raising a ValueError if the substring is not found.
- s.rindex(stringtofind[,start[,end]]): See index (searching from the right like rfind).
- s.count(stringtocount[,start[,end]]): Counts non-overlapping occurrences of stringtocount. The other parameters behave like they do in find.
- s.replace(old,new[,maxsubs]): Returns a new string with old replaced by new. Use maxsubs to limit the number of substitutions (all are replaced by default). Does not modify the string in place!
- s.startswith(stringtofind[,start[,end]]): True if stringtofind is the beginning of s (or of the part of s from start onward, if start is given). stringtofind can also be a tuple of strings, any of which may match.
- s.endswith(stringtofind[,start[,end]]): Like startswith.
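A sketch of the search methods on a sample string. Note that find() with a start argument still reports positions counted from the start of the whole string, and that count() does not count overlapping matches.

```python
s = "banana"
positions = (s.find("na"), s.rfind("na"), s.find("na", 3), s.find("x"))

# index() is like find() but raises ValueError instead of returning -1.
missing = False
try:
    s.index("x")
except ValueError:
    missing = True

non_overlapping = s.count("ana")                     # 'ana' overlaps itself in banana
partial = s.replace("a", "o", 2)                     # only the first two
is_image = "photo.png".endswith((".png", ".jpg"))    # a tuple of choices works too
```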
Unicode and Translating
- s.decode: Python 2 only on strings. In Python 3, decode is a method of bytes objects: b.decode([encoding,[errors]]) turns bytes into a str.
- s.encode([encoding,[errors]]): Turns a str into bytes. Encoding can be ascii, utf-32, utf-8 (the default), iso-8859-1, latin-1, and many more. Errors can be strict, ignore, replace, xmlcharrefreplace.
- s.format: The str.format() method; see Template Formatting (Modern) above.
- s.translate(table[,delchars]): In Python 2, replaces characters in s with the corresponding characters in table, which must be a string of 256 characters. The delchars string contains values which are just dropped, and from string import maketrans is handy for making table. In Python 3, translate takes only table, a mapping from code points, which str.maketrans(from, to[, delete]) builds; characters in delete are dropped.
- s.title(): Capitalize the first letter of every word. Note that apostrophes will do things like It'S So Hard For Me To Believe By Otis Rush.
- s.swapcase(): Switch upper case to lower and vice versa.
- s.capitalize(): Only the first character of the string is capitalized (and the rest is lowercased), not the whole thing (see upper).
- s.lower(): Make string all lower case. Useful for normalizing silly user input.
- s.upper(): Like lower, but upper case.
- ord(c): Inverse of chr(n) (and Python 2’s unichr(n)) where c is a single character.
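A Python 3 sketch of the encode/decode round trip and the str.maketrans() style of translate. The sample words are arbitrary.

```python
word = "caf\u00e9"                                   # 'café'
utf8 = word.encode("utf-8")                          # str -> bytes
back = utf8.decode("utf-8")                          # bytes -> str
ascii_replaced = word.encode("ascii", "replace")
ascii_xml = word.encode("ascii", "xmlcharrefreplace")

# Python 3 translate: build the table with str.maketrans(from, to, delete).
table = str.maketrans("abc", "xyz", "!")
translated = "a cab!".translate(table)

cases = ("it's hard".title(), "hELLO wORLD".capitalize(), ord("A"), chr(65))
```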
Boolean Checks
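- s.isalnum(): All characters are alphanumeric.
- s.isalpha(): All characters are letters.
- s.isdigit(): All characters are digits.
- s.islower(): All cased characters are lowercase.
- s.isspace(): Is a string with a length of at least 1 with all whitespace. (Like the other checks here, it is False for an empty string.)
- s.istitle(): Title case: uppercase letters only follow uncased characters, and lowercase letters only follow cased ones, as in "Title Case Words".
- s.isupper(): All cased characters are uppercase.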
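A sketch of a few cases that trip people up. It shows the empty string, a decimal point that fails isdigit(), and camel case that fails istitle().

```python
checks = {
    "empty_alpha": "".isalpha(),          # False: needs at least one character
    "alnum": "abc123".isalnum(),
    "decimal_point": "12.5".isdigit(),    # '.' is not a digit
    "title": "Hello World".istitle(),
    "not_title": "HelloWorld".istitle(),  # 'W' follows a lowercase letter
    "space": "  \t\n".isspace(),
}
```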
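Here are some more obscure attributes (this list is from Python 2; in Python 3 __getslice__ and the _formatter_ entries are gone, and dir(s) shows the current list):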
s.__add__, s.__class__, s.__contains__, s.__delattr__, s.__doc__,
s.__eq__, s.__format__, s.__ge__, s.__getattribute__,
s.__getitem__, s.__getnewargs__, s.__getslice__, s.__gt__,
s.__hash__, s.__init__, s.__le__, s.__len__, s.__lt__, s.__mod__,
s.__mul__, s.__ne__, s.__new__, s.__reduce__, s.__reduce_ex__,
s.__repr__, s.__rmod__, s.__rmul__, s.__setattr__, s.__sizeof__,
s.__str__, s.__subclasshook__, s._formatter_field_name_split,
s._formatter_parser
For example (this output is from Python 2; Python 3’s str docstring is longer):
$ python -c "print('a string'.__doc__)"
str(object) -> string
Return a nice string representation of the object.
If the argument is a string, the return value is the same object.
Regular Expressions
I use regular expressions a lot and I really quite like them. For shell scripting they are essential. When I used to be a strong Perl programmer, I used Perl’s excellent regular expression libraries all the time. But as I switched to Python, I found that I really just hardly ever need to use them. For example this normal shell code…
cal | grep September
…can be done in Python like this…
[X for X in os.popen('cal') if 'September' in X]
…which may not look great, but it is the Python way, and if you’re cool with that, it can actually be an improvement (it needs import os). Note that the modern Python way now uses the subprocess module. Details.
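A rough subprocess-based stand-in for cmd | grep needle, as a sketch. grep_output() is a hypothetical helper. It assumes Python 3.7+ for capture_output and text, and the command must exist on the system (cal is not installed everywhere).

```python
import subprocess

def grep_output(cmd, needle):
    """Hypothetical helper: run cmd (a list of arguments) and return
    the output lines that contain needle, roughly like 'cmd | grep needle'."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return [line for line in result.stdout.splitlines() if needle in line]

# On a machine that has cal installed:
# grep_output(["cal"], "September")
```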
For simple matching, I find I use the Python in, ==, and not operators a lot (is is for identity checks such as is None, not for comparing strings). Instead of regular expressions one can use Python functions like split, join, replace, find, startswith, endswith, swapcase, upper, lower, isalnum, isspace, etc.
Also "slices" and string template substitution really make regular
expressions seem kind of backward and inelegant in the Python idiom.
But, as the saying goes, some people confronted with a problem think "I know, I’ll use regular expressions." Now they have two problems. Still, sometimes a problem really does need them.
Python handles regular expressions in a rather object oriented way. No simple Perl or sed implied syntax. Here’s a small example that shows how you could go through a bunch of eclectic data looking for social security numbers.
#!/usr/bin/env python3
# Don't name this test program re.py! It would shadow the real re module.
import re
D= ["William Poned","SS:456-90-9876","3425 Ponzi Dr."]
pattern_object= re.compile(r'(\d\d\d)-(\d\d)-(\d\d\d\d)')
for d in D:
    # Note that "search" is satisfied to find the pattern anywhere within the string.
    match_object= pattern_object.search(d)
    if match_object:
        print(match_object.re)
        print(match_object.groups())
        print(match_object.span())
        print("Using `search` method of the pattern object:")
        print(match_object.group())
# The "match" method only tries the pattern at the beginning of the string.
pattern_object= re.compile(r'.*(\d\d\d)-(\d\d)-(\d\d\d\d).*')
match_object= pattern_object.match(D[1])
print("Using `match` method of the pattern object:")
print(match_object.group())
Here’s what this program outputs (Python 3.7 or later).
re.compile('(\\d\\d\\d)-(\\d\\d)-(\\d\\d\\d\\d)')
('456', '90', '9876')
(3, 14)
Using `search` method of the pattern object:
456-90-9876
Using `match` method of the pattern object:
SS:456-90-9876
Note the difference between the search and the match methods of the pattern object. match only tries the pattern at the very beginning of the string (hence the leading .* above), while search simply needs to find the pattern somewhere in the string. To require the entire string to match, use fullmatch.
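A compact sketch of search, match and fullmatch side by side, using a digits pattern on made-up strings.

```python
import re

digits = re.compile(r'\d+')
found_anywhere = digits.search('SS:456')   # finds '456'
at_start = digits.match('SS:456')          # None: 'S' is not a digit
prefix_only = digits.match('456abc')       # matches '456', ignores the rest
whole = digits.fullmatch('456abc')         # None: must cover the whole string
```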
Here’s another example. This one uses a "raw" string.
import re
target,regexp= '==== Sub-Heading', r'^==* .*$'  # Note raw string prefix.
match= re.search(regexp,target)  # Creates a <class 're.Match'> object (or None).
if match:  # Which can be queried directly for matching success.
    print('Heading detected:', match.group())  # What exact 'str' was matched.
Substitution
Here is a comparative example of a simple substitution using core Python functions and regular expressions.
Here’s a string containing the characters "JUNK" followed by 4 unknown characters all of which must be removed.
In [1]: x="This is a long string JUNK1234with some unwanted stuff in it."
There are two major ways.
#1: Use find() or index() to figure out where in the string the thing is and then use slices:
In [2]: n= x.index('JUNK')
In [3]: print(x[0:n]+x[n+8:])
This is a long string with some unwanted stuff in it.
#2: Use regular expressions. Just match with "JUNK....", where each . matches any one character.
In [4]: import re
In [5]: print(re.sub('JUNK....','',x))
This is a long string with some unwanted stuff in it.
Despite being a regular expression pro, in Python I tend to minimize their use, and, unlike in other environments, that’s easy to do (as shown here).
For more information, check the official gory details.
Match
Since we just looked at the match function of the re module, let me interrupt the flow here to interject a quick note about Python’s branching syntax. In the old days, if you had a lot of possibilities you’d do something like this.
if x == 1:
    print("one")
elif x == 2:
    print("two")
else:
    print("unknown")
That’s not the loveliest syntax possible, but it is simple and clear. But it looks like Python couldn’t leave well enough alone and they had some FOMO for C’s switch ()...case. As of Python 3.10 you can use the match keyword (i.e. not the re module method).
match x:
    case 1:
        print("one")
    case 2:
        print("two")
    case _:
        print("unknown")
That _ is official syntax (the wildcard pattern) and not a nonce or something. Enjoy your extra line. And indent.
Lists and Sequence Types
Functions available to lists:
- l.append(object): Simply append object in place to the end of the list.
- l.count(value): Count the occurrences of value in the list.
- l.extend(iterable): Append all the items supplied by iterable to the end of the list, in place.
- l.index(value[,start[,stop]]): Return the index of the first occurrence of value (or raise ValueError). The other parameters act as a slice.
- l.insert(index,object): Insert object in place immediately before index.
- l.pop([index]): Remove (in place) and return the item at index (or the last item). Raise IndexError if index is out of range or the list is empty.
- l.remove(value): Removes in place the first occurrence of value, or raises ValueError if not found. Note that [v for v in l if v != value] can get rid of all value occurrences from a list (making a new list this way).
- l.reverse(): Reverses the list in place.
- l.sort(key=None, reverse=False): Sorts a list in place. Python 2 also accepted cmp=, a function cmp(x,y) returning -1, 0, or 1 for less than, equal, or greater than respectively; Python 3 dropped it (functools.cmp_to_key turns such a function into a key). See Complex Object Sorting for how to use key. Also note that there is a sorted(mylist) function that will return a new sorted list if you want to preserve the original list.
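A sketch walking one list through the methods above. The comments show the list's state after each step. The last part shows how an old Python 2 style comparison function can still be used through functools.cmp_to_key.

```python
from functools import cmp_to_key

l = [3, 1, 2]
l.append(4)          # [3, 1, 2, 4]
l.extend([5, 6])     # [3, 1, 2, 4, 5, 6]
l.insert(0, 9)       # [9, 3, 1, 2, 4, 5, 6]
last = l.pop()       # 6, leaves [9, 3, 1, 2, 4, 5]
l.remove(9)          # [3, 1, 2, 4, 5]

ordered = sorted(l)      # new list; l is unchanged
l.sort(reverse=True)     # in place: [5, 4, 3, 2, 1]

# Python 3 dropped cmp=; wrap an old-style comparison function instead.
def old_style_cmp(a, b):
    return (a > b) - (a < b)

by_cmp = sorted([3, 1, 2], key=cmp_to_key(old_style_cmp))
```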
Other attributes of lists (as listed by Python 2; __getslice__, __setslice__ and __delslice__ no longer exist in Python 3):
l.__add__, l.__class__, l.__contains__, l.__delattr__,
l.__delitem__, l.__delslice__, l.__doc__, l.__eq__, l.__format__,
l.__ge__, l.__getattribute__, l.__getitem__, l.__getslice__,
l.__gt__, l.__hash__, l.__iadd__, l.__imul__, l.__init__,
l.__iter__, l.__le__, l.__len__, l.__lt__, l.__mul__, l.__ne__,
l.__new__, l.__reduce__, l.__reduce_ex__, l.__repr__,
l.__reversed__, l.__rmul__, l.__setattr__, l.__setitem__,
l.__setslice__, l.__sizeof__, l.__str__, l.__subclasshook__
List Comprehension
List comprehensions are a nice way to apply some action to a list in such a way that a new list is generated. The syntax is a bit odd at first, but it’s actually pretty reasonable and compact. Note that the functionality is comparable to the map function, which in Python 3 returns an iterator, hence the list() below.
>>> [pow(2,y) for y in range(8)]
[1, 2, 4, 8, 16, 32, 64, 128]
>>> list(map(lambda y:pow(2,y),range(8)))
[1, 2, 4, 8, 16, 32, 64, 128]
Conditional filtering works too, including a form with else.
[myfn(x) for x in mylist if mycondition]
[myfn(x) if mycondition else myelsefn(x) for x in mylist]
Note these are subtly different. The first is a filter: it throws out any items that do not match the if condition. The second applies a conditional expression to every item. This example of the second form divides by two but never wants to see a value split in half, so it rounds up in the case of odd numbers.
>>> [(x+1)/2 if x%2 else x/2 for x in range(20)]
[0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0, 6.0, 6.0, 7.0,
7.0, 8.0, 8.0, 9.0, 9.0, 10.0]
There the if x%2 else x/2 acts as if it wraps the myfn() clause. Note that in this construction you must have an else; if you don’t really want a fancy myelsefn(), just use else x.
This similar example uses the first form, simply filtering the source list by how each item fares against mycondition.
>>> [(x+1)/2 for x in range(20) if x%2]
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
My general rule for the whole list comprehension syntax is that I use it only where it reads to humans quite naturally. This occurs surprisingly often but generally precludes overly complex logic.
That said, sometimes you want to be aware of performance benefits, which can be substantial. For example, this program took only 0.100s.
Y= [x for x in range(1000000)]
Compare with this equivalent code which took 0.171 seconds.
Y= list()
for x in range(1000000):
    Y.append(x)
Note that Y+=[x] takes even longer (0.200s) than the append(x).
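To repeat that comparison on your own machine, the standard timeit module is less error-prone than timing whole programs. This is a sketch with an arbitrary size of 100,000 and 20 runs. Absolute numbers will differ from the ones above, so it prints the timings rather than predicting them.

```python
import timeit

def with_comprehension(n=100_000):
    return [x for x in range(n)]

def with_append(n=100_000):
    result = list()
    for x in range(n):
        result.append(x)
    return result

# Absolute timings depend on the machine; only the comparison is interesting.
for fn in (with_comprehension, with_append):
    seconds = timeit.timeit(fn, number=20)
    print(f'{fn.__name__}: {seconds:.3f}s for 20 runs')
```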
Filter
The list comprehension can be conditional, which basically operates like the filter function (which, like map, returns an iterator in Python 3).
>>> [x for x in range(1000) if not x%333]
[0, 333, 666, 999]
>>> list(filter(lambda x:not x%333,range(1000)))
[0, 333, 666, 999]
To be clear, the official Python description of filter says that these are equivalent (in Python 3 the right-hand side is a generator expression, since filter is lazy).
- filter(fn, iterable)
- (x for x in iterable if fn(x))
And so are these.
- filter(None, iterable)
- (x for x in iterable if x)
Note that in Python 3+ the filter command produces an iterator instead of a list. This may be helpful for large data sets that need to be processed. Otherwise, you should probably use list comprehensions.
For example, this kind of thing is not cool in Python 3, because a filter object is always true, even when it would produce nothing:
if not filter(lambda x:x.whatami!="VAL",thevaluelist):
But this is, and it’s pretty obviously an improvement.
if all([x.whatami=="VAL" for x in thevaluelist]):
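A sketch of that trap with plain numbers instead of objects. The filter object tests as true even though iterating it would yield nothing.

```python
values = [1, 2, 3]

# A filter object is always truthy, even when it would produce nothing.
nothing = filter(lambda v: v > 10, values)
looks_true = bool(nothing)

# Materialize it, or use any()/all(), to ask the real question.
really_empty = not list(filter(lambda v: v > 10, values))
all_small = all(v <= 10 for v in values)
```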
reduce
While we’re covering wacky functions that like the lambda construction, here’s a use of the reduce function. I really don’t think this function is very useful, but this was my best attempt to do something more exotic with it than the normal adding of a bunch of things (here, a ROT13 of the upper-cased letters).
>>> from functools import reduce
>>> reduce(lambda hold,next:hold+chr(((ord(next.upper())-65)+13)%26+65),'jjjkrqpu','')
'WWWXEDCH'
I think this is as good as it gets judging by this.
If you never understood reduce’s utility you’re in luck! Python 3 removed it from the built-ins. You can still use the reduce function found in functools. But really, probably best to consider it dead to Python.
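For the record, here is a sketch of the usual functools.reduce pattern next to the built-in that makes it unnecessary for products. It assumes Python 3.8+ for math.prod.

```python
import math
import operator
from functools import reduce

numbers = [1, 2, 3, 4]
product = reduce(operator.mul, numbers, 1)   # (((1*1)*2)*3)*4
same_product = math.prod(numbers)            # Python 3.8+
```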
Generators
Generators are very much like list comprehensions except they don’t synthesize the entire list into memory at their location. Instead they produce a generator object which can be iterated, with a for loop or the next() function. Each time it is iterated, the next item in the sequence is generated at that time, until the specified objects are exhausted.
>>> a=9
>>> g=(x+a for x in range(10))
>>> g
<generator object <genexpr> at 0x7f41c0b63af0>
>>> next(g)
9
>>> for x in g: print(x,end=' ')
...
10 11 12 13 14 15 16 17 18
The generator syntax is a shorthand for a more verbose style involving the yield keyword. The yield keyword hands back a value much like return (a bare yield produces None). Then the function’s state is preserved, and the next call to it resumes where it left off. The generator is finished (and raises StopIteration) when it reaches a return statement or just the natural end of the function.
The following example illustrates the usage with a function that provides unique incrementing ID numbers.
#!/usr/bin/env python3
def numberer(id=0):
    while True:
        id += 1
        yield id

if __name__ == '__main__':
    ID_set_1= numberer()
    ID_set_2= numberer(10)
    for n in range(3):
        print(next(ID_set_1), next(ID_set_2))
This produces:
1 11
2 12
3 13
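Unlike numberer() above, this generator actually runs out. Here is a sketch with hypothetical countdown() and pings() generators. It shows a generator finishing, a bare yield, next() with a default instead of StopIteration, and a generator expression fed straight into sum().

```python
def countdown(n):
    """Yield n, n-1, ..., 1 and then finish."""
    while n > 0:
        yield n
        n -= 1
    # Falling off the end (or a return) ends the generator.

def pings():
    yield          # a bare yield is legal and produces None
    yield "pong"

g = countdown(3)
first = next(g)
rest = list(g)                         # whatever is left
exhausted = next(g, "done")            # default instead of StopIteration
total = sum(x * x for x in range(4))   # generator expression, no list built
```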
Dictionaries
In Python, dictionaries are collections of items which store a value indexed by a key (as opposed to a list, which indexes by position, a number). In Python 2 the order of items in a dictionary was unreliable, since order is not needed for its management. Since Python 3.7, dictionaries are guaranteed to keep insertion order.
Generally dictionaries can be created like this:
>>> d=dict({'akey':'avalue','bkey':'bvalue'})
>>> d
{'akey': 'avalue', 'bkey': 'bvalue'}
>>> d['bkey']
'bvalue'
There are actually many ways to create dictionaries but why be complicated?
Here are methods that can be applied to dictionaries:
- d.clear(): Remove all items in the dictionary.
- d.copy(): Returns a shallow copy of d.
- dict.fromkeys(sequence[,value]): Creates a new dictionary with items that have keys found in sequence. The value, if present, is applied to all new items (otherwise they get None). It is a class method, so it doesn’t sensibly act on an existing dictionary; for this reason it seems cool to just apply it to dict. This dict.fromkeys("xyz",0) produces {"x": 0, "y": 0, "z": 0}.
- d.get(key[,elsevalue]): Same as d[key] except that if the key is not present, elsevalue is returned. Since elsevalue defaults to None, no KeyError is raised with this function.
- d.has_key(key): Python 2 only (removed in Python 3). If the dictionary has an item with a key of key then returns True, otherwise False. Same as key in d, which should always be used in Python 3.
- d.items(): Returns the key,value tuples (a list in Python 2; a view object, in insertion order, in Python 3).
- d.iteritems(): Python 2 only. Produces an iteration object that can take .next() methods producing key,value tuples of all the items until a StopIteration exception. In Python 3 just iterate over d.items().
- d.iterkeys(): Python 2 only. See iteritems() but with just the keys.
- d.itervalues(): Python 2 only. See iteritems() but with just the values.
- d.keys(): Returns a list of keys in Python 2. In Python 3 it returns a dict_keys view object, in insertion order.
- d.pop(key[,elsevalue]): Like get but removes the item in addition to returning its value. Unlike get, if no elsevalue is provided and key isn’t in d then a KeyError is raised. This is a way to try to remove an item whether it exists or not; just make sure to specify an elsevalue.
- d.popitem(): Not like pop! Removes and returns some item’s key,value tuple (in Python 3.7+ the most recently inserted one) or, if no items are present, raises a KeyError.
- d.setdefault(key[,elsevalue]): Almost exactly like get, but in addition to returning elsevalue, it sets the specified key to it, leaving that item subsequently defined. If no elsevalue is specified and key isn’t in d then a key: None item is created.
- d.update(d2): Ok, this one’s a serious messy pile of function. Merges the items in d2 into d, in place. It can also take an iterable of key,value pairs like d.update([("m",13),("n",14)]) or keyword arguments like d.update(m=13,n=14). If you wanted to do dictA + dictB this is probably what you want; Python 3.9+ also has dictA | dictB, which returns a new merged dictionary.
- d.values(): Returns the values (a list in Python 2; a view object, in insertion order, in Python 3).
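A sketch of the Python 3 behavior of the most useful methods above, on a made-up dictionary. The | merge assumes Python 3.9+.

```python
d = {"a": 1}
missing = d.get("z")                  # None, no KeyError
fallback = d.get("z", 0)              # 0
d.setdefault("b", []).append(2)       # creates d["b"] = [] first
gone = d.pop("nope", None)            # safe removal attempt
d.update(c=3)                         # keyword form of update
last = d.popitem()                    # most recently inserted item
merged = {"x": 1} | {"x": 2, "y": 3}  # Python 3.9+: right side wins
order = list(d)                       # keys in insertion order
keys = dict.fromkeys("xyz", 0)
```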
Other dictionary attributes (as listed by Python 2; __cmp__, for example, is gone in Python 3):
d.__class__, d.__cmp__, d.__contains__, d.__delattr__,
d.__delitem__, d.__doc__, d.__eq__, d.__format__, d.__ge__,
d.__getattribute__, d.__getitem__, d.__gt__, d.__hash__,
d.__init__, d.__iter__, d.__le__, d.__len__, d.__lt__, d.__ne__,
d.__new__, d.__reduce__, d.__reduce_ex__, d.__repr__,
d.__setattr__, d.__setitem__, d.__sizeof__, d.__str__,
d.__subclasshook__
Dict Comprehensions
As of Python 2.7 and all of Python 3, there are dict comprehensions which are very, very similar to Python’s list comprehension. The syntax looks like this.
{key_expression: value_expression for item in iterable}
Here’s an example.
even_squares_dict= {x: x*x for x in range(10) if x % 2 == 0}
I can’t say that this is clearer and better than this equivalent syntax.
even_squares_dict= dict([(x, x*x) for x in range(10) if x % 2 == 0])
Tuples
A tuple is a type that gets its name (I think) from the idea of "multiple" or "quintuple". Its two most important aspects are that it is immutable and that it is a collection of references to other objects. This makes tuples handy for passing around between functions, because you know that the contents and order of the arguments will not change. (Python never copies the argument data ("by value") into another memory location when calling a function anyway; lists are passed as references too.)
Tuples can be "unpacked" in the following way:
>>> origin= (0,0)
>>> x,y= origin
>>> print("X:%d Y:%d" % (x,y))
X:0 Y:0
Tuples do not have many idiosyncratic methods that can be called on them. Here they are:
- t.count(value): Returns the number of times value is found in t.
- t.index(value[,start[,stop]]): Returns the position of the first occurrence of value. If the other parameters are supplied, it searches on a slice.
The Python built-in function zip is notable for combining other lists into tuples, position by position. In Python 3 it returns an iterator, so wrap it in list() to see the result.
>>> list(zip([1,2,3],['a','b','c']))
[(1, 'a'), (2, 'b'), (3, 'c')]
Output is only as long as the shortest list.
One very useful thing that can be done with this is listwise operations. Here, for example, I’m calculating a perceptron value by taking the sum of each input value times each corresponding weight, and then adding the bias. Here inputs is a list of input values and weights are the corresponding weights for each input position. Bias is just a constant.
value= sum([i*w for i,w in zip(inputs,weights)],bias)
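The same perceptron line with made-up numbers plugged in, as a sketch. The inputs, weights and bias values are hypothetical, chosen only so the arithmetic is easy to check: 0.2 + 0.2 - 0.1 + 0.5 is about 0.8.

```python
# Hypothetical perceptron inputs, weights and bias.
inputs = [1.0, 0.5, -1.0]
weights = [0.2, 0.4, 0.1]
bias = 0.5

pairs = list(zip(inputs, weights))   # zip is lazy in Python 3; list() shows it
value = sum([i * w for i, w in zip(inputs, weights)], bias)

short = list(zip([1, 2, 3], ['a', 'b']))   # stops at the shorter input
```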
In Python 2 only, the map command could serve for zip if the lists were the same length. In Python 3 this no longer works, because None is not callable.
>>> map(None,[1,2,3],['a','b','c'])
[(1, 'a'), (2, 'b'), (3, 'c')]
Other attributes of tuple types (as listed by Python 2; __getslice__ is gone in Python 3):
t.__add__, t.__class__, t.__contains__, t.__delattr__, t.__doc__,
t.__eq__, t.__format__, t.__ge__, t.__getattribute__, t.__getitem__,
t.__getnewargs__, t.__getslice__, t.__gt__, t.__hash__, t.__init__,
t.__iter__, t.__le__, t.__len__, t.__lt__, t.__mul__, t.__ne__,
t.__new__, t.__reduce__, t.__reduce_ex__, t.__repr__, t.__rmul__,
t.__setattr__, t.__sizeof__, t.__str__, t.__subclasshook__
Sets
Are sets real Python objects? They certainly are; set is a built-in type:
>>> s= set([1,2,3,4])
>>> type(s)
<class 'set'>
They are certainly one of the more obscure and underused primary types in Python. The main performance win is membership testing: x in someset takes roughly constant time on average, while x in somelist has to scan the list.
The main points about sets are that they are unordered and they contain no duplicate elements.
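Here’s a good overview of how sets are used. These transcripts are from Python 2; Python 3 displays a set as {0, 1, 2} rather than set([0, 1, 2]):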
>>> set1
set([0, 1, 2, 3, 4, 5, 6])
>>> set2
set([3, 4, 5, 6, 7, 8, 9])
>>> set1-set2
set([0, 1, 2])
>>> set2-set1
set([8, 9, 7])
>>> set1 & set2
set([3, 4, 5, 6])
>>> set1 | set2
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> 4 in set2
True
Note that sets do not have a + operator. If you need something like that, look into the | operator, which is a set union. The & is a set intersection, and there are explicit named functions for these too.
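This Python 3 sketch pairs each operator with its named method. It includes ^ for symmetric difference, which the transcripts don't show. One difference worth knowing is that the named methods accept any iterable, while the operators want sets on both sides:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

assert a | b == a.union(b) == {1, 2, 3, 4, 5}           # union
assert a & b == a.intersection(b) == {3, 4}             # intersection
assert a - b == a.difference(b) == {1, 2}               # difference
assert a ^ b == a.symmetric_difference(b) == {1, 2, 5}  # in exactly one of them

assert a.union([9, 10]) == {1, 2, 3, 4, 9, 10}  # Named methods take any iterable...
try:
    a | [9, 10]                                  # ...but the operator wants a set.
except TypeError:
    print("| needs a set on both sides")
try:
    a + b
except TypeError:
    print("and there is no + for sets")
```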
Here are some other examples:
>>> seta= set(['ann','bob','carl','doug','ed','frank'])
>>> setb= set(['ann','carl','doug','frank','gary','harry'])
>>> seta - setb
set(['ed', 'bob'])
>>> seta.difference(setb)
set(['ed', 'bob'])
>>> setb.difference(seta)
set(['gary', 'harry'])
>>> seta | setb
set(['ed', 'frank', 'ann', 'harry', 'gary', 'carl', 'doug', 'bob'])
>>> seta.union(setb)
set(['ed', 'frank', 'ann', 'harry', 'gary', 'carl', 'doug', 'bob'])
>>> seta & setb
set(['frank', 'ann', 'carl', 'doug'])
>>> seta.intersection(setb)
set(['frank', 'ann', 'carl', 'doug'])
>>> seta.symmetric_difference(setb)
set(['gary', 'harry', 'ed', 'bob'])
>>> setb.symmetric_difference(seta)
set(['gary', 'ed', 'harry', 'bob'])
Sets can be used to remove duplicates from a list. Here’s what that would look like:
thelist= list(set(thelist))
Note that this does not preserve order, which may have been important to you.
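If order matters, one common trick is to run the list through dict.fromkeys. This assumes Python 3.7+, where dicts keep insertion order, and the elements still have to be hashable:

```python
thelist = ['b', 'a', 'b', 'c', 'a']
deduped = list(dict.fromkeys(thelist))  # Keys keep first-seen order.
print(deduped)                          # ['b', 'a', 'c']
```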
Set Element Removal Functions
- s.difference(badset): Same as s - badset. For example, set("abcd").difference(set("cd")) produces the new set {'a', 'b'}. Note that the other way around is different. set("cd").difference(set("abcd")) produces an empty set, since a and b are ignored as not present and c and d are removed.
- s.difference_update(badset): Same as difference but removes the elements of badset from s in place instead of returning a new set.
- s.symmetric_difference(s2): Similar to difference but the order is not important. Any objects that are in both sets (that intersect) are removed, leaving the elements found in exactly one of the two sets. This returns a new set; s ^ s2 does the same.
- s.symmetric_difference_update(s2): Same as symmetric_difference but operates in place.
- s.intersection(s2): Same as s & s2. Returns the elements in common between s and s2.
- s.intersection_update(s2): Instead of returning the elements in common, it changes s in place to keep only those elements.
- s.pop(): Takes no arguments and removes and returns an arbitrary element from the set. If the set is empty, a KeyError is raised.
- s.discard(object): Remove object from s if it is a member; do nothing otherwise. Very similar to difference, except it handles only one object and modifies s in place instead of returning a new set.
- s.remove(object): Remove an object from a set in place, or raise KeyError if it's not there. Except for the exception, pretty much like discard.
- s.clear(): Make the set completely empty.
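This short Python 3 session exercises several of the removal functions. It mainly shows which ones raise KeyError and which ones stay quiet:

```python
s = set("abcd")
s.difference_update("cd")   # In place; any iterable works as the argument.
assert s == {"a", "b"}

s.discard("z")              # Not present: silently does nothing.
try:
    s.remove("z")           # Not present: raises KeyError.
except KeyError:
    print("remove() complained about 'z'")

x = s.pop()                 # Arbitrary element, removed from s.
assert x in {"a", "b"} and x not in s

s.clear()
try:
    s.pop()                 # Empty set: KeyError.
except KeyError:
    print("pop() on an empty set raises KeyError")
```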
Set Augmentation Functions
- s.add(object): Adds a single object to s (silently ignored if already present).
- s.union(s2): Same as s | s2. Returns a new set containing all the elements that were in s and all the new elements in s2.
- s.update(s2): Incorporates the elements of s2 into s. If all of s2 is already present, then nothing happens. It's pretty much a union in place (the same as s |= s2).
- s.copy(): Shallow copy of the set.
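Here is a quick check of the augmentation functions. It shows which ones change s in place and which ones return a new set:

```python
s = {1, 2}
s.add(2)                  # Already present: no change.
s.add(3)
u = s.union({4, 5})       # New set; s is untouched.
s.update([3, 6], (7,))    # In place; accepts several iterables at once.
c = s.copy()              # Shallow copy: a separate set object.
c.add(99)
print(sorted(s), sorted(u), 99 in s)  # [1, 2, 3, 6, 7] [1, 2, 3, 4, 5] False
```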
Set Test Functions
- s.isdisjoint(s2): Returns True if s and s2 have no elements in common. Basically the same as not (s & s2).
- s.issubset(s2): Note the order is as implied by the function name: True if s is a subset of s2, and not the other way around. Same as s <= s2.
- s.issuperset(s2): Returns True if s contains s2. Pretty much the same as issubset but with the object and argument switched (same as s >= s2). A set is both a subset and a superset of itself, so s.issubset(s) and s.issuperset(s) are both True.
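The test functions and their operator spellings are shown side by side below. Note that < is the strict version of <=, so a set is never a proper subset of itself:

```python
small = {1, 2}
big = {1, 2, 3}
other = {7, 8}

assert small.issubset(big) and small <= big
assert big.issuperset(small) and big >= small
assert not big.issubset(small)
assert small.isdisjoint(other) and not (small & other)
assert small.issubset(small) and small.issuperset(small)  # Every set contains itself.
assert small < big and not small < small  # < means proper subset.
print("all set tests passed")
```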
Other attributes of set objects (as listed by dir() in Python 2):
s.__and__, s.__class__, s.__cmp__, s.__contains__, s.__delattr__,
s.__doc__, s.__eq__, s.__format__, s.__ge__, s.__getattribute__,
s.__gt__, s.__hash__, s.__iand__, s.__init__, s.__ior__, s.__isub__,
s.__iter__, s.__ixor__, s.__le__, s.__len__, s.__lt__, s.__ne__,
s.__new__, s.__or__, s.__rand__, s.__reduce__, s.__reduce_ex__,
s.__repr__, s.__ror__, s.__rsub__, s.__rxor__, s.__setattr__,
s.__sizeof__, s.__str__, s.__sub__, s.__subclasshook__, s.__xor__
Classes and Object Oriented Stuff (OOP)
Python isn’t just capable of using object-oriented features. It has been designed with that as a primary aspect of the language. One nice thing about the design, however, is that unlike Java, you can safely ignore all object oriented features and get quite a bit of useful programming done. But when object oriented features make a lot of sense and would actually reduce complexity, Python is there to make it quite simple.
Here is a quick example showing a simple class and one technique for making loading of the attributes optional at instantiation.
#!/usr/bin/env python3
class Plan:
    def __init__(self,premium=0,deductible=0,HSA=0):
        self.loadinfo(premium,deductible,HSA)
    def loadinfo(self,p,d,H):
        self.premium= p
        self.deductible= d
        self.HSA= H
    def __repr__(self):
        return '$%0.2f, $%0.2f, $%0.2f'% (self.premium,self.deductible,self.HSA)

if __name__ == '__main__':
    Gold= Plan()                     # Defer loading data values.
    Gold.loadinfo(1027.90,3900,350)  # Load values explicitly when ready.
    Silver= Plan(936.12,1900,100)    # Load values during creation.
    print(Silver)
    print(Gold)
The best way to remind myself how it all works is to look at some good code I have written that aptly uses the traditional OO programming style. This excerpt of my Geogad language shows a base class representing geometric entities and subclasses derived from it. It shows the definition of member data items and member functions. It also shows the idea of a class variable, which in this case is useful to keep a unique ID number for each entity whatever its subtype.
import vector  # Geogad's own vector module (not shown in this excerpt).

class Entity:
    """Base class for functionality common to all entities."""
    lastIDused= 0                      # Class static variable, new id pool.
    master_VList= vector.VectorList()
    def __init__(self):
        Entity.lastIDused += 1         # Increment last ID used...
        self.p_id= Entity.lastIDused   #...which becomes the ID.
        self.vectors= dict()           # Vectors that define entity's geometry.
        self.attribs= dict()           # Properties of this entity.
    def __eq__(self, comparee):        # Overload == operator.
        try:
            for p in self.vectors.keys():
                if not self.vectors[p] == comparee.vectors[p]:
                    return False
            return True
        except KeyError:               # If the entities are totally mismatched.
            return False               # E.g. A has end1 and end2, B has cen & rad.
    def __repr__(self):
        return 'A generic entity'
    def rotate(self, ang, basepoint=None):  # On xy plane (%)
        pass
    def scale(self, fact, basepoint= None):
        if basepoint:                  # Move it to origin.
            self.translate(-basepoint)
        for p in self.vectors.keys():
            temp= self.vectors[p]
            newv= temp * fact
            Entity.master_VList.append( newv )
            Entity.master_VList.remove( temp )
            self.vectors[p]= newv
        if basepoint:
            self.translate(basepoint)  # Put it back.
    def centroid(self):
        pass
    def boundingbox(self):
        pass
    def translate(self, offset):
        for p in self.vectors.keys():
            temp= self.vectors[p]
            newv= temp + offset
            Entity.master_VList.append( newv )
            Entity.master_VList.remove( temp )
            self.vectors[p]= newv

class Entity_Point(Entity):
    def __init__(self, P):
        Entity.__init__(self)
        self.vectors['A']= P
        Entity.master_VList.append(P)  # Check in with the master list.
    def copy(self, offset=None):
        cp= Entity_Point(self.vectors['A'])
        if offset:
            cp.translate(offset)
        return cp
    def __repr__(self):
        return 'POINT:'+ str(self.vectors['A'])

class Entity_Line(Entity):
    def __init__(self, Pa, Pb):
        Entity.__init__(self)
        if Pa < Pb:  # Here the vectors are sorted for predictability.
            self.vectors['A'], self.vectors['B']= Pa, Pb
        else:        # The __lt__ is a bit arbitrary.
            self.vectors['A'], self.vectors['B']= Pb, Pa
        Entity.master_VList.append(Pa)  # Check in with the master list.
        Entity.master_VList.append(Pb)  # Check in with the master list.
    def copy(self, offset=None):
        cp= Entity_Line(self.vectors['A'], self.vectors['B'])
        if offset:
            cp.translate(offset)
        return cp
    def __repr__(self):
        return 'LINE:'+ str(self.vectors['A'])+str(self.vectors['B'])
There are a lot of special methods that can be defined to give your objects a more natural functionality with Python's built-in functions and operators. For example, whatever your object is, there is probably some sense of how big it is. Defining the __len__() method for the class can make len(MyObject) do the right thing, whatever that is. This is a pretty good resource for figuring out what your options are.
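Here is a minimal sketch with a hypothetical Playlist class. Defining __len__ makes len() work. Since Python falls back on __len__ for truth testing when a class has no __bool__, an empty playlist is also falsy. __contains__ is included to show the same idea for the in operator:

```python
class Playlist:
    """Hypothetical example class wrapping a list of song titles."""
    def __init__(self, songs=None):
        self.songs = list(songs or [])
    def __len__(self):
        return len(self.songs)      # len(playlist) now means "number of songs".
    def __contains__(self, title):
        return title in self.songs  # Enables the "in" operator.

p = Playlist(["Help!", "Yesterday"])
print(len(p), "Help!" in p, bool(Playlist()))  # 2 True False
```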
The super() function returns a temporary proxy object that lets a child class call the methods of its parent class. This allows for stuff like super().__init__() in a derived class's __init__() that asks to do all the stuff the general class's initialization does too. So in my entity example above, I use Entity.__init__(self) in the Entity_Point(Entity) class's __init__(), and I could have used super().__init__() there.
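Below is a stripped-down, hypothetical version of the Entity_Point idea using super(). Because super().__init__() runs the parent's initializer without naming the parent class, renaming or re-parenting the class doesn't require touching that line:

```python
class Base:
    def __init__(self):
        self.attribs = dict()

class Point(Base):
    def __init__(self, P):
        super().__init__()   # Same effect here as Base.__init__(self).
        self.P = P

pt = Point((1, 2))
print(pt.attribs, pt.P)      # {} (1, 2)
```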
Decorators
Decorators seem kind of lame to me. They basically add no fundamental functionality as far as I can tell. They seem to only turn this…
def f(x):
    return x
f = d(f)
…into this:
@d
def f(x):
    return x
I can’t say I’m super impressed by that. It seems like it’s for people who don’t know how to handle functions as objects, but what do I know? I find it weird that this syntax refers to something that is not yet defined and that’s not how Python should work. If it did, we could have main at the top of our programs. It is worth noting the first syntax as an alternative to decorators since it provides a clearer way to selectively activate them.
Nonetheless, some uses for decorators:
-
Timing something out so that it does not hang indefinitely. See Function Timeout section.
-
Profiling something to see how long various parts of your code take. See Function Timer section.
-
Type checking questionable input parameters.
-
Checking the security context of a function.
-
Tests.
-
Logging that a function actually got run.
-
Counting the number of times it got run.
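As an example of the counting use, here is a sketch of a decorator that tallies calls in an attribute on the wrapper. The names counted and greet are made up. functools.wraps copies the original function's name and docstring onto the wrapper, so the decorated function still looks like itself:

```python
import functools

def counted(f):
    """Decorator that counts how many times f has been called."""
    @functools.wraps(f)              # Keep f's name and docstring on the wrapper.
    def wrapper(*args, **kw):
        wrapper.calls += 1
        return f(*args, **kw)
    wrapper.calls = 0                # Function attribute used as the counter.
    return wrapper

@counted
def greet(name):
    return "hi " + name

greet("ann")
greet("bob")
print(greet.calls, greet.__name__)   # 2 greet
```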
One notable Python builtin decorator is @staticmethod. This can be used to include a method function in a class namespace when it really doesn't or can't use the class. Imagine a "member" function with no (self,... argument. This reasonable sounding person has misgivings about the whole idea. Google's Python Style Guide seems to prohibit it unless an existing API forces its use, and recommends a plain module-level function instead. And Guido himself seems to regret it. But it is noted in case it pops up again.
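Here is a small hypothetical example of @staticmethod. The conversion function needs no self, yet it lives in the class and can be called through either the class or an instance. The module-level alternative would simply be a plain def c_to_f(c) outside the class:

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    @staticmethod
    def c_to_f(c):                 # No self: just a function living in the class namespace.
        return c * 9 / 5 + 32

    def fahrenheit(self):
        return Temperature.c_to_f(self.celsius)

print(Temperature.c_to_f(100), Temperature(0).fahrenheit())  # 212.0 32.0
```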
Oh and here is some discussion about @classmethod, which is very similar to @staticmethod in syntax but, as I understand it, passes in the class itself (conventionally named cls) as the first argument instead of the normal self instance. I can't imagine the exact circumstances where this is truly essential or even useful, but maybe I have a limited imagination (limited by assiduously avoiding Java most of my life).
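A commonly cited use is the alternative constructor. This sketch borrows the shape of the Plan class from earlier, and from_csv is a hypothetical name. Because the method receives cls rather than a hard-coded class, calling it on a subclass builds an instance of that subclass. A @staticmethod could only do that by naming the class explicitly:

```python
class Plan:
    def __init__(self, premium=0, deductible=0, HSA=0):
        self.premium, self.deductible, self.HSA = premium, deductible, HSA

    @classmethod
    def from_csv(cls, line):
        """Alternative constructor: cls is whatever class this was called on."""
        p, d, h = (float(x) for x in line.split(','))
        return cls(p, d, h)

class DentalPlan(Plan):
    pass

d = DentalPlan.from_csv("40.5,500,0")
print(type(d).__name__, d.premium)   # DentalPlan 40.5
```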
Attribute Management
Python has a hasattr()
function.
TheObject.hasattr('color') # WRONG - not used like this at all!
hasattr(TheObject,'color') # Right - will return True if TheObject.color is valid.
This guy has some good reasons to think that it’s better to except on AttributeError than to do checks this way. However, a broad try block can also mask attribute errors you never anticipated.
He also points out that this works and is a bit more efficient too.
getattr(TheObject,'color',None)
If there is no color attribute for TheObject, the None object is returned.
Python also has setattr()
and delattr()
if that’s what you need.
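Below are the four attribute functions used together on a throwaway object (the class and attribute names are made up), plus the try/except alternative:

```python
class Thing:
    pass

obj = Thing()
setattr(obj, 'color', 'red')             # Same as obj.color = 'red'
print(hasattr(obj, 'color'))             # True
print(getattr(obj, 'size', None))        # None: default instead of AttributeError
delattr(obj, 'color')                    # Same as del obj.color
print(hasattr(obj, 'color'))             # False

try:
    obj.color
except AttributeError:
    print("EAFP style: catch the AttributeError instead")
```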
Function Timer
Here’s a decorator example that times a function:
#!/usr/bin/python
import time

def timethis(f):
    """A decorator function to time things."""
    def timed_function(*args,**kw):
        start= time.time()
        result= f(*args,**kw)
        print('Time was %.3fs' % (time.time()-start))
        return result  # Important to pass along any function results.
    return timed_function

@timethis
def example_fun(s):
    print("Sleeping for %.6f seconds." % s)
    time.sleep(s)

if __name__ == '__main__':
    example_fun(1.23)
This program outputs something like this:
Sleeping for 1.230000 seconds.
Time was 1.231s
Note that I’m trying for minimal dependency and maximal comprehensibility but there are plenty of official ways to do this like the timeit module.
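For comparison, here is a sketch using the standard library directly. time.perf_counter() is a clock intended for measuring short intervals, and timeit repeats a snippet to smooth out noise. The statements being timed are arbitrary:

```python
import time
import timeit

start = time.perf_counter()
total = sum(range(1_000_000))
elapsed = time.perf_counter() - start
print('sum took %.4fs' % elapsed)

# timeit runs the callable `number` times and returns the total seconds.
per_run = timeit.timeit(lambda: sum(range(1000)), number=1000) / 1000
print('about %.2e s per sum(range(1000))' % per_run)
```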
Function Timeout
Sometimes you’re expecting something to happen and you’re not sure how long it will take. You do know that if it goes beyond a certain threshold, you would rather just abort. An example of this is scanning for fast internet mirrors from which to download something. In this case, by definition, there would exist slow mirrors, and they may be so slow that they bog down the operation quite a lot. With the following decorator, you can give each mirror a certain amount of time to attempt its operation before cutting your losses and pulling the plug.
#!/usr/bin/python
"""See: `man 2 alarm` http://docs.python.org/library/signal.html"""
import signal  # SIGALRM and signal.alarm() exist only on Unix.
import time

LIMIT= 2 # seconds

def nohang(f):
    """A decorator function to cancel a function that takes too long."""
    def raiseerror(signum, frame):  # Handlers take two args.
        raise IOError
    def time_limited_function(*args):
        orig= signal.signal(signal.SIGALRM, raiseerror)
        signal.alarm(LIMIT)         # Start the clock for this call only.
        try:
            return f(*args)
        except IOError:
            print("Timed out!")
        finally:
            signal.alarm(0)                      # Cancel any pending alarm...
            signal.signal(signal.SIGALRM, orig)  # ...and restore the old handler.
    return time_limited_function

@nohang
def wait_this_long(t):
    time.sleep(t)
    print('Finished OK in %d seconds' % t)

if __name__ == '__main__':
    wait_this_long(3)
    wait_this_long(1)
Produces:
Timed out!
Finished OK in 1 seconds
Decorator With Arguments
I think that a situation like this:
@d(a)
def f(x):
    return x
…is the same as this:
def f(x):
    return x
i= d(a)  # i is an intermediate function which produces a function.
f= i(f)
I could be wrong though. Here’s a working example of a decorator that can be adjusted with an argument.
#!/usr/bin/python
def Decorator(DecoArg):
    def DecoArgWrapFunc(FuncPassed2Deco):
        print('Decorator argument: %s' % DecoArg)
        def DecoratedFunction(*args):
            print('Start decoration(%s)...' % DecoArg)
            RetValOfFuncPassed2Deco= FuncPassed2Deco(*args)
            print('End decoration(%s)...' % DecoArg)
            return RetValOfFuncPassed2Deco  # To simulate UserFunc.
        return DecoratedFunction
    return DecoArgWrapFunc

@Decorator('DA')
def UserFunc(UserArg):
    print('In user function with user argument: %s' % UserArg)
    return pow(2,UserArg)

print('Value of call to the user function: %s' % UserFunc(8))
This program produces the following output.
Decorator argument: DA
Start decoration(DA)...
In user function with user argument: 8
End decoration(DA)...
Value of call to the user function: 256
Other Uses Of @ In Python
I was amazed to discover that this is valid code.
[list(mat @ v.co) for v in bm.verts]
That example is some Blender stuff, and the @ here is an operator doing some matrix multiplication. This was new in Python 3.5 (PEP 465). Works with numpy too.
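The @ operator calls __matmul__, so any class can give it a meaning. Here is a bare-bones sketch with a hypothetical 2x2 matrix class that needs no numpy:

```python
class Mat2:
    """Hypothetical 2x2 matrix stored as ((a, b), (c, d))."""
    def __init__(self, rows):
        self.rows = rows
    def __matmul__(self, other):
        (a, b), (c, d) = self.rows
        if isinstance(other, Mat2):          # Matrix @ matrix
            (e, f), (g, h) = other.rows
            return Mat2(((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h)))
        x, y = other                         # Matrix @ 2-vector (any pair)
        return (a*x + b*y, c*x + d*y)

rot90 = Mat2(((0, -1), (1, 0)))              # Rotate 90 degrees counterclockwise.
print(rot90 @ (1, 0))                        # (0, 1)
print((rot90 @ rot90).rows)                  # ((-1, 0), (0, -1))
```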
Exceptions
Philosophy
Error handling in Python is quite powerful, but it can be a bit complex too. In Python, an "exception" is an event that accompanies an error situation, and it is valuable to realize that all errors in Python use the exception mechanism. Because of this, it’s pretty useful to know how to deal with them. For example, Python doesn’t have a simple "exit" keyword, and the ultimate way programs stop is when this effectively happens:
raise SystemExit
Although understanding exceptions is important, I have a bit of philosophical unease with the idea of planning for things you didn’t plan for. My thinking is that if you expect an exceptional situation, you should take precautions to make it not exceptional. In the Python world this seems to be divided into the "Easier to Ask for Forgiveness than Permission" (EAFP) and "Look Before You Leap" (LBYL) factions.
The classic case is a division by zero error. What is the functional difference between letting a division operator raise a very specific ZeroDivisionError and just checking to see if the denominator is zero before proceeding? I think that sometimes you want such explicit control and sometimes you can tolerate a certain amount of ambiguity. For example, if you’re going to do 100 divisions and the whole set is invalid if any one has a zero denominator, then the code might be easier to write and later understand if you use exception facilities. However, if you know that a certain attribute might be missing from an object, it is very reasonable to do an if hasattr(object,propstr): rather than try: ... except AttributeError:. In theory these are very similar approaches, as hasattr basically calls getattr and catches the AttributeError. However, if you use exceptions directly, looking at the code later will tell you nothing about why you thought to catch an error there, i.e. something about object and propstr. For all your future self knows, you were just covering the "unknown unknowns".
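Here is the division case written both ways, as a sketch. The function names are made up, and returning None for a zero denominator is just one possible policy:

```python
def safe_ratio_lbyl(num, den):
    # Look Before You Leap: the check documents exactly what can go wrong.
    if den == 0:
        return None
    return num / den

def safe_ratio_eafp(num, den):
    # Easier to Ask Forgiveness than Permission: try it and handle the failure.
    try:
        return num / den
    except ZeroDivisionError:
        return None

for f in (safe_ratio_lbyl, safe_ratio_eafp):
    print(f.__name__, f(6, 3), f(1, 0))   # e.g. "safe_ratio_lbyl 2.0 None"
```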
I like to explicitly check for all the things I can think of which will go wrong (LBYL) and then use exceptions to stagger away still breathing if something truly unforeseeable happens. To me it’s like working on a roof while wearing a safety harness - falling off the roof is still to be avoided and jumping off the roof seems to be seriously full of the wrong attitude.
There are other cases, however, where exceptions are strategically preferred. One example is explained in the os module documentation. It points out that checking to see if a file is readable and then reading it is not as robust as just trying to open it and catching an exception when it doesn’t work. The thinking is that in the former case, an attacker could devise a way to change the state of the thing being checked between the check and the action.
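Here is a minimal sketch of that just-try-it approach. The helper name and the choice to return None are illustrative assumptions. FileNotFoundError and PermissionError are Python 3 subclasses of OSError:

```python
def read_config(path):
    """Hypothetical helper: return the file's text, or None if it can't be read."""
    try:
        with open(path, encoding='utf-8') as f:
            return f.read()
    except FileNotFoundError:
        print("No such file: %s" % path)
    except PermissionError:
        print("Not allowed to read: %s" % path)
    return None

print(read_config('/no/such/dir/settings.txt'))   # Reports the problem, then prints None.
```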
Implementation
Code to be monitored for an exception is "tried" with the try
keyword. An "exception" is "raised" (not "thrown" as in C++ and Java).
An exception is, uh, excepted with except
and not "caught" with
catch.
The basic syntax looks like:
try:
AttemptSomething()
except LoneException:
HandleThisBadThing()
except (ExceptionOne, ExceptionTwo, ExceptionN):
HandleAnyOfTheseBadThings()
except Exception as error:
print(error) # Show error while handling. Also see below.
else:
SomethingRanClean() # Executes if no exceptions raised.
finally:
SomethingWasAttempted() # Runs if exceptions were raised or not.
In the above code, the Exception exception catches most sensible things, but not the exceptions that exist to stop the program (SystemExit, KeyboardInterrupt, and GeneratorExit derive directly from BaseException instead). I find I use this for overlooking a few anomalous corrupt input records while processing large quantities of mostly good ones.
for l in fileinput.input(): # Read each line from external data source.
try: # Unexpected bad things can happen because...
process_line(l) # ...I don't know how random data will react.
except Exception as e: # Catch anything interesting.
if debug: # Else, better luck next line and quietly skip.
print(l+" ERROR:"+repr(e)) # Make fuss.
raise SystemExit # Stop now to address this while debugging.
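Below is a runnable version of the try/except/else/finally skeleton from earlier. It shows the order in which the clauses fire, and the function name attempt is made up:

```python
def attempt(x):
    log = []
    try:
        result = 10 / x
    except ZeroDivisionError as error:
        log.append('except: %s' % error)
    else:
        log.append('else: got %s' % result)   # Only when no exception was raised.
    finally:
        log.append('finally')                  # Always, and last.
    return log

print(attempt(2))   # ['else: got 5.0', 'finally']
print(attempt(0))   # The except entry, then 'finally'.
```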
Also don’t forget about assert, which can raise an AssertionError. This kind of thing can be useful.
import sys
assert('linux' in sys.platform), "Ask Chris how to fix that..."
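That raises an AssertionError with that message if the script is run on the wrong kind of platform. Basically, consider assert when you want to raise an exception conditionally. Be aware that running Python with the -O option strips out assert statements, so they shouldn't guard anything that must always be checked.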
Standard Exceptions
What exceptions are there to be raised? Here’s an abridged diagram of the Python 2 exception hierarchy. Note that if you except an EnvironmentError exception, then it will catch IOError and OSError, since those are subclasses of that exception.
BaseException
+-- SystemExit
+-- KeyboardInterrupt
+-- GeneratorExit
+-- Exception
+-- StopIteration
+-- StandardError
| +-- BufferError
| +-- ArithmeticError
| | +-- FloatingPointError
| | +-- OverflowError
| | +-- ZeroDivisionError
| +-- AssertionError
| +-- AttributeError
| +-- EnvironmentError
| | +-- IOError
| | +-- OSError
| +-- EOFError
| +-- ImportError
| +-- LookupError
| | +-- IndexError
| | +-- KeyError
| +-- MemoryError
| +-- NameError
| +-- ReferenceError
| +-- RuntimeError
| | +-- NotImplementedError
| +-- SyntaxError
| | +-- IndentationError
| +-- SystemError
| +-- TypeError
| +-- ValueError
| +-- UnicodeError
+-- Warning
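Under Python 3 the picture is a bit different. StandardError is gone, and EnvironmentError and IOError are now just aliases of OSError. These checks are a quick way to confirm relationships on whatever interpreter is running:

```python
# Python 3.3+: these are literally the same class object.
assert IOError is OSError and EnvironmentError is OSError

# issubclass() walks the hierarchy, which is what an except clause relies on.
assert issubclass(ZeroDivisionError, ArithmeticError)
assert issubclass(KeyError, LookupError)
assert issubclass(FileNotFoundError, OSError)         # New in Python 3.3.
assert not issubclass(KeyboardInterrupt, Exception)   # Sits directly under BaseException.

print(ZeroDivisionError.__mro__)  # The chain of base classes, most specific first.
```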
You can find out about exceptions with help:
$ python -c "help(EOFError)" | sed -n '/^class/,/ | /p'
class EOFError(StandardError)
| Read beyond end of file.
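Or this (Python 2 only; Python 3 has no exceptions module, and the built-in exceptions live in the builtins module instead):
python -c "import exceptions; help(exceptions)"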
Exception Data
Looks like there has been a change in the syntax used to manage exception information. Formerly it was something like:
except ValueError, exception_instance:
And now it is something like:
except ValueError as exception_instance:
What this means is that in your except
clause you can use the
exception instance that is generated with the raise
(perhaps as part
of a system error). This exception instance contains some handy stuff
relating to the error (usually). It seems that the various built in
exceptions have a variety of attributes. For example, an IOError will
have a filename
attribute that you can access. Here is an example of
the basic attribute system showing the generic data stored by the
exception object and how to find out what exactly the exception object
can tell you.
#!/usr/bin/python
try:
    raise Exception('arg1','arg2','arg3')
except Exception as exception_instance:
    print("dir(exception_instance):")
    print(dir(exception_instance))
    print("type(exception_instance):")
    print(type(exception_instance))
    print("exception_instance:")
    print(exception_instance)
    print("exception_instance.args:")
    print(exception_instance.args)
Produces (this is Python 2 output):
dir(exception_instance):
['__class__', '__delattr__', '__dict__', '__doc__', '__format__',
'__getattribute__', '__getitem__', '__getslice__', '__hash__',
'__init__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__setstate__', '__sizeof__',
'__str__', '__subclasshook__', '__unicode__', 'args', 'message']
type(exception_instance):
<type 'exceptions.Exception'>
exception_instance:
('arg1', 'arg2', 'arg3')
exception_instance.args:
('arg1', 'arg2', 'arg3')
Note: The exception_instance.message attribute was deprecated in Python 2.6 and removed in Python 3. Best not to rely on it!
This catches any general exception. To also catch KeyboardInterrupt and the other higher-level exceptions that you’re not expecting, use except BaseException.
Custom Exceptions
If none of the built in exception classes seem appropriate or if they
lack the necessary attributes, you can create your own exception
classes which more specifically do what you need. You can inherit from
any exception, but it’s normal to use Exception
as an uncluttered
base class. Here’s an example of what methods and structure such a
user defined exception class entails.
#!/usr/bin/python
class UserDefinedException(Exception):
    """Best to inherit from Exception class."""
    def __init__(self, ex_att_param):
        self.user_attribute= ex_att_param
    def __str__(self):
        return repr(self.user_attribute)

try:
    raise UserDefinedException('Idiot programmer alert!')
except UserDefinedException as InstanceOfUserDefinedException:
    print(InstanceOfUserDefinedException.user_attribute)
This example program simply prints Idiot programmer alert!
File operations
Composing Paths
The common way to assemble paths so that they also work on that other kind of OS is to use os.path.join(). But there is a newer, better way with another standard library module, pathlib (added in Python 3.4).
from pathlib import Path

data_dir = Path("/bodyfitter/data")
data_dir.mkdir(exist_ok=True)  # Make the dir if it's not there.
some_filename = "cool_file.yo"
some_filename_path = data_dir / some_filename
some_filename_path.exists()    # True if the file is actually there.
some_filename_path.parent      # Returns data_dir.
some_filename_path.parents[1]  # parents[n] climbs n+1 levels up; returns Path("/bodyfitter").
Path(__file__)
gives you the filepath of the file you’re in, so you can do
things like this to orient yourself in a tree where things about the workspace or
environment might change but you at least know where you should be relative to
other things in the hierarchy.
data_dir = Path(__file__).parents[2] / "data"
One of the biggest gotchas, though, is that functions asking for a filepath will only sometimes accept a Path object. Since Python 3.6, the standard library generally accepts anything implementing os.PathLike, but older or third-party code may still require a string, in which case str(path) converts it back. But when passing paths around one’s own consistent code, it just makes a lot of little things much cleaner and easier to work with if they’re Paths.
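This throwaway demonstration runs in a temporary directory rather than the /bodyfitter path from the example. Path objects build paths with /, can read and write files themselves, and convert back to strings with str() or os.fspath() when some function insists:

```python
import os
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())          # Scratch directory for the demo.
data_dir = base / "data"
data_dir.mkdir(exist_ok=True)            # No error if it already exists.
note = data_dir / "cool_file.yo"
note.write_text("hello\n")               # Path objects can read/write directly.

print(note.exists(), note.name, note.suffix, note.parent == data_dir)  # True cool_file.yo .yo True
print(os.path.getsize(note))             # os functions accept Path objects (os.PathLike).
as_string = str(note)                    # For code that insists on a plain string.
```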
Checking
Use the os.access()
function to check if the file you are interested
in is in the condition you expect.
if not os.access(f,os.F_OK):
    print("Nonexistent file problem: %s"%f)
if not os.access(f,os.R_OK):
    print("Unreadable file problem: %s"%f)
You can also use os.W_OK
and os.X_OK
to test for writable and
executable.
Also consider os.path
for checking on directories.
os.path.isdir(path)
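Here is a self-contained run of those checks against a scratch file from tempfile (the file exists only for the demo):

```python
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"data")
f = tmp.name

checks = {
    'exists': os.access(f, os.F_OK),
    'readable': os.access(f, os.R_OK),
    'writable': os.access(f, os.W_OK),
}
print(checks)
print(os.path.isdir(os.path.dirname(f)), os.path.isfile(f))  # True True
os.remove(f)
print(os.access(f, os.F_OK))   # False once the file is gone
```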
Simple Reading
To read an entire file into a string:
entire_file_contents= open(filename).read()
To read each line of a file use something like:
f= open(filename,'r')
for l in f:
    print(l)
Note that this prints each line with the newline from the file plus the one from print, so the output comes out double spaced. Use sys.stdout.write(l) or print(l, end='') to avoid this problem.
Other streams besides sys.stdout are sys.stdin and sys.stderr.
Binary File Operations
Sometimes you need very specific things done with your file writes.
If, for example, you need to aggressively prevent buffering, use the
third argument of 0
in the open
function. But this only works with
binary (e.g. wb
) modes. When using binary modes, you have to make
sure any string types are converted — or encoded — into something
at a lower level. Here’s an example that finally did what I wanted it
to do.
data= '05F10D66 00 F8 B4 01 00 00 FF FF'
with open('/dev/ttyACM1','wb',0) as f:
    f.write((data+'\r\n').encode('ASCII'))
Note the with/as syntax is described below.
with + as
It looks like since version 2.5, the fanciest (and best?) way to open
files and read through them is to use the new with
and as
reserved
words.
import os
filename= os.path.join('/tmp','c')
with open(filename,'r',encoding='utf-8') as f:
entirefilecontents= f.read()
That also shows how to use the polite os.path.join
to make path
amalgamations; note the necessary leading slash.
This creates a list of lines in a file.
with open(filename,'r',encoding='utf-8') as f:
    listofallfilelines = f.readlines()
Need to deal with each line on its own individually? Here’s an example that counts the lines in a file.
count= 0
with open(filename,'r') as f:
    for l in f:
        count += 1
print(count)
The advantage here is that the file gets closed when the block ends, even if an exception rudely interrupts things before they make it to your file closing statement.
A better way to think of the with/as syntax is that it sets up a "context". An object that has __enter__() and __exit__() methods can be used with with/as, such that the __enter__ method gets called upon entering the block and the __exit__ method gets called, unsurprisingly, upon exit. This is why it’s such a reasonable way to handle file opening: things can be done with the file, and the exit method makes sure that, whatever weirdness transpires, the file will get closed.
Another example of when this would be appropriate would be some kind of routine that outputs some SVG. You may want to adjust the view parameters with a <g transform=...> block. You might print the opening tag from an object's enter method, do a bunch of stuff, and then print a </g> tag from its exit method. This will allow for nested objects, and the opening and closing tags will always exist and be correct.
#!/usr/bin/python
class SVGsettings(object):
    def __init__(self,p):
        self.property= p
    def __enter__(self):
        print('<g %s>'%self.property)
    def __exit__(self, type, value, traceback):
        print("</g><!--%s-->"%self.property)

with SVGsettings('stroke-width="1.5"'):
    with SVGsettings('stroke="red"'):
        print('<line x1="3" y1="0" x2="25" y2="44"/>')
<g stroke-width="1.5">
<g stroke="red">
<line x1="3" y1="0" x2="25" y2="44"/>
</g><!--stroke="red"-->
</g><!--stroke-width="1.5"-->
It might be smart to rewrite my HTML tagger with this style.
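The same nesting can be had without writing a class by using contextlib.contextmanager from the standard library. The code before yield plays the role of __enter__, and the finally clause plays __exit__. The name svg_group is just illustrative, and this should print the same five lines as the SVGsettings version:

```python
from contextlib import contextmanager

@contextmanager
def svg_group(prop):
    print('<g %s>' % prop)               # Runs like __enter__.
    try:
        yield
    finally:
        print('</g><!--%s-->' % prop)    # Runs like __exit__, even after an exception.

with svg_group('stroke-width="1.5"'):
    with svg_group('stroke="red"'):
        print('<line x1="3" y1="0" x2="25" y2="44"/>')
```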
Another example is a transaction "lock". If you need to do something like update a database, and it must be locked to prevent access from other actors, you can acquire the lock in the __enter__() method and release it in the __exit__() method. This way, even if bad things happen, the lock will get released properly.
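Here is a sketch of that lock idea with threading.Lock. The Transaction class is hypothetical. In practice a threading.Lock can be used directly as with db_lock:, because it already has __enter__ and __exit__. Returning False from __exit__ lets the exception continue after the lock is released:

```python
import threading

class Transaction:
    """Hypothetical wrapper: hold a lock for the duration of a with block."""
    def __init__(self, lock):
        self.lock = lock
    def __enter__(self):
        self.lock.acquire()
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        self.lock.release()       # Always runs, even if the block raised.
        return False              # Don't swallow the exception.

db_lock = threading.Lock()
try:
    with Transaction(db_lock):
        raise ValueError("something bad mid-update")
except ValueError:
    pass
print(db_lock.locked())           # False: the lock was released anyway
```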
I think the core of the syntax is that this code (with as VAR
optional):
with EXPR as VAR:
BLOCK
Is equivalent to:
mgr = (EXPR)
exit = type(mgr).__exit__  # Not calling it yet
value = type(mgr).__enter__(mgr)
exc = True
try:
    try:
        VAR = value  # Only if "as VAR" is present
        BLOCK
    except:
        # The exceptional case is handled here
        exc = False
        if not exit(mgr, *sys.exc_info()):
            raise
        # The exception is swallowed if exit() returns true
finally:
    # The normal and non-local-goto cases are handled here
    if exc:
        exit(mgr, None, None, None)
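The __exit__() method needs to take 4 values: self, type, value, and traceback. The last three describe the exception that ended the block (the same triple that sys.exc_info() returns), and they are all None when the block finishes normally.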
Also newer Pythons (2.7 and 3+) support stuff like this.
with open("customers") as f1, open("transactions") as f2:
    ...  # Do stuff with multiple files.
Complete mind boggling details can be found here.
Temporary File Names
Here is a sample where a temporary file is created and then the program turns control over to Vim for the user to compose something and when the user quits, the Python program has all of the input. This is what you would do if you wanted to, for example, recreate the functionality of a mail client like Mutt.
#!/usr/bin/python
import tempfile
import subprocess
tmpfile= tempfile.NamedTemporaryFile(dir='/tmp').name
vimcmd= '/usr/bin/vim'
subprocess.call([vimcmd,tmpfile])
with open(tmpfile,'r') as fob:
    f= fob.readlines()
if f[1].strip() == '='*len(f[0].strip()):
    print("Title: %s" % f[0])
else:
    print(f[1])
    print('='*len(f[0]))
print("%d lines entered." % len(f))
Buffering Issues
Although normally not something to worry about, sometimes it’s
important to remember that Python tends to politely buffer output as a
general rule. You can have unbuffered output by invoking Python with
the -u
run time option. I’ve found this to be important when using
tee
, named pipes, and other fancy stream situations.
Also note that you can use something like print(x,flush=True). Sometimes I do print('.',end='',flush=True) to print a line of status dots, because I don’t want the status to be incorrect for what is actually going on.
Compression
Python can deal with compressed files just fine. There are the modules
gzip
and bz2
which are very similar but not identical. Here’s how
to read a gzip compressed file.
import gzip
f= gzip.open(filename,'rt')  # 'rt' gives str lines; the default mode 'rb' gives bytes.
Then work with f
as a normal file handle. This example is
illustrative for writing compressed files:
import gzip
f_in = open('file.txt', 'rb')
f_out = gzip.open('file.txt.gz', 'wb')
f_out.writelines(f_in)
f_out.close()
f_in.close()
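Here is a Python 3 round trip that writes to a scratch directory. gzip.open defaults to binary mode, so mode 'rt' gives text with str lines. shutil.copyfileobj is a standard way to stream one file object into another:

```python
import gzip
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
plain = os.path.join(tmp, 'file.txt')
packed = plain + '.gz'

with open(plain, 'w') as f:
    f.write('line one\nline two\n')

# Compress: stream the plain bytes into a gzip file object.
with open(plain, 'rb') as f_in, gzip.open(packed, 'wb') as f_out:
    shutil.copyfileobj(f_in, f_out)

# Read back as text; without the 't' the lines would be bytes.
with gzip.open(packed, 'rt') as f:
    lines = f.readlines()
print(lines)   # ['line one\n', 'line two\n']
```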
Here’s an example of how to use bz2
. This little program is a
(rough) wc
program for text files with bzip2 compression.
#!/usr/bin/python
import bz2
import sys
fn= sys.argv[1]
b=0; w=0; l=0
with bz2.BZ2File(fn, 'r') as f:
    for line in f:  # Lines are bytes in Python 3.
        b+= len(line); w+= len(line.split()); l+= 1
print("%d bytes, %d words, %d lines" % (b,w,l))
File Input and Standard Input
The fileinput
really simplifies getting things from files or
standard input.
#!/usr/bin/python
import fileinput
for l in fileinput.input():
    print(l.strip().title())
This program produces this output.
$ ./input.py myfile
This File Can Be Sent As Input Both As
A File Argument And As Standard Input.
$ ./input.py <myfile
This File Can Be Sent As Input Both As
A File Argument And As Standard Input.
$ ./input.py myfile - <myfile
This File Can Be Sent As Input Both As
A File Argument And As Standard Input.
This File Can Be Sent As Input Both As
A File Argument And As Standard Input.
Interactive Input
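When writing menu-driven features or other interactive programs that wait for a user to input things, the following can be useful. This example shows how to suppress echoing for applications like passwords or where the key press' value is not relevant.
import os
os.system('stty -echo')
passwd= input('Password:')   # This was raw_input in Python 2.
os.system('stty echo')
For passwords specifically, the standard getpass module (getpass.getpass('Password:')) handles the no-echo business for you.
Note that Python 3 took out raw_input. Now it’s just input. An answer here has a clever comprehensive solution that works in both.
try:
    input= raw_input   # Python 2: make input() behave like Python 3's.
except NameError:
    pass               # Python 3: input() already does the right thing.
print("You entered: " + input("Prompt: "))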
CSV
Now and again some lackwit sends you a file in a popular spreadsheet format. You use something like this to try and decrypt it.
libreoffice --headless --convert-to csv --outdir ./csvdir/ yuck.xls
But then you have crazy business like this.
a,b,"c1,c2,c3",d,"e1,e2",f
Which is extremely tedious to parse. But not with Python!
#!/usr/bin/python3
import sys
import csv
with open(sys.argv[1],'r',newline='') as f:   # newline='' is what the csv docs recommend.
    for r in csv.reader(f):
        print('|'.join(r))
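To see what the reader does with that awkward line without a file on disk, here’s a sketch that wraps the string in io.StringIO. It also shows csv.writer putting the quotes back where a field contains a comma.

```python
import csv
import io

raw = 'a,b,"c1,c2,c3",d,"e1,e2",f\n'
rows = list(csv.reader(io.StringIO(raw)))
print(rows[0])    # ['a', 'b', 'c1,c2,c3', 'd', 'e1,e2', 'f']

out = io.StringIO()
csv.writer(out).writerow(rows[0])   # Default line terminator is '\r\n'.
print(repr(out.getvalue()))
```

When writing to a real file, open it with newline='' so the csv module controls the line endings.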
Bitwise Operators
The official guide is good.
- x<<y = x’s bits shifted left y places
- x>>y = x’s bits shifted right y places
- x&y = bitwise AND - also overloaded by set classes (and similar) for intersection
- x|y = bitwise OR - also overloaded for union
- x^y = bitwise XOR - symmetric difference for sets
- ~x = complement - this is supposed to change 1s to 0s and 0s to 1s but it can sometimes get tricky. See below.
Note that the complement turns values negative. Python ints behave like two’s complement numbers with an unlimited supply of sign bits, so ~x is always -x-1.
>>> x=64
>>> bin(x)
'0b1000000'
>>> bin(~x)
'-0b1000001'
Here is an example of decoding a list of 4 bytes (LSB to MSB order) with 2s complement to handle negative values properly.
def decode_int32_le(b):   # b is a list of 4 byte values, LSB first.
    # This simple method works for positive values.
    #return (b[3]*256**3 + b[2]*256**2 + b[1]*256 + b[0])
    # But 2's complement must be handled for negative. Should also be much quicker.
    x= 0
    for n,B in enumerate(b): # n is the byte index and B is that data byte.
        x |= B<<(n*8)        # Slide first LSB byte over 0, then 8 for next, then 16, finally 24 for MSB.
    if x>>31:                # If leftmost bit (MSb) is 1, then negative conversion is necessary.
        x -= 4294967296      # Subtract off 1<<32 from val for 2's complement.
    return x
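Python 3 can also do this job with the built-in int.from_bytes(), which handles both byte order and the signed conversion. A sketch (the function name is just made up for illustration):

```python
def decode_le_int32(b):
    """Little-endian, signed 32-bit decode of a list of 4 byte values."""
    return int.from_bytes(bytes(b), byteorder='little', signed=True)

print(decode_le_int32([0x01, 0x00, 0x00, 0x00]))   # 1
print(decode_le_int32([0xFF, 0xFF, 0xFF, 0xFF]))   # -1
print(decode_le_int32([0x00, 0x00, 0x00, 0x80]))   # -2147483648
```

struct.unpack('<i', bytes(b))[0] gives the same answer.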
Note that besides using the bin() function, you can visualize binary with a format specifier like this: "{0:b}".format(x)
Or use format strings. This shows how to keep a consistent length so you can see leading zeros, which can often ease confusion.
>>> f"{213:0{8}b}"
'11010101'
>>> f"{22:0{8}b}"
'00010110'
Here’s an illustrative example of iterating through all the possible configurations of 4 light switches.
>>> q=4
>>> for n in range(2**q):
... print(f"{n:0{q}b}")
...
0000
0001
0010
<...etc...>
1101
1110
1111
Binary Data
Sometimes clever people put data into very efficient binary containers. Using C is the preferred way to deal with this, but if you’re lazy, Python does a great job of decoding binary data too.
>>> import struct
>>> packformat='>cHcHIccccccccccccccccHHIcHccHcHIcHccccccccccccccccccccc'
>>> struct.unpack(packformat,open('/tmp/mybinary.sbd','rb').read())
('\x01', 69, '\x01', 28, 2200337468, '3', '0', '0', '2', '3', '4', '0', '6', '2',
'9', '5', '9', '9', '6', '0', '\x00', 11682, 0, 1456502452, '\x03', 11,
'\x01', ' ', 49763, 'u', 12901, 0, '\x02', 21, '\x00', ' ', 'M', '@',
'\x00', '\x01', 'P', '\xef', '\xf0', ' ', '\x08', 'J', '\x00', 'Y', '_',
'\xcc', '&', 'L', '\x91', '\xe7', '}')
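That output is from Python 2; in Python 3 each c field comes back as a length-1 bytes object such as b'\x01'. The codes used here:
- > = big-endian byte order, standard sizes, no padding
- c = char 1 byte 0-255 (256 values)
- H = unsigned short 2 bytes 0-65535 (65,536 values)
- I = unsigned int 4 bytes 0-4294967295 (4,294,967,296 values)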
Full unpack codes can be found in the official struct module documentation.
If you’re using binary data, you might need to convert bases which is described here.
See also the bitwise section.
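Going the other way, struct.pack builds the bytes and struct.calcsize tells you how long a record should be, which is a good sanity check before unpacking a file. A small round-trip sketch reusing the first three fields of the format above:

```python
import struct

fmt = '>cHI'                        # big-endian: 1-byte char, unsigned short, unsigned int
print(struct.calcsize(fmt))         # 7 (1 + 2 + 4, no padding with '>')
packed = struct.pack(fmt, b'\x01', 69, 2200337468)
print(packed.hex())
print(struct.unpack(fmt, packed))   # (b'\x01', 69, 2200337468)
```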
Pickle
Although there are often better and more secure ways to save Python objects (see JSON below for example), an old classic is Python’s pickle. This object serialization basically just takes any Python object and makes it into a thing that can be written into a file. The end result of this trick is that you can dump some memory state (to a file, across a network, etc) and load it back into memory at another time and place.
import pickle
my_object= My_Object(1,2,3)   # My_Object is whatever class you have defined.
# ===== Save Object =====
with open('my_object.p','wb') as pickle_file:
    pickle.dump(my_object,pickle_file)
# ===== Clear Object =====
my_object= None
# ===== Restore Object =====
with open('my_object.p','rb') as pickle_file:
    my_object= pickle.load(pickle_file)
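Pickle can serialize almost any objects you dream up (open files, sockets, and lambdas are the usual exceptions). Never unpickle data from a source you don’t trust, since loading a pickle can execute arbitrary code. If your objects don’t involve homemade classes, i.e. they only use Python native types, you could consider the marshal module, though its documentation says it exists mainly to support .pyc files and is not a general persistence module.
The shelve module provides a persistent, dict-like key/value interface on top of pickle, if you like that kind of thing.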
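Here’s a minimal shelve sketch. The database path is a hypothetical scratch location, and depending on the platform shelve may create one or more files from that base name.

```python
import os
import shelve
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), 'notes_shelf')

with shelve.open(db_path) as db:     # Acts like a dict; values get pickled.
    db['scores'] = [0.91, 0.87]      # Keys must be strings.
    db['config'] = {'lr': 0.01, 'layers': 3}

with shelve.open(db_path) as db:     # Reopen later, or in another run.
    keys = sorted(db.keys())
    layers = db['config']['layers']
print(keys, layers)                  # ['config', 'scores'] 3
```

One gotcha: db['scores'].append(0.5) changes only a temporary copy unless you open the shelf with writeback=True; reassigning db['scores'] always works.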
This example compares pickle with json.
#!/usr/bin/env python3
import json
import pickle
class MyOwnClass:
    def __init__(self):
        self.oblist= [1,2,3,4,'ok']
        self.obA= self.calcA()
        self.obB= 1
    def __repr__(self):
        return f'{self.obA:d} and {self.obB:d}'
    def calcA(self):
        return 5
def pickling(A):
    print("== Pickling ==")
    with open('/tmp/my_object.p','wb') as pickle_file:   # Note 'wb' mode.
        pickle.dump(A,pickle_file)                      # Records the instance data plus a reference to its class.
    with open('/tmp/my_object.p','rb') as pickle_file:   # Note 'rb' mode.
        P= pickle.load(pickle_file)
    print(f"Unpickling... creates: {type(P)}")
    print(P)
def JSONing(A):
    print("== JSONing ==")
    with open('/tmp/my_object.json','w') as json_file:   # Note 'w' mode.
        json.dump(A.__dict__,json_file)                  # Note `.__dict__` attribute.
    with open('/tmp/my_object.json','r') as json_file:   # Note 'r' mode.
        J= json.load(json_file)
    print(f"UnJSONing... creates: {type(J)}")
    print(J)
A= MyOwnClass()
print(A)
pickling(A)
JSONing(A)
5 and 1
== Pickling ==
Unpickling... creates: <class '__main__.MyOwnClass'>
5 and 1
== JSONing ==
UnJSONing... creates: <class 'dict'>
{'oblist': [1, 2, 3, 4, 'ok'], 'obA': 5, 'obB': 1}
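As you can see, the pickle brought back a real MyOwnClass object, so its methods (like __repr__) work again. Strictly speaking, the methods are not stored in the pickle; it records the instance data plus a reference to the class, which must be importable when loading. JSON needed to use the __dict__ attribute to just get the data. What is read back in is not the object, but just the data. For more details on the json module, see the next section.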
JSON
There are more Pythonic ways of serializing objects (marshal, pickle, and Python 2’s cPickle) but in 2013, the way that makes the most people happy across platforms and languages is JSON. Serendipitously, JSON looks almost identical to a Python dictionary’s __repr__() output.
Here’s a sample of how to deal with JSON in a simple case.
#!/usr/bin/python3
import json
import sys
with open(sys.argv[1],'r') as pfile:   # e.g. test.json
    P= json.load(pfile)
for p in P.keys():
    P[p]+= 1
json.dump(P,sys.stdout) # Put some writable file object here.
sys.stdout.flush()
This might produce this result (the scrambled key order comes from an older Python; since 3.7, dicts keep insertion order):
$ cat test.json
{"a": 1.5, "b": 1.5707963, "c": 0.95, "d": 0.55, "e": 10.0}
$ ./json_sample.py test.json
{"a": 2.5, "c": 1.95, "b": 2.5707963, "e": 11.0, "d": 1.55}
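When the JSON is already in a string rather than a file, the s-suffixed functions do the same work. A sketch; the Point class is hypothetical and shows that json needs help, via default=, with objects it doesn’t know how to serialize.

```python
import json

P = json.loads('{"a": 1.5, "e": 10.0}')    # str -> dict
P = {k: v + 1 for k, v in P.items()}
text = json.dumps(P, sort_keys=True)       # dict -> str with a stable key order
print(text)                                 # {"a": 2.5, "e": 11.0}

class Point:                                # Hypothetical class for illustration.
    def __init__(self, x, y):
        self.x, self.y = x, y

# default= is called for anything json can't handle natively.
pt_text = json.dumps(Point(1, 2), default=lambda o: o.__dict__)
print(pt_text)                              # {"x": 1, "y": 2}
```

Adding indent=2 to any dumps/dump call gives human-friendly output.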
System Control
Python has several methods to allow arbitrary execution of system commands (exiting to a temporary shell). Obviously this is powerful and dangerous where security is an issue. It’s also often clumsy as the proper Python way of doing things is usually better than the shell way when you factor in the spawning of the shell.
This stuff has gone through a lot of changes over the years, but as of 2014, the consensus is to use the subprocess module.
Here is a nice overview of this kind of stuff.
Here are some methods:
import os
os.listdir('./path')          # Produces a list. `~` doesn't work. Hidden (dot) files are included; . and .. are not.
os.system('ls ./path')        # Just does the thing.
os.popen('ls .','r').read()   # Captures the output into a string.
for f in os.popen('ls .','r').readlines(): print(f) # Deal with each.
If this doesn’t do what you need, you can investigate the fancier functions of os like popen2, popen3, popen4 (Python 2 only), fork, the spawn family, and execv. See the official os help for more details.
Note: popen2, popen3, and popen4 were deprecated in Python 2.6 and are gone in Python 3; os.popen survives but is now implemented on top of subprocess. This is a real moving target. The new way is the subprocess module.
Subprocess
Here’s the recommended way for executing shell commands as of 2013.
Start with getting the lines of output from the simplest kind of command to fill a Python list.
''.join(map(chr,subprocess.check_output(['cal']))).split('\n')
The reason for all that guff is that this check_output command produces a bytes object. Another way to untangle a byte stream object is to decode it.
>>> b'Line one\nLine two'.decode('utf-8').split('\n')
['Line one', 'Line two']
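Since Python 3.7, subprocess.run() can capture and decode in one call, which avoids the guff entirely. A sketch; sys.executable stands in for any external command so it runs anywhere Python does.

```python
import subprocess
import sys

cmd = [sys.executable, '-c', 'print("Line one"); print("Line two")']

# capture_output grabs stdout/stderr; text=True decodes bytes to str;
# check=True raises CalledProcessError on a nonzero exit status.
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
lines = result.stdout.splitlines()
print(lines)               # ['Line one', 'Line two']
print(result.returncode)   # 0
```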
Here are some more examples.
>>> import subprocess
>>> n= subprocess.Popen(['df','-h','/media/WDUSB500TB'],stdout=subprocess.PIPE)
>>> o= n.stdout.read()
>>> o
'Filesystem Size Used Avail Use% Mounted on\n/dev/sdb 459G 350G 86G 81% /media/WDUSB500TB\n'
Note the stdout=subprocess.PIPE value to the Popen constructor. This is required to capture the output; without it the command’s output goes straight to your terminal on the spot. (In Python 3, o is a bytes object, b'...', unless you also pass text=True.) The command starts running as soon as the constructor runs, not when you read from it. So if you run a date command, for example, and there’s a lag between the constructor and the n.stdout.read(), the time will reflect the initial operation.
Proper Python documentation suggests that it’s good to use the supplied convenience functions when possible. These are call, check_call, and check_output. Here’s how the last one works:
import subprocess
findcmd= ['find', '/home/xed/', '-name', '*pdf']
# text=True (Python 3.7+) returns str instead of bytes, so split('\n') works.
for PDF in subprocess.check_output(findcmd, text=True).strip().split('\n'):
    print("PDF: %s" % PDF)
# Might output list like:
# PDF: /home/xed/SlaughterhouseFive.pdf
# PDF: /home/xed/gpcard.pdf
I started having trouble reading lines of standard output from a process with Python 3. Here’s a way that worked.
import io
import subprocess
proc= subprocess.Popen(CMD,stdout=subprocess.PIPE)   # CMD is your command as a list.
for oo in io.TextIOWrapper(proc.stdout, encoding="utf-8"):
    o= oo.strip() # Hmm. Wish I could think of a smarter way to do this.
    # (Passing text=True to Popen and looping over proc.stdout directly also works.)
Here’s an example of an outside command that gets run with some data the Python program knows about being piped to the command, with the standard output captured back into the program.
>>> import subprocess
>>> pro= subprocess.Popen(['/usr/bin/tr','a-z','A-Z'], shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
>>> pro.communicate("It might get loud.\n")
('IT MIGHT GET LOUD.\n', None)
Note that the second item (pro.communicate()[1]) is the standard error, which is None here because stderr was not set to subprocess.PIPE.
Environment Variables
To access environment variables from python use this technique:
>>> import os
>>> os.environ['USER']
'xed'
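os.environ['NAME'] raises KeyError when the variable isn’t set, so .get() with a fallback is often safer. A sketch; MY_APP_DEBUG is a hypothetical variable name and 'vi' is just an example default.

```python
import os

user = os.environ.get('USER', 'unknown')
editor = os.environ.get('EDITOR', 'vi')

# Setting a value affects this process and any children it starts,
# not the shell that launched it. Values must be strings.
os.environ['MY_APP_DEBUG'] = '1'
print(user, editor, os.environ['MY_APP_DEBUG'])
```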
Console Colors
I came up with this approach to tagging text with ANSI escape codes.
#!/usr/bin/python3
color= { # B___=Background, L___=Light, LB___=Light/Background
  'BLD':'\33[1m', 'ITL':'\33[3m', 'UNL':'\33[4m', 'BNK':'\33[5m', 'INV':'\33[7m',
  'BLK':'\33[30m', 'RED':'\33[31m', 'GRN':'\33[32m', 'YEL':'\33[33m',
  'BLU':'\33[34m', 'VIO':'\33[35m', 'CYN':'\33[36m', 'WHT':'\33[37m',
  'GRY':'\33[90m', 'LRED':'\33[91m', 'LGRN':'\33[92m', 'LYEL':'\33[93m',
  'LBLU':'\33[94m', 'LVIO':'\33[95m', 'LCYN':'\33[96m', 'LWHT':'\33[97m',
  'BBLK':'\33[40m', 'BRED':'\33[41m', 'BGRN':'\33[42m', 'BYEL':'\33[43m',
  'BBLU':'\33[44m', 'BVIO':'\33[45m', 'BCYN':'\33[46m', 'BWHT':'\33[47m',
  'BGRY':'\33[100m', 'LBRED':'\33[101m', 'LBGRN':'\33[102m', 'LBYEL':'\33[103m',
  'LBBLU':'\33[104m', 'LBVIO':'\33[105m', 'LBCYN':'\33[106m', 'LBWHT':'\33[107m' }
def colorfn(v):
    return lambda t:v+t+'\033[0m'   # Wrap the text in the code, then reset.
for k,v in color.items():
    color[k]= colorfn(v)            # Replace each code string with a wrapper function.
if __name__ == '__main__':
    print(12*'='+ color['RED'](' Color Examples ') +'='*12) # General usage.
    for k in color: print(color[k]('Console color testing: '+k)) # Full test.
It seems to work in normal graphical terminals and in the console. YMMV on a bad OS.
Time And Date
Working with times and dates can be tricky in Python. There are a lot of seemingly overlapping modules and classes (time, calendar, datetime, and the date, time, and datetime classes inside datetime) and everything is done very fastidiously. This can make simple things seem complex. Here are some common usage cases dealing with times.
import datetime
print(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
This produces something like '2012-10-18 16:50:37' and is typical of timestamps found in logging situations.
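If you have some kind of Unixy tool giving you seconds from the epoch, you can tidy that up with this. Note that fromtimestamp() converts to the machine’s local time; the output below came from a UTC-8 (US Pacific) machine.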
>>> datetime.datetime.fromtimestamp(1456502452).strftime('%Y%m%d %H:%M:%S')
'20160226 08:00:52'
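If you want the same answer on every machine, pass a time zone. A sketch using UTC:

```python
import datetime

ts = 1456502452
utc = datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)
print(utc.strftime('%Y%m%d %H:%M:%S'))   # 20160226 16:00:52
print(int(utc.timestamp()) == ts)        # True: aware datetimes round-trip exactly
```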
The timedelta objects can be useful for relative dates. Again note the datetime.datetime.SOMEFUNCTION syntax which is not entirely obvious in the Python documentation.
import datetime
today= datetime.datetime.strptime('2016-05-20','%Y-%m-%d')
aday= datetime.timedelta(1)   # The first positional argument is days.
yesterday= today-aday
lastweek= today-(7*aday)
Another common requirement involving time is to profile code or for some other reason find out how long something took.
import time
start= time.time()
do_some_lengthy_thing()
print('Elapsed time: %f' % (time.time()-start))
It might be better to use time.perf_counter() or time.perf_counter_ns() instead of time.time(). These specialize in performance timing at the highest resolution available (the "ns" version returns an integer count of nanoseconds). Note that individual values have an undefined reference point, so they’re meaningless on their own; only the difference between two calls means anything.
See the function timing decorator for this kind of application implemented in a general way.
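Here’s the earlier timing snippet redone with perf_counter. The slow_sum function is a hypothetical stand-in for do_some_lengthy_thing().

```python
import time

def slow_sum(n):               # Hypothetical stand-in for some lengthy work.
    return sum(i * i for i in range(n))

start = time.perf_counter()
total = slow_sum(200_000)
elapsed = time.perf_counter() - start      # Only the difference is meaningful.
print('Elapsed time: %f s' % elapsed)

start_ns = time.perf_counter_ns()           # Integer nanoseconds, no float rounding.
slow_sum(200_000)
elapsed_ns = time.perf_counter_ns() - start_ns
print('Elapsed time: %d ns' % elapsed_ns)
```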
Here’s a simple example that calculates days between two events.
#!/usr/bin/python3
""" Usage: dateelapsed.py 2012-01-09 2014-03-13 """
import datetime,sys
s,e = sys.argv[1],sys.argv[2]
sd= datetime.datetime.strptime(s,'%Y-%m-%d')
ed= datetime.datetime.strptime(e,'%Y-%m-%d')
dd= ed-sd
print(dd.days)
Need to just slow things down or wait in a polling loop?
import time
while still_working():   # still_working() is your own polling check.
    time.sleep(.1)
Random Numbers
Anyone who attempts to generate random numbers by deterministic means is, of course, living in a state of sin.
Oh well. Here’s how to use random numbers in normal usage.
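import random
random.seed(any_seed)      # int, float, str, bytes or bytearray (3.11+ rejects other types). With no argument it seeds from OS randomness if available, else the time.
random.randint(a,b)        # a <= N <= b
random.choice(sequence)    # Pick one
random.shuffle(sequence)   # Same list scrambled - in place!
random.random()            # floating point [0.0,1.0)
'%06x' % random.randint(0,0xFFFFFF) # random hex color for HTML, etc.
To shuffle two parallel lists the same way, shuffle their pairs and then unzip them.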
>>> A,B=[1,2,3,4,5],['a','b','c','d','e']
>>> import random
>>> Z=(list(zip(A,B))) ; random.shuffle(Z)
>>> A,B=[n[0] for n in Z],[n[1] for n in Z]
>>> A,B
([3, 2, 1, 5, 4], ['c', 'b', 'a', 'e', 'd'])
Also see from sklearn.utils import shuffle, which shuffles several arrays consistently.
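For repeatable experiments it can be cleaner to give each experiment its own random.Random instance instead of seeding the module-wide generator. A sketch:

```python
import random

# Private generators: seeding them doesn't disturb the module-level random.* state.
rng_a = random.Random(42)
rng_b = random.Random(42)
a = [rng_a.randint(1, 6) for _ in range(5)]
b = [rng_b.randint(1, 6) for _ in range(5)]
print(a == b)                    # True: same seed, same sequence

deck = list(range(10))
hand = rng_a.sample(deck, 3)     # 3 distinct items; deck is left untouched
print(hand, deck)
```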
More Entropy Please
Also, os.urandom(n) returns n random bytes from /dev/urandom or some other OS specific collection of high quality entropy. This is slower than the random module and draws on entropy the OS gathers from actual random events on the system.
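The bytes are easy to turn into something more usable. A sketch:

```python
import os

raw = os.urandom(4)                        # 4 bytes of OS-supplied entropy
n = int.from_bytes(raw, byteorder='big')   # 0 <= n < 2**32
token = os.urandom(16).hex()               # 32 hex characters, e.g. for a one-off ID
print(len(raw), n, token)
```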
Less Entropy Please - For People Who Do Not Like Importing
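Do you have some trivial little thing that needs some very rough unimportant randomness but you don’t want to import random for such a trifle? Here’s a way using Python’s core abilities.
n= str(anything_as_a_seed)
R,G,B= hash('R'+n)%256, hash('G'+n)%256, hash('B'+n)%256   # 0-255 each.
Since Python 3.3, string hashes are salted per process (see PYTHONHASHSEED), so this gives different values on each run unless you set that environment variable.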
Hashing
There is a built in function called hash which can return the numerical hash of a hashable object.
>>> print(hash('xed'))
-2056704
(For strings, the exact number varies between runs and platforms.)
But that is probably not what you need. This is the more common application and technique for hashes:
import hashlib
with open(fn,'rb') as f:
    md5sum= hashlib.md5(f.read()).hexdigest()
Here’s a tip that might be helpful.
import hashlib
fn= 'my_critical_data.tgz'
goodmd5= '5d3c7e653e63471c88df796156a9dfa9'
with open(fn,'rb') as f:
    actualmd5= hashlib.md5(f.read()).hexdigest()
assert actualmd5 == goodmd5, '{} is corrupted!'.format(fn)
Besides md5, hashlib also supports SHA1, SHA224, SHA256, SHA384, and SHA512 hashing.
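Reading the whole file with f.read() is fine for modest files, but for big ones you can feed the hash in pieces. A sketch; file_digest_hex is a made-up helper name, and the 1 MiB chunk size is an arbitrary choice.

```python
import hashlib
import os
import tempfile

def file_digest_hex(fn, algo='sha256', chunk_size=1 << 20):
    """Hash a file chunk by chunk so it never sits in memory all at once."""
    h = hashlib.new(algo)                      # 'md5', 'sha1', 'sha256', ... by name
    with open(fn, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

# Check it against hashing the whole thing in one gulp.
fn = os.path.join(tempfile.mkdtemp(), 'demo.bin')
with open(fn, 'wb') as f:
    f.write(os.urandom(3 * 1024 * 1024 + 7))
with open(fn, 'rb') as f:
    whole = hashlib.sha256(f.read()).hexdigest()
print(file_digest_hex(fn) == whole)            # True
```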
Note: It used to be import md5, but that module was deprecated in Python 2.5 (when hashlib arrived) and is gone in Python 3. If you’re using a very old Python and hashlib doesn’t work, give it a try.
Base Conversion
To convert common base whole numbers to decimal integers is pretty easy.
>>> int("CE",16)
206
>>> int("1000000",2)
64
This is very handy when getting decimal values out of hex color codes.
>>> hex_color='CEfF80'
>>> R,G,B= [int(hex_color[x:x+2],16) for x in [0,2,4]]
Or alternatively just use native type syntax for common bases if that’s easier.
>>> 0o744; 0xCE; 0b100
484
206
4
(Python 2 also accepted a bare leading zero like 0744 for octal; Python 3 requires 0o.)
Going the other way can be done with special built-in functions.
>>> oct(484); bin(512); hex(206)
'0o744'
'0b1000000000'
'0xce'
If you don’t like the prefix there are formatting tricks.
>>> "{:o}".format(484); "{:b}".format(512); "{:x}".format(206); "{0:02d}".format(0b00011110)
'744'
'1000000000'
'ce'
'30'
It looks like you can skip using "str".format(args...) by using an f-string (Python 3.6+). This is probably the cleanest looking way to do this.
>>> import math
>>> f"{math.pi:0.3f} are squared."
'3.142 are squared.'
Old style can do some too.
>>> '%x %o' % (206,484)
'ce 744'
Fullish details on formatting tricks can be found here.
Unfortunately that only works for the commonly used bases. Here’s the simple function I came up with to solve this problem.
def dec2N(n,base=16):
    if (type(n) is not int) or (type(base) is not int) or base < 1:
        return "Undefined"
    if base == 1:
        return "|"*n                 # Unary: tally marks.
    if n == 0:
        return "0"
    o,s= '',''
    if n<0:
        s,n= '-',n*-1
    def digit_str(x):
        if x > 15: return '(%s)'%x   # Past hex, show the digit's value in parens.
        if x > 9: return chr(55+x)   # 10->'A' ... 15->'F'
        return str(x)
    while n:
        o= digit_str(n%base)+o
        n= n//base
    return s+o
This handles hex letters (and could handle higher letters in higher bases by changing the > 15 value). It doesn’t have stack depth recursion issues and should work fine for most normal things I can think of.
If you need very exotic fractional hex conversion, investigate something like this.
>>> float.hex(2.25)
'0x1.2000000000000p+1'
Math
The math module does all the normal stuff, usually as expected.
- pi - Constant ready to use.
- e - Constant ready to use.
- ceil - Next whole number (an int in Python 3, a float in Python 2).
- floor - Previous whole number (same int/float caveat).
- sqrt - Use import cmath for negative values and fun with complex numbers.
- atan - Returns in radians.
- atan2 - Takes two arguments, y and x (think numerator and denominator), so that the correct quadrant can be returned.
- sin - Trig functions take radians.
- degrees - Convert from radians.
- radians - Convert to radians.
- log - Don’t get caught by int(math.log(1000,10)) being equal to 2.
- log10 - Use int(math.log10(1000)) instead.
- gamma - Fancy float capable way to do factorials: gamma(n+1) equals n!, so supply n+1 if you want what math.factorial gives, or just use that function.
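A few of those gotchas, demonstrated. This is just a quick sketch; the quadrant comparison shows why atan2 exists.

```python
import math

print(math.log(1000, 10))            # slightly under 3, so int() of it gives 2
print(int(math.log10(1000)))         # 3
print(math.ceil(2.1), math.floor(-2.1))      # 3 -3 (ints in Python 3)
print(math.gamma(5), math.factorial(4))      # gamma(n+1) == n!
# atan2(y, x) knows the point (-1, 1) is in the second quadrant;
# atan(y/x) only sees the ratio -1 and answers in the fourth.
q2 = math.degrees(math.atan2(1, -1))
q4 = math.degrees(math.atan(1 / -1))
print(q2, q4)
```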
NumPy
Can of worms! But super powerful. The key trick of NumPy is that it has an array object that makes arrays more like C arrays (with strides) but with all the accounting done. This allows the performance to be much better than native Python objects, especially for large numeric data sets. It’s also quite good at linear algebra. See my TensorFlow notes for an example of that.
This is a nice tip for getting help strings out of NumPy syntax (npcommand stands for whatever you’re curious about).
np.info(np.npcommand)
Numpy uses an idea called "broadcasting" to help work with arrays of different dimensions. For example, A * B where A is 10x1 and B is 1 results in a 10x1 array. And a 4x1 array and a 1x3 array results in a 4x3. More gory details and a decent explanation can be found here.
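Here’s the 4x1 and 1x3 case from above as a sketch: each array is stretched along its length-1 axis until the shapes match.

```python
import numpy as np

A = np.arange(4).reshape(4, 1)    # shape (4, 1): a column
B = np.array([[10, 20, 30]])      # shape (1, 3): a row
C = A + B                         # both stretch to (4, 3)
print(C.shape)                    # (4, 3)
print(C)
# [[10 20 30]
#  [11 21 31]
#  [12 22 32]
#  [13 23 33]]
D = np.ones((10, 1)) * 2          # a scalar broadcasts to any shape
print(D.shape)                    # (10, 1)
```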
A quick note about some common optional arguments.
- keepdims=True - will cause something like np.sum(np.array([[1,2],[1,1],[1,0]]),keepdims=True) to be array([[6]]) while False will just produce 6.
- axis=1 - takes np.sum(np.array([[1,2],[1,1],[1,0]]),axis=1) and returns array([3,2,1]). axis=0 will give array([3,3]).
- dtype=? - allows a specific numpy type to be specified.
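The same array run through those options, as a sketch. The last step shows a common reason to want keepdims: the (3,1) result broadcasts straight back against the original to normalize each row.

```python
import numpy as np

a = np.array([[1, 2], [1, 1], [1, 0]])
print(np.sum(a))                      # 6
print(np.sum(a, keepdims=True))       # [[6]] (still 2-D)
print(np.sum(a, axis=0))              # [3 3]   column totals
print(np.sum(a, axis=1))              # [3 2 1] row totals
print(np.sum(a, dtype=np.float32).dtype)   # float32

row_totals = np.sum(a, axis=1, keepdims=True)   # shape (3, 1)
row_frac = a / row_totals                        # each row now sums to 1
print(row_frac)
```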
And when printing these objects out, there are some formatting options that can come in handy. See numpy.set_printoptions.
- linewidth - Defaults to 75 which is very often annoying.
- sign - Setting sign='+' prints "+" for positive values.
- threshold - Arrays with more elements than this get summarized with ....
- edgeitems - How many items to keep at each end of a summarized representation like [1,2,...,3,4].
- precision - How many decimals to display.
- floatmode - Display all decimals including trailing zeros (fixed) or perhaps as needed (maxprec_equal the default).
Just set the option somewhere before using the output methods.
np.set_printoptions(linewidth=200)
NumPy Types And Object Attributes
- np.int, np.uint8, np.int64, np.uint16, np.int8, np.int16, np.intc (C sized int)
- np.float, np.float32, np.float64, np.float16
- np.complex (same as 128), np.complex128, np.complex64
- np.bool
- np.object
- np.string_
- np.unicode_
(np.int, np.float, np.complex, np.bool, and np.object were deprecated aliases for the Python builtins and were removed in NumPy 1.24, though NumPy 2.0 brought np.bool back as the NumPy boolean type. NumPy 2.0 also dropped np.string_ and np.unicode_ in favor of np.bytes_ and np.str_.)
- mynparr.shape
- mynparr.flags
- len(mynparr)
- mynparr.ndim
- mynparr.size
- mynparr.nbytes
- mynparr.dtype
- mynparr.dtype.name
- a[x] - element x
- a[x,y] - element x,y. For plain integer indices this gives the same result as a[x][y], in one indexing step instead of two.
- a[0:3,0:3,0:3] - full slice options for each dimension.
- a[1,...] - same as a[1,:,: etc ,:]
- mynparr.view(<type>)
- np.array(my_normal_list) - convert to proper numpy array which is technically an ndarray (n-dimensional array) object.
- mynparr.astype(np.uint8) - cast as integer (not default floats).
- (mynparr.astype(np.float32) - 127.5) / 127.5 - cast from uint8s 0 to 255 into -1 to +1.
- mynparr.tolist() - convert back out of NumPy to regular Python.
Creating Arrays
- np.loadtxt('file.txt'[,skiprows=1][,delimiter='|']) - load from text file. Also, see np.savetxt('file.txt',a,delimiter='|') and np.save()
- np.array( [ (1,2,3),(4,5,6) ], dtype=float)
- np.ones( (x,y,z…), dtype=np.int16 )
- np.zeros( (x,y) )
- np.zeros_like(template) - makes a zero array with the shape (and dtype) of another one. Like your other array all zeroed out. Also there’s a np.ones_like() that’s the same but for ones.
- np.empty( (x,y,z…) ) - similar to zeros but the values are never set, so it holds whatever junk was in memory.
- np.arange(2,101,2) = 2,4,6…98,100 - Third arg is tick interval.
- np.linspace(start,end,[qty]) - evenly spaces values, end included. Third arg is number of ticks; qty defaults to 50.
- np.full( (x,y), val ) - fills an array of specified size with val. Also if full is not there, try fill like this: a=np.empty(n);a.fill(val)
- np.eye(n) - Identity matrix (array really) of given size. Same as np.identity(n).
- np.random.seed(whatever) - allows controlled repeats of random experiments.
- np.random.random( (x,y…) ) - makes an array of specified size with random values in [0.0,1.0).
- np.random.binomial(k,p,size=n) - Create a set from a binomial distribution. A handy one is np.random.binomial(1,.5,size=20) which makes 20 random 1s or 0s. Good for things like dropout layers.
Arithmetic
- np.add(a,b) - two arguments only. Adds corresponding elements of two arrays. Also has the + operator overloaded.
- np.subtract(a,b) - similar to np.add. Also the - operator.
- np.multiply(a,b) - Simple multiplication of each corresponding element. Also *.
- np.divide(a,b) - Same idea as multiply. Also /.
- np.remainder(a,b) - Same as mod. Also %.
- np.exp(n) - Raise e (2.71828182846) to the power of each element in n.
- np.sqrt(n)
- np.sin(rad) - Sine of radians for each element.
- np.arctan2(y,x) - Normal trig angle finding.
- np.cos(rad) - Cosine of radians for each element.
- np.log(n) - Natural log for each element.
- np.dot(a,b) - Dot product of two arrays. Also a.dot(b) format.
- np.sum(a) or a.sum() - Adds up contents. Even adds nested values unless a ,axis=n is included to sum along just that dimension.
- np.min(a)
- np.max(a) - Same as np.amax(); both return the maximum array element from a single array. Can supply an optional axis=0 which gets confusing fast. See this.
- np.maximum(a,b) - Unlike np.max() mentioned above, this one takes two np.array arguments and does an elementwise max check. np.maximum(np.array([1,2,3,4,5]),np.array([5,4,3,2,1])) produces an array of 5, 4, 3, 4, 5. It will take a 3rd positional argument, but that is the out array the results get written into, not a third input. Under the hood np.max() uses np.maximum.reduce().
- np.histogram(a,bins=b,range=(0,255)) - range defaults to (a.min(), a.max()). Returns counts in bins and bin edges (fenceposts). np.histogram(np.array([1,2,2,3,3,3]),bins=4, range=np.array([0,4])) returns (array([0, 1, 2, 3]), array([ 0., 1., 2., 3., 4.])).
- np.argmax(a) - returns the index of the maximum value’s position. Good for finding the peak locations of a histogram, for example. Also good for picking the best possibility from an output vector of, say, softmax probabilities from a neural network guess.
- np.argmin(a) - Similar to argmax.
- np.nonzero(a) - returns indices (or array of locations) where nonzero values occur. a=np.hstack( (np.arange(3),np.arange(3) ) ) ; (a==1).nonzero() produces (array([1, 4]),).
- np.mean(a) - average
- np.median(a)
- np.std(a) - standard deviation. Or without np: math.sqrt(sum((x-mean)**2 for x in l)/len(l))
- np.corrcoef(a,y=b) - Pearson correlation coefficient. a and b must have the same shape.
- np.logical_or(a,b) - Or.
- np.logical_and(a,b) - And.
- np.logical_not(a) - Not (takes a single array).
- np.equal(a,b) - == elementwise, returns array of bools.
- np.array_equal(a,b) - True or False if the whole arrays are identical.
- a[a<2] - gives the elements of a that are less than 2 (the boolean array a<2 acts as a mask).
- np.polyfit(x,y,degree) - So with x 0 to 5 and y being x^2, like this np.polyfit(np.arange(6),np.arange(6)*np.arange(6),2), produces array([1,0,0]) (give or take floating point fuzz) since this is y= 1*x^2 + 0*x + 0. This is also very useful to calculate slopes of arbitrary data (using a degree of 1).
- np.cumsum(a) - changes (1,2,3,4) to (1,3,6,10). Cumulative sum. Input and output have the same length.
- np.convolve(a,b) - Discrete convolution; if b is longer than a they’re auto switched. Think of it as a function C(t) where t is an offset (time, but whatever): b is reversed and slid t places along a, the values that line up are multiplied, and the products are summed to give C(t). The default mode ("full") returns every offset with any overlap, "same" returns an output the size of the longer input, and "valid" returns only the offsets where the two fully overlap.
np.convolve(np.ones(5),np.ones(5)) → array([ 1., 2., 3., 4., 5., 4., 3., 2., 1.])
np.convolve(np.ones(5),np.ones(5),mode="same") → array([ 3., 4., 5., 4., 3.])
np.convolve(np.ones(3),np.ones(2),mode="valid") → array([ 2., 2.])
Here’s an example demonstrating that it’s a type of sum of products function. Imagine three $100 purchases in different states.
tax,cost=np.array([1.08,1.03,1.04]),np.array([100,100,100])
The total spent can be computed like this.
np.convolve(cost,tax,mode="valid") → array([ 315.])
Because convolution reverses one input, this only matches the real total because all the costs are equal; np.dot(cost,tax) is the true sum of products. See np.polymul() too, since multiplying polynomials is convolving their coefficients. If you dare.
- np.clip(a,min,max) - Ensures no element in a is less than min nor greater than max, resetting values to min and max as needed.
- np.sort(a) - Returns a sorted copy; a.sort() is the one that sorts in place and returns None. Use axis where needed.
- np.flip(a,axis) - flipud is the same with axis=0, fliplr is the same with axis=1.
- np.flipud(a) - Flips the array upside down (mirrors on a horizontal axis).
- np.fliplr(a) - Flips the array left to right (mirrors on a vertical axis).
- np.rot90(a) - Rotates matrix values counterclockwise. np.rot90( np.arange(4).reshape(2,2) ) = array([ [1, 3], [0, 2] ])
- np.copy(a) - Copies the data into a new array (not a view). For arrays holding Python objects, the objects themselves are not deep copied.
- np.transpose(a) - or a.T, transpose - makes 3x2 into 2x3.
- np.ravel(a) - flattens into 1-d array. Apparently the same as reshape(-1, order=order). Basically: np.ravel(np.array([[1,2],[3,4]])) becomes array([1,2,3,4]). Not to be confused with tensorflow.contrib.layers.flatten.
- np.reshape(a,(newx,newy)) - rearranges dimensions but keeps data. This example makes 3 sets of 2. np.arange(6).reshape(3,2) → array([[0,1],[2,3],[4,5]]) One dimension argument can be -1, which means "work out this dimension’s size from the total size and the other dimensions." Useful for unwrapping x,y images into vectors, e.g. np.array([[1,2],[3,4]]).reshape(-1) → array([1,2,3,4]), same as reshape(4) but for when you don’t know the 4.
- np.resize(a,(newx,newy)) - if the new size is bigger, fills it with repeated copies of a (the a.resize() method pads with zeros instead).
- mynparr.squeeze() - remove all axes with a length of 1.
- np.mgrid - Fills multi dimensional arrays with puzzling sequences related to the arrays' dimensions. "dense mesh grid" np.mgrid[:2,:2] = array([ [ [0, 0], [1, 1] ], [ [0, 1], [0, 1] ] ]) Use with transpose to get coords for a grid pattern.
- np.ogrid - Similar to mgrid but even weirder. "open mesh grid" np.ogrid[:2,:2] = [array([ [0], [1] ]), array([ [0, 1] ])]
- np.unique(a) - removes duplicate items (and returns the result sorted).
- np.append(a,b) - Almost identical to concatenate but with syntax differences. Note that you don’t append [ [*] [*] [*] ] with a [*]. You need a [ [*] ]. See Growing Arrays below.
- np.insert(a,pos,item) - inserts item at position pos of array a. Not in place either.
- np.delete(a,[n]) - delete item n from array a. Not in place!
- np.concatenate( (a,b) ) - [1,2,3] and [4,5] become [1,2,3,4,5].
- np.c_[a,b] - stack by columns
- np.column_stack( (a,b) ) - seems the same as np.c_
- np.r_[a,b] - very similar to concatenate for some simple arrays. The r is for stacking by rows.
- np.vstack( (a,b) ) - vertical stack. If a and b have shape (3,2) then (6,2) results. np.vstack( (np.arange(3),np.arange(3) ) ) produces array( [ [0, 1, 2], [0, 1, 2] ] ).
- np.hstack( (a,b) ) - horizontal stack. If a and b have shape (3,2) then (3,4) results. np.hstack( (np.arange(3),np.arange(3) ) ) produces array([0, 1, 2, 0, 1, 2]).
- np.hsplit(a,n) - makes a list of arrays broken as specified.
- np.vsplit(a,n) - similar to hsplit but with different axis perspective.
- np.dstack( (a,b) ) - if a and b’s shape is (3,2), this makes a shape (3,2,2). Imagine multiple 2d images now in an array (stack) indexable with another dimension.
- np.where(x<128,x+1,-1) - returns a new array holding x+1 wherever x is less than 128 and -1 everywhere else.
Growing Arrays
The traditional idea with arrays is that you reserve the memory you need and that’s that. But sometimes you need to build an array up from smaller parts and it’s more convenient to increase its size than replace parts of it (e.g. you may not know the final size). This happened to me where I needed to read in a sequence of images and store the whole collection as an array (holding each image) of an array (holding each image’s row) of an array (holding each row’s column) of an array (holding each pixel’s RGB). Assume a collection of three 2x2 grayscale images.
ims= np.reshape(np.random.random(12),(3,2,2)) * 255
ims= ims.astype(np.uint8)
array([[[ 99, 5], [137, 73]], [[145, 124], [ 14, 36]],
[[183, 78], [ 88, 82]] ], dtype=uint8)
Now suppose you have a new image that you want to add.
i= (np.reshape(np.random.random(4),(2,2)) * 255).astype(np.uint8)
array([[155, 237], [160, 27]], dtype=uint8)
You might think that having something like [*] would be what you need to add to something like [ [*] [*] [*] [*] ] but in fact, you need something like [ [*] ]. So here’s what works.
i.shape
(2, 2)
i= i.reshape(1,2,2)
i.shape
(1, 2, 2)
np.append(ims,i,axis=0) # Not in place!
array([[[ 99, 5], [137, 73]], [[145, 124], [ 14, 36]],
[[183, 78], [ 88, 82]], [[155, 237], [160, 27]]], dtype=uint8)
np.vstack((ims,i)) # Does the same thing. Note extra paren.
np.concatenate((ims,i)) # Does the same thing.
np.r_[ims,i] # Unbelievably, same thing.
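Every one of those calls builds a brand new array and copies everything over, so growing one image at a time gets slow for long sequences. When you don’t know the final count, a common alternative is to collect the pieces in an ordinary Python list and stack once at the end. A sketch; the random images stand in for whatever image reader you actually use.

```python
import numpy as np

rng = np.random.default_rng(0)      # Seeded so the demo is repeatable.
images = []                          # A Python list grows cheaply...
for _ in range(4):
    im = (rng.random((2, 2)) * 255).astype(np.uint8)   # stand-in for loading an image
    images.append(im)

ims = np.stack(images)               # ...then one copy, adding the new leading axis for you.
print(ims.shape, ims.dtype)          # (4, 2, 2) uint8
```

np.stack adds the new axis itself, so there is no need for the reshape(1,2,2) trick.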
Sorting
Python sorting used to be kind of tricky since the sort function was something that was attached to a list object and sorted in place. That is still true. For example:
>>> a=[3,4,1,2,0]
>>> a.sort()
>>> a
[0, 1, 2, 3, 4]
This caused so much confusion that a new function was added to return a sorted version of the original list. This produces a new list and leaves the original one alone.
>>> a=[3,4,1,2,0]
>>> sorted(a)
[0, 1, 2, 3, 4]
>>> a
[3, 4, 1, 2, 0]
Complex Object Sorting
There are many fancy ways of sorting things. Often you have a list of lists and you want to sort by some item in the list. Here’s a list of tuples representing (model_number,score) which need to be sorted so that the top 5 scoring models are displayed.
for top5 in sorted(score_list,key=lambda x:x[1],reverse=True)[0:5]:
    print('#{}={:.3f}'.format(top5[0],top5[1]))
Strangely I haven’t found a cleaner way to do this. Here’s another more complicated example of a two level sort.
m=[ ['Cho Oyu',8188,1954], ['Everest',8848,1953], ['Kangchenjunga',8586,1955],
    ['K2',8611,1954], ['Lhotse',8516,1956], ['Makalu',8485,1955] ]
ms2= sorted(m, key=lambda x:x[1], reverse=True ) # Secondary key
ms1= sorted(ms2, key=lambda x:x[2])              # Primary key
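Here the result ms1 is sorted by date of ascent (earliest first) and then, if that is the same, by mountain height (highest first). This works because Python’s sort is stable, so the second sort keeps the height order within each year. The results look like this: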
[['Everest', 8848, 1953], ['K2', 8611, 1954], ['Cho Oyu', 8188, 1954],
['Kangchenjunga', 8586, 1955], ['Makalu', 8485, 1955], ['Lhotse', 8516, 1956]]
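For the top-5 case specifically, one alternative (a sketch; the scores below are made-up stand-ins for score_list) is heapq.nlargest with operator.itemgetter, which avoids sorting the whole list. The two-level mountain sort can also be done in a single pass with a tuple key, negating the height to flip its direction. That negation trick only works for numeric keys, so the two-pass stable sort above remains the general approach.

```python
import heapq
from operator import itemgetter

# Hypothetical (model_number, score) pairs standing in for score_list.
score_list = [(1, 0.71), (2, 0.93), (3, 0.55), (4, 0.88), (5, 0.62), (6, 0.97), (7, 0.80)]

# Top 5 by score without sorting the whole list.
top5 = heapq.nlargest(5, score_list, key=itemgetter(1))
for model, score in top5:
    print(f'#{model}={score:.3f}')

# Two-level sort in one pass: ascent year ascending, then height descending.
m = [['Cho Oyu', 8188, 1954], ['Everest', 8848, 1953], ['Kangchenjunga', 8586, 1955],
     ['K2', 8611, 1954], ['Lhotse', 8516, 1956], ['Makalu', 8485, 1955]]
ms = sorted(m, key=lambda x: (x[2], -x[1]))  # -height flips that key's order
print([x[0] for x in ms])
```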
Graphics
There are many options for getting Python to draw arbitrary things graphically.
Tool | Import | Package (1) | Notes
---|---|---|---
Tkinter | | | De facto standard.
 | | | Not the easiest to use.
 | | | Specializes in PostScript.
 | | | Major window tool kit.
 | | | Major window tool kit.
 | | | Another QT binding lib.
 | | | Major window tool kit.
 | | | Format filters mostly.
1. On Ubuntu 12.04.
2. Already installed on Ubuntu and CentOS.
I often just write PostScript directly.
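Writing PostScript directly is mostly string building. Here is a minimal sketch (the postscript_line() helper is hypothetical) that produces a one-page document drawing a single line. Redirect the output to a .ps file and open it with any PostScript viewer, or convert it with ps2pdf.

```python
def postscript_line(x1, y1, x2, y2):
    """Return a minimal one-page PostScript document that strokes a single line.

    Coordinates are in points (1/72 inch) with the origin at the bottom-left
    of the page, unlike a Tk canvas, whose origin is the top-left.
    """
    return '\n'.join([
        '%!PS',
        'newpath',
        f'{x1} {y1} moveto',
        f'{x2} {y2} lineto',
        'stroke',
        'showpage',
    ]) + '\n'

print(postscript_line(100, 100, 200, 200), end='')
```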
Tkinter
Although Tkinter is not installed by default on many Linux systems, the rumor is that it is included with Python on other platforms. It is the official graphics toolkit for Python and is blessed by the language maintainers. If you just need to open a window on your screen and draw some stuff, say to plot some data, it is probably the easiest option (well, besides simple SVG). Here is a working example that does the minimum useful thing:
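from tkinter import * # Python 3 name; in Python 2 this module was called Tkinter.
c= Canvas(bg='white', height=1000, width=1000)
c.pack()
c.create_line(100,100,200,200) # X1,Y1,X2,Y2
c.mainloop() # Keeps the window open when run as a script.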
Plotting Data Visualization Graphs
If you need to "graph" some data, Python can help. The main technique is to use matplotlib. Although a bit overly fancy and likely to spontaneously burst into a GUI, it is powerful and, in some modes, easy:
from pylab import *
x= [1,2,3]; y= [1,4,9]
plot(x,y)
# show() # Use this for interactive goofing off.
savefig('./filename.png')
For more details, check out my complete notes on matplotlib.
Also, check out Pychart.
Plotting Graph Theory Graphs
See pydot which is the Python interface to the mighty Graphviz package.
Command Line Parsing
Best to use argparse.
getopt
The original way to parse options draws stylistic inspiration from the C version. Many languages (Bash, Perl) have such a thing and if you’re used to one of them, the Python version won’t be too complicated.
#!/usr/bin/python3
# An example of how to parse options with the 'getopt' module.
import sys
import getopt

# Initialize help messages
options= 'Options:\n'
options= options + '  -a <alpha>  Set option alpha to a string. Default is "one".\n'
options= options + '  -b <beta>   Set option beta to a number. Default is 2.\n'
options= options + '  -h          Show this help.\n'
options= options + '  -v          Show current version.'
usage = 'Usage: %s [options] arguments\n' % sys.argv[0]
usage = usage + options

# Initialize defaults
alpha= "one"
beta= 2
version="v0.0-pre-alpha"

# Parse options
try:
    (opts, args) = getopt.getopt(sys.argv[1:], 'ha:b:v',
                                 ['help','alpha=','beta=','version'])
except getopt.GetoptError as why: # Python 3 syntax; the old "except X, why:" is gone.
    print('getopt error: %s\n%s' % (why, usage))
    sys.exit(-1)

try:
    for opt in opts:
        if opt[0] == '-h' or opt[0] == '--help':
            print(usage)
            sys.exit(0)
        if opt[0] == '-a' or opt[0] == '--alpha':
            alpha= opt[1]
        if opt[0] == '-b' or opt[0] == '--beta':
            beta= int(opt[1])
        if opt[0] == '-v' or opt[0] == '--version':
            print('%s %s' % (sys.argv[0], version))
            sys.exit(0)
except ValueError as why:
    print('Bad parameter \'%s\' for option %s: %s\n%s' % (opt[1], opt[0], why, usage))
    sys.exit(-1)

if len(args) < 1:
    print('Insufficient number of arguments supplied\n%s' % usage)
    sys.exit(-1)

print('alpha=%s beta=%s' % (alpha, beta))
for (n,a) in enumerate(args):
    print('Argument %d: %s' % (n,a))
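To see what getopt actually hands back, here is a small sketch that feeds it a fixed, made-up argument list instead of sys.argv[1:]. It also shows one difference worth knowing: getopt.getopt() stops at the first non-option word, while getopt.gnu_getopt() keeps scanning, so options may follow file names the way they can with GNU tools.

```python
import getopt

argv = ['-a', 'three', '--beta=5', 'file1', 'file2']
opts, args = getopt.getopt(argv, 'ha:b:v', ['help', 'alpha=', 'beta=', 'version'])
print(opts)   # [('-a', 'three'), ('--beta', '5')]
print(args)   # ['file1', 'file2']

# getopt() stops at the first non-option word; gnu_getopt() keeps going.
late = ['file1', '-a', 'three']
print(getopt.getopt(late, 'a:'))      # ([], ['file1', '-a', 'three'])
print(getopt.gnu_getopt(late, 'a:'))  # ([('-a', 'three')], ['file1'])
```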
argparse
There is a module called optparse which has been deprecated since Python version 2.7. In its place is the newer and pretty awesome argparse module. Official documentation is here. If you’re using an ancient system, check to see if it’s available, but these days (e.g. Python 3) it always is.
These are the main steps to using this module.
- Import module.
- Define a parser object.
- Add arguments to the parser object.
- Parse the arguments with the parser object.
- Use the parsed result.
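Here is a minimal sketch of those five steps. The -n/--count and name arguments are made up for illustration, and passing an explicit list to parse_args() stands in for the real command line, which is handy for testing.

```python
import argparse                                            # 1. Import module.

parser = argparse.ArgumentParser(description='Demo.')      # 2. Define a parser object.
parser.add_argument('-n', '--count', type=int, default=1)  # 3. Add arguments.
parser.add_argument('name')

# 4. Parse. With no argument, parse_args() reads sys.argv[1:];
#    an explicit list is used here instead.
args = parser.parse_args(['-n', '3', 'world'])

for _ in range(args.count):                                # 5. Use the parsed result.
    print(f'hello {args.name}')
```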
When defining a parser object, you can use the following optional parameters.
- description= - Shows up in the automatically composed usage message.
- prog= - Usage’s executable name instead of inferring it from argv[0].
- epilog= - Text at the end of the usage message.
When adding arguments you want the parser to look out for, start with either a name of the positional argument you want or a list of option strings. Then you can add some of these optional parameters to get the exact behavior you want.
- name or flags - Either a name or a list of option strings, e.g. foo or -f, --foo.
- action - The basic type of action to be taken when this argument is encountered at the command line.
  - store - The default action is to store the argument’s value.
  - store_const
  - store_true
  - store_false
  - append
  - append_const
  - count - Useful for things like -vvv verbose levels.
  - help - Usually automatic with -h.
  - version - Needs a version= keyword too.
  - extend - For accumulating multiple option instances, e.g. -f file1 -x -f file2 (Python 3.8+).
- nargs - The number of command-line arguments that should be consumed.
  - N - Exact number of option arguments. Note that nargs=1 makes a list of one item.
  - ? - One or zero items. With zero, a positional gets default, while an optional given without a value gets const.
  - * - All arguments are put into a list. List can be empty.
  - + - Same as * but with an error for none. Will even greedily pull from a previous optional argument if that avoids an error.
  - argparse.REMAINDER - All remaining arguments put in a list.
- const - A constant value required by some action and nargs selections.
- default - The value produced if the argument is absent from the command line.
- type - The type to which the command-line argument should be converted.
- choices - A container of the allowable values for the argument.
- required - Whether or not the command-line option may be omitted (optionals only).
- help - A brief description of what the argument does.
- metavar - A name for the option argument in usage messages. So metavar="run" produces --x run instead of the default --x X.
- dest - The name of the attribute to be added to the object returned by parse_args(). I feel like this one is very useful to properly organize variable names.
#!/usr/bin/python3
import argparse
parser= argparse.ArgumentParser(prog="xyz",description="A demo of argparse.",epilog="Final notes.")
parser.add_argument('-e', '--easy',action="store_true") # Optional argument.
parser.add_argument('-x','--normal-value',type=int,metavar="X1",default=10,help="A number.",dest="norm_num")
parser.add_argument('A') # Positional argument. Required (because storing the arg is default, and must exist).
parser.add_argument('B',nargs="?") # Positional argument. Not required.
p= parser.parse_args()
print([p.easy, p.norm_num, p.A, p.B])
Here’s how that can be used. Note that the usage program name shows up as xyz and not argtest.py, which is how it was really run; this is thanks to the prog= parameter when defining the parser.
$ ./argtest.py --help
usage: xyz [-h] [-e] [-x X1] A [B]

A demo of argparse.

positional arguments:
  A
  B

optional arguments:
  -h, --help            show this help message and exit
  -e, --easy
  -x X1, --normal-value X1
                        A number.

Final notes.
$ ./argtest.py
usage: xyz [-h] [-e] [-x X1] A [B]
xyz: error: the following arguments are required: A
$ ./argtest.py red
[False, 10, 'red', None]
$ ./argtest.py red blue
[False, 10, 'red', 'blue']
$ ./argtest.py -e red blue
[True, 10, 'red', 'blue']
$ ./argtest.py -e -x 99 red blue
[True, 99, 'red', 'blue']
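Several of the actions and nargs values listed earlier can be exercised at once. This sketch uses hypothetical option names and an explicit argument list in place of sys.argv; note that a tuple metavar names each of the two nargs=2 values separately in the help output.

```python
import argparse

parser = argparse.ArgumentParser(prog='demo')
parser.add_argument('-v', '--verbose', action='count', default=0)   # -vvv counts to 3
parser.add_argument('-f', '--file', action='append', dest='files')  # repeatable option
parser.add_argument('--fast', action='store_const', const='turbo', default='normal')
parser.add_argument('-s', '--size', nargs=2, type=int, metavar=('W', 'H'))
parser.add_argument('rest', nargs='*')

ns = parser.parse_args(['-vvv', '-f', 'one', '--fast', '-f', 'two',
                        '-s', '4', '3', 'x', 'y'])
print(ns)  # verbose=3, files=['one', 'two'], fast='turbo', size=[4, 3], rest=['x', 'y']
```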
Here’s another example showing how to use the parse object as a global variable neatly containing all the user’s preferences. All this stuff is appropriate for a global variable since the sys.argv input itself is global to any executed process.
import fileinput # Needed by Do_The_Main_Thing() below.
Args= None # Global argument object containing user preferences.

def parse_options(): # Function to isolate this option parsing stuff.
    import argparse # Might as well import this here in case this never gets called.
    parser= argparse.ArgumentParser(description='sampleprog - Shows off argparse.')
    parser.add_argument('-q','--quiet',default=True,dest="VERBOSE",action="store_false",
                        help='Suppress printing of published readings on stdout.') # Invert for normally quiet.
    parser.add_argument('-d','--debug',type=float,default=0,dest="DEBUG",
                        help='Debug features. Look into `action="count"` too.')
    parser.add_argument('-R', '--red-dev',type=int,default=10,dest="RED_DEV",metavar="ID",
                        help= 'the red device [10].')
    parser.add_argument('-H', '--hold-value',type=int,default=1,dest="HOLD_VALUE",metavar="VAL",
                        help= 'Hold value. Careful not to conflict with "h" for "help".')
    parser.add_argument('--messy',type=str,default='/dev/shm/messy',dest="MESSY",
                        help=argparse.SUPPRESS) # A hidden global option that the user doesn't usually care about.
    parser.add_argument('filelist',type=str,nargs="*",default=[],metavar="INPUT",
                        help='Optional list of files or - for standard input. Empty also reads standard input.')
    return parser.parse_args()

def Do_The_Main_Thing(): # Which can now use things like Args.DEBUG, Args.MESSY, etc.
    for l in fileinput.input(Args.filelist):
        pass # Work on each line of all files and standard input.

if __name__ == '__main__':
    Args= parse_options()
    Do_The_Main_Thing()
This technique also allows the user to read in N files, or standard input in the case of none, while keeping the ability to parse complex options.
Here’s another example of how to use it. This example should be pretty much functionally equivalent to the getopt example above.
#!/usr/bin/python
import argparse
parser = argparse.ArgumentParser(description='A demonstration of argparse.')
parser.add_argument('-a', '--alpha', default='one', help= 'Set option alpha to a string.')
parser.add_argument('-b', '--beta', default=2, type=int, choices=[0,1], help= 'Set option beta to a binary digit.')
parser.add_argument('-v', '--version', action='version', help= 'Print the version.', version="v0.0-pre-alpha")
parser.add_argument('the_rest', metavar='file', type=str, nargs='+', help='One or more filenames.')
args= parser.parse_args()
print('alpha=%s beta=%d' % (args.alpha, args.beta))
print('Specified files: %s' % ', '.join(args.the_rest))
Argparse is powerful and can do weird things too. Here’s a stranger case where I needed two classes of arguments with unknown quantities. One or more files need to be supplied for each type of file.
parser= argparse.ArgumentParser(description='Vehicles and non-vehicles.')
parser.add_argument('-V','--vehicle',dest='V',required=True, nargs='+',metavar="Vlist", type=str,
                    help='Space-separated list of vehicle directories')
parser.add_argument('-N','--nonvehicle',dest='N',required=True, nargs='+',metavar="NVlist", type=str,
                    help='Space-separated list of non-vehicle directories')
args= parser.parse_args()
Run with something like this.
./vehicle_classify.py -V ../data/vehicles/v? -N ../data/non-vehicles/nv?
This produces something like this for args.V and args.N respectively.
['../data/vehicles/v1', '../data/vehicles/v2', '../data/vehicles/v3',
'../data/vehicles/v4', '../data/vehicles/v5']
['../data/non-vehicles/nv1', '../data/non-vehicles/nv2']
Sometimes the arguments parse just fine, but some later processing in your code suggests that the user is an idiot who needs to read the instructions. How can you immediately print the automatically generated usage message?
parser.print_help()
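A related trick, for when the complaint should also stop the program: parser.error() prints the short usage line plus your message to standard error and exits with status 2, while print_help() prints the full help and carries on. The require_files() check below is a hypothetical example of such post-parse validation.

```python
import argparse

parser = argparse.ArgumentParser(prog='example')
parser.add_argument('files', nargs='*')

def require_files(args):
    """Hypothetical post-parse check: parsing succeeded, but the values make no sense."""
    if not args.files:
        # Prints "usage: example ..." and "example: error: ..." to stderr, then exits with status 2.
        parser.error('give me at least one file')

require_files(parser.parse_args(['a.txt']))  # passes quietly
```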
And here’s another tip when you’re writing instructive descriptions and argparse overly helpfully removes formatting. Here’s how to cure that.
usage= """This is a multi-line description.
    ./example.py [options]
This will not all be jumbled together if you use the following trick.
"""
parser= argparse.ArgumentParser(description=usage,formatter_class=argparse.RawTextHelpFormatter)
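RawTextHelpFormatter, used above, preserves formatting in the description and in every argument’s help string. If only the description and epilog need their line breaks, RawDescriptionHelpFormatter is the narrower choice. This sketch (with a made-up prog name) compares it against the default formatter, which re-wraps the text into one paragraph.

```python
import argparse

usage = """This is a multi-line description.
    ./example.py [options]
Line breaks and indentation survive."""

raw = argparse.ArgumentParser(prog='example', description=usage,
                              formatter_class=argparse.RawDescriptionHelpFormatter)
default = argparse.ArgumentParser(prog='example', description=usage)

print(raw.format_help())      # description printed exactly as written
print(default.format_help())  # description collapsed and re-wrapped
```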
argparse Limitations
One astonishing thing I discovered is that despite being seemingly competent at parsing any kind of options a sane person could want, argparse has a glaring deficiency. Let’s say we wanted to emulate the option parsing behavior of the most command line program of command line programs: unix’s ls. Well, this command has options (many) that take an option argument. Not only that, but they take an optional option argument. For example, this is a reasonable command: ls --color *.txt. And so is this: ls --color=never *.txt. How can argparse set up this behavior? As far as I can tell, for arbitrary values, the answer is it can not! I was amazed by this. The root of the trouble is that GNU getopt_long only accepts an optional value attached with =, while argparse will also take it from the next word, so in ls --color *.txt the first file name would be swallowed as the color. As far as I can tell, you must have the default option (without the modifying option argument) end in an = to get the default value. So if you rewrote ls using argparse you would have to say something like ls --color= *.txt. I do not think there is a way around this.
Here is a demonstration of how to set that up. This imagines setting up options for a program that creates lines of output where you might want an integrated behavior of the unix head command. If you just say --head= you get 10, but if you say --head=100 you get 100. If you leave it out completely you get all the lines. This is just the option setup for such a thing.
#!/usr/bin/python
import argparse

def parse_args():
    parser= argparse.ArgumentParser(description="Optional option argument test.")
    h= "Only show beginning of output. Default 10 lines [--head=]. Use optional [--head=N] to change."
    parser.add_argument('--head', nargs='?',help=h)
    parser.add_argument('files', nargs='+', help="Files to list.")
    return parser.parse_args()

if __name__ == "__main__":
    args= parse_args()
    if args.head is not None:
        if args.head=='':
            args.head= 10
        else:
            args.head= int(args.head)
    if args.head:
        print(f"head value: {args.head}")
    print("Files:", args.files)
Here’s what running looks like.
$ ./argtest.py file1 file2 fileN
Files: ['file1', 'file2', 'fileN']
$ ./argtest.py --head= file1 file2 fileN
head value: 10
Files: ['file1', 'file2', 'fileN']
$ ./argtest.py --head=22 file1 file2 fileN
head value: 22
Files: ['file1', 'file2', 'fileN']
This inability to match bog standard GNU getopt_long behavior found in commands like ls, grep, tar, etc. left me somewhat disillusioned with Python’s argument processing abilities.
I did discover that you can probably match the ls --color option behavior specifically with something like the following:
parser = argparse.ArgumentParser()
parser.add_argument('--color',nargs='?',const='always',default='auto',
                    choices=['always', 'auto', 'never']) # Bare --color means 'always', as in GNU ls.
However, that only worked with a finite set of enumerated possibilities. How one would go about using it with something like a --resize[=VAL] option with an int is still a mystery to me.
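One partial answer, sketched here under the assumption that the bare form may be written last or followed by --: give nargs='?' a const, which argparse uses when the option appears without a value (default still covers the option being absent). The --resize option and its value 100 are made up. It still falls short of getopt_long, because argparse will happily take the value from the next word too.

```python
import argparse

parser = argparse.ArgumentParser(prog='resize-demo')
# const: used for a bare --resize. default: used when --resize is absent.
parser.add_argument('--resize', nargs='?', type=int, const=100, default=None)
parser.add_argument('files', nargs='*')

print(parser.parse_args(['--resize=50', 'a.png']))    # resize=50
print(parser.parse_args(['a.png', '--resize']))        # resize=100, bare flag last
print(parser.parse_args(['--resize', '--', 'a.png']))  # resize=100, '--' ends the options
print(parser.parse_args(['a.png']))                    # resize=None

# The catch: ['--resize', 'a.png'] makes argparse try int('a.png'),
# so it reports an error and exits instead of treating a.png as a file.
```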
Web Programming
Python is one of the premier languages for web-based programs. Here are some helpful techniques for web projects.
Simple Web Client
I often use wget; Apple people like curl. Python unsurprisingly has a perfectly good way to do simple web client downloads.
import os
import urllib.request
URL,FILE= 'http://xed.ch/h/python.txt','/tmp/pyhelp'
if not os.path.isfile(FILE):
    print(f'Downloading {URL} and saving as {FILE}...')
    urllib.request.urlretrieve(URL, FILE)
cgitb Module
One of the best reasons to use Python for web projects is the cgitb module. This stands for CGI TraceBack and is a diagnostic tool to help you understand what might be going wrong with your Python script run over the web. The nice thing is that this is super easy to use and super useful when activated. Here’s an example showing how to use it (simply import and enable it) and some faulty code which takes advantage of it:
#!/usr/bin/python
import cgitb
cgitb.enable()
idontexist()
Putting this in a cgi-bin directory and typing its URL in a browser produces this very cool diagnostic (which in this case correctly notices that the function idontexist does not exist):
<type 'exceptions.NameError'>   Python 2.7.2: /usr/bin/python2.7   Sat Jun 30 12:50:21 2012
A problem occurred in a Python script. Here is the sequence of function calls leading up to the error, in the order they occurred.
/var/www/fs/users/xed/cgi-bin/cgitest.py in
    2 import cgitb
    3 cgitb.enable()
 => 4 idontexist()
    5
    6 #content_type= 'Content-type: text/html\n\n'
idontexist undefined
<type 'exceptions.NameError'>: name 'idontexist' is not defined
    args = ("name 'idontexist' is not defined",)
    message = "name 'idontexist' is not defined"
If you’re looking at the output without HTML rendering, you’ll also notice that this is tacked on to the previous HTML message for maximum intelligent utility:
<!-- The above is a description of an error in a Python program, formatted
     for a Web browser because the 'cgitb' module was enabled. In case you
     are not reading this in a Web browser, here is the original traceback:

Traceback (most recent call last):
  File "/var/www/fs/users/xed/cgi-bin/cgitest.py", line 4, in <module>
    idontexist()
NameError: name 'idontexist' is not defined

-->
Note that this works on any Python program run over the web, not just ones that use CGI per se. It is advisable to comment out the enable line when your program is served live to the public to avoid any leaking of sensitive information such as how your code works. But other than that, use this early and often.
Content Type
Before generating any HTML, every web program will most likely need to send back the HTTP content type. It’s often useful to make a global variable of it.
content_type= 'Content-type: text/html\n\n'
HTML Generation
I personally hate Python code that is filled with HTML. HTML should be in HTML documents and Python should be programming. But sometimes they mix annoyingly. This throws off syntax highlighting and the wholesome goodness of Python’s formatting and style. Here is a technique I use in my Python code to completely obviate the need for any HTML.
This function can be imported into programs requiring the generation of HTML. It allows you to not put HTML in python code. It’s easier to type, easier to think about, and it doesn’t break syntax highlighting. When run as a standalone program, it prints a complete HTML document as a demonstration.
#!/usr/bin/python
def tag(tag, contents=None, attlist=None):
    """No HTML in my programs! This function functionalizes HTML tags.
    Example: tag('a','click here', {'href':'http://www.xed.ch'})
    Produces: <a href="http://www.xed.ch">click here</a>
    Param1= name of tag (table, img, body, etc)
    Param2= contents of tag <tag>This text</tag>
    Param3= dictionary of attributes {'alt':'[bullet]','height':'100'}
    """
    tagstring= "<"+tag
    if attlist:
        for A in attlist:
            V= attlist[A].replace('"','&quot;') # Escape quotes inside attribute values.
            attstring= ' '+A+'="'+V+'"'
            tagstring += attstring
    if contents:
        tagstring += ">\n"+contents.rstrip()+"\n</"+tag+">\n"
    else:
        tagstring += "/>\n"
    return tagstring

if __name__ == '__main__':
    Title= tag('head', tag('title', "A Test"))
    Text= tag('body', tag('p', "No html here. Just sensible code."))
    print(tag('html', Title + Text))
<html>
<head>
<title>
A Test
</title>
</head>
<body>
<p>
No html here. Just sensible code.
</p>
</body>
</html>
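A variant worth considering: html.escape() from the standard library escapes &, <, > and both quote characters, so attribute values such as URLs containing & come out valid. This hypothetical drop-in version of tag() escapes attribute values only; contents are still passed through untouched so tags can nest.

```python
from html import escape

def tag(name, contents=None, attlist=None):
    """Same output shape as the tag() helper above, but attribute values go
    through html.escape(quote=True) instead of only replacing double quotes."""
    attrs = ''.join(f' {k}="{escape(str(v), quote=True)}"'
                    for k, v in (attlist or {}).items())
    if contents:
        return f'<{name}{attrs}>\n{contents.rstrip()}\n</{name}>\n'
    return f'<{name}{attrs}/>\n'

print(tag('a', 'click', {'href': 'x?a=1&b="2"'}), end='')
```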
Web Programming Environment
The technique above is useful for generating web-based output. To process web-sourced input, the cgi module is handy. It is very helpful but it is not magical. I think the best way to illustrate what it does is to not use it and see what that looks like. Assuming helpers such as the tag() function defined above are in place, the following code is very illustrative:
#!/usr/bin/python
import os
vars= ''.join([tag('dt',k)+tag('dd',os.environ[k]) for k in sorted(os.environ.keys())])
print(content_type + (tag('html',tag('body',tag('dl',vars)))))
Note
Now that you see how to do it yourself, don’t forget about import cgi; cgi.test(), which, when run as a single-line program over a web interface, produces similar and somewhat more comprehensive data about what’s going on.
When run, you get a list of the environment variables your CGI program knows about. Yours will differ, but it will likely include some of the following:
- DOCUMENT_ROOT - /var/www
- GATEWAY_INTERFACE - CGI/1.1
- HTTP_ACCEPT - text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
- HTTP_ACCEPT_CHARSET - ISO-8859-1,utf-8;q=0.7,*;q=0.3
- HTTP_ACCEPT_ENCODING - gzip,deflate,sdch
- HTTP_ACCEPT_LANGUAGE - en-US,en;q=0.8,de;q=0.6,es;q=0.4
- HTTP_CONNECTION - Keep-Alive
- HTTP_COOKIE - v1=keyvaluepairs;v2=ofany;v3=cookiesthat;v4=yourbrowser;v5=offersthisdomain
- HTTP_HOST - xed.ch
- HTTP_USER_AGENT - Wget/ (linux-gnu)
- PATH - /bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
- QUERY_STRING - a=a&b=simple&c=test
- REMOTE_ADDR - 192.168.0.10
- REMOTE_PORT - 48731
- REQUEST_METHOD - GET
- REQUEST_URI - /~xed/cgi-bin/cgitest.py?a=a&b=simple&c=test
- SCRIPT_FILENAME - /var/www/fs/users/xed/cgi-bin/cgitest.py
- SCRIPT_NAME - /~xed/cgi-bin/cgitest.py
- SERVER_ADDR - 192.168.0.99
- SERVER_ADMIN - wwwadmin@xed.ch
- SERVER_NAME - www.xed.ch
- SERVER_PORT - 80
- SERVER_PROTOCOL - HTTP/1.1
- SERVER_SIGNATURE - Apache Server at www.xed.ch Port 80
- SERVER_SOFTWARE - Apache
- UNIQUE_ID - T@9vm6nkPz0AFBbDuW0FAFAM
Plus any special variables your web server sets using Apache’s SetEnv directive will also be present.
Obviously if your program can print this stuff out, you have quite a bit of control over what is going on. This particular little program is quite useful to track down problems with path and environment issues as well as debugging more complicated or annoying details such as user agent settings for stupid web sites.
Two important variables to note for CGI programming are REQUEST_URI and QUERY_STRING. The first contains the full request path (including the query string) used to effect this response, while the second contains just the part intended to serve as input for this program. You can parse this directly yourself, and for very simple applications I think it is reasonable to do so. When the number and complexity of the variables your program wishes to define from the QUERY_STRING becomes more involved, then it is sensible to use the cgi module. The point of showing how things would work without it is to illustrate that it’s not absolutely critical (and sometimes not even especially helpful) to use it.
This exercise also indicates how one might test CGI programs without using a web server at all. Since all that is really going on is that the web server is simply setting some variables, you can explicitly set them on the command line to test things. Here’s an example:
$ QUERY_STRING='a=a&b=simple&c=test' mycgiprogram.cgi
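For parsing QUERY_STRING yourself, urllib.parse.parse_qs does the splitting and percent-decoding and collects repeated names into lists. This matters more these days because the cgi and cgitb modules were deprecated in Python 3.11 and removed in 3.13 (PEP 594). The sketch sets QUERY_STRING by hand, just like the shell test above; the metal values are made up.

```python
import os
from urllib.parse import parse_qs

def query_params():
    """Parse the CGI QUERY_STRING environment variable into a dict of lists."""
    return parse_qs(os.environ.get('QUERY_STRING', ''))

# Simulate what the web server would set.
os.environ['QUERY_STRING'] = 'a=a&b=simple&c=test&metal=cu&metal=fe'
params = query_params()
print(params['b'][0])   # simple
print(params['metal'])  # ['cu', 'fe']
```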
cgi Module
Here is an example of a complete form processing program showing many different kinds of form elements. This program shows a form and if submitted shows the data submitted and a new form to repeat the process.
#!/usr/bin/python
import cgi
from html_tagger import tag
content_type= 'Content-type: text/html\n\n'
br= '<br/>'

def generate_form():
    f= list()
    f.append( 'Username:' + tag('input', '', {'type':'text', 'name':'uid'}) +br )
    f.append( 'Password:' + tag('input', '', {'type':'password', 'name':'pwd'}) +br )
    f.append( tag('input',None, {'type':'radio','name':'hyp','value':'1'}) + 'True' )
    f.append( tag('input',None, {'type':'radio','name':'hyp','value':'0'}) + 'False' +br )
    f.append( tag('input',None, {'type':'checkbox','name':'metal','value':'cu'}) + 'copper' )
    f.append( tag('input',None, {'type':'checkbox','name':'metal','value':'fe'}) + 'iron' +br )
    f.append( tag('select',
                  tag('option','chromium',{'value':'cr'})+
                  tag('option','manganese',{'value':'mn'})+
                  tag('option','nickel',{'value':'ni'})+
                  tag('option','zinc',{'value':'zn'})
                  ,{'name':'alloy'}) +br )
    f.append( tag('textarea','Edit this text!',{'rows':'5','cols':'40','name':'essay'}) +br ) # HTML uses cols, not columns.
    f.append( tag('input',None,{'type':'submit','value':'Do This Form'}) )
    return tag('form', ''.join(f), {'name':'input','action':'./cgitest.py','method':'get'})

def display_data(myf):
    c= tag('tr',tag('td',"Name:")+tag('td', myf["uid"].value))
    c += tag('tr',tag('td',"Password:")+tag('td', myf["pwd"].value))
    c += tag('tr',tag('td',"Hypothesis:")+tag('td', myf["hyp"].value))
    c += tag('tr',tag('td',"Metal:")+tag('td', ','.join(myf.getlist('metal'))))
    c += tag('tr',tag('td',"Alloy:")+tag('td', ','.join(myf.getlist('alloy'))))
    c += tag('tr',tag('td',"Essay:")+tag('td', ','.join(myf.getlist('essay'))))
    return tag('table',c,{'border':'1'})

form= cgi.FieldStorage()
if 'uid' not in form or 'pwd' not in form or 'hyp' not in form:
    content= tag('h4','A Form') + "Please fill in the user and password fields." + generate_form()
else:
    content= display_data(form) + br + generate_form()
print(content_type)
print(tag('html',tag('body',content)))
Note
For composing HTML output this program uses the tag function defined above. Also, include cgitb as described above if there are problems you wish to debug.
The output of this CGI programming example is the following:
A Form
Please fill in the user and password fields.
Note
If you’re seeing this in a web browser, it will look functional, but obviously it’s not. It’s just the HTML that the previous program generated (minus html and body tags).
One Program Executable On The Command Line And Over The Web
Here’s a technique I’ve used for programs that I want to work with a text menu at the console and also to automatically support a web interface when run remotely from a web browser.
if __name__ == "__main__":
    if os.getuid() == 48: # apache:x:48:48:Apache:/var/www:/sbin/nologin
        html_version()
    else:
        while True:
            text_version_menu()
Note
There may be better indicators of whether we’re coming from a browser or not. See the cgi.test() above for possibilities. Perhaps REQUEST_URI.
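Another possible indicator, sketched here: CGI servers set GATEWAY_INTERFACE (e.g. CGI/1.1, as in the environment list above) for the programs they launch, while a login shell normally does not. The running_under_cgi() helper is hypothetical, and it takes the environment as a parameter only so it can be tried out without a web server.

```python
import os

def running_under_cgi(environ=os.environ):
    """True when a CGI-capable web server launched us (assumes the server
    sets GATEWAY_INTERFACE, as CGI servers do; a shell normally does not)."""
    return 'GATEWAY_INTERFACE' in environ

print(running_under_cgi({'GATEWAY_INTERFACE': 'CGI/1.1', 'REQUEST_METHOD': 'GET'}))  # True
print(running_under_cgi({'HOME': '/home/xed'}))                                      # False
```

In the dispatch above, this would replace the os.getuid() == 48 test, which only works when the web server runs as that particular UID.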
Upload A File
Here’s a short routine that does nothing but allow one to upload a file to the server it’s run on. I found this handy to allow me to simply upload photos off my stupid Android phone to my own server. It nicely demonstrates how to handle POST methods and file uploads using the cgi module.
#!/usr/bin/python
import os
import cgi
from html_tagger import tag
content_type= 'Content-type: text/html\n\n'
form = cgi.FieldStorage()
if not form:
    acturl= "./up.py"
    ff= tag('input','',{'type':'file','name':'filename'}) + tag('input','',{'type':'submit'})
    f= tag('form',ff, {'action':acturl, 'method':'POST', 'enctype':'multipart/form-data'})
    H= tag('head', tag('title', "Uploader"))
    B= tag('body', tag('p', f))
    print(content_type + tag('html', H + B))
#elif form.has_key("filename"): # Old Python 2 style.
elif 'filename' in form:
    item= form["filename"]
    if item.file:
        data= item.file.read() # Uploaded data arrives as bytes in Python 3.
        t= os.path.basename(item.filename) # Drop any client-supplied directory parts.
        with open("/home/xed/www/up/"+t,'wb') as FILE: # Binary mode, since data is bytes.
            FILE.write(data)
        msg= "Success! "
    else:
        msg= "Fail."
    H= tag('head', tag('title', "Uploader"))
    B= tag('body', tag('p', msg + tag('a','Another?',{'href':'./up.py'})))
    print(content_type + tag('html', H + B))
Note
The html tagging function defined above is assumed here.
Warning
This program would be best limited to personal use and is not especially secure.
Output Non-Text
Often you want your CGI program to not just compose some HTML for your web clients, you also want some custom graphics. For example, if you want to show a plot of something that is very up to date. The naive way to do this is to have the program generate a plot file and store it somewhere and then send out HTML that can find it. But this leads to bad guys filling up your drive with such plots. Better to never have the plot stick around.
Here is an example of how to have a plot dynamically sent out to a web client.
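#!/usr/bin/python3
import sys
import matplotlib.pyplot as plt
print("Content-type: image/png\n")
sys.stdout.flush() # Send the header before writing binary data underneath it.
plt.plot([1,2,4])
plt.savefig(sys.stdout.buffer,format='png') # Python 3: PNG bytes must go to the binary stream.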
Then on the client I can do this.
$ wget -qO test.png http://xed.ch/cgi-bin/mkpng.py
$ identify test.png
test.png PNG 800x600 800x600+0+0 8-bit sRGB 19.1KB 0.000u 0:00.000
This means that you can use the following in any subsequent HTML code that features the image.
<img src="http://xed.ch/cgi-bin/mkpng.py">
Of course the program that generates the image could be the same one that generates the HTML, with the program choosing which part to generate based on CGI variables.
HTTP Server
Obviously Python serves web pages. Of course it does. And guides spacecraft and composes hit songs. Python does everything! But I try to strongly avoid import mysteriousmagicmodule. Amazingly, Python’s ability to run an HTTP server is not a fancy external module; it is now a core part of Python! The module is http.server. Check out the official documentation.
In the past, Python was a perfectly good way to do things over the web — you simply let your web server know which files were CGI scripts. But now, you can do all of that as before, but you don’t need the web server. So is Apache configured properly? Who cares? Skip it entirely!
Here’s a demonstration program that listens to port 8888 for web connections and adjusts a variable. Again I’m assuming that my HTML tag() function (described previously) is present. This program will allow you to control something over a web interface.
from http.server import HTTPServer, BaseHTTPRequestHandler
V= 0 # Global value of interest that this web interface exists to adjust.
URL,PORT= 'http://192.168.1.251',8888

def form_response(): # == Create An HTML Form Response
    hp= tag('input',None,{'type':'hidden','name':'r','value':'-.5'}) # Note: Submit actions can't have CGI parameters!
    hs= tag('input',None,{'type':'hidden','name':'r','value':'.5'}) # Hidden form elements send such intent.
    bp= tag('input',None,{'style':'width:200px;height:50px;font-size:40px;background:red','type':'submit','value':'Left'})
    bs= tag('input',None,{'style':'width:200px;height:50px;font-size:40px;background:green','type':'submit','value':'Right'})
    fp= tag('form', hp+bp, {'style':'display:inline;','name':'input','action':f'{URL}:{PORT}','method':'get'})
    fs= tag('form', hs+bs, {'style':'display:inline;','name':'input','action':f'{URL}:{PORT}','method':'get'})
    stats= tag('div',f'v:{V}',{'style':'font-size:40px'})
    phonefix= '<meta name="viewport" content="width=device-width, initial-scale=1.0">'
    return tag('html',tag('body',phonefix+fp+fs+stats)).encode('ascii')

class SimpleHTTPRequestHandler(BaseHTTPRequestHandler): # == Handle HTTP Requests
    def do_GET(self):
        global V
        self.send_response(200) # Tell client a good status code.
        # `self.requestline` produces something like: 'GET / HTTP/1.1' or 'GET /?r=3 HTTP/1.1'
        cgistuff= self.requestline.split(' ')[1] # Custom input handling; using `import cgi` is sane too.
        if "r=" in cgistuff[2:]: # r is the CGI variable in the URL and hidden form elements.
            V+= float(cgistuff[4:]) # Adjust the value of interest.
        self.send_header('Content-type','text/html') # With the HTTPServer, better to not DIY.
        self.end_headers() # Probably whatever weird double returns are needed after headers.
        self.wfile.write(form_response()) # Send this program's actual content.

httpd= HTTPServer(('',PORT),SimpleHTTPRequestHandler) # First arg can be address to listen on.
httpd.serve_forever() # == Start The Server
When you run this program it will sit there, waiting for connection attempts — printing to the console as they occur.
Then a browser will see something like this.
$ wget -qO- 127.0.0.1:8888
<html>
<body>
<meta name="viewport" content="width=device-width, initial-scale=1.0"><form style="display:inline;" name="input" action="http://192.168.1.251:8888" method="get">
<input type="hidden" name="r" value="-.5"/>
<input style="width:200px;height:50px;font-size:40px;background:red" type="submit" value="Left"/>
</form>
<form style="display:inline;" name="input" action="http://192.168.1.251:8888" method="get">
<input type="hidden" name="r" value=".5"/>
<input style="width:200px;height:50px;font-size:40px;background:green" type="submit" value="Right"/>
</form>
<div style="font-size:40px">
v:12.5
</div>
</body>
</html>
This will be two big (submit form) buttons for raising and lowering the value of the variable the program is interested in. They’re labeled Left/Right because I used something like this to make a robotic control that I could steer with my phone. You can also just do a request for something like http://127.0.0.1:8888?r=-5 which will subtract 5 from the important value.
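As a sketch of a sturdier way to read the input, BaseHTTPRequestHandler already provides self.path, and urllib.parse can pull r out of it regardless of where it sits in the query. The handler name, plain-text reply, and background thread below are illustrative assumptions, not part of the program above; port 0 simply asks the OS for any free port so the example can talk to itself.

```python
import http.client
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.parse import urlsplit, parse_qs

V = 0.0  # The value this hypothetical handler exists to adjust.

class AdjustHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global V
        # self.path is e.g. '/?r=-5'; parse_qs copes with any order or count of parameters.
        for r in parse_qs(urlsplit(self.path).query).get('r', []):
            V += float(r)
        body = f'v:{V}\n'.encode('ascii')
        self.send_response(200)
        self.send_header('Content-type', 'text/plain')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Silence the per-request console logging for this demo.

httpd = HTTPServer(('127.0.0.1', 0), AdjustHandler)  # Port 0: OS picks a free port.
port = httpd.server_address[1]
threading.Thread(target=httpd.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection('127.0.0.1', port, timeout=5)
conn.request('GET', '/?r=-5')
body = conn.getresponse().read().decode('ascii')
conn.close()
httpd.shutdown()
httpd.server_close()
print(body, end='')  # v:-5.0
```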
Beyond Python
Tools to help Python be even more awesome than it normally is:
Socket Programming
Creating client/server connections with internet sockets is pretty easy with Python. A good example of a full practical socket server is my ISBD server. Here is a generic TCP server that covers the main functionality one would require from the network in order to implement something like a web server. Note that it is Python 2 code: it uses the thread module (renamed _thread in Python 3) and sends str objects where Python 3 requires bytes.
#!/usr/bin/python
# A Sample TCP server demonstrating simple socket programming.
# This simply echoes what is sent back to the client.
# Usage: Run this program and then connect with
#     echo "The Message" | nc localhost 6660
# What PID is listening?
#     lsof -i :6660
# Official Socket Documentation -
#  * https://docs.python.org/2/library/socket.html
# Notes about backlog parameter of `listen()` function.
#  * http://irrlab.com/2015/03/02/how-tcp-backlog-works-in-linux/
#  * https://veithen.github.io/2014/01/01/how-tcp-backlog-works-in-linux.html
import socket
import sys
from thread import *

HOST= '' # Server interface to bind to. Blank is `INADDR_ANY`.
PORT= 6660
BACKLOG= 200 # Max connections on accept queue. See notes.

# AF_INET is Address Family IPv4
# SOCK_STREAM is TCP protocol (SOCK_DGRAM for UDP)
s= socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print('Socket Creation OK')

# == Connection Handling ==
def servicethread(connection):
    READ_BYTES= 24
    connection.send('This is the server. Type something now.\n')
    while True:
        data= connection.recv(READ_BYTES)
        reply= 'You said: %s' % data
        if not data:
            break
        connection.sendall(reply)
    connection.close()

# == Binding ==
try:
    s.bind((HOST,PORT))
except socket.error as msg:
    print('ERROR: Bind failed! %s (error #%s)' % (msg[1],str(msg[0])))
    sys.exit()
print('Socket Binding OK')

# == Listening ==
s.listen(BACKLOG)
print('Socket Listening OK')

# == Handle Client Transactions ==
while True:
    conn,addr= s.accept()
    print('Connected to %s:%d' % addr)
    start_new_thread(servicethread, (conn,) )
s.close()
Running the program starts the server listening.
$ ./sockettest.py
Socket Creation OK
Socket Binding OK
Socket Listening OK
From another terminal (or another computer if you like) you can check up on it.
$ nmap localhost -p 6660 | sed -n /PORT/,+1p
PORT STATE SERVICE
6660/tcp open unknown
Using nmap has consequences for the server. Here are the server’s resulting messages.
Connected to 127.0.0.1:51783
Unhandled exception in thread started by <function servicethread at 0x7f62f336e5f0>
Traceback (most recent call last):
File "./sockettest.py", line 32, in servicethread
connection.send('This is the server. Type something now.\n')
socket.error: [Errno 104] Connection reset by peer
You could handle this error (when the client strangely aborts) more smoothly if you like, but the server does continue to function just fine.
Additional activity from the client, a classic netcat test, looks like this.
$ echo testmessage | nc localhost 6660
This is the server. Type something now.
You said: testmessage
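For Python 3, a port of the server above might look like this sketch. It swaps the old thread module for threading, sends bytes, and quietly drops clients that reset the connection (like the nmap probe). The function names make_listener and accept_loop are my own, and setting SO_REUSEADDR (so a restarted server doesn’t have to wait out TIME_WAIT) is an addition, not part of the original.

```python
import socket
import threading

HOST, PORT, BACKLOG = '', 6660, 200  # Blank HOST means INADDR_ANY.
READ_BYTES = 24

def servicethread(connection):
    """Echo whatever the client sends, prefixed with 'You said: '."""
    with connection:
        try:
            connection.sendall(b'This is the server. Type something now.\n')
            while True:
                data = connection.recv(READ_BYTES)
                if not data:  # Empty read means the client closed.
                    break
                connection.sendall(b'You said: ' + data)
        except (ConnectionResetError, BrokenPipeError):
            pass  # Client aborted (e.g. a port scan); nothing to clean up.

def make_listener(host=HOST, port=PORT, backlog=BACKLOG):
    """Create, bind, and start listening on an IPv4 TCP socket."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))
    s.listen(backlog)
    return s

def accept_loop(s):
    """Hand each accepted connection to its own daemon thread, forever."""
    while True:
        conn, addr = s.accept()
        print('Connected to %s:%d' % addr)
        threading.Thread(target=servicethread, args=(conn,), daemon=True).start()
```

To actually run it, end the script with accept_loop(make_listener()). As with the original, each recv() of up to 24 bytes gets its own "You said: " prefix, so long messages come back in pieces.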
Or back to Python, this is the simplest socket client.
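import socket
con= ('isbdserver.example.edu',10800)
try:
    s= socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(con)
    s.sendall(entire_message) # Built elsewhere in the program; must be bytes in Python 3.
    s.close()
except socket.error as msg: # In Python 3, socket.error is an alias of OSError.
    log( 'ERROR: ISBD socket client problem! %s (error #%s)' % (msg.strerror,str(msg.errno)) ) # log() is this program's own logger.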
Here’s another example of the socket library used to trigger a Wake On LAN (WOL) feature. Note that it probably doesn’t work, but it’s a good starting point for further research into the topic.
#!/usr/bin/python
# Wake-On-LAN
# From Wikipedia: Magic packet
#   The magic packet is a broadcast frame containing anywhere within
#   its payload 6 bytes of 255 (all bits set to the on position) which
#   has the decimal representation of: 255 255 255 255 255 255 (also FF
#   FF FF FF FF FF in hexadecimal or 11111111 11111111 11111111
#   11111111 11111111 11111111 in binary), followed by sixteen
#   repetitions of the target computer's 48-bit MAC address. Since the
#   magic packet is only scanned for the string above, and not actually
#   parsed by a full protocol stack, it may be sent as a broadcast
#   packet of any network- and transport-layer protocol. It is
#   typically sent as a UDP datagram to port 0, 7 or 9, or, in former
#   times, as an IPX packet.
import struct, socket

def WakeOnLan(ethernet_address):
    # Construct a six-byte hardware address
    addr_byte = ethernet_address.split(':')
    hw_addr = struct.pack('BBBBBB', *[int(b, 16) for b in addr_byte])
    #print(hw_addr)
    # Build the Wake-On-LAN "Magic Packet" (as bytes, so Python 3 is happy too)...
    msg = b'\xff' * 6 + hw_addr * 16
    # ...and send it to the broadcast address using UDP
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    s.sendto(msg, ('<broadcast>', 9))
    s.close()

WakeOnLan('e0:cb:4e:56:74:73') # The MAC of the host to wake.
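Splitting the packet construction from the sending makes the magic packet easy to check without waking anything. A Python 3 sketch follows; magic_packet and wake_on_lan are invented names, and using 255.255.255.255 as the default broadcast address is an assumption (some networks want the subnet broadcast, e.g. 192.168.1.255).

```python
import socket

def magic_packet(mac):
    """Build a Wake-On-LAN magic packet: 6 bytes of 0xFF, then the MAC 16 times."""
    hw_addr = bytes.fromhex(mac.replace(':', '').replace('-', ''))
    if len(hw_addr) != 6:
        raise ValueError('expected a 48-bit MAC address, got %r' % mac)
    return b'\xff' * 6 + hw_addr * 16

def wake_on_lan(mac, broadcast='255.255.255.255', port=9):
    """Send the magic packet as a UDP broadcast (port 9 is a common choice)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))
```

The packet is always 6 + 16 × 6 = 102 bytes, which is a handy thing to look for if you’re watching the wire with a packet sniffer.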
Importing Modules
Importing modules can sometimes be slightly tricky if what you need isn’t already taken care of for you. (Usually it is, which is why this isn’t knowledge you use everyday.)
Here is a way to use modules in another directory by setting the PYTHONPATH environment variable.
$ mkdir A B
$ echo "print('This is Python - dir B')" > B/mymodule.py
$ echo "import mymodule; print('This is Python - dir A')" > A/myprog.py
$ python A/myprog.py
Traceback (most recent call last):
File "A/myprog.py", line 1, in <module>
import mymodule; print('This is Python - dir A')
ImportError: No module named mymodule
$ PYTHONPATH=./B:$PYTHONPATH python A/myprog.py
This is Python - dir B
This is Python - dir A
Here is another method which works without requiring an intervention from the execution environment.
import sys
sys.path.append('/tmp/B')
import mymodule
print('This is Python - dir A')
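The problem with this method is that relative paths are resolved against the current working directory, not against the script’s own location, so they only work if you happen to run the program from the right place. You could have Python sort out the relative paths before importing. For example, although this is straying into ugly territory, this works.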
import os
pwd= os.path.dirname(os.path.realpath(__file__))
parent= os.path.split(os.path.abspath(pwd))[0]
import sys
sys.path.append(parent+'/B')
import mymodule
print('This is Python - dir A')
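A slightly less ugly take on the same idea uses pathlib. This is a sketch under the same A/B directory layout as above; add_sibling_dir is a hypothetical helper name, not something from the standard library.

```python
import sys
from pathlib import Path

def add_sibling_dir(script_file, sibling):
    """Append <script's dir>/../<sibling> to sys.path (once) and return it.

    Resolving from the script's own location instead of the current
    working directory is what makes this work from anywhere.
    """
    target = Path(script_file).resolve().parent.parent / sibling
    if str(target) not in sys.path:
        sys.path.append(str(target))
    return target
```

In A/myprog.py that would be add_sibling_dir(__file__, 'B') followed by import mymodule.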
If you just want to know if a module is available, this is a way to do a quick check. This also shows you exactly where the module really lives, which can be quite informative too.
$ python3 -c 'import caffe; print(caffe.__path__)'
['/usr/lib/python3/dist-packages/caffe']
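The check above actually imports the module, which can be slow (or have side effects) for heavyweight packages. importlib.util.find_spec() only locates it. A sketch, where module_location is a name I made up:

```python
import importlib.util

def module_location(name):
    """Return where module `name` would be loaded from, or None if it isn't available.

    find_spec() finds the module without running it (though for a dotted
    name like 'a.b' the parent package 'a' does get imported).
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.submodule_search_locations:  # Packages report their directories.
        return list(spec.submodule_search_locations)
    return spec.origin  # Plain modules report their file ('built-in' for sys etc.).
```

From the shell, the equivalent one-liner would be python3 -c 'import importlib.util as u; print(u.find_spec("caffe"))'.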
Dots For Relative Paths
Sometimes you find some exuberant dots like this (in Open3D… ahem).
from ...python import ops
from ....torch import classes
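What is that? It is a way to specify relative import paths as specified in PEP-0328. One dot means the current package, two dots its parent, three dots the grandparent, and so on. Note that import ...dotty doesn’t work (it’s a syntax error): the plain import syntax always implies absolute import paths, so relative imports need the from ... import form.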
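The standard library can show what the dots resolve to: importlib.util.resolve_name() applies the same rule the import system uses. The package name mypkg.layers.conv is purely hypothetical, standing in for wherever the dotty import lives.

```python
from importlib.util import resolve_name

# Pretend the `from ... import` lines live in a module whose package
# (its __package__) is the hypothetical 'mypkg.layers.conv'.
PKG = 'mypkg.layers.conv'

print(resolve_name('.utils', PKG))     # mypkg.layers.conv.utils (this package)
print(resolve_name('..utils', PKG))    # mypkg.layers.utils (parent)
print(resolve_name('...python', PKG))  # mypkg.python (grandparent)

try:
    resolve_name('....torch', PKG)     # Four dots climbs above 'mypkg'.
except (ImportError, ValueError) as err:  # Older Pythons raise ValueError here.
    print('error:', err)
```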
Fortran
WTF? What does a 70+ year old language have to do with anything? Well, gfortran is a wrapper over gcc and compiled Fortran code is simple and gives superlative raw C performance without so much of the memory management and security problems of the omnipotent (but easy to shoot your foot) C. So if you have a gory math function that you’d like to see running as fast as possible in Python, it is a sane option.
You need f2py. It was included with numpy on my system so I didn’t even need to install that. But I did need to make sure Fortran itself was there.
sudo apt install gfortran
Create a little example Fortran function.
! Calculate the distance between two 3D points.
function distance(x1, y1, z1, x2, y2, z2) result(d)
  real :: x1, y1, z1, x2, y2, z2
  real :: d
  ! Calculate the distance between the two points.
  d = sqrt((x1 - x2)**2 + (y1 - y2)**2 + (z1 - z2)**2)
end function distance
Save that as distance.f90 and create the module.
f2py3 -c distance.f90 -m distance
Now you can use it like so. (Example reminder: 3,4,5 right triangles.)
$ python
>>> import distance
>>> d = distance.distance(0,0,0,3,4,0)
>>> print(d)
5.0
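To cross-check (or benchmark) the compiled function, a pure-Python reference is handy. A sketch using math.dist (Python 3.8+); py_distance and agrees are names invented here, and the loose tolerance assumes the single precision Fortran real declared above.

```python
import math

def py_distance(x1, y1, z1, x2, y2, z2):
    """Pure-Python counterpart of the Fortran `distance` function."""
    return math.dist((x1, y1, z1), (x2, y2, z2))

def agrees(compiled, *args, rel_tol=1e-6):
    """True if compiled(*args) matches py_distance(*args) within rel_tol.

    Fortran `real` is single precision (roughly 7 significant digits),
    so an exact float comparison would be too strict.
    """
    return math.isclose(compiled(*args), py_distance(*args), rel_tol=rel_tol)
```

With the f2py module built, agrees(distance.distance, 1, 2, 3, 4, 6, 8) should come back True, and timeit on each version gives a quick idea of whether the Fortran detour was worth it.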
Packaging And Distribution
Python packaging is a nightmare. This is mostly due to so many competing ways to do the job. I personally avoid the topic to the greatest extent possible. I usually rely on my Linux distributions to do the proper thing or I put things in the PYTHONPATH explicitly myself.
This enumeration of project summaries is depressing and helpful. This basic package install guide is wholesome and official if you have to go messing with packages outside your OS distribution. Here’s a popular Stack Overflow discussion about Python packaging. Here are some official best practices for packaging and installation.
Here are some details that might be interesting.
- PEP 376 “Database of Installed Python Distributions”
- PEP 426 “Metadata for Python Software Packages 2.0”
Ok, it’s not just me! Here’s XKCD calling it like it is! Wow. Too true.
Hopefully this gets Pythoneers working on this mess. To be fair though, Python is a victim of its deserved success.
distutils
Original Python packaging and distribution system. In the standard library for me (CentOS7), but deprecated by PEP 632 and removed from the standard library in Python 3.12.
Uses this.
python setup.py
Can be tar.gz.
from distutils.core import setup
setuptools
Third party system (not part of Python per se) built using distutils.
from setuptools import setup
Included easy_install, which was widely used (it has since been deprecated and removed from setuptools).
Includes support for eggs which are a package format for distributing binary packages. This seems a bit mental since Python is good at compiling itself on the fly, but apparently waiting a minute for that to happen for monstrous overblown projects is too much for people. Somebody figured spending hours fussing with a fussy binary package was better.
sudo yum upgrade python-setuptools
distribute
A fork of setuptools, and it is actually called setuptools, which isn’t confusing at all! Replaces an existing genuine setuptools if one is already present. Apparently has better support for v2 to v3 issues. Probably more traction than its ancestor. Also includes an easy_install.
Some people believe that this project has been merged back into the original setuptools. Here is evidence that this was intended. And it appears to be true: Distribute was merged back into Setuptools 0.7. It is probably safe to ignore this now. Let’s just say that setuptools and distribute are very close relatives and maybe a case of dissociative identity disorder.
pip
This does not create packages. It is a system for downloading and installing Python distributions. This seems to replace easy_install. It can roll back a failed install attempt if it determines dependencies can’t be met. It can uninstall things. It does not use or install eggs. It doesn’t automatically update things so they won’t randomly break; apparently some of the other systems try to do that. Requires a packager (distutils, setuptools, distribute) because this is just an installer. The packages themselves are not its thing. Here is a justification for pip. Apparently this can try to compile things with a C compiler as part of package installation. This would of course be likely to fail on a bad OS like Windows. This might explain some bias for using easy_install on Windows. However, with wheel, these issues may be historical.
Normally used like this.
pip install <package>
It requires setuptools (which requires distutils).
This quote from the pip documentation about sums it up for me.
Be cautious if you’re using a Python install that’s managed by your operating system or another package manager. get-pip.py does not coordinate with those tools, and may leave your system in an inconsistent state.
And to contradict that.
sudo yum install python-pip
In early 2020 I got this to work easily with Debian.
sudo apt install python-pip
pip install --upgrade pip
This now seems to install pip3 as merely pip and brings along pip2 as the strange way to do things these days.
If that still is having trouble (ImportError: No module named pip), try this:
sudo apt install python3-pip
Or install with get-pip.py, which can try to not be so dependent with --no-wheel and --no-setuptools. Apparently pip should be included with a clean Python install from python.org (not Linux).
Once pip is happy, you can generally install things without too much fuss. For example.
pip install mingus
Sometimes you don’t want to "install" whatever it is but you do want the code. You can actually see all zillion packages available to pip by looking at this URL.
wget -qO- https://pypi.org/simple/ | grep href | wc -l
270894
Each package is a link and you can search for specific ones with grep. Here’s a demonstration with pymidi.
$ wget -qO- https://pypi.org/simple/ | grep pymidi
<a href="/simple/ipymidicontrols/">ipymidicontrols</a>
<a href="/simple/pymidi/">pymidi</a>
Then put https://pypi.org/simple/pymidi into the browser.
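The same wget and grep search can be done from Python. This sketch assumes the index keeps the &lt;a href="/simple/NAME/"&gt;NAME&lt;/a&gt; markup shown above (PyPI could change it); package_names and fetch_index are invented names.

```python
import re
import urllib.request

SIMPLE_INDEX = 'https://pypi.org/simple/'
LINK = re.compile(r'<a href="/simple/([^/"]+)/">([^<]+)</a>')

def package_names(html):
    """Pull project names out of a PyPI "simple" index page."""
    return [name for _, name in LINK.findall(html)]

def fetch_index(url=SIMPLE_INDEX):
    """Download the (very large) simple index page as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode('utf-8')
```

Usage would be something like [n for n in package_names(fetch_index()) if 'pymidi' in n].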
wheel
Installing pip will also install wheel, which is a zip based archive (extension .whl) which is like an egg but with subtle differences. Apparently this is somewhat of a modern (2015+) thing.
Of the name, they say "because newegg was taken" and "a container of cheese".
sudo yum install python-wheel python3-wheel
distutils2
This topic is such a mess, why not scrap it all and start over with yet another attempt?
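Does not use setup.py scripts. Instead it uses setup.cfg. Also uses the pysetup command which seems to try to replace pip.
If you see import packaging in code from that era, that is synonymous with distutils2 (it was briefly slated to join the standard library under that name for Python 3.3). Today’s packaging module is unrelated: it is the actively maintained PyPA library for version numbers, requirement specifiers, and the like.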
The latest release was March 2012 so this project is dead. Anything referring to it is hopelessly out of fashion.
Buildout
Buildout is yet another way to assemble and deploy complex Python applications. It may include non-Python components. Used by Zope, Plone, and Django. Nuff said.
Distlib
This is a new experimental thing (as late as October 2016) which, according to their docs, is trying to do this.
Basically, Distlib contains the implementations of the packaging PEPs and other low-level features which relate to packaging, distribution and deployment of Python software. If Distlib can be made genuinely useful, then it is possible for third-party packaging tools to transition to using it. Their developers and users then benefit from standardised implementation of low-level functions, time saved by not having to reinvent wheels, and improved interoperability between tools.
Virtualenv
This is not a packaging or distribution system but it can be very important in solving related problems. Here is a good explanation of what this is from the documentation.
The basic problem being addressed is one of dependencies and versions, and indirectly permissions. Imagine you have an application that needs version 1 of LibFoo, but another application requires version 2. How can you use both these applications? If you install everything into /usr/lib/python2.7/site-packages (or whatever your platform’s standard location is), it’s easy to end up in a situation where you unintentionally upgrade an application that shouldn’t be upgraded.
Or more generally, what if you want to install an application and leave it be? If an application works, any change in its libraries or the versions of those libraries can break the application.
Also, what if you can’t install packages into the global site-packages directory? For instance, on a shared host.
In all these cases, virtualenv can help you. It creates an environment that has its own installation directories, that doesn’t share libraries with other virtualenv environments (and optionally doesn’t access the globally installed libraries either).
Install Virtualenv
CentOS
- python-virtualenv.noarch: Tool to create isolated Python environments
- python-virtualenv-clone.noarch: Script to clone virtualenvs
- python-virtualenvwrapper.noarch: Enhancements to virtualenv
- python-tox.noarch: Virtualenv-based automation of test activities
Debian
Install as expected.
sudo apt install python3-virtualenv virtualenv
Usage
Start by creating a virtual environment. Simply pick a point on your filesystem tree to put all the cruft your mission requires and it will dutifully be confined to it.
VEPATH=/home/xed/.virtualenvs
ENV=funproject
cd $VEPATH
virtualenv --python=$(which python3) $ENV
This will create a virtual environment called funproject and it will be in a directory called /home/xed/.virtualenvs/funproject. Pretty much everything related to this mess will be in there. Obviously the --python option is optional, but this just shows how to force Python3 if your system otherwise likes to stick with Python2.
Once you have the dumpster ready, you need to activate it so that it is the center of attention with respect to Python package wrangling.
source $VEPATH/$ENV/bin/activate
This just basically sets your $PATH so that the virtual environment directory’s bin/ is found first. This means you don’t have to be so explicit about the path when you undo all this; simply type deactivate. This also implies that removing a virtual environment is as simple as just rm -r $VEPATH/$ENV.
Normally people recommend that the $VEPATH be ~/.virtualenvs.
virtualenvwrapper
The path, ~/.virtualenvs, is designed to work with virtualenvwrapper. Here is the official documentation for that. It is a virtual environment manager composed of shell tricks. It basically boils down to this.
mkvirtualenv funproject
workon funproject
I personally don’t think I need this kind of shell obfuscation (that I didn’t create myself), but it’s helpful to know that mkvirtualenv and workon are commands from that system and can be skipped.
venv
Here’s PEP 405 describing why yet another crazy management system is needed. This one seems preferable if you need multiple Pythons and they are all later than 2011 (3.3 or so). Here’s some official documentation on how to use it.
The simple description of how to use it seems to be something like this.
python3 -m venv /tmp/isolated_testing_env
source /tmp/isolated_testing_env/bin/activate
Then do your pip stuff and be pretty sure it’s all going in that directory you specified. To deactivate it you can just type deactivate. To get rid of it all just erase it like you’d hope.
To figure out if you’re in a venv environment you can look for the VIRTUAL_ENV environment variable, which the activate script sets.
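VIRTUAL_ENV only exists if activate was sourced, so running an environment’s bin/python directly won’t set it. A sketch of checking from inside Python (a venv’s sys.prefix differs from sys.base_prefix; old virtualenv releases set sys.real_prefix instead), plus creating an environment with the standard venv module. The helper names in_virtualenv and make_env are invented for this example.

```python
import os
import sys
import venv
from pathlib import Path

def in_virtualenv():
    """True when this interpreter is running inside a venv/virtualenv."""
    return sys.prefix != sys.base_prefix or hasattr(sys, 'real_prefix')

def make_env(path):
    """Create a bare environment (no pip, to keep it quick); return its python path."""
    venv.create(path, with_pip=False)
    if os.name == 'nt':
        return Path(path) / 'Scripts' / 'python.exe'
    return Path(path) / 'bin' / 'python'
```

Leaving out pip is just to keep the example fast; with_pip=True gives you the same thing as python3 -m venv.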
Conda And Miniconda And Anaconda
First — what are they?
- Conda - a dual purpose package management system and an environment management system for installing multiple versions of software packages and their dependencies and switching easily between them.
- Miniconda - A distribution of packages managed by the Conda package manager which provide minimal Conda functionality. This includes a particular Python of your choosing which may not be the same as your OS’s Python. The critical ability Miniconda provides is a mostly blank starting point from which you can install the software you need to use and (automatically) its dependencies.
- Anaconda (not to be confused with the Red Hat installer - good name guys) is a distribution of packages managed by Conda which provides multifarious functions that many people find useful. Think of it as a full featured distribution so that you don’t, for example, have to go installing modules all the time while doing Python work. It is heavy and requires plenty of disk and initial installation time. This could possibly be helpful on systems that will be sent into service where internet connectivity is poor. Presumably everything you’ll need would be present, just don’t trigger any updates!
Installing Conda/Miniconda
Installation details. Installer is 34MB but it did seem to come with Python 3.6 and install as a non-root user. I created a separate Linux account to keep it from doing anything unpleasant but it seems well behaved so far. The executables live here.
export PATH=~/miniconda3/bin:$PATH
Use something like this.
~/miniconda3/bin/conda update conda
They do have instructions for a polite and sensible uninstall.
rm -rv ~/miniconda3 ~/.condarc ~/.conda ~/.continuum
Here is some Conda documentation.
Here’s the procedure I used most recently. Compare to the very similar procedure below and choose what makes most sense.
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh ./Miniconda3-latest-Linux-x86_64.sh
cp .bashrc .bashrc-conda # Fix .bashrc if you don't like the meddling.
vi .bashrc .bashrc-conda
source .bashrc-conda
conda list
conda config --add channels conda-forge
conda create -n xedopencv
conda activate xedopencv
conda search tensorflow-gpu
conda install tensorflow-gpu=1.13.1
conda install opencv
Non-Root Custom Python Environments
Let’s say you need to run some very fancy cool hipster dude Python program which was, for example, written in UTF-8 emojis in Python 3.
(Note that modern Python3 assumes UTF-8 by default and if you want something different in the source code, you can add a # -*- coding: utf-8 -*- line; well, with some other encoding. So this particular reason for needing a custom Python is moot, but wow, there are many other reasons!)
Unfortunately the account you were given by a mean sys admin has CentOS 6.8 which, while secure and up to date for the series, is so yesterday. It is possible to set up your own fancy Python environments which can include the Python version you require and your own copies of all modules and dependencies.
Here is an example of a procedure to achieve that.
D=/tmp/${USER}
mkdir -p ${D} && cd ${D}
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash ${D}/Miniconda3-latest-Linux-x86_64.sh -b -p $D/miniconda3
export PATH=${D}/miniconda3/bin/:${PATH}
conda update conda # Answer prompt "y".
conda create -n mycoolproject python=3.6 anaconda
source ${D}/miniconda3/bin/activate mycoolproject
conda search h5py
conda install -n mycoolproject h5py
conda install -n mycoolproject opencv
conda install -n mycoolproject matplotlib seaborn pandas HDF5 keras tensorflow theano
conda install -n mycoolproject tensorflow-gpu # If you have GPUs.
Or something like this.
miniconda3/bin/conda create -n my_proj python=2.7 pandas seaborn HDF5 matplotlib h5py
If you are using a machine that needs a proxy, you need to set up a configuration file to specify the proxy (nope, it doesn’t use the standard wget environment variables, which is lame). Save this as ${HOME}/.condarc, the per-user configuration file that conda operations read.
# Proxy settings: http://[username]:[password]@[server]:[port]
proxy_servers:
    http: http://user:pass@corp.com:8080
    https: https://user:pass@corp.com:8080
If the sysadmin set everything up for you, just activate is enough to get going. Here’s a complete demonstration of using a prepared miniconda setup at /pro/python/miniconda3. The "environment" is called py17, which is a naming convention I’m using to indicate modern Python circa 2017.
[~]$ echo $0
-tcsh
[~]$ bash # Bash may not be required, but it makes things easier.
:->[ws8-ablab.ucsd.edu][~]$ python --version
Python 2.6.6
:->[ws8-ablab.ucsd.edu][~]$ source /pro/python/miniconda3/bin/activate py17
(py17) :->[ws8-ablab.ucsd.edu][~]$ python --version
Python 3.6.1 :: Anaconda custom (64-bit)
(py17) :->[ws8-ablab.ucsd.edu][~]$ python
Python 3.6.1 |Anaconda custom (64-bit)| (default, May 11 2017, 13:09:58)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import keras
Using TensorFlow backend.
>>> raise SystemExit
(py17) :->[ws8-ablab.ucsd.edu][~]$ source /pro/python/miniconda3/bin/deactivate
:->[ws8-ablab.ucsd.edu][~]$ python --version
Python 2.6.6
And to pretend like none of this ever happened.
source ${D}/miniconda3/bin/deactivate
${D}/miniconda3/bin/conda remove -n mycoolproject --all
Moving A Miniconda Installation
God help you if you need to move the miniconda installation, say to put it on a NAS. This article has some rough ideas that get you started. I tried this basic sed brute force technique and I got it working. I had to search through 3GB of Python installation and change 762 files.
A lot of general help with conda can be found here.
Python 3.x
Python 3 is a bit of a different animal from Python 2. The most useful treatment of the differences between 2 and 3 is by eev.ee here and here.
Here is an interesting technique to see which kind of Python you’re dealing with if you don’t know which interpreter will be used.
import sys
if sys.version_info.major < 3:
    print("Requires Python 3! Dude, this isn't 2008.")
    raise SystemExit
Here are some features I commonly deal with.
- print() now requires full function style. Seems legit.
- print x, to suppress automatic new lines doesn’t work. Use print(x, end="") instead. Also if you don’t want that to be buffered use print(x, end='', flush=True).
- range() produces an iterable range object. xrange() is therefore superfluous and gone. Consider list(range(3)) for the old behavior; it is a clearer form of [i for i in range(3)]. And for padding out lists with the same value, [0]*3 is much better yet.
- / is now proper division (3/2 is 1.5), as one expects everywhere but Python 2, while x // y is floor division (3//2 is 1).
- map(ord,"xed")+[0] will now produce an error. Before it was fine. Because map produces a special iterator object, you need this: list(map(ord,"xed"))+[0].
- raw_input() has been replaced (more or less) with input(). Python 3’s input() behaves like the old raw_input(): it returns what was typed as a string instead of evaluating it. I just ditched the raw_ and it worked fine in my application.
- Exceptions.
  - Was: except (Exception1,Exception2),target:
  - Now: except (Exception1,Exception2) as target:
- Tuple arguments are scrapped. This means that def f(a,(b,c)): no longer works. This is actually not really needed anyway, since you can just set b_c to (b,c) and pass it in with very little conceptual difference. But lambdas, especially my favorite way to write them, don’t work. So lambda (x,y): x+y must now be something like lambda xy: xy[0]+xy[1] (or keep lambda x,y: x+y and call it with f(*xy)).
- The filter function produces an iterator instead of a list.
- Dictionaries now keep their insertion order (guaranteed since Python 3.7).
- mydict.keys() now produces an iterable view, not a list. Use list(mydict.keys()) if you need that.
- iteritems is now just items, which returns an iterable view.
- mydict.has_key('key') is no longer present. Use 'key' in mydict.
- There is a bytes object that is like a string but not like a string. It is defined with something like b'myencodedbytes'. The difference is that a string is an abstract sequence of characters humans can think about, while a bytes object is that string encoded (with an "encoding") into ones and zeros that a computer can deal with. Many functions now expect Unicode str objects, but some sources (files opened in binary mode, subprocess pipes) hand you bytes. So here’s an example where Python 2 just worked fine reading stuff in (formerly the_content, but changed now to the_content_bytes), but to make it work in Python 3 I had to add this: the_content= the_content_bytes.decode('ASCII'). Note also that there is a similar stringlikething.encode(X) function where X is 'ASCII' or 'utf-8' or god knows what else may lurk out there.
- PEP572 (Python 3.8) adds "the walrus operator", :=, to assign variables within an expression (formally "assignment expressions"). A classic use is checking for a regex match and knowing exactly what was matched: if (match := pattern.search(data)) is not None: use_the(match)
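Several of the points above in one runnable Python 3 (3.8+, for the walrus) snippet. The variable names and sample data are made up for illustration.

```python
import re

# map() returns an iterator now, so convert before concatenating lists.
codes = list(map(ord, "xed")) + [0]         # [120, 101, 100, 0]

# / is true division, // is floor division (note how it rounds negatives).
halves = (7 / 2, 7 // 2, -7 // 2)           # (3.5, 3, -4)

# keys() is a view in insertion order; 'in' replaces has_key().
d = {'b': 1, 'a': 2}
first_key = list(d.keys())[0]               # 'b'
has_a = 'a' in d                            # True

# bytes <-> str needs an explicit encoding.
the_content_bytes = b'plain ASCII text'
the_content = the_content_bytes.decode('ASCII')
round_trip = the_content.encode('utf-8') == the_content_bytes  # True

# Tuple-parameter lambdas are gone; unpack at the call site instead.
pair = (3, 4)
total = (lambda x, y: x + y)(*pair)         # 7

# Walrus operator: test and keep the match in one step.
pattern = re.compile(r'v:(\d+)')
if (match := pattern.search('stats v:12')) is not None:
    found = match.group(1)                  # '12'
```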
Here’s a decent way to use Vim to change old print commands into Python 3 functions. Find the first print that needs parentheses and do this.
:s/print \([^ ].*\) *$/print(\1)/
Then you can just find the second one with n and do @:. After that, you can find them with n and repeat the change with @@.
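For a whole file, roughly the same substitution can be done with Python’s re module. A rough sketch (fix_print is an invented name). Like the Vim version, it assumes one complete print statement per line, and it doesn’t understand the trailing-comma form (which needs end=""), print >>f, x, or print inside strings; the \b just stops it from mangling words like blueprint.

```python
import re

# "print <stuff>" -> "print(<stuff>)"; the lazy .*? leaves trailing spaces out.
PRINT_STMT = re.compile(r'\bprint ([^ ].*?) *$')

def fix_print(line):
    """Wrap a Python 2 print statement's arguments in parentheses."""
    return PRINT_STMT.sub(r'print(\1)', line)
```

Applied line by line, e.g. ''.join(fix_print(l) for l in open('old.py')), the newline at the end of each line is preserved.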
One interesting tidbit I encountered was that source code in Python 3 can use extended characters as variable names. Normally (maybe universally!) this is asking for trouble, but you can imagine a function where an angle is called alpha but using a real alpha (α). To get this to work, I had to add a special comment in the second line of the program like this. (Python 3 assumes UTF-8 source by default per PEP 3120, so in principle the comment only matters for files saved in some other encoding.)
#!/usr/bin/python3
# vim: set fileencoding=utf-8 :
def greeks(α=0.8, β=1., λ=0.):
    return (α,β,λ)
The full description of this is in PEP0263.
Python3 Resources
Jupyter Notebooks
Ok, I have no use for this whatsoever. Come on now, if you’re a real programmer, you have a real editor and you can do Knuth style literate programming far away from the UI atrocities of web browsers, right? Well, I can. But enough brain-dead projects seem to only operate in the fuzzy realm of Jupyter notebooks, that I sometimes have to play along. Here are the tricks.
apt install jupyter-notebook
jupyter notebook
Here notebook
is a subcommand that runs the server.
If all goes well, this command’s start up messages should give you a URL that includes the security token. The access methods will look something like this from the server output.
To access the notebook, open this file in a browser:
file:///home/xed/.local/share/jupyter/runtime/nbserver-7854-open.html
Or copy and paste one of these URLs:
http://localhost:8888/?token=4d0f705a7db10c92a53677f75749f3ef2fb508d3dfab6d22
or http://127.0.0.1:8888/?token=4d0f705a7db10c92a53677f75749f3ef2fb508d3dfab6d22
So put one of those in your browser and that’s that.
Paths can be a problem: I just ran the jupyter server from the directory I was examining. And then the silliest path problem is that once you tame all the dependencies that are usually required, the last one will be the project you are playing with itself. I am usually trying out something in /tmp so Python has no idea about it. To overcome that, in the block where it tries to import itself, you can set the sys path by adding lines like this that give it a chance to find the libraries it needs.
import sys
sys.path.append('/tmp/pybeginner-2.0.4/src/')
Troubleshooting
Strict Formatting
Black is described as "an uncompromising Python code formatter."
What if you’re using black and there’s some section that really needs a strange formatting style? (One good example was a big matrix laid out in a sensible way.) This works.
# fmt: off
code black should overlook...
# fmt: on
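A small illustration of the matrix case. Without the markers, Black would strip the alignment spaces out of the rows. The rotation matrix and the matvec helper are invented for this example.

```python
# fmt: off
ROTATE_Z_90 = [
    [ 0, -1,  0],
    [ 1,  0,  0],
    [ 0,  0,  1],
]
# fmt: on

def matvec(matrix, vec):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]
```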
Debug
Python has a built-in constant called __debug__ which is normally True.
$ python -c "print(__debug__)"
True
Note that the __debug__ constant is immutable within a program (assigning to it is a SyntaxError). If you want to strip the debugging, perhaps to improve performance in some way, you can use the -O option at run time.
$ python -O -c "print(__debug__)"
False
Profiling Tools
If you’re looking to improve performance, py-spy looks very interesting. It describes itself like so:
"py-spy is a sampling profiler for Python programs. It lets you visualize what your Python program is spending time on without restarting the program or modifying the code in any way."
Another one that looks interesting is Memray which is strangely from Bloomberg. It describes itself like so:
"Memray is a memory profiler for Python. It can track memory allocations in Python code, in native extension modules, and in the Python interpreter itself. It can generate several different types of reports to help you analyze the captured memory usage data."
It also notes: "Memray only works on Linux and cannot be installed on other platforms." I’m sure it won’t be a problem for people to simply upgrade their operating system (like I’ve been told to do for 25 years).
Assert statements
The main use of the __debug__ variable is to control the execution of assert statements.
if __debug__ and not expression1: raise AssertionError
assert expression1 # Same as previous line.
if __debug__ and not expression1: raise AssertionError(expression2)
assert expression1,expression2 # Same as previous line.
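To see the -O effect on assert directly, this sketch runs a one-line program in a fresh interpreter with and without the flag. SNIPPET and run are invented for the demo.

```python
import subprocess
import sys

SNIPPET = "assert 1 == 2, 'asserts are live'; print('assert skipped, __debug__ =', __debug__)"

def run(*flags):
    """Run SNIPPET in a new interpreter with the given flags; return (exit code, stdout)."""
    p = subprocess.run([sys.executable, *flags, '-c', SNIPPET],
                       capture_output=True, text=True)
    return p.returncode, p.stdout.strip()
```

Plain run() fails with an AssertionError, while run('-O') prints the message with __debug__ False. This is also why assert shouldn’t be used to validate user input: under -O those checks simply vanish.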
Problems With PYTHONPATH
Python is pretty solid and most sensible systems take great care with it since it’s often essential to a functional OS (e.g. emerge, yum). But sometimes things happen. Here’s a very nasty situation I had with CentOS 7.
:->[centos7-][~]$ python --version
Python 2.7.12
:->[centos7-][~]$ python -c 'print("ok")'
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
ImportError: No module named site
:-<[centos7-][~]$ export PYTHONPATH=/usr/lib64/python2.7/
:->[centos7-][~]$ export PYTHONHOME=/usr/lib64/python2.7/
:->[centos7-][~]$ python -c 'print("ok")'
ok
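Once the interpreter can start at all, it helps to compare what a healthy host and a suspect host think their installation looks like. The sketch below (assuming Python 3; interpreter_report is just an illustrative name) prints the prefixes, the stdlib location, and the two environment variables involved. Per the Python documentation, PYTHONHOME is meant to name the installation prefix (the standard library is then searched for under prefix/lib/pythonX.Y), so the prefix printed on a working host is a good hint at what value it ought to have.

```python
import os
import sys
import sysconfig


def interpreter_report():
    """Collect the settings that decide where this interpreter looks for its standard library."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "prefix": sys.prefix,
        "exec_prefix": sys.exec_prefix,
        # None means the variable is not set in this environment.
        "PYTHONHOME": os.environ.get("PYTHONHOME"),
        "PYTHONPATH": os.environ.get("PYTHONPATH"),
        # Where this build expects its standard library to live.
        "stdlib": sysconfig.get_paths()["stdlib"],
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key:12} {value}")
```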
Module Search Path
How the module search path is constructed is almost described here. Unfortunately the final item there is "the installation-dependent default". You can mentally substitute the word "voodoo" for "default" for all the good that description does.
To really find out what’s going on you must go to the source code. Here is the authoritative description of how this works for Linux (Windows and Mac are different).
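Short of reading the source, you can at least ask a running interpreter what it ended up with. python -m site prints sys.path along with the user site directories. The sketch below, assuming Python 3, does something similar and also reports which file a given module would actually be loaded from, using importlib.util.find_spec. The where_is helper is hypothetical, for illustration only.

```python
import importlib.util
import sys


def where_is(module_name):
    """Return where a top-level module would be imported from, or None if it can't be found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    # A file path for ordinary modules, or "built-in" for modules compiled into the interpreter.
    return spec.origin


if __name__ == "__main__":
    # sys.path is searched in order and the first match wins.
    for i, entry in enumerate(sys.path):
        print(i, repr(entry))
    print("json ->", where_is("json"))
    print("sys  ->", where_is("sys"))
```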
Pip Fails Because Of SSL Stupidity
Public data probably does not need to be encrypted. But welcome to the modern world, where pip expects its sources to be obtained free from the prying eyes of eavesdroppers (who can’t figure out how to do trivial traffic analysis and go get the same file you just got). The problem I ran into was that pip suddenly refused to fetch anything, complaining "Can’t connect to HTTPS URL because the SSL module is not available." This was after compiling Python myself, and the problem was that I did not have the OpenSSL development package installed when I built it:
apt install libssl-dev
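Without it, the ssl module never gets compiled, which makes the resulting Python sort of brain dead with respect to pip acquisitions. After installing the package, rerun the build (configure and make) so the ssl module is actually compiled in.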
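Before blaming pip, it is worth checking whether a freshly built interpreter has a working ssl module at all. This is a minimal sketch assuming Python 3; ssl_status is an illustrative name, and ssl.OPENSSL_VERSION is the version string of the OpenSSL (or compatible) library the module is using.

```python
def ssl_status():
    """Return the OpenSSL version string if the ssl module is usable, else None."""
    try:
        import ssl
    except ImportError:
        # Typically the _ssl C extension was never built because the OpenSSL headers were missing.
        return None
    return ssl.OPENSSL_VERSION


if __name__ == "__main__":
    status = ssl_status()
    print(status if status else "no ssl module: pip will not be able to use HTTPS")
```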
Linking
This was a very tricky problem to diagnose. Here are two systems which appear to have the exact same Python, but when run, they clearly are not the same.
:->[goodhost][~]$ md5sum /usr/bin/python
49623a632cb4bf3c501f603af80103c4 /usr/bin/python
:->[goodhost.example.edu][~]$ /usr/bin/python --version
Python 2.7.5
:->[messedup][/etc/ld.so.conf.d]# md5sum /usr/bin/python2.7
49623a632cb4bf3c501f603af80103c4 /usr/bin/python2.7
:->[messedup-new][/etc/ld.so.conf.d]# /usr/bin/python2.7 --version
Python 2.7.12
How can this be? After checking all the possibilities, I ruled out path and symbolic link issues. The answer, it turns out, is that the shared library linking was messed up on the non-working machine. This is why simply reinstalling Python may not fix its incorrect behavior.
:->[goodhost][~]$ ldd /bin/python2.7
linux-vdso.so.1 => (0x00007fff494c5000)
libpython2.7.so.1.0 => /lib64/libpython2.7.so.1.0 (0x00007f03e28f1000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f03e26d5000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f03e24d0000)
libutil.so.1 => /lib64/libutil.so.1 (0x00007f03e22cd000)
libm.so.6 => /lib64/libm.so.6 (0x00007f03e1fcb000)
libc.so.6 => /lib64/libc.so.6 (0x00007f03e1c08000)
/lib64/ld-linux-x86-64.so.2 (0x00007f03e2cdf000)
:->[messedup][/etc/ld.so.conf.d]# ldd /usr/bin/python2.7
linux-vdso.so.1 => (0x00007ffca97df000)
libpython2.7.so.1.0 => /public/apps/coot-0.8.5/lib/libpython2.7.so.1.0 (0x00007f1554076000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1553e59000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f1553c55000)
libutil.so.1 => /lib64/libutil.so.1 (0x00007f1553a52000)
libm.so.6 => /lib64/libm.so.6 (0x00007f1553750000)
libc.so.6 => /lib64/libc.so.6 (0x00007f155338d000)
/lib64/ld-linux-x86-64.so.2 (0x00007f155449e000)
I had originally assumed that if a core component like Python was reinstalled from clean packages, libraries and all, it would have to behave like a clean installation. That is not true. If the dynamic linker cache (built by ldconfig from /etc/ld.so.conf and the files in /etc/ld.so.conf.d) resolves libpython to some spurious installation, Python might not work at all. Or worse, it might barely work, which makes for a maddening situation to troubleshoot. This problem arose when a program (coot) tried to make its own separate copy of Python available for linking. The moral of the story is to use ldd to check the Python executable before trying to diagnose things like PYTHONPATH and PYTHONHOME, which, if the linking is bad, may not help no matter what they’re set to.
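To make that ldd check routine, here is a small sketch that reports which libpython an interpreter actually resolves to. It assumes Linux, Python 3.7+, and ldd on the PATH; parse_ldd and libpython_path are hypothetical helper names. Many Python 3 builds link libpython statically, in which case no libpython line appears and the function returns None, which is normal and not a sign of trouble.

```python
import shutil
import subprocess
import sys


def parse_ldd(text):
    """Map each library name in ldd output to its resolved path (None if ldd gave no real path)."""
    libs = {}
    for line in text.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # e.g. the dynamic loader line, which has no "=>"
        name, _, rest = line.partition("=>")
        parts = rest.split()
        target = parts[0] if parts else ""
        # vdso lines ("=> (0x...)") and "=> not found" lines have no usable path.
        libs[name.strip()] = target if target.startswith("/") else None
    return libs


def libpython_path(executable=sys.executable):
    """Return the libpython path ldd reports for the executable, or None."""
    if shutil.which("ldd") is None:
        return None
    out = subprocess.run(["ldd", executable], capture_output=True, text=True).stdout
    for name, path in parse_ldd(out).items():
        if name.startswith("libpython"):
            return path
    return None


if __name__ == "__main__":
    print(libpython_path())
```

To check a different interpreter than the one running the script, pass its path, e.g. libpython_path("/usr/bin/python2.7"), and compare the answer against a known-good host.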