I read PerformanceTips page on the official Python wiki. I often use this kind of code to sort list of lists, or list of tuples:
def sortby(nlist, n):
nlist.sort(lambda x,y:cmp(x[n], y[n]))
sample = [(1, "c"), (2, "t"), (0, "d")]
sortby(sample, 0)
print "sorted by first element", sample
sortby(sample, 1)
print "sorted by second element", sample
outputs:
sorted by first element [(0, 'd'), (1, 'c'), (2, 't')]
sorted by second element [(1, 'c'), (0, 'd'), (2, 't')]
Ok, that's consice but, is it efficient ? The PerformanceTips page, give 2 different functions to do the same (called in the following listing sortby1 and sortby2). Note: the profileit decorator is described in this post (Python profiling decorator).
import operator
from random import random
from profileit import profileit
@profileit(5)
def sortby1(nlist, n):
nlist[:] = [(x[n], x) for x in nlist]
nlist.sort()
nlist[:] = [val for (key, val) in nlist]
@profileit(5)
def sortby2(nlist ,n):
nlist.sort(key=operator.itemgetter(n))
@profileit(5)
def sortby3(nlist, n):
def __sort_by(n):
def _sort_by(a, b):
return cmp(a[n], b[n])
return _sort_by
nlist.sort(__sort_by(n))
# Same as above but much simpler using lambda
@profileit(5)
def sortby4(nlist, n):
nlist.sort(lambda x,y:cmp(x[n], y[n]))
testlist1 = [(random(), random()) for i in range(10000)]
testlist2 = testlist1[:]
testlist3 = testlist1[:]
testlist4 = testlist1[:]
print "===================== sortby 1"
sortby1(testlist1, 0)
print "===================== sortby 2"
sortby2(testlist2, 0)
print "===================== sortby 3"
sortby3(testlist3, 0)
print "===================== sortby 4"
sortby4(testlist4, 0)
outputs:
===================== sortby 1
>>>---- Begin profiling print
1 function calls in 0.024 CPU seconds
Ordered by: internal time, call count
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.024 0.024 0.024 0.024 psycoperfs.py:5(sortby1)
0 0.000 0.000 profile:0(profiler)
>>>---- End profiling print
===================== sortby 2
>>>---- Begin profiling print
1 function calls in 0.111 CPU seconds
Ordered by: internal time, call count
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.111 0.111 0.111 0.111 psycoperfs.py:11(sortby2)
0 0.000 0.000 profile:0(profiler)
>>>---- End profiling print
===================== sortby 3
>>>---- Begin profiling print
119928 function calls in 0.320 CPU seconds
Ordered by: internal time, call count
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.245 0.245 0.320 0.320 psycoperfs.py:15(sortby3)
119926 0.075 0.000 0.075 0.000 psycoperfs.py:18(_sort_by)
1 0.000 0.000 0.000 0.000 psycoperfs.py:17(__sort_by)
0 0.000 0.000 profile:0(profiler)
>>>---- End profiling print
===================== sortby 4
>>>---- Begin profiling print
119927 function calls in 0.401 CPU seconds
Ordered by: internal time, call count
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.282 0.282 0.401 0.401 psycoperfs.py:23(sortby4)
119926 0.118 0.000 0.118 0.000 psycoperfs.py:25(<lambda>)
0 0.000 0.000 profile:0(profiler)
>>>---- End profiling print
The first version is the fastest, it's called Schwartzian transform. The two last are really slower than the two others. Beafore running the profiling test, I supposed the second one will be the fastest, I was wrong.