Python sorting efficiency
By Maxime Biais, Sunday 28 January 2007 at 13:53 :: Python :: #23 :: rss
I read PerformanceTips page on the official Python wiki. I often use this kind of code to sort list of lists, or list of tuples:
def sortby(nlist, n): nlist.sort(lambda x,y:cmp(x[n], y[n])) sample = [(1, "c"), (2, "t"), (0, "d")] sortby(sample, 0) print "sorted by first element", sample sortby(sample, 1) print "sorted by second element", sample
outputs:
sorted by first element [(0, 'd'), (1, 'c'), (2, 't')] sorted by second element [(1, 'c'), (0, 'd'), (2, 't')]
Ok, that's consice but, is it efficient ? The PerformanceTips page, give 2 different functions to do the same (called in the following listing sortby1 and sortby2). Note: the profileit decorator is described in this post (Python profiling decorator).
import operator from random import random from profileit import profileit @profileit(5) def sortby1(nlist, n): nlist[:] = [(x[n], x) for x in nlist] nlist.sort() nlist[:] = [val for (key, val) in nlist] @profileit(5) def sortby2(nlist ,n): nlist.sort(key=operator.itemgetter(n)) @profileit(5) def sortby3(nlist, n): def __sort_by(n): def _sort_by(a, b): return cmp(a[n], b[n]) return _sort_by nlist.sort(__sort_by(n)) # Same as above but much simpler using lambda @profileit(5) def sortby4(nlist, n): nlist.sort(lambda x,y:cmp(x[n], y[n])) testlist1 = [(random(), random()) for i in range(10000)] testlist2 = testlist1[:] testlist3 = testlist1[:] testlist4 = testlist1[:] print "===================== sortby 1" sortby1(testlist1, 0) print "===================== sortby 2" sortby2(testlist2, 0) print "===================== sortby 3" sortby3(testlist3, 0) print "===================== sortby 4" sortby4(testlist4, 0)
outputs:
===================== sortby 1
>>>---- Begin profiling print
1 function calls in 0.024 CPU seconds
Ordered by: internal time, call count
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.024 0.024 0.024 0.024 psycoperfs.py:5(sortby1)
0 0.000 0.000 profile:0(profiler)
>>>---- End profiling print
===================== sortby 2
>>>---- Begin profiling print
1 function calls in 0.111 CPU seconds
Ordered by: internal time, call count
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.111 0.111 0.111 0.111 psycoperfs.py:11(sortby2)
0 0.000 0.000 profile:0(profiler)
>>>---- End profiling print
===================== sortby 3
>>>---- Begin profiling print
119928 function calls in 0.320 CPU seconds
Ordered by: internal time, call count
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.245 0.245 0.320 0.320 psycoperfs.py:15(sortby3)
119926 0.075 0.000 0.075 0.000 psycoperfs.py:18(_sort_by)
1 0.000 0.000 0.000 0.000 psycoperfs.py:17(__sort_by)
0 0.000 0.000 profile:0(profiler)
>>>---- End profiling print
===================== sortby 4
>>>---- Begin profiling print
119927 function calls in 0.401 CPU seconds
Ordered by: internal time, call count
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.282 0.282 0.401 0.401 psycoperfs.py:23(sortby4)
119926 0.118 0.000 0.118 0.000 psycoperfs.py:25(<lambda>)
0 0.000 0.000 profile:0(profiler)
>>>---- End profiling print
The first version is the fastest, it's called Schwartzian transform. The two last are really slower than the two others. Beafore running the profiling test, I supposed the second one will be the fastest, I was wrong.




Comments
1. Sunday 28 January 2007 at 18:15, by paddy3118
2. Sunday 28 January 2007 at 19:59, by maxime
3. Monday 29 January 2007 at 09:02, by maxime
4. Thursday 29 March 2007 at 21:41, by Rhamphoryncus
5. Tuesday 25 March 2008 at 03:10, by A Kong
Write your comment