biais.org

Wednesday 31 January 2007

Spelling correction using the Python Natural Language Toolkit (nltk)

Natural Language Toolkit (nltk) is an amazing library to play with natural language. I read an article about spelling correction, and I wanted to use nltk to code something useful. The purpose of the following code is to implement a similar "Did You Mean" feature used by search engines. Note: to run the code, you need to install nltk.

How it works:

  • Learning. From a word dictionary or a corpus of sentences: create a dict containing for each word specialhash, the word and its number of occurrences.
  • Testing. The tested word is hashed, we get the associated words in the learned db and print them sorted by number of occurrences.

Notes:

  • Porter class create a stemmer object with a stem function returning the word stem (removing morphological affixes). For example stem('operation'), stem('operator'), stem('operating') all return the string 'oper'.
  • we expect words and erroneous words collide in the dictionary to return the correct ones.
  • brown.raw() return a generator that iterates over the brown corpus sentences.
from nltk_lite.stem.porter import Porter
from nltk_lite.corpora import brown
 
import sys
from collections import defaultdict
import operator
 
def sortby(nlist ,n, reverse=0):
    nlist.sort(key=operator.itemgetter(n), reverse=reverse)
 
class mydict(dict):
    def __missing__(self, key):
        return 0
 
class DidYouMean:
    def __init__(self):
        self.stemmer = Porter()
 
    def specialhash(self, s):
        s = s.lower()
        s = s.replace("z", "s")
        s = s.replace("h", "")
        for i in [chr(ord("a") + i) for i in range(26)]:
            s = s.replace(i+i, i)
        s = self.stemmer.stem(s)
        return s
 
    def test(self, token):
        hashed = self.specialhash(token)
        if hashed in self.learned:
            words = self.learned[hashed].items()
            sortby(words, 1, reverse=1)
            if token in [i[0] for i in words]:
                return 'This word seems OK'
            else:
                if len(words) == 1:
                    return 'Did you mean "%s" ?' % words[0][0]
                else:
                    return 'Did you mean "%s" ? (or %s)' \
                           % (words[0][0], ", ".join(['"'+i[0]+'"' \
                                                      for i in words[1:]]))
        return "I can't find similar word in my learned db"
 
    def learn(self, listofsentences=[], n=2000):
        self.learned = defaultdict(mydict)
        if listofsentences == []:
            listofsentences = brown.raw()
        for i, sent in enumerate(listofsentences):
            if i >= n: # Limit to the first nth sentences of the corpus
                break
            for word in sent:
                self.learned[self.specialhash(word)][word.lower()] += 1
 
def demo():
    d = DidYouMean()
    d.learn()
    # choice of words to be relevant related to the brown corpus
    for i in "birdd, oklaoma, emphasise, bird, carot".split(", "):
        print i, "-", d.test(i)
 
if __name__ == "__main__":
    demo()

outputs:

birdd - Did you mean "birds" ? (or "bird")
oklaoma - Did you mean "oklahoma" ?
emphasise - Did you mean "emphasize" ? (or "emphasizes", "emphasizing")
bird - This word seems OK
carot - I can't find similar word in my learned db

It's a minimalist "Did you mean" implementation, google or yahoo use context and number of results to give you the best similar words. Also, here the specialhash method is an empirical way to reduce words. Rather than using a dictionary and specialhash, we may use simple list and a similarity method that computes a percentage of similarity between two words, but it would be less efficient to run the test method.

Tuesday 30 January 2007

Simple timeit decorator

A very simple decorator to notify users (no profiling) about the execution time of some parts of your code.

import time
 
def timemeth(func):
    "Internal class method timing decorator"
    def _inner(self, *args, **kw):
        start = time.time()
        res = func(self, *args, **kw)
        self.timed = time.time() - start
        return res
    return _inner
 
class TimedClass:
    def __init__(self):
        self.timed = 0
 
class Sample(TimedClass):
    @timemeth
    def learn(self):
        for i in range(1000000):
            i * i
 
if __name__ == "__main__": 
    s = Sample()
    print "Learning sample..."
    s.learn()
    print "Learning done ! (%.2f secs)" % s.timed

outputs:

$ python timedclass.py
Learning sample...
Learning done ! (2.01 secs)

Sunday 28 January 2007

Python sorting efficiency

I read PerformanceTips page on the official Python wiki. I often use this kind of code to sort list of lists, or list of tuples:

def sortby(nlist, n):
    nlist.sort(lambda x,y:cmp(x[n], y[n]))
 
sample = [(1, "c"), (2, "t"), (0, "d")]
sortby(sample, 0)
print "sorted by first element", sample
sortby(sample, 1)
print "sorted by second element", sample

outputs:

sorted by first element [(0, 'd'), (1, 'c'), (2, 't')]
sorted by second element [(1, 'c'), (0, 'd'), (2, 't')]

Ok, that's consice but, is it efficient ? The PerformanceTips page, give 2 different functions to do the same (called in the following listing sortby1 and sortby2). Note: the profileit decorator is described in this post (Python profiling decorator).

import operator
from random import random
from profileit import profileit
 
@profileit(5)
def sortby1(nlist, n):
    nlist[:] = [(x[n], x) for x in nlist]
    nlist.sort()
    nlist[:] = [val for (key, val) in nlist]
 
@profileit(5)
def sortby2(nlist ,n):
    nlist.sort(key=operator.itemgetter(n))
 
@profileit(5)
def sortby3(nlist, n):
    def __sort_by(n):
        def _sort_by(a, b):
            return cmp(a[n], b[n])
        return _sort_by
    nlist.sort(__sort_by(n))
 
# Same as above but much simpler using lambda
@profileit(5)
def sortby4(nlist, n):
    nlist.sort(lambda x,y:cmp(x[n], y[n]))
 
testlist1 = [(random(), random()) for i in range(10000)]
testlist2 = testlist1[:]
testlist3 = testlist1[:]
testlist4 = testlist1[:]
print "===================== sortby 1"
sortby1(testlist1, 0)
print "===================== sortby 2"
sortby2(testlist2, 0)
print "===================== sortby 3"
sortby3(testlist3, 0)
print "===================== sortby 4"
sortby4(testlist4, 0)

outputs:

===================== sortby 1
>>>---- Begin profiling print
         1 function calls in 0.024 CPU seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.024    0.024    0.024    0.024 psycoperfs.py:5(sortby1)
        0    0.000             0.000          profile:0(profiler)


>>>---- End profiling print
===================== sortby 2
>>>---- Begin profiling print
         1 function calls in 0.111 CPU seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.111    0.111    0.111    0.111 psycoperfs.py:11(sortby2)
        0    0.000             0.000          profile:0(profiler)


>>>---- End profiling print
===================== sortby 3
>>>---- Begin profiling print
         119928 function calls in 0.320 CPU seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.245    0.245    0.320    0.320 psycoperfs.py:15(sortby3)
   119926    0.075    0.000    0.075    0.000 psycoperfs.py:18(_sort_by)
        1    0.000    0.000    0.000    0.000 psycoperfs.py:17(__sort_by)
        0    0.000             0.000          profile:0(profiler)


>>>---- End profiling print
===================== sortby 4
>>>---- Begin profiling print
         119927 function calls in 0.401 CPU seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.282    0.282    0.401    0.401 psycoperfs.py:23(sortby4)
   119926    0.118    0.000    0.118    0.000 psycoperfs.py:25(<lambda>)
        0    0.000             0.000          profile:0(profiler)


>>>---- End profiling print

The first version is the fastest, it's called Schwartzian transform. The two last are really slower than the two others. Beafore running the profiling test, I supposed the second one will be the fastest, I was wrong.

Friday 26 January 2007

CAPTCHA resistance test

From wikipedia:
CAPTCHA: (an initialism for "Completely Automated Public Turing test to tell Computers and Humans Apart") is a type of challenge-response test used in computing to determine whether or not the user is human. A common type of CAPTCHA requires that the user type the letters of a distorted image, sometimes with the addition of an obscured sequence of letters or digits that appears on the screen. CAPTCHAs are used to prevent bots from performing actions which might be used to make a profit on the part of the person running a bot.

There is many project aiming to defeat CAPTCHA using OCR or other AI algorithm: AiCaptcha, Breaking visual CAPTCHA. Some spammers also uses social engineering (manipulate people): solving and creating captchas with free porn.

After reading pwntcha, it's seems really easy to pass through CAPTCHA, but do spam bots really use this kinds of advanced solutions to get email adresses or post comments on blogs ? I didn't find any test on the web. That's why I want to test spam bots that come here. I specially created the following email addresses and I'm hoping spam bots will try to send me emails:

  1. email: ceresistan@biais.org
  2. email: recetansis@biais.org
  3. email:
  4. email:
  5. email:
  6. email:

Note: each address use the same set of characters and aren't in my english dictionnary. All addresses are neither used nor published elsewhere. I also created a "not yet" published address to testify the results. I will post my results when size of mailboxes become significant.

Now, it's time to invoke spam bots. Come on, suck this page, use your regexp, algorithm and send me viagra ads ! I hope to receive a least one mail in the first mailbox ;)

EDIT: First results here, More results here. I would like to digg a bit more in that spam data, but it takes more time...

Wednesday 24 January 2007

Abused media computational art gallery

Beautiful piece of art on the Abused Media media website. Here is one of the gallery. The artist use Processing and Nodebox, that I just discover.

From Nodebox website:

NodeBox is a Mac OS X application that lets you create 2D visuals (static, animated or interactive) using Python programming code and export them as a PDF or a Quicktime movie. NodeBox is free and well-documented.

Tuesday 23 January 2007

Python threads and os.chdir

I had some problem using os.chdir and threads. In fact, current directory is shared between threads. Here is a simple python snippets to demonstrate it.

from threading import Thread
import os
import time
 
class ThreadTest(Thread):
    def __init__(self):
        Thread.__init__(self)
 
    def run(self):
        for i in range(2):
            time.sleep(1)
            print "name=%s, cwd=%s" % (self.getName(), os.getcwd())
            os.chdir("..")
 
t1 = ThreadTest()
t2 = ThreadTest()
t1.start() # start first
time.sleep(0.5) # .5 second shift
t2.start()

outputs:

name=Thread-1, cwd=/home/max/work/blogcode
name=Thread-2, cwd=/home/max/work
name=Thread-1, cwd=/home/max
name=Thread-2, cwd=/home

If chdir wasn't shared between threads, we could expect this:

name=Thread-1, cwd=/home/max/work/blogcode
name=Thread-2, cwd=/home/max/work/blogcode
name=Thread-1, cwd=/home/max
name=Thread-2, cwd=/home/max

In a multi threaded program, try to avoid os.chdir, but if you really need to change current directory to run external program or whatever, use locks, . It depends why you need threads, but you may want to fork instead. Current directory is process dependant. Same program as above using fork:

import os
import time
 
def run(t, name):
    for i in range(2):
        time.sleep(t)
        print "name=%s, cwd=%s" % (name, os.getcwd())
        os.chdir("..")
 
pid = os.fork()
if pid == 0:
    run(0.5, "father")
else:
    run(0.5, "child-" + str(pid))

outputs:

name=father, cwd=/home/max/work/blogcode
name=child-635, cwd=/home/max/work/blogcode
name=father, cwd=/home/max/work
name=child-635, cwd=/home/max/work

That's what we expected first when we write the threaded version. For more informations about fork, check the standard library: Process Management.

Saturday 20 January 2007

Python profiling decorator

I often profile or time my functions using hotshot profiler and timeit modules from the stantard library. Recently, I wrote this simple profiler decorator:

import hotshot, hotshot.stats
 
def profileit(printlines=1):
    def _my(func):
        def _func(*args, **kargs):
            prof = hotshot.Profile("profiling.data")
            res = prof.runcall(func, *args, **kargs)
            prof.close()
            stats = hotshot.stats.load("profiling.data")
            stats.strip_dirs()
            stats.sort_stats('time', 'calls')
            print ">>>---- Begin profiling print"
            stats.print_stats(printlines)
            print ">>>---- End profiling print"
            return res
        return _func
    return _my

Usage:

def mip():
    a = 0
    for i in range(10000):
        a += 1
    return a
 
@profileit(20)
def mop():
    a = 0
    for i in range(100):
        a += mip()
    return a
print mop()

Results:

>>>---- Begin profiling print
         101 function calls in 0.580 CPU seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.578    0.006    0.578    0.006 mip.py:29(mip)
        1    0.002    0.002    0.580    0.580 mip.py:35(mop)
        0    0.000             0.000          profile:0(profiler)


>>>---- End profiling print
1000000

Now the same code with printlines=1 settings (@profileit(1)):

>>>---- Begin profiling print
         101 function calls in 0.587 CPU seconds

   Ordered by: internal time, call count
   List reduced from 3 to 1 due to restriction <1>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.585    0.006    0.585    0.006 mip.py:29(mip)


>>>---- End profiling print
1000000

Tuesday 16 January 2007

Django generic view and performance problem

Django generic views let the developper to quickly write common pattern to create, update, delete and show objects. There is a common problem with the create / update generic view discussed on the django developpers mail list with solutions on the django wiki. Generic view create_object generates a HttpResponse from a model name. You can then write a basic template and use the form generated by the generic view.

A basic example starting with the model.py:

class Thing(models.Model):
    name = models.CharField(maxlength=100)

views.py (note you can directly write it in your urls.py file as in the django documentation):

from myapp.models import Thing
 
def create_thing(request):
    return django.views.generic.create_update.create_object(Thing, 
             template='create.html', post_save_redirect="..")

In the urls.py file:

(r'^creatething/', "myapp.views.create_thing")

In the template create.html:

<form method="post" action="">
  <p>
    <label for="id_name">Name:</label> {{ form.name }}
  </p>
  <input type="submit" />
</form>

And, that's all ! Now, we are going to update our models:

class Element(models.Model):
    name = models.CharField(maxlength=100)
 
class Thing(models.Model):
    element = models.ForeignKey(Element)
    name = models.CharField(maxlength=100)

The default behavior of the generic_view is to generate a form with a combobox to select the foreign element from a list. The problem here, if there is a lot of Element objects, django needs to get all data from the Element table and generate an enormous combobox. We are solving this problem by using the raw_id_admin option. Note: the problem go upstream to the generic view from the AutomaticManipulator.

Supposing we are creating Thing from an element_detail page.

(r'^(?P<elementname>\w+)/creatething/$', "myapp.views.create_thing")

Add the raw_id_admin option:

class Thing(models.Model):
    element = models.ForeignKey(Element, raw_id_admin=True)
    name = models.CharField(maxlength=100)

Modify the views.py

from myapp.models import Thing, Other
 
def create_thing(request, elementname):
    element = get_object_or_404(Element, name=elementname)
    return django.views.generic.create_update.create_object(Thing, 
        template='create.html', context={'element':element}, 
        post_save_redirect="..")

Modify the template:

<form method="post" action="">
  <input type="hidden" id="id_element" name="element" value="{{element.id}}" />
  <p>
    <label for="id_name">Name:</label> {{ form.name }}
  </p>
  <input type="submit" />
</form>

Ok, the performance problem for creating a new Thing is solved, but you need to find the Element before. How ?

Monday 15 January 2007

Graphical network representation

Research project for visual representation of complex network. GNOM project. Produce great artwork. From there website:

Network topology has become a paradigmatic field in sciences (as well as mathematics, social sciences and arts) since it studies the way in which many they are organized. The interaction between genes is a specific example of scientific studies where this is demonstrated. The network structure formed by molecular interactions defines functional genetic aspects of great importance. Scientists in biology, genetics and bioinformatics whose fields of research include genetic networks, have to deal with huge amounts of relational information. The management of this information, its visualisation and in general, the way in which scientists relate to it, determines the direction of their research. The information is inert, scientists have to establish a relationship with it, and in that process play role many parameters, some can be controlled and some can even be created.

Sunday 14 January 2007

Metaheuristic: Particle Swarm Optimization (PSO) in Python

From : PSO wikipedia article

Particle swarm optimization (PSO) is a form of swarm intelligence. Imagine a swarm of insects or a school of fish. If one sees a desirable path to go (e.g., for food, protection, etc.) the rest of the swarm will be able to follow quickly even if they are on the opposite side of the swarm. On the other hand, in order to facilitate felicitous exploration of the search space, typically one wants to have each particle to have a certain level of "craziness" or randomness in their movement, so that the movement of the swarm has a certain explorative capability: the swarm should be influenced by the rest of the swarm but also should independently explore to a certain extent. (This is a manifestation of the basic exploration-exploitation tradeoff that occurs in any search problem.)

PSO is a metaheuristic (also called global optimization algorithm), for dealing with problems in which a best solution can be represented as a point or surface in an n-dimensional space. Other famous metaheuristics:

PSO is one of the best to find best solution of continuous functions (for example tabu search is better to find best solution of discrete functions). This article is about writing a PSO in Python for solving unary (arity 1) functions, code is easily generalizable to solve any arity functions. Each red point is an element of the swarm. The blue point is the global best element.

Python PSO code, nothing difficult to write, the complete code including PygamePrinter, is available here under GPL.

from random import uniform
 
class PSO:
    def __init__(self, pop_size, min, max, phi, phi2, lr, maxiter, func):
        self.func = func
        self.pop = []
        # 0: position, 1: velocity, 2: fitness
        self.min = min
        self.max = max
        for i in xrange(pop_size):
            self.pop.append([uniform(self.min, self.max), 
                                   uniform(-1, 1), 0])
        self.evaluate()
        self.gdest = self.pop[0]
        self.pdest = self.pop[0]
        self.phi = phi
        self.phi2 = phi2
        self.lr = lr
        self.maxiter = maxiter
    
    def update_velocity(self):
        for i in self.pop:
            i[1] = self.lr * i[1] + uniform(0, self.phi) \
                    * (self.pdest[0] - i[0]) + uniform(0, self.phi2) \
                    * (self.gdest[0] - i[0])
 
    def evaluate(self):
        for i in self.pop:
            i[2] = self.func(i[0])
 
    def move(self):
        for i in self.pop:
            i[0] += i[1]
 
    def __cmp_by_fitness(self, a, b):
        return cmp(a[2], b[2])
    
    def run(self, update_func=False):
        for i in xrange(self.maxiter):
            if update_func:
                update_func()
            self.update_velocity()
            self.move()
            self.evaluate()
            self.pop.sort(self.__cmp_by_fitness, reverse=0)
            self.pdest = self.pop[0]
            if self.pdest[2] < self.gdest[2]:
                self.gdest = self.pdest

Printed run on this test function:

func = lambda x:math.cos(x) * math.exp(math.sin(x)) * math.sin(x)  / 1.5

Note the global best element is not computed at step 0:

Friday 12 January 2007

Dojo chaining effects and callback use

Dojo is an opensource javascript library that help web developper to write browser side code. It provides widgets, effects, graphic renderers and also helpers to write AJAX enabled apps.

This is not an really an AJAX example, just a simple effect created in chaining two basic dojo effects. This is not the case here, but in my application loadHandler function is called when data arrive from the webserver.

    var sentData;
 
    function changeScore() {
       dojo.byId("score").innerHTML = sentData[0];
    }
 
    function loadHandler(type, data, evt) {
       sentData = data;
       fo = dojo.lfx.html.fadeOut("loginButton", 750, 0, changeScore);
       fi = dojo.lfx.html.fadeIn("loginButton", 750);
       dojo.lfx.chain(fo, fi).play();
    }

The last argument of dojo.lfx.html.fadeOut is the callback called when the effect have been totally played. In this case, the callback will be called after fo finish and just before fi begins.

You can try it here: http://www.biais.org/demo/dojo-chaining-effect.html

Python SVG generated arcs

A small and humble script that generate SVG images like the illustration above. Look at the code and a sample SVG output.

Thursday 11 January 2007

Django custom template filters

Django template filters are like unix pipes. In this example, we want to create a filter that take a $Date$ Subversion keyword and modify the date and time string format. Note: that also work with CVS, because CVS and Subversion shares the same keyword syntax.

Create a templatetags directory in you application folder, touch a __init_.py and a file where you are going to write your filters, for this example svn_keyword_filters.py

You should have a directory tree looking like:

myapp/
    views.py
    templatetags/
        __init__.py
        svn_keyword_filters.py

In svn_keyword_filters.py write:

from django import template
register = template.Library()
 
@register.filter("svndate")
def svndate(value):
    return value[7:26]

In the template add something like:

<p>
  {% load svn_keyword_filters  %}
  {{ "$Date: 2006-11-02 11:45:14 +0100 (Thu, 02 Nov 2006) $"|svndate }}
</p>

Produces this html output

<p>
  2006-11-02 11:45:14
</p>

Check the Django documentation about custom template filters.

Wednesday 10 January 2007

Visit Python Abstract Syntax Tree

Python standard library allows developper to manipulate Python Abstract Syntax Tree (AST) and bytecode objects. I wondered how to write a Python code pretty printer, I think the best way is to use the ASTVisitor from compiler.visitor module. Warning, the following code is only for demonstration purposes; for example visitConst will badly process string constants:

import compiler
 
class CodePrinter:
    def __init__(self):
        self.src = ''
 
    def visitName(self,t):
        self.src += t.name
 
    def visitConst(self,t):
        self.src += str(t.value)
 
    def visitStmt(self, t):
        for i in t:
            a = pretty_print(i)
            self.src += a + "\n"
 
    def visitAssName(self, t):
        self.src += t.name + " = "
 
def pretty_print(node):
    myvisitor = CodePrinter()
    # compiler.walk return the visitor instance : 2nd arg
    return compiler.walk(node, myvisitor).src

A simple example to use the previous pretty printer

test = """hello =              1; hey = 2
ola = 3"""
# Parse the test string, return the AST of the parsed string
node = compiler.parse(test)
print pretty_print(node)

produces the following output:

hello = 1
hey = 2
ola = 3

All statements (visitStmt) now finish by a newline and successive white spaces disapeared.

Now, you want to write your own visitor but you don't want to miss one of the AST node. You may try to use the walker class ExampleASTVisitor from compiler.visitor instead of the default one (ASTVisitor). In verbose mode, ExampleASTVisitor will produce a text output of node you may forget to visit. Let modify the pretty_print function:

def pretty_print(node):
    myvisitor = CodePrinter()
    example_walker = compiler.visitor.ExampleASTVisitor()
    example_walker.VERBOSE = 1
    return compiler.walk(node, myvisitor, walker=example_walker).src

The previous example with the modified function produces:

<__main__.CodePrinter instance at 0xb7cc306c>
compiler.ast.Module
     [...]
<__main__.CodePrinter instance at 0xb7cc31ac>
compiler.ast.Assign
     [...]

hello = 1
hey = 2
ola = 3

It seems we forgot to visit Module and Assign nodes. Let's print the AST of the parsed code:

>>> test = """hello =              1; hey = 2
... ola = 3"""
>>> compiler.parse(test)
Module(None, 
  Stmt([
    Assign([AssName('hello', 'OP_ASSIGN')], Const(1)), 
    Assign([AssName('hey', 'OP_ASSIGN')], Const(2)), 
    Assign([AssName('ola', 'OP_ASSIGN')], Const(3))
  ])
)

In this print we see 1 Module and 3 Assign nodes. If we need them in our visitor, we have to had visitModule(self, t) and visitAssign(self, t) methods.

Tuesday 9 January 2007

Chained comparison

I just read PEP 207 a proposal for rich comparisons that allow you to write this kind of comparisons

>>> 1 < 2 < 3
True
>>> 2 < 3 < 2
False

This comparisons have the same meaning as in mathematics rather than the meaning in C. The previous code is equivalent to

>>> 1 < 2 and 2 < 3
True

In C, the results are not really what a mathematician expects. Here is a C snippet for demonstration:

int test1() {
    return 1 < 2 < 3;
} /* return 1 */
 
int test2() {
    return 2 < 3 < 2;
} /* return 1 */
 
int test3() {
    return 1 < 3 < 0;
} /* return 0*/

Following operators priority, the "evaluation" of 2 < 3 < 2 is equivalent to (2 < 3) < 2. Evaluation of (2 < 3) return 1, then 1 < 2 return 1.

Returning to the Python code, the two version are exactly equivalent when inside expression are side-effect free. In fact the inside expressions are evaluated only one time using chained comparison and two times when you write the "and" version.

# Function using a local variable for c-static-like variable hack
def mop(m=[1]):
    m[0] += 1
    return m[0]
1 < mop() < 3 # True 

First call to "mop" return 2.

def mop(m=[1]):
    m[0] += 1
    return m[0]
1 < mop() and mop() < 3 # False 

First call to "mop" return 2, and the second return 3. 3 < 3 is False, the "and" conclude ;). From the page PythonSpeed in the official Python wiki, chained comparisons are faster than the "and" version.

Sunday 7 January 2007

John Maeda

John Maeda is a scientist and an artist. He works at MIT media laboratory and produced great static and dynamic computational art. MaedaStudio.

Saturday 6 January 2007

Django framework and complex DB request

I'm using the Django framework for my project at work. Django is really great but sometimes I have difficulties to find the "good way". Today, I discovered how to do complex lookups. Starting with this models :
from django.db import models
class Recipe:
     name = models.CharField(unique=True)
class Ingredient:
     recipe = models.ForeignKey(Recipe)
     name = models.CharField()
     quantity = models.FloatField(max_digits=5, decimal_places=2)
You're looking for all recipes containing ingredient name begining by chick or named turkey ?
from myapp.models import Recipe
from django.db.models import Q
 
Recipe.objects.filter(Q(ingredient__name__startswith="chick")\
                     |Q(ingredient__name="turkey"))
Or maybe recipes with name containing "thanks giving" and using chicken as ingredient:
Recipe.objects.filter(Q(name__icontains="thanks giving")\
                     &Q(ingredient__name="chicken"))
Read more in the Django documentation.

Gallery of Computation

Art and computing. Using the Processing language. Gallery, animation and printings here : http://www.complexification.net/gallery/

Default Dict

Bored to write 4 lines of code when you can write only one ? Yes, me too. I don't like the following code, when my fingers type this kind of code, my brain tell me "That's obvious I want to do that, I shouldn't write it".
d = {"chicken": 1, "rabbits": 2}
l = ["chicken", "capcat"]
for i in l:
    if not i in d:
        d[i] = 1
    else:
        d[i] += 1
The following class appease a part of my mind.
class DefaultDict(dict):
    "Dictionary with a default value for key that does not exist"
    def __init__(self, default, iterable_init=[], **kwds):
        super(DefaultDict, self).__init__(iterable_init)
        self.update(kwds)
        self.default = default
    def __getitem__(self, key):
        return self.get(key, self.default)
    def __copy__(self):
        return DefaultDict(self.default, self)
Usage:
d = DefaultDict(0)
d["capcat"] += 1

First post

biais.org blog is born ! I will try to expose Python code snippets, pieces of art I like (computational or classic art), not my life.