Monday 17 June 2013

Smart caching in Python

Prasoon and I are currently looking for ways to reduce the computational load of calculating the DCM (Direction Cosine Matrix) of one frame with respect to another. The DCM mathematically describes how one reference frame is oriented with respect to another. The reason we need a better way is that with, say, 100 frames, each oriented in some arbitrary way with respect to one of its predecessors, things get pretty messy - for your computer AND for you.
We thought of various approaches, but apart from some fairly inefficient ones - like 'World' frames (to store these DCMs for a particular run of the program) or using a global frame as an intermediate - we really couldn't find much. Then Gilbert suggested looking at the way SymPy caches some of its methods.
We may or may not use this technique eventually, but I learnt a cool way to implement caching in Python - using dictionaries and method decorators. If you don't know what decorators are, have a look at this - decorators are essentially ways to modify (or rather, add to) the behaviour of a function or class.
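To make the decorator idea concrete before we get to caching, here is a toy example (my own, not from the SymPy source) - a decorator wraps a function in another function that adds some behaviour around it:

```python
from functools import wraps

def shout(function):
    """A toy decorator: calls the original function, then upper-cases its result."""
    @wraps(function)
    def wrapper(*args):
        result = function(*args)
        return result.upper()
    return wrapper

@shout
def greet(name):
    """Return a greeting."""
    return 'hello, ' + name

print(greet('world'))   # the decorated version runs instead of the original
```

Calling greet now actually calls wrapper, which calls the original greet internally.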

I am going to show a simplified version of what I found in the SymPy source; their original method is similar but much more detailed - for purposes of immutability, debugging and so on.

The components involved in this technique are these - 

1) CACHE = {}
This is a Python dictionary (empty initially). This is what we will use as our cache.

2) The caching decorator -

from functools import wraps

def globalcached(function):
    @wraps(function)
    def wrapper(*args):
        # Key on the function object plus the arguments it was called with
        tup = (function, args)
        if tup in CACHE:
            print('In cache')
            return CACHE[tup]
        else:
            print('Not in cache')
            temp = function(*args)
            CACHE[tup] = temp
            return temp
    return wrapper

Understand what the above does. wraps is a decorator from the functools module in the standard library. It makes the 'decorated' function retain the docstring (and name) of the original function - we would probably want to keep the docs.
As you might have guessed, 'wrapper' is the modified function that gets executed in place of the original one.
Now, tup is a tuple (tuples are immutable in Python and hence hashable; only hashable objects can be used as dictionary keys) containing info about the following -
i. The function being called
ii. The arguments it was called with
So we know exactly which function it is, and what it is going to work on. The code therefore does not let the original function go computing straight away - that could be a waste of resources. It first checks whether CACHE has that particular tup as one of its keys. If it is not there, this is the first time the function has been called with those arguments, so we compute the needed quantity, but before returning it we store it in CACHE with tup as the key. The next time the same function gets called with the very same arguments, the decorator has a 'deja vu' and returns the value - straight from CACHE. Ingenious, isn't it?
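The hashability point is worth a quick demonstration (my own illustration): a tuple works fine as a dictionary key, while a list - being mutable - does not.

```python
cache = {}

# Tuples are hashable, so they can serve as dictionary keys.
cache[('f', (8, 3))] = 55

# Lists are mutable and therefore unhashable - this raises TypeError.
try:
    cache[['f', (8, 3)]] = 55
except TypeError as e:
    print(e)   # unhashable type: 'list'
```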
I don't know how many other projects use this method, or something similar. But it's quite cool if you want speed and you know the computation is going to be intensive.

Anyways, here is a snippet using the above decorator-
from functools import wraps

CACHE = {}

def globalcached(function):
    @wraps(function)
    def wrapper(*args):
        tup = (function, args)
        if tup in CACHE:
            print('In cache')
            return CACHE[tup]
        else:
            print('Not in cache')
            temp = function(*args)
            CACHE[tup] = temp
            return temp
    return wrapper

def printcache():
    for x in CACHE:
        print('Function: ' + str(x[0]))
        print('Arguments: ' + str(x[1:]))
        print('Value: ' + str(CACHE[x]))
        print()

@globalcached
def myfunction(x, y):
    """myfunction docs"""
    return x ** 2 - y ** 2

class SomeClass(object):
    def __init__(self, x):
        self.x = x

    @globalcached
    def function1(self, otherinstance):
        """function1 docs"""
        return otherinstance.x ** 2 - self.x ** 2



Using it in a Python IDLE session-


 >>> myfunction(8, 3)  
 Not in cache  
 55  
 >>> myfunction(8, 3)  
 In cache  
 55  
 >>> i1 = SomeClass(4)  
 >>> i2 = SomeClass(7)  
 >>> i1.function1(i2)  
 Not in cache  
 33  
 >>> i1.function1(i2)  
 In cache  
 33  
 >>> myfunction(6, 7)  
 Not in cache  
 -13  
 >>> printcache()  
 Function: <function myfunction at 0x026B38B0>  
 Arguments: ((8, 3),)  
 Value: 55  

 Function: <function function1 at 0x026CD330>  
 Arguments: ((<__main__.SomeClass object at 0x026C9B10>, <__main__.SomeClass object at 0x026C9B50>),)  
 Value: 33  

 Function: <function myfunction at 0x026B38B0>  
 Arguments: ((6, 7),)  
 Value: -13  
 >>>   
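One caveat worth noting (my own observation, not from the SymPy source): by default, Python instances hash by identity, so two 'equal' but distinct objects produce different cache keys and both miss the cache. A minimal sketch, using a hypothetical Point class:

```python
class Point(object):
    """Minimal class for illustration - no custom __hash__ or __eq__."""
    def __init__(self, x):
        self.x = x

cache = {}
a = Point(4)
b = Point(4)           # same contents, but a distinct object
cache[(a,)] = 'cached'

print((a,) in cache)   # True: same object, same key
print((b,) in cache)   # False: b hashes by identity, so its key differs
```

So with the decorator above, SomeClass(4).function1(SomeClass(7)) would recompute even right after i1.function1(i2), because the new instances form a new key.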

I hope it's clear now. It's not rocket science, but it's a clever way of doing things.
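Incidentally, this pattern is common enough that the standard library ships its own memoizing decorator - functools.lru_cache, available since Python 3.2. It keys on the arguments just like our version does:

```python
from functools import lru_cache

@lru_cache(maxsize=None)   # maxsize=None means the cache grows without bound
def myfunction(x, y):
    return x ** 2 - y ** 2

myfunction(8, 3)                 # computed and stored
myfunction(8, 3)                 # served from the cache
print(myfunction.cache_info())   # shows hit/miss counts
```

Like our tuple-key approach, it requires all arguments to be hashable.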

Anyways, that's all for now. And btw, thanks to Google for giving us a free membership to ACM for one year. The GSoC welcome package will also be arriving soon, and I am excited :-D


4 comments:

metapundit.net said...

Thanks for the link. Just as an FYI - the general idea of caching the output of a function then returning the cached value on successive calls is called function "memoization". Good luck with your GSoC stint!

Unknown said...

Oh! Did not know that... will have to look more into it. Anyways, thanks for the wishes :-)

Aaron Meurer said...

SymPy already has a global cache that works the same way as this. Take a look at sympy.core.caching. You can use the @cacheit decorator to cache a function.

Unknown said...

I know. That's where I learnt the technique :-) I wrote this post to kinda explain how it works. Cool functionality! :-D