I use Pylons to serve up a website with several thousand pages. Until recently, all the website content was stored in MySQL and accessed through an ORM. Caching was done with the Pylons beaker_cache decorator together with memcached.
While beaker_cache and memcached certainly met the simple objective of caching the first request to any given webpage and serving subsequent requests for that page from the cache in under 300 ms, there is a major trade-off: Pylons provides no way to invalidate a single memcached object by key. That means if a record in MySQL gets edited, there is no way, automatic or manual, to clear the cached webpage generated from that record.
So my choices have been to either invalidate everything, or invalidate nothing and wait for the individual memcached key to expire so the page can regenerate.
Even if there were a working method-level cache invalidation capability within the Pylons framework, I would still be unsatisfied. To invalidate the caches of all webpages affected by an update to a DB record, I would have to know every controller and method where that record is cached, because that is how the key for the memcached object is generated.
Code maintenance for the method-level approach would become a hellish nightmare, since I would suddenly have to track all the relations between DB records and the methods where they are accessed and cached.
To invalidate cached pages per record efficiently, the data store needs to keep track of the revisions made to each record. MySQL does not have this built in.
We could add a hash field to every one of these MySQL tables that acts as an ETag and gets updated every time a record is updated.
However, that would put us back in the realm of code maintenance: we would have to make sure that every MySQL UPDATE also sets a new revision hash. That is an overcomplication that brings its own maintenance issues.
To meet some other objectives outside the scope of this post, I recently migrated the content for this app from MySQL to CouchDB.
Fortunately, CouchDB automatically maintains a revision (_rev) for each document, and those revisions work perfectly as ETags. So I hacked together this caching solution for Pylons.
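Here is a toy illustration of why _rev makes a good ETag. It is a hypothetical in-memory stand-in for a CouchDB database; `ToyDocStore` is not part of any CouchDB client. Like CouchDB, it assigns a new "generation-hash" revision on every save and rejects an update that carries a stale _rev. The MD5-of-JSON hash is an assumption, since CouchDB computes its revision hash internally.

```python
import hashlib
import json


class ToyDocStore:
    """Hypothetical in-memory stand-in for a CouchDB database."""

    def __init__(self):
        self._docs = {}

    def save(self, doc):
        current = self._docs.get(doc["_id"])
        # Like CouchDB's MVCC check: an update must carry the current _rev.
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise ValueError("conflict: stale _rev for %s" % doc["_id"])
        generation = int(current["_rev"].split("-", 1)[0]) + 1 if current else 1
        body = {k: v for k, v in doc.items() if k != "_rev"}
        # Assumed hash: MD5 of the JSON body (CouchDB's real hash is internal).
        digest = hashlib.md5(
            json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
        self._docs[doc["_id"]] = dict(body, _rev="%d-%s" % (generation, digest))
        return self._docs[doc["_id"]]["_rev"]

    def get(self, doc_id):
        return dict(self._docs[doc_id])  # a copy, like fetching over HTTP


store = ToyDocStore()
rev1 = store.save({"_id": "page-1", "title": "Hello"})
doc = store.get("page-1")
doc["title"] = "Hello, world"
rev2 = store.save(doc)
print(rev1.split("-")[0], rev2.split("-")[0])  # 1 2
```

Every change produces a different _rev, so a client holding an old _rev as its ETag can never be mistaken for one holding the current page.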
"""
The etag_check() decorator checks if an ETag was received from the client
and is in our list of valid ETags. If so, it skips the method
entirely and return a 304 HTTP Not Modified Response.
If there is not a match in the ETags list, the requested method
is called.
Inside the requested method, etag_set() is used to send the ETag
to the client, and add the ETag to the ETags list for future lookup.
The couchdb doc._id is added to the Keys dictionary with the value
of doc._rev. This is so we can quickly invalidate the ETag associated
with a given couchdb document.
To remove a specific ETag, pass it to etag_remove() or
remove any ETag for a specific Couchdb document by passing the doc._id
to etag_remove_by_key()
"""
import pylons
from decorator import decorator
from webob.exc import status_map
import re
IF_NONE_MATCH = re.compile('(?:W/)?(?:"([^"]*)",?\s*)')
ETags = []
Keys = {}
def etag_remove(etags):
"removes an etag to force refresh"
for etag in etags:
if etag in ETags:
ETags.remove(etag)
#print "Removed ETag %s" % etag
def etag_remove_by_key(key):
"removes etag for the given key"
if key in Keys:
etag = Keys[key]
etag_remove([etag])
def etag_exists(etags):
"looks through etags for a match"
for etag in etags:
if etag in ETags:
return True
return False
def etag_cache(_rev, _id):
"send/store the etag for future checking"
response = pylons.response._current_obj()
response.headers['ETag'] = '"%s"' % _rev
if _rev not in ETags:
ETags.append(_rev)
if _id:
Keys[_id] = _rev
#print "ADDED ETAG: %s, %s" % (_rev, _id)
@decorator
def etag_check(func, *args, **kwargs):
"check if the client sent an etag that is in our list "
etags = IF_NONE_MATCH.findall(
pylons.request.environ.get('HTTP_IF_NONE_MATCH', ''))
if etag_exists(etags):
#print "ETag match, returning 304 HTTP Not Modified Response"
pylons.response.headers.pop('Content-Type', None)
pylons.response.headers.pop('Cache-Control', None)
pylons.response.headers.pop('Pragma', None)
raise status_map[304]().exception
else:
#print "ETag didn't match, returning response object"
return func(*args, **kwargs)
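You can try the If-None-Match parsing on its own. This snippet reuses the IF_NONE_MATCH pattern from the module and shows that it returns the bare tag values, dropping the quotes and any W/ weak prefix. Treating weak and strong tags alike is acceptable here because If-None-Match uses weak comparison. A `*` header yields no tags, so such a request simply falls through to a normal response.

```python
import re

# Same pattern as IF_NONE_MATCH in etag_cache.py.
IF_NONE_MATCH = re.compile(r'(?:W/)?(?:"([^"]*)",?\s*)')

print(IF_NONE_MATCH.findall('"1-abc123", W/"2-def456"'))  # ['1-abc123', '2-def456']
print(IF_NONE_MATCH.findall('*'))                          # []
```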
The ETag module is saved as myapp/lib/etag_cache.py. What follows is example usage in a Pylons controller that talks to CouchDB through a document class (Something). The index() method checks and sets the ETag, and the save() method clears the ETag for the saved document.
from pylons import tmpl_context as c, url
from pylons.controllers.util import redirect

from myapp.lib.base import BaseController
from myapp.lib.etag_cache import etag_check, etag_cache, etag_remove_by_key
from myapp.model import Something  # CouchDB document class (module path hypothetical)


class SomethingController(BaseController):

    @etag_check
    def index(self, id):
        "public cached view"
        c.doc = Something.get(id)
        # send the ETag to the client,
        # and add it to our set of valid ETags
        etag_cache(c.doc._rev, c.doc._id)
        return str(c.doc)

    def save(self, id):
        "save changes to the doc"
        doc = Something.get(id)
        # ... apply the submitted changes to doc here ...
        doc.save()
        # Invalidate the previous ETag stored for this CouchDB doc
        etag_remove_by_key(doc._id)
        redirect(url(controller="something", action="edit"))
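The request-and-invalidate cycle can be simulated without Pylons. Below is a hypothetical, framework-free model; `ETagRegistry` and `handle_index` are illustrative names, not part of the module. It keeps the same set of valid ETags and the same _id to _rev map, and walks through three requests:

- a first visit, which returns 200;
- a revalidation, which returns 304 without loading the document;
- a request after save() has invalidated the old ETag, which returns 200 with the new revision.

```python
class ETagRegistry:
    """Hypothetical stand-in for the module's ETags set and Keys dict."""

    def __init__(self):
        self.etags = set()  # revisions we would answer with 304
        self.keys = {}      # doc _id -> last _rev sent as an ETag

    def cache(self, rev, doc_id):
        self.etags.add(rev)
        self.keys[doc_id] = rev

    def remove_by_key(self, doc_id):
        rev = self.keys.pop(doc_id, None)
        if rev is not None:
            self.etags.discard(rev)

    def matches(self, client_etags):
        return any(tag in self.etags for tag in client_etags)


def handle_index(registry, revs, doc_id, client_etags):
    """Mimics @etag_check around index(); returns (status, etag sent)."""
    if registry.matches(client_etags):
        return 304, None           # short-circuit: the doc is never loaded
    rev = revs[doc_id]             # "load" the document's current _rev
    registry.cache(rev, doc_id)
    return 200, rev


registry = ETagRegistry()
revs = {"page-1": "1-aaa"}        # doc _id -> current _rev

status1, etag1 = handle_index(registry, revs, "page-1", [])       # first visit
status2, _ = handle_index(registry, revs, "page-1", [etag1])       # revalidation
revs["page-1"] = "2-bbb"                                           # save() ...
registry.remove_by_key("page-1")                                   # ... invalidates
status3, etag3 = handle_index(registry, revs, "page-1", [etag1])   # stale ETag sent
print(status1, status2, status3, etag3)  # 200 304 200 2-bbb
```

If the remove_by_key() call is skipped, the third request still gets a 304 for the outdated page. That is exactly the stale-cache problem the invalidation step prevents.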
That is it. Of course, without some kind of caching proxy between the client and the app, we are not going to see much of a reduction in load. Most web servers can be set up as a caching reverse proxy pretty easily.
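A toy model shows why the proxy matters; both classes are hypothetical. Browser caches only revalidate their own copies, so every new visitor still causes a full render. A shared cache works differently: it stores one copy and revalidates it with If-None-Match, which turns repeat requests from all visitors into cheap 304s at the app. For simplicity this proxy revalidates on every request. A real caching proxy may also serve a still-fresh copy without contacting the app at all.

```python
class ToyOrigin:
    """Hypothetical app server that counts full page renders."""

    def __init__(self, rev, body):
        self.rev, self.body, self.renders = rev, body, 0

    def get(self, if_none_match=None):
        if if_none_match == self.rev:
            return 304, self.rev, None   # cheap: no body rendered
        self.renders += 1
        return 200, self.rev, self.body


class ToyRevalidatingProxy:
    """Hypothetical shared cache holding one copy, revalidated via If-None-Match."""

    def __init__(self, origin):
        self.origin = origin
        self.cached = None               # (etag, body) or None

    def get(self):
        etag = self.cached[0] if self.cached else None
        status, new_etag, body = self.origin.get(if_none_match=etag)
        if status == 304:
            return self.cached[1]        # serve the stored copy
        self.cached = (new_etag, body)
        return body


origin = ToyOrigin("1-aaa", "<html>v1</html>")
proxy = ToyRevalidatingProxy(origin)
for _visitor in range(3):                # three different visitors
    proxy.get()
print(origin.renders)  # 1
```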
Another interesting undertaking would be to follow CouchDB's _changes feed and pass every changed doc._id to etag_remove_by_key(). Then etag_remove_by_key() would not need to be called anywhere else in the app to invalidate ETags, since with the continuous feed option CouchDB keeps you updated with every change as it happens.
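Here is a minimal sketch of such a listener. It assumes Python 3's standard library and CouchDB's continuous `_changes` feed (`feed=continuous`, `since=now`, `heartbeat`). That feed emits one JSON line per change, plus blank heartbeat lines. The function names are hypothetical, and on_change would be etag_remove_by_key. Because ETags and Keys are module-level state, each app process would need its own listener, for example in a daemon thread.

```python
import json
import urllib.request


def handle_change_line(line, on_change):
    """Parse one line of a continuous _changes feed; return the doc _id, if any."""
    line = line.strip()
    if not line:
        return None                    # blank heartbeat line
    change = json.loads(line)
    doc_id = change.get("id")          # absent on the closing {"last_seq": ...} line
    if doc_id is not None:
        on_change(doc_id)
    return doc_id


def follow_changes(db_url, on_change, heartbeat_ms=30000):
    """Block and call on_change(doc_id) for every change made after 'now'."""
    url = "%s/_changes?feed=continuous&since=now&heartbeat=%d" % (
        db_url.rstrip("/"), heartbeat_ms)
    with urllib.request.urlopen(url) as resp:
        for raw in resp:               # the response is read line by line
            handle_change_line(raw.decode("utf-8"), on_change)


# Hypothetical wiring inside the app process:
# threading.Thread(target=follow_changes,
#                  args=("http://localhost:5984/mydb", etag_remove_by_key),
#                  daemon=True).start()
```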
Thanks for wading through my brain dump; any comments are welcome.