Tuesday, June 30, 2009

HTTP Servlet Caching Filter

The nice thing about the HTTP protocol is how easy it is to implement an trivial HTTP server. At some point, however, just responding to HTTP requests is not enough and response caching must be introduced.

If you search for "servlet response caching", you will find advice to ensure you use the correct response headers to facilitate HTTP caching and suggestions to use a servlet filter to cache the response. If you are like me, you would continue to search for a way to use both - a servlet filter that caches based on the correct response headers.

With the HTTP protocol so well supported and J2EE so popular, finding a caching servlet filter that adheres to the HTTP spec should be easy, but it isn't. In fact, it is really hard to find any Java implementations that caches based on the HTTP response headers (servlet filter or otherwise). This seemed like an interesting problem that is fairly common, so I spent some time to see how far I could get with a servlet filter that understands HTTP caching.

Despite my general knowledge of the HTTP spec, implementing it is a lot more difficult. For example, the If-Modified-Since and If-None-Match headers are fairly easy to understand, but when you try and implement this logic, things get a little more complicated. In working through this, I realized that there are nearly 20 possible scenarios that need to be handled by the server. The request might not have an If-Modified-Since header, it might have been modified, and it might not have been modified are three states the server must handle for that header. The If-None-Match may not be present or it might or might not match and it might have a '*' tag, which may not may not have an existing entity. You can't just process these one at a time either, but once you write down the edge cases it can all be handled fairly compactly within a precondition check.

Another area that surprised me was the request Cache-Control directives. There are five boolean directives and three with a specified value. All of these directives are fairly easy to understand and are used to determine if the cache can be used and if it needs to be validated. However, that is a lot of variables to manage and combined with the possible server directives, its gets really harry tracking their state. There were many occasions when adding support for a new header/directive that I inadvertently broke an earlier unit test (couldn't have done it without them).

The HTTP spec is fairly clear is some areas, but less so in others. One area that has had various interpretations is entity tags. It is fairly clear how ETags should be used with static GET requests, although I had to digest their implications on caches before I could understand how to use them with content negotiation. However, their recommended use with PUT and DELETE is still a bit of a mystery. When an entity has no fixed serialized format (such as a data record), it has many entity tags (one for each serialized variation and version). So, which entity tag should be used after a PUT or a DELETE that effects all variations?

This get even more complicated when some URLs are used to represent a property of a data record. If the property is a foreign key, the response has no serializable format, its a 303 See Also response. What does a PUT look like when you want to reference another resource? Furthermore, a DELETE of a property, just deletes the property, but the data record still exists and there is still a version associated with it, shouldn't the client be given the new version?

In the end I have a new appreciation for why there are so many interpretations of the HTTP spec and I have a fairly general purpose HTTP caching servlet filter on top of it.
Reblog this post [with Zemanta]

No comments:

Post a Comment