Monday, August 30, 2010

A JSON API for Posterous

I recently became mildly obsessed with Europopped.com, a blog that highlights both really catchy & horribly tacky music videos from all over Europe, and I've started thinking up mashups to fuel my obsession. So, I looked up the API for Posterous.com, the blogging platform that powers Europopped, and discovered that its API is not quite as mashup-friendly as I had hoped. They do offer an API for retrieving public feeds without authentication -- the first thing I looked for -- but the output is a custom XML format, which is not optimal for client-side mashups. I was expecting output that was either Atom-based, so I could pipe it through an existing Feed->JS proxy like the Google AJAX Feed API, or, even better, JSON with support for a callback parameter. The documentation indicates the API is still under development, however, so hopefully they will soon go down one or both of those routes.

But in the meantime, I decided to remedy their lack of a JSON output with a quick App Engine app to proxy API requests, convert the XML to JSON, and return it.

First, the end result:

If I wanted to use the Posterous API to get the last 50 posts from the Europopped blog, I'd fetch this URL and it would return XML for each post:

http://posterous.com/api/readposts?hostname=europopped&num_posts=50

To use my proxied JSON API to get those 50 posts, I'd fetch this URL:

http://posterous-js.appspot.com/api/readposts?hostname=europopped&num_posts=50

Tip: Install the JSONView extension for Chrome to see the result pretty-printed.

Notice that the only difference is the domain name -- I wanted the proxied API to mirror the actual API as closely as possible, so that it's easy to work out which URLs to construct from the official documentation, and easy to port to an actual JSON offering from Posterous in the future, assuming that actually happens. :)
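
Since only the host changes, a client can derive the proxy URL from any URL in the official documentation. Here is a minimal sketch of that idea in Python 3 (standard library only); the helper name `to_proxy_url` is made up for illustration and is not part of the proxy itself.

```python
from urllib.parse import urlsplit, urlunsplit

# Host taken from the proxied URLs shown in this post.
PROXY_HOST = "posterous-js.appspot.com"


def to_proxy_url(official_url):
    """Return the same URL with only the host swapped for the proxy's host."""
    parts = urlsplit(official_url)
    # Path and query string are left untouched, so the API call is identical.
    return urlunsplit(parts._replace(netloc=PROXY_HOST))


official = "http://posterous.com/api/readposts?hostname=europopped&num_posts=50"
print(to_proxy_url(official))
```

Porting to an official JSON API later would then mean changing one constant.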

If I want to get the same JSON wrapped in a callback, to use it inside a webpage, I'd fetch this URL:

http://posterous-js.appspot.com/api/readposts?hostname=europopped&num_posts=50&callback=loadPosts

Now, the code behind it:

I've checked in the two files it took to write the proxy on App Engine for Python, and I'll step through them here.

First, I set up a URL handler to direct all /api requests to my api.py script:

application: posterous-js
version: 1
runtime: python
api_version: 1
handlers:
- url: /api/.*
  script: api.py

Then, in api.py, I directed all requests to be handled by ApiHandler, a webapp.RequestHandler class. In that class, I reconstruct the URL for the Posterous API request:

  url = 'http://posterous.com' + self.request.path + '?' + self.request.query_string

Then I check memcache to see if I've already fetched that request recently (in the last 5 minutes):

    cached_result = memcache.get(url)
    if cached_result:
      dict = simplejson.loads(cached_result)
    else:
      dict = self.convert_results(url)
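
The check-then-fetch pattern above can be tried outside App Engine. The following is only a sketch in Python 3: `TinyCache` is a hypothetical in-memory stand-in for memcache, and `get_result` and `fake_convert` are illustrative names, not functions from api.py.

```python
import json
import time


class TinyCache:
    """Hypothetical in-memory stand-in for memcache, with per-key expiry."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() >= expires_at:
            # Expired entries behave like a miss.
            del self._items[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, time.time() + ttl_seconds)


def get_result(url, cache, convert_results, ttl_seconds=300):
    """Return the converted result for url, using the cache when possible."""
    cached = cache.get(url)
    if cached is not None:
        return json.loads(cached)
    result = convert_results(url)
    # Store the serialized JSON only on a miss.
    cache.set(url, json.dumps(result), ttl_seconds)
    return result


calls = []


def fake_convert(url):
    # Stands in for the real fetch-and-convert step; records each call.
    calls.append(url)
    return {"post": [{"title": "example"}]}


cache = TinyCache()
first = get_result("http://example.com/api/readposts", cache, fake_convert)
second = get_result("http://example.com/api/readposts", cache, fake_convert)
```

Writing to the cache only on a miss keeps the five-minute expiry from being pushed back by every cache hit.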

If I didn't find it in cache, then I'll call a function to fetch the URL and convert specified top-level tags in the XML to JSON:

  result = urlfetch.fetch(url, deadline=10)
  if result.status_code == 200:
    dom = minidom.parseString(result.content)
    errors = dom.getElementsByTagName('err')
    if errors:
      dict = {'error': errors[0].getAttribute('msg')}
    elif url.find('readposts') > -1:
      dict = self.convert_dom(dom, 'post')
    elif url.find('gettags') > -1:
      dict = self.convert_dom(dom, 'tag')
    elif url.find('getsites') > -1:
      dict = self.convert_dom(dom, 'site')
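
One possible variation on the if/elif chain above is a lookup table keyed on the last path segment, so that a query parameter containing the word "readposts" cannot select the wrong tag. This is a sketch in Python 3, not the code in api.py; `TAG_FOR_METHOD` and `tag_for_url` are names invented here, and only the three method/tag pairs come from the post.

```python
from urllib.parse import urlsplit

# API method name (last path segment) -> XML tag to collect.
TAG_FOR_METHOD = {
    "readposts": "post",
    "gettags": "tag",
    "getsites": "site",
}


def tag_for_url(url):
    """Return the tag name to convert for this API URL, or None if unsupported."""
    method = urlsplit(url).path.rstrip("/").rsplit("/", 1)[-1]
    return TAG_FOR_METHOD.get(method)


print(tag_for_url("http://posterous.com/api/readposts?hostname=europopped"))
```

Returning None for an unknown method gives the caller a clear place to report an unsupported call instead of leaving the result undefined.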

I convert from XML to JSON using the minidom library, converting each tag to a JSON key and recording the text data or CDATA as the JSON value. This technique means that I don't actually convert any nested XML tags, but in the Posterous API, that only means that my output is missing the comments information for posts, which is the least interesting information for me.

  def convert_dom(self, dom, tag_name):
    dict = {}
    top_nodes = dom.getElementsByTagName(tag_name)
    nodes_list = []
    for top_node in top_nodes:
      child_dict = {}
      for child_node in top_node.childNodes:
        # Only element children become keys; skip whitespace text and comments.
        if child_node.nodeType == child_node.ELEMENT_NODE:
          # firstChild is None for an empty tag, so fall back to an empty string.
          child_dict[child_node.tagName] = getattr(child_node.firstChild, 'wholeText', '')
      nodes_list.append(child_dict)
    dict[tag_name] = nodes_list
    return dict
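
To see what this flattening produces, here is a standalone sketch in Python 3 that runs the same idea against a small XML document. The sample XML is invented for illustration and is not a real Posterous response, and this version joins all direct text and CDATA children rather than reading only the first child.

```python
from xml.dom import minidom

# Made-up sample; the real API response has more fields.
SAMPLE_XML = """<rsp stat="ok">
  <post>
    <title>First video</title>
    <body><![CDATA[<p>Catchy!</p>]]></body>
    <comment><body>Nice</body></comment>
    <link></link>
  </post>
  <post>
    <title>Second video</title>
  </post>
</rsp>"""


def convert_dom(dom, tag_name):
    """Flatten each <tag_name> element into a dict of child tag -> text."""
    items = []
    for top_node in dom.getElementsByTagName(tag_name):
        item = {}
        for child in top_node.childNodes:
            if child.nodeType != child.ELEMENT_NODE:
                continue  # skip whitespace text and comments between tags
            # Join direct text and CDATA children; nested elements are ignored.
            text = "".join(
                node.data
                for node in child.childNodes
                if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE)
            )
            item[child.tagName] = text
        items.append(item)
    return {tag_name: items}


result = convert_dom(minidom.parseString(SAMPLE_XML), "post")
print(result)
```

In this sample the nested comment tag comes out as an empty string, which matches the limitation described above: nested tags are not converted.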

Finally, after getting the JSON representing the API call, I write it to the response with the appropriate MIME type and wrap it in a callback, if one is specified:

      json = simplejson.dumps(dict)
      memcache.set(url, json, 300)
      callback = self.request.get('callback')
      if callback:
        # A JSONP response is a script, so serve it as JavaScript.
        self.response.headers['Content-Type'] = 'application/javascript'
        self.response.out.write(callback + '(' + json + ')')
      else:
        self.response.headers['Content-Type'] = 'application/json'
        self.response.out.write(json)
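
Because the callback name is copied straight from the query string into a script, it is worth restricting what it can contain. The following is a minimal sketch in Python 3 of one way to do that; the `render` helper and the rule that callbacks must be dotted JavaScript identifiers are assumptions made here, not behavior of the proxy.

```python
import json
import re

# Assumption: callbacks are limited to dotted JavaScript identifiers,
# e.g. "loadPosts" or "app.loadPosts".
CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*")


def render(data, callback=None):
    """Return (content_type, body) for a JSON or JSONP response."""
    body = json.dumps(data)
    if not callback:
        return "application/json", body
    if not CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid callback name: %r" % callback)
    # JSONP is a script that calls the named function with the JSON payload.
    return "application/javascript", "%s(%s)" % (callback, body)


print(render({"post": []}, "loadPosts"))
```

A handler built on this would turn the ValueError into a 400 response rather than echoing the bad name back.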

It's a quick hack and one that I hope to see replaced by the official Posterous API, but it's cool that it was so easy to do and now I can move on to actually making the Europopped mashup of my dreams. :)
