Sunday, March 31, 2013

Source "Snapshots"

In our small team of 20 engineers at Coursera, we've been talking a lot lately about what our approach to the world of "open source" should be, both now and long term.


Why open source?

We're all interested in the idea of open-sourcing parts of our code, and we're motivated by a few main reasons:

  • We are heavy users of open-source code in our Coursera codebase, and we would like to give back. Like many startups, we're built on top of a plethora of open-source technologies and libraries - Scala, Play, PHP, Django, Python, Backbone, RequireJS, plus numerous third-party libraries built on top of those. One way of giving back is to contribute to the libraries themselves, and we've done that via bug fixes, documentation suggestions, and talks on how we use them. But we would also like to give back by showing what we've built with those libraries, so that our code can serve as an example for other users.

  • We would like to be able to point to our code as a reference in public settings. For example, if we post a bug report about a particular library, we'd like to be able to point to the full code that uses it, to give context. Maybe we could elicit a more informed response that way, and someone could suggest a better way to accomplish what we're trying to do. Plus, when I give talks about a particular feature, I'd love to be able to finish by pointing people to the full code for that feature, so that they don't have to guess at the functionality from the snippets I managed to fit on the slides.


Why not open source?

As much as we are motivated to open-source parts of our code, we also have our reservations:

  • We do not have the engineering resources to maintain both a private codebase and a public, open-source, community-fed codebase. There have been many interesting discussions lately about the "burden of open source", like this post from Divya after she deleted a popular GitHub repository, and this talk by the Twitter Bootstrap co-creator where he compares the Bootstrap project to a cute puppy that becomes an old, fat, unwanted dog.

    They both realized that when you open source something, you're also creating a community that requires nurturing and a product that requires maintenance and upgrades. Most open source projects do not come close to approaching the popularity of Bootstrap, but even an open source project with a handful of forks and pull requests requires resources and time. For example, my lscache library, with only 35 forks, has had 5 pull requests over the course of its 2-year existence, and I can distinctly remember putting lscache maintenance on my TODO list, procrastinating on it, and feeling bad.

    We're in startup mode at Coursera, and we need to budget our time for internal needs first, without worrying about the potential time it can take to maintain an open-source codebase.

  • We do not always feel comfortable open-sourcing all the parts of a feature. For example, many of our Backbone JS apps communicate with a REST API. We're comfortable with developers seeing our Backbone code, since our JS is already inherently public (in an obfuscated way), but we are hesitant about the server-side code behind the REST API. Maybe it would reveal secrets about upcoming or unknown features, or hey, maybe it has security holes that we haven't yet discovered. That code could also contain secrets like API keys, salts, and hashes that we would have to carefully remove via a scrubbing script.

  • We do not always have the time to package a feature so that it's "ready-to-run". When you find an open source repository for a particular library or app that you actually want to use, you usually scroll down to find the "Installation instructions" and you hope that it's easy to get it working in a few minutes. As it turns out, it's not that easy to extract bits of a codebase and make them easy for anyone else to get running in their own environment. For example, at Coursera, we have our own custom build tool for our frontend code, we have custom templated configuration files that are used by that tool, and we have several Coursera-specific libraries that we use across all of our frontend apps. Yes, we could either open source all of those things, or figure out how to make our code work without depending on them, but either of those approaches would take significant time and resources.


An Approach: Source Snapshots?

But, as I started off by saying: we really want to give back to open source in some way. We've been mulling over our motivations and our reservations, and I think I've come up with an approach that I can use for many bits of our code, at least in the short term while we're low on resources: "snapshots".

A "snapshot" is a dump of some part of our codebase, taken at a point in time and copied into a public repository. It may be an incomplete dump (missing dependencies or server-side code, for example), it would not necessarily be runnable, and it would have no guarantees of being up-to-date or ever being updated in the future.
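
To make the idea a bit more concrete, here is a minimal Python sketch of what taking a snapshot could look like. It is not our actual tooling: the function names (`take_snapshot`, `scrub`), the default file patterns, the `[REDACTED]` placeholder, and the regex for spotting secret-looking lines are all assumptions made up for illustration, and a real scrubbing script would need to be far more careful than this.

```python
import re
import shutil
from pathlib import Path

# Hypothetical pattern: lines that look like they assign a key, salt, or password.
SECRET_LINE = re.compile(
    r"(api_?key|secret|salt|password)[\"']?\s*[:=]", re.IGNORECASE
)


def scrub(text):
    """Replace every line that looks like a secret assignment with a placeholder."""
    cleaned = []
    for line in text.splitlines(keepends=True):
        if SECRET_LINE.search(line):
            newline = "\n" if line.endswith("\n") else ""
            cleaned.append("[REDACTED]" + newline)
        else:
            cleaned.append(line)
    return "".join(cleaned)


def take_snapshot(source_dir, snapshot_dir, patterns=("*.js", "*.html")):
    """Copy files matching `patterns` from source_dir into a fresh snapshot_dir,
    scrubbing secret-looking lines along the way.

    Returns the sorted relative paths of the files that were copied.
    Assumes snapshot_dir is not located inside source_dir.
    """
    source, snapshot = Path(source_dir), Path(snapshot_dir)
    # A snapshot is a point-in-time dump, so start from an empty directory.
    if snapshot.exists():
        shutil.rmtree(snapshot)
    snapshot.mkdir(parents=True)

    copied = []
    for pattern in patterns:
        for path in source.rglob(pattern):
            if not path.is_file():
                continue
            relative = path.relative_to(source)
            target = snapshot / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            content = path.read_text(encoding="utf-8")
            target.write_text(scrub(content), encoding="utf-8")
            copied.append(relative.as_posix())
    return sorted(copied)
```

Calling something like `take_snapshot("frontend/forum", "public/forum-snapshot")` (both paths hypothetical) would leave a scrubbed copy of the matching files that could then be committed to a public repository. Files that don't match the patterns, such as server-side code, are simply left out, which is exactly why a snapshot may be incomplete and not runnable.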

The snapshot would still be useful for developers looking to see how we approached some aspect of the codebase, and also for us to refer to in talks and blog posts. It would also be a way for us to dip our toes into the open source waters, and to see what developers are most interested in. If a particular snapshot got a lot of attention, then maybe one day, when we felt we had the resources, we would turn it into an actual living open-source library and spend the time needed to nurture that community. A snapshot can serve almost as an MVP, if you think of open source repositories as new products/features.

As an example, I've open sourced a snapshot of the Backbone JS for our forum rewrite, along with a blog post about how it works. If this goes well, we hope to snapshot more of our Backbone apps in the future, as well as the JS UI libraries that we've built on top of Bootstrap and Require.

So that's the hope: we can avoid the burden of open source while satisfying our desire to share our learnings. Let's see how this works. ☺
