Tuesday, July 23, 2013

What to look for in a software engineering culture

When I chat with new programmers who are interviewing for their first software engineering job, I encourage them to figure out whether they'd be joining a healthy engineering culture by asking the right questions. For a comprehensive run-down of what makes a great engineering team, I typically tell them to read Team Geek, a book by my former colleagues, but for a shorter list, I've written up this post with my own thoughts based on my experiences so far at Google and Coursera.

It's important to find a job where you get to work on a product you love or problems that challenge you, but it's also important to find one where you'll be happy inside the codebase - where you won't be afraid to make changes, and where there's a clear process for making them.

Here are some of the things that I look for:

Code Reviews

Are code reviews a regular part of their process?

Before my first job at Google, I had never experienced a code review. Now that I have, I never want to submit code to a shared codebase without one. The point of a code review isn't for another engineer to spot a flaw in your code (though that can happen); the point is for them to make sure the code entering the codebase follows your team's conventions and best practices - to make sure that code "belongs". Code reviews are a great way for new engineers to learn how to work inside a new codebase, and also a way for existing engineers to pick up best practices their colleagues have brought from other projects, or to learn about functionality they didn't realize they had.

Using the Mondrian tool at Google, we had a very clear code review process, where a changelist could not actually be submitted until the reviewer gave their approval. Using GitHub's more lightweight code reviews at Coursera, we've had to come up with our own conventions on top of them: the reviewer says "Merge when ready" when they're happy, and the reviewee says "Please take another look" if they want a second review.

Regardless of the particulars of the process, you want to be on a team where code reviews are required for every bit of code, and the reviews are there to make everyone's code fit better together.

Coding Conventions

Do they follow standard conventions for every language in their stack?
Do they have their own conventions for the frameworks they use?

Google has publicly documented coding conventions for all of its languages, and internally, it has a formal process to verify that an engineer is well-versed in a particular language's conventions: you submit a 300-line code review in that language to a designated reviewer, and if it's approved, you've earned "readability" in that language, which grants you code submission rights. That process was quite useful as a learning tool for me, and it was a proud moment when I earned my JS readability badge, but it may be a bit much for most teams.

At Coursera, we document the coding conventions we follow for each language, using the industry standard where one exists, and we encourage engineers to install plugins that check for violations (like SublimeLinter with PEP8). We also have a build tool that uses JSHint to check for major JS syntax no-nos, and our Jenkins build tests for PEP8 violations. Besides the languages we use, we also document conventions for frameworks like Backbone.JS and come up with design guidelines for our REST APIs.
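To make that concrete, here's a minimal sketch of the kind of pre-merge lint gate a build could run. This isn't our actual build tool - the script and directory names are made up for illustration, and it assumes the pep8 and jshint command-line tools are installed:

```python
#!/usr/bin/env python
"""Hypothetical pre-merge lint gate: fail the build on style violations.

Assumes the pep8 and jshint command-line tools are on the PATH; the
directory names are invented for this example.
"""
import subprocess
import sys

CHECKS = [
    ["pep8", "server/"],       # PEP8 style violations in Python code
    ["jshint", "static/js/"],  # major JS syntax no-nos
]

failed = False
for cmd in CHECKS:
    print("Running: " + " ".join(cmd))
    if subprocess.call(cmd) != 0:  # non-zero exit means violations were found
        failed = True
sys.exit(1 if failed else 0)
```

Wired into a Jenkins job (or a pre-commit hook), a script like this makes the conventions self-enforcing instead of something reviewers have to police by hand.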

Basically, any time there's a technology that can be used in multiple ways, and a team has decided on a particular way to use it, that decision should be documented as a convention. As an engineer joining the team, you want to know that there will be conventions to guide those decisions, so that your code becomes part of a cohesive codebase.

Comments

Is complex code commented?
Are there readmes describing higher-level systems in the codebase?

We tend to think that our code is obvious, but, as the people who wrote it, we're obviously biased. That's why an important part of a good code review is for the reviewer to be honest when code doesn't make immediate sense to them, and to ask for a comment to be added. I also find that higher-level overviews are sometimes needed for systems that span multiple parts of the codebase, and those are particularly useful for new engineers who join and find themselves tasked with adding to an existing system.
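As a made-up illustration, the comments worth asking for usually capture the why rather than the what - the line itself is rarely the mystery:

```python
def retry_delay(attempt):
    # Exponential backoff, capped at 64 seconds: without the cap, a long
    # outage would leave clients waiting many minutes between retries.
    return min(2 ** attempt, 64)
```

Without the comment, the code still works, but the next reader has to reverse-engineer the reasoning; with it, they can safely change the cap when requirements change.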

Tests

Are there any tests?
What kind of tests?
How often are the tests run?
Is there a testing requirement for new features?
Is there a testing engineer or testing team?

As it happens, many codebases start off as prototypes, with the engineers behind them thinking, "Ah, we don't need to test this thing, we'll just throw it away." But then those untested codebases become the real product, and the longer a codebase goes untested, the harder it is to introduce tests. When we found ourselves in that situation at Coursera, we held a "Testathon", locking ourselves in a room until we had figured out how to test all the different parts of our codebases and had written a few example tests for each part. Then, for every feature going forward, we made the question "are there tests?" a regular part of the code review process.
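For illustration, even a starter test as small as this one gives the next engineer something to copy and extend - the function here is hypothetical, not from our codebase:

```python
import unittest


def apply_coupon(price, percent_off):
    """Return the price after a percentage discount (hypothetical function)."""
    return round(price * (1 - percent_off / 100.0), 2)


class ApplyCouponTest(unittest.TestCase):
    def test_half_off(self):
        self.assertEqual(apply_coupon(100.0, 50), 50.0)

    def test_no_discount(self):
        self.assertEqual(apply_coupon(80.0, 0), 80.0)


if __name__ == "__main__":
    unittest.main()
```

The value of the Testathon wasn't the coverage we got that day; it was that every part of the codebase now had a working example test, so "are there tests?" became an answerable question in review.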

There are different approaches to testing that are all valid - some teams might practice TDD, where they write the tests before the regular code, some teams may require tests as part of writing code (but in whatever order the engineer prefers), and some teams may hire testing engineers and entrust them with writing the tests and keeping the tests stable.

The thing to look for is whether a team cares about testing at all, or if they simply rely on engineers writing "safe code". You do not want to join a team that expects you to magically write untested code that doesn't break other parts of the system that you've never seen and that will never be broken by another engineer's future code. You want to join a team that knows the value of tests, and is doing what they can to make that a part of the culture.

Release process

How often do they deploy their code?
How fast is the process?
How fast is the rollback?
Who's in charge of a deploy?

When I was working with the Google Maps team, we had weekly release cycles, and there'd be quite a few changes in each release. The team was big enough that there was a designated "build cop" who oversaw each release: checking that the tests passed, deciding which changelists were worth a cherry-pick, and monitoring the rollout of the release across the servers.

At Coursera, we wanted to be able to release smaller changes, more often - several times a day, if possible. It was hard at first because our deploys took 45 minutes - and so did our rollbacks! Fortunately, our infrastructure team came up with a new user-friendly deploy tool that takes only a few minutes to roll new code out to the AWS servers or roll it back. I used to have mini heart attacks every time I deployed, fretting over how horrible it would be if I accidentally took down the site, but now I deploy happily, because I know I can roll it back quickly. Of course, we also have tests that are automatically run before deploys, and we always check the status of those first.
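Our deploy tool itself isn't public, but one common design that makes rollback this fast is to keep every release on disk and flip a "current" pointer between them, so rolling back never rebuilds anything. A rough sketch of that idea, with made-up paths:

```python
import os
import shutil

RELEASES = "/srv/app/releases"  # one directory per deployed version (invented layout)
CURRENT = "/srv/app/current"    # symlink that the web servers serve from


def deploy(version, build_dir):
    """Copy a new build into place, then flip the symlink to point at it."""
    target = os.path.join(RELEASES, version)
    shutil.copytree(build_dir, target)
    _point_current_at(target)


def rollback(version):
    """Rolling back is just re-pointing the symlink at an older release."""
    _point_current_at(os.path.join(RELEASES, version))


def _point_current_at(target):
    tmp = CURRENT + ".tmp"
    if os.path.lexists(tmp):  # clean up any leftover link from a failed run
        os.remove(tmp)
    os.symlink(target, tmp)
    os.rename(tmp, CURRENT)  # rename() replaces the old link atomically on POSIX
```

The point of a design like this is psychological as much as technical: when undoing a deploy is one cheap operation, deploying stops being scary.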

The deploy process can vary greatly across companies, and even in teams within a company, but the important thing is that there *is* a process, and ideally it's one that will allow you to spend most of your time writing code, not releasing it.

Post-mortems

Shit happens. When shit does happen, does the team take steps to prevent that shit from happening in the future?

At Google, I learnt the value of post-mortems: documents that detail the timeline of an issue, the "things that went right", the "things that went wrong", and action items to prevent a recurrence that were actually reasonable to get done. We wrote up post-mortems for everything in the company, not just in engineering (I wrote up an epic post-mortem after a badly worded tweet landed me on TechCrunch, for example). We do the same at Coursera, and we share the post-mortems with the entire company and sometimes even with our university partners.

It's commonplace these days for public-facing companies to put post-mortems on their blogs, and to me, it's a sign that those companies believe in being transparent with their users, and that's a good thing. You want to be at a company that regularly writes up post-mortems, because then you'll know that they care about learning from their mistakes and not just sweeping them under the carpet.

Stack Stability

How often are they changing their stack? Too often?

It's good to try out new technologies that can vastly improve a codebase, but doing so can also be a big time suck, and a sign that the focus is not in the right place. Plus, every time a team tries out a new technology, it almost always leaves behind a bit of code using the old stack, turning the codebase into a stack of legacy codebases that most engineers don't know how to touch.

Ideally, when you join a team, they will be able to tell you something like "We use X, Y, and Z. We are evaluating W due to repeated issues with X, and coming up with a plan to see whether the migration would be worth it." If there is a "W", that "W" is hopefully a tried-and-true technology, not the latest thing to hit the front page of HN, otherwise your team will be the one that has to learn all the quirks and drawbacks of "W" from scratch.

What else?

There are also things to look for in the broader company culture, like communication, transparency, prioritization, and humility, but that's a whole other post.

Now, don't expect every engineering culture to have gold stars on every bullet point - but look for signs that they are headed in that direction, that they sincerely want to make their codebase a place where everyone can enjoy coding and learning from each other.

What did I miss? Let me know in the comments what you look for!
