Wednesday, June 21, 2023

Best practices for prompting GitHub Copilot in VS Code

I've been using GitHub Copilot for the last six months in repositories large and small, old and new. For me, GitHub Copilot is the pair programmer that I never knew I wanted. It gives me great suggestions most of the time, but it doesn't give me the social anxiety I feel when a human is watching me code.

In this post, I'm going to share my best practices for prompting GitHub Copilot, to help all of you get great suggestions as often as possible. At a high level, my recommendation is to provide context and be predictable.

For more details, watch my YouTube video or keep reading.

Provide context

GitHub Copilot was trained on a huge number of examples, so for a given line of code, it may have many possible predictions for the next line of code. We can try to narrow down those possibilities with the context around our line of code.

  1. Open files: LLMs like GitHub Copilot have limited "context windows", so they can't keep an entire codebase in their window at once. But GitHub still wants to give Copilot some context, so if your editor has some files open, GitHub may send the contents of those files to Copilot. I recommend keeping open the files that are most relevant to the code you're writing: the tested file if you're writing tests, an example data file if you're processing data, the helper functions for a program, etc.

    Screenshot of three tabs open in VS Code

    As I showed in the video at 10:02, sometimes there are a few files relevant to the context: one that demonstrates the format of the new file (e.g. the new views.py has a similar format to the existing views.py), and one that provides the data for the new file (e.g. the new views.py uses the classes from models.py).

  2. Comments: You can write comments at many levels: high-level file comments that describe the purpose of the file and how it relates to the rest of the project, function-level comments, class-level comments, and line comments. The most helpful comments are ones that clear up potentially ambiguous aspects of code. For example, if you're writing a function that processes lists in Python, your function-level comment could clarify whether it returns a brand new list or mutates the existing list. That sort of distinction fundamentally changes the implementation.

    def capitalize_titles(book_titles):
      """Returns a new list of book_titles with each title capitalized"""
    
  3. Imports: Many languages require you to explicitly import standard library or third-party modules at the top of a file. That information can be really helpful for Copilot to decide how it's going to write the desired code. For example, if you're using Python to scrape a webpage, then importing urllib3 and BeautifulSoup at the top will immediately guide Copilot to write the most appropriate code.

  4. Names: I have always been a fan of descriptive names, as I was brought up in a Java household, but now I have additional motivation to use fairly descriptive names: more context for Copilot. For example, if your code parses a JSON file and stores the result, you might use a variable name like data. However, if you know that the result is actually a list, and each item describes a book, a more helpful variable name would be books. That indicates the variable likely holds a sequence-type object, and that each object inside represents a book.

    import json
    books = json.load(open('books.json'))

  5. Types: If you are coding in a language where typing is optional (like Python or JavaScript), you may want to consider adding types, at least to parameters and return values. That can help narrow down the possible lines of code for completing a block of code.

    def capitalize_titles(book_titles: list[str]) -> list[str]:

As I showed in the video at 1:46, I sometimes change the names or types of Copilot-generated code, to give it more context for the code coming afterwards.
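To make the docstring and type tips concrete, here is a minimal sketch of the two implementations that a comment can distinguish between. The function bodies, the name capitalize_titles_in_place, and the sample titles are assumptions for illustration, and str.title() is just one reasonable reading of "capitalized".

```python
def capitalize_titles(book_titles: list[str]) -> list[str]:
    """Returns a new list of book_titles with each title capitalized"""
    return [title.title() for title in book_titles]


# Hypothetical counterpart: same goal, but the docstring says "mutating"
def capitalize_titles_in_place(book_titles: list[str]) -> None:
    """Capitalizes each title in book_titles, mutating the list"""
    for i, title in enumerate(book_titles):
        book_titles[i] = title.title()


books = ["the hobbit", "brave new world"]
new_books = capitalize_titles(books)
# books is unchanged at this point; new_books holds the capitalized titles
original_first_title = books[0]

capitalize_titles_in_place(books)
# now books itself holds the capitalized titles
```

The signatures are nearly identical, so the docstring and the return type are the main hints telling a reader (or Copilot) which behavior is intended.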

Many of the above practices are generally beneficial for your codebase regardless of whether you use GitHub Copilot, like descriptive names and type annotations. As for comments, I keep the block-level comments in my code but remove line-level comments that seem redundant. Your final code has two audiences, the computer interpreting it and the humans reading it, so keep the practices that produce code that is both correct and clear.

Be predictable

LLMs like GitHub Copilot are very good with patterns. You show it a pattern, and it very much wants to keep the pattern going. Programming is already inherently pattern-filled, due to having to follow the syntax of a language, but there are ways to make your programs even more "pattern-ful".

  1. Variable naming conventions: Use a naming scheme that makes it easier to predict the way a variable should be used, especially if that naming scheme is already a convention for the language. For example, the Python language doesn't technically have constants, but it's still conventional to use all caps for a variable that shouldn't change, like PLANCK_CONSTANT. Another convention is to append an "s" suffix for variables that hold array-like objects. You may also have conventions for your particular codebase, like always starting getter functions with "_get_". Copilot will learn those conventions as well, if you are consistent with them.

  2. Structured software architecture: Just like humans, Copilot doesn't do well with spaghetti code. What function goes where? What function comes next? It really helps, for you and Copilot, to organize your code in a predictable manner, especially if you can use a popular framework for doing so. For example, the Django framework for Python always modularizes code into "models.py", "views.py", "urls.py", and a "templates" folder. When Copilot is inside one of those files, it has a better idea of what belongs there.

Use linters

My final tip isn't so much about how to use Copilot itself, but more about how to use VS Code along with Copilot. VS Code can basically become another pair programmer for you, by installing linting tools and enabling them to work in real-time. That way, if Copilot does give a bad suggestion (like an old method name or an incorrect chain of function calls), you'll immediately see a squiggle.

Screenshot of Python code with squiggly line under one line

Then you can hover over the squiggle to see the error report and check the IntelliSense for the functions to see their expected parameters and return values. If you're still not sure how to fix it, search the relevant documentation for more guidance. If you have Copilot Chat enabled, you can even try asking Copilot to fix it.

A linter won't be able to catch all issues, of course, so you should still be running the code and writing tests to feel confident in the code. But linters can catch quite a few issues and prevent your code from going down a wrong path, so I recommend real-time linters in any Copilot setup.

For linting in Python, I typically use the Ruff extension along with these settings:

"python.linting.enabled": true, 
"[python]": { 
  "editor.formatOnSave": true, 
  "editor.codeActionsOnSave": { 
    "source.fixAll": true 
  } 
}

Learn more

Here are some additional guides to help you be productive with GitHub Copilot:
