Security Implications of AI-assisted Coding


GitHub quietly announced the technical preview of its new Copilot feature recently. Copilot is an AI-assisted pair-programming tool that can be used in VS Code or in a GitHub Codespace. It uses a machine learning model trained on public GitHub source code to provide in-depth suggestions and code completion as you type. This isn't the first product of its kind (Microsoft offers its own IntelliCode code completion tool), but what makes Copilot exciting is that it's based on the very promising OpenAI Codex system. This is undoubtedly the beginning of what many programmers have been hoping for: automated coding! Developers will also be excited that GitHub claims the tool can assist with writing unit tests, because who wants to write those?

Copilot is definitely exciting. Look at what the tool comes up with just from the name of a Python function, "send_tweet_with_image":

This new technology presents some interesting questions for application security. For one, how do we know that the suggested code is secure if the model has been trained on vulnerable code? Since the model has been trained on public source code, it's very possible that it has picked up insecure patterns. Security researcher '0xabad1dea' recently performed a risk assessment, and, true to their Twitter handle, Copilot could give us a few bad ideas.

0xabad1dea begins her analysis with one of the most common operations in web applications: validating a JSON Web Token (JWT). Typically, this involves checking that the JWT is well formed, checking that its signature is valid, and verifying the standard claims in the payload. This is an involved process, and there are many libraries that handle it so you don't have to reinvent the wheel, such as this one. So what does Copilot generate if you prompt it with a function named "validateUserJWT"? The result is a little concerning:

Any developer would realize that this function is incomplete.  

Before going further, it's helpful to understand the type of machine learning model that Copilot uses. There are two broad approaches to training machine learning models: supervised and unsupervised. With supervised learning, a model is trained on labeled examples: the model is given inputs, its outputs are compared against the known correct answers, and the model is adjusted to reduce the error. Supervised learning is commonly associated with classification tasks, such as classifying shapes or facial expressions. Classification models are also referred to as discriminative models, because the input is mapped through the model to a class label. With unsupervised learning, there are no labels to correct against, and the model is built by summarizing the patterns in the input. Unsupervised models may be used to generate new examples resembling the input distribution and are thus sometimes referred to as generative models.

So why is this important? We mentioned earlier that Copilot is based on OpenAI Codex, which is built on the GPT-3 (Generative Pre-trained Transformer) language model. Since the model is generative, you may not get the same output each time a new example is generated. 0xabad1dea states:

Copilot is a generative model, meaning its purpose is to generate output that statistically resembles its input, the training data. (GitHub has not provided a complete list of training data, but it appears to be roughly everything publicly available on GitHub.) The goal is not to exactly reproduce the input, because you definitely do not need a machine learning system for that.  

Since we can't expect the same output every time new code is generated, results will vary in how secure the code is, or whether it even compiles. In fact, GitHub makes no guarantees about the code that Copilot generates:

GitHub Copilot doesn’t actually test the code it suggests, so the code may not even compile or run. GitHub Copilot can only hold a very limited context, so even single source files longer than a few hundred lines are clipped and only the immediately preceding context is used. And GitHub Copilot may suggest old or deprecated uses of libraries and languages. You can use the code anywhere, but you do so at your own risk. 

As an example of this, 0xabad1dea prompts Copilot multiple times to create a function that calculates the current phase of the moon. As expected, the generated functions differ from one attempt to the next.

0xabad1dea uncovers another common security blunder by prompting Copilot with a few lines of PHP: the tool includes untrusted input directly in a SQL query, resulting in SQL injection.
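The standard fix for this class of bug is parameterization. 0xabad1dea's example is in PHP, but the pattern is the same everywhere; here is a JavaScript sketch contrasting the vulnerable string-building approach with a parameterized one (the table name and helper functions are illustrative; a real database driver accepts the SQL text and the parameter values separately):

```javascript
// Vulnerable pattern: untrusted input is interpolated straight into the
// SQL text, so an attacker-controlled value can change the query's meaning.
function unsafeQuery(userId) {
  return `SELECT * FROM users WHERE id = '${userId}'`;
}

// Parameterized pattern: the SQL uses a placeholder and the value travels
// separately, for the driver to bind as data (e.g. db.query(sql, params)).
function safeQuery(userId) {
  return { sql: 'SELECT * FROM users WHERE id = ?', params: [userId] };
}
```

With the parameterized form, a payload such as `1' OR '1'='1` is bound as a literal string value and can never escape into the query structure.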

We had the opportunity to try the tool here at Forward, and we came up with some interesting examples of our own. When prompted with the following JavaScript function signature, `function encrypt(str)`, the tool suggests the following:

The suggested encryption method appears to be a Caesar cipher, which is obviously not suitable for protecting data, but it could slip past an unknowing developer.

When we prompt the tool with a more descriptive signature indicating that we want AES as the algorithm, the output is more reasonable:

In another example, we prompted Copilot with `var corsOptions` to see how the tool would handle the options commonly used when configuring CORS in an ExpressJS web application. The tool completed the assignment with an object whose origin value is set to the wildcard '*':

This configuration accepts requests from all origins and opens the application up to potential cross-origin attacks.
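A safer configuration restricts the origin to an explicit allow list. The sketch below assumes the callback-style `origin` option supported by the popular `cors` Express middleware; the domain names are placeholders:

```javascript
// Explicit allow list of trusted origins (placeholders for illustration).
const allowedOrigins = ['https://app.example.com', 'https://admin.example.com'];

var corsOptions = {
  origin: function (origin, callback) {
    // Allow requests with no Origin header (e.g. same-origin or curl),
    // and otherwise only origins on the allow list.
    if (!origin || allowedOrigins.includes(origin)) {
      callback(null, true);
    } else {
      callback(new Error('Origin not allowed by CORS'));
    }
  },
};
```

Compared with the wildcard, this ensures that an unexpected web origin can never read responses from the API in a victim's browser.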

One more example to close things out. When prompted with a comment asking it to execute a system command passed in a URL parameter, the tool simply takes whatever command arrives in the parameter, executes it as a shell command, and returns the output:

If a developer takes this code as-is, without preventative measures such as input validation, the result is a command injection vulnerability.

As you can see, there is room for concern about AI-assisted code completion with respect to security. Tools like these can be very powerful for improving the speed of software development, but we strongly encourage developers to review all generated code to ensure that security is not neglected. Although GitHub Copilot presents some security issues today, we look forward to seeing how they are addressed, and overall we're very excited about the possibilities this brings for software development!