Why is Copilot giving me bad code

Exploring security issues in code generation LLM tools

Vickie Li
3 min read · Jun 27


Photo by Madalyn Cox on Unsplash

I recently had a conversation with a friend about using GPT for software development. He is a startup founder who is very hands-on with details of his product, and uses GPT to learn new technologies and quickly implement features using languages he is not familiar with.

I also use Copilot to navigate building software in new languages and to quickly familiarize myself with new APIs. LLM code generation tools are really cutting down the time someone needs to learn new skills. Tools like Copilot and ChatGPT are already being used by engineers to write production-level code.

I’m not sure if this is your experience as well, but the recommendations I get from Copilot and the scripts I get from GPT are often wrong. Sometimes I see inefficient solutions, code that has bizarre logic errors, or code that seems right but just doesn’t work. I have not seen Copilot output outright vulnerable code, but it’s not much of a leap to assume that it can.

How LLM code generation works

Code completion from LLMs works similarly to text completion: the LLM is trained on sample code gathered from a variety of sources, and the model predicts the most likely next token or symbol based on your local context. For example, Copilot was trained on a large amount of public code on GitHub. Say you are implementing a graph cycle-detection algorithm. Copilot will draw on patterns from similar code it was trained on, and give you suggestions based on what you already have in the file.
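To make the "predict the next most likely token" idea concrete, here is a deliberately tiny sketch. It is not how Copilot or GPT are implemented (real models are neural networks trained on huge corpora and condition on far more than one token); it is a toy frequency counter that only looks at the previous token. The names train_bigram, predict_next, and TOY_CORPUS are made up for this illustration.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each token, which tokens follow it in the training snippets."""
    follows = defaultdict(Counter)
    for snippet in corpus:
        tokens = snippet.split()
        for current, nxt in zip(tokens, tokens[1:]):
            follows[current][nxt] += 1
    return follows

def predict_next(follows, context):
    """Return the most frequent follower of the last token in context, or None."""
    tokens = context.split()
    if not tokens:
        return None
    candidates = follows.get(tokens[-1])
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

# Hypothetical "training data": a few whitespace-tokenized code snippets.
TOY_CORPUS = [
    "for node in graph :",
    "for node in graph :",
    "for neighbor in graph [ node ] :",
    "if node in visited :",
]

model = train_bigram(TOY_CORPUS)
print(predict_next(model, "for node in"))  # "graph" follows "in" most often here
```

The point of the toy is that the suggestion is whatever was most common in the training data given your context. If the most common pattern in the data is a bad one, that is what you get back.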

Can LLMs generate insecure code?

The issue here is that there is no way of guaranteeing that the generated code is secure. Code sourced from public repositories, blog posts, and Stack Overflow can contain vulnerabilities and outdated best practices. Learning from these subpar or outdated examples can lead the model to generate ineffective or insecure code. In my own experience using Copilot and GPT for programming, these models are currently still wrong more often than they are right.
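As an illustration of the kind of pattern that shows up a lot in public code, here is a small sketch of a SQL injection bug next to the parameterized version. To be clear, this is not actual Copilot or GPT output; it is a hand-written example, and the table, the function names, and the payload are all assumptions made up for the demo. It uses Python's built-in sqlite3 module with an in-memory database.

```python
import sqlite3

def make_db():
    """Build a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", 0), ("root", 1)],
    )
    return conn

def find_user_insecure(conn, name):
    # Vulnerable: user input is pasted straight into the SQL string,
    # so a crafted name can change the meaning of the query.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]

def find_user_safe(conn, name):
    # Safer: the ? placeholder makes sqlite3 treat the input as data only.
    query = "SELECT name FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]

conn = make_db()
payload = "nobody' OR '1'='1"  # classic injection string

print(find_user_insecure(conn, payload))  # matches every row in the table
print(find_user_safe(conn, payload))      # matches nothing
```

Both functions look reasonable at a glance and both work for normal input, which is exactly why a model that has seen plenty of the first style could plausibly suggest it.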

Even if quality training code is used, most best practices eventually become outdated, and once secure code, dependencies, and…



Vickie Li

Professional investigator of nerdy stuff. Hacks and secures. Creates god awful infographics. https://twitter.com/vickieli7