Are the robots coming for your job? You’ve heard this question before (we’ve even asked it before). But in 2022, with AI increasingly ubiquitous in the lives of most coders, the issue feels more pressing.
Given the explosive progress AI has made over the last few years, it might seem like only a question of time (or data) until its mastery over complex, nuanced problems clearly outstrips our own. From Go to poker to StarCraft II, AI has bested humans in plenty of arenas where we were once uncontested champions. Is the same true of coding?
Programs like GitHub Copilot have already won widespread adoption, and organization-wide investment in AI has exploded since 2020, expanding developers’ access to and understanding of intelligent automation tools. In this environment, will code written by AI replace code written by humans?
New numbers indicate it already is. Since the program’s launch in June 2021, more than 35% of newly written Java and Python code on GitHub has been suggested by its Copilot AI. To put this in perspective, GitHub is the largest source code host on the planet, with over 73 million developers and more than 200 million repositories (including some 28 million public repositories).
Coding a tool or service, of course, is fundamentally different from playing a game. Games unfold in accordance with fixed rulesets, while codebases are dynamic: they must evolve as new technologies emerge and adapt to meet new business needs. And it’s not as if Copilot has led to a 35% drop in demand for human programmers: demand for software developers remains high after doubling in 2021.
Still, if AI is writing more than a third of the fresh code for some of the most popular languages on the world’s largest development platform, the AI coding revolution isn’t imminent; it’s already here. In this piece, we’ll explore what AI programs are out there and how developers are using them. We’ll look at their current limitations and future potential. And we’ll try to unpack the impact of these programs on developers and the software industry as a whole.
Based on functionality, there are three species of AI coding tools currently on the market:

- Tools that automatically identify bugs
- Tools that produce basic code by themselves or autocomplete code for programmers
- Tools that can learn what a piece of code is intended to do
Bug-hunting tools and AI pair programmers like Copilot are steadily becoming more popular and more powerful, while emergent technologies like MISIM still have a way to go before they become a seamless part of most developers’ working lives. Let’s break these tools down.
Tools that automatically identify bugs
Tools that automatically identify bugs represent one of the most successful applications of AI to programming. These programs not only enhance code safety and quality; they allow developers to focus more time and energy on writing business logic that improves the end product, rather than scanning their code for possible errors and vulnerabilities. Amazon CodeGuru, for example, helps AWS BugBust participants “find [their] most expensive lines of code”—the bugs that drain resources and allow tech debt to flourish.
Other programs focus on scanning code for potential security risks. GitGuardian scans source code to detect sensitive data like passwords, encryption keys, and API keys in real time. Software failures due to relatively simple mistakes like these cost over $2 trillion annually in the US alone.
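Real secret scanners rely on hundreds of curated rules plus entropy analysis and contextual validation, but the core idea can be sketched in a few lines of Python. The patterns below are illustrative only, not GitGuardian's actual detection rules:

```python
import re

# Illustrative patterns only; production scanners use far more rules
# plus entropy checks to catch secrets these regexes would miss.
SECRET_PATTERNS = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "Generic API key": re.compile(r"(?i)api[_-]?key\s*=\s*['\"][A-Za-z0-9]{20,}['\"]"),
    "Private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_for_secrets(source: str):
    """Return a (line_number, rule_name) pair for each suspicious match."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings

snippet = 'aws_key = "AKIAIOSFODNN7EXAMPLE"\nprint("hello")\n'
print(scan_for_secrets(snippet))  # [(1, 'AWS access key')]
```

The value of running this kind of check continuously, as commercial tools do, is catching a credential before it lands in a public commit rather than after.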
Tools that produce basic code by themselves or can autocomplete code for programmers
Automatic code generators and AI pair programmers fall into another category: tools that can produce code independently or autocomplete a human programmer’s code. For example, Facebook’s Aroma is an AI-powered code-to-code search and recommendation tool that saves developers time by making it easier to draw insights from huge codebases.
Meanwhile, a new open-source AI code generator called PolyCoder was trained not only on code files, but also on questions and answers from Stack Overflow. The creators describe the corpus as a rich source of natural language information that reveals how real people use, troubleshoot, and optimize software.
At the cutting edge of more research-oriented projects is DeepMind’s AlphaCode, which uses transformer-based language models to generate code. AlphaCode does as well as most humans in coding competitions, ranking among the top 54% of participants “by solving new problems that require a combination of critical thinking, logic, algorithms, coding, and natural language understanding,” according to the company. DeepMind principal research scientist Oriol Vinyals told The Verge that AlphaCode is the latest product of the company’s goal to create a flexible, autonomous AI capable of solving coding problems only humans are currently able to address.
AlphaCode has achieved impressive results, but there’s no need to start watching your back just yet: “AlphaCode’s current skill set is only currently applicable within the domain of competitive programming,” reports The Verge, although “its abilities open the door to creating future tools that make programming more accessible and one day fully automated.”
OpenAI’s GPT-3 is one of the largest language models yet created. With 175 billion parameters, it can generate astonishingly human-like text on demand, from words to guitar tabs to computer code. The API is designed to be straightforward enough for almost anyone to use, but also flexible and powerful enough to increase productivity for AI/ML teams. More than 300 applications were using GPT-3 only nine months after its launch, with the program generating 4.5 billion words every day, per OpenAI.
In 2020, OpenAI and end-user developers noticed that GPT-3 could autocomplete code in addition to sentences. GPT-3 had been trained on billions of documents scraped from the web, including pages where programmers had posted their code, so it had learned patterns not just in English but also in Python, Java, C++, R, HTML, and on and on. This realization sparked OpenAI’s interest in creating a code-writing AI: Copilot, built with GitHub and first released in the summer of 2021.
Ask most developers for the gold standard in AI pair programming, and they’ll mention Copilot. Trained on public code, Copilot makes suggestions for lines of code or entire functions directly in the editor. Users can explore alternative suggestions, accept or reject Copilot’s input, and edit suggested code manually when required. Importantly, Copilot adapts to users’ edits to match their coding style, increasing the value and relevance of the program’s suggestions over time.
Copilot, writes Clive Thompson in Wired, offers “a first peek at a world where AI predicts increasingly complex forms of thinking.” Despite errors “ranging from boneheaded to distressingly subtle,” Copilot has earned the wide-eyed approval of plenty of developers. “GitHub Copilot works shockingly well,” says Lars Gyrup Brink Nielsen, an open-source software developer and GitHub Star. “I will never develop software without it again.”
Mike Krieger, cofounder and former CTO of Instagram, calls Copilot “the single most mind-blowing application of ML I’ve ever seen,” comparing the program to “a team member who fits right in from the first time you hit Tab.”
Copilot is also an invaluable resource for people who want to expand and deepen their coding knowledge (and who doesn’t, really?). “I’m learning TypeScript by hacking through another extension,” says GitHub Star Chrissy LeMaire. “When my previous development experience fails me, I now use GitHub Copilot to learn how to do what I need!” Thompson, the Wired journalist, experimented with asking Copilot to write a program to scan PDFs, starting with a plain-text comment:
# write a function that opens a pdf document and returns the text
In response, Copilot wrote:
def pdf_to_text(filename):
    pdf = PyPDF2.PdfFileReader(open(filename, "rb"))
    text = ""
    for i in range(pdf.getNumPages()):
        text += pdf.getPage(i).extractText()
    return text
This code not only fulfilled the request exactly; it made use of an open-source Python library, PyPDF2, that Thompson had never even heard of: “When I Googled it, I learned that PyPDF was, indeed, designed specifically to read PDF files. It was a strange feeling. I, the human, was learning new techniques from the AI.”
Copilot’s reception hasn’t been universally glowing. Some developers have raised concerns that Copilot could “effectively launder open-source code into commercial uses without proper licensing,” violate copyrights, and regurgitate developers’ personal details, according to Fast Company. But more developers see Copilot as “the next step in an evolution that started with abstracting assembly languages.” Says Kelsey Hightower: “Developers should be as afraid of GitHub Copilot as mathematicians are of calculators.”
OK, so AI can write code, spitting out patterns or producing tools and solutions it’s seen before. But it doesn’t really know what that code means, right?
Well, a consortium of researchers from Intel, MIT, and Georgia Tech has developed a new machine programming system called machine inferred code similarity (MISIM). Much as natural language processing (NLP) can recognize the meaning of text or spoken words, MISIM can learn what a piece of software is supposed to do by examining its code structure and the syntactic differences between it and other code that behaves similarly.
Language-independent MISIM has revolutionary potential: it can read code as it’s written and automatically generate modules to check off common, time-consuming tasks. The code that automates cloud backups, for instance, is often the same across programs, as is the code used in compliance processes. Conceivably, MISIM could shoulder responsibility for processes like these, leaving developers free to focus on other work.
Intel’s goal is to build MISIM into a code recommendation engine to help developers working across Intel’s various architectures: “This type of system would be able to recognize the intent behind a simple algorithm input by a developer and offer candidate codes that are semantically similar but with improved performance,” said Intel in a press release.
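MISIM's actual models are far more sophisticated, but the underlying intuition of judging similarity by structure rather than surface details can be illustrated with a toy sketch. The example below (my own illustration, not MISIM's method) normalizes away variable names and constants so that two differently written but structurally identical functions produce the same fingerprint:

```python
import ast

def structure_fingerprint(source: str) -> str:
    """Reduce a snippet to its AST shape, blanking out identifiers and
    constants, so structurally similar code yields the same string."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.id = "_"          # ignore variable names
        elif isinstance(node, ast.arg):
            node.arg = "_"         # ignore parameter names
        elif isinstance(node, ast.FunctionDef):
            node.name = "_"        # ignore function names
        elif isinstance(node, ast.Constant):
            node.value = 0         # ignore literal values
    return ast.dump(tree)

a = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
b = "def add_all(vals):\n    acc = 0\n    for v in vals:\n        acc += v\n    return acc\n"
print(structure_fingerprint(a) == structure_fingerprint(b))  # True
```

A system like MISIM goes much further, learning semantic similarity even when the structure differs, but the payoff is the same: recognizing that two pieces of code do the same job.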
From improving code quality to tuning out distractions, programs like AlphaCode and Copilot make developers more productive, happier in their work, and more available for higher-order tasks.
Keep developers in the flow and focused on higher-order work
Developers are keenly aware that context-switching and distractions like chat notifications and email pings are highly disruptive to their workflows. As much as 20% of developers’ time is spent on web searches, for example.
One of the primary benefits of AI coding tools is that they can keep developers focused, issuing suggestions and recommendations without jerking people out of their flow states. AI tools that minimize distraction help developers carve out uninterrupted time, making them more productive but also happier and less stressed by their jobs. An internal GitHub investigation found that developers stood an 82% chance of having a good day when interruptions were minimal or nonexistent, but only a 7% chance of having a good day when they were interrupted frequently. In helping developers carve out more uninterrupted time, AI tools also increase coders’ availability for complex, creative problem-solving.
These AI programs don’t replace humans; they increase our productivity and allow us to devote more resources to the kind of work AI is less able to tackle. Which brings us to our next question: What are the limitations of these AI tools?
As we’ve previously explored on our blog, AI coding tools still have plenty of limitations. Broadly speaking, their ability to create new solutions is limited, as is their capacity for understanding the complexities of modern coding—at least for now.
They produce false positives and security vulnerabilities
As many developers are already painfully aware, AI programs designed to catch bugs in code written by humans tend to produce a huge volume of false positives: that is, things the AI identifies as bugs when they’re not. You might argue that, from the perspective of information security, it’s better to produce a ton of false positives than a few potentially devastating false negatives. But a high number of false positives can negate the AI’s value by obscuring the signal in the noise. Plus, security teams become “overwhelmed and desensitized” in the face of too many false positives.
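The arithmetic behind alert fatigue is unforgiving. Suppose, with purely hypothetical numbers, that only 1% of scanned code sites contain a real bug, and a scanner catches 99% of real bugs but also flags 10% of clean code. Bayes' rule says most alerts will still be false alarms:

```python
# Hypothetical illustration of the base-rate problem in bug scanners.
prevalence = 0.01   # fraction of scanned sites with a real bug
recall = 0.99       # scanner catches 99% of real bugs...
fp_rate = 0.10      # ...but also flags 10% of clean code

true_alerts = prevalence * recall            # 0.0099
false_alerts = (1 - prevalence) * fp_rate    # 0.0990
precision = true_alerts / (true_alerts + false_alerts)
print(f"Chance a given alert is a real bug: {precision:.1%}")  # 9.1%
```

Even with a scanner this good on paper, roughly ten alerts out of eleven are noise, which is exactly how teams end up overwhelmed and desensitized.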
Consider npm audit, a built-in security feature of the Node package manager (npm) intended to scan projects for security vulnerabilities and produce reports detailing anomalies, potential remediations, and other insights. That sounds great, but a “deluge” of security alerts that overwhelms developers with distractions has made npm audit a classic example of what’s been called “infosec theater,” with some npm users saying 99% of the possible vulnerabilities flagged are “false alarms in common usage scenarios.” The prevalence of false positives underscores the fact that AI still struggles to grasp the complexity of contemporary software.
In addition to a high volume of false positives, AI programs can also produce security vulnerabilities. According to Wired, an NYU team assessing how Copilot performed in writing code for high-security scenarios found that 40% of the time, Copilot wrote software prone to security vulnerabilities, especially SQL injections: malicious code inserted by attackers.
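The classic failure mode here is string-built SQL. The sketch below, using Python's built-in sqlite3 module, shows the kind of query a code generator might plausibly emit alongside the parameterized version a careful reviewer would insist on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0), ('bob', 1)")

def find_user_unsafe(name):
    # VULNERABLE: attacker-controlled input is spliced into the SQL text.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(name):
    # SAFE: the driver passes the value separately from the SQL text,
    # so the payload is treated as a literal string, not as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()

payload = "' OR '1'='1"
print(find_user_unsafe(payload))  # returns every row: the injection succeeded
print(find_user_safe(payload))    # returns []: the payload matched nothing
```

The two functions look almost identical, which is precisely why an AI trained on a mix of good and bad public code can reproduce the vulnerable pattern with complete confidence.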
They still require human input and direction
As things stand, tools like Aroma and GPT-3 can produce straightforward pieces of code—but only when directed by humans. As Technology Review puts it, “GPT-3’s human-like output and striking versatility are the results of excellent engineering, not genuine smarts.”
Given a tightly controlled problem, these programs can produce impressive solutions, but they’re not yet at the point where, like a skilled human developer, they can examine a design brief and work out the best approach from there. Even Copilot is still “more a hint of the future than the future itself,” writes Thompson in Wired.
Aesthetics is another arena where AI tools still fall short of human capabilities: during the AI/ML lifecycle, the front end is often neglected in favor of the back end.
They absorb and spread harmful biases
AI programs are tools made by humans, prone to the same constraints and flaws as humans ourselves. When the single word “women” was used to prompt GPT-3 to write a tweet, the program generated gems like, “The best female startup founders are named…Girl.” (Nice.) “GPT-3 is still prone to spewing hateful sexist and racist language,” sighed Technology Review. DALL-E, which lets users generate images by entering a text description, has raised similar concerns. And who could forget Microsoft’s ill-starred AI chatbot Tay, turned into a racist, misogynistic caricature almost literally overnight on a rich diet of 2016 Twitter content?
These revealing episodes underscore the importance of prioritizing responsible AI: not to keep the robots from taking our jobs, but to keep them from making the world less inclusive, less equitable, and less safe. As the metaverse takes shape, there are growing calls to develop AI with a greater degree of ethical oversight, since AI-powered language technology can reinforce and perpetuate bias.
But for plenty of companies, responsible AI isn’t a priority. A recent SAS study of 277 data scientists and managers found that “43% do not conduct specific reviews of their analytical processes with respect to bias and discrimination,” while “only 26% of respondents indicated that unfair bias is used as a measure of model success in their organization” (Forbes). By these numbers, the industry has yet to reckon with Uncle Ben’s evergreen advice: “With great power comes great responsibility.”
A matter of trust
A common thread runs through all the limitations we’ve mentioned: developers’ trust, or lack thereof, in a tool. Research (and more research) shows that trust impacts the adoption of software engineering tools. In short, developers are more likely to use tools whose technology and results they trust, and intelligent automation tools are still earning that trust.
David Widder, a doctoral student at Carnegie Mellon studying developer experiences, conducted a 10-week case study of NASA engineers collaborating with an autonomous tool to write control software for high-stakes missions (“Trust in Collaborative Automation in High Stakes Software Engineering Work: A Case Study at NASA,” 2021). The study was designed to examine which factors influence software engineers to trust—or not trust—autonomous tools.
The bottom line, says Widder, is that “developers may embrace tools that automate part of their job, to ensure that high-stakes code is written correctly, but only if they can learn to trust the tool, and this trust is hard-won. We found that many factors complicated trust in the autocoding tool, and that may also complicate a tool’s ability to automate a developer’s job.”
The study found that engineers’ level of trust in autonomous tools was determined by four main factors:
- Transparency of the tool: A developer’s ability to understand how the tool works and confirm it works correctly.
- Usability of the tool: How easy developers find the tool to use.
- The social context of the tool: How people are using the tool and checking it for accurate performance, including the trustworthiness of the person or people who built the tool, the people and organizations that have endorsed the tool, and whether the tool has “betrayed” users by introducing errors.
- The organization’s associated processes: To what degree the company or organization is invested in the tool, has thoroughly tested it, and has proven its effectiveness in real-world contexts.
The study results suggest that training and documentation in how to use a tool are not enough to build engineers’ trust: “Software engineers also expect to understand why,” including not just the rationale for what they are told to do, but also why certain design decisions were made. This suggests, according to the study, that “not only should automated systems provide explanations for their behavior to incur trust, but that their human creators must too.”
Collaboration, not competition
Instead of checking over our shoulders for a robot army, the path forward involves identifying which tasks are best performed by AI and which by humans. A collaborative approach to coding that draws on the strengths of humans and AI programs allows companies to automate and streamline developers’ workflows while giving them the chance to learn from the AI. Organizations can realize this approach by using AI to:
- Train human developers: AI coding tools can help teach human developers in an efficient, targeted way—like using Copilot to learn additional languages.
- Track human developers’ work and make recommendations to improve efficiency and code quality: Imagine if every human coder had an AI pair programmer that would learn how they worked, anticipate their next line of code, and make recommendations based on prior solutions. Those coders would get a lot more done, a lot more quickly—and learn more while doing it.
- Rewrite legacy systems: Systems like MISIM may not be able to fully automate coding, but they can be of enormous assistance in rewriting legacy systems. These programs are platform-independent, so they have the potential to teach themselves elderly or obscure programming languages like COBOL, on which the US government—not to mention plenty of finance and insurance companies—still relies. MISIM-type programs can rewrite the COBOL programs in a modern language like Python so that fewer devs need to brush up on their COBOL skills to keep these services up and running.
As with most workplace relationships, collaboration, not competition, is the way to approach our relationship with AI. The robots aren’t coming for your job—at least not yet—but they are well on their way to making your job easier, your work life happier, and your code better.
Edited by Ben Popper.