GitHub Copilot and the Open-Source Copyright Controversy
Developers allege Microsoft's AI coding tool was trained on their open-source code without respecting license requirements
GitHub Copilot, Microsoft's AI-powered coding assistant, has become the subject of a significant legal and ethical controversy over its use of open-source code as training data. The tool, developed in partnership with OpenAI, was trained on billions of lines of code hosted on GitHub, including code published under open-source licenses that impose specific conditions on how the code can be used, modified, and distributed. Developers argue that both Copilot's training process and its code suggestions violate these license terms.
A class-action lawsuit filed in late 2022 alleges that GitHub, Microsoft, and OpenAI violated the rights of developers by using their open-source code to train Copilot without complying with the attribution and, where applicable, copyleft requirements of licenses such as the GPL, the MIT License, and the Apache License.
Key Takeaways
- A class-action lawsuit alleges Copilot violates open-source license requirements, including the GPL's attribution and copyleft provisions
- Researchers documented instances where Copilot generated code character-for-character identical to copyrighted open-source code
- The case could either restrict AI training on licensed code or undermine open-source licensing enforceability