Anti-AI Code Licenses: Navigating Open Source Definitions and Fair Use Battles
The ambition to make code publicly accessible while explicitly preventing its use for training artificial intelligence models highlights a tension between the spirit of open collaboration and the evolving landscape of AI. This objective raises significant questions about software licensing, legal enforceability, and the very definition of "open source."
Defining "Open Source" in the Age of AI
A central point of contention is the definition of "open source." According to the Open Source Initiative (OSI), a license that discriminates against specific fields of endeavor or use cases (such as AI training) does not qualify as "open source." Such a license would instead fall under the category of "source available"—meaning the code is visible, but its usage is restricted. This distinction is crucial because major corporations and projects often have policies of adopting only OSI-approved licenses, due to their legal clarity and widespread compatibility. Opting for a non-OSI-compliant license could severely limit a project's reach and its ability to integrate with the broader open-source ecosystem.
The historical context also plays a role, with some arguing that "open source" originally meant simply "source available" before the OSI formalized its definition. However, the OSI's interpretation has gained significant traction and is recognized by governments and industry as the de facto standard.
The Fair Use Dilemma
The enforceability of any anti-AI clause in a license is largely contingent on the legal doctrine of "fair use." Many AI companies currently operate under the premise that training their models on publicly available data, including copyrighted code, constitutes fair use and therefore does not trigger copyright-based license restrictions. This is a highly debated area, with ongoing lawsuits globally.
For instance, recent court cases, like one in Germany involving OpenAI, indicate that both training AI models with copyrighted work and providing derivative outputs could be considered copyright violations, necessitating respect for licenses. If courts ultimately decide that AI training is not fair use, then licenses would indeed become highly relevant, potentially subjecting AI companies to significant legal challenges. Conversely, if fair use is upheld, then explicit anti-AI clauses in licenses might have limited legal power for training purposes.
Practical Considerations and Alternatives
Even if an anti-AI license were deemed legally sound, practical challenges remain:
- Enforcement Costs: Licenses do not enforce themselves. Detecting violations and pursuing legal action against well-funded AI corporations would require substantial financial resources.
- Platform Terms: Platforms like GitHub have their own terms of service. By publishing code on such platforms, creators often grant them broad rights to "analyze" the code, which could implicitly cover AI training, regardless of the project's specific license.
- Project Adoption: Legal departments in many organizations are reluctant to approve projects with non-standard or custom licenses because of the complexity and potential legal risk involved. A custom-licensed project may therefore see little adoption, with the community gravitating toward OSI-licensed alternatives instead.
- Dependency Compatibility: Adding new restrictions to a license can create incompatibilities with upstream dependencies, especially if those dependencies are under copyleft licenses like GPL, which might not permit additional restrictions.
While creating a truly "open source" license that prohibits AI training appears to be a contradiction in terms, developers looking to restrict AI use have a few paths to consider:
- "Source Available" Licenses: Opt for a custom "source available" license that explicitly outlines prohibitions against AI training. Be aware of the trade-offs in terms of project adoption and legal enforceability.
- Copyleft Licenses (GPL/AGPL): These licenses, particularly the AGPL, could apply if the output of an AI model trained on such code is held to be a derivative work: redistribution might then require releasing the model's weights or training data under compatible terms. This remains a complex and unsettled legal area.
- Ethical Source Movement: Explore licenses from the "ethical source" movement (e.g., Hippocratic License), which aim to incorporate ethical considerations. While not OSI-approved, they reflect a similar desire for value-driven licensing.
- Gatekeeping: For projects where the creator can control access, requiring users to agree to terms (e.g., an EULA or NDA) before they can obtain the source code could be an option. This shifts the arrangement from a public license to a private contract, with different legal implications (e.g., breach of contract rather than copyright infringement).
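The gatekeeping approach above amounts to an acceptance check in front of the download path. A minimal sketch follows; the names (`ACCEPTED_USERS`, `record_acceptance`, `fetch_source`) are hypothetical placeholders for whatever form handler and storage a real system would use, and the in-memory set stands in for a persistent record of who has agreed to the terms.

```python
# Sketch of a click-through gate: the source archive is released only to
# users who have recorded acceptance of the terms. All names here are
# hypothetical; a real deployment would persist acceptances durably.

ACCEPTED_USERS = set()


def record_acceptance(user_id):
    """Called when a user agrees to the EULA/NDA (e.g., via a web form)."""
    ACCEPTED_USERS.add(user_id)


def fetch_source(user_id):
    """Release the source archive only to users bound by the agreement."""
    if user_id not in ACCEPTED_USERS:
        raise PermissionError("terms must be accepted before access is granted")
    return b"...source archive bytes..."
```

The legally significant part is not the code but the record it creates: a logged, attributable acceptance is what turns unauthorized use into a provable breach of contract.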
Ultimately, the decision requires a careful balance between the desire to control one's work and the practical realities of software distribution and legal precedent in a rapidly evolving technological landscape.