Generative AI systems rely on vast quantities of text, images and other materials in their training. These models can generate new content, yet the copyright status of both the underlying training data and the outputs produced remains uncertain. Researchers and developers therefore need to understand the differing legal frameworks in the United States and Europe, the lessons emerging from current litigation and the contractual terms imposed by generative AI providers.
Human authorship and copyright protection of outputs
U.S. copyright law accords protection solely to works that are original intellectual creations of a natural person. The New York State Bar Association observes that a longstanding principle of U.S. copyright is that the author must be human. In Thaler v. Perlmutter the court rejected copyright protection for a work produced entirely by a “Creativity Machine” because no human creative input was present. Consequently, purely AI-generated outputs are generally regarded as public domain material unless the user contributes substantial creative input, for example through selection or editing; commentators also caution that extending copyright protection to raw AI outputs could stifle innovation without resolving the broader ethical issues (Lim 2023). The U.S. Copyright Office and courts recognise that AI-assisted works may obtain protection when a human contributes sufficient original expression (Lemley and Casey 2020). In practice, such human involvement must be documented, for example by recording the prompts used and by evidencing subsequent editing of outputs (Boyden 2024, 1).
European copyright law likewise requires that a work be the author’s own intellectual creation. In the Infopaq and Levola decisions the Court of Justice of the European Union held that a work must reflect the author’s personality (Rosati 2019). A work produced solely by AI cannot be protected under EU law because it lacks human creative contribution (Fritz 2025). In May 2024 the Municipal Court of Prague addressed the copyright status of an AI-generated image. The court ruled that the AI system could not be the author, and that the user’s prompt was merely an idea which, without evidence of creative input, did not constitute authorship. The judgment suggested, however, that copyright protection might be attainable where a user imparts personal characteristics through specific instructions such as choices of colours, mood and composition (Chloupek & Taimr 2024).
Training data and the “fair use”/TDM exceptions
Under section 107 (the fair use doctrine) of the U.S. Copyright Act, courts consider four factors when assessing fair use: the purpose and character of the use (for example, whether it is transformative and serves education or criticism), the nature of the original work (factual or creative), the amount taken, and the effect on the market for the original. In generative AI training, however, models often copy entire works and the output may compete with the original content, which weakens the fair‑use argument (Dornis & Stober 2025).
Recent litigation illustrates how courts apply these principles. In Thomson Reuters v. Ross Intelligence a Delaware court held that the AI‑assisted legal research service Ross Intelligence could not rely on fair use because its use of Westlaw headnotes was not transformative and the service might substitute for Westlaw’s database (Maynard et al. 2025). This ruling suggests that copying entire works in training can cause significant market harm. In December 2023 The New York Times sued OpenAI and Microsoft, alleging that they used millions of Times articles without authorisation, thereby creating a “market substitute” product; the defendants rely on a fair‑use defence. The court consolidated the case with suits brought by other news organisations in September 2024 and ordered the preservation of chat logs to determine whether paywalls were bypassed (Chen 2025). Record labels have also sued the generative music services Suno and Udio, alleging that their models used virtually all available music and that the outputs resemble existing recordings. The defendants characterise “background technological copying” as fair use, but courts have yet to rule on this argument (Dornis & Stober 2025).
Directive 2019/790 of the EU introduces two exceptions for text‑ and data‑mining (TDM). Article 3 allows non‑profit research organisations and cultural heritage institutions to reproduce and analyse works for scientific research, provided they have lawful access; such copies may be retained indefinitely and rights‑holders cannot contractually forbid this use. Article 4 extends the exception to commercial TDM but requires users to retain copies only for as long as necessary and enables rights‑holders to signal an opt‑out via robots.txt or another machine‑readable means. Dornis and Stober (2025) argue that generative training differs fundamentally from the TDM envisaged in the Directive. While TDM aims to extract statistical patterns, generative training seeks to create new works that may imitate the style and structure of the originals. Because training copies entire works and the outputs can compete with the originals, the authors contend that the fair‑use and TDM exceptions do not apply and that explicit licences are required. The EU Artificial Intelligence Act is primarily product‑safety oriented, yet draft provisions require providers of generative AI systems to comply with the TDM exceptions and opt‑out signals and to publish a list of the copyright‑protected sources used for training. Several Member States consider that the current Directive does not address generative training and advocate new licensing mechanisms (Dornis & Stober 2025).
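In practice, the machine-readable opt-out under Article 4 is most often expressed in a site's robots.txt file. A minimal illustrative sketch follows; GPTBot and Google-Extended are the crawler tokens published by OpenAI and Google respectively, but whether a given provider honours a robots.txt directive depends on that provider's stated policy, and the Directive itself does not prescribe any particular token:

```text
# robots.txt — illustrative TDM opt-out sketch
# Disallow OpenAI's training crawler (token per OpenAI's published documentation)
User-agent: GPTBot
Disallow: /

# Disallow Google's AI-training token (separate from ordinary search indexing)
User-agent: Google-Extended
Disallow: /

# All other crawlers may continue to access the site normally
User-agent: *
Allow: /
```

Rights-holders who want finer control can also express reservations through other machine-readable means (HTTP headers or site terms), since Article 4 names no single mechanism.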
Providers’ contractual terms and rights in outputs
AI providers generally shift the risk of infringement to users and do not guarantee that outputs are free from intellectual-property claims. OpenAI’s terms confirm that the user retains ownership of prompts and uploaded material and assign to the user any rights OpenAI may have in the generated output: "As between you and OpenAI, and to the extent permitted by applicable law, you (a) retain your ownership rights in Input and (b) own the Output. We hereby assign to you all our right, title, and interest, if any, in and to Output" (OpenAI 2024). Microsoft Copilot and GitHub Copilot likewise specify that the user owns both inputs and outputs and that the provider claims no ownership, with GitHub emphasising that it does not own the code generated from its suggestions: "The code you write using GitHub Copilot is not “Content” under the Agreement until you upload it to GitHub.com. The code, functions, and other output returned to you by GitHub Copilot are called “Suggestions.” GitHub does not own Suggestions. You retain ownership of Your Code and you retain responsibility for Suggestions you include in Your Code. It is entirely your decision whether to use Suggestions generated by GitHub Copilot. If you use Suggestions, GitHub strongly recommends that you have reasonable policies and practices in place designed to prevent the use of a Suggestion in a way that may violate the rights of others" (GitHub Terms for Additional Products and Features 2025). Anthropic’s Claude adopts a similar approach, waiving any rights it may have and assigning them to the user, provided the user complies with the service terms: "Subject to your compliance with our Terms, we assign to you all our right, title, and interest (if any) in Outputs" (Anthropic 2025).
Across these services, terms typically permit the provider to use user inputs and outputs to improve models unless the user opts out, and they place responsibility for lawful use on the user. These examples illustrate the variability in provider approaches; researchers should therefore carefully examine the specific terms and conditions of each AI model regarding input data handling, output ownership rights, and training data policies before use.
Research practice: what can be done with AI‑generated content?
Researchers should recognise that AI outputs generally fall into the public domain and may therefore be used freely; however, when there is no substantive human contribution the outputs should not be presented as original work. Outputs are best regarded as neutral sources from a copyright perspective: they may be used, but they should not be cited as original creative works. To obtain copyright protection in AI‑assisted works, authors must demonstrate human creative input; accordingly, prompts, editing and decisions should be carefully documented.
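One lightweight way to meet this documentation burden is to keep a machine-readable provenance log alongside the manuscript, recording each prompt, the model used and the subsequent human editing. The following Python sketch is illustrative only; the field names and file format are assumptions, not drawn from any standard or provider requirement:

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AIUsageRecord:
    """One documented interaction with a generative AI tool."""
    model: str                 # model name and version string used
    prompt: str                # the exact prompt submitted
    raw_output_excerpt: str    # what the model returned (or an excerpt)
    human_edits: str           # how the output was reworked by the author
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def append_record(log_path: str, record: AIUsageRecord) -> None:
    """Append one record as a JSON line, keeping the log diff-friendly."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")

# Hypothetical usage: the model name and texts below are placeholders.
record = AIUsageRecord(
    model="example-model-v1",
    prompt="Summarise the TDM exceptions in Directive 2019/790.",
    raw_output_excerpt="Article 3 covers research organisations...",
    human_edits="Restructured, verified against the Directive, rewrote most of the text.",
)
append_record("ai_usage_log.jsonl", record)
```

Such a log provides the contemporaneous evidence of human creative input that both the U.S. Copyright Office's practice and the Prague judgment suggest may be decisive.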
The protection of private and confidential data is paramount. Information entered into generative AI tools may be disclosed to third parties; researchers should therefore avoid uploading confidential research data, personal information or manuscripts into public AI tools, and instead use licensed or locally deployed models. Data management must also accord with licensing: researchers must ensure that training datasets are lawfully licensed or fall within fair-use or TDM exceptions. For commercial research, clear licensing agreements with content owners are advised. Finally, ethical considerations are central: AI tools may incorporate biases and errors, so researchers must verify the accuracy of outputs and should not rely uncritically on AI suggestions. The ethical provenance of data, the presence of bias and the consent of creators should all inform the responsible use of generative AI.
In sum, AI‑generated text and images can serve as a starting point for brainstorming, inspiration or drafting, but the researcher must contribute original analysis and exercise source criticism. Because AI outputs are generally unprotected by copyright, they may be integrated into scholarly work without infringing third‑party rights; nonetheless, if an output contains elements memorised from the training data – such as lengthy verbatim passages – its use may infringe copyright, so such passages should be avoided or licensed. Researchers who rework and edit AI outputs to produce original text or interpret data thereby create a human‑authored work that may be eligible for protection. In all cases, the use of AI should be transparently disclosed in the methods or acknowledgements section of a publication, including the prompts and model version employed. Finally, because providers typically assign output rights to users while disavowing liability, users remain responsible for ensuring lawful use.
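A simple screen for such memorised passages is to check an AI output for long verbatim word-level overlaps with a suspected source. The sketch below is a rough heuristic, not a legal test: the eight-word window is an arbitrary illustrative threshold, and the example texts are placeholders.

```python
def shared_verbatim_passages(output: str, source: str, n: int = 8) -> set[str]:
    """Return every n-word sequence appearing verbatim in both texts.

    n=8 is an arbitrary illustrative threshold, not a legal standard;
    longer shared runs are stronger signs of memorised training data.
    """
    def ngrams(text: str) -> set[str]:
        words = text.lower().split()
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
    return ngrams(output) & ngrams(source)

# Hypothetical example texts for illustration.
src = ("it was the best of times it was the worst of times "
       "it was the age of wisdom it was the age of foolishness")
out = "the model wrote it was the best of times it was the worst of times indeed"
hits = shared_verbatim_passages(out, src)
# Any hit flags a passage that should be rewritten, quoted, or licensed.
```

A non-empty result does not prove infringement, but it identifies exactly the kind of lengthy verbatim passage the text above recommends avoiding or licensing.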
References:
1. Anthropic. 2025. Consumer Terms of Service (Effective May 1, 2025). Available at: https://www.anthropic.com/legal/consumer-terms
2. Boyden, Bruce E. 2024. Generative AI and IP Under US Law. Marquette Law School Legal Studies Paper 24-09.
3. Chen, Tian. 2025. Language Models' Verbatim Copying: Copyright Infringement Analysis through the Lens of the New York Times Co. v. Microsoft Corp., OpenAI, Inc. et al. Cardozo Arts & Entertainment Law Journal 43(2): 349–369.
4. Chloupek, Vojtěch, and Martin Taimr. 2024. Czech Court Denies Copyright Protection of AI-Generated Work in First Ever Ruling. Bird & Bird. Available at: https://www.twobirds.com/en/insights/2024/czech-republic/czech-court-denies-copyright-protection-of-ai-generated-work-in-first-ever-ruling
5. Dornis, Tim W., and Sebastian Stober. 2025. Generative AI Training and Copyright Law. arXiv preprint arXiv:2502.15858. Available at: https://arxiv.org/abs/2502.15858
6. Fritz, Johannes. 2025. Understanding Authorship in Artificial Intelligence-Assisted Works. Journal of Intellectual Property Law and Practice, jpae119.
7. GitHub. 2025. Terms for Additional Products and Features (Effective April 1, 2025). Available at: https://docs.github.com/en/site-policy/github-terms/github-terms-for-additional-products-and-features
8. Lemley, Mark A., and Bryan Casey. 2020. Fair Learning. Texas Law Review 99: 743.
9. Lim, Daryl. 2023. Generative AI and Copyright: Principles, Priorities and Practicalities. Journal of Intellectual Property Law and Practice 18(12): 841–842.
10. Maynard, John Gary, Jonathan D. Reichman, Tyler Maddry, and Kate Pauling. 2025. Fair Warning: Artificial Intelligence’s First Copyright Fair Use Ruling, Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc. Intellectual Property & Technology Law Journal 37(4): 12–13.
11. OpenAI. 2024. Terms of Use (Effective December 11, 2024). Available at: https://openai.com/en-GB/policies/row-terms-of-use/