Fair use in the context of machine learning training. Is this approach more flexible than that of text and data mining exceptions?
INTELLECTUAL PROPERTY RIGHTS
11/8/2025 | 24 min read
The previous post, entitled "An introduction to copyright infringement in machine learning training and in the deployment of generative artificial intelligence", explained that there is considerable debate about whether copyright and related rights are infringed when protected materials are collected, pre-processed, and used to train machine learning (ML) models (the input phase), and when these models generate outputs that are similar to, or partially replicate, the inputs (the output phase). The post also noted that, in this debate, the exceptions provided for in jurisdictions such as the US and Japan are often considered to be more flexible than those contemplated by the EU. Notably, there has been significant discussion about whether using works to train ML models constitutes “fair use” under Section 107 of the Copyright Act, with various authors arguing that it does, given the transformative nature of the use.
What is fair use?
Recall that, unlike in the EU, where specific exceptions or limitations are in place, the question of whether a given use of a work amounts to fair use must be assessed on a case-by-case basis. This involves carefully examining the following factors:
The purpose and character of the use, including whether such use is commercial or is for nonprofit educational purposes. In this sense, it must be analysed whether the secondary use is transformative, i.e., “whether and to what extent the new work merely supersedes the objects of the original creation, or instead adds something new, with a further purpose or different character”.[1] The extent to which the use of a protected work has a further purpose or different character is “a matter of degree”, and the degree of transformation needed “must go beyond that required to qualify as a derivative”.[2] Other elements that may be relevant in analysing this first factor include the commercial nature of the use and whether it was made in good or bad faith. Regarding the commercial nature of the use, the Supreme Court has ruled that the more transformative the new work, the less significant its commercial nature. Conversely, when the purpose of the original and new works is similar, commercial use counts against fair use.[3] As for whether the use was made in good or bad faith, it is still unclear to what degree this matters in the analysis, with the Supreme Court being “skeptical”.[4]
The nature of the copyrighted work. This involves distinguishing between different types of works, some of which are “closer to the core of the intended copyright protection” than others.[5]
The amount and substantiality of the portion used in relation to the copyrighted work as a whole. Although, as a general rule, the more that is copied, the worse it is for the defence of fair use, copying small amounts of a work can still lead to infringement if they comprise the core of the creative expression, whereas a more extensive copy may be justified if the passages do not contain creative expressions or are necessary for the use's purposes.[6] Therefore, how much copying is permissible is determined by the purpose and character of the use.[7] Indeed, even copying the entire work may be justified if it is “reasonably appropriate to achieve the copier's transformative purpose and was done in such a manner that it did not offer a competing substitute”.[8] Furthermore, it has been emphasised in some cases that “what matters is not so much the amount and substantiality of the portion used in making a copy, but rather the amount and substantiality of what is made accessible to a public for which it may serve as a competing substitute”.[9]
The effect of the use upon the potential market for or value of the copyrighted work. Here, the focus is on whether the secondary use of the work would result in a substantially negative impact on current and potential markets of the original and derivative works, as well as the extent of this impact.[10] In general, the more transformative the secondary use of the work, the less risk of actual or potential market substitution. However, even when the secondary use is transformative, if widespread disclosure of significant portions of the original work occurs, it could also lead to market substitution.[11] In this analysis, the public benefits that the copying will likely produce must also be taken into account.[12]
It should also be acknowledged that the fair use doctrine is a flexible concept which cannot be reduced to “bright-line rules”.[13] The four aforementioned factors must be weighed together by the judge, according to the circumstances, which may include significant changes in technology.[14] In some contexts, one factor may carry more weight than the others, although it has occasionally been asserted that the fourth factor is the most significant.[15]
Fair use and artificial intelligence
Until very recently, there had been no specific rulings in the US that applied the fair use doctrine to the training of ML models. Nevertheless, the US Court of Appeals for the Second Circuit's decision in Google Books (Authors Guild v. Google, Inc., No. 13-4829 (2d Cir. 2015)) influenced the theory that such use of protected works could constitute fair use. In this case, the court deemed several practices to be legitimate, including the digitisation of books provided to Google by major libraries, the creation of a publicly available search function to identify books containing specified terms and view snippets of relevant text, and the text and data mining (TDM) facilitated by the search engine.[16] Once again, it should be emphasised that TDM is not the same as ML. In any case, since the beginning of this year, decisions concerning the use of works to train ML models have begun to emerge. The first such decision was issued by the District Court for the District of Delaware on 11 February 2025, in the form of a summary judgment in the case of Thomson Reuters v. ROSS Intelligence (Thomson Reuters Enter. Ctr. GmbH v. Ross Intel. Inc., No. 1:20-cv-613-SB, 2025 WL 458520 (D. Del. Feb. 11, 2025)).
Thomson Reuters
In short, in May 2020, Thomson Reuters sued ROSS Intelligence, alleging that it had unlawfully copied protected content from Thomson Reuters' Westlaw legal research platform to train its AI-based legal search engine. This content included 21,787 headnotes, the editorial decisions in 500 judicial opinions, and West's Key Number System. Specifically, ROSS Intelligence needed a database of legal questions and answers for this training and asked Thomson Reuters for a licence to use its content, but this request was denied. To train the AI system, ROSS then entered into an agreement with LegalEase to obtain training data in the form of “Bulk Memos”, which are “lawyers' compilations of legal questions with good and bad answers.”[17] To create these, LegalEase's lawyers had to use Westlaw headnotes without copying and pasting them.[18] Against this background, Judge Bibas first determined whether LegalEase's Bulk Memos had copied Thomson Reuters' headnotes or drawn upon uncopyrightable judicial opinions. In this regard, the judge emphasised that the originality threshold “is extremely low, requiring only a minimal degree of creativity”, a threshold that Westlaw's headnotes meet both collectively and individually.[19] Before proceeding with the legal analysis of fair use, it should be noted that the judge did not grant summary judgment in relation to the Key Number System and the 500 judicial opinions containing editorial decisions, as factual questions still needed to be resolved. The summary judgment also excluded those headnotes that are verbatim copies of extracts from judicial opinions and those whose valid copyright registration is still in question. It was therefore “limited” to 2,243 headnotes that closely resembled the Bulk Memo questions and diverged from the texts of judicial opinions, thus meeting the requirements of actual copying and substantial similarity that Thomson Reuters needed to prove.[20]
Moving on to the analysis of fair use, when it comes to the purpose and character of the use of the works, the judge ruled that Ross's use was not transformative, as it “does not have a further purpose or different character from Thomson Reuters's”. This is because Ross used Thomson Reuters' headnotes as training data to create a legal research tool that would compete with Westlaw, thereby developing a market substitute. Moreover, the copying was commercial. Whether Ross acted in good or bad faith was not assessed, but Judge Bibas clarified that the conclusion would have been the same even if Ross had acted in good faith.[21]
Turning to the nature of the copyrighted work, the judge pointed out that the protected material “is not that creative”.[22]
Then, when analysing the amount and substantiality of the portion used, Judge Bibas rejected Ross's argument that the headnotes taken amounted to only a small percentage of all Westlaw's headnotes, since the percentage of a work taken is not decisive. Even so, this factor favoured Ross, given that the headnotes had not been made available to the public.[23]
Regarding the effect of use upon the market for or value of the copyrighted work, the judge concluded that Ross's use of headnotes had a negative impact on both the “original market”, i.e. legal research platforms, and the potential derivative market for data to train (legal) AIs. In this regard, Ross failed to prove that the second market does not exist or would not be harmed. Furthermore, the judge noted that the potential benefit to the public does not tip the balance in Ross's favour, since “legal opinions are freely available, and the public's interest in the subject matter is not enough”.[24]
All in all, the judge emphasised that the first and fourth factors carry the most weight in the analysis. Although factors two and three of the fair use analysis favoured Ross, factors one and four, as well as the overall balance, favoured Thomson Reuters.[25]
While this is certainly an important precedent, as the judge pointed out, this is not a case dealing with generative AI, the outcome of which may differ. Whether the use of copyrighted content constitutes fair use must be assessed based on the specifics of each case, and this could still favour the training of ML models that power systems or applications which do not compete with those exploited by the rightholders.
Anthropic
Subsequently, on 23 June 2025, the District Court for the Northern District of California issued another ruling on the matter, this time regarding generative AI (Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. June 23, 2025)). The facts of the case were as follows: Anthropic offers Claude, an AI software service that generates text in response to users' prompts using LLMs (large language models). Anthropic created a central research library comprising pirated copies of over seven million books, as well as purchased copies. The pirated copies originated from shadow libraries such as Books3, Library Genesis (LibGen), and Pirate Library Mirror (PiLiMi). The purchased copies were digitised versions of print copies obtained from major book distributors and retailers. Some of these copies were used to train the aforementioned LLMs. After evaluating potential legal issues, Anthropic decided not to use certain copies for this purpose. Nevertheless, the library copies were retained as a permanent, general-purpose resource. It was in this context that three authors sued, arguing that Anthropic had infringed their copyright by pirating copies for its library and using them to train its LLMs.[26]
Before outlining Judge Alsup's reasoning regarding the existence of fair use in this case, it is worth noting that Anthropic argued that pirating copies of books should be justified as they were reasonably necessary for training its LLMs. However, Judge Alsup did not accept this argument, but instead evaluated, on the one hand, the legitimacy of training LLMs with copies of books, and on the other, the legitimacy of creating a central research library. Regarding the latter, he first considered the legitimacy of digitising books purchased in print form to contribute to the library and, secondly, the contribution of pirated copies.[27]
Let us begin by analysing the legality of the copies used to train LLMs. It should be noted that only the lawfulness of operations in the input phase was questioned, not those in the output phase. This is because, when the LLMs were incorporated into a publicly accessible version of Claude, additional software filtered both user prompts and outputs. Consequently, no infringing content was provided to users.[28]
Regarding the purpose and character of the use of the works in the input phase, the judge noted that, while authors may require users to pay for a copy of their work, they cannot charge users each time they read, remember or build upon it. Furthermore, while the authors claimed that the training aimed to enable the LLMs to memorise the creative elements of their works, Judge Alsup concluded that the LLMs had not reproduced these elements or the “identifiable expressive style” of any author, but rather had produced grammar, composition and style drawn from many different works. Overall, the judge found that the purpose and character of using copyrighted works was “spectacularly” transformative. In the judge's words, “like any reader aspiring to be a writer, Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them, but to turn a hard corner and create something different”. What's more, according to Judge Alsup, copies reasonably required for training purposes, e.g. within the LLMs, would also be used in a transformative way.[29]
Moving on to the nature of the copyrighted work, the judge considered that it points against fair use, without delving too deeply into this issue, and bearing in mind that the defendants have already acknowledged that the books used for Claude's training featured expressive elements.[30]
Regarding the amount and substantiality of the portion used, the judge held that even if billions of works had been copied in their entirety, all the copying was reasonably necessary for the transformative use in question. This conclusion took into account the absence of a traceable link between Claude's outputs and the works, and the fact that the copies made by Anthropic fell outside the scope of the ordinary use of the books. At this point, it is interesting to note that the authors argued that Anthropic could have used different books, or even no books, to train LLMs. Although they acknowledged that training any LLM requires a large amount of text, they claimed that using their books was not “reasonably necessary”. However, the judge pointed out that “reasonably necessary” does not equate to “strictly necessary”, and that “because using so many works was reasonably necessary, using any one work for actually training LLMs was about as reasonable as the next”.[31]
Lastly, as regards the effect of use upon the market for or value of the copyrighted work, Judge Alsup also considered that it favours fair use, since the training and development of LLMs do not displace demand for books. The plaintiffs emphasised that the development of LLMs could result in “an explosion of works competing with their works”. Nevertheless, the judge made the noteworthy statement that “Authors' complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works”, and that, consequently, this type of displacement falls outside the scope of the Copyright Act. The judge also ruled that, while a market for licensing works for training LLMs may develop, “this market is not one the Copyright Act entitles Authors to exploit.”[32]
Overall, the general balance and factors one, three and four support the fair use defence for Claude's training. Furthermore, the judge stated that this technology is “among the most transformative many of us will see in our lifetimes”.[33]
Turning to the discussion of the legality of the copies used to build a central research library, Judge Alsup found that changing the format of each purchased copy from print to digital for storage and searchability purposes was transformative. The digital copies replaced the physical ones and were not used to create new copies for sale or external sharing.[34] In contrast, Judge Alsup held that downloading pirated digital copies to create a central, general-purpose library as a substitute for paid copies was not transformative. Indeed, the judge added that “this order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use”. This would apply even if some of these copies were immediately used for further transformative purposes.[35]
Secondly, the nature of the copyrighted work counts against fair use for all copies, whether legal or pirated.[36]
Regarding the amount and substantiality of the portion used, the judge ruled that copying the entire purchased works was necessary to store and search for them more effectively. For pirated copies, however, the conclusion was the opposite.[37]
When it comes to the effect of the use upon the market for or value of the copyrighted work, Judge Alsup concluded that the change in format of the copies does not hinder any market that the Copyright Act reserves for the authors to exploit, bearing in mind again that the library would be used solely internally. Therefore, this factor was neutral with regard to fair use.[38] In turn, the copies obtained from pirated sources did indeed displace demand for the authors' books. Furthermore, the judge emphasised that what the defendants claim to be fair use would amount to permission to “steal a work you could otherwise buy so long as you at least loosely intend to make further copies for a purportedly transformative use without any accountability”. Allowing this would, consequently, “destroy the entire publishing market”.[39]
Therefore, with factors one and three in favour of fair use, factor four neutral and factor two against, converting purchased print copies into digital library copies was held to be fair use. Conversely, with all the factors against it, downloading pirated copies to build a central library is not justified by the fair use doctrine.[40]
Kadrey v. Meta
On 25 June 2025, the District Court for the Northern District of California issued another ruling concerning generative AI (Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025)). This time, thirteen authors sued Meta for downloading copies of their books from various shadow libraries, such as LibGen and Anna's Archive, and using them to train different versions of Llama, Meta's LLM. Meta attempted to obtain licences, but this proved more challenging than anticipated. Therefore, Meta decided to use the pirated copies to train Llama. The plaintiffs focused on two primary theories of harm. The first one is that Llama can reproduce snippets of text from their books. Nevertheless, it should be noted that Meta adopted “mitigation measures”, so that different versions of the Llama model cannot generate more than 50 tokens from the plaintiffs' books. The second theory is that Meta has impaired the market for licences to use copyrighted works for training LLMs. Nevertheless, Judge Chhabria labelled these theories as “losers” for reasons that will be presented below. He also considered that the “potential winning argument”, which is that Meta copied the books to develop a product that could flood the market with competing works, had not been adequately put forward and evidenced by the plaintiffs.[41]
Before examining the four factors of fair use, a few points must be highlighted. Firstly, Judge Chhabria criticised some of the arguments used by Judge Alsup in Anthropic. According to Judge Chhabria, Judge Alsup “focused heavily on the transformative nature of generative AI while brushing aside concerns about the harm it can inflict on the market for the works it gets trained on”. What's more, Judge Chhabria held that Judge Alsup's view that such harm would be no different to that caused by using works to teach schoolchildren to write well is misguided due to AI's capabilities.[42] Secondly, Judge Chhabria noted that some argue that courts should not rule against tech companies, so as to avoid hindering AI development. However, he stated that “if using copyrighted works to train the models is as necessary as the companies say, they will figure out a way to compensate copyright holders for it”.[43] Thirdly, the judge emphasised that the ruling only concerns the plaintiffs. It does not rule that Meta's use of copyrighted materials to train its LLMs is lawful. Instead, the ruling rejects the plaintiffs' arguments and finds that they failed to provide sufficient evidence to support a dilution of the market.[44]
Regarding the purpose and character of the use of the works in the input phase, the judge ruled that Meta's use of the books was “highly transformative”. In this sense, one must bear in mind that Llama can perform several functions, such as generating new text, editing emails and translating. The plaintiffs argued that Llama's outputs could replicate parts of their works or writing style upon prompting, but it was shown that Meta had adopted measures to avoid regurgitation. Moreover, style is not copyrightable. Although Meta's use is commercial, the judge recalled that this does not tip the balance against the company in the case at hand. The next issue discussed was whether Meta's downloading of copies of the books from shadow libraries would count against fair use. Using shadow libraries could influence the determination of whether Meta acted in bad faith. However, Judge Chhabria stated that, even if this were the case, bad faith would not be decisive in this analysis. Despite this, he added that downloading from shadow libraries could become relevant if it benefited those who created them, for example, through ad revenue from visits. Still, the plaintiffs did not prove the latter. Additionally, Judge Chhabria considered that, while downloading the books and Meta's use of the copies to train Llama are different uses, the former must be evaluated in light of “the ultimate, highly transformative purpose of training Llama”. Even the initial copies that Meta made from LibGen to check whether the books were suitable for training purposes were regarded as a “reasonable first step towards training”. Plaintiffs also alleged that not all downloaded copies had been used for training. Yet, the judge ruled that this had not been proven, and that in any case, “fair use doesn't require that the secondary user make the lowest number of copies possible”. For all these reasons, this factor favoured Meta.[45]
In terms of the nature of the copyrighted work, the books are highly expressive, and even if models learn statistical relationships, these are “the product of creative expression”. Thus, this factor favoured the plaintiffs.[46]
When it came to the amount and substantiality of the portion used in relation to the copyrighted work, the judge ruled that copying the entire books was reasonably necessary for training Llama. Thus, this factor also favoured Meta. The fact that Llama does not generate infringing outputs was important in reaching this conclusion.[47]
Finally, concerning the effect of use upon the potential market for or value of the copyrighted work, reference must be made again to the theories of harm brought by the plaintiffs. The first theory was that Llama's outputs may regurgitate parts of the books. Nonetheless, as Llama cannot regurgitate more than 50 tokens, the judge dismissed this theory.[48] The second theory was that Meta's unauthorised use of books harms the market for licensing books for training LLMs. In this regard, the judge determined that this market “is not one that the plaintiffs are legally entitled to monopolise.”[49] The third alleged harm was that the rapid generation of non-infringing works competing with the originals on the same topics or in the same genres could lead to market dilution. This effect is usually not that significant when evaluating the fourth factor. However, the judge argued that since other technologies cannot match the capability of generative AI to “flood the market with competing products”, this factor should become “highly relevant” here. Although the plaintiffs did not develop and present sufficient evidence to support this theory, Judge Chhabria offered some valuable insights that could inform future plaintiffs on how to present their cases more successfully. To begin with, depending on the type of work, the degree of market dilution could vary; it would be greater when these works are more functional, and probably lesser when they require more creativity. Then, a series of relevant questions would arise, such as whether Llama is capable of generating books, or will be in the near future; whether these books would compete with those of the plaintiffs; what the impact of competition would be on the sales of the plaintiffs' books, and how these effects are likely to increase in the future.
Last but not least is the question of “how does the threat to the market for the plaintiffs' books in a world where LLM developers can copy those books compare to the threat to the market for the plaintiffs' books in a world where developers can't copy them?”. This would be the situation, for example, if the LLM were only trained using public domain or non-copyrighted texts.[50] As discussed in the evaluation of the first factor, the possibility that Meta could benefit pirate libraries and their users by downloading books was explored. While this would indeed be important for this factor as well, plaintiffs did not prove that this was the case or that Meta was supporting or encouraging the widespread use of shadow libraries.[51] Additionally, Llama may help its users inter alia to create new expressions, so public benefit considerations also favoured Meta.[52]
Altogether, Meta won the summary judgment on fair use, with all factors except the second in its favour.
A few points to reflect on
These cases provide some insight into where things might be headed in terms of applying the fair use doctrine to ML training and generative AI deployment. That said, many questions remain unanswered, and it may be years before more clarity is achieved on how companies developing ML-powered AI systems can legitimately use copyrighted works.
A key point to emphasise is that fair use is a flexible doctrine. The factors set out in Section 107 of the Copyright Act are evaluated according to the specific facts presented to the court. Therefore, the outcome may differ depending on whether the AI system/ML model trained using protected works competes with a product or service created by the authors of those works, as was the case with Ross Intelligence, or if the ultimate purpose is entirely different. On this note, I have in mind scenarios where images are used to train models to distinguish between traffic signals, animals and even pedestrians. To the best of my knowledge, such cases have not yet been brought before a court. In any case, the hot topic remains what happens in cases involving generative AI.
Of the generative AI cases analysed, only the legality of the actions undertaken in the “input phase” was called into question, as the systems powered by LLMs did not generate infringing content. Despite this, both Judge Alsup and Judge Chhabria made it clear that, should this change, the authors could bring the matter before the court. Therefore, if measures are not taken to prevent regurgitation, and generative AI systems generate outputs that are similar to, or replicate, the original protected works used in the training phase, these may not be considered fair use. For now, though, there has been no opportunity to observe how the courts would react to this scenario.
Whether it is legitimate to download copies of protected works from shadow libraries for subsequent use in ML training is far from clear. In fact, Judges Alsup and Chhabria have taken different positions on the matter. In Anthropic, Judge Alsup addressed the legality of creating a “long-term” central library and of training ML models with its content separately. The facts of this case differ from those that gave rise to the Kadrey v Meta ruling, since in the latter, pirated copies were initially downloaded and then “immediately” used for training ML models. In this regard, Judge Chhabria assessed the legality of the former based on the ultimate purpose for which they were used. However, based on my reading of Anthropic, even if those facts had been presented to Judge Alsup, the outcome would likely have diverged from that handed down by Judge Chhabria.
Much debate has centred on whether the idea-expression dichotomy is an adequate basis for determining whether using works for ML training constitutes infringement. In my doctoral thesis, I argued that sometimes it is not. It is true that, in the training of some ML models, works are merely used as data. For example, images may simply be used to teach a model to distinguish between different traffic signals or between a dog and a cat. Here, the “expressive” or “artistic” value of the images is irrelevant. Conversely, some models that power Generative AI systems do aim to learn from the expressive elements of works. Indeed, in their analysis of the second factor, both Judge Alsup and Judge Chhabria highlight that the books used to train the LLMs were chosen for their quality and expressive elements.
Regarding market displacement in the fourth factor analysis, Judge Alsup's and Judge Chhabria's arguments differ greatly. Judge Alsup “encourages” authors to file a lawsuit if LLMs generate infringing outputs, which would lead to a displacement of demand for copies of the authors' books. However, he does not consider the “indirect market substitution” to be as relevant to the analysis as Judge Chhabria does. In my doctoral thesis, I suggested that it would be interesting to explore how unfair competition law could address market dilution due to the use of generative AI, rather than focusing on copyright when the outputs do not infringe it. In any case, this remains a highly controversial issue.
Next, with regard to the fourth factor, it is unclear on which market the negative actual or potential impact should be evaluated. Judge Bibas found the market for licences of works for AI training relevant, whereas Judges Alsup and Chhabria did not.
Against this background, I return to the question of whether the doctrine of fair use is more permissive than the TDM exceptions set out in the DSM Directive — especially the TDM exception in Article 4, which covers cases where reproductions are made for commercial purposes. My answer is yes. While the debates and the General Purpose AI (GPAI) Code of Practice focus on ensuring that GPAI model providers comply with copyright law, the TDM exceptions have a broader application. In other words, according to the legal text, it makes no difference whether a model is trained using protected works to generate new creative material or for object recognition. Additionally, there is a requirement for “lawful access” to the works, and under Art. 4 of the DSM Directive, the “opt-out mechanism” must likewise be respected, with all the complications that this entails. There is not much room for flexibility (see the previous post “The arduous application of the exceptions provided for in EU copyright law in the context of training machine learning models”). Following the Anthropic ruling, access to the works must be lawful. Beyond that, if the AI system does not regurgitate, training an LLM should, in principle, be permissible. Conversely, if the Kadrey v. Meta ruling is followed, lawful access to the works for subsequent ML model training is less critical given the highly transformative nature of the latter purpose. But if indirect market substitution occurs, which must be proven on a case-by-case basis, authors must be asked for a licence. Fair use certainly does not constitute a “free pass” for using copyrighted works to train ML models. Nevertheless, it provides more room for manoeuvre and, in my opinion, is better suited to addressing the challenges posed by the use of copyrighted works in training ML models and deploying AI systems, whether generative or non-generative.
If I have sparked your interest in this topic and you would like to find out more, here are some good resources:
Brauneis, Robert: ‘Copyright and the Training of Human Authors and Generative Machines’ (2024) <https://scholarship.law.gwu.edu/faculty_publications/1751/>.
Carroll, Michael W.: ‘Copyright and the Progress of Science: Why Text and Data Mining Is Lawful’ (2019) 53 (893) UC Davis Law Review 893, 964.
Mǎrginean, Maria Alexandra: ‘Copyright Infringement & AI: A Case Study of Authors Guild v. OpenAI and Microsoft’ (2024) <https://www.4ipcouncil.com/research/copyright-infringement-and-ai-case-study-authors-guild-v-openai-and-microsoft>.
Sag, Matthew: ‘The New Legal Landscape for Text Mining and Machine Learning’ (2019) <https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3331606>.
Sag, Matthew: ‘Copyright Safety for Generative AI’ (2023) <https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4438593>.
Sag, Matthew: ‘Fairness and Fair Use in Generative AI’ (2024) <https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4654875>.
Samuelson, Pamela: ‘Fair Use Defenses in Disruptive Technology Cases’ (2023) <https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4631726>.
Sobel, Benjamin: ‘Artificial Intelligence’s Fair Use Crisis’ (2017) 41(45) Colum. J.L. & Arts 45, 97.
[1] Campbell v. Acuff-Rose Music, Inc. 510 U.S. 569 (1994) 579.
[2] Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith et al., 598 U.S. 508, 143 S.Ct. 1258 (2023) 15, 16.
[3] Campbell v. Acuff-Rose Music, Inc. 510 U.S. 569 (1994) 579 and 584; Google LLC v. Oracle Am., Inc. 141 S. Ct. 1163 (2021) 27; Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith et al., 598 U.S. 508, 143 S.Ct. 1258 (2023) 18.
[4] Google LLC v. Oracle Am., Inc. 141 S. Ct. 1163 (2021) 28.
[5] Campbell v. Acuff-Rose Music, Inc. 510 U.S. 569 (1994) 586; Google LLC v. Oracle Am., Inc. 141 S. Ct. 1163 (2021) 24.
[6] Google LLC v. Oracle Am., Inc. 141 S. Ct. 1163 (2021) 28.
[7] Campbell v. Acuff-Rose Music, Inc. 510 U.S. 569 (1994) 587.
[8] Authors Guild v. Google, Inc., No. 13-4829 (2d Cir. 2015) 29, 30.
[9] Authors Guild v. Google, Inc., No. 13-4829 (2d Cir. 2015) 31.
[10] Campbell v. Acuff-Rose Music, Inc. 510 U.S. 569 (1994) 590.
[11] Google LLC v. Oracle Am., Inc. 141 S. Ct. 1163 (2021) 30, 31.
[12] Google LLC v. Oracle Am., Inc. 141 S. Ct. 1163 (2021) 31.
[13] Campbell v. Acuff-Rose Music, Inc. 510 U.S. 569 (1994) 577.
[14] Google LLC v. Oracle Am., Inc. 141 S. Ct. 1163 (2021) 14.
[15] Harper & Row v. Nation Enterprises 471 U.S. 539 (1985); Authors Guild v. Google, Inc., No. 13-4829 (2d Cir. 2015) 16.
[16] Authors Guild v. Google, Inc., No. 13-4829 (2d Cir. 2015) 16, 41.
[17] Thomson Reuters Enter. Ctr. GmbH v. Ross Intel. Inc., No. 1:20-cv-613-SB, 2025 WL 458520 (D. Del. Feb. 11, 2025) p. 2, 3.
[18] Ibid.
[19] Thomson Reuters Enter. Ctr. GmbH v. Ross Intel. Inc., No. 1:20-cv-613-SB, 2025 WL 458520 (D. Del. Feb. 11, 2025) p. 4, 8.
[20] Thomson Reuters Enter. Ctr. GmbH v. Ross Intel. Inc., No. 1:20-cv-613-SB, 2025 WL 458520 (D. Del. Feb. 11, 2025) p. 9, 14.
[21] Thomson Reuters Enter. Ctr. GmbH v. Ross Intel. Inc., No. 1:20-cv-613-SB, 2025 WL 458520 (D. Del. Feb. 11, 2025) p. 16, 20.
[22] Thomson Reuters Enter. Ctr. GmbH v. Ross Intel. Inc., No. 1:20-cv-613-SB, 2025 WL 458520 (D. Del. Feb. 11, 2025) p. 20.
[23] Thomson Reuters Enter. Ctr. GmbH v. Ross Intel. Inc., No. 1:20-cv-613-SB, 2025 WL 458520 (D. Del. Feb. 11, 2025) p. 20, 21.
[24] Thomson Reuters Enter. Ctr. GmbH v. Ross Intel. Inc., No. 1:20-cv-613-SB, 2025 WL 458520 (D. Del. Feb. 11, 2025) p. 21, 22.
[25] Thomson Reuters Enter. Ctr. GmbH v. Ross Intel. Inc., No. 1:20-cv-613-SB, 2025 WL 458520 (D. Del. Feb. 11, 2025) p. 16.
[26] Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. Aug 19, 2024) p. 1, 9.
[27] Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. Aug 19, 2024) p. 9, 11.
[28] Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. Aug 19, 2024) p. 7.
[29] Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. Aug 19, 2024) p. 11, 14.
[30] Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. Aug 19, 2024) p. 24.
[31] Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. Aug 19, 2024) p. 25, 26.
[32] Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. Aug 19, 2024) p. 28, 29.
[33] Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. Aug 19, 2024) p. 30.
[34] Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. Aug 19, 2024) p. 14, 18.
[35] Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. Aug 19, 2024) p. 18, 25.
[36] Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. Aug 19, 2024) p. 25, 26.
[37] Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. Aug 19, 2024) p. 26, 27.
[38] Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. Aug 19, 2024) p. 29.
[39] Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. Aug 19, 2024) p. 29, 30.
[40] Bartz et al. v. Anthropic PBC, No. 3:24-cv-05417 (N.D. Cal. Aug 19, 2024) p. 30, 31.
[41] Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025) p. 8, 15.
[42] Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025) p. 3.
[43] Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025) p. 3, 4.
[44] Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025) p. 5.
[45] Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025) p. 15, 22.
[46] Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025) p. 23, 24.
[47] Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025) p. 24, 25.
[48] Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025) p. 26, 27.
[49] Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025) p. 27, 28.
[50] Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025) p. 28, 35.
[51] Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025) p. 35, 37.
[52] Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417-VC (N.D. Cal. June 25, 2025) p. 37, 39.
