Legitimate interest as a legal basis for processing personal data when developing and deploying artificial intelligence
GENERAL DATA PROTECTION REGULATION
11/8/2025 · 25 min read
It is well known that the machine learning (ML) models powering many Artificial Intelligence (AI) systems require vast amounts of data during training, much of which is personal. Personal data is also processed when AI systems are deployed, for example, during interactions with users. According to Art. 6 of the General Data Protection Regulation (GDPR), processing is unlawful if the data controller does not have a legal basis for it. Under Art. 83(5) GDPR, this can result in administrative fines of up to EUR 20,000,000 or, in the case of an undertaking, up to 4% of the total worldwide annual turnover of the preceding financial year, whichever is higher.
All bases contemplated in Art. 6 GDPR are on an equal footing. However, since the GDPR came into force, consent has been the most popular, as it seems to give data subjects more decision-making power and control over their personal data. When personal data is processed to train an ML model, it can be difficult to rely on consent because of the huge quantity of data required and the fact that data is not always collected directly from the data subjects. For this reason, “legitimate interest” is gaining prominence. According to Art. 6(1)(f) GDPR, processing is lawful if it is “necessary for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject which require protection of personal data, in particular where the data subject is a child”. This post explores how this provision is evaluated in the context of AI development and deployment. The first step is to identify how the purpose of data processing is defined in this field, since, according to Art. 5(1)(b) GDPR, personal data must be “collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes”.
Purpose of the processing
In general, two phases can be clearly distinguished: the development and the deployment of AI systems. The European Data Protection Board (EDPB) states that the former covers “all stages before any deployment of the AI model, and includes, inter alia, code development, collection of training personal data, pre-processing, and training”.[1] The latter, in turn, encompasses “all stages relating to the use of an AI model and may include any operation conducted after the development phase”.[2] Thus, different data processing operations may be carried out by the same or different data controllers in each phase, and these operations may serve the same or different purposes. This must be evaluated on a case-by-case basis.[3] It should be clear that developing an AI system for its own sake is not a valid purpose: an AI system must be developed for specific purposes, such as fraud detection in the banking sector. Furthermore, when deployed, an AI system is an element of the data processing and will be included in one or more phases.[4] Sometimes it is possible to determine the operational purpose of the AI system's deployment from the development phase. Nonetheless, this can be challenging, particularly in the case of “general-purpose AI models”. Still, the data controller must provide context and information about the processing, such as the model's type, expected functionalities and capabilities, intended use (internal or external) and purpose (research or commercial).[5] In this regard, the French Data Protection Authority (Commission Nationale de l'Informatique et des Libertés, CNIL) indicates that a lower degree of precision is acceptable when the purpose is to develop an AI system for scientific research.[6]
Legitimate interest
As indicated above, the data controller may invoke Art. 6(1)(f) GDPR to legitimise the processing of personal data. This provision must be interpreted restrictively and requires data controllers to be accountable and committed to privacy. They must conduct a thorough analysis of the risks involved in processing personal data and, where appropriate, implement measures that go beyond those set out in the GDPR.[7]
For Art. 6(1)(f) GDPR to serve as a valid legal basis for processing personal data, three requirements must be met cumulatively:
The controller or a third party must pursue a legitimate interest. An interest is defined as “the broader stake or benefit that a controller or third party may have in engaging in a specific processing activity”.[8] It must also be related to the controller's actual activities. For the interest to be legitimate, it must, cumulatively, be:
o Lawful.
o Clearly and precisely articulated.
o Real, present, and not speculative at the time of data processing.[9]
Interests that can be considered legitimate are diverse and of different kinds, including commercial ones. Examples include improving a product, direct marketing, accessing online information, and detecting and preventing fraud.[10]
The processing of personal data must be necessary for the legitimate interest(s) pursued. “Necessary” means that the legitimate interest “cannot reasonably be achieved just as effectively by other means less restrictive of the fundamental rights and freedoms of the data subjects”.[11] This requirement must be examined alongside the data minimisation principle of Art. 5(1)(c) GDPR, which states that personal data shall be “adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed”.[12]
The interests, fundamental rights and freedoms of the data subjects concerned must not take precedence over the legitimate interest(s) of the controller or a third party. This means that, before processing personal data, controllers must carry out a balancing exercise.[13] To do this, they must identify and describe:
o The interests of the data subjects (including financial, social and personal), as well as their fundamental rights and freedoms (such as the right to data protection and privacy, liberty and security, freedom of expression and information, physical and mental integrity, and the right not to be subjected to discrimination).[14]
o The impact of processing on data subjects, both positive and negative. The greater the anticipated benefits for not only the controller but also society, the more likely it is that the controller's legitimate interest will prevail.[15] The following factors should be assessed:
The nature of the data to be processed, including whether special categories of data (Art. 9 GDPR), data relating to criminal convictions and offences (Art. 10 GDPR), and data considered more private by data subjects are involved.[16]
The context of the processing and the processing methods. The controller must take into account the following, among other things:
· The scale of the processing and the amount of personal data to be processed.
· The status of the controller, including in relation to the data subject.
· Whether the personal data is combined with other datasets.
· The degree to which the data to be processed is accessible and/or publicly available.
· The status of the data subjects, e.g. whether they belong to vulnerable groups or are children.[17]
Any further consequences of the processing on the data subjects, such as:
· Potential future decisions or actions by third parties.
· Possible legal effects.
· Exclusion or discrimination.
· Defamation, risk of damaging reputation, negotiating power or autonomy.
· Financial losses.
· Exclusion from services.
· Risks to freedom, safety, physical and mental integrity.[18]
In this context, it must be analysed how likely these consequences are to materialise, the specific circumstances of the processing and the technical and organisational measures in place.[19]
The reasonable expectations of the data subjects. This requires an assessment of the characteristics of the relationship between the data subjects and the controller, as well as the characteristics of the average data subject. A direct link may allow the controller to easily provide data subjects with information on processing activities.[20]
o The final balancing of opposing rights and interests, including the possibility of implementing further mitigating measures.[21]
Legitimate interest in the development and deployment of artificial intelligence systems
This section explores how the aforementioned criteria could be evaluated in the context of developing and deploying AI systems. It should be emphasised that the analysis presented here is merely indicative: it includes aspects that are not relevant to some processing operations and most likely omits others that are necessary in certain cases. The relevant criteria depend on the type of AI system being developed and used, and the analysis should be adapted to the specific case.
The controller or a third party must have a legitimate interest. In this context, legitimate interests could include developing a conversational agent service to assist users, developing an AI system to detect fraudulent content or behaviour, improving threat detection in an information system, or offering new functionalities to service users.[22] Legislation such as the Digital Services Act (DSA)[23] and the AI Act[24], as well as sector-specific legislation, must be consulted to avoid developing and deploying AI systems that contradict the prohibitions imposed therein.[25] There may also be ongoing legitimate data processing activities into which an AI system is subsequently incorporated. While the purpose of the processing does not change, the risks posed by integrating the system must be analysed and a new balancing of rights must be carried out, as we will see below.[26]
The processing of personal data must be necessary for the legitimate interest(s) being pursued. While data controllers should explore technological solutions that allow models to be developed with less personal data, or with synthetic or anonymised data, this is not always feasible. Processing personal data may therefore still be crucial to achieving the project's goals; for example, it may be needed to mitigate potential biases and errors that could otherwise lead to undesirable consequences during deployment.[27] In any case, it should be borne in mind that accumulating data does not necessarily improve its quality or guarantee its fitness for purpose. To this end, the data must contain relevant patterns and must be representative of the population targeted by the AI system in question. Therefore, the criteria to be followed in the collection process must be established before collection begins. Furthermore, the amount of data required for each phase of the AI system's life cycle may vary.[28] Also, when addressing a “known problem” that has not previously been tackled with AI, the use of AI and the change in approach must be justified, and an explanation must be provided as to why AI is preferable to other possible technologies.[29] Similarly, when addressing a “new problem”, the reason for using an AI system must be given.[30] Moreover, although legal debates sometimes refer only to AI or ML, it is important to emphasise that many approaches exist. Consequently, the rationale behind selecting a specific technique must be substantiated.[31]
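By way of illustration, the following minimal sketch (in Python, with purely hypothetical field names and a made-up record) shows what a pre-training data minimisation and pseudonymisation step could look like: direct identifiers are never collected, an identifier needed only for deduplication is replaced by a keyed hash, and only the fields defined as necessary before collection are kept.

```python
# Minimal sketch of a pre-training minimisation/pseudonymisation step.
# Field names and the sample record are hypothetical assumptions.
import hashlib

DIRECT_IDENTIFIERS = {"name", "email", "phone"}      # never collected
PSEUDONYMISED_FIELDS = {"user_id"}                   # replaced by a keyed hash
NEEDED_FOR_PURPOSE = {"message_text", "timestamp"}   # defined before collection

def pseudonymise(value: str, secret_salt: str) -> str:
    """Keyed hash; the salt must be stored separately from the training data."""
    return hashlib.sha256((secret_salt + value).encode("utf-8")).hexdigest()

def minimise_record(record: dict, secret_salt: str) -> dict:
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue                                  # data minimisation
        if field in PSEUDONYMISED_FIELDS:
            out[field] = pseudonymise(str(value), secret_salt)
        elif field in NEEDED_FOR_PURPOSE:
            out[field] = value
    return out

raw = {"name": "A. N. Example", "email": "a@example.org", "user_id": "12345",
       "message_text": "hello", "timestamp": "2025-01-01"}
print(minimise_record(raw, secret_salt="store-me-elsewhere"))
```

Note that hashing with a secret salt is pseudonymisation, not anonymisation: the output remains personal data under Art. 4(5) GDPR, so a step like this cannot, on its own, meet the high “anonymous model” standard discussed later in this post.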
The interests or fundamental freedoms and rights of the data subjects concerned do not take precedence over the legitimate interest(s) of the controller or a third party. The development and deployment of AI systems can result in positive outcomes for data subjects. These include better access to essential services, education and information, improved healthcare and personalised treatment, and support in complying with various legal requirements, such as the detection of illegal content online. Economic benefits may also be realised when AI is employed in economic activities to generate revenue.[32] However, there may also be negative consequences, so their likelihood and severity must be properly assessed. Such an assessment is influenced by the type of processing and the AI system involved. In the development phase, the following risks must be considered when collecting data: the risk of users self-censoring due to a feeling of surveillance; the risk of loss of confidentiality of personal data in training datasets and trained models, and possible misuse in the event of a data breach (e.g. due to an attack); the risk of a lack of transparency towards data subjects owing to complicated technology and the “black box” nature of some AI systems; and technical and organisational difficulties in guaranteeing the exercise of certain rights provided for in Chapter III GDPR (e.g. because of a long data or model chain or the complexity of making trained models forget specific data).[33] In the deployment phase, some risks to bear in mind are those relating to the memorisation, regurgitation or generation of personal data. These could result in reputational damage, the dissemination of sensitive or false information, identity theft, and disinformation. Furthermore, the potential security risks that data subjects could face must be determined, for example if AI systems are used maliciously or if biases in the training dataset are not adequately identified and lead to discrimination or to the amplification of inappropriate recommendations.[34]
As mentioned above, if special categories of data are processed under Arts. 9 and 10 GDPR, or other data revealing highly private information, such as financial or location data, the EDPB considers that there may be a serious impact on data subjects.[35] Art. 9(1) GDPR prohibits processing personal data “revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person's sex life or sexual orientation”. Art. 9(2) GDPR sets out several exceptions. Only when the prohibition of Art. 9(1) GDPR has been lifted is it possible to carry out an analysis to determine whether there is a legal basis under Art. 6 GDPR that legitimises the processing. The CJEU has indicated that, where data is collected en bloc without it being possible to separate the data items from each other at the time of collection, this regime is applicable to a dataset that “contains at least one sensitive data item”.[36] In addition, when processing personal data, it must be determined whether the data in question allows information falling within one of the aforementioned categories to be revealed. This is independent of whether the revealed information is correct and of whether the controller intends to obtain information that falls within one of the special categories.[37] A broad and protective interpretation has thus been adopted. Given the vast quantities of data processed when collecting data and training AI systems, it is highly likely that sensitive data, or data from which sensitive information can be inferred, will be involved. Nevertheless, it should be noted that this is not usually the intention of the processing; rather, such data is often collected or inferred incidentally and residually. While the question of whether the intention of the processing should be factored in when applying Art. 9 GDPR is highly debated, in my humble opinion, this should be the approach rather than applying Art. 9(1) GDPR literally. As we will see below, this does not mean that measures should not be taken to prevent the collection of special categories of data when their processing is unnecessary.[38]
Next, the case of GC and Others v. CNIL must be examined in this context. When deploying certain AI systems, including those powered by large language models (LLMs), it is possible that special categories of data will once again be processed in response to user prompts. This may occur either incidentally or intentionally; for instance, users may wish to obtain information about public figures. The aforementioned case concerns the processing of special categories of personal data by search engine operators. To the best of my knowledge, the CJEU has not yet analysed the situation under discussion here. Nevertheless, some of the principles applicable to search engine activity could be invoked. According to the CJEU, the processing prohibitions laid down in Arts. 9(1) and 10 GDPR apply to search engine operators in the same way as to any other controller. The specific features of their processing activities cannot exempt them from compliance. However, these features may affect the extent of the operators' responsibility and obligations under these provisions. Taking into account the responsibilities, powers and capabilities of search engine operators, the prohibitions of Arts. 9(1) and 10 GDPR can only apply to them “by reason of that referencing and thus via a verification, under the supervision of the competent national authorities, on the basis of a request by the data subject”.[39] The CJEU then stated that search engine operators must comply with requests from data subjects to remove links to webpages containing special categories of personal data, subject to the exceptions provided for in Art. 9(2) GDPR.[40] Search engine operators can thus refuse to remove content reflecting sensitive personal data if, inter alia, the data subjects have given their consent (which, at this stage, is difficult to obtain), if the data subjects have made their data manifestly public, or if “processing is necessary for reasons of substantial public interest, on the basis of Union or Member State law which shall be proportionate to the aim pursued, respect the essence of the right to data protection and provide for suitable and specific measures to safeguard the fundamental rights and the interests of the data subject”. Regarding whether data subjects have made their data manifestly public, the CJEU has emphasised the importance of establishing “whether the data subject had intended, explicitly and by a clear affirmative action, to make the personal data in question accessible to the general public”.[41] As for whether processing is necessary for reasons of substantial public interest, the CJEU has ruled that search engine operators must evaluate on a case-by-case basis whether processing special categories of personal data is “strictly necessary for protecting the freedom of information of internet users potentially interested in accessing that webpage by means of such a search, protected by Art. 11 of the Charter of Fundamental Rights”.[42] In striking this balance, consideration must be given to the nature of the information in question, its sensitivity for the data subject's private life, and the public interest in the information, which varies depending on the data subject's role in public life.[43] The same could apply to LLM operators when their outputs reflect sensitive information about public figures.[44]
Lastly, it should be noted that Art. 10 AI Act states that “to the extent that it is strictly necessary for the purpose of ensuring bias detection and correction in relation to the high-risk AI systems in accordance with paragraph (2), points (f) and (g) of this Article, the providers of such systems may exceptionally process special categories of personal data, subject to appropriate safeguards for the fundamental rights and freedoms of natural persons”, provided that certain conditions set out in the same provision are met.
In terms of the context of processing and the methods employed, the scale and scope of the data to be processed, the number of data subjects affected and their relationship with the controller, the nature of the model, and the intended operational uses all play a key role.[45]
With regard to any further consequences of the processing on the data subjects, it is important to adopt measures to avoid certain potential uses of models, especially when they can be deployed for various operational purposes, as we will see below.[46]
Concerning the reasonable expectations of the data subjects, two situations must be differentiated: when the data is collected directly from the data subjects and when it is collected from third parties. In the former case, the relationship between the controller and the data subject, the nature of the service and the context of the processing must be considered. As illustrated by the CNIL, users of an online coaching service do not expect the company to use their interactions with coaches to improve a model powering a conversational agent, given the expectation of confidentiality and the sensitive nature of the information shared.[47] When data is collected via web scraping, important aspects include whether the data subjects are aware that their personal data is online, the type of data published, the type of website from which the data is extracted, and its settings.[48] Additionally, the steps taken by the controller to inform data subjects about how their data are being processed must be evaluated, as well as the complexity of the technology at hand and its potential operational uses, capabilities and limitations.[49] Finally, when the AI system improves through interaction with users, it must be ascertained whether they are aware that they are providing their personal data for this purpose, as well as whether this improvement occurs only for interactions with the user in question or more generally.[50]
When it comes to mitigating measures, several types can be adopted at different stages. At the data collection stage, anonymisation is highly recommended. If this is not possible or appropriate for the type of processing, then pseudonymisation should be used instead.[51] It would also benefit the data controller to provide users with information on the selection and processing criteria for datasets, even beyond what is required by Arts. 13 and 14 GDPR.[52] Similarly, it is essential to facilitate the exercise of data subjects' rights. Particular attention has been drawn to the requirement to guarantee data subjects an unconditional and user-friendly opt-out from processing, which goes beyond the right to object set out in Art. 21 GDPR. Data subjects should also be given a reasonable amount of time to make a considered decision.[53] As previously mentioned, web scraping can be a fairly invasive form of processing due to its scale. Therefore, to mitigate the risks associated with this technique, limits must be imposed on data collection, and the criteria for it must be clearly defined. In this regard, categories of sensitive data, as well as data relating to children and vulnerable individuals, could be excluded from collection. The same applies to data whose processing would pose a high risk to data subjects. Special care should be taken to respect any objections made on websites regarding the processing of data using this technique or for specific purposes.[54]
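To make the web scraping safeguards more concrete, here is a minimal, purely illustrative Python sketch: it checks the site's robots.txt before fetching, thereby honouring the website's objection to automated collection, and skips pages matching exclusion criteria defined before collection begins. The user agent string, example URL and keyword list are assumptions for illustration, not a recommended filter.

```python
# Illustrative scraping sketch: respect robots.txt and apply pre-defined
# exclusion criteria before a page can enter a training corpus.
import urllib.request
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

EXCLUDED_KEYWORDS = {"health", "religion", "trade union", "minor"}  # assumed criteria

def allowed_by_robots(url: str, user_agent: str = "example-training-bot") -> bool:
    parsed = urlparse(url)
    rp = RobotFileParser()
    rp.set_url(f"{parsed.scheme}://{parsed.netloc}/robots.txt")
    rp.read()                          # fetch and parse the site's robots.txt
    return rp.can_fetch(user_agent, url)

def collect(url: str):
    if not allowed_by_robots(url):
        return None                    # respect the website's objection
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8", errors="ignore")
    if any(keyword in text.lower() for keyword in EXCLUDED_KEYWORDS):
        return None                    # crude exclusion of likely sensitive pages
    return text

page = collect("https://example.org/some-public-page")
```

A real pipeline would naturally need richer detection of sensitive or children's data than a keyword list, and should also document which sources were excluded and why, in line with the accountability principle.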
Subsequently, during the processing stage, it should be noted that, while training models with “traditional” encryption techniques can be challenging, techniques such as homomorphic encryption are being developed for this purpose.[55] The data controller should also consider other privacy-preserving technologies, such as “secure multiparty computation”, “privacy-preserving XGBoost algorithms”, and “federated learning”. Furthermore, measures should be taken to prevent personal data from being stored during the training process that could affect the privacy of data subjects at the deployment stage.[56] To this end, it would be best to ensure that the trained model is “anonymous”. Nonetheless, the standard set by the EDPB is high, and the concept of “anonymous” data is controversial.[57] Once the model has been trained, measures must be adopted to safeguard it against potential attacks and prevent the extraction and regurgitation of personal data.[58] The latter is particularly relevant when deploying generative AI systems. Providing a channel through which data subjects can report incidents could facilitate this task.[59] Implementing filters currently seems to be the most widespread solution to these issues. However, adopting unlearning and retraining techniques would better guarantee the rights of data subjects. It should be noted, though, that these techniques are still under development, and re-training an AI system every time a data subject exercises the right to erasure under Art. 17 GDPR seems disproportionate. Periodic retraining that incorporates a range of requests seems much more feasible. When the model is shared, measures must be taken to ensure that the rights of data subjects are respected throughout the chain, both technically and contractually. Therefore, traceability of both the model's users and its use becomes paramount.[60] To prevent misinformation and the malicious or unexpected deployment of generative AI from the perspective of data subjects, technical and contractual measures should also be employed to restrict certain uses of the model and to digitally watermark its outputs.[61] Furthermore, to enhance transparency in AI system deployment, the existence and auditability of log or activity files must be guaranteed, alongside the provision of information on risks and measures taken, as long as this does not compromise the confidentiality of data controllers.[62]
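As a simple illustration of the output filters mentioned above, the following hedged Python sketch redacts strings matching common personal data patterns (e-mail addresses and phone-like numbers) from a model's output before it is returned. The regular expressions and the generate() callable are assumptions for illustration; as noted, such filters operate on outputs only and would be combined in practice with stronger measures such as named entity detection, unlearning or periodic retraining.

```python
# Illustrative output filter: redact common personal-data patterns before
# returning generated text. Regexes and generate() are assumptions.
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().\-]{7,}\d")

def redact_personal_data(text: str) -> str:
    text = EMAIL_RE.sub("[redacted e-mail]", text)
    text = PHONE_RE.sub("[redacted number]", text)
    return text

def answer(prompt: str, generate) -> str:
    """`generate` stands in for whatever model call the deployer uses."""
    return redact_personal_data(generate(prompt))

# Usage with a stand-in model:
print(answer("How can I reach Jane?",
             lambda p: "Write to jane.doe@example.org or call +49 221 1234567."))
```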
Practical examples: an unsuccessful and a successful one
Having examined the factors to consider in the analysis of the legitimate interest, it is time to present a case in which the company developing an AI system could not rely on it, and another in which it could.
The first case is that of Clearview AI. Several data protection authorities fined the company for operating, without a legal basis, a facial recognition search engine built on a database of images collected from publicly available websites using web scraping techniques. The images were processed using biometric techniques to extract identifying features and then transformed into vectors. These representations were hashed for database indexing and subsequent searching. Biometric templates were then created and compared with those generated in the search phase, resulting in a one-to-many matching process. Each image was also annotated with associated metadata. When the software identified a match, it extracted all related images from the database and presented them to the service client as a search result. The images remained in the database even if the original photo or reference website was subsequently deleted or made private. Clearview's ML-based biometric search service was intended for specific customer groups, such as police forces. Clearview AI had an economic interest in this processing, but it did not outweigh the interests, fundamental rights and freedoms of the data subjects. The processing was especially intrusive, as multiple items of personal data, including biometric data, were collected about each individual, from a very large number of individuals, including minors, and various aspects of their private lives could be inferred from that data. Furthermore, data subjects could not reasonably expect this processing to occur, even if the data was publicly accessible. There was no relationship between Clearview AI and the data subjects, many of whom were not even aware of its existence or that its services could be contracted by law enforcement agencies. For all these reasons, Clearview's legitimate interest could not serve as a legal basis.[63]
By contrast, the Higher Regional Court of Cologne ruled that Meta had a legitimate interest in processing the personal data published by Facebook and Instagram users to develop and improve AI systems. This has been a significant case, so I will discuss it in more detail. The facts are as follows: on 10 June 2024, Meta announced in a press release that, from 26 June 2024, it would use both the data publicly posted by users over the age of 18 on Facebook and Instagram and those users' interactions with its AI model to train and improve it. In March 2024, Meta had informed the Irish data protection authority of its intention. Both this authority and a German non-profit association, among others, expressed concerns, and Meta announced that it would postpone its plans. Following several exchanges with the supervisory authority, Meta announced in a press release on 14 April 2025 that it would begin the processing on 27 May 2025. The Irish data protection authority did not prohibit it, but instead tasked Meta with reporting in October 2025 on the effectiveness and suitability of the mitigation measures taken. On 12 May 2025, the aforementioned German non-profit association sued Meta, requesting an injunction against the processing of personal data published by users on Facebook and Instagram. The association argued that, with these operations, Meta infringed Art. 5(2)(b) DMA, Art. 6(1)(f) GDPR and Art. 9(1) GDPR.[64]
Let's start with the first claim. As an introduction to the DMA, it is worth noting that some markets operated by digital platforms are characterised by strong network effects, economies of scale and scope, consumer lock-in, a lack of multi-homing and data-driven advantages. High barriers to entry, alongside an array of exploitative commercial practices, have made it challenging for existing or new market entrants to compete with incumbents. Neither market forces nor competition law have been sufficiently effective in ensuring broad competition in these markets. Consequently, the EU legislator adopted the DMA. This ex-ante regulation complements competition law enforcement and aims “to contribute to the proper functioning of the internal market by laying down harmonised rules ensuring for all businesses contestable and fair markets in the digital sector across the Union where gatekeepers are present, to the benefit of business users and end users”. Thus, the EU legislator has opted to intervene only in digital markets where contestability is weak due to the presence of "gatekeepers". Gatekeepers are undertakings that meet the quantitative requirements of Art. 3 DMA, and that provide certain digital services referred to as 'core platform services' (CPS) in Art. 2(2) DMA, to users established or located in the EU. Meta has been designated as a gatekeeper, with Facebook and Instagram being classified as CPS. To ensure fair and open digital markets, the DMA imposes several obligations on gatekeepers in Arts. 5, 6 and 7. Some of these obligations are aimed at prohibiting gatekeepers from raising barriers to entry or expansion, while others are aimed at lowering those barriers. Art. 5(2)(b) DMA prohibits gatekeepers from “combining personal data from the relevant core platform service with personal data from any further core platform services or from any other services provided by the gatekeeper or with personal data from third-party services”, unless the end user has been presented with the specific choice and has given consent. The issue here is whether combining partially de-identified and disaggregated data from Facebook and Instagram users to create a training dataset constitutes a combination prohibited by this provision. According to the Court, the answer is no, since the combination does not establish a specific connection between the data of the same person.[65] This is a topic that I would like to evaluate in more depth in another post.
Regarding Art. 6(1)(f) GDPR, the Court considers that Meta has a legitimate interest in “using the possibilities offered by generative AI to provide a conversational assistant that can, for example, provide real-time responses for chats, help with organising and planning holidays, and even help with writing texts”.[66] To this end, the AI will adapt to regional customs. It is also intended to create content such as texts, images and audio.[67] Processing personal data is necessary to achieve this interest, and Meta demonstrated that there is no other reasonable, less intrusive alternative. Total anonymisation was not possible in this case; using only the data that Meta collects from users when they interact with its AI would be insufficient in terms of volume to train and improve the model, which would worsen its results; and the results obtained using synthetic data would not be comparable. Furthermore, although the plaintiff argued that Meta should prove the necessity of each individual data point, the Court rightly held that this is not required, given that training an AI requires large amounts of data and individual data points have barely any quantifiable influence.[68]
Ultimately, Meta's interest in this case prevails over that of the data subjects. When examining the consequences of the processing, the Court notes that the lack of explainability and transparency of large generative AI models can affect data subjects' self-determination regarding their data. This is of particular importance given that the training datasets consist of large amounts of personal data from a significant proportion of the German population and that the AI is available to an unpredictable number of users.[69] Furthermore, the AI can, in certain cases, generate outputs that lead to the identification of some data subjects.[70] Moreover, the deletion of data once a model has been trained is possible only to a limited extent, which affects the right to erasure under Art. 17 GDPR.[71] Other possible violations of rights could result from the deployment of the AI, but the Court notes that these must be evaluated separately and that it is unlikely that they will materialise to the extent of preventing the legitimate use of AI.[72] That said, the processing affects data that was already publicly accessible, and Meta has adopted de-identification measures, as well as adequate technical, physical and organisational measures to prevent unauthorised access to the training data and to identify relevant security threats.[73] Regarding the reasonable expectations of the data subjects, the Court considers that they could reasonably expect the processing at least since 10 June 2024.[74] Even with regard to previously published data, data subjects had several options to avoid being included in the training dataset: they could make their publications private, or they could use the right to object that Meta offers for this purpose.[75] The Court does recognise that, when content contains personal data of third parties or when institutional accounts publish sensitive personal data or data relating to minors, the right to object loses its effectiveness.[76] Nevertheless, taking into account the measures adopted by Meta and making an overall assessment, the Court found that Meta could rely on legitimate interest as the legal basis for processing personal data to train its AI.[77]
Regarding Art. 9(1) GDPR, the Court acknowledges that the training datasets may contain special categories of personal data.[78] While Meta can invoke Art. 9(2)(e) GDPR for data that data subjects have made available about themselves in a public account, it cannot do so for data that data subjects have published about third parties.[79] Nonetheless, in light of the ruling in GC and Others v CNIL discussed above, the Court deems that Art. 9(1) GDPR does not automatically preclude Meta's processing activities; rather, the third party concerned would have to request that Meta remove their data from the training dataset. Although the Court acknowledges that subsequent deletion of data is possible only to a limited extent, it also notes that Meta has demonstrated that data subjects do not face a significant risk from the processing of data that is already public.[80]
This ruling by the Higher Regional Court of Cologne is certainly novel and raises some interesting points in the ongoing debate. We eagerly await a CJEU analysis of the matter — let's hope it arrives soon! In the meantime, here are some recommended readings on the topic:
`Fünf Meinungsbeiträge: Die „Meta-KI-Entscheidung“ des OLG Köln – Ein Urteil und fünf Meinungen´ (2025) Zeitschrift für Datenschutz und Digitalisierung.
Gil González, Elena: El interés legítimo en el tratamiento de datos personales (Wolters Kluwer, 2022).
Moerel, Lokke / Storm, Marijn: `Using special categories of data for training LLMs: never allowed?´ (iapp, 28 August, 2024) <https://iapp.org/news/a/using-special-categories-of-data-for-training-llms-never-allowed->.
Paal, Boris P.: `KI-Training mit sensiblen Daten und Art. 9 DS-GVO´ (2025) Zeitschrift für Datenschutz und Digitalisierung 20, 27.
Silveira Baylao, Tainá: `Legitimate interest as a lawful basis for AI training: limits, safeguards, and governance´ (2025) <https://zenodo.org/records/17244762>.
Trigo Kramcsák, Pablo: `Can legitimate interest be an appropriate lawful basis for processing Artificial Intelligence training datasets?´ (2023) 48 Computer Law & Security Review 105765.
[1] EDPB, `Opinion 28/2024 on certain data protection aspects related to the processing of personal data in the context of AI models´ 11.
[2] Ibid.
[3] At this point, it is worth mentioning that the AEPD distinguishes in more detail between the life cycle stages of an AI solution: training, validation and deployment, operation, decision and evolution, and withdrawal. The AEPD points out that each stage has a different purpose and that a different legal basis can be used for each of the following processing activities: the training and validation of the model; the use of third-party data in inference; the communication of data embedded in the model; the processing of data subjects' data within the framework of the AI-based service; and the processing of data subjects' data for model evolution. Agencia Española de Protección de Datos (AEPD), `Adecuación al RGPD de tratamientos que incorporan inteligencia Artificial. Una Introducción´ (2020) <https://www.aepd.es/guias/adecuacion-rgpd-ia.pdf> 20, 21.
[4] AEPD (n. 3) 6, 7.
[5] EDPB (n. 1) 20, 21.
[6] CNIL, `IA : Mobiliser la base légale de l’intérêt légitime pour développer un système d’IA´ (19 July, 2025) <https://www.cnil.fr/fr/base-legale-interet-legitime-developpement-systeme>.
[7] AEPD (n. 3) 22.
[8] EDPB, `Guidelines 1/2024 on processing of personal data based on Article 6(1)(f) GDPR, Version 1.0´, 7.
[9] EDPB (n. 8) 8, 9.
[10] EDPB (n. 8) 7, 8.
[11] EDPB (n. 8) 12.
[12] Ibid.
[13] EDPB (n. 8) 13.
[14] Ibid.
[15] EDPB (n. 8) 14.
[16] Ibid.
[17] EDPB (n. 8) 14, 15.
[18] EDPB (n. 8) 15, 16.
[19] Ibid.
[20] EDPB (n. 8) 16, 18.
[21] EDPB (n. 8) 18, 19.
[22] EDPB (n. 1) 21, 22; CNIL (n. 6).
[23] Regulation (EU) 2022/2065 of the European Parliament and of the Council of 19 October 2022 on a Single Market for Digital Services.
[24] Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence.
[25] CNIL (n. 6).
[26] AEPD, `Requisitos para Auditorías de Tratamientos que incluyan IA´ (2020) <https://www.aepd.es/guias/requisitos-auditorias-tratamientos-incluyan-ia.pdf> 17.
[27] EDPB (n. 1) 28, 29.
[28] AEPD (n. 26) 23.
[29] AEPD (n. 26) 17.
[30] Ibid.
[31] AEPD (n. 26) 15.
[32] EDPB (n. 1) 24; CNIL (n. 6).
[33] EDPB (n. 1) 24; CNIL (n. 6).
[34] EDPB (n. 1) 24, 25; CNIL (n. 6).
[35] EDPB (n. 1) 25.
[36] C-252/21, Meta Platforms v Bundeskartellamt [2023] ECLI:EU:C:2023:537, para. 89.
[37] C-252/21, Meta Platforms v Bundeskartellamt [2023] ECLI:EU:C:2023:537, para. 68, 69.
[38] See Tainá Silveira Baylao, `Legitimate interest as a lawful basis for AI training: limits, safeguards, and governance´ (2025) <https://zenodo.org/records/17244762> 17, 26.
[39] C-136/17, GC and Others v. CNIL [2019] ECLI:EU:C:2019:773, para. 48.
[40] C-136/17, GC and Others v. CNIL [2019] ECLI:EU:C:2019:773, para. 61.
[41] C-252/21, Meta Platforms v Bundeskartellamt [2023] ECLI:EU:C:2023:537, para. 77.
[42] C-136/17, GC and Others v. CNIL [2019] ECLI:EU:C:2019:773, para. 66.
[43] C-136/17, GC and Others v. CNIL [2019] ECLI:EU:C:2019:773, para. 53.
[44] For a deeper analysis, see Tainá Silveira Baylao, `Legitimate interest as a lawful basis for AI training: limits, safeguards, and governance´ (2025) <https://zenodo.org/records/17244762> 17, 26.
[45] EDPB (n. 1) 25, 26.
[46] EDPB (n. 1) 26; CNIL (n. 6).
[47] CNIL (n. 6).
[48] EDPB (n. 1) 27; CNIL, `La base légale de l’intérêt légitime : fiche focus sur les mesures à prendre en cas de collecte des données par moissonnage (web scraping)´ (19 June, 2025) <https://www.cnil.fr/fr/focus-interet-legitime-collecte-par-moissonnage>.
[49] EDPB (n. 1) 27.
[50] EDPB (n. 1) 27, 28.
[51] EDPB (n. 1) 28, 29; CNIL (n. 6).
[52] EDPB (n. 1) 29.
[53] EDPB (n. 1) 29; CNIL (n. 6).
[54] EDPB (n. 1) 29, 30; CNIL (n. 49).
[55] See Tanveer Khan, Khoa Nguyen and Antonis Michalas, `Split Ways: Privacy-Preserving Training of Encrypted Data Using Split Learning´ (2023) <https://arxiv.org/abs/2301.08778>.
[56] EDPB (n. 1) 30; CNIL (n. 6).
[57] See EDPB (n. 1) 14, 19.
[58] AEPD (n. 26) 31.
[59] EDPB (n. 1).
[60] AEPD (n. 26).
[61] CNIL (n. 6).
[62] CNIL (n. 6); AEPD (n. 26) 30.
[63] See the resolution issued by the Italian Data Protection Authority <https://www.garanteprivacy.it/web/guest/home/docweb/-/docweb-display/docweb/9751362>; and the resolution issued by CNIL <https://www.cnil.fr/sites/default/files/atoms/files/decision_ndeg_med_2021-134.pdf>.
[64] OLG Köln, Urteil vom 23.05.2025 - 15 UKl 2/25, para. 4, 25.
[65] OLG Köln, Urteil vom 23.05.2025 - 15 UKl 2/25, para. 41, 48.
[66] OLG Köln, Urteil vom 23.05.2025 - 15 UKl 2/25, para. 58, 62 (own translation).
[67] Ibid.
[68] OLG Köln, Urteil vom 23.05.2025 - 15 UKl 2/25, para. 65, 73.
[69] OLG Köln, Urteil vom 23.05.2025 - 15 UKl 2/25, para. 79, 80.
[70] OLG Köln, Urteil vom 23.05.2025 - 15 UKl 2/25, para. 78.
[71] OLG Köln, Urteil vom 23.05.2025 - 15 UKl 2/25, para. 79.
[72] OLG Köln, Urteil vom 23.05.2025 - 15 UKl 2/25, para. 76.
[73] OLG Köln, Urteil vom 23.05.2025 - 15 UKl 2/25, para. 81, 83.
[74] OLG Köln, Urteil vom 23.05.2025 - 15 UKl 2/25, para. 94.
[75] OLG Köln, Urteil vom 23.05.2025 - 15 UKl 2/25, para. 84, 88.
[76] OLG Köln, Urteil vom 23.05.2025 - 15 UKl 2/25, para. 90.
[77] OLG Köln, Urteil vom 23.05.2025 - 15 UKl 2/25, para. 101, 104.
[78] OLG Köln, Urteil vom 23.05.2025 - 15 UKl 2/25, para. 106.
[79] OLG Köln, Urteil vom 23.05.2025 - 15 UKl 2/25, para. 111, 115.
[80] OLG Köln, Urteil vom 23.05.2025 - 15 UKl 2/25, para. 116, 124.
