Workshop on the implications of GDPR in scientific research

Written by Mehdi Arfaoui and Vincent Toubiana


25 August 2025


As with previous editions, the day after Privacy Research Day is an opportunity for us to present the CNIL's activities and to exchange with the conference speakers during a series of workshops. The objective of one of these workshops was to address the issue of the protection of personal data within research projects themselves. The exchange led to a shared diagnosis of the difficulties encountered in the field and promising avenues for collaboration to address them.

The CNIL has long supported researchers and helped them reconcile their work with strong guarantees for the rights of individuals. Several resources have already been produced to help researchers comply with the GDPR and the French Data Protection Act, such as the dedicated factsheets on scientific research (outside the health sector) and the recommendations on the reuse of publicly accessible data on the internet.

This workshop continued that approach. The objective was to go further by testing these principles against the reality on the ground, in order to better understand the concrete challenges faced by researchers and to identify with them the most relevant tools to develop for the future. It should also be noted that we only dealt with public research during this workshop. The challenges of private or public-private research may be addressed in another workshop.

The first issue raised was the difficulty researchers have in clearly identifying whom to turn to in order to ensure the compliance of research protocols, with responsibility sometimes dispersed between the institution, the laboratory, the research team, and even the committees of academic journals. Moreover, even when they are identified, crucial intermediaries such as Data Protection Officers (DPOs) and ethics committees face major obstacles: lack of time and sometimes of training to properly inform teams, work overload, or even contradictory incentives that hinder their ability to effectively support projects. And, while academic publication venues (conferences, journals) could have been an interesting lever to encourage compliance, they often do not have the means to play this role today.

A second issue concerned the lack of concrete benchmarks and understanding of the framework that should be applied by researchers. The absence of standards and harmonized processes poses a particular challenge for research teams exceeding a certain number of partners. The application of certain GDPR exceptions related to research thus constitutes a significant "grey area." Researchers struggle to determine which parts of their work can benefit from them, which creates great uncertainty. Similarly, while most public research will be covered by the legal basis of the public interest mission, researchers consider relying on the basis of legitimate interest, despite the constraints and limitations that this legal basis may entail.

Researchers also have difficulties identifying under what constraints they can reuse data for new purposes. This misunderstanding is exacerbated by the growing (and sometimes paradoxical) pressure for open data and data sharing, a necessary practice, particularly for the proper evaluation of the reproducibility of research results. This confusion extends to the complex issues of archiving and deleting research data. In practice, only anonymized data can be made "open data". In other cases, a clear framework must be established: retention period, limited access, etc. (see box).

Opening and reuse of data

The CNIL had already pointed out that, without questioning the need to guarantee the free circulation of knowledge and open access to scientific publications, data must be anonymized beforehand to be disseminated and not just pseudonymized.

Similarly, concerning the retention of personal data for the purpose of reproducibility of results, the CNIL insists that pseudonymization be as strong as possible and that access to the data be limited to researchers who wish to use it and who would be subject to strict confidentiality conditions. It recalls that in the case of dissemination of pseudonymized data, it is essential to set a retention period, as this data cannot remain available for an unlimited period.
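The conditions described in this box can be summarized as a small self-check. The sketch below is only an illustration, written in Python: the names `DataState`, `SharingPlan` and `review` are hypothetical, the rules are a simplified reading of the paragraphs above, and the result is in no way a substitute for a legal analysis by a DPO.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class DataState(Enum):
    IDENTIFIED = "identified"
    PSEUDONYMIZED = "pseudonymized"
    ANONYMIZED = "anonymized"


@dataclass
class SharingPlan:
    """Hypothetical description of how a research dataset will be shared."""
    state: DataState
    open_access: bool                      # published without access control?
    retention_end: Optional[date] = None   # date after which the data is deleted
    access_restricted: bool = False        # limited to researchers under confidentiality


def review(plan: SharingPlan) -> List[str]:
    """Return the points of the box that the plan does not yet satisfy."""
    issues: List[str] = []
    # Anonymized data no longer relates to identifiable persons:
    # none of the checks below apply in this simplified model.
    if plan.state is DataState.ANONYMIZED:
        return issues
    if plan.open_access:
        issues.append("only anonymized data can be released as open data")
    if plan.retention_end is None:
        issues.append("a retention period must be set")
    if not plan.access_restricted:
        issues.append("access must be limited to researchers bound by confidentiality")
    return issues
```

For example, `review(SharingPlan(DataState.PSEUDONYMIZED, open_access=True))` lists all three points, whereas a pseudonymized dataset with restricted access and a retention end date returns an empty list.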

It was also noted that the "technical" approach of the GDPR, often designed for hypothetico-deductive methodologies, is not suited to inductive approaches like ethnographic studies. These often involve collecting data without knowing the precise purpose in advance or without requesting initial consent that could bias the study. In this regard, a clarification of the framework on the concept and collection of consent seems necessary (see box).

About consent

A distinction should be made between "ethical" consent, related to the researcher's code of conduct, and consent in the sense of the GDPR, which is one of the six possible legal bases for justifying the processing of personal data. When consent is the legal basis for the research, the person being interviewed must also give specific consent for the processing of their personal data. Using two separate boxes in the consent form, one to collect GDPR consent and the other for "ethical" consent, would be an excellent practice to ensure clarity and to prove that the two consents were collected separately.

In practice, for scientific research conducted by public bodies, data processing can very often be based on the performance of a task carried out in the public interest (Article 6(1)(e) of the GDPR), rather than on the legal basis of consent. However, even when the legal basis for the data processing is the public interest mission (and not GDPR consent), the researcher must inform the individual and, depending on the type of research, obtain either their written consent or their non-opposition.

Finally, the workshop highlighted the sometimes blurry line between legal constraints and ethical considerations. The CNIL's factsheets, although based on law (the GDPR), can be perceived by the scientific community as ethical recommendations. This also raises questions about how the GDPR interacts with fundamental rights like academic freedom, which is sometimes invoked by research projects.

Paths toward concrete solutions

Far from stopping at this observation, the workshop was above all an opportunity to propose practical solutions centered on the needs of researchers. Several avenues for collaboration between the CNIL and the academic community were outlined:

  • A decision tree for research projects: The CNIL could develop a tool to guide researchers in identifying their situation, understanding their legal obligations, and determining the actions to take. This tool would need to recognize the existence of "gray areas" and not suggest that projects outside its scope are necessarily non-compliant.
  • Doctoral training: Training organized by doctoral schools could be an opportunity to raise awareness among young researchers about security and personal data protection issues.
  • A checklist for information and consent: Providing a checklist to improve the quality of information given to individuals whose data is used would be a valuable asset in ensuring their rights are respected.
  • Contextualized use cases: Developing a series of practical examples showing how the principles apply in various research scenarios. The idea of calling on the community (crowdsourcing) to feed this case base was mentioned.
  • Feedback interviews: Conducting interviews with researchers who have found effective solutions would allow for the sharing of concrete best practices.
  • Self-assessment tools for privacy protection: Equipping researchers with tools that allow them to assess the level of protection of their projects themselves would help them identify and reduce risks upstream.
  • Strengthening intermediaries: Similarly, providing DPOs and ethics committees with precise information and clear guidelines would significantly improve their ability to support research teams.

The "Projet mentions" tool

Without formally anonymizing the data, the tool by sociologist Baptiste Coulmont allows users to find first names with similar sociological characteristics based on baccalaureate results. It thus makes it possible to pseudonymize data without having too much of an impact on the sociological coherence of the survey.
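The general idea can be illustrated with a minimal Python sketch. It is not the implementation of the tool itself: the names `NameProfile`, `similar_names` and `pseudonymize` are hypothetical, the first names are placeholders, the `honours_rate` values are invented for the example, and restricting candidates to the same gender is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NameProfile:
    name: str
    gender: str          # "F" or "M"
    honours_rate: float  # share of candidates with top honours (invented values)


# Placeholder names and invented figures, for illustration only.
PROFILES = [
    NameProfile("Name_A", "F", 0.20),
    NameProfile("Name_B", "F", 0.19),
    NameProfile("Name_C", "F", 0.05),
    NameProfile("Name_D", "M", 0.06),
    NameProfile("Name_E", "M", 0.07),
]


def similar_names(target: str, profiles: List[NameProfile], k: int = 1) -> List[str]:
    """Return the k names of the same gender whose honours rate is closest."""
    by_name = {p.name: p for p in profiles}
    if target not in by_name:
        raise KeyError(f"no profile available for {target!r}")
    ref = by_name[target]
    candidates = [p for p in profiles if p.name != target and p.gender == ref.gender]
    # Sort by distance to the reference rate; ties are broken alphabetically.
    candidates.sort(key=lambda p: (abs(p.honours_rate - ref.honours_rate), p.name))
    return [p.name for p in candidates[:k]]


def pseudonymize(names: List[str], profiles: List[NameProfile]) -> List[str]:
    """Replace each first name by its closest neighbour, consistently."""
    mapping: Dict[str, str] = {}
    for n in names:
        if n not in mapping:
            mapping[n] = similar_names(n, profiles, k=1)[0]
    return [mapping[n] for n in names]
```

With these invented figures, `pseudonymize(["Name_D", "Name_A", "Name_D"], PROFILES)` gives `["Name_E", "Name_B", "Name_E"]`. Two different names may receive the same replacement in this sketch, and the result remains pseudonymized rather than anonymized data.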

This workshop has therefore confirmed the need for a sustained dialogue between the regulator and the research field. Building on these lessons, the CNIL is committed to continuing this collaboration to transform these ideas into concrete and useful tools for the entire scientific community.

