A new project on explainable artificial intelligence
Written by Romain Pialat
-
31 July 2024

Artificial intelligence systems, and algorithms in general, are sometimes opaque, and their results can be difficult to interpret. A young but prolific field of research is dedicated to their explainability. The aim of this project is to understand how this research is structured, based on mathematical analyses of the techniques used, combined with quantitative and qualitative insights from the social sciences. As part of this study, the CNIL will use a database of scientific publications relating to the explainability of artificial intelligence, obtained via a search engine specialized in scientific literature.

What is the purpose of this study?
Explainable AI, or simply xAI, is a scientific field that develops methods and techniques for explaining the information, predictions or decisions generated by artificial intelligence systems. Such explanations are necessary when these systems are used in critical contexts (medicine, the military, transport, etc.). Since DARPA launched its Explainable AI program in 2016, scientific publications containing the term "Explainable AI" have appeared suddenly and in large numbers.
This discipline, which is still largely associated with computer science, enjoys no consensus on the techniques used, on the objective of an explanation, or on what constitutes an explanation in the first place. Awareness of this lack of consensus appears to be limited, or even non-existent, within the xAI field itself. The aim of this study is therefore to gain a better understanding of the underlying issues structuring the Explainable AI research scene.
What data for what uses?
In order to draw up a typology of xAI techniques without producing a study that quickly becomes obsolete from a technical point of view, given how rapidly this field of research evolves, we focus here on the social principles and mechanisms behind the organization and production of these techniques. To this end, we aim to understand and identify regularities in the institutional, academic and social positions of xAI actors.
Semantic Scholar
We therefore collected a large dataset of around 16,000 publications specific to this field, using the Semantic Scholar search engine. This dataset includes the titles of the papers, the names of the authors, and other features inherent to a publication, such as the year of publication, the journal or conference in which it appeared, the number of citations, and so on.
We will therefore process all these data, as well as data relating to the professional lives of the people in our database that are publicly available on the Internet, such as:
- Academic position,
- Home university,
- Previous publications or fields of research.
We will carry out similar studies in other, older fields of research, so as to have control databases against which to compare our results. For the moment, only one such field is covered: fairness in artificial intelligence.
OpenAlex
We also use the OpenAlex database to cross-check our sources. This platform applies an unsupervised artificial intelligence algorithm to automatically classify research papers by subject. In this way, we collect two groups of papers: those belonging to the xAI subgroup and those from the broader AI group.
The dataset of papers on AI in general allows us to visualize how the xAI community is positioned in relation to the broader AI field, and to describe the relationships between the two.
The personal data processed in these groups are the same as those from the Semantic Scholar dataset.
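As an illustration, works tagged with a given OpenAlex topic can be retrieved through the public OpenAlex REST API using cursor pagination. The sketch below is a minimal Python example, not the exact code used in the study; the topic identifier is a placeholder, and the real "Explainable Artificial Intelligence" topic ID must be looked up on openalex.org first.

import requests

# Placeholder topic ID: look up the actual "Explainable Artificial
# Intelligence" topic identifier on https://openalex.org before running.
TOPIC_ID = "T00000"

def fetch_topic_works(topic_id):
    """Yield all works tagged with an OpenAlex topic, page by page."""
    cursor = "*"  # OpenAlex returns a next_cursor with each page
    while cursor:
        resp = requests.get(
            "https://api.openalex.org/works",
            params={
                "filter": f"primary_topic.id:{topic_id}",
                "per-page": 200,  # maximum page size allowed by the API
                "cursor": cursor,
            },
            timeout=30,
        )
        resp.raise_for_status()
        page = resp.json()
        yield from page["results"]
        cursor = page["meta"].get("next_cursor")  # None on the last page

xai_papers = list(fetch_topic_works(TOPIC_ID))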
Compliance of the websites
In line with the CNIL's recommendations, we verified how these websites use scraping tools and what they declare about the sources they rely on.
Semantic Scholar builds its database in partnership with various universities, research institutes, and scientific journals, as well as through harvesting from academic websites or scientific conference pages. The Semantic Scholar crawler identifies itself when visiting a website, thereby respecting the protocols set out in “robots.txt” files.
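As a side note, whether a given crawler may visit a page can be checked against a site's "robots.txt" file with Python's standard library. This is a minimal sketch; the site and user-agent string below are purely illustrative, not the actual ones involved.

from urllib.robotparser import RobotFileParser

# Illustrative values: substitute the real site and crawler user-agent.
rp = RobotFileParser("https://example.org/robots.txt")
rp.read()  # fetch and parse the robots.txt rules

# can_fetch() applies the site's Allow/Disallow rules to this user-agent
print(rp.can_fetch("SemanticScholarBot", "https://example.org/papers/123"))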
OpenAlex uses the Microsoft Academic Graph (MAG) and harvests papers from registries such as CrossRef or HAL. Each paper is linked to a source among the 240,000 different sources listed on the website, which makes it easy to trace where a reference was obtained. A list of the main sources used can be found here.
How are people's rights respected?
The data processed during this project are obtained in the following ways:
- From OpenAlex, by downloading the “Topic” Explainable Artificial Intelligence (~18,500 papers)
- From OpenAlex, by downloading the “SubField” Artificial Intelligence (~3,500,000 papers)
- From OpenAlex and Semantic Scholar, by performing the following API query:
query = '"Model interpretability" | "Models interpretability" | "model explanations" | "models explanations" | "explanations of models" | "explaining models" | "Explainable Artificial Intelligence" | "XAI" | "explainable AI" | "interpretable AI" | "interpretable artificial intelligence"'
fields = "paperId,corpusId,url,title,venue,publicationVenue,year,authors,externalIds,abstract,referenceCount, citationCount,influentialCitationCount,isOpenAccess,openAccessPdf,fieldsOfStudy,s2FieldsOfStudy,publicationTypes, publicationDate,journal,citationStyles"
url=“http://api.semanticscholar.org/graph/v1/paper/search/bulk?query={query}&fields={fields}&year=1970-"
The "fields" option is only necessary on Semantic Scholar.
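For illustration, such a bulk query can be paginated in Python with the requests library, reusing the query and fields variables defined above. The Semantic Scholar bulk endpoint returns pages of results together with a continuation token; this is a minimal sketch, not the exact code used in the study.

import requests

def fetch_all(query, fields):
    """Page through the Semantic Scholar bulk search endpoint."""
    url = "http://api.semanticscholar.org/graph/v1/paper/search/bulk"
    params = {"query": query, "fields": fields, "year": "1970-"}
    while True:
        resp = requests.get(url, params=params, timeout=30)
        resp.raise_for_status()
        page = resp.json()
        yield from page.get("data", [])
        token = page.get("token")  # continuation token for the next page
        if not token:
            break
        params["token"] = token

papers = list(fetch_all(query, fields))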
Since API queries on these websites are blind to stop words (such as “of” or “or”), a significant portion of the retrieved articles are unrelated to our study. We therefore carried out a second processing step using regular expressions, in order to keep only the papers containing the exact phrases from our query.
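A minimal sketch of this second filtering step, assuming each retrieved paper is a dictionary with "title" and "abstract" keys as returned by the APIs above:

import re

# Exact phrases from the query; matching is case-insensitive.
PHRASES = [
    "model interpretability", "models interpretability",
    "model explanations", "models explanations",
    "explanations of models", "explaining models",
    "explainable artificial intelligence", "xai",
    "explainable ai", "interpretable ai",
    "interpretable artificial intelligence",
]
# \b anchors keep whole-phrase matches only, so that "XAI" cannot
# match inside a longer, unrelated word.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in PHRASES) + r")\b",
    re.IGNORECASE,
)

def is_relevant(paper):
    """Keep a paper only if its title or abstract contains an exact phrase."""
    text = " ".join(filter(None, [paper.get("title"), paper.get("abstract")]))
    return bool(PATTERN.search(text))

filtered_papers = [p for p in papers if is_relevant(p)]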
You have the right to access and obtain a copy of your data, to object to its processing, and to have it corrected or deleted. You also have the right to restrict the processing of your data.
To do so, you can contact the CNIL's Digital Innovation Laboratory ([email protected]) or the CNIL's Data Protection Officer (DPO) for any request to exercise your rights regarding this processing. The DPO's contact details are at the bottom of the page.
If, after contacting us, you feel that your "Data Protection" rights have not been respected, you may submit a complaint to your local Data Protection Authority.
How is this project managed?
This project falls within the scope of the public interest mission entrusted to the CNIL under the General Data Protection Regulation and the amended French Data Protection Act. It is part of the CNIL's mission to provide information, as defined in article 8.I.1 of the French Data Protection Act, as well as its mission to monitor developments in information technology, as defined in article 8.I.4.
Only members of the CNIL's Digital Innovation Laboratory (LINC) and Artificial Intelligence Department (SIA), in charge of this study, will have access to the personal data collected and processed as part of the experiment.
How long will the study last?
This project will end in December 2026, at which point the processed data will be deleted. The results of the study will be presented in several publications on the LINC website.
Last updated on September 17, 2025
Contact the CNIL DPO
Electronically by following this link
By mail:
The Data Protection Officer,
CNIL, 3 Place de Fontenoy,
TSA 80715
75334 PARIS CEDEX 07
France
