Explaining AI systems, a problem renewed by the success of deep learning algorithms [1/3]

Written by Nicolas Berkouk, Mehdi Arfaoui and Romain Pialat


08 July 2025


As artificial intelligence systems become increasingly prevalent, the need to explain their decisions has become a major regulatory and political issue. This article traces the evolution of this challenge by contrasting symbolic AI, based on interpretable human logic, with deep learning, whose "black box" operation makes the question of explainability even more complex.

When we talk about "artificial intelligence" (AI), we are often confronted with a myriad of concepts and terms which, depending on who uses them, take on different meanings. Here, we'll take as our starting point the OECD definition, which is also the one adopted by the European Parliament in the Artificial Intelligence Act (AI Act). It states:

"An artificial intelligence system is an automated system which, for a given set of human-defined objectives, is able to make predictions, recommendations, or decisions influencing real or virtual environments."

We can see that what characterizes artificial intelligence systems in the sense of this definition, unlike just any automated system, is that they are entrusted, to some degree, with a capacity for agency, or with the ability to trigger human actions on the basis of their results. This very general definition encompasses a wide variety of systems based on different algorithms. Examples include, but are not limited to: expert systems, decision trees, regressions (linear, logistic, etc.) and neural networks.

Since the declared vocation of these systems is to produce, or to be at the origin of, real or virtual actions, i.e. to be granted a direct or indirect capacity to act, the actors who grant them this capacity must either prove that they master their systems or be able to justify their results.

It is in this context that the question of how to explain the results of AI systems emerged: first in the 1990s with the expansion of expert systems, and then again, renewed by the advent of deep learning and neural networks.

In this article, we will trace the evolution of the formulations of the problem of producing explanations for the results of AI systems, in connection with the change of regime in the field of artificial intelligence (symbolic/expert systems vs. deep learning). We will use concrete examples to illustrate the way in which this change of regime renews, and makes more complex, the question of explainability. This question is becoming a major regulatory and political issue as systems based on deep learning are deployed on a large scale.

 

Producing explanations: a challenge for the dissemination and adoption of AI systems...

In organizations

Prior to the massive development of mass-market AI systems such as ChatGPT, the use of such systems was mostly confined to organizations, be they public administrations or companies, which used them to accomplish or support their core missions. Thus, the question of delegating agency to AI systems arose first and foremost within organizations, which intended to leverage or profit from their performance.

For example, every year, IBM devotes a chapter of its Global AI Adoption Index to the relationship between the companies surveyed and the notion of AI trust and AI explainability.

Similarly, the first questions about producing explanations of the results of systems based on deep learning emerged mainly in 2016, following the call issued by DARPA in its "Explainable AI Program" (see Figure 1). DARPA is an agency of the U.S. Department of Defense, charged with "making critical investments in breakthrough technologies for U.S. national security". It is one of the biggest funders of American academic research.

Figure 1 - Illustration from DARPA's Explainable AI Program

For citizens

Organizations don't just use AI systems for internal purposes. It is now commonplace for companies to use AI algorithms to process customer data, or for administrations to carry out their missions using algorithmic processing based on citizens' data. Thus, France, through the Loi Informatique et Libertés (LIL), and then the European Union, through the General Data Protection Regulation (GDPR), have developed a regulatory framework that governs the processing of personal data, and which applies in particular when this processing is based on an AI algorithm.

In order for such AI-based processing to be lawful, it must comply with the requirements of these texts. In particular, any citizen whose data is processed by an organization has rights regarding this processing. The LIL and the GDPR provide that, in certain situations, a citizen can ask an organization that processes his or her personal data using an AI algorithm for elements allowing them to understand the results of this processing. More specifically, Article 47.2 of the LIL provides, with regard to individual administrative decisions based on algorithmic processing:

 « For these decisions, the data controller shall ensure control of the algorithmic processing and its developments in order to be able to explain, in detail and in an intelligible form, to the data subject the way in which the processing has been implemented with regard to him or her. »

And Articles 13.2(f), 14.2(g) and 15.1(h) of the GDPR state, regarding automated decision-making:

« [...] the existence of automated decision-making, including profiling, referred to in Article 22(1) and (4), and, at least in such cases, useful information concerning the underlying logic, as well as the significance and intended consequences of such processing for the data subject. »

For consumers

Finally, the AI Act aims to regulate the marketing of products based on Artificial Intelligence systems. This regulation applies according to the different levels of risk associated with the AI system under consideration:

  • Unacceptable risk: these systems are considered too dangerous and will be banned, such as social scoring for government use, or voice-assisted toys that encourage children to adopt dangerous behaviors;
  • High risk: these systems are considered potentially dangerous for the safety or fundamental rights of individuals, and will have to comply with strict obligations. Examples include AI systems incorporated into toys, aircraft, medical devices, used for recruitment or border control;
  • Limited risk: these are systems such as chatbots or image generation tools, for which a minimum of information will be required, including a clear indication that the content was generated by an AI system and is not a human creation;
  • Minimal risk: these are systems that do not fall into the previous categories, such as video games or spam filters. There are no requirements for putting them into circulation in the EU.

For high-risk systems, the AI Act imposes measures to produce explanations, as stipulated in Article 86.1:

« Any person affected by a decision taken by the person responsible for deployment on the basis of the results of a high-risk AI system listed in Annex III, with the exception of the systems listed in point 2 of that Annex, and which produces legal effects or significantly affects that person in a way that he or she considers to have a negative impact on his or her health, safety or fundamental rights, has the right to obtain from the person responsible for deployment clear and useful explanations of the role of the AI system in the decision-making procedure and of the main elements of the decision taken. »

 

What happened in 2012? The advent of deep learning

In the early 2010s, the computer vision community - which aims, for example, to build algorithms capable of differentiating an image containing a cat from one containing a dog - was actively engaged in classifying the ImageNet database. This is a collection of 1.2 million images, each with its own label: "husky", "tree frog", "peacock" and so on. The database contains a total of 1,000 labels. "Classifying" ImageNet means building an algorithm which takes an image from this database as input and is able to determine the label of each image with the least possible error. The percentage of times a given algorithm is wrong, called the error rate, is the yardstick by which this community judges and ranks proposed classification algorithms.

Figure 2 - from the ImageNet dataset (J. Deng, 2009)

As reported in (Cardon, 2018), an earthquake occurred in this community in 2012, at the ECCV'12 conference. Geoffrey Hinton, one of the major researchers in deep learning (a particular type of AI whose origins date back to the 1950s, but which had been rather marginalized since the 1970s; we'll come back to this), proposed a deep neural network algorithm to classify ImageNet. The title-holders of the ImageNet competition were skeptical, yet Hinton's proposal improved on the best error rate by 10 points.

The earthquake here is that Hinton was not a computer vision researcher, nor did he have any particular knowledge of the field. His tour de force rested essentially on engineering: Hinton managed to train a neural network with some 60 million parameters by distributing his calculations across several graphics processing units (GPUs) designed for other purposes. Thanks to this immense computing power, and to the vast amount of data contained in the ImageNet database, Hinton succeeded in scaling up deep learning technology, which allows the algorithm to be given the images to be analyzed as they are, without the need for any particular expertise in image analysis to extract features. He thus showed that the conditions were ripe in 2012 to finally give deep learning techniques a considerable lead, in all fields (vision, audio, text, etc.), over expert techniques.

This triumph is the product of a long history of struggle between different conceptions of what it means to "make an intelligent machine". To trace and account for this, the authors of (Cardon, 2018) propose - as is classic in the social sciences - to distinguish between two ideal types of AI conceptions: symbolist and connectionist. It is important to specify here that this is a methodological choice, which aims to isolate broad types of AI conceptions in order to facilitate the study and understanding of their mutual relationships, without attempting to identify - something impossible in practice - all the traces of one type or the other in each AI technique. Thus, AI techniques are not always completely symbolic or completely connectionist, but the authors' thesis is that to understand "AI", it is fundamental to study the controversy associated with the confrontation between these two types.

Thus, the symbolist or expert conception of artificial intelligence (to be understood more broadly than formal methods or rule-based systems, since it also includes, for example, regression or support vector machines) is based on the idea that to design an "intelligent" machine, it is necessary to incorporate into its operation a theory about the world we are trying to analyze. The machine's "intelligence" thus comes above all from the developer's implementation of reasoning about the domain being analyzed. As reported in (Cardon, 2018), in an interview conducted with a computer vision researcher:

« they're people who are still into this idea that you have to understand, that you have to know how to explain why you put branches like that and reason like that, and that you advance like that... »

In contrast, the proponents of connectionism, which is based primarily on neural network models trained by deep learning, believe that for an algorithm to "learn", it must be given as much latitude as possible to improve itself with the help of a large number of examples. Thus, connectionism works on an ideal of learning without a theory of the world it seeks to analyze, based on a large amount of data and a certain amount of computing power.

Deep learning is gaining a form of hegemony every day, and all the major recent successes in artificial intelligence are due to these techniques: the game of go, autonomous cars, chatbots, image generators...

 

Lesson in symbolic classification: giraffes have longer necks than cats

Let's put ourselves in the shoes of a symbolic computer vision researcher wishing to classify ImageNet[1]. As a reminder, to classify ImageNet means to build an algorithm which takes an image as input and returns, as accurately as possible, the probability that this image belongs to a given class. This is generally done in three main phases: feature extraction, construction of a classification algorithm based on these features, and evaluation of the algorithm's performance.

Typically, if we want to distinguish photos of cats from photos of giraffes, we can use our knowledge of these animals, namely the fact that a cat's neck is generally much shorter, relative to its body, than a giraffe's. So we can build an algorithm that extracts the relative proportion of neck from an image of a cat or a giraffe.

Figure 3 - Cat and Giraffe

Given enough examples of photos whose labels are known - this is the training database - we can study the distribution of the extracted feature among cats and giraffes. We then observe that, among these images, 97% of giraffes have a neck that occupies more than 30% of their body, whereas this is the case for only 5% of cats. This allows the researcher to construct the following algorithm:

CatOrGiraffe(Image):
    Compute the relative proportion of neck in the image
    If this proportion is greater than 30%, return Giraffe
    Otherwise, return Cat

To make sure the algorithm works, it has to be tested on new images with known labels, which were not present in the training database. The researcher then finds that the algorithm returns the right label 93% of the time. With a sense of accomplishment, the researcher can present these results to the computer vision community.
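To make this pipeline concrete, here is a minimal Python sketch of the decision rule and of its evaluation on a held-out set. The neck proportions are drawn at random as a stand-in for a real feature-extraction step, so the function and the resulting accuracy are purely illustrative, not the researcher's actual code.

import random

def cat_or_giraffe(neck_proportion):
    # The symbolic rule: a neck occupying more than 30% of the body means "giraffe".
    return "giraffe" if neck_proportion > 0.30 else "cat"

# Hypothetical held-out test set: (extracted neck proportion, true label) pairs.
# In a real pipeline the proportion would come from a feature-extraction step
# applied to each image; here it is drawn at random for illustration.
test_set = [(random.uniform(0.31, 0.60), "giraffe") for _ in range(100)] + \
           [(random.uniform(0.05, 0.45), "cat") for _ in range(100)]

correct = sum(cat_or_giraffe(proportion) == label for proportion, label in test_set)
print(f"Accuracy on held-out images: {correct / len(test_set):.0%}")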

This account, though highly simplified, reproduces the main stages in the construction of a symbolic classification algorithm. Indeed, the winners of the ImageNet 2011 challenge[2] (the year before neural networks entered the competition) relied precisely on these three steps. Features are extracted from the images using the SIFT technique, which detects patterns that are invariant to changes in image scale or position. This set of extracted features is then statistically analyzed on a training set, using a sophisticated method called the Fisher Kernel, to derive a classification algorithm. The team's algorithm was evaluated at the ImageNet 2011 challenge and achieved the lowest classification error rate of that year.
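As a rough illustration of the feature-extraction step only, here is a minimal sketch using the SIFT implementation from the opencv-python package; the Fisher Kernel encoding and the classifier of the winning pipeline are not reproduced, and the image file name is hypothetical.

import cv2

# Load an image in grayscale (the file name is hypothetical).
image = cv2.imread("giraffe.jpg", cv2.IMREAD_GRAYSCALE)

# SIFT detects keypoints and computes 128-dimensional descriptors that are
# invariant to changes in image scale and position.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image, None)

print(f"{len(keypoints)} keypoints, descriptor matrix of shape {descriptors.shape}")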

 

Connectionist classification lesson: learning by example

Now let's put ourselves in the shoes of a connectionist researcher. For him, the key to building an algorithm that classifies ImageNet is not to decide a priori on the features to be extracted from the data, but rather to give it input data as close to "reality" as possible. The model on which the algorithm is based must then have sufficient latitude (in the form of parameters) to adjust itself as well as possible to the data contained in the training set: this is known as learning. This learning process requires a great deal of computing power and a large amount of training data. So, for the connectionist researcher, the algorithm needs to improve itself on a large number of examples, while avoiding as far as possible incorporating the researcher's presuppositions about important image features into the algorithm. This is the ideal of agnostic learning.

Let's now turn to a description of the major steps Geoffrey Hinton implemented during the ImageNet 2012 challenge to improve the state of the art in the classification of this dataset by more than 10 points. As we've just seen, unlike the symbolic researcher, the connectionist does not aim to extract the features that seem relevant to him from ImageNet images. First, he decides what type of neural network model he wants to use and how the computations within the model are to be organized: this is known as the neural network architecture. The architecture chosen by Hinton is a convolutional network with 60 million parameters.
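To give a concrete idea of what "choosing an architecture" means, here is a minimal PyTorch sketch of a small convolutional network. It is not the network used by Hinton's team, only an illustrative toy model with roughly 2 million parameters, orders of magnitude fewer than the 2012 network.

import torch
from torch import nn

# A minimal convolutional architecture: convolutional layers process the raw
# pixels, then a fully connected layer outputs a score for each of the
# 1,000 ImageNet labels.
class SmallConvNet(nn.Module):
    def __init__(self, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((6, 6)),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 6 * 6, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallConvNet()
print(sum(p.numel() for p in model.parameters()), "parameters")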

Figure 4 - Schematic representation of the neural network used by Hinton and his team - (Krizhevsky, 2012)

The second step is to train the neural network. To do this, the idea is to "show" the network images from the training dataset. Each time it predicts the wrong label, the values of the model parameters are modified so as to reduce the error. This operation, which seems simple to describe, in fact requires a very large computing capacity, which at the time was not readily available. What made this training phase possible was the meticulous engineering work of Hinton's students, who succeeded in correctly distributing the training computations over a particular type of processor: the graphics processing unit (GPU). It was this reorganization of computations, known as parallelization, that enabled Hinton's large convolutional neural network to be trained in a reasonable amount of time.
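Here is a minimal sketch of a single training step for the small network defined above. Random tensors stand in for a batch of ImageNet images and labels; real training repeats this step over millions of images, which is where GPUs become indispensable.

import torch
from torch import nn

# Random tensors standing in for a batch of 8 ImageNet images (3 x 224 x 224
# pixels each) and their labels among 1,000 classes.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 1000, (8,))

model = SmallConvNet()                      # the sketch defined above
criterion = nn.CrossEntropyLoss()           # measures how wrong the predictions are
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One training step: predict, measure the error, adjust the parameters in the
# direction that reduces it.
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()    # compute how each parameter should change to reduce the error
optimizer.step()   # apply those changes
print(f"training loss on this batch: {loss.item():.3f}")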

As in the case of symbolic algorithms, the final phase is always that of evaluation, where the performance of the trained neural network is assessed on images that were not part of the training set.
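Continuing the sketch above, evaluation simply measures the accuracy of the trained network on images it has never seen; here again, random tensors stand in for real held-out test images.

# Evaluation: measure accuracy on images the network never saw during training.
model.eval()
test_images = torch.randn(8, 3, 224, 224)   # stand-in for held-out test images
test_labels = torch.randint(0, 1000, (8,))
with torch.no_grad():
    predictions = model(test_images).argmax(dim=1)
accuracy = (predictions == test_labels).float().mean().item()
print(f"test accuracy: {accuracy:.0%}")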

 

The revival of explanation production

Now that we've outlined the main principles underlying the symbolic and connectionist conceptions of artificial intelligence, let's ask ourselves, in the light of the connectionist revolution we are witnessing, how the production of explanations of AI system results carries over from the symbolic to the connectionist setting.

As we have seen, symbolic AI is all about implementing a theory about the world, or reasoning, in an algorithm. To define an algorithm that determines whether an image represents a cat or a giraffe, we start from the theoretical knowledge that giraffe necks are longer than cat necks. The classification algorithm manipulates the quantity "relative neck length" and, based on statistical analysis, implements the model in the form of a rule: if the relative neck length is greater than 30%, return giraffe; otherwise, return cat. 

It should be emphasized, then, that the algorithm manipulates quantities, or "symbols", that make sense to its developer. What's more, the operations performed on these symbols relate to a theory about the world that the developer formulated, implicitly or explicitly, when developing the algorithm. This leads Bruno Bachimont (Bachimont, 1996), a philosopher of science, to conclude, for the case of formal systems (a subset of symbolic systems):

"Whereas the automatic formal systems developed in AI are not denotational models, i.e. formal representations denoting the world, their symbolic and effective character enables them to be linguistically interpretable and to refer to the knowledge of which they are the formal transcription."

To explain, or interpret, the operations of a symbolic system, we need to trace the knowledge of the programmers to which they refer. Here, we can see that it is a matter of disembedding the computing system from its strictly technical dimension, by tracing the logic or intention that led to its development. While this may seem feasible in theory, we can only imagine the institutional and organizational obstacles it may come up against, given that the development of such systems is rarely the product of a single person, nor of an organization's unambiguous intention. This is precisely what is underlined in the report by (Merigoux D., 2023), which shows that administrations deploying algorithms as simple as rule systems (benefit calculations) are unable to produce explanations of their results for essentially organizational reasons: the code bases are several decades old, and their updating in line with changes in legislation takes place in successive layers that do not allow their interpretability to be maintained. So, while it is possible in principle to produce explanations of the results of symbolic systems, the implementation of such production very often comes up against organizational obstacles.

As far as connectionist systems are concerned, we have seen that, as a matter of principle, their implementation has no obvious connection with the programmer's knowledge of the task to be solved, since they operate on abstract variables and adjust their computations from databases of examples rather than from the programmer's theories about the world. This situation is perfectly illustrated by a quote collected from a computer vision researcher by the authors of (Cardon, 2018), about Geoffrey Hinton's arrival at the ECCV'12 conference on ImageNet classification.

« I mean, the guys [ed. note: researchers in the computer vision community in 2012] were all on the floor because roughly speaking this was screwing up 10 years of intelligence, tuning, sophistication. [...] And this guy [editor's note: Geoffrey Hinton] comes along with a big black box of deep learning, he's got 100 million parameters in it, he's trained that and he's blowing the whole field out of the water. »

We can see here, then, that the shift to connectionism brings about a radical change in the understanding that AI developers can have of their systems, and therefore, a fortiori, in their ability to account for their calculations and produce explanations for them. It is precisely this upheaval in the relationship between "the world" and "the machine" that has given rise to the now popular term "black box" for neural network models, to convey the idea that while neural network computations can be described as a very long succession of elementary mathematical operations on raw data, this sequence of operations is not directly related to its developers' conceptions, and therefore does not make sense a priori.

With the aim of overcoming this "black box" status, a research ecosystem has formed around the issue of producing explanations for neural network results: Explainable AI. This research community produces techniques (algorithms, for the most part) for computing, from a neural network and an input, an explanation of the model's result. While the output of this field of research is vast - more than 15,000 papers containing "Explainable AI" have appeared in the Semantic Scholar database since 2016 - it must be stressed that it is very heterogeneous - the explanations that are proposed are of very diverse natures - and that none of these methods today enjoys scientific consensus. But that is another story, which we tell in the next two articles.

 


[1] For the sake of presentation, what follows is obviously a simplification of reality. Nevertheless, we aim to preserve the main principles.

[2] XRCE team, Florent Perronnin – Xerox INRIA - France

 


Next article [2/3] ⮕

Summary


Article written by Nicolas Berkouk, Mehdi Arfaoui and Romain Pialat