[3/3] Predicting without explaining, or when algorithmic opacity muddies the waters

Written by Charlotte Barot


16 October 2024


As decision-support systems become more widespread, their use in critical decision-making contexts raises profound ethical and legal concerns. To counter the main risks identified in the field of decision-making, legislation requires human intervention or integrated human oversight within the decision-making process, leading to “hybrid” mechanisms that combine computational power with human judgment. In this series of articles, the LINC examines, based on scientific literature, two key obstacles to the effectiveness of such mechanisms: user trust biases toward the system and the opacity of the system’s suggestions.

 

This article is the third (and last) in a series of three.

The second article in this series illustrated the trust biases at play in assisted decision-making and how they act as an obstacle to the free exercise of human judgment in a hybrid system. While these biases are linked to specific abilities and contextual factors inherent to decision-making, they also reflect a difficulty intrinsic to how the system itself operates: the difficulty of interpreting its outputs. Indeed, when the decision-maker cannot evaluate the system's suggestion, they are left either to rely on their own intuition or to fall back on a heuristic of doubt or trust. Even the minimum condition required for meaningful human oversight, namely rejecting obvious errors, is not always realistic in contexts where certain outputs are difficult to assess. On the one hand, absurd responses may appear plausible, for example when a text-generating AI system inserts an invented name or fact into an otherwise coherent passage. On the other hand, the very format of the output can make it resistant to evaluation: assessing the validity of a numerical score in order to propose an alternative would, in most cases, require reproducing the inference that produced the score in the first place.

 

Predicting is not explaining

 

In a 1999 behavioral study, Goodwin and Fildes found that when decision-makers were presented with trend predictions in the field of marketing, they tended at best to ignore reliable predictions, and at worst to degrade them by attempting to modify them. In other words, they showed a bias of distrust toward the algorithm. The authors noted, however, that the outputs were difficult for decision-makers to evaluate, since their format, given as a score or percentage, was not easily contestable. When users displayed such maladaptive attitudes, it was because they were unable to interpret the score provided and therefore preferred to ignore it most of the time. Yet when they did attempt to propose an alternative, they failed to perform better than the system.

As this study shows, those tasked with evaluating system outputs are in fact responsible for two subtasks: understanding and evaluating. First, the user must make sense of the system's suggestion: if the output is a score, for instance, they must understand the scale on which it is expressed and the thresholds considered critical. The user must be able to interpret the overall message conveyed by the system (the number) within its context (the scale and thresholds).
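As a minimal illustration, the sketch below shows how the same raw number only becomes meaningful once paired with this contextual information; the scale and thresholds used here are purely hypothetical:

```python
# Minimal sketch: a raw score only becomes interpretable once its scale and
# critical thresholds are documented. Scale and thresholds here are purely hypothetical.

RISK_BANDS = [(30.0, "low"), (70.0, "moderate")]  # scores above 70 read as "high"

def interpret_score(score: float, scale_max: float = 100.0) -> str:
    """Map a raw score (assumed to lie on a 0..scale_max scale) to a documented risk band."""
    if not 0.0 <= score <= scale_max:
        raise ValueError(f"score {score} falls outside the documented scale 0..{scale_max}")
    for upper_bound, label in RISK_BANDS:
        if score <= upper_bound:
            return label
    return "high"

print(interpret_score(42.0))  # -> "moderate": the same number says little without the scale
```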

Next, the user must evaluate it, that is, make a judgment about its relevance, either accepting it or rejecting it in favor of their own opinion or a corrected version. In the context of machine learning systems, however, evaluating outputs is not straightforward, since these models operate as black boxes. When the inference process that generated the system’s suggestion cannot be retraced or unpacked, alternative ways of interpreting the outputs must still be found.

 

Toward introspective systems

 

One option could be to ask the system itself, prompting it to produce justifications that would both help explain its output and allow an assessment of its own reliability. Unfortunately, the justifications and confidence scores accompanying the responses are not always reliable, even when the outputs themselves are correct.

Jin et al. 2024, analyzing the performance of a model tasked with solving clinical cases based on medical imaging, observed the model's limited ability to justify its answers. The model was tested using prompts structured in successive steps: first, it had to describe the provided medical image, then recall relevant medical information to address the question, then produce a medical reasoning process, and finally choose a diagnosis from a set of options. While the model demonstrated high accuracy in its final diagnoses, sometimes even surpassing that of physicians, it struggled significantly with understanding the medical images. This often led it to produce flawed reasoning in support of an otherwise correct diagnosis, thereby generating misleading justifications.
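To make the setup more concrete, here is a schematic sketch of this kind of stepwise prompt; the wording, step labels and example values are illustrative assumptions, not the exact prompt used by Jin et al.:

```python
# Illustrative stepwise prompt in the spirit of the protocol described above.
# Wording, step labels and example values are assumptions, not the authors' prompt.

PROMPT_TEMPLATE = """You are assisting with a clinical case based on a medical image.

Step 1 - Image description: describe the medical image provided.
Step 2 - Relevant knowledge: recall the medical information relevant to the question.
Step 3 - Reasoning: lay out your medical reasoning step by step.
Step 4 - Answer: choose exactly one diagnosis among the options.

Question: {question}
Options: {options}
"""

def build_prompt(question: str, options: list[str]) -> str:
    """Fill in the template; the image itself would be passed separately to a multimodal model."""
    return PROMPT_TEMPLATE.format(question=question, options=" / ".join(options))

print(build_prompt("What is the most likely diagnosis?",
                   ["pneumonia", "pulmonary embolism", "pneumothorax"]))
```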

These limitations make the model's use in clinical settings premature and threaten its integration into medical practice. The risk of misleading justifications, for instance when the human expert does not have access to the image or relies on the system's misinterpretation of it, is that they mislead the decision-maker, possibly even leading them to reject an otherwise correct final suggestion.

 

Behavioral analysis of systems

 

In sum, systems do not always possess strong introspective abilities: they are not always able to analyze their own behavior, whether good or bad. However, one can still find guidance without attempting to open the black box, by relying instead on a behavioral analysis of the model.

It is in this context that the system developer plays a crucial role in the proper integration of the model into a domain-specific expert workflow. The developer can provide several contextual elements that offer a clearer understanding of the conditions under which the model was “trained,” helping to interpret its behavior:

  • the context in which the algorithm was designed,
  • its known limitations,
  • tests conducted prior to deployment,
  • tasks on which it typically performs less well, etc.

Information about the training data, the system’s behavior in real-world situations, and the error margins observed during testing also helps to shed light on its outputs.
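One possible way to make this documentation explicit is a lightweight structured record shipped alongside the model, in the spirit of a model card. The sketch below is only an illustration; every field name and value is hypothetical:

```python
# Illustrative sketch of a structured record a developer could ship with the model
# to document the elements listed above (all field names and values are hypothetical).
from dataclasses import dataclass, field

@dataclass
class ModelContextSheet:
    design_context: str                 # the context in which the algorithm was designed
    known_limitations: list[str]        # documented limitations
    pre_deployment_tests: list[str]     # tests conducted prior to deployment
    weaker_tasks: list[str]             # tasks on which it typically performs less well
    training_data_summary: str          # what the system was trained on
    observed_error_margins: dict[str, float] = field(default_factory=dict)

sheet = ModelContextSheet(
    design_context="Triage support for routine cases, not emergencies",
    known_limitations=["degrades on inputs outside the training distribution"],
    pre_deployment_tests=["held-out evaluation", "three-month shadow deployment"],
    weaker_tasks=["rare categories with few training examples"],
    training_data_summary="Historical cases, 2015-2022, single institution",
    observed_error_margins={"overall": 0.08, "rare categories": 0.21},
)
```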

 

User reinforcement learning

 

Exploratory research delves deeper into this notion of behavioral analysis of models by proposing “training” programs for human decision-makers, designed to familiarize them with the behavior of the system in use (Lian and Tan 2019, Suresh et al. 2021, Wortman Vaughan and Wallach 2021). The goal is to teach users, through a series of trials, to become acquainted with the system’s behavior, enabling them to know when to follow its suggestions, when to reject them, and, in the latter case, when to investigate the problem more thoroughly.

In their experimental setup, Mozannar et al. 2022 explore optimizing collaboration between humans and artificial intelligence systems on tasks involving question answering based on text passages (using the HotPotQA dataset). The article proposes a method to help users collaborate with different AI models: by the end of the training, they should be able to decide when it is preferable to delegate the answer to the model and when they should intervene themselves.

This method draws on educational research emphasizing the importance of feedback in learning. It is based on the principle of specific examples, which are prototypical cases designed to illustrate situations in which the algorithm is reliable and those in which it is not. The examples are selected to represent different scenarios: some in which the algorithm has a high level of confidence and makes a correct prediction, others in which confidence is high but the prediction is incorrect, and cases where the confidence level is uncertain, regardless of whether the prediction is correct or not.
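A minimal sketch of how such examples could be bucketed by model confidence and correctness is given below; the thresholds and the selection logic are assumptions made for illustration, not Mozannar et al.'s actual procedure:

```python
# Minimal sketch: bucket candidate cases by model confidence and correctness,
# then pick a few prototypes from each bucket to show the user during onboarding.
# Thresholds and selection are illustrative assumptions, not the authors' procedure.

def bucket(confidence: float, correct: bool, high: float = 0.8, low: float = 0.5) -> str:
    if confidence >= high:
        return "confident_correct" if correct else "confident_wrong"
    if confidence <= low:
        return "unconfident"
    return "middling"

def select_teaching_examples(examples, per_bucket: int = 2) -> dict:
    """examples: iterable of (case, model_confidence, model_was_correct)."""
    chosen = {}
    for case, conf, correct in examples:
        chosen.setdefault(bucket(conf, correct), []).append(case)
    return {name: cases[:per_bucket] for name, cases in chosen.items()}

examples = [("case 1", 0.92, True), ("case 2", 0.91, False), ("case 3", 0.55, True)]
print(select_teaching_examples(examples))
```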

The goal is to improve the “mental model” that humans form of the algorithm’s capabilities, that is, to help them understand the cases in which it is likely to make errors, including errors in its own confidence estimates. This learning process enables users to better recognize situations where they can trust the algorithm and those where, conversely, it is necessary to verify the results more carefully.

Experiments show that users trained with this method are more effective at deciding when to delegate decisions to the classifier, enhancing collaboration between decision systems and humans and reducing judgment errors.

 

Overall conclusion

 

Scientific literature shows that implementing hybrid decision-making systems involves two types of challenges: first, enabling the decision-maker to exercise judgment that is, in principle, informed and impartial, which depends on exogenous conditions of the decision-making process; and second, allowing the decision-maker to correctly interpret the system’s outputs, which depends on intrinsic conditions of system readability. Ultimately, the decision-maker’s trust attitudes merely reflect these underlying conditions. These two types of obstacles indicate that responsibility for hybrid decisions must be shared between the system deployer, who bears the operational risk, and the system designer, who is responsible for the system’s proper functioning and for providing the tools needed to learn how to use it effectively.

Indeed, if human intervention is to involve a significant degree of discretion, the risk is that individual responsibility in decision-making increases proportionally: the greater the freedom, the greater the associated costs borne by the decision-maker. Beyond the decision procedure itself, it is therefore necessary to broaden the perspective and consider these new procedures within the full work context, in order to best integrate machine suggestions into human decision-making.