AI System Security: The Actions That Make a Difference
Written by Félicien Vallet
21 January 2022

Securing an AI system turns out to be a particularly complex task. Nevertheless, it is possible to implement measures throughout the system’s lifecycle which, although they do not guarantee absolute security or the impossibility of an attack, do significantly reduce the privacy risks.
Several publications - such as the one released by the company Wavestone - list both technical and governance best practices to implement in order to secure an AI system. As AI system security is a rapidly growing field of research, it is worth noting that, beyond basic measures, more sophisticated - sometimes even experimental - defense mechanisms can be added, many of which are evolving at a very fast pace.
1. Have a deployment plan
Implementing an AI system first and foremost requires careful consideration of how it will be done. While practical implementation often challenges certain design choices, formally defining requirements beforehand helps ensure that the deployed solutions remain consistent with the initial expectations and constraints.
1.1 Designing the architecture
Depending on the task at hand (image classification, text data extraction, etc.), different types of AI systems can be deployed. It is therefore necessary to address several key questions to choose the most appropriate one.
- What type of problem is being solved?
Determine whether it involves regression, classification, clustering, segmentation, content generation, etc.
- What type of system should be implemented?
Based on the problem, clarify whether a supervised, unsupervised, reinforcement, continual, federated, distributed, or centralized learning system is needed.
- What are the implications of these choices?
Each system has its own characteristics and leads to specific consequences. For example, the privacy implications of a federated learning system are not the same as those of a centralized one, and continual learning systems have different vulnerabilities compared to those where models are trained « once and for all ».
1.2 Sequencing the processing
The power of machine learning lies in its ability to identify correlations within very large volumes of data. During the training phase, models are often exposed to more data than what will ultimately be strictly necessary, in order to determine the most effective combinations in practice. Once the model has been selected and validated, the quantity and nature of the data used can be reduced to the bare minimum. This refined dataset is then used for deploying the AI system in production.
It is therefore essential to carefully sequence AI processing by clearly separating:
- The R&D phase: during which exploratory work is conducted to identify the best system to solve the target task (in terms of architecture, data used, choice of parameters and hyperparameters, etc.) and to carry out training.
- The production phase: which follows R&D and involves using the AI system for the specific task it was trained to perform.
Having a clear understanding of this phase separation is crucial, as each phase is subject to different requirements. They may correspond to different purposes under data protection regulations, potentially requiring distinct legal bases. Depending on the chosen architectures, this separation can be more difficult to implement. For instance, a continual learning system may be seen as a succession of R&D and production phases.
1.3 Adopting a privacy-by-design approach
Depending on its use case, architecture, and the data it processes, an AI system may pose varying levels of risk to privacy. In some cases, these risks can be mitigated by implementing protective measures from the design stage onward. Several levers can be used in practice:
- Explore privacy-preserving architectures. Certain implementation practices in AI systems can help ensure data confidentiality. For example, federated learning approaches may offer stronger privacy guarantees than centralized systems by limiting data transfers (see the article Chacun chez soi et les données seront bien gardées: l’apprentissage fédéré).
- Strengthen the AI system's execution environment. The practical conditions under which systems are deployed must ensure robust security. For instance, trusted execution environments (TEEs) can be used to provide secure processing frameworks.
- Leverage cryptographic tools. Recent scientific advances in cryptography offer strong guarantees for data protection. Depending on the use case, it may be relevant to explore techniques such as secure multi-party computation, oblivious transfer, or homomorphic encryption.
- Use anonymization techniques. Certain anonymization methods, such as differential privacy, can be introduced and applied at various stages of the processing pipeline. In practice, using anonymization techniques helps reduce the risk of data breaches, model theft, or reverse engineering. These techniques can be applied, for example, to:
- Disturb the data used during the training phase
- Disturb the learned model parameters, e.g., by injecting noise during the parameter update process as proposed by (Long et al, 2017)
- Disturb the loss function used during training
- Disturb the input submitted to the system in the production phase
- Disturb the output provided by the system during production
It is important to find a balance between utility and protecting the model against privacy attacks (Rahman et al, 2018).
- Ensure that basic IT security (cybersecurity) principles are properly applied. There are specialized frameworks for developing secure AI systems, such as SecML (Melis et al, 2019). It is also essential to remember that AI systems are, above all, information systems—and as such, they must comply with the same security requirements. The CNIL’s Guide to the Security of Personal Data outlines the fundamental measures to be implemented, including for AI systems.
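The noise-injection idea mentioned among the anonymization measures above, disturbing the learned parameters during the update process, can be sketched as a gradient step with clipping and Gaussian noise, in the spirit of differentially private SGD. The function name, learning rate, and noise scale below are illustrative assumptions, not a reference implementation:

```python
import numpy as np

def noisy_gradient_step(weights, grad, lr=0.1, clip_norm=1.0, noise_std=0.01, rng=None):
    """One gradient update with clipping and Gaussian noise (DP-SGD-style sketch)."""
    rng = rng or np.random.default_rng(0)
    # Clip the gradient so that any single example's influence is bounded
    norm = np.linalg.norm(grad)
    if norm > clip_norm:
        grad = grad * (clip_norm / norm)
    # Add calibrated noise to mask individual contributions
    noisy_grad = grad + rng.normal(0.0, noise_std, size=grad.shape)
    return weights - lr * noisy_grad

w = np.zeros(3)
g = np.array([3.0, 4.0, 0.0])  # norm 5, clipped down to norm 1
w_new = noisy_gradient_step(w, g)
print(w_new)
```

The clipping bound and noise scale jointly determine the privacy/utility trade-off discussed above; tightening them improves protection but degrades accuracy.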
2. Be mindful of the resources used
An AI system involves the use of different components. Three in particular can be distinguished:
- Data: used both for model development during the R&D phase and for operation in the production phase
- Models: created from data, they may sometimes result from the enhancement and adaptation of existing models by using transfer learning technologies
- Code: the computational “embodiment” of the AI system, it can be specifically developed for the task at hand or derived from existing resources (such as open-source code). It almost systematically relies on existing building blocks (libraries, available examples, etc.)
2.1 Data
2.1.1 Ensuring legality
Since data lies at the heart of how AI systems operate, it is essential to ensure that their collection, processing, and use comply with applicable regulations, particularly those related to data protection when personal data is involved (GDPR, Data Protection Act, ePrivacy Directive, Law Enforcement Directive, etc.).
A compliance analysis regarding access to this data for the development and use of an AI system is therefore indispensable. Like any data processing activity, it must be characterized and documented as required by applicable regulations (purpose, legal basis, retention period, exercise of rights, security measures). Depending on the purpose of the processing, conducting a Data Protection Impact Assessment may prove to be necessary.
2.1.2 Ensuring quality
The data, used in training, validation, or testing sets, must also meet quality requirements in order to build the best possible system. It is therefore important to pay attention to various aspects, such as:
- The usefulness of the data for the problem to be solved
- The way in which the data were collected, which may shed light on their reliability and limitations
- The representativeness of the data with respect to the problem to be solved
- The presence of potential biases in the data
- The availability and quantity of the data
- Potential issues within the data (missing values, incomplete entries, outliers, etc.)
Data sanitization is an essential step in building a dataset usable by an AI system. Monitoring of data sources and thresholding mechanisms aimed at limiting the use of data from a single source (geographical area, individual, IP address, etc.) can also be implemented to reduce the risk of over-representation of certain data categories.
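The thresholding mechanism described above can be sketched as a per-source cap applied when assembling the dataset; the record layout and the limit value below are illustrative assumptions:

```python
from collections import defaultdict

def cap_per_source(records, max_per_source=100):
    """Limit how many records a single source (user, IP address, region...) contributes."""
    counts = defaultdict(int)
    kept = []
    for rec in records:
        src = rec["source"]
        if counts[src] < max_per_source:
            counts[src] += 1
            kept.append(rec)
    return kept

data = [{"source": "ip-1", "x": i} for i in range(5)] + [{"source": "ip-2", "x": 9}]
print(len(cap_per_source(data, max_per_source=3)))  # 4: three from ip-1, one from ip-2
```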
It is also crucial to ensure the integrity of the data collected and used by the AI system. This applies both to the R&D phase and to the production phase, and the data must not be altered in ways not foreseen by the system’s designer (principle of accuracy). The data acquisition chain must therefore be designed to ensure end-to-end protection, by limiting entry points, applying strict access management, and encrypting data flows.
In the case of reused datasets, and particularly those available as open source, a duty of vigilance is required, as such datasets may not provide satisfactory quality guarantees and may even have been compromised in order to carry out an attack on the AI system.
2.1.3 Ensuring data desensitization
The functioning of AI systems, based on the use of large or even very large amounts of data, puts certain GDPR principles under strain, particularly the principle of data minimization. Furthermore, as previously mentioned, the design or R&D phase of an AI system may require access to broader categories of data than those ultimately used in production.
However, it is essential to ensure that the data is not excessive. Information that clearly must not be used should not be retained within datasets. For example, the development of an AI system intended to produce a medical diagnosis must never be carried out on directly identifying data. The use of pseudonymization mechanisms (such as for textual documents) or filtering/obfuscation (for instance, 16-digit credit card numbers) is therefore indispensable.
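As a minimal illustration of such filtering, the 16-digit credit card example above can be handled with a regular expression; the pattern and placeholder are assumptions for this sketch, and a production system would need more robust detection (other separators, checksum validation, etc.):

```python
import re

# Matches 16-digit card-number patterns, optionally grouped by spaces or hyphens
CARD_RE = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

def mask_card_numbers(text, placeholder="[CARD]"):
    """Replace card-number-like sequences before the text enters a training set."""
    return CARD_RE.sub(placeholder, text)

print(mask_card_numbers("Paid with 4111 1111 1111 1111 yesterday"))
# "Paid with [CARD] yesterday"
```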
Furthermore, in order to overcome certain constraints related to the protection of personal data, the use of anonymized or synthetic data can prove to be highly effective. However, since anonymization is a complex and evolving concept, it is essential to first ensure that it has been properly carried out (see the related CNIL how-to-sheet).
2.1.4 Ensuring traceability
It is essential to be able to ensure the traceability of the data used by AI systems (particularly in the R&D phase) and to document its origin and the conditions under which it was collected. In addition to ensuring legality, this information will enable the AI system designer to improve the quality of their system.
Furthermore, certain technologies currently under development aim to ensure data traceability and thus be able to observe whether they have been used for model training. This is referred to as « radioactive » data (Sablayrolles et al., 2020). This research, which is generally motivated by intellectual property issues, may be applicable in the case of personal data.
2.2 Models
In many cases, AI systems can be designed by reusing existing AI models. This is because the constraints imposed by deep learning make it costly in terms of computational resources, the amount of data required, and time. In many areas of application, transfer learning technologies are therefore used to facilitate model creation (e.g., for language models, vision models, etc.).
In practice, this involves using a model trained on very large categories of data and adapting it to the specific problem using data relevant to that problem, but in much smaller quantities than would be required for full training. For example, cancerous tumors can be detected using the generalist GoogLeNet model adapted to this use case.
As described in the article A Brief Taxonomy of AI System Attacks, an attacker can use an open source model that they have infected to introduce backdoors. It is therefore essential, when using a model implemented by a third party, to ensure:
• The reliability of the source from which it is obtained
• That you have knowledge of how it was created
• That you have the latest version of the model
2.3 The code
Like the development of any computer program, the design of AI systems relies on code development. However, since designing AI systems is a highly complex task, it often relies on the use of existing resources. As in other areas of computer science, it is virtually impossible to do without reference libraries (TensorFlow, Keras, Scikit-Learn, etc.), and it is also common to reuse parts of source code produced by others.
As with open source AI models, it is necessary to ensure:
- The reliability of the source from which these elements are retrieved
- That you have knowledge about how they were produced
- That they have not been corrupted (by inspecting the retrieved code)
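Verifying that a retrieved model or code artifact has not been corrupted can rely on a checksum published by the source; the sketch below (the file name and digest are illustrative) compares a file's SHA-256 hash against an expected value:

```python
import hashlib

def verify_artifact(path, expected_sha256):
    """Check that a downloaded model or code file matches a published checksum."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash the file in chunks so large models do not need to fit in memory
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest() == expected_sha256

# Illustrative usage with a locally created file
with open("model.bin", "wb") as f:
    f.write(b"demo weights")
digest = hashlib.sha256(b"demo weights").hexdigest()
print(verify_artifact("model.bin", digest))  # True
```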
The CNIL's GDPR developer's guide specifies the best development practices to be implemented in general to meet data protection requirements.
Furthermore, in the context of a contract between an organization and an external AI solutions provider, the use that can be made of proprietary source code can be very precisely regulated. These issues are addressed in section 5.3.
3. Securing and strengthening the learning process
The learning phase of AI systems is a key stage and one of their specific characteristics. It is therefore necessary to make a particular effort to develop this phase in order to ensure that learning is carried out correctly, that the model is robust, and that it does not offer any opportunities for potential attackers. The learning process must therefore be protected on two levels: at the level of the data used to train the system and at the level of the learning method implemented.
3.1 In terms of training data
In addition to issues relating to the quality of data collected for use in training an AI system (see section 2.1.2), it is important to ensure that this data cannot be used for malicious purposes.
3.1.1 Monitoring the impact of data
As presented in the article A brief taxonomy of AI system attacks, infection attacks, particularly poisoning attacks, allow attackers to covertly control AI systems by contaminating data (to lower the quality of the model produced, to integrate a backdoor, etc.). The state of the art offers several methods to protect against such attacks and consolidate the AI model produced, such as:
- Iterative learning control: studies the impact of each piece of data on the functioning of the model. This is also known as RONI (Reject On Negative Impact) defense, which removes data that has a negative impact on the accuracy of the model from the learning set (Nelson et al, 2008).
- Active learning: involves an operator during the learning process to ask them to qualify certain data (Settles, 2010). This is a semi-supervised learning method.
Less complex methods can also be implemented to ensure data control during the learning phase, such as data integrity checks or the definition of blacklists listing keywords or patterns to be systematically removed from the learning set (e.g., offensive vocabulary in the case of a chatbot).
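The RONI idea above can be sketched with a leave-one-out loop: each training point is kept only if removing it does not improve validation accuracy. The nearest-centroid classifier below is a toy stand-in for the real model, and the threshold value is an assumption:

```python
import numpy as np

def centroid_score(X, y, X_val, y_val):
    """Tiny nearest-centroid classifier standing in for the real model."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = (np.linalg.norm(X_val - c1, axis=1) < np.linalg.norm(X_val - c0, axis=1)).astype(int)
    return (pred == y_val).mean()

def roni_filter(X, y, X_val, y_val, threshold=0.0):
    """Reject On Negative Impact: drop points whose removal improves validation accuracy."""
    base = centroid_score(X, y, X_val, y_val)
    keep = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        acc = centroid_score(X[mask], y[mask], X_val, y_val)
        if acc - base <= threshold:  # removing the point does not help, so keep it
            keep.append(i)
    return np.array(keep)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
X_val = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(3, 0.5, (10, 2))])
y_val = np.array([0] * 10 + [1] * 10)
kept = roni_filter(X, y, X_val, y_val)
print(len(kept), "of", len(X), "training points kept")
```

On real data this loop is expensive (one retraining per candidate point), so approximations are generally used in practice.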
3.1.2 Consolidating your dataset
In addition to infection attacks, certain attacks known as manipulation attacks aim to degrade the quality of outputs by feeding corrupted data into the production system. To protect against such attacks, scientific literature proposes various strategies such as:
- Data augmentation: this method increases the amount of data by adding slightly modified copies of existing data, which « regularizes » the AI model. It is effective, but when the original data is limited, augmentation cannot introduce information that is absent from it, and the results may therefore remain biased.
- Randomization: adds random noise to each piece of data used for training. Adding this noise makes it more difficult for an attacker to predict the perturbation to add to an input to achieve their goals. The intensity of the noise to be added must be optimized to achieve the best compromise between the accuracy of the algorithm and its robustness.
- Adversarial training: uses adversarial examples, most of which are generated using generative adversarial networks (GANs). Training a model with such data should enable the system to ignore noise that may have been added by an attacker and learn only from robust features.
While these methods primarily aim to prevent attacks involving the poisoning of training data, none of them offer formal guarantees that they can protect against all such attacks. Even though an acceptable compromise between performance and robustness must always be found, the idea is that they improve the generalization capacity of the trained model.
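The first two strategies above can be sketched together: noisy copies of each training sample are added to the set, leaving the labels unchanged. The noise level and number of copies below are illustrative assumptions to be tuned against the accuracy/robustness compromise:

```python
import numpy as np

def augment_with_noise(X, y, copies=2, noise_std=0.05, rng=None):
    """Data augmentation / randomization sketch: add noisy copies of each sample."""
    rng = rng or np.random.default_rng(42)
    X_aug, y_aug = [X], [y]
    for _ in range(copies):
        # Small perturbations leave the labels unchanged
        X_aug.append(X + rng.normal(0.0, noise_std, size=X.shape))
        y_aug.append(y)
    return np.vstack(X_aug), np.concatenate(y_aug)

X = np.array([[0.0, 1.0], [1.0, 0.0]])
y = np.array([0, 1])
X_aug, y_aug = augment_with_noise(X, y, copies=2)
print(X_aug.shape, y_aug.shape)  # (6, 2) (6,)
```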
3.2 In terms of the learning method
In parallel with the work on the data used, it is also essential to secure and strengthen the learning process itself. To do this, conventional methods can be used:
- Cross validation: allows the reliability of a model to be estimated using a sampling technique. It is effective except when data is limited: in that case, information present only in the held-out folds is missing from training, and the results may be biased.
- Bootstrapping: allows statistical values to be estimated for a population by averaging the estimates obtained from numerous samples drawn from that population (for example, to estimate the mean, the standard deviation, a confidence interval for the model parameters, or the model's performance). This method consists of drawing observations uniformly at random, with replacement, from the original sample.
- Batch normalization: this technique, specific to deep neural network training, standardizes (centers and scales) the inputs submitted to a layer of the network for each mini-batch (the input layer, but also the hidden layers). This has the effect of stabilizing the learning process and reducing the duration of network training. Note that other types of normalization exist: weight normalization, layer normalization, and group normalization.
- Quantization: this technique, also used in deep learning, approximates a neural network that uses floating-point numbers with one whose values are drawn from a fairly small « discrete » set. This truncation significantly reduces the memory requirements and computational cost of neural networks.
- Pruning: This technique involves removing certain connections in a deep neural network in order to increase inference speed and reduce the storage size of the model. In general, neural networks are highly overparameterized. Pruning a network can be thought of as removing unused parameters.
- Dropout: this regularization technique aims to reduce overfitting in neural networks. To do this, neurons are randomly selected and ignored during the learning phase. These ignored neurons are temporarily removed during the forward pass, and their weights are not updated during the backward pass. Unlike pruning, the neurons are not permanently deleted.
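As an illustration of the first of these methods, k-fold cross validation can be sketched as a split of the sample indices into k folds, each serving once as the validation set; the fold count and seed are illustrative:

```python
import numpy as np

def kfold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross validation."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        # Every other fold goes into the training set
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

splits = list(kfold_indices(10, k=5))
for train, val in splits:
    print(len(train), len(val))  # each fold holds out 2 of the 10 samples
```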
In addition, various learning strategies have been proposed in scientific literature to strengthen models and minimize opportunities for potential attackers. In particular, numerous ensemble learning methods have been developed, extending the methods traditionally used (bagging, boosting, or random forests). These techniques rely on combining multiple algorithms to increase model performance and achieve a level of accuracy far superior to that which would be obtained using any of these algorithms separately:
- Micromodels: a technique that allows training to be carried out on multiple models and their relevance (and the possibility that some may be infected by corrupted data) to be assessed by majority vote at the time of inference (Cretu et al., 2008).
- Defensive distillation: a technique that departs from ensemble learning approaches but uses a reference model (teacher) to train a second « smoothed » model introducing low uncertainty (student). The second model is more robust and less vulnerable to attackers seeking to exploit weaknesses (Papernot et al., 2016).
- Private Aggregation of Teacher Ensembles (PATE): ensemble technique using differential privacy. Noisy master models enable the training, by majority vote on the outputs they produce, of a student model that will be used by the system. The underlying idea is that this model will leave very little room for a confidentiality attack (Papernot et al., 2017).
- Ensemble adversarial training: a technique aimed at countering attacks using contradictory examples, particularly their transferability. To do this, the model is trained using contradictory examples created from the model itself, as well as examples transferred from pre-trained models (Tramer et al., 2020).
Finally, it should be noted that more and more research is focusing on how to enable individuals to exercise their right to object to a machine learning model, as provided for in the GDPR. Exercising this right requires that learning with appropriate properties is carried out. This is referred to as machine unlearning, and several learning approaches, such as SISA (Sharded, Isolated, Sliced, and Aggregated), are beginning to be proposed (Bourtoule et al., 2020).
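The micromodel approach above can be sketched as a majority vote over the predictions of several independently trained models; the toy predictions below are illustrative:

```python
import numpy as np

def majority_vote(predictions):
    """Combine label predictions from several micromodels by majority vote."""
    predictions = np.asarray(predictions)  # shape: (n_models, n_samples)
    # For each sample, keep the label predicted by the most models
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, predictions)

preds = [
    [0, 1, 1],  # model A
    [0, 1, 0],  # model B (possibly trained on a corrupted data shard)
    [0, 1, 1],  # model C
]
print(majority_vote(preds))  # [0 1 1]
```

A single poisoned shard then affects only one vote, which the other micromodels can outvote at inference time.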
4. Making the application reliable
While it is essential to take into account the specific risks of an AI system before putting it into production, it is also important to bear in mind that it is first and foremost a computer system in the traditional sense of the term. Its security and robustness therefore also depend on the application of « traditional » security measures. The best practices for web development presented in the CNIL's GDPR developer's guide and in other initiatives such as the Open Web Application Security Project (OWASP) are therefore applicable. In particular, it will be necessary both to protect the process of feeding data into the AI system during the production phase and to control the outputs it produces, as both of these stages could give rise to malicious actions. In practice, it is recommended to call on IT security experts (pentesters, red teamers, blue hats, etc.). The aim is to ensure the reliability and robustness of the AI system by carrying out various tests and attacks (intrusion, circumvention, etc.).
4.1 Control inputs
Firstly, it is essential that an AI system only exposes what is strictly necessary for its proper functioning. Access to the system must therefore be strictly limited to the users who need it. A « black box » mode of operation, which only allows users to provide inputs and observe outputs, is therefore generally preferable.
« Decontamination chambers » can be implemented in order to:
- Ensure that submitted files are in the correct format: this involves checking the type of data, the completeness of the information entered or extracted, etc.
- Check data consistency: this involves measuring any discrepancies with anticipated data, historical data, etc.
- Detect the addition of noise in the data (noise prevention): this involves detecting the possible presence of a corrupted entry, for example by comparing the prediction obtained from the cleaned data with the submitted data (Akhtar et al, 2018).
- Feature squeezing: this involves reducing the information in the submitted data to a minimum of features sufficient to perform the task, thereby preventing the addition of corrupted information. Comparing the outputs produced on the original data and on its compressed version makes it possible to detect contradictory examples with a high degree of accuracy, as a significant difference in behavior is observed. This is particularly suitable for computer vision processing (Xu et al, 2017).
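Feature squeezing can be sketched for images by reducing bit depth and comparing the predictions obtained before and after squeezing; the toy « model » and the detection threshold below are assumptions:

```python
import numpy as np

def squeeze_bit_depth(x, bits=4):
    """Feature squeezing: reduce the color depth of an image with values in [0, 1]."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def looks_adversarial(x, predict, threshold=0.5):
    """Flag an input if predictions on the original and squeezed versions diverge."""
    p_orig = predict(x)
    p_squeezed = predict(squeeze_bit_depth(x))
    return bool(np.abs(p_orig - p_squeezed).max() > threshold)

# Toy stand-in 'model': its score depends only on the mean pixel value
predict = lambda x: np.array([x.mean()])
x = np.full((8, 8), 0.5)
print(looks_adversarial(x, predict))  # False: squeezing barely changes the score
```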
In addition, it is strongly recommended to secure the APIs (application programming interfaces) that enable the use of online AI systems, for example, in order to:
- Limit the number of requests that can be submitted by a user
- Ensure that the user is human, for example by using CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart)
- Impose a significant computational cost on each request (similar to the key derivation functions used for hashing in cryptography)
- Analyze user behavior. Suspicious behavior can be detected and blocked. This is known as UEBA (User and Entity Behavior Analytics).
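The first of these API protections, limiting the number of requests per user, can be sketched with a token-bucket rate limiter; the rate and capacity values below are illustrative assumptions:

```python
import time

class TokenBucket:
    """Minimal rate limiter: each user may submit at most `rate` requests per second."""
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens according to elapsed time, up to the bucket capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=2)
results = [bucket.allow() for _ in range(4)]
print(results)  # the first 2 requests pass, then requests are denied until tokens refill
```

In a deployed API, one bucket would be kept per authenticated user or API key.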
Finally, if the data submitted during the production phase is to be reused for re-training the AI model, setting up a sandbox can help verify that the re-training does indeed deliver the expected performance gains. This involves monitoring the model's performance over time to ensure that it is not permanently exposed to risks of drift and infection, such as poisoning attacks or backdoors.
4.2 Controlling outputs
While it is essential to control the inputs submitted to an AI system, it is equally important to ensure that the outputs produced are well protected and do not provide any opportunities for potential attackers. As illustrated in the article A Brief Taxonomy of AI System Attacks, an attacker can exploit the outputs of an AI system for multiple purposes: stealing the model, inverting it, conducting a membership inference attack, etc. To this end, it is advisable to:
- Reduce the « verbosity » of outputs: for example, only display inferred labels and not the system's confidence scores, or produce a rough version of these (e.g., low/medium/high confidence) rather than raw scores. This makes it more difficult for an attacker to infer how the targeted AI system works. While they cannot counter all attacks, gradient masking techniques are an important step in defending AI systems (Papernot et al, 2017).
- Adapt outputs: since an AI model always provides a response to the queries submitted to it, it is necessary to offer a « silence »/« don't know » option when the decision is uncertain.
- Detect suspicious outputs: this involves comparing a result produced with reference indicators and raising an alert in case of doubt (e.g., a history of past interactions). The way in which anomalies are handled must then be defined on a case-by-case basis: stopping processing, requesting re-authentication, alerting the processing supervisor, etc.
- Offer manual moderation: in some cases, it may be useful to submit the outputs produced to an operator so that they can ensure that the AI system is functioning properly before returning a response.
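The first two recommendations above can be sketched together: return only a label and a coarse confidence band, with a « don't know » option below a threshold. The bands and threshold values are illustrative assumptions:

```python
def coarsen_confidence(label, score):
    """Reduce output verbosity: return a label and a coarse confidence band,
    or a 'don't know' answer when the model is too uncertain."""
    if score < 0.55:
        return ("don't know", None)
    band = "low" if score < 0.7 else "medium" if score < 0.9 else "high"
    return (label, band)

print(coarsen_confidence("cat", 0.97))  # ('cat', 'high')
print(coarsen_confidence("cat", 0.52))  # ("don't know", None)
```

Returning bands instead of raw scores deprives an attacker of the fine-grained gradient-like signal that model extraction and inversion attacks rely on.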
5. Designing an organizational strategy
Depending on the purpose of the AI system, its field of application, its intended users, the learning strategy deployed (continuous, occasional, or « once and for all »), etc., different risks may arise. It is therefore recommended to i) document design choices, ii) supervise the system’s functioning, iii) identify key personnel and regulate the use of subcontractors, and iv) implement a risk management strategy. The article « Thinking about AI system security » provides further details on how such a strategy can be developed.
5.1 Documenting design choices
Documentation is the cornerstone of implementing a safe and resilient AI system. This documentation must reflect the reasoning that led to the design choices implemented in the system in production. It must also be regularly updated throughout the use of the AI system. To this end, the CNIL provides professionals with a Self-assessment Guide for their AI system, with a particular focus on the use of personal data. In addition, various resources can be used to document the system and ensure that no « blind spots » remain.
Here are some examples:
- The AI Process Certification from the French National Metrology and Testing Laboratory (LNE)
- The ALTAI (Assessment List on Trustworthy Artificial Intelligence) tool developed by the High-Level Expert Group on AI (GEHN IA) set up by the European Commission
- The Labelia association's responsible and trustworthy data science assessment framework
- The practical guide to ethical AI from the Numeum professional union
- The algorithmic impact assessment tool implemented by the Government of Canada
5.2 Supervising system operation
It is essential to ensure that the AI system does not deviate from the expected behavior throughout its lifecycle. To achieve this, it is necessary to anticipate, as early as the design stage, how the system’s evolution will be monitored and which indicators will enable such monitoring. Developing a rigorous evaluation method is therefore crucial. In addition, it is indispensable to implement a process for maintaining operational conditions. This means ensuring that the AI functionality remains compliant with the defined specifications after deployment and throughout its operational phase.
This supervision is particularly critical when implementing continuous learning systems, in which the model is updated as the system is used. Measuring the system’s performance evolution against key parameters or setting up regular audits must be considered before activating continuous learning.
Traceability requirements must therefore be implemented. Traditional traceability practices do not take into account the complexities introduced by machine learning. It is thus essential to define specific traceability requirements, notably to retain information concerning the data submitted to the system, the parameters leading to the system’s decisions, and so forth.
5.3 Identifying key personnel and overseeing the use of subcontractors
The implementation of a complete algorithmic chain is a delicate task that requires the skills of many individuals with complementary expertise. It is therefore necessary to identify these skills, which contribute to the effectiveness of the design processes, the development of the AI system, its evaluation and operational maintenance, the configuration of AI functionalities, and the ability to achieve the expected results.
An organization may, in some cases, decide to rely on AI solutions developed by external providers. In such situations, vigilance is required, as the use of the system may engage the organization’s liability, particularly with regard to personal data protection. It is therefore necessary to establish a contract governing the relationship and the commitments of the subcontractor(s), in particular to specify:
- The framework within which the AI system must operate
- The expected requirements and the methods of control and evaluation
- The modalities for addressing personal data protection requirements
- The deployed risk management mechanisms
- The desired level of documentation
- The terms of access to and use of resources (e.g., whether the subcontractor may reuse the data, the trained model, etc.)
- The reversibility needs at the end of the contract
- The management of intellectual property issues
5.4 Implementing a risk management strategy
As detailed in the article Thinking about AI system security, a strategy for risk management and resilience against errors and potential attacks must be implemented throughout the design, development, deployment, and production phases of the AI system. This strategy must be specifically adapted to the problem to be solved and aligned with potential impacts. In practice, it should be developed according to the sensitivity of the processing, the types of data processed (structured data, images, speech, text, etc.), the learning strategies employed (continuous, occasional, or « once and for all »), the system’s exposure modalities (internal, public, etc.), the evaluation stages of the system, etc.
Such a strategy should enable bodies to take responsibility by building safe and resilient AI systems and by demonstrating their compliance with applicable regulations, such as the General Data Protection Regulation (GDPR). It therefore combines a legal analysis of the system’s implementation with a clear definition of the technical and organizational measures necessary to protect both the system and its associated assets (data, models, configuration, etc.). This strategy should make it possible to:
- Establish an exhaustive and precise inventory of the different AI systems deployed within the body
- Raise awareness and foster accountability among the teams involved
- Validate design choices
- Assess the risks involved, particularly regarding personal data protection (including residual risks)
- Define data backup and management procedures
- Ensure confidentiality, integrity, and availability by deploying appropriate measures (access restrictions, encryption, etc.)
- Define a business continuity plan
The procedures and organizational structures already in place to ensure GDPR compliance or information system security can in particular be used to incorporate the management of risks specific to the use of AI systems.
Illustration: Flickr - cc-by-nc - Todd Lappin