Discover the advertising web with Ads.txt files

Ads.txt files, a tool used in the fight against advertising fraud on the Web, contain the keys to visualising the relationships between the various players in the field.

This content, published by the CNIL's Digital Innovation Lab, is based on factual analysis using freely available information. These results do not constitute an analysis of the compliance of the practices observed and in no way prejudge the qualification that the CNIL could make of these practices. This study was carried out as part of the CNIL's general mission of monitoring information technologies (Art 8-4 of the French Data Protection Act).

What are Ads.txt files?

It all starts with the "inventory". Most of the websites you visit reserve one or more areas on their pages for displaying advertisements. For each web user who visits these pages, there is therefore a possible "impression" (i.e. display) of an advertisement in this space for that user. This "impression" constitutes an element of inventory sold by the publisher.

For example, for a website with 1 advertising space and 1000 daily visitors, the site inventory is made up of 1000 impressions for sale per day.

The publisher will generally use the services of one or more SSPs ("supply-side platforms") to sell these impressions on an advertising exchange network (or "ad exchange"). Each of these SSPs is therefore authorised by the publisher to sell its inventory on a certain number of ad exchange networks, whether the SSP is directly integrated on the publisher's page or is simply a reseller.

These relationships are described in Ads.txt files, whose aim is to fight advertising fraud.
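To make this concrete, here is a minimal sketch of how such a file can be read. It assumes the record layout of the IAB ads.txt specification (advertising system domain, seller account ID, relationship type DIRECT or RESELLER, and an optional certification authority ID, separated by commas). The sample file, the domain names and the names `AdsTxtRecord` and `parse_ads_txt` are hypothetical, and extension fields after a ";" are not handled.

```python
from typing import List, NamedTuple, Optional


class AdsTxtRecord(NamedTuple):
    ad_system_domain: str             # domain of the SSP or ad exchange
    seller_account_id: str            # publisher's account ID on that system
    relationship: str                 # "DIRECT" or "RESELLER"
    cert_authority_id: Optional[str]  # optional fourth field


def parse_ads_txt(text: str) -> List[AdsTxtRecord]:
    """Parse the data records of an ads.txt file, skipping everything else."""
    records = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            continue  # variable lines such as "contact=..." or malformed lines
        relationship = fields[2].upper()
        if relationship not in ("DIRECT", "RESELLER"):
            continue
        cert = fields[3] if len(fields) > 3 and fields[3] else None
        records.append(
            AdsTxtRecord(fields[0].lower(), fields[1], relationship, cert)
        )
    return records


# Hypothetical file content, for illustration only.
SAMPLE = """
# ads.txt of a hypothetical publisher
ssp1.example, 12345, DIRECT, abc123
ssp2.example, 67890, RESELLER
contact=ads@publisher.example
"""

records = parse_ads_txt(SAMPLE)
for record in records:
    print(record.ad_system_domain, record.relationship)
```

Each record therefore states one relationship: a given account on a given advertising system is allowed to sell the publisher's inventory, either directly or as a reseller.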

What types of fraud are publishers trying to avoid?

Why do publishers voluntarily make the list of advertising companies they work with available to everyone? The aim is to avoid two types of fraud:
  • The first is quite simple: it is the usurpation of a publisher's identity. Imagine that SSP1 is selling the inventory of website A on an advertising exchange network. This website has a good reputation, and the inventory is selling well. SSP2 then decides to sell, on the same network, the inventory of website B, which is less valuable, by passing it off as that of website A. SSP2 can then sell its lower-quality inventory at a higher price, which results in the authentic inventory being discounted (since prices adjust according to supply and demand). Ads.txt files allow all the actors in the ecosystem to check whether SSP2 is allowed to sell the inventory of website A, thereby preserving its value.
  • The second is more subtle: it is the resale of inventory. Imagine that inventory from website A is sold on two different advertising exchange networks, ADX1 and ADX2. On ADX1 this inventory is sold for €1 per impression. On ADX2, where there is less demand, it is sold for €0.50 per impression. An SSP can notice this difference, buy inventory on ADX2 at the low price and resell it on ADX1 at the higher price. Although this practice is less blatantly fraudulent, it can lower the average profitability of the inventory, hence its prohibition. With Ads.txt files it is possible to know that the SSP should not be able to resell this inventory.
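The check that a buyer can perform in both scenarios can be sketched as a simple lookup. This is an illustration under stated assumptions, not a description of how any particular exchange implements it: the declarations, domain names, account IDs and the function `declared_relationship` are all hypothetical.

```python
from typing import Optional, Set, Tuple

# Hypothetical declarations taken from website A's ads.txt file, as
# (ad system domain, seller account ID, relationship) tuples.
SITE_A_DECLARATIONS: Set[Tuple[str, str, str]] = {
    ("ssp1.example", "pub-001", "DIRECT"),
    ("ssp3.example", "pub-778", "RESELLER"),
}


def declared_relationship(
    declarations: Set[Tuple[str, str, str]],
    ad_system_domain: str,
    seller_account_id: str,
) -> Optional[str]:
    """Return the declared relationship, or None if the seller is not listed."""
    for domain, account_id, relationship in declarations:
        if domain == ad_system_domain.lower() and account_id == seller_account_id:
            return relationship
    return None


# SSP1 is declared by website A, so its offer can be trusted.
print(declared_relationship(SITE_A_DECLARATIONS, "ssp1.example", "pub-001"))
# SSP2 is not declared: an offer claiming to be website A's inventory is suspect.
print(declared_relationship(SITE_A_DECLARATIONS, "ssp2.example", "pub-999"))
```

A `None` result covers both cases above: an SSP passing off another site's inventory as website A's, and an SSP reselling website A's inventory without being declared as a reseller.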

Why do we care?

Ads.txt files make it possible to identify the relationships between online advertising companies and website publishers. It is important to know that the players in this chain do not limit themselves to organising inventory sales. Each inventory item sold is associated with a unique identifier of the web user, which is generally stored in a cookie. This identifier makes it possible to build advertising profiles of users, for example by tracing their browsing habits across the web. To better understand these practices, read the article (in French) on RTB, one of the most popular programmatic sales methods.

By visualising the prevalence of each player on the French Web, we can therefore get a good idea of the extent of the collection of browsing data by advertisers, as each advertising service generally uses cookies to track Internet users.

Methodology

We use the Alexa top 5000 for France to identify the sites most visited by French web users.

Of these 5000 websites, 31.8% have an Ads.txt file.

We crawled this sample on 26 August 2020.

Disclaimer: this data is based on the values declared by the publishers, which are not verified. The results of this study are therefore dependent on the accuracy of these declared data.
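The crawler used for this study is not described here, so the following is only a sketch of how such a collection could be done. It relies on the fact that, under the IAB specification, the file is served at the root of the publisher's domain as `/ads.txt`. The function names and the example domains are hypothetical, and a real crawler would also need to handle redirects and error pages returned with a success status.

```python
import urllib.request
from typing import Dict, Optional


def ads_txt_url(domain: str) -> str:
    """Build the conventional location of a site's ads.txt file."""
    return f"https://{domain}/ads.txt"


def fetch_ads_txt(domain: str, timeout: float = 10.0) -> Optional[str]:
    """Return the file content, or None if the file cannot be retrieved."""
    try:
        with urllib.request.urlopen(ads_txt_url(domain), timeout=timeout) as response:
            if response.status != 200:
                return None
            return response.read().decode("utf-8", errors="replace")
    except OSError:  # covers URLError, HTTPError and timeouts
        return None


def adoption_rate(results: Dict[str, Optional[str]]) -> float:
    """Percentage of crawled sites for which a file was retrieved."""
    with_file = sum(1 for text in results.values() if text is not None)
    return 100.0 * with_file / len(results)


# Hypothetical crawl results (no network access is needed to run this part).
crawl_results = {
    "site-a.example": "ssp1.example, 12345, DIRECT",
    "site-b.example": None,
    "site-c.example": None,
    "site-d.example": "ssp2.example, 67890, RESELLER",
}
print(adoption_rate(crawl_results))
```

Applied to the figures above, a rate of 31.8% over 5000 sites corresponds to roughly 1590 sites with an Ads.txt file.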

The results


The sites visited by French users use a large number of different advertising systems

Here is the list of the top 25 sites with an Ads.txt file, ranked by popularity. You can also rank them by value.

On average, a site declares that it uses the services of 38 different companies.

The maximum is 188 and the minimum 1.
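These summary figures can be reproduced from the parsed files along the following lines. The sketch assumes that "different companies" is approximated by the number of distinct advertising system domains declared by each site, which may differ from the counting rule actually used in the study. The input data and the name `systems_per_site` are hypothetical.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical input: for each site, the ad system domains listed in its ads.txt.
declared: Dict[str, List[str]] = {
    "site-a.example": ["ssp1.example", "ssp2.example", "ssp1.example"],
    "site-b.example": ["ssp1.example"],
    "site-c.example": ["ssp1.example", "ssp2.example", "ssp3.example", "ssp4.example"],
}


def systems_per_site(declared: Dict[str, List[str]]) -> Dict[str, int]:
    """Count the distinct ad system domains declared by each site."""
    return {
        site: len({domain.lower() for domain in domains})
        for site, domains in declared.items()
    }


counts = systems_per_site(declared)
values = list(counts.values())
summary = {"mean": mean(values), "max": max(values), "min": min(values)}
print(counts)
print(summary)
```

Counting distinct domains matters because a site often lists the same advertising system several times, once per account or relationship.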

Here is a visualisation of the 400 most visited French websites with an Ads.txt file.


Each box represents a site, and its colour reflects the number of declared advertising services used. Move your mouse over the boxes for more information.

The reason why there are so many different services is simple: the more exchanges on which inventory is sold, the higher the expected selling price, as publishers can select the highest bidder.

However, this means that an extremely large number of third parties have access to the users’ data, often without them realising it.

The most popular advertising networks are present on an extremely large proportion of websites.

Here is a visualisation of the proportion of sites on which each of the major advertising systems is present.

Move your mouse over the boxes to see the proportion of presence of each network.

Each block is made up of the 400 previously identified sites, and if a box is coloured, the advertising system is present.
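The proportion shown for each network can be computed as sketched below. The site and network names are hypothetical, as is the function `prevalence`. Each advertising system is counted at most once per site, mirroring the coloured box per site in the visualisation.

```python
from collections import Counter
from typing import Dict, List


def prevalence(site_systems: Dict[str, List[str]]) -> Dict[str, float]:
    """Return, for each ad system, the percentage of sites that declare it."""
    counts: Counter = Counter()
    for systems in site_systems.values():
        counts.update(set(systems))  # count each system once per site
    total = len(site_systems)
    return {system: 100.0 * n / total for system, n in counts.most_common()}


# Hypothetical declarations for four sites.
site_systems = {
    "site-a.example": ["ssp1.example", "ssp2.example"],
    "site-b.example": ["ssp1.example", "ssp1.example"],
    "site-c.example": ["ssp1.example", "ssp2.example", "ssp3.example"],
    "site-d.example": ["ssp1.example"],
}

shares = prevalence(site_systems)
for system, share in shares.items():
    print(f"{system}: {share:.0f}% of sites")
```

In the study, the same calculation would run over the 400 previously identified sites rather than four.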

Beyond what advertising systems themselves learn about users' browsing history, more than half of the programmatic advertising for which Ads.txt is used is sold through Real-Time Bidding (RTB).

In RTB there can be several hundred buyers on each platform who can access the data of the users. These data streams include at least cookie identifiers and usually the URLs (or at least the domain) of the webpage on which the ad would be displayed. It is sometimes possible to associate this information with page content. In addition, cookie identifiers can be synchronized across different ad ecosystems to better capture the full navigation of users.

The Gordian knot of online advertising

Finally, let's look at the connections between the top 30 advertising networks and the top 100 websites with an Ads.txt file.

Although this graph is hard to read, it shows the complexity of an ecosystem whose extent remains largely unknown to the general public. How many of the companies in the left column have you heard of?

Click on the company or website names to better visualise the relationships between them.
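The links drawn in such a graph can be derived as a list of (network, site) pairs. This is a sketch under assumptions: it ranks networks by how many of the selected sites declare them, which may not be the ranking used for the graph above, and all names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_edges(
    ranked_sites: List[str],
    site_systems: Dict[str, List[str]],
    n_networks: int,
    n_sites: int,
) -> List[Tuple[str, str]]:
    """Return (network, site) links between the top networks and top sites."""
    sites = ranked_sites[:n_sites]
    counts: Counter = Counter()
    for site in sites:
        counts.update(set(site_systems[site]))  # one count per site
    top_networks = {name for name, _ in counts.most_common(n_networks)}
    return [
        (network, site)
        for site in sites
        for network in sorted(set(site_systems[site]))
        if network in top_networks
    ]


# Hypothetical data: sites ordered by popularity, with their declared networks.
ranked_sites = ["site-a.example", "site-b.example", "site-c.example"]
site_systems = {
    "site-a.example": ["ssp1.example", "ssp2.example"],
    "site-b.example": ["ssp1.example", "ssp3.example", "ssp2.example"],
    "site-c.example": ["ssp1.example"],
}

edges = top_edges(ranked_sites, site_systems, n_networks=2, n_sites=3)
for network, site in edges:
    print(network, "->", site)
```

With 30 networks and 100 sites, the number of possible links reaches 3000, which helps explain why the resulting graph is hard to read.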

What if we were to represent all the players who have access to each of these advertising exchanges?

We have to wonder whether the system as implemented is not structurally incomprehensible to users, since its precision and complexity cannot reasonably be understood by an ordinary user. As this system is based on the processing of data relating to individuals (generally cookie identifiers), it is essential that it can be understood and controlled by users and that it incorporates a "privacy by design" approach.

To learn more about the online advertising ecosystem, see our study on Sellers.json files
Article based on a scan made on 26 August 2020.
Find some of the source data and the code of this page on the CNIL's Github.

This study is based on the collection of URLs and data that relate exclusively to legal persons. However, in some instances this data might contain personal data. This processing is carried out by the CNIL. It is based on the exercise of official authority and its purpose is to produce studies on the use of technology. The data collected relates to website domain names that are freely accessible over the Internet. This data will be stored for a maximum duration of 5 years. For more information on the way the data is processed or to exercise your rights, you can consult this page.