Ads.txt files,
a tool used in the fight against advertising fraud on the Web, contain keys to
visualise the relationships between the various players in the field.
This content, published by the
CNIL's Digital Innovation Lab, is based on factual analysis using freely
available information. These results do not constitute an analysis of the
compliance of the practices observed and in no way prejudges the
qualification that could be made by the CNIL of these practices. This study
was carried out as part of the CNIL's general mission of monitoring
information technologies (Art 8-4 of the French Data Protection Act).
What are Ads.txt files?
It all starts with the "inventory". Most of the
websites you visit reserve one or more areas on their pages for displaying
advertisements. For each web user who visits these pages, there is therefore
a possible "impression" (i.e. display) of an advertisement in this space for
that user. This “impression” constitutes an element of inventory sold by the
publisher.
For example, for a website with 1 advertising space and
1000 daily visitors, the site inventory is made up of 1000 impressions for
sale per day.
The publisher will generally use the services of one or more SSPs
(supply-side platforms) to sell these impressions on an advertising
exchange network (or "ad exchange"). Each of these SSPs is therefore
authorised by the publisher to sell its inventory on a certain number of ad
exchange networks, whether the SSP is directly integrated on the
publisher's page or is simply a reseller.
Those relationships are described in the Ads.txt files, with the aim
of fighting against advertising fraud.
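To make these relationships concrete, here is a minimal Python sketch of how such a file could be read. It assumes the record layout of the IAB Tech Lab ads.txt specification, which this article does not detail: one record per line with comma-separated fields (advertising system domain, seller account ID, DIRECT or RESELLER, and an optional certification authority ID), with "#" starting a comment. The names AdsTxtRecord and parse_ads_txt and the sample domains are hypothetical, and this simplified parser ignores the specification's extension fields.

```python
from typing import List, NamedTuple, Optional


class AdsTxtRecord(NamedTuple):
    """One declared relationship (field names are illustrative)."""
    ad_system_domain: str
    seller_account_id: str
    relationship: str  # "DIRECT" or "RESELLER"
    cert_authority_id: Optional[str]


def parse_ads_txt(text: str) -> List[AdsTxtRecord]:
    """Parse the data records of an Ads.txt file, skipping everything else."""
    records = []
    for raw_line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 3:
            # Variable lines such as "contact=..." and malformed lines.
            continue
        relationship = fields[2].upper()
        if relationship not in ("DIRECT", "RESELLER"):
            continue
        cert = fields[3] if len(fields) > 3 and fields[3] else None
        records.append(
            AdsTxtRecord(fields[0].lower(), fields[1], relationship, cert)
        )
    return records


# Hypothetical file content, for illustration only.
SAMPLE = """\
# Ads.txt of a fictional publisher
ssp1.example, pub-1234, DIRECT, abc123
ssp2.example, 5678, RESELLER
contact=ads@publisher.example
"""

records = parse_ads_txt(SAMPLE)
for record in records:
    print(record.ad_system_domain, record.relationship)
```

Each resulting record is one declared link between the publisher and an advertising service, which is the raw material for the visualisations in this article.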
What types of fraud are publishers trying to avoid?
Why do publishers voluntarily make the list of advertising companies
they work with available to everyone? The aim is to avoid two types
of fraud:
- The first is quite simple: it is the usurpation of a publisher's
identity. Let's imagine that SSP1 is selling the inventory of website A on
an advertising exchange network. This website has a good reputation, and the
inventory is selling well. SSP2 then decides to sell, on the same network,
the inventory of a less valuable website B by passing it off as inventory
from website A. SSP2 can then sell its lower-quality inventory at a
higher price, which results in the authentic inventory being discounted (since
prices are adjusted according to supply and demand). Ads.txt files allow all
the actors in the ecosystem to check whether SSP2 is allowed to sell the
inventory of website A, thereby preserving its value.
- The second is more subtle: it is the resale of inventory.
Let's imagine that inventory from website A is sold on two different
advertising exchange networks, ADX1 and ADX2. On ADX1,
this inventory is sold for €1 per impression. On ADX2, where
there is less demand, it is sold for €0.50 per impression. An SSP
can notice this variation, buy inventory on ADX2 at a low price
and resell it on ADX1 at a higher price. Although this practice is
less clearly fraudulent, it can lower the average profitability of the
inventory, hence its prohibition. With Ads.txt files, it is
possible to check that such an SSP is not authorised to resell this
inventory.
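The check described in both scenarios can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the declarations below are invented for website A, the domain and account names are hypothetical, and a real buyer-side check would work on records parsed from the publisher's actual Ads.txt file.

```python
from typing import Dict, Optional, Tuple

# Hypothetical declarations from website A's Ads.txt file:
# (advertising system domain, seller account ID) -> declared relationship.
WEBSITE_A_DECLARATIONS: Dict[Tuple[str, str], str] = {
    ("adx1.example", "ssp1-account"): "DIRECT",
    ("adx2.example", "ssp1-account"): "DIRECT",
    ("adx1.example", "ssp3-account"): "RESELLER",
}


def declared_relationship(
    declarations: Dict[Tuple[str, str], str],
    ad_system: str,
    account_id: str,
) -> Optional[str]:
    """Return "DIRECT" or "RESELLER" if the seller is declared, else None."""
    return declarations.get((ad_system.lower(), account_id))


def is_authorised(
    declarations: Dict[Tuple[str, str], str],
    ad_system: str,
    account_id: str,
) -> bool:
    """A seller is authorised only if the publisher declared it."""
    return declared_relationship(declarations, ad_system, account_id) is not None


# SSP1 is declared on ADX1, so its offer passes the check.
print(is_authorised(WEBSITE_A_DECLARATIONS, "adx1.example", "ssp1-account"))
# SSP2 is not declared anywhere: an offer claiming to be website A's
# inventory from this account fails the check.
print(is_authorised(WEBSITE_A_DECLARATIONS, "adx1.example", "ssp2-account"))
```

The same lookup covers the resale case: an account that bought impressions on ADX2 and offers them on ADX1 without being declared there as a RESELLER is simply absent from the publisher's list.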
Why do we care?
Ads.txt files make it possible to identify the relationships between online
advertising companies and website publishers. It is important to
know that the players in this chain do not limit themselves to
organising inventory sales. Each inventory item sold is associated
with a unique identifier of the web user which is generally stored
in a cookie. This identifier makes it possible to build advertising
profiles of users, for example by tracing their browsing habits
through the web. To better understand these practices,
read the article (in French) on RTB (real-time bidding), one of the most
popular forms of programmatic selling.
By visualising the
prevalence of each player on the French Web, we can therefore get a
good idea of the extent of the collection of browsing data by
advertisers, as each advertising service generally uses cookies to
track Internet users.
Methodology
We use the Alexa ranking of the top 5000 French websites to identify
the sites most visited by French web users.
Of these 5000 websites, 31.8% have an Ads.txt file.
We crawled this sample on 26 August 2020.
Disclaimer:
this data is based on the values declared by the publishers, which
are not verified. The results of this study are therefore dependent
on the accuracy of these declared data.
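The crawl described above could be sketched as follows in Python. This is not the code used for the study: it assumes that the file is served at the root of each domain under /ads.txt (the location defined by the IAB Tech Lab specification), the function names are hypothetical, and a real crawl would also need to handle redirects, error pages returned with a success status, and politeness delays.

```python
from typing import Callable, Iterable, Optional
from urllib.error import URLError
from urllib.request import urlopen


def ads_txt_url(domain: str) -> str:
    """Build the conventional location of a site's Ads.txt file."""
    return f"https://{domain}/ads.txt"


def fetch_ads_txt(domain: str, timeout: float = 10.0) -> Optional[str]:
    """Return the file content, or None if it cannot be retrieved."""
    try:
        with urlopen(ads_txt_url(domain), timeout=timeout) as response:
            if response.status != 200:
                return None
            return response.read().decode("utf-8", errors="replace")
    except (URLError, OSError, ValueError):
        # Covers HTTP errors (such as 404), DNS failures and timeouts.
        return None


def share_with_ads_txt(
    domains: Iterable[str],
    fetch: Callable[[str], Optional[str]] = fetch_ads_txt,
) -> float:
    """Fraction of the given domains for which a file was retrieved."""
    domain_list = list(domains)
    if not domain_list:
        return 0.0
    found = sum(1 for domain in domain_list if fetch(domain) is not None)
    return found / len(domain_list)
```

Passing the fetch function as a parameter makes it possible to try the computation on a small in-memory sample before running it against thousands of live sites.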
Here is a visualisation of the 400 most visited French websites
with an Ads.txt file.
Each box represents a site, and its colour
reflects the number of declared advertising services used. Move your
mouse over the boxes for more information.
The reason why
there are so many different services is simple: the more exchanges on which
inventory is sold, the higher the expected selling price,
as publishers can select the highest bidder.
However, this means that an extremely large number of third parties
have access to the users’ data, often without them realising
it.
The most popular advertising networks
are present on an extremely large proportion of websites.
Here is a
visualisation of the proportion of sites on which each of the major
advertising systems is present.
Move your mouse over the boxes to see the proportion of presence
of each network.
Each block is made up of the 400
previously identified sites, and if a box is coloured, the advertising
system is present.
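The proportion shown for each network can be computed with a simple count. The sketch below assumes that the declared advertising system domains have already been extracted for each site; the function name and the sample data are hypothetical and do not reproduce the study's figures.

```python
from collections import Counter
from typing import Dict, Set


def prevalence(declared_by_site: Dict[str, Set[str]]) -> Dict[str, float]:
    """Map each advertising system to the share of sites declaring it."""
    total_sites = len(declared_by_site)
    if total_sites == 0:
        return {}
    counts: Counter = Counter()
    for systems in declared_by_site.values():
        # Each site counts at most once per advertising system.
        counts.update(set(systems))
    # most_common() orders the systems from most to least widespread.
    return {system: n / total_sites for system, n in counts.most_common()}


# Invented sample: four sites and the systems they declare.
SAMPLE_SITES = {
    "site-a.example": {"adsystem-x.example", "adsystem-y.example"},
    "site-b.example": {"adsystem-x.example"},
    "site-c.example": {"adsystem-x.example", "adsystem-z.example"},
    "site-d.example": set(),
}

shares = prevalence(SAMPLE_SITES)
for system, share in shares.items():
    print(f"{system}: {share:.0%}")
```

The same per-site sets also give the number of declared services for each site, which is the quantity encoded by the colour of the boxes in the first visualisation.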
Beyond the advertising systems' own knowledge of
users' browsing history,
more
than half of the programmatic advertising for which Ads.txt is used
relies on Real-Time Bidding (RTB).
In RTB there can be several hundred buyers on each platform who
can access the data of the users. These data streams include at least
cookie identifiers and usually the URLs (or at least the domain) of the
webpage on which the ad would be displayed. It is sometimes possible to
associate this information with page content. In addition, cookie
identifiers can be synchronized across different ad ecosystems to better
capture the full navigation of users.
The Gordian knot of online advertising
Finally, let's look at the connections between the top
thirty advertising networks and the top 100 websites with an
Ads.txt file.
Although this graph is hard to read, it shows the complexity
of an ecosystem whose extent remains largely unknown to the general
public. How many of the companies in the left column have you heard
of?
Click on the company or website names to better
visualise the relationships between them.
What if we were to represent all the players who have
access to each of these advertising exchanges?
We have to
wonder whether the system as implemented is not structurally
incomprehensible to users, since its precision and complexity cannot
reasonably be understood by an ordinary user. As this system is
based on the processing of data relating to individuals (generally
cookie identifiers), it is essential that it can be understood and
controlled by users and that it incorporates a
"privacy by design"
approach.
To learn more about the online advertising ecosystem, see our study on Sellers.json files.
Article based on a scan made on 26 August 2020.
This study is based on the collection of URLs and data that are
exclusively related to legal persons. However, in some instances this data
might contain personal data. This processing is carried out by the
CNIL. It is based on the exercise of
official authority and its purpose is to produce studies on the use of
technology. The data collected is related to website domain names that are
freely accessible over the Internet. This data will be stored for a maximum
duration of 5 years. For more information on the way the data is processed
or to exercise your rights, you can
consult this page.