Understanding Google’s new online/offline conversion tracking

02 juin 2017

Franck Baudot and Vincent Toubiana - Technologists - In a Washington Post article, Google announced a “privacy preserving” method used to estimate conversion tracking but remained secretive about the underlying patent. We looked at Google patent portfolio and found a candidate that could be the  one at play. We analyze the promising patented system in the light of Google announcement. Is patenting privacy preserving system the right innovation model?

Last week Google made an announcement about their new solution to measure offline conversion: comparing its advertising campaigns data (i.e. to whom they showed an ad) to the credit card transaction of their partners (i.e. who bought what). Google argued that they obtained consent to share data with third party partners. While it is true that Google can obtain data from its partners (more on this) most users were unlikely realizing to consent to this. From a privacy perspective, that sounds quite scary and so the news made the headline: reading about closing the gap between offline and online always sends chills down the spine. Google and Facebook are making incremental efforts to measure offline conversion (and thus provide alternatives to the cost per click measure)… is it so intrusive?
 

Digging into Google patents


Google’s spokesperson briefly mentioned a pending patent about their solution. So we had a quick look at Google patents on payment and measurement. A closely related patent was actually filed in 2013 and published in May 2017. Why did Google wait 4 years (which is quite long) to deploy this system? One of the explanations might be that the patent relies on homomorphic encryption which was for quite a long time, hardly achievable in a realistic time. We’ll dig in the details of the patent later, but it’s closely related to the objective mentioned in the WaPo article: the patent’s abstract describes “A method of measuring a campaign performance includes transforming identifiers into non-identifiers,  and […] identifying a subset of the spending values associated with members of the encrypted non-identifiers in the common encrypted non-identifiers; and deriving a total spending value based on the subset of the spending values.” As any patent, this one is written to be very broad. 
In a few words, it’s meant to be a privacy preserving method to find out how much people exposed to an ad-campaign have been spending. How privacy preserving is it? Let’s see. 
 

Google patent n° US 9,641,332


Entitled Privacy-preserving measurements of a campaign, this patent aims at measuring a campaign performance by collecting the global amount of transactions in relation with the campaign. The “tour de force” is to do so without exposing the amount for each customer. The most informative section in the patent is the “Example Implementation”. 

It should be noted that the patent makes use of non-conventional cryptographic algorithms:

  • Pohlig-Hellman algorithm: it allows a cleartext to be encrypted twice with two different keys. The interesting property here is that it will give the same result whichever key is used first.
  • Second, Paillier cryptosystem allows addition of ciphertexts. 

The patent mentions two actors: the campaign promoter (referred to as P), and one Merchant (referred to as M).
Our understanding is that the method would be deployed for a brand (let’s refer it as B) that would subcontract a campaign to P, B’s products (or services) would be purchased by end customers at different (possibly web) merchants.

 

Two objectives characterize this patent:

1. Allow P to determine a subset of people having purchased at the merchant’s place and being part of the campaign without M divulging its set of purchasers;
2. Let P know the global amount of money spent by the buyers at M without knowing each single purchase and without letting M know this amount. 

The first objective is reached by means of Pohlig-Hellman algorithm, used to encrypt twice every set of identifiers, once by P and once by M. This is achieved as follows:

  • P and M agree on the same calculation group, i.e. sharing the same prime number
  • P encrypts the identifiers in its set with a given exponent and transmits them to M,
  • M encrypts them again with its own exponent and sends them back to P , 
  • M encrypts its own dataset of identifiers,
  • M also encrypts the amount for each buyer with a Paillier key and sends the whole to P,
  • P encrypts the identifiers of M set and compares them with the results with its own dataset.
  • The intersection of P and M datasets consists in every encrypted identifier which appears in both datasets.

Et voilà!

 

The second objective is achieved using Paillier’s cryptosystem applied to the amounts. As explained in the previous objective :

  • M sent the amount of each buyer encrypted with a public key along with the encrypted identifier.
  • P computes the multiplication of every encrypted amount belonging to the encrypted identifiers of the intersected dataset, plus one encrypted random value Rand, sending the result to M.
  • M decrypts and sends the result back to P.
  • P subtracts Rand to this result to obtain the global amount of purchases done at M.
     

So, how privacy preserving is this method?


Undeniably, the process is far better than what happens today on such matters (the patent intro refers to Facebook practices). A key element that enforces privacy was not given above: each actor can (and even must), randomize the order of the encrypted identifiers before transmitting the dataset. In this way, it breaks the link between an identifier and its encrypted counterpart.

The good point is that Google’s patent shows that it is possible to define methods that are privacy –preserving while achieving a clear objective. It opens the door to processing data without sharing them with third parties, and thus does not require to share data between third parties. Google is doing privacy engineering there and we hope that’s really the technique they are using. A bad point would be: why define such methods in patents instead of an article that would let the solution open to everyone? 

One final remark on the patent: as far as we understood the cryptographic parts, the attacker model seems to be “honest but curious”:  we found that the Merchant can “tamper” with the amount returned to the Promoter. Indeed, there is no way for P to ensure that the deciphering of its value done by M is correct. Similarly, there is no way for M to know if P added all the amounts or just selected one from the dataset.

 

Franck Baudot and Vincent Toubiana - Technologists


Document reference

Go further

View an intent to draw the Google's conversion tracking architecture.