Ubuntu privacy blunder over Amazon ads continues

First, some context: There have been quite a few complaints and concerns about Ubuntu's attempt to include advertisements in their operating system, in the form of Amazon-affiliate-tracked results showing up in Unity's Dash interface by default. There have also been some attempts at damage control over this PR disaster, including one by Mark Shuttleworth himself, Ubuntu's Self-Appointed Benevolent Dictator For Life (SABDFL).

To his credit, he isn't pulling any punches or dancing around the question:

Why are you telling Amazon what I am searching for?

We are not telling Amazon what you are searching for. Your anonymity is preserved because we handle the query on your behalf. Don't trust us? Erm, we have root. You do trust us with your data already. You trust us not to screw up on your machine with every update. You trust Debian, and you trust a large swathe of the open source community. And most importantly, you trust us to address it when, being human, we err.

One of the statements here is pretty ominous at first: "Don't trust us? Erm, we have root." Mark refers to the fact that system updates are all done as root, and they can indeed slip in any code they want in there, which could include a remote-administration trojan or a little script uploading all of $HOME to Canonical's servers... But doing so would go directly against their users and instantly ruin their reputation. Users can reasonably expect that their operating system vendor will not snoop on them. The argument, while technically correct, doesn't hold much water when considering user expectations and Canonical's own business interests.

However, I'd like to challenge one particular passage (emphasis mine):

We are not telling Amazon what you are searching for. Your anonymity is preserved because we handle the query on your behalf.

There are a number of issues here.

"We are not telling Amazon what you are searching for."

The way the search is handled goes as follows:

  • User begins typing in the Dash search field
  • An HTTP request (not HTTPS!) is made to a server called productsearch.ubuntu.com, containing the keywords
  • productsearch.ubuntu.com asks Amazon's API for search results; to do so, it obviously needs to send the search terms. It is unknown whether that query is made over HTTPS or not.
  • The search results are sent back to the client in a JSON string

The request looks like this:

GET /v1/search?q=test HTTP/1.1
Host: productsearch.ubuntu.com
Accept-Encoding: gzip, deflate
User-Agent: gvfs/1.13.9
Accept-Language: en-ca, en;q=0.9, en;q=0.8
Connection: Keep-Alive

And the response:

HTTP/1.1 200 OK
Date: Tue, 25 Sep 2012 07:17:39 GMT
Server: gevent/0.13.0 gunicorn/0.13.4
Vary: X-Geo-Country
Content-Type: application/json
Content-Length: 44674
X-Cache: MISS from alkes.canonical.com
X-Cache-Lookup: HIT from alkes.canonical.com:3128
Via: 1.0 alkes.canonical.com:3128 (squid/2.7.STABLE7)
Via: 1.1 productsearch.ubuntu.com
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive

...

Of course, it is trivial to see why the statement is wrong in the first place: productsearch.ubuntu.com is telling Amazon what you're searching for. What it is not telling is who you are, because (supposedly) the API request doesn't contain any identifying information other than your search terms.

This oversight is most likely just poor wording on Mr. Shuttleworth's part, though. What the sentence is really trying to say is: "We are telling Amazon what Ubuntu users are searching for, but we are not telling them who these users are."

That's fine, although it still raises some important privacy questions. Indeed, this search is performed when the user is using Unity's "Home" lens, which is where you can search for applications, files in your $HOME folder, and now Amazon search results. However, the documents in one's $HOME folder are usually fairly private. Even their filenames alone usually speak volumes (Top-secret plan to kill my boss.doc, Confessions of a (fill in the blank).pdf, How to (fill in the blank).epub, credit-card-(some number).kmy, etc.). They often contain people's names, too. The search terms reveal a lot by themselves about the person typing them. Users are going to be searching for those files in the Home lens, because that is what they have always done and that is what they are used to. Unbeknownst to them, they are now sending these sensitive search terms over plain HTTP, visible to their local sysadmin, their boss (through the sysadmin), their ISP, and who knows, maybe their government (through subpoenas). And then Canonical sees it, and Amazon does too, and any other peer along the ride. The only thing that Canonical is doing is masking your IP address from Amazon.
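To make the exposure concrete, here is a minimal Python sketch of what a passive observer on the network path could read from such a request. The endpoint and the `q` parameter come from the capture shown earlier; the function names are made up for illustration, and the HTTPS variant is hypothetical (the Dash did not use it at the time of writing).

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint taken from the captured request; the helper names are illustrative.
SEARCH_ENDPOINT = "http://productsearch.ubuntu.com/v1/search"


def build_search_url(terms: str, endpoint: str = SEARCH_ENDPOINT) -> str:
    """Build the URL for a search, with the terms in the query string."""
    return f"{endpoint}?{urlencode({'q': terms})}"


def visible_to_observer(url: str) -> dict:
    """Return what someone sniffing the wire could read from this request.

    With plain HTTP the whole request line is readable. With HTTPS the
    query string is encrypted; the host name is typically still visible
    (for example through DNS lookups).
    """
    parts = urlsplit(url)
    encrypted = parts.scheme == "https"
    terms = None if encrypted else parse_qs(parts.query).get("q", [None])[0]
    return {"host": parts.hostname, "terms": terms}


if __name__ == "__main__":
    url = build_search_url("credit card statement")
    print(url)
    print(visible_to_observer(url))
```

Searching for a local file by name is enough to put that name on the wire in the clear.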

The net result that Canonical claims: Canonical knows IPs and search terms, Amazon knows search terms. The only thing that Amazon doesn't know: Who is searching what.

Now, I'd like to question this claim by simply looking at the Wireshark output from running any search query. Try to do the following:

  • Install Wireshark
  • Start it and launch a capture
  • Open the dash and type a few characters
  • Check what Wireshark says

You'll see something like this:

GET /images/I/41Qemdr7ieL._SL160_.jpg HTTP/1.1
Host: ecx.images-amazon.com
Accept-Encoding: gzip, deflate
User-Agent: gvfs/1.13.9
Accept-Language: en-us, en;q=0.9
Connection: Keep-Alive

And the response:

HTTP/1.0 200 OK
Date: Mon, 24 Sep 2012 15:55:23 GMT
Server: Server
Cache-Control: max-age=630720000,public
Expires: Wed, 18 May 2033 03:33:20 GMT
Content-Length: 4630
Last-Modified: Wed, 08 Aug 2012 22:34:21 GMT
Content-Type: image/jpeg
Age: 47375
X-Cache: Miss from cloudfront
X-Amz-Cf-Id: aBNnNXkOlBFeFzoYljLrLBE2MTi0TMmDIvZfbzslKOM-8V1Wi9T2sA
Via: 1.0 574341a971a46a2980db13237b8175da.cloudfront.net (CloudFront)
Connection: keep-alive

...

This is simply the Dash downloading the thumbnails that accompany each search result. Each item in the dash has a prominent icon, and a label underneath:

Unity Dash thumbnail

Of course these images need to be downloaded from somewhere. Let's download them from the source, images-amazon.com! What could possibly go wrong? This goes against Mr. Shuttleworth's claim that Amazon doesn't know who is searching what. Indeed, while Amazon can't map search terms to IP addresses from the API request alone, they can log the requests on their image server (which come straight from the user's machine), look up the name of the corresponding product, and figure out what the search terms were. Or simply correlate them with a recent API query received from productsearch.ubuntu.com.
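The correlation described here takes very little code. The following Python sketch is purely hypothetical: nothing is known about what Amazon actually logs, and the record types, field names, and the five-second window are all assumptions made for illustration. It only shows that two logs of this shape are easy to join.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiQuery:
    """Hypothetical log entry for an API search coming from the proxy."""
    timestamp: float
    terms: str
    image_paths: frozenset  # thumbnails returned for this search


@dataclass(frozen=True)
class ImageRequest:
    """Hypothetical log entry for a thumbnail fetched directly by a client."""
    timestamp: float
    client_ip: str
    path: str


def correlate(queries, image_requests, window=5.0):
    """Guess which client IP issued which search terms.

    An image request is attributed to a query when the image was part of
    that query's results and was fetched within `window` seconds after it.
    """
    guesses = {}
    for request in image_requests:
        for query in queries:
            delay = request.timestamp - query.timestamp
            if request.path in query.image_paths and 0 <= delay <= window:
                guesses.setdefault(request.client_ip, set()).add(query.terms)
    return guesses
```

A real attempt would have to cope with caching and overlapping searches, but a handful of matching thumbnails per query narrows things down quickly.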

Some additional nitpicks:

  • Those image requests are done over HTTP as well, even though Amazon provides an SSL version of their image service at ssl-images-amazon.com. Fixing it would be a simple one-line replace in the code. The gain from using SSL for image content isn't enormous, but if it's available, why not use it? Some may argue "for speed". I'd advise these people to try out the Unity Dash search by themselves, and get back to me about how fast it currently is. I doubt speed was a big concern.
  • The request uses a fairly unique User-Agent header: gvfs/1.13.9. GVFS is a component of the GNOME desktop used for filesystem stuff, including mounting WebDAV shares and the like. Unity is likely using the GNOME library to perform these HTTP requests. However, I think there is little reason for that component to ever hit the images-amazon.com domain, other than because of the Unity Dash advertisements. As such, Amazon now has an easy way to identify which image requests result from a Unity Dash search.
  • The request contains an Accept-Language header which contains the user's locale. It is set to en-us, en if you install the US English version of Ubuntu, but can be set to fr if you install the French language pack and set it as default, and so on. This isn't a huge information leak, but it gives Amazon more data to correlate the terms with, because you probably typed your search terms in that language. At any rate, it is not necessary for Amazon to know the language in order to serve static image files, so why tell them?
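As a rough sketch of what addressing these three nitpicks could look like on the client side, the Python function below upgrades an image URL to HTTPS and drops the two headers discussed above. It is not the Dash's actual code. The function name is invented, and the secure host is passed in as a parameter because the exact hostname that serves the same image paths over SSL is an assumption here.

```python
from urllib.parse import urlsplit, urlunsplit

# Headers that the nitpicks above single out as unnecessary for fetching
# a static image.
IDENTIFYING_HEADERS = {"user-agent", "accept-language"}


def harden_image_request(url, headers, secure_host):
    """Return (url, headers) for a less revealing thumbnail request.

    `secure_host` is whichever host serves the same path over SSL; the
    caller has to supply it, since the mapping is an assumption.
    """
    parts = urlsplit(url)
    secure_url = urlunsplit(("https", secure_host, parts.path, parts.query, ""))
    kept = {
        name: value
        for name, value in headers.items()
        if name.lower() not in IDENTIFYING_HEADERS
    }
    return secure_url, kept
```

Some servers reject requests without a User-Agent, so a generic value may be needed in practice instead of removing the header outright.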

I have filed a bug about all of these issues.

So there we have it. Something which may have started from good intentions ("Let's have the Dash search the web to provide users with richer search results!") turned into something much worse ("Let's put irrelevant revenue-generating advertisements on by default in a place where the user is likely to type private information and wouldn't expect that information to be sent out to anyone!") through a series of oversights. This was pushed through Ubuntu's Feature Freeze period because it had executive support from the top people, and its release was rushed through with little regard to the users' interest (there was no warning that this was coming), or to the PR disaster that was inevitably going to follow.

Oh, and did I mention that, privacy concerns aside, advertisements in an operating system are not a good idea in the first place? It's an intrusion into the user's personal space, and it drowns the search results in inconsistent, unnecessary, inappropriate, slow-loading, irrelevant noise that sometimes replaces existing local search results. It's especially annoying when you're about to click on one of these, and suddenly what you're clicking on just turned into an ad.

For the record: I don't use Ubuntu personally, although I tend to recommend it to non-technically-inclined people who want to try out a Linux distribution. This whole easily-avoidable advertising mess would make me change my tune.

How to fix it

So now that the damage has been done, how do we get things straightened out?

Step 0: Reconsider

It's not too late to reconsider everything, and to dismiss the idea entirely. There's plenty of justification for that in this very post, or in the comments thread of the main Launchpad bug report. People won't forget what happened, but they will certainly appreciate such a decision because it means that their complaints have been heard.

Step 1: Update your privacy policy

This is a no-brainer. If you're going to gather more data about your users than you previously did, you need to update the privacy policy.

Thankfully, there is already a bug report about this, so this is on Canonical's radar.

Step 2: Make things clear to users

Users don't read privacy policies. It's important to have one, but users won't read it. Yet, they need to be aware of what is happening to their own data. To this end, I propose the following solution:

  • Whenever the current lens is going to communicate with the Internet, replace the magnifying glass icon in the text field with a globe icon.
  • Whenever there is a web request actively going on, make the globe rotate (as opposed to the spinner animation currently in use for local searches).
  • Whenever the globe icon is clicked, open a little panel explaining to the user the implications of the search they are about to make.

This makes it clear that there is something going on that will send data over the network, and it gives the user easy access to more detailed information about what exactly is going to happen.

Here's a quick mockup of what this could look like (though it needs better fonts and icons):

Unity Dash disclosure mockup thumbnail

Think this message sounds scary? That's true. But then again, so is sending sensitive search terms to various unrelated third parties.

Step 3: Make it opt-in rather than opt-out

This is pretty self-explanatory. Any feature that goes against user expectations when enabled by default should be opt-in.

At the very least, it should be easy for the user to remove this feature. Currently, it isn't: The user needs to remove the unity-lens-shopping package:

$ sudo apt-get purge unity-lens-shopping

This is neither user-friendly nor obvious. Canonical plans to address this, though they do not intend to make it opt-in at this time.

Step 4 option A: Make your actual strategy match your intended one

The current strategy doesn't respect the privacy guarantees that Canonical wants to provide. To fix this, here is what needs to happen:

  • Make the Dash use SSL/TLS when talking to productsearch.ubuntu.com (this is already in Canonical's plans)
  • Open up the source code used on the backend servers at productsearch.ubuntu.com (why not?)
  • Make the request from productsearch.ubuntu.com to Amazon use SSL as well. There's no reason not to, and having both hops over SSL strengthens the guarantee that only Canonical and Amazon can see the search terms.
  • Include the thumbnails of each item inside the reply from productsearch.ubuntu.com to the user. Use the data URI scheme to do that, or have the client request it by itself from productsearch.ubuntu.com (not Amazon), over SSL as well.
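The data URI option in the last point could look roughly like the Python sketch below. It assumes the server already has the image bytes in hand. The `image_url` and `thumbnail` field names and the `fetch_image` callable are hypothetical, since the real JSON layout of the productsearch.ubuntu.com reply is not shown in this post.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data: URI that can be inlined in JSON."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def embed_thumbnails(results, fetch_image):
    """Replace each result's image URL with an inline thumbnail.

    `results` is a list of dicts with a hypothetical "image_url" key and
    `fetch_image` is any callable returning the image bytes for a URL.
    The fetch happens on the server, so the client never contacts the
    image host itself.
    """
    embedded = []
    for result in results:
        item = {key: value for key, value in result.items() if key != "image_url"}
        item["thumbnail"] = to_data_uri(fetch_image(result["image_url"]))
        embedded.append(item)
    return embedded
```

Base64 inflates each image by about a third, which is the price for keeping the client from ever talking to Amazon directly.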

Step 4 option B: Actually make search terms anonymous

There is a relatively easy solution for Canonical to provide full search-term anonymization, such that Canonical only knows the IP of Ubuntu users (but not what they're searching), and Amazon only knows what Ubuntu users are searching for (but not who is searching what).

To pull this off, all Canonical needs to do is to set up a relay server instead of the current web server at productsearch.ubuntu.com. That relay server would simply forward whatever it gets from a client to Amazon, and send everything it got from Amazon back to the original client.

The client would effectively be performing an Amazon API request directly, using SSL, and Canonical's server would simply forward the encrypted bits along. This way, Canonical doesn't get to see which search terms are sent, thus any logging they may do would be useless. Amazon would see the search terms, but the only IP they would get is the Canonical server's IP. Users would still need to be warned that they shouldn't type identifying information as search terms, so that Amazon cannot link those search terms back to the users.
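The core of such a relay is small. The Python sketch below shows only the byte-forwarding part, under the assumption that the relay has already accepted a client connection and opened one to a fixed upstream server. The function names are illustrative, and listening, connection handling, and abuse protection are left out. The relay never decrypts anything: it copies opaque bytes in both directions.

```python
import socket
import threading


def pump(src: socket.socket, dst: socket.socket, bufsize: int = 4096) -> None:
    """Copy bytes from src to dst until src reaches end-of-stream."""
    while True:
        chunk = src.recv(bufsize)
        if not chunk:
            break
        dst.sendall(chunk)
    try:
        # Pass the end-of-stream along so the other side stops waiting.
        dst.shutdown(socket.SHUT_WR)
    except OSError:
        pass  # the peer may already have closed the connection


def relay(client: socket.socket, upstream: socket.socket) -> None:
    """Forward traffic both ways between a client and the upstream server.

    If the client speaks SSL/TLS end to end with the upstream server,
    everything passing through here is ciphertext to the relay.
    """
    downstream = threading.Thread(target=pump, args=(upstream, client), daemon=True)
    downstream.start()
    pump(client, upstream)
    downstream.join()
```

Because the SSL session is negotiated between the client and the far end, the machine in the middle sees only connection metadata: who connected, when, and how much data went through.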

One of the consequences of this approach is that productsearch.ubuntu.com could now easily become a publicly-available spam relay towards Amazon's API servers. While I doubt that Amazon's API could be brought down solely from traffic coming from a Canonical server (my guess is that the Canonical server would crash and burn long before this happens), such a situation could potentially be solved through abuse complaints from Amazon to Canonical, asking Canonical to block certain IPs from sending further requests.

The downside of this system, of course, is that Canonical doesn't get to see the search terms. They claim they need to gather the search terms and click data so that they can "provide better, more relevant results", in order to make the user experience better.

I have an alternative suggestion for Canonical to make the user experience better: Allow users to rate search results. Add a little section to the Dash under the Amazon results that asks "Were these results relevant?", and corresponding "Yes"/"No" buttons. The data from these buttons will be a more precise metric than the current metric: "whatever the user clicks is relevant".

And if you're telling yourself: "This will never work! Users will click 'No' all the time!", then perhaps you should ask yourself whether this feature was really made with the users' interest at heart in the first place.


Comments on "Ubuntu privacy blunder over Amazon ads continues" (2)

#1 — by

Editor's note: The following comment was initially an email from back when the site didn't have a commenting system. The contents of the email have been republished here with permission.

I liked your article on the dash online search topic, and have been looking around and talking to some of our ubuntu community in the hope that there will be some progress that will improve the issue at hand. Now, I don't have much knowledge about the backend, I am learning as I dig deeper into it.

I can tell you know more than I do, I sent you this picture, which could be an option as solution, as I found it to be very similar to your idea, I made it before I read your article.

In short: separation of online search and offline (local) search, by adding the globe icon on the bottom of the search dash (see picture).

It's very user-friendly, everyone knows that when you choose the globe icon your query will yield online results and when choosing the home icon it will show local results only.

What do you think?

I know there is work on the way to resolve it all, so let's hope it's what we would like to see, to keep ubuntu the best system on the planet to use.

Leslie Scheelings, Ubuntu desktop user.

#2 — by Anonymous

This was the best article I saw regarding this issue. Thank you.
