Defaulting to CDN for avatars is a privacy and security risk


(Eric Mill) #1

Discourse 1.4 defaults to serving “letter” avatars from a Discourse-hosted avatar service, to speed things up by default and take advantage of global caching.

Unfortunately, this also means that Discourse’s servers learn about the browsing activity of all users of Discourse instances which have this default in place, on any page with at least one letter avatar. While Discourse (the company) may be as trustworthy a steward of such information as possible, just having that information in one place is a serious privacy risk (especially if it were stolen or compelled by legal order).

In addition, there is a security risk: the avatar service could be compromised and used to serve malicious images, or to serve redirects to third-party domains that wish to use the collected browsing activity for malicious purposes.

The simplest and most complete way to mitigate these privacy and security risks for the community of Discourse (the platform) is to make the avatar service opt-in rather than opt-out. You could also add a commented-out section to site_settings.yml that shows people how to opt in.

I’m a little confused as to why a global CDN for avatars is desirable enough to be worth the privacy risk, in the age of HTTP/2 and/or SPDY. Have any tests been done in an HTTP/2 or SPDY environment to measure the performance impact of downloading additional avatars, once the TCP connection to the main domain is already open?

HTTP/2 and SPDY may not always be available for every host, and if the resource is already cached, that’s obviously faster than downloading it no matter what. But it also won’t take long for the browser to cache most of an instance’s own self-hosted letters, even over HTTP/1.1. There are only 26 letters, and not all of them are commonly used.

So to me, the performance benefit here is so negligible that the privacy and security concerns totally outweigh it. I strongly recommend that Discourse revert this feature for Discourse 1.5.

(Rafael dos Santos Silva) #2

Calm down bro :smiley:

The performance hit was about rendering the images in the first place (CPU) not network.

And you can always turn it off :smiley:. I’m happy because the work to put this service in place allowed me to switch to another one with a simple configuration change. They just ship with a working default.

(Sam Saffron) #3

This is not correct

The issue is not serving these letters, it’s generating them at scale, especially when algorithms change. It’s 300 ms of work to generate the original and then more work for each resize; at scale, this adds up.


So yeah, if you are not happy with our Do Not Track policy and want to take the hit yourself, disable the site setting.

(Eric Mill) #4

This is not correct

I’m glad you implemented EFF’s DNT standard, but no, this doesn’t address my concern.

EFF’s DNT standard is great, but you’re still collecting logs in one place, and your servers still learn about everyone’s behavior. I’d rather have the DNT policy on record than not, but fundamentally we have to trust Discourse (the company) to obey it. That’s not nearly as private as simply never talking to your servers at all.

And all of the elements of this policy only apply to people who have the DNT header set in their browser, so there are no formal guarantees in place for users without it. Even if you informally apply the policy to everyone, this still doesn’t protect against someone compromising your servers.
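For reference, the DNT signal is just an optional request header, `DNT: 1`, that a browser may or may not send. A minimal Python sketch of the check a server would make to decide whether a request falls under the policy’s formal commitments (the function name `dnt_requested` is hypothetical, not part of Discourse or of EFF’s policy text):

```python
from collections.abc import Mapping


def dnt_requested(headers: Mapping[str, str]) -> bool:
    """Return True only when the request carries `DNT: 1`.

    HTTP header names are case-insensitive, so compare them lowercased.
    A missing header means the user expressed no preference at all.
    """
    for name, value in headers.items():
        if name.lower() == "dnt":
            return value.strip() == "1"
    return False


print(dnt_requested({"DNT": "1"}))  # True
print(dnt_requested({"User-Agent": "Mozilla/5.0"}))  # False: no header, no formal guarantee
```

A request without the header returns `False`, which is exactly the case described above: whatever the operator does informally, no formal guarantee applies.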

DNT policy or not, using the hosted avatar service by default materially increases the privacy and security risks for your users, for dubious performance gain.

The issue is not serving these letters, it’s generating them at scale, especially when algorithms change. It’s 300 ms of work to generate the original and then more work for each resize; at scale, this adds up.

I understand this route is expensive, and you detailed a number of other options you could pursue. I also noted that the only one that worked well for email is the hosted version.

What I don’t understand is why the majority of size/letter combinations can’t be computed ahead of time, so that you only take the performance hit for novel sizes.
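As an illustration of the precompute idea, here is a minimal Python sketch. Everything in it is hypothetical: `LetterAvatarCache` is not a Discourse class, `fake_render` stands in for the real (roughly 300 ms) rasterizer, and the color and size values are placeholders. Common letter/color/size combinations are rendered once up front; a novel size is rendered on first request and then served from the cache.

```python
import string
from itertools import product
from typing import Callable, Iterable

Key = tuple[str, str, int]  # (letter, hex color, size in px)
Renderer = Callable[[str, str, int], bytes]


class LetterAvatarCache:
    """Hypothetical cache: precompute common combinations, render novel ones once."""

    def __init__(self, render: Renderer, letters: Iterable[str],
                 colors: Iterable[str], sizes: Iterable[int]) -> None:
        self.render = render
        self.renders = 0  # how many times the expensive renderer has run
        self.images: dict[Key, bytes] = {}
        for letter, color, size in product(letters, colors, sizes):
            self._render_and_store((letter.upper(), color, size))

    def _render_and_store(self, key: Key) -> bytes:
        self.renders += 1
        image = self.render(*key)
        self.images[key] = image
        return image

    def get(self, letter: str, color: str, size: int) -> bytes:
        key = (letter.upper(), color, size)
        cached = self.images.get(key)
        if cached is not None:
            return cached
        return self._render_and_store(key)  # novel combination: pay the cost once


def fake_render(letter: str, color: str, size: int) -> bytes:
    # Stand-in for the real, slow rasterizer.
    return f"{letter}-{color}-{size}".encode()


cache = LetterAvatarCache(fake_render, string.ascii_uppercase, ["e47c8f"], [25, 45, 120])
print(cache.renders)  # 78 = 26 letters x 1 color x 3 sizes
cache.get("a", "e47c8f", 45)  # served from the precomputed set
cache.get("a", "e47c8f", 64)  # novel size: rendered once, then cached
print(cache.renders)  # 79
```

The render cost is then paid once per combination (at startup, or on the first request for an unusual size) rather than on every cache miss.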

Another alternative would be to have each Discourse instance proxy avatar requests to the hosted service server-side. This would leave in place the security risk of a compromised avatar CDN, but would remove the privacy risk almost entirely.
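A minimal Python sketch of that proxy, assuming a hypothetical `fetch_upstream` callable that requests a path from the hosted avatar service (the path format below is made up). The browser only ever talks to the Discourse instance; the upstream service sees the instance’s server, and this sketch forwards none of the visitor’s headers. A compromised upstream could still return malicious images, which is the residual security risk noted above.

```python
from typing import Callable

# Hypothetical: fetches `path` from the hosted avatar service, returns image bytes.
FetchUpstream = Callable[[str], bytes]


class AvatarProxy:
    """Serve avatars from the instance's own origin, fetching each path upstream once."""

    def __init__(self, fetch_upstream: FetchUpstream) -> None:
        self.fetch_upstream = fetch_upstream
        self.cache: dict[str, bytes] = {}
        self.upstream_calls = 0

    def handle(self, path: str, visitor_headers: dict[str, str]) -> bytes:
        # visitor_headers (Cookie, Referer, User-Agent, ...) are deliberately
        # NOT passed upstream; only the bare path leaves this server.
        if path not in self.cache:
            self.upstream_calls += 1
            self.cache[path] = self.fetch_upstream(path)
        return self.cache[path]


seen_by_upstream: list[str] = []


def fake_fetch(path: str) -> bytes:
    seen_by_upstream.append(path)  # record everything the upstream service learns
    return b"PNG..."


proxy = AvatarProxy(fake_fetch)
for _ in range(3):
    proxy.handle("/letter/k/45.png", {"Cookie": "session=abc", "Referer": "https://forum.example/t/1"})
print(proxy.upstream_calls)  # 1
print(seen_by_upstream)  # ['/letter/k/45.png']
```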

Another alternative is to use the SVG approach you outlined, but rasterize it to an image at the size you need; offhand, that seems like it would be faster. Another is to tune image generation itself in some way. Another is to do nothing at all.
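To illustrate why an SVG template helps, a hypothetical Python sketch (not Discourse’s actual template; the colors and font sizing are assumptions). The drawing uses a fixed 100×100 `viewBox`, so only `width`/`height` change per requested size, and a separate rasterizer (not shown) would export it to a PNG at that size.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def letter_avatar_svg(letter: str, background: str, size: int) -> str:
    """Hypothetical letter-avatar template.

    The drawing lives in a fixed 100x100 viewBox, so the same vector
    description serves every size; only width/height change.
    """
    glyph = escape(letter[:1].upper())
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}" '
        'viewBox="0 0 100 100">'
        f'<rect width="100" height="100" fill="#{background}"/>'
        '<text x="50" y="50" font-size="60" font-family="sans-serif" fill="#ffffff" '
        f'text-anchor="middle" dominant-baseline="central">{glyph}</text>'
        '</svg>'
    )


svg = letter_avatar_svg("k", "e47c8f", 45)
root = ET.fromstring(svg)  # parses, so the markup is well-formed
print(root.get("width"), root.get("viewBox"))  # 45 0 0 100 100
```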

I’m not trying to armchair-engineer Discourse for you, but of all of those, precomputing sounds the simplest and most helpful to me.

In any case: DNT or not, making Discourse instances phone home by default is a bad move, for small benefit, and Discourse should find another way forward.

(Jeff Atwood) #5

We feel the performance benefits are worth it, and it is trivially disable-able in site settings.

If you have any suggestions for more severe and extreme “we do not care about tracking anyone, this is only about performance so everyone that uses Discourse can have a good experience” policies or files or notices we can put in place, beyond what we currently have, let us know – happy to do so.

We do not have any form of logging enabled for this service, for both privacy and performance reasons. Happy to go on record with that, in as many ways and places as needed, and happy to have a third-party auditor of your choice come in and verify that.

(Eric Mill) #6

We feel the performance benefits are worth it,

I disagree, and I’m not sure you’ve considered the concrete suggestions I made at the end of my last comment that might address both your performance concerns and the privacy and security concerns I’ve raised.

and it is trivially disable-able in site settings.

That’s not relevant. No matter how easy you make it, because it’s opt-out, most Discourse installations of version 1.4 and above will phone home for much of their users’ browsing activity.

We do not have any form of logging enabled for this service, for both privacy and performance reasons. Happy to go on record with that, in as many ways and places as needed…

That Discourse 1.4+ instances will phone home to a very privacy-friendly steward is much better than the alternative kind of steward, and I certainly appreciate it.

It’s just not good enough, for the marginal benefit this optimization offers. EFF’s DNT policy is an acknowledgment by the privacy community that third party services are useful, and aren’t going to go away. But that policy shouldn’t be a free pass to add more third party services than are necessary to provide a quality product.

I understand your team has judged that the performance benefits meet that standard, but from everything I’ve read about the issue, the current solution, and the potential alternatives to this solution, I don’t agree.

(Kane York) #7

This is also different from Gravatar in the sense that the images are cached aggressively, as opposed to Gravatar’s 5-minute invalidation.

The headers shown in the code match up with what is being served:

```
kane@kane-TECRA-Z40-B:~$ curl -I
HTTP/1.1 200 OK
Server: cloudflare-nginx
Date: Fri, 25 Sep 2015 06:10:30 GMT
Content-Type: image/png
Content-Length: 290
Connection: keep-alive
Set-Cookie: __cfduid=d3753f1bba1bbb1604c75d85835ef10a01443161430; expires=Sat, 24-Sep-16 06:10:30 GMT; path=/;; HttpOnly
Cache-Control: public, max-age=157788000
Last-Modified: Tue, 11 Jan 2000 00:57:26 GMT
Expires: Thu, 24 Sep 2020 12:10:30 GMT
Etag: L251971612101
Vary: Accept-Encoding
CF-Cache-Status: HIT
CF-RAY: 22b490fc7d8211f5-SJC
```
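To put “cached aggressively” in numbers, a quick check of the headers in the curl output above using only Python’s standard library: `max-age=157788000` is exactly five 365.25-day years, and `Expires` is exactly `Date` plus `max-age`.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Values copied from the curl output above.
date = parsedate_to_datetime("Fri, 25 Sep 2015 06:10:30 GMT")
expires = parsedate_to_datetime("Thu, 24 Sep 2020 12:10:30 GMT")
max_age = 157788000  # seconds, from "Cache-Control: public, max-age=157788000"

seconds_per_year = 365.25 * 24 * 60 * 60
print(max_age / seconds_per_year)  # 5.0
print(date + timedelta(seconds=max_age) == expires)  # True
```

So once a browser has a given letter image, it should not ask for it again for about five years, compared with Gravatar’s 5-minute window.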

Looks like Cloudflare is adding a cookie. But that’s nothing you weren’t already getting on every single other Cloudflare site out there.

Also, that request was a Cloudflare hit so it didn’t even reach the Discourse servers.

(Joe Seyfried) #8

@konklone, I’ve stumbled across some privacy issues in Discourse too, and have learned that the Discourse team takes a kind of “post-privacy era” point of view toward them.

I don’t think the security issue is too relevant here (although such a single point of attack makes a very promising target: to an attacker, the avatar service would provide leverage to compromise Discourse installations around the globe).

The privacy issue, however, is quite an obvious one, and it is made worse by the fact that privacy legislation for online services differs hugely around the world. This can make using Discourse difficult, or even cause legal problems for site admins. My wish is that, at some point in the future, such privacy concerns will be taken more seriously.

(Jeff Atwood) #9

Sure, just flip the site setting switch, and you have 100% local avatars like before. Not sure what all the fearmongering is about.

(Sam Saffron) #10

This is unfounded, baseless conjecture, and off topic. You can’t be throwing around daggers like this; you have not alerted us to even a SINGLE other place where we are acting as if we are in a post-privacy era.

@konklone, if you can submit a PR that pre-generates letter avatars and removes the CPU cost, so that it is only paid up front when a user’s avatar is selected, I would be open to turning off this service by default.

If you want to submit a PR that cleanly proxies letter avatars locally without walking the Rails stack, I’m also open to it. It’s possible this can be done purely in NGINX with its proxy module; I would also be open to turning that on by default. (Then the default could be proxied letter avatars, and users would not hit our service.)

But we cannot afford to spend dev time on this at the moment, and I am not going to accept a performance hit here.

(Eric Mill) #11

Again, that is not relevant. An individual Discourse installation can address this problem, if they choose. The issue I’m raising is that the community of Discourse (the platform) installations will now, by default (and so most of the time) be phoning home to Discourse (the company).

That’s a bad dynamic for Discourse user privacy, and the fact that there’s a site setting switch to turn it off doesn’t change that.

That’s a great point, but it also undermines the case for moving these to a CDN at all. If the avatar service isn’t actually going to be hit all that much, then why is it so necessary to integrate it by default? You could do the same aggressive caching with avatars generated and hosted locally.

I hadn’t realized Cloudflare was fronting the site. Unfortunately, that makes the problem even worse, as now most Discourse users will be sharing their browsing information with Cloudflare and Discourse’s servers. This will be true whether or not the operator of the Discourse installation has a relationship with Cloudflare, or whether they’ve even heard of Cloudflare.

As you note, they place a cookie that tracks visitors across Cloudflare sessions. While Cloudflare’s support docs are clear that this doesn’t connect to anything personally identifiable, it nonetheless allows Cloudflare to connect the IP addresses of Discourse users with whatever else their browsing activity is that touches other Cloudflare-hosted sites.

@codinghorror, if it’s really so easy to flip the site setting switch, and the performance hit of avatar generation is a real problem, then site owners who are willing to trade privacy for performance should make that decision by opting in. That’s not a decision that Discourse should make for most of the community, which – whether you intend it this way or not – is what making it an opt-out does.

(Michael Downey) #13

I thought we had already worked through this ethical question about automatically opting people in to third-party services without the explicit knowledge of admins or users, during the “discourse hub” check-in feature discussions. (I can no longer find that topic here on the site, but perhaps someone else can dig it up.)

It is disappointing to see similar issues back again, and I hope they are short-lived for all the same reasons the community was concerned about last time.

(Robin Ward) #14

I am curious why nobody brought this up when we used to default to Gravatar. It’s exactly the same, isn’t it?

Except in this case:

  • We have a do not track manifest

  • Our server implementation is 100% open source

  • You have the word of a co-founder of the company that we are not logging

On top of that it’s easily disabled!

(Michael Downey) #15

I think all of those bullet points are really good things, but to be fair, people were concerned about Gravatar usage as early as February 2013 (and keep in mind how much smaller the Discourse community was back then).

AFAIK Gravatar was “always” used, at least from when I started with Discourse. The difference here (and a source of the OP’s original concern, I think) is moving from a local to an outsourced solution by default, without any notice or chance to opt out before it happens.

(Robin Ward) #16

Not sure that topic is the same: in that case, they were sending the MD5 hash of an email address. We do no such thing with our service, so that is not a concern. The privacy issues here are about the HTTP headers being sent.
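For contrast, this is roughly what a Gravatar request reveals. Gravatar’s long-documented scheme identifies a user by the MD5 hex digest of their trimmed, lowercased email address, embedded in the image URL. A short Python sketch (`gravatar_hash` and `gravatar_url` are our own helper names; `s` is Gravatar’s size parameter):

```python
import hashlib


def gravatar_hash(email: str) -> str:
    # Gravatar's classic scheme: MD5 hex digest of the trimmed, lowercased address.
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def gravatar_url(email: str, size: int = 45) -> str:
    # `s` sets the image size; the hash itself identifies the user.
    return f"https://www.gravatar.com/avatar/{gravatar_hash(email)}?s={size}"


# Same address, different formatting: identical hash, so identical URL on every site.
print(gravatar_hash(" Alice@Example.com ") == gravatar_hash("alice@example.com"))  # True
```

Because the same address yields the same hash on every site, and an unsalted MD5 of a known address is trivial to recompute, the hash works as a cross-site identifier. Per the post above, the letter avatar requests carry no such email-derived value.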

I get the whole “it’s a new default” thing, but I do think we announced it in all our release notes and blog posts. People were made very aware of this new feature.

(Gerhard Schlager) #17

I can understand both sides and I’d say leave it on by default. It’s a good default for most sites.
Admins who care about their users’ privacy will look at each setting anyway and can easily disable it if they think it’s bad.

But, I have two feature requests:

  1. As a Discourse admin I’d like to be able to have some kind of filter that shows only those settings that were added or changed during the last update. Ideally the system would show me those settings as soon as the update has finished or at least show a notice on the Dashboard (“There are new or changed settings. You should take a look at them. [Link to filtered settings]”).
  2. Optional: Add a “Privacy” section to the settings that shows all the options that could have privacy implications (like Google Analytics, Gravatar and the external avatar service).
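Request 1 boils down to comparing the default settings shipped by two versions. A minimal Python sketch, assuming each version’s defaults are available as a plain dictionary (the setting names and values below are made up for illustration):

```python
from typing import Any


def settings_to_review(old_defaults: dict[str, Any],
                       new_defaults: dict[str, Any]) -> dict[str, list[str]]:
    """List settings that are new in this version, or whose default value changed."""
    added = sorted(name for name in new_defaults if name not in old_defaults)
    changed = sorted(
        name for name in new_defaults
        if name in old_defaults and new_defaults[name] != old_defaults[name]
    )
    return {"added": added, "changed_default": changed}


# Hypothetical setting names and values, for illustration only.
v1_3 = {"example_gravatar_setting": True, "max_image_size_kb": 3072}
v1_4 = {"example_gravatar_setting": True, "max_image_size_kb": 4096,
        "use_hosted_letter_avatars": True}
print(settings_to_review(v1_3, v1_4))
# {'added': ['use_hosted_letter_avatars'], 'changed_default': ['max_image_size_kb']}
```

The dashboard notice would then link to a settings view filtered to these names.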

(Michael Downey) #18

One thing that WordPress does that is useful is a release note style page within the app after you go through its internal upgrade routine. Would something like that be useful for your concerns @gerhard?

(Gerhard Schlager) #19

I’m not a WordPress user, so I don’t know what that “release note style page” looks like.
But release notes are something I read before an update, not afterwards. What I want is to filter the settings and see only those that were not available in the last version or those where the default value changed.

Something like this:

Notify administrators of new settings on upgrade

(Jakob Borg) #20

Having a note in the release notes is fine for upgraders (at least those who don’t track the beta, thereby deviating from the recommended practice…). But new users don’t see anywhere front and center that a third-party service is in use. Gravatar is well known and an admin may understand to take a look at that, but loading default letter images from a third party is fairly unusual, I think.

(Jeff Atwood) #24

Just to illustrate the performance impact, here’s what it looks like on Meta pre-change:

July 4

```
Route                     Duration   Reqs  Mobile
-----                     --------   ----  -------------
topics/show                1472.59  15980  2607 (16.31%)
user_avatars/show_letter    448.51   1949   153 (7.85%)
static/cdn_asset            367.41  13054  7004 (53.65%)
list/latest                 227.95   3936   126 (3.20%)
user_avatars/show           198.00  11649  1005 (8.63%)
```

July 5

```
Route                     Duration   Reqs  Mobile
-----                     --------   ----  -------------
topics/show                1912.75  20764   713 (3.43%)
user_avatars/show_letter    402.97   1794   127 (7.08%)
list/latest                 266.15   4367   155 (3.55%)
user_avatars/show           210.25  12632  1003 (7.94%)
```

July 6

```
Route                     Duration   Reqs  Mobile
-----                     --------   ----  -------------
topics/show                1604.10  16436   438 (2.66%)
user_avatars/show_letter    920.77   3056   204 (6.68%)
user_avatars/show           339.71  19379  1177 (6.07%)
```

After the change to the hosted avatar service:

Sep 23

```
Route                     Duration   Reqs  Mobile
-----                     --------   ----  -------------
topics/show                1848.01  19649   502 (2.55%)
user_avatars/show           290.53  20604  1435 (6.96%)
static/cdn_asset            273.60   8891   701 (7.88%)
list/latest                 272.96   3703   163 (4.40%)
```

Sep 24

```
Route                     Duration   Reqs  Mobile
-----                     --------   ----  -------------
topics/show                2127.26  20893   929 (4.45%)
user_avatars/show           280.75  20468  1232 (6.02%)
list/latest                 273.07   3699   164 (4.43%)
```

Sep 25

```
Route                     Duration   Reqs  Mobile
-----                     --------   ----  -------------
topics/show                1992.43  19368   680 (3.51%)
list/latest                 249.98   3306   146 (4.42%)
user_avatars/show           236.79  18039  1177 (6.52%)
```

Notice that

  • show_letter no longer eats up 7% of all CPU time after the change.
  • show_letter has incredible cost: if you divide duration by requests, its per-request cost is super high!
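To make the per-request cost explicit, divide each route’s Duration by its Reqs. A short Python sketch using the July 4 rows from the table above (the result is in whatever unit the Duration column uses, which the post does not state):

```python
# (Duration, Reqs) for July 4, copied from the pre-change table above.
july_4 = {
    "topics/show": (1472.59, 15980),
    "user_avatars/show_letter": (448.51, 1949),
    "static/cdn_asset": (367.41, 13054),
    "list/latest": (227.95, 3936),
    "user_avatars/show": (198.00, 11649),
}

# Per-request cost = total duration / number of requests (same unit as Duration).
per_request = {route: duration / reqs for route, (duration, reqs) in july_4.items()}

for route, cost in sorted(per_request.items(), key=lambda item: item[1], reverse=True):
    print(f"{route:26}{cost:.3f}")
```

show_letter comes out at roughly 0.23 per request, about 2.5 times topics/show (roughly 0.092) and about 13.5 times user_avatars/show (roughly 0.017).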