Avatar service somewhat broken

It seems that recently, users who sign up (edit: or maybe anyone who has the “system avatar” that isn’t cached in my browser) don’t get the usual colored-character system avatar; instead they get a blank white picture. In some cases it seems to load eventually, refreshing asynchronously. I see this on my instance, but also here:

Edit:

Looks like the avatar thing is sometimes returning a 502 error.

2 Likes

A fire/explosion in the vicinity of the datacenter where the avatar service is hosted has caused the outage. We are working on getting everything back online.

12 Likes

Thanks for the update.
Is there any status page of the datacenter to track the incident ourselves?

And is it just us, or are admin reports not working for others too?

Browser Network

Admin reports loading...

Admin reports failed to load

1 Like

I’m not seeing any problems with admin reports on our sites.

You could track the incident on our status page at https://status.discourse.org/

3 Likes

I have seen the same problem (admin reports not loading) this morning. The logs show lots of timeout errors in the avatar proxy service and the version check job. I wonder if that is causing other background tasks to be delayed, including the generation of reports.

3 Likes

Avatars are proxied through nginx, and those requests now take 15 seconds to time out instead of milliseconds to load. Under some circumstances this can seriously saturate your nginx capacity, resulting in errors in other, unrelated requests as well.

It does seem to help to temporarily disable Admin - Settings - Files - external_system_avatars_enabled. (thx @gerhard)
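To put rough numbers on the saturation effect described above, here is a back-of-the-envelope sketch using Little's law (average requests in flight = arrival rate × time each request takes). Only the 15-second timeout comes from this thread; the request rate, the healthy latency, and the connection budget are made-up values for illustration, not measurements from any real instance.

```python
def requests_in_flight(requests_per_second: float, seconds_per_request: float) -> float:
    """Little's law: average number of requests being served at any moment."""
    return requests_per_second * seconds_per_request


# Hypothetical traffic and capacity figures (assumptions, not measurements).
AVATAR_REQUESTS_PER_SECOND = 50.0
HEALTHY_LATENCY_SECONDS = 0.02   # "milliseconds to load"
TIMEOUT_SECONDS = 15.0           # the timeout mentioned in the post above
CONNECTION_BUDGET = 768          # hypothetical number of connection slots

healthy = requests_in_flight(AVATAR_REQUESTS_PER_SECOND, HEALTHY_LATENCY_SECONDS)
degraded = requests_in_flight(AVATAR_REQUESTS_PER_SECOND, TIMEOUT_SECONDS)

print(f"healthy:  ~{healthy:.0f} avatar requests in flight")
print(f"degraded: ~{degraded:.0f} avatar requests in flight")
print(f"share of budget while degraded: {degraded / CONNECTION_BUDGET:.0%}")
```

With these assumed numbers, the same avatar traffic goes from occupying about one slot to occupying most of the budget, which would leave little room for unrelated requests.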

5 Likes

Thank you guys. I’m forwarding this to our sysadmin team so they can check what may be causing the reporting errors and whether the temporary fix you provided should be applied in our case.

It took me some time to realize this is a forum setting. I can report that disabling the external avatar service does not help the admin dashboard modules load.

1 Like

The central avatars service should now be back online :rocket:

7 Likes

It’s interesting to me that our self-hosted instance suffered a lot of random flakiness during this issue, on top of the obvious problem of avatars not loading… Turning off “external avatar service” in the settings also did not help – avatars then did not render at all, and the API calls still blocked for a long time before failing. I thought our instance was more or less independent, or could be, but apparently not so.

6 Likes

Flipping the setting will remove the dependence on the central service, but it will require a full rebake of all posts to update all the cached avatar URLs.

You’re absolutely right that an outage in the letter avatar service shouldn’t affect the rest of the site. As @RGJ noted, this seems to be related to NGINX capacity when the proxied requests block for a long time - we’ll certainly look into whether any improvements can be made there.
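For illustration only, one generic way to keep a slow upstream from tying up capacity is a circuit breaker: after a few consecutive failures, stop calling the upstream for a cool-down period and serve a fallback immediately. The sketch below is a minimal version of that general pattern. It is not how Discourse or nginx handle avatars, and the class, parameter names, and thresholds are all hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Fail fast once an upstream has failed repeatedly (hypothetical sketch)."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_after = reset_after              # cool-down in seconds
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at: Optional[float] = None      # None means "closed"

    def call(self, fetch: Callable[[], T], fallback: Callable[[], T]) -> T:
        # While open and still cooling down, skip the upstream entirely.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = fetch()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the breaker
            return fallback()
        # Success: close the breaker and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Here `fetch` would be whatever performs the proxied avatar request (with its own short timeout), and `fallback` would return something cheap such as a locally generated placeholder. Both are stand-ins, not existing functions.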

6 Likes