It seems that recently, users who sign up (edit: or maybe anyone who has the “system avatar” that isn’t cached in my browser) don’t get the typical colored-character system avatar; instead they just get a white / blank picture. In some cases it seems to load eventually, refreshing asynchronously. I see this on my instance, but also here:
A fire/explosion in the vicinity of the datacenter where the avatar service is hosted has caused the outage. We are working on getting everything back online.
I have seen the same problem (admin reports not loading) this morning. The logs show lots of timeout errors in the avatar proxy service and the version check job. I wonder if that is causing other background tasks to be delayed, including the generation of reports.
Avatars are proxied through nginx, and those requests now take 15 seconds to time out instead of milliseconds to load. Under some circumstances that can seriously saturate your nginx capacity, resulting in errors in other, unrelated requests as well.
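To put rough numbers on the saturation effect, here is a minimal sketch using Little's Law (average requests in flight = arrival rate × time per request). The request rate and the connection limit are hypothetical values picked for illustration, not measurements from any real instance; only the 15-second timeout comes from the post above.

```python
def connections_in_flight(requests_per_second: float, seconds_per_request: float) -> float:
    """Little's Law: average requests in flight = arrival rate * time each request is held open."""
    return requests_per_second * seconds_per_request


# Hypothetical traffic figures, chosen only to illustrate the effect.
AVATAR_REQUESTS_PER_SECOND = 50
HEALTHY_LATENCY_SECONDS = 0.02  # avatars normally load in milliseconds
OUTAGE_LATENCY_SECONDS = 15.0   # each request now waits for the 15 s timeout
CONNECTION_LIMIT = 512          # assumed nginx connection budget; check your own config

healthy = connections_in_flight(AVATAR_REQUESTS_PER_SECOND, HEALTHY_LATENCY_SECONDS)
outage = connections_in_flight(AVATAR_REQUESTS_PER_SECOND, OUTAGE_LATENCY_SECONDS)

print(f"healthy: about {healthy:.0f} avatar requests in flight")
print(f"outage:  about {outage:.0f} avatar requests in flight")
print(f"exceeds assumed limit of {CONNECTION_LIMIT}: {outage > CONNECTION_LIMIT}")
```

With the same traffic, the number of connections held open grows in proportion to how long each request takes, which is why slow avatar requests can crowd out unrelated ones.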
It does seem to help to temporarily disable Admin - Settings - Files - external_system_avatars_enabled. (thx @gerhard)
Thank you guys. I’m forwarding this to our sysadmin team so they can check what may be causing the reporting errors and whether the temporary fix you provided should be applied in our case.
It took me some time to realize this is a forum setting. I can report that disabling the external avatar service does not help the admin dashboard modules load.
It’s interesting to me that our self hosted instance suffered a lot of random flakiness during this issue, on top of the obvious one of just avatars not loading… Turning off “external avatar service” in the settings also did not help – avatars then did not render at all, and the API calls still blocked for a long time before failing. I thought our instance was more or less independent, or could be, but apparently not so.
Flipping the setting will remove the dependence on the central service, but it will require a full rebake of all posts to update all the cached avatar URLs.
You’re absolutely right that an outage in the letter avatar service shouldn’t affect the rest of the site. As @RGJ noted, this seems to be related to NGINX capacity when the proxied requests block for a long time - we’ll certainly look into whether any improvements can be made there.