User-specific 502 errors after login — traced to DiscourseUpdates.has_unseen_features?

Hello,
I’m facing a strange issue with my Discourse site: 502 errors occur for only one specific account (an admin account) after login, while:

  • Anonymous users can access the site normally

  • Other user accounts can log in and use the site normally

  • The problem only appears for one specific account (admin account)


:puzzle_piece: Environment

  • Discourse installed via official Docker setup

  • Reverse proxy / CDN: ArvanCloud (similar to Cloudflare)

  • Access to the international internet is restricted / unstable (GitHub and some external services are not reachable)

  • Discourse has not been updated for ~1 month


:red_exclamation_mark: Symptoms

When accessing the site:

  • If I open the site in private/incognito mode → site loads fine

  • If I log in using my main account → immediately get 502 Bad Gateway

  • If I log in using another account → everything works fine

So the problem is clearly user-specific and triggered after authentication.


:page_facing_up: CDN (ArvanCloud) error logs

Two main errors appear:

1. Upstream timeout while reading upstream

upstream timed out (110: Connection timed out) while reading upstream

Affected URLs are mostly assets, for example:

  • /assets/browser-detect-*.js

  • /assets/plugins/automation-*.js

  • /assets/plugins/discourse-gamification-*.js

  • /assets/plugins/discourse-lazy-videos-*.js

2. Upstream prematurely closed connection

upstream prematurely closed connection while reading response header from upstream

For example:

  • /stylesheets/common_theme_rtl_*.css

  • /theme-javascripts/*.js

So the CDN is waiting for a response from Discourse, but the backend either times out or closes the connection.


:magnifying_glass_tilted_left: What I found on the backend

In Rails stack traces, the error path points to:

  • current_user_serializer.rb

  • discourse_updates.rb

  • method: DiscourseUpdates.has_unseen_features?

This suggests the crash or timeout happens while checking for new feature announcements for logged-in users.

Since only one user is affected, this strongly suggests the problem is triggered during user-specific serialization, not global site rendering.


Any guidance would be appreciated.
Thanks a lot.

Did you try on other browsers/devices? Did you try deactivating your browser’s extensions?

Edit: Perhaps I misunderstood. Does the site load in incognito mode while logged in as a user?

Yes, I tested on multiple devices and browsers, and it’s not related to browser extensions.

What I’m seeing is user-specific, not device-specific:

  • On any device where the admin account was already logged in, opening the site immediately results in 502 Bad Gateway.

  • In incognito/private mode, the site loads normally and I can reach the login page.

  • From there, I can log in successfully with another (non-admin) account and the site works fine.

  • But when I try to log in with the admin account, right after submitting the email and password, I consistently get 502, and the page never loads.

We encountered the same issue and traced it further down.

The root cause is DiscourseUpdates.has_unseen_features? calling GitUtils.has_commit?, which runs:

git merge-base --is-ancestor <sha> HEAD

In newer Discourse Docker images, the repo inside the container is configured as a partial clone with a promisor remote. When a feature SHA is not present locally, Git attempts a lazy fetch from the remote (upload-pack), which takes ~3–4 seconds per call.

Since multiple features are checked, this adds up to a request time of 30+ seconds and eventually a 502 (Unicorn timeout). It affects staff users, because that is where this check runs.
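As a rough back-of-the-envelope check of those numbers, the sketch below estimates how many slow lookups it takes to use up a request timeout. The 30-second timeout is an assumption picked to match the "30+ seconds" figure (check your own Unicorn config), and `calls_to_reach` is a hypothetical helper, not something from Discourse.

```ruby
# Hypothetical helper: how many sequential git calls, each taking
# `seconds_per_call`, are needed to reach or exceed `timeout_seconds`.
def calls_to_reach(timeout_seconds, seconds_per_call)
  (timeout_seconds.to_f / seconds_per_call).ceil
end

ASSUMED_TIMEOUT = 30 # seconds; an assumption, not read from any config

[3, 4].each do |per_call|
  n = calls_to_reach(ASSUMED_TIMEOUT, per_call)
  puts "#{per_call}s per call: #{n} missing SHAs reach #{ASSUMED_TIMEOUT}s"
end
```

So under these assumptions, somewhere around 8 to 10 missing feature SHAs would be enough to hit the timeout on a single request.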

Key points:

  • Happens only for staff (via CurrentUserSerializer)

  • Caused by commits missing from the partial clone

  • Git operations on missing objects trigger slow remote lookups

  • Reproducible via git merge-base --is-ancestor <missing_sha> HEAD inside container
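If you want to confirm the slow lookup on your own install, something like the following could be used to time the command from Ruby. This is only a sketch: `time_command` is a hypothetical helper, and `<missing_sha>` has to be replaced with a feature SHA that is actually absent from your local clone.

```ruby
require "open3"

# Hypothetical helper: run a command and return [success?, elapsed_seconds].
# A monotonic clock is used so wall-clock adjustments do not skew the timing.
# Note: success? is false both for "not an ancestor" (exit 1) and for errors.
def time_command(argv, chdir: Dir.pwd)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  _out, _err, status = Open3.capture3(*argv, chdir: chdir)
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [status.success?, elapsed]
end

# Example (run inside the container, from the Discourse checkout):
#
#   ok, secs = time_command(%w[git merge-base --is-ancestor <missing_sha> HEAD])
#   puts format("is_ancestor=%s elapsed=%.2fs", ok, secs)
```

A call that takes several seconds for a SHA that is missing locally, but returns almost instantly for one that is present, would be consistent with the lazy fetch described above.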

A simple mitigation is caching GitUtils.has_commit? results (e.g. per HEAD+SHA), which avoids repeated expensive git calls.
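To illustrate the caching idea, here is a minimal sketch. It is not Discourse's actual code: `CommitCheckCache` is a hypothetical wrapper, and the block passed to it stands in for the real GitUtils.has_commit? call so the example runs without git.

```ruby
# Hypothetical cache wrapper, not Discourse's actual implementation.
# `checker` is any block taking a SHA and returning true/false; in
# Discourse it would wrap the real (slow) has_commit? check.
class CommitCheckCache
  def initialize(head_sha, &checker)
    @head_sha = head_sha
    @checker = checker
    @results = {}
    @mutex = Mutex.new
  end

  # Returns the cached answer for (HEAD, sha), running the expensive
  # check only on the first lookup. Negative results are cached too.
  def has_commit?(sha)
    key = "#{@head_sha}:#{sha}"
    @mutex.synchronize do
      @results[key] = @checker.call(sha) unless @results.key?(key)
      @results[key]
    end
  end
end

calls = 0
cache = CommitCheckCache.new("head123") do |sha|
  calls += 1            # stands in for the slow git invocation
  sha.start_with?("a")  # fake answer, for demonstration only
end

cache.has_commit?("abc")
cache.has_commit?("abc")
puts calls # prints 1: the second lookup is served from the cache
```

Keying on HEAD as well as the SHA means the cached answers are naturally dropped after an upgrade moves HEAD. An in-process hash like this is per worker, so a real fix would probably want a shared cache instead.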