Lots of HTTP 502 and 429 after updating to 3.4.0.beta1-dev

Discourse version: 3.4.0.beta1-dev (bf3d8a0a94)

Updated yesterday and had to disable Cloudflare’s minify feature, as suggested here:

However, since then, lots of users (me included) have experienced several instances of 502 (Bad Gateway) and 429 (Too Many Requests).

In an attempt to alleviate the problem, I followed this guide as well:

However, nothing seems to have changed in terms of frequency of those errors.

The update happened yesterday around 11AM. I performed a full rebuild because I wanted to disable a plugin as well.

I have a Prometheus + Grafana instance that monitors the server and Discourse, but the server seems fine in terms of load:

Discourse metrics (the drop in metrics around 11AM yesterday is the rebuild bringing down the container):

Again, I don’t see any strange pattern.

However, this is the browser console just a minute ago, after trying to send a PM to a user:

Anything else I might provide (logs of any kind) please ask away. Thank you.

Oh, if it helps: lots of “background” real-time operations are also clearly lagging behind.

Topics already read are not registered as being read, for example.

Just to make sure, you did this step as well, then rebuilt, correct?

Yes, sorry, I forgot to add that I had already added the Cloudflare template to the app.yml file a long time ago. We have always been behind Cloudflare, since the very first day.

This is a partial excerpt of the app.yml; we have our own certificates, renewed independently, which is why the Let’s Encrypt template is commented out:

## this is the all-in-one, standalone Discourse Docker container template
##
## After making changes to this file, you MUST rebuild
## /var/discourse/launcher rebuild app
##
## BE *VERY* CAREFUL WHEN EDITING!
## YAML FILES ARE SUPER SUPER SENSITIVE TO MISTAKES IN WHITESPACE OR ALIGNMENT!
## visit http://www.yamllint.com/ to validate this file as needed

templates:
  - "templates/postgres.template.yml"
  - "templates/redis.template.yml"
  - "templates/web.template.yml"
  - "templates/web.ratelimited.template.yml"
## Uncomment these two lines if you wish to add Lets Encrypt (https)
  - "templates/web.ssl.template.yml"
#  - "templates/web.letsencrypt.ssl.template.yml"
  - "templates/cloudflare.template.yml"

## which TCP/IP ports should this container expose?
## If you want Discourse to share a port with another webserver like Apache or nginx,
## see https://meta.discourse.org/t/17247 for details
expose:
  - "80:80"   # http
  - "443:443" # https

[...]

An excerpt of /logs:

I see this was moved to Installation; just to be clear, this is not a new installation.

This instance of Discourse has been running since March 2023 and has never had this specific problem.

There was an issue with some 429s in the past, but it has since been resolved.

I think it still fits into

Looks like your PostgreSQL is overwhelmed. It looks like most of your RAM is idle; I’d try tweaking the DB to use it and see how things fare after that.
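For example, under the standard standalone template the usual knobs live under `params:` and `env:` in app.yml. A minimal sketch, with placeholder values you would scale to your own hardware (the usual rule of thumb is at most ~25% of total RAM for `db_shared_buffers`), followed by a rebuild:

```yaml
## Sketch only — example values for a host with ~8 GB RAM; adjust to your hardware,
## then rebuild with: cd /var/discourse && ./launcher rebuild app
params:
  db_default_text_search_config: "pg_catalog.english"
  ## at most ~25% of total RAM for PostgreSQL shared buffers
  db_shared_buffers: "2GB"
  ## per-connection sort/hash memory; helps heavier queries, costs RAM per connection
  db_work_mem: "40MB"

env:
  ## more unicorn workers handle more concurrent requests at the cost of RAM
  UNICORN_WORKERS: 4
```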

What does /sidekiq/queues look like?
What version were you updating from?

From the latest stable as of the 6th of May, v3.2.1, to the latest tests-passed.

Sidekiq queues:

The Dead jobs section is this one, but it’s been the same job since the beginning of time, it seems.

Oldest entries:

The ones in the Retry set appear to be the same job being retried over and over.

But… why suddenly? After just an update of the application layer?

I am using the discourse-prometheus exporter plugin.
If I added a PostgreSQL exporter as another container on the VM, would it be possible to let it access the metrics of the Discourse PostgreSQL installation?
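I haven’t tried this myself, but in principle a stock postgres_exporter container can talk to the bundled PostgreSQL over its Unix socket, which (I believe) the standalone setup exposes on the host under /var/discourse/shared/standalone/postgres_run. A rough sketch as a compose file — image, socket path, user/database and authentication are all assumptions to verify against your own install (pg_hba.conf may need a dedicated monitoring role):

```yaml
## Sketch only — paths and credentials are assumptions to adapt.
services:
  postgres-exporter:
    image: prometheuscommunity/postgres-exporter
    volumes:
      ## host directory that (assumed) holds the .s.PGSQL.5432 socket of the Discourse DB
      - /var/discourse/shared/standalone/postgres_run:/var/run/postgresql
    environment:
      ## libpq-style DSN connecting over the Unix socket
      DATA_SOURCE_NAME: "postgresql:///discourse?host=/var/run/postgresql&user=discourse"
    ports:
      - "9187:9187"   # Prometheus scrape target
```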

Any more precise directions on how to fine-tune the DB for Discourse?

Not sure if it’s related, but it definitely started happening after the update: clicking the Dismiss button in the unread tab always returns a 503.

Well, since it seems there isn’t a solution, I’ll try to go back to the latest stable, as it’s supposed to be… you know, stable.

Crossing fingers that there isn’t a core dependency breaking the build process, like last time.

You can’t go back from tests-passed to stable unless there is a higher stable version available. So the next opportunity for you is when 3.4.0 is out; I figure that’s around or after Christmas…

Besides, you’ll have to bite the bullet someday.

Well, I just did. It seems to be working. We don’t care about any of the features in 3.3.0 anyway.

I’ll see if there are still issues. The worst that can happen is that we still get a plethora of 429s and 502s, which wouldn’t be much of a change.

I’d still appreciate directions on how to configure the PostgreSQL bundled with Discourse so it has more resources available, though.

Edit: Deployed version 3.2.5. System seems stable.
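For anyone else reading along, a minimal sketch of how such a pin can look in app.yml under the standard standalone template (the tag is whatever release you target; the “no going back” caveat above still applies to the database):

```yaml
## Sketch: pin the git revision Discourse builds from (default is tests-passed),
## then rebuild with: cd /var/discourse && ./launcher rebuild app
params:
  version: v3.2.5   # a release tag, or "stable" to follow the stable branch
```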

Please remind us that you did this when you’re posting your next issue :wink:

I always mention the version I am on when posting an issue.
I think it’s important to remember that, precisely because this is presented as open-source software, critical issues should be taken seriously instead of writing things like this:

This is yet another example of people who go out of their way to switch to the “stable” version encountering bugs that fall through the cracks because it’s not the most widely deployed version.

Stable should mean “stable”, not “legacy”.
The fact that core dependencies like discourse_docker are pushed without a tagging system should be reason enough to be a bit more humble when responding to users who are reporting an issue.

I was talking about mentioning the fact that you downgraded when you technically couldn’t.

I think it’s important to remember… that I do not work for Discourse and I am helping you in my own time, so I do not appreciate your tone, nor am I able to do anything with your feedback.
