Users reporting lots of 502 errors when attempting to post due to "max consecutive replies" check

Look for the 502s in the nginx logs. What are some example errors? Is there anything in the Discourse logs that correlates?

My guess is that something is locking up and timing out. Can you confirm you are on latest?

I’ll take a look tonight or tomorrow; I’m just back from vacation and trying to catch up on some work. Thanks for the help!

I think I may have narrowed it down a bit. I was unable to reproduce the 502 errors on my mod/admin account, so I impersonated a user who had the problem and hit it myself. When that user submits a reply in a long thread (4,700 posts), the page sits on “saving” for a good 20 seconds and then fails to post with a 502 error.

Granting that user moderator status immediately fixes the problem. He was TL2, and granting TL3 did not fix it.

That is very interesting, cc @sam. Good sleuthing!

Since this is new, I’m guessing it could be due to the consecutive reply check. I’ll have a look later today.

Given we have an easy bypass per:

Can you set max consecutive replies in your site settings to 0 and let me know what happens?

I think that MAY have fixed it. The problem isn’t 100% reproducible, and I had closed the thread I was testing in earlier, but after re-opening it I was not able to reproduce the error. I’ll ask the users whether they’re seeing any more 502s.

Hmmm, because I just tested this query in Data Explorer on a 20k-post topic and it is lightning fast:

```sql
SELECT user_id
  FROM posts
 WHERE deleted_at IS NULL
   AND NOT hidden
   AND topic_id = SOME_TOPIC_ID
 ORDER BY post_number DESC
 LIMIT 3
```

I would love to get to the bottom of this.
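
For readers following along, here is a minimal sketch of how a “max consecutive replies” style check could use a query shaped like the one above. It runs against an in-memory SQLite table, not Discourse’s Postgres schema, and everything beyond the query itself is an assumption for illustration: the table layout, the hypothetical function `is_blocked_by_consecutive_replies`, the rule that a reply is rejected when the last N visible posts in the topic all belong to the replying user, and the staff bypass (assumed here only because granting moderator status made the 502s go away). This is not Discourse’s actual implementation. A limit of 0 disables the check, mirroring the workaround used in this thread.

```python
import sqlite3

# Hypothetical schema: only the columns the Data Explorer query touches.
SCHEMA = """
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    topic_id INTEGER NOT NULL,
    post_number INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    deleted_at TEXT,
    hidden INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX idx_posts_topic_post_number ON posts (topic_id, post_number);
"""

# Same shape as the query in the thread, with topic and limit as parameters.
LAST_POSTERS_SQL = """
SELECT user_id
  FROM posts
 WHERE deleted_at IS NULL
   AND NOT hidden
   AND topic_id = ?
 ORDER BY post_number DESC
 LIMIT ?
"""


def is_blocked_by_consecutive_replies(conn, topic_id, user_id, max_replies, is_staff=False):
    """Hypothetical rule: True if a new reply by user_id would be rejected."""
    # A limit of 0 disables the check; staff are assumed to skip it entirely.
    if max_replies == 0 or is_staff:
        return False
    rows = conn.execute(LAST_POSTERS_SQL, (topic_id, max_replies)).fetchall()
    # Block only if there are max_replies visible posts and all are by this user.
    return len(rows) == max_replies and all(row[0] == user_id for row in rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Topic 1: post 1 by user 7, then three replies in a row by user 42.
    posters = [7, 42, 42, 42]
    conn.executemany(
        "INSERT INTO posts (topic_id, post_number, user_id) VALUES (1, ?, ?)",
        list(enumerate(posters, start=1)),
    )
    print(is_blocked_by_consecutive_replies(conn, 1, 42, 3))  # True
    print(is_blocked_by_consecutive_replies(conn, 1, 42, 0))  # False (check disabled)
    print(is_blocked_by_consecutive_replies(conn, 1, 7, 3))   # False
```

In the sketch, the index on `(topic_id, post_number)` lets the database read a topic’s newest posts directly instead of scanning the whole topic, which is one reason a query of this shape can stay fast even on very large topics.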

Happy to run whatever you need to help debug, but I’m not familiar with Postgres or Ruby, so I’ll need some guidance.

Awesome. Can you install the Data Explorer plugin and then run the query above, substituting SOME_TOPIC_ID with the ID of the problem topic?

All set. The topic I tested earlier:

3 results. Query completed in 0.5 ms.

And here is the output from the largest unlocked topic on the forum (15k posts):

3 results. Query completed in 0.6 ms.

So it seems really fast.

Also, nobody has seen a 502 error so far since we set that setting to 0.

Thank you so much for bearing with me.

I believe I just fixed the culprit here:

https://github.com/discourse/discourse/commit/a7628c1d749f2d0c2d0744e5bdeaac8145554562

Any chance you can update to the latest version and re-enable the setting? Let me know whether the issue stays gone.

Not sure how to update: the dashboard says I’m already up to date, and running git pull in /var/discourse says the same thing.

Just run a rebuild from the command line and you should be good. You can time it that way too 😉

Did you try visiting /admin/upgrade?

OK, I updated (I went to /admin/upgrade manually; there was no link on the dashboard) and set the consecutive replies setting back to 5. No errors so far; I’ve asked users to verify.

Thanks again for your help with this! Much appreciated!

The dashboard only shows an upgrade link when we release a new beta. We add code daily (hourly, even); if we notified you of every single commit, your site would always say it’s out of date 😉.

Makes sense; that’s why I also tried a git pull earlier. Anyway, it’s upgraded now.

Thanks for your diligence in staying on top of this; we found a subtle but important issue as a result.
