Ramping up and performance issues related to scalability

(Kgish) #1

For an important customer, we’ve done a pilot using Discourse and (of course) they have been very pleased with the results so far.

Now it is time to ramp up to production, eventually serving more than 40,000 users. We are worried about performance: initial analysis has shown memory problems and unpredictable behavior in email, invites, and Sidekiq.

Increasing memory and the number of CPUs doesn’t seem to help.

Is there a known limit for Discourse when it comes to number of users?

Is there a list of best practices, or some document that outlines scalability issues and how to ensure that Discourse will continue to perform well as user traffic increases?

(Rafael dos Santos Silva) #2

I have a forum with more than 100k users, 30k of them active in a 30-day interval, and it doesn't even make an 8-core, 8 GB RAM server sweat.

To be honest, I was about to scale down the server specs.

However, this really depends on your load type.

(Kgish) #3

Is it a single Docker container running on a VM? Would you be so kind as to share the specs?

(Rafael dos Santos Silva) #4

Single Docker container running in standalone mode, the simplest setup of all.

It's a VPS on a private cloud running VMware. I don't have the details of the host machine, but it looks pretty big.

Also, the storage doesn't use SSDs, so some pages (suggested topics is the worst) take a little longer to load, but nothing serious.

I even had a peak day with almost all monthly users logging in and posting in a promotion topic, and it used something like half of the server. Too bad the Docker container doesn't include netstat, so I couldn't count the number of users connected to message_bus (this would be a good stat to have on the admin page).

(Kgish) #5

Then I wonder why I'm having so many performance problems, for example:

Sidekiq keeps consuming too much memory and has to restart (using top, I can see that the machine is swapping).

Users are getting 422 errors from the rate limiter (too many requests).

This never happened before.

I ran a script to import the users automatically, after which they have to request a password reset. Could this be the cause?

(Rafael dos Santos Silva) #6

This happens in normal use. It's a kind of memory-leak protection. Maybe we should tweak the limit a bit.

(Kgish) #7

Why would it be consuming so much memory in the first place? What limit value would you recommend?

(Rafael dos Santos Silva) #8

So much? As far as I know, that's a limit the team set, above which Sidekiq is restarted. Since the queue isn't lost, the process restarts and keeps working from where it stopped. No big deal.
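The restart-on-memory-limit behavior can be illustrated with a small watchdog check. This is a hypothetical sketch, not Discourse's code: the 500 MB ceiling, the function names, and reading Linux `/proc/<pid>/status` are assumptions chosen for illustration. A supervisor compares the worker's resident memory (RSS) against a ceiling and restarts the worker when it is exceeded; because Sidekiq keeps its queue in Redis rather than inside the worker process, the queued jobs survive the restart.

```python
from typing import Optional

# Hypothetical ceiling for illustration only; not Discourse's configured value.
MAX_RSS_MB = 500


def parse_vmrss_kb(status_text: str) -> Optional[int]:
    """Extract resident set size in kB from Linux /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like "VmRSS:\t  614400 kB"
            return int(line.split()[1])
    return None  # field absent (e.g. kernel thread or non-Linux)


def should_restart(rss_kb: Optional[int], max_rss_mb: int = MAX_RSS_MB) -> bool:
    """True only when the RSS is known and above the ceiling."""
    return rss_kb is not None and rss_kb > max_rss_mb * 1024


def check_worker(pid: int, max_rss_mb: int = MAX_RSS_MB) -> bool:
    """Linux-only: read a worker's RSS and decide whether to restart it.

    A real supervisor would then signal the process and start a replacement;
    queued jobs live in an external store, so they are not lost.
    """
    with open(f"/proc/{pid}/status", encoding="utf-8") as f:
        return should_restart(parse_vmrss_kb(f.read()), max_rss_mb)
```

Under these assumptions, a worker reporting `VmRSS: 614400 kB` (600 MB) would be restarted, while one at 100 MB would not. Frequent restarts plus swapping, as reported above, suggest the host itself is short on memory rather than the restart being the problem.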

Are your users coming from different IPs? The first time I used Discourse, I sent a bunch of emails for the big opening. Discourse was behind a reverse proxy, so everyone appeared to come from the same IP. That didn't work well :smile:.

Check whether the users hitting the limit share an IP with other users. That's very common in enterprise environments, less so on the open internet. If you are confident that you don't need rate limiting, you can simply remove the "templates/web.ratelimited.template.yml" line from app.yml.
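To see why a shared IP trips the limit, here is a minimal fixed-window limiter keyed only by client IP. It is a hypothetical sketch: the class, the 20-requests-per-60-seconds limit, and the example addresses are assumptions, and it is not how Discourse or nginx actually implement rate limiting. Ten users sending five requests each stay well under the limit with distinct IPs, but behind one proxy or NAT address they share a single bucket.

```python
class PerIpRateLimiter:
    """Hypothetical fixed-window limiter keyed only by client IP."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # ip -> (window_start, requests_counted_in_window)
        self._windows: dict[str, tuple[float, int]] = {}

    def allow(self, ip: str, now: float) -> bool:
        start, count = self._windows.get(ip, (now, 0))
        if now - start >= self.window_seconds:
            # Previous window expired: start a fresh one.
            start, count = now, 0
        if count >= self.max_requests:
            self._windows[ip] = (start, count)
            return False
        self._windows[ip] = (start, count + 1)
        return True


def simulate(user_ips: list[str], requests_per_user: int, limiter: PerIpRateLimiter) -> int:
    """Count rejections when every user sends requests within one window (t=0)."""
    rejected = 0
    for _ in range(requests_per_user):
        for ip in user_ips:
            if not limiter.allow(ip, now=0.0):
                rejected += 1
    return rejected


if __name__ == "__main__":
    distinct = [f"203.0.113.{i}" for i in range(10)]  # ten separate IPs
    shared = ["198.51.100.1"] * 10  # ten users behind one proxy/NAT
    print(simulate(distinct, 5, PerIpRateLimiter(20, 60)))  # 0
    print(simulate(shared, 5, PerIpRateLimiter(20, 60)))  # 30
```

With distinct IPs each bucket sees only 5 requests, so nothing is rejected; with a shared IP the 50 requests land in one bucket, 20 pass and 30 are rejected. That is the reverse-proxy situation described above.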

(Hosein Naseri) #9

I removed this line, but I'm still hitting the rate limit. Why?