Extreme Load Error After Upgrading to 3.3.0.beta3-dev Yesterday (On-Prem)

Upgraded to 3.3.0.beta3-dev yesterday and also installed the AI plugin. The plugin is currently only enabled for staff members (5 people).

But the entire site is dog slow and I’m getting extreme load errors. I can’t seem to figure out where it’s coming from; my server load seems fine.

Is there a place (or places) I can go to figure out what is causing it?

Here’s what I see in the Crawler Report. Not sure if it’s bad or good or what; I don’t have a frame of reference.

Looking at my server, it looks like the unicorn processes are pretty busy.

Is this the cause? Do I need more CPU, or just more unicorns?

Has it been a while since your last upgrade? Maybe it’s doing some kind of image processing or rebaking.

You can have a look at /sidekiq to see what it’s doing.

The queues are empty.

I don’t really know what the rest of this means.

I’m not sure what’s normal here… Here’s our server spec

I rebooted and everything was back to normal, but now we are getting extreme load again. I can’t figure out where the issue is coming from. Is there any tooling within Discourse that can help?

So the 3 unicorn workers are busy… but we aren’t getting higher-than-normal traffic as far as I can tell; it’s about the same as it has been. The only changes were the upgrade to 3.3.0 and adding the AI plugin, and that’s only available to staff.

The issues started yesterday, 6/3.

We do have a few more crawlers, it seems.

Here’s just crawlers over a month, but again it doesn’t seem that much higher. The site is almost unusable.

Any help would be appreciated!

This is a guess, but the only thing that stands out for me in the Sidekiq logs is that the job that’s shown is NotifyMailingListSubscribers. That job can potentially create a lot of requests.

Also, do you see any errors on your Admin / Logs / Error Logs page?

I added a block for the Facebook crawler because that guy was going to town.

However, I noticed that adding “slow” crawlers isn’t updating my robots.txt; it only shows the block entries, not the slow entries.
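For reference, the blocked crawler shows up in robots.txt roughly like this (facebookexternalhit is just the agent I blocked; yours may differ):

    User-agent: facebookexternalhit
    Disallow: /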

Quite a few of these

I see 3 errors, but they don’t seem related… (though it is hard to tell)

Job exception: PG::DatetimeFieldOverflow: ERROR:  timestamp out of range: "271768-09-23 06:24:11.793040 BC"
LINE 1: ...sers"."moderator" = FALSE AND (users.created_at < '271768-09...
                                                             ^
ActionDispatch::RemoteIp::IpSpoofAttackError (IP spoofing attack?! HTTP_CLIENT_IP="10.10.121.119" HTTP_X_FORWARDED_FOR="14.140.10.244, 14.140.10.244")
app/controllers/topics_controller.rb:1298:in `track_visit_to_topic'
app/controllers/topics_controller.rb:169:in `show'
app/controllers/application_controller.rb:422:in `block in with_resolved_locale'
app/controllers/application_controller.rb:422:in `with_resolved_locale'
lib/middleware/omniauth_bypass_middleware.rb:64:in `call'
lib/content_security_policy/middleware.rb:12:in `call'
lib/middleware/anonymous_cache.rb:391:in `call'
lib/middleware/csp_script_nonce_injector.rb:12:in `call'
config/initializers/008-rack-cors.rb:14:in `call'
config/initializers/100-quiet_logger.rb:20:in `call'
config/initializers/100-silence_logger.rb:29:in `call'
lib/middleware/enforce_hostname.rb:24:in `call'
lib/middleware/request_tracker.rb:291:in `call'

And another job exception around SMTP

Discourse does its own rate limiting; it doesn’t rely on robots.txt.
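If I remember right, the “slow” list is enforced server-side: matching user agents that exceed the configured rate just get throttled with 429 responses, rather than Discourse writing a Crawl-delay line into robots.txt (which many crawlers ignore anyway). Only outright blocks end up in robots.txt.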

Thanks Michael,

Any other ideas on what it could be? Would spinning up more unicorns help?

Is that done from the app.yml?

Yes, that would probably help.

env:
  UNICORN_WORKERS: 8

in the app.yml will do this.
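Note that you’ll need to rebuild after changing app.yml for it to take effect (assuming a standard install in /var/discourse):

    cd /var/discourse
    ./launcher rebuild app

As a rough rule of thumb, I believe the usual guidance is about 2 workers per CPU core, memory permitting, so size UNICORN_WORKERS to your server.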

I recommend pulling performance numbers using the prometheus plugin if you have that set up, or you can use performance headers.
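If memory serves, the performance headers are behind a hidden site setting you can flip from the rails console (double-check the setting name; this is from memory):

    # on the host, assuming a standard install
    cd /var/discourse
    ./launcher enter app
    rails c
    # then, in the console:
    SiteSetting.enable_performance_http_headers = true

After that, responses include timing headers (X-Runtime and friends) you can watch in your browser’s dev tools.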

Analysing your web logs should help a lot with identifying why your server is so busy; looks like crawlers are a good place to start.
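For a quick first pass, something like this against the nginx access log will show the top user agents (the field position assumes the default combined log format, so adjust the path and field for your setup):

    # top 10 user agents by request count
    awk -F'"' '{print $6}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head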


Well, I upgraded to a new DO instance (doubled the RAM and CPU), added 8 unicorns (vs 3), did a DB reindex and vacuum, and I think we are back in business!
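For anyone finding this later, the reindex and vacuum were along these lines (standard install assumed; best run during a quiet period since REINDEX takes locks):

    cd /var/discourse
    ./launcher enter app
    su postgres -c 'psql discourse -c "VACUUM ANALYZE;"'
    su postgres -c 'psql discourse -c "REINDEX DATABASE discourse;"'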

Thanks for the help


This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.