Troubleshooting a slow site that was pretty zippy until this morning

How would I go about troubleshooting a site that’s become slow (for no apparent reason) today?

Resource use is very low:


This is a 16 GB Memory / 4 AMD vCPUs / 200 GB Disk / SFO3 - Ubuntu 24.04 (LTS) x64 droplet with 30% disk used.

DigitalOcean Service status has been normal all day.

Slow site has been reported in various locations by users.

YAML settings:

    UNICORN_WORKERS: 8
    db_shared_buffers: "1024MB"
    db_work_mem: "40MB"

I’ve rebuilt to the latest version and gave Sidekiq more memory (UNICORN_SIDEKIQ_MAX_RSS: 1000).

Some 429 errors in the console:


The Error log from the last 3 days:


What happens in safe mode?


I don’t get errors in the console in safe mode, but it’s much slower. It takes about 10-15 seconds to load anything and images are chugging like they are coming over a 14.4 Kbps modem.

It took about 20 seconds to load /logs. Going back to /admin took about a minute.

A “poll” seems to take a long time:

BTW, these are the plugins running:

      - git clone https://github.com/discourse/docker_manager.git
      - git clone https://github.com/discourse/discourse-data-explorer.git
      - git clone https://github.com/paviliondev/discourse-locations.git
      - git clone https://github.com/discourse/discourse-affiliate.git
      - git clone https://github.com/discourse/discourse-yearly-review.git
      - git clone https://github.com/discourse/discourse-docs
      - git clone https://github.com/discourse/discourse-subscriptions
      - git clone https://github.com/paviliondev/discourse-category-lockdown
      - git clone https://github.com/discourse/discourse-reactions.git

Here are a couple more data points from this morning. Sidekiq seems laid back:

Interesting memory graph: after app rebuilds it sits at about 20-30%, then jumps to 46% during a backup and stays there:

Do you have the infamous badges in posts theme component installed?


This one?


Woah! Night and day after removing the Post Badges component. Disabling it did not make a difference, but deleting it did. No more console errors, either.

Thanks @Falco!


Welp, I’m afraid that was not it, or at least not the whole thing.

Now I’m seeing broken images and this in the console:

Pages still load slowly, or not at all, with the spinner going…


I wonder if this has anything to do with the issue:

I restored Discourse from a backup about 4 weeks ago when I moved it from an old Ubuntu 16.04 LTS droplet to a new one running Ubuntu 24.04. I did not do a manual rebake.


It keeps getting weirder. This is what happens when going from /logs to /admin by clicking the “Back to site” link.


There was another recent topic with the “no route named admin” error.
Site Glitch Content Not Showing Up - #18 by Suresh_Suthar

Maybe this is also Cloudflare-related.
Resolving "SyntaxError: Unexpected identifier #..." caused by Cloudflare Auto Minify


Hmm. Mine is not using Cloudflare, but I did see a duplicated header in Chrome, like in the first post there.

I’ve just rebuilt with no plugins other than docker_manager, so I’ll report back how it behaves.

One other thing to note: when it hung in Chrome, I had to close that tab and open the site in a new one. Force-reloading it didn’t do anything.


Now the nightly backup to S3 is failing, with no change to the setup:

    [2024-10-10 15:03:04] Uploading archive...
    [2024-10-10 15:14:33] EXCEPTION: multipart upload failed: Net::WriteTimeout with #<TCPSocket:(closed)>

EDIT: Two manually triggered backups failed with the same error as above, but then two more manual backups succeeded. All with no changes to the setup. :person_shrugging:


I’m not seeing errors in the console, just intermittently slow load times:
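One way to put numbers on intermittent slowness is to fetch a page repeatedly from outside the server and record how long each full response takes. The Python sketch below is a hypothetical helper, not a Discourse tool: time_request, sample, and summarize are names made up for illustration, forum.example.com is a placeholder, and the 5-second "slow" threshold is an arbitrary assumption.

```python
import statistics
import time
import urllib.request


def time_request(url: str, timeout: float = 30.0) -> float:
    """Return wall-clock seconds to download the full response for url."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # time the body transfer too, not just the headers
    return time.perf_counter() - start


def sample(url: str, n: int = 20, pause: float = 3.0) -> list[float]:
    """Fetch url n times, pausing between requests to stay clear of rate limiting (HTTP 429)."""
    timings = []
    for i in range(n):
        timings.append(time_request(url))
        if i < n - 1:
            time.sleep(pause)
    return timings


def summarize(timings: list[float], slow_threshold: float = 5.0) -> dict:
    """Median, worst case, and how many requests exceeded slow_threshold seconds."""
    return {
        "count": len(timings),
        "median": statistics.median(timings),
        "max": max(timings),
        "slow": sum(1 for t in timings if t > slow_threshold),
    }
```

For example, run summarize(sample("https://forum.example.com/latest")) from a machine outside the droplet. A low median combined with a few very large maxima matches the "intermittently slow" pattern, and running it from a couple of different networks helps tell a slow server from a slow path to it.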

Discourse Doctor looks fine on one run, but on a second run it reports that port 587 is likely blocked. That’s odd, because it delivered the test mail on the first run and delivered it again on the third run:

    Connection to port 587 failed.
    ====================================== SOLUTION =======================================
    The most likely problem is that your server has outgoing SMTP traffic blocked.
    If you are using a service like Mailgun or Sendgrid, try using port 2525.
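Because the port 587 result flips between runs, a repeated raw TCP connection test can show whether outbound connections themselves are flaky, independent of Discourse Doctor. This is a minimal Python sketch: port_open and probe are hypothetical helper names, and smtp.example.com stands in for whatever SMTP host the forum is configured with (DISCOURSE_SMTP_ADDRESS in the container config). It only checks that a TCP handshake completes, not that SMTP authentication or delivery works.

```python
import socket
import time


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port completes within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unreachable, DNS failure, ...
        return False


def probe(host: str, port: int, attempts: int = 10, pause: float = 2.0) -> list[bool]:
    """Repeat the connection test; a mix of True and False points at flaky networking."""
    results = []
    for i in range(attempts):
        results.append(port_open(host, port))
        if i < attempts - 1:
            time.sleep(pause)
    return results
```

Run on the droplet as print(probe("smtp.example.com", 587)). A consistent False suggests a real block, while a mix of True and False, like Discourse Doctor's alternating results, is more consistent with an unreliable network path.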

Am I right to think there is something screwy with this DigitalOcean droplet?

It would appear this droplet has some networking issues. Download is pretty slow, but note the upload speed :scream::

    speedtest-cli
    Retrieving speedtest.net configuration...
    Testing from Digital Ocean (24.199.xxx.xxx)...
    Retrieving speedtest.net server list...
    Selecting best server based on ping...
    Hosted by Next Level Infrastructure (Santa Clara, CA) [4.38 km]: 2.242 ms
    Testing download speed................................................................................
    Download: 839.25 Mbit/s
    Testing upload speed......................................................................................................
    Upload: 1.27 Mbit/s
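A bit of arithmetic connects this upload figure to the failed S3 backup earlier in the thread. The sketch below assumes the upload rate measured here (1.27 Mbit/s) also held during the backup attempt, uses decimal megabytes, and ignores protocol overhead. The actual backup size isn't given, so the 1000 MB archive is purely hypothetical.

```python
from datetime import datetime


def transfer_seconds(size_megabytes: float, mbit_per_s: float) -> float:
    """Seconds to move size_megabytes (decimal MB) at mbit_per_s megabits per second."""
    return size_megabytes * 8 / mbit_per_s


def megabytes_in(seconds: float, mbit_per_s: float) -> float:
    """Decimal megabytes that fit through mbit_per_s in the given number of seconds."""
    return seconds * mbit_per_s / 8


# Time between "Uploading archive..." and the Net::WriteTimeout in the backup log.
fmt = "%Y-%m-%d %H:%M:%S"
elapsed = (datetime.strptime("2024-10-10 15:14:33", fmt)
           - datetime.strptime("2024-10-10 15:03:04", fmt)).total_seconds()

UPLOAD = 1.27      # Mbit/s, from the speedtest-cli run
DOWNLOAD = 839.25  # Mbit/s, from the speedtest-cli run

print(f"elapsed before timeout: {elapsed:.0f} s")
print(f"data uploadable in that window: {megabytes_in(elapsed, UPLOAD):.1f} MB")
# Hypothetical 1000 MB archive, for scale only.
print(f"1000 MB up:   {transfer_seconds(1000, UPLOAD) / 60:.1f} min")
print(f"1000 MB down: {transfer_seconds(1000, DOWNLOAD):.1f} s")
```

Under those assumptions, the 689 seconds before the timeout would have moved only about 109 MB, and a 1 GB archive would need well over an hour and a half to upload, versus under 10 seconds at the measured download rate.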

Here is the happy conclusion to this saga…

After running speedtest-cli and iperf3 network throughput tests, which showed abysmally slow speeds between the droplet and the outside world, I asked DigitalOcean to investigate. After doing their own testing, they concluded:

We have discovered some issues with the hypervisor where your Droplet is located. We are working with our backend team to migrate your Droplet to another hypervisor.

All is well again.
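For anyone repeating the iperf3 tests mentioned above, iperf3's JSON output (-J) makes it easy to compare both directions, since -R reverses the test so the remote server sends to the droplet. The sketch below assumes an iperf3 server (iperf3 -s) is already running on a host you control, shown as the placeholder iperf.example.com. The helper names run_iperf3 and iperf3_rates are hypothetical; they read the TCP summary fields end.sum_sent and end.sum_received from the report.

```python
import json
import subprocess


def run_iperf3(server: str, reverse: bool = False, seconds: int = 10) -> dict:
    """Run the iperf3 client against server and return the parsed JSON report.

    reverse=False measures droplet -> server (upload);
    reverse=True adds -R and measures server -> droplet (download).
    """
    cmd = ["iperf3", "-c", server, "-t", str(seconds), "-J"]
    if reverse:
        cmd.append("-R")
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)


def iperf3_rates(report: dict) -> tuple[float, float]:
    """Return (sent, received) throughput in Mbit/s from an iperf3 -J TCP report."""
    end = report["end"]
    sent = end["sum_sent"]["bits_per_second"] / 1e6
    received = end["sum_received"]["bits_per_second"] / 1e6
    return sent, received
```

For example, compare iperf3_rates(run_iperf3("iperf.example.com")) with iperf3_rates(run_iperf3("iperf.example.com", reverse=True)). A gap as lopsided as the speedtest's 839 vs 1.27 Mbit/s is the kind of evidence worth attaching to a support ticket.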
