Troubleshooting a site that was very snappy until this morning and has become slow

How would I go about troubleshooting a site that’s become slow (for no apparent reason) today?

Resource use is very low:


This is a 16 GB Memory / 4 AMD vCPUs / 200 GB Disk / SFO3 - Ubuntu 24.04 (LTS) x64 droplet with 30% disk used.

DigitalOcean Service status has been normal all day.

Users in various locations have reported that the site is slow.

yaml:
UNICORN_WORKERS: 8
db_shared_buffers: "1024MB"
db_work_mem: "40MB"

I’ve rebuilt to the latest version and given Sidekiq some more memory: UNICORN_SIDEKIQ_MAX_RSS: 1000

Some 429 errors in the console:


The Error log from the last 3 days:

what happens in safe mode?

I don’t get errors in the console in safe mode, but it’s much slower. It takes about 10-15 seconds to load anything and images are chugging like they are coming over a 14.4 Kbps modem.

It took about 20 seconds to load /logs. Going back to /admin took about a minute.
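
For anyone who wants numbers rather than impressions, a rough way to quantify load times like these is to time a few requests from outside the server. This is only a sketch: the hostname is a placeholder, the names `time_fetch`, `summarize` and `probe` are made up for illustration, and since /logs and /admin need a logged-in session, a public page is a more practical target.

```python
import statistics
import time
import urllib.request


def time_fetch(url, timeout=90.0):
    """Return the seconds taken to download the full response body for `url`."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start


def summarize(samples):
    """Reduce a list of timings (in seconds) to min / median / max."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def probe(url, attempts=5, fetch=time_fetch):
    """Fetch `url` several times and summarize the timings."""
    return summarize([fetch(url) for _ in range(attempts)])


# Example call (placeholder hostname, replace with a public page on your site):
# print(probe("https://forum.example.com/latest.json"))
```

A wide gap between the minimum and the maximum would fit the intermittent behaviour described here. Comparing results from a few different networks may help tell a slow application from a slow network path.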

A “poll” seems to take a long time:

BTW, these are the plugins running:

      - git clone https://github.com/discourse/docker_manager.git
      - git clone https://github.com/discourse/discourse-data-explorer.git
      - git clone https://github.com/paviliondev/discourse-locations.git
      - git clone https://github.com/discourse/discourse-affiliate.git
      - git clone https://github.com/discourse/discourse-yearly-review.git
      - git clone https://github.com/discourse/discourse-docs
      - git clone https://github.com/discourse/discourse-subscriptions
      - git clone https://github.com/paviliondev/discourse-category-lockdown
      - git clone https://github.com/discourse/discourse-reactions.git

Here are a couple more data points from this morning. Sidekiq seems laid back:

Interesting memory graph - after app rebuilds it’s about 20-30%, then jumps to 46% during a backup and stays there:

Do you have the infamous badges in posts theme component installed?

This one?

Woah! Night and day after removing the Post Badges component. Disabling it did not make a difference, but deleting it did. No more console errors, either.

Thanks @Falco!

Welp, I’m afraid that was not it, or at least not the whole thing.

Now I’m seeing broken images and this in the console:

Still slow loading or not loading at all with the spinner going…

I wonder if this has anything to do with the issue:

I restored Discourse from a backup about 4 weeks ago when I moved it from an old Ubuntu 16.04 LTS droplet to a new one running Ubuntu 24.04. I did not do a manual rebake.

Keeps getting weirder. This is when going from /logs to /admin by clicking the “Back to site” link.

There was another recent topic with the “no route named admin” error.
Site Glitch Content Not Showing Up - #18 by Suresh_Suthar

Maybe this is also Cloudflare related
Resolving "SyntaxError: Unexpected identifier #..." caused by Cloudflare Auto Minify

Hmm. Mine is not using Cloudflare, but I did see a duplicated header in Chrome, like in the first post there.

I’ve just rebuilt with no plugins other than docker_manager, so I’ll report back how it behaves.

One other thing to note is that when it hangs in Chrome, I had to close that tab and open it in a new one. Force reloading it didn’t do anything.

Now the nightly backup to S3 is failing with no change in any setup:

[2024-10-10 15:03:04] Uploading archive...
[2024-10-10 15:14:33] EXCEPTION: multipart upload failed: Net::WriteTimeout with #<TCPSocket:(closed)>

EDIT: Two manually triggered backups failed with the same error above, but then two manual backups succeeded. All with no changes to the setup. :person_shrugging:

Not seeing errors in the console, just really slow load times intermittently:

Discourse Doctor looks fine on one run, then on a second run reports that port 587 is likely blocked. That is odd, because it delivered the test mail on the first run and did so again on the third run:

Connection to port 587 failed.
====================================== SOLUTION =======================================
The most likely problem is that your server has outgoing SMTP traffic blocked.
If you are using a service like Mailgun or Sendgrid, try using port 2525.
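
Since the Doctor result flipped between runs, repeating a plain TCP connection test can show whether the port is blocked outright or only failing some of the time. A minimal sketch using only the Python standard library follows; `smtp.example.com` is a placeholder for the real mail host, and `can_connect` and `success_rate` are names invented for this example.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def success_rate(host, port, attempts=10, timeout=5.0):
    """Return the fraction of connection attempts that succeeded (0.0 to 1.0)."""
    results = [can_connect(host, port, timeout) for _ in range(attempts)]
    return sum(results) / attempts


# Example call (placeholder mail host):
# print(success_rate("smtp.example.com", 587))
```

A rate of 0.0 would point to a blocked port, while anything between 0 and 1 suggests a flaky network path rather than a firewall rule.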

Am I right to think there is something screwy with this DigitalOcean droplet?

It would appear this droplet has some networking issues - download is pretty slow, but note the upload speed :scream::

speedtest-cli
Retrieving speedtest.net configuration...
Testing from Digital Ocean (24.199.xxx.xxx)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Next Level Infrastructure (Santa Clara, CA) [4.38 km]: 2.242 ms
Testing download speed................................................................................
Download: 839.25 Mbit/s
Testing upload speed......................................................................................................
Upload: 1.27 Mbit/s
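
To keep an eye on this over time, the two rates can be pulled out of the speedtest-cli text output and compared. The sketch below only parses the `Download:` and `Upload:` lines shown above; the function names and the 5% threshold are arbitrary assumptions for illustration, not documented limits.

```python
import re

# Matches lines such as "Download: 839.25 Mbit/s" in speedtest-cli's text output.
RATE_LINE = re.compile(r"^(Download|Upload):\s+([\d.]+)\s+Mbit/s", re.MULTILINE)


def parse_speedtest(output):
    """Extract the download and upload rates (Mbit/s) from speedtest-cli output."""
    return {label.lower(): float(value) for label, value in RATE_LINE.findall(output)}


def upload_looks_broken(rates, min_ratio=0.05):
    """Flag a run whose upload rate is below `min_ratio` of its download rate.

    The default ratio is an arbitrary assumption chosen for this example.
    """
    return rates["upload"] < rates["download"] * min_ratio


sample = """\
Testing download speed....
Download: 839.25 Mbit/s
Testing upload speed....
Upload: 1.27 Mbit/s
"""
rates = parse_speedtest(sample)
print(rates, upload_looks_broken(rates))
```

Upload from the droplet is the direction in which pages and images travel to visitors, so a result like this one would be consistent with the slow page loads even though CPU and memory use are low.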

Here is the happy conclusion to this saga…

After running speedtest-cli and iperf3 network throughput tests which showed abysmally slow speeds between the droplet and the outside world, I asked DigitalOcean to investigate and they concluded after doing their own testing:

We have discovered some issues with the hypervisor where your Droplet is located. We are working with our backend team to migrate your Droplet to another hypervisor.

All is well again.
