As of last night site is responding very poorly

Apologies in advance if this is the wrong category, location, etc.
I’ve had a Discourse site running for about 6 months on a DigitalOcean VPS without many issues. The admin page says I’m on version 2.5.0.beta4. As of last night, most of the site’s page content either refuses to load or takes a seemingly insane amount of time. For example, I can navigate to pages like the homepage or /admin, but the actual content for them (posts, the admin graphs / other tabs) won’t load. I’ve checked the system vitals: CPU usage idles around 2%, and there is minimal traffic or disk usage. There is a userbase of maybe 10 or so people, as I am just trying out / setting up the site. With that in mind, this behavior seems very odd.

The only plugins I have according to app.yml are docker_manager and discourse-signatures. I’m the only admin user so I can confirm changes haven’t been made to the site settings in quite a while as well.

My first thought was to restart the machine itself, and I’ve also tried to manually update using git pull and ./launcher rebuild app. I’m not sure what to look for during that process that would indicate errors, but the rebuild seems to complete and the site can be accessed again afterwards, yet it remains at 2.5.0.beta4. Similarly, trying to access the /admin/update page will eventually just time out. This all seems fairly strange because the site is arguably ‘functional’ - I simply don’t know enough about how it operates to really diagnose anything. I found and can run discourse-doctor, but I’m not sure what it accomplishes beyond confirming that it can successfully email me, etc.

The one thing that may indicate an issue: last night I got an email from the forum about a response to a post, but when I navigate to the ‘latest posts’ category (after it eventually loads), there is no indication that the post exists, because the thread overview in latest doesn’t list that user as the last poster. I can’t load the content of any posts, so there isn’t a way to check for sure. So there may be some error / mismatch in the database? I’m not sure how something like that would branch out into causing entire chunks of the site to fail to load, or if this is a rabbit hole worth chasing.

Any thoughts on where to start with troubleshooting for an issue like this? Thanks much if you took the time to read : )

Hi tuckie! Welcome!

Looks like you are doing all the right things.

I highly recommend you update if you can - you’re pretty far behind the latest version. But be sure to download a backup first so you don’t lose anything.

Can you log in via ssh and see if you are running out of storage?

df -h 

Whatever the case, storage is a good first thing to check, and this command is a good one to run to remove any stale containers that are taking up space:

./launcher cleanup app 

Then I’d try rebuilding the app to the latest version. Let us know if it works this time and doesn’t display any errors in the console.

./launcher rebuild app
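
As a rough pre-flight check before rebuilding, something like the sketch below reports free space on the root filesystem. The function name free_gb and the 5 GB threshold are illustrative assumptions, not anything Discourse defines.

```shell
# Free space on a filesystem in whole gigabytes (POSIX df output, 1K blocks).
# free_gb is a hypothetical helper name used only for this sketch.
free_gb() {
  df -Pk "${1:-/}" | awk 'NR==2 { printf "%d\n", $4 / 1024 / 1024 }'
}

MIN_FREE_GB=5   # illustrative threshold, not an official requirement

if [ "$(free_gb /)" -lt "$MIN_FREE_GB" ]; then
  echo "Low on space: consider ./launcher cleanup before rebuilding"
else
  echo "Disk space looks fine: $(free_gb /) GB free on /"
fi
```

If the number is low, running the cleanup command above first is the safer order.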

Thanks for the quick uptake.
It’s reading about 7.9 GB free on /dev/vda1, which is mounted on /. I am not majorly aware of how the other partitions are used on Ubuntu or how they might affect running (Discourse is in a container, no?); the rest look to be the boot partition, etc. There are only about 30-40 posts total on the forum as I test it, so it’s not (seemingly) in danger there. The cleanup was able to free up ~4GB extra.

As for the app rebuild, I’ve run this a few times actually. I don’t see any glaring warning messages during the process, but at the same time, when it’s done I don’t see anything saying ‘success’ either - I wouldn’t know what error / warning lines to look for. It removes the old container, then runs the docker container, and then it’s done. I’ve just run it one more time, and when I connect to the site it tells me that updates are still available, but it takes an incredibly long time to report the version I’m on (2.5.0.beta4 still) and the version to update to.

Part of the problem is that I can’t really use the admin tools either, because of response times or pages failing to load. For example, navigating to the backups tab just displays the loading animation indefinitely. Out of interest, I opened the browser console on the backups tab, and the browser appears to try to fetch JavaScript files and fails on all of them, slowly, one at a time.

If there’s a way to work with backups through ssh that seems like it would be useful here.
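
For reference, a standard Docker install does allow this: backups can be created from inside the container and then found on the host. The sketch below assumes the default /var/discourse install location and the default standalone backup directory. The helper name list_backups is made up for illustration.

```shell
# Creating a backup over ssh is interactive, so it is shown as comments:
#   cd /var/discourse
#   ./launcher enter app      # open a shell inside the running container
#   discourse backup          # write a new backup archive
#   exit

# list_backups [DIR]: list backup archives on the host.
# The default path assumes a standard standalone install.
list_backups() {
  dir="${1:-/var/discourse/shared/standalone/backups/default}"
  if [ -d "$dir" ]; then
    ls -1 "$dir"
  else
    echo "no backup directory at $dir" >&2
    return 1
  fi
}
```

Calling list_backups with no argument checks the default location. The archive can then be copied off the server with scp before attempting any upgrade.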

It sounds like a network problem. Are you using Cloudflare? (If so, turn off the orange cloud.)

You could have a noisy neighbor at DigitalOcean, so you might open a ticket with them.
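
One rough way to tell a network problem from an application problem is to time the same request from the server itself and from a machine outside it. This is only a sketch: time_request is a hypothetical helper, and forum.example.com is a placeholder for the real hostname.

```shell
# time_request URL: print the HTTP status and total time for one request,
# using curl's --write-out variables http_code and time_total.
time_request() {
  curl --silent --output /dev/null --max-time 30 \
       --write-out '%{http_code} %{time_total}s\n' "$1"
}

# Example (placeholder hostname, replace with the real one):
#   time_request https://forum.example.com/
#   time_request https://forum.example.com/latest.json
```

If requests are quick on the server but slow from outside, that points toward the network or a proxy in between. If they are slow in both places, the application itself is the more likely suspect.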

It doesn’t make any sense that you say you’ve done a rebuild but the version hasn’t changed. I’d think that you’d need to do the postgres 12 upgrade. Did you not see anything about that when you did the rebuild?

I am on DigitalOcean, so I suppose something like that could be happening, though I’m not sure it would cause this problem as consistently or for as long as this. I think a better way to describe the issue is that the page is typically able to load the templating or ‘shell’ of the page, but beyond that, fetching any actual content for the pages seems to keep loading forever.

As for the rebuild/version change - it could be that an error like that is happening, but I don’t know a good way to go about parsing the output, nor do I really know what I’d be looking for. I did see a line along the lines of ‘postgres installed’ in the output scrolling by as I ran rebuild again just now. I’m not sure if this is because of the work going on inside of a container or not, but for example ./launcher rebuild app | grep 'postgres' doesn’t seem to filter anything out, nor does ./launcher rebuild app > output.txt && grep 'postgres' output.txt. The output.txt file does contain information, but seemingly not everything? At the very least, it doesn’t end in the same way as the actual console output.
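
A likely explanation for the grep behavior, assuming much of the rebuild output is written to stderr rather than stdout: a plain pipe or > redirect only captures stdout, so stderr lines go straight to the terminal and never reach grep or the file. The sketch below shows the effect with a stand-in function (fake_rebuild is hypothetical and only imitates a command that writes to both streams).

```shell
# Stand-in for a command that writes to both stdout and stderr (hypothetical).
fake_rebuild() {
  echo "stdout: stopping old container"
  echo "stderr: postgres installed" >&2
}

# A plain pipe carries only stdout, so grep never sees the stderr line.
# (stderr is discarded here to keep the demo quiet; in a terminal it
# would be printed directly, unfiltered.)
stdout_only=$(fake_rebuild 2>/dev/null | grep -c 'postgres' || true)

# Merging stderr into stdout with 2>&1 lets grep see both streams.
merged=$(fake_rebuild 2>&1 | grep -c 'postgres' || true)

echo "stdout only: $stdout_only match(es), merged: $merged match(es)"
```

Applied to the real command, that would look like ./launcher rebuild app 2>&1 | tee rebuild.log, followed by grep -i 'postgres' rebuild.log, which keeps the full output on screen while also saving it to a file for searching.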