'bundle exec rake db:migrate' taking a long time due to calendar migration

Hi all, hoping for some help!

Has anyone experienced this before?

./launcher rebuild app is running, gets to this point and just hangs here.

Now it is doing something, but it’s doing it extremely slowly. I’ve been running the forum on a self-installed DigitalOcean droplet for 3 years now, but this is new and is causing a lot of downtime. Is there a way to smooth this out? Is it to do with images on the forum or something?

Can you please open another SSH session and try to find which migration is causing trouble?

Something like ps aux | grep postgres should show the start of it.
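If it helps, the full sequence is roughly this (a sketch, assuming a standard /var/discourse install with the default app container):

cd /var/discourse
./launcher enter app      # open a shell inside the running container
ps aux | grep postgres    # the active query shows after "discourse discourse [local]"

If the container happens to be down mid-rebuild, running ps aux | grep postgres on the host works too, since the container’s processes are visible there.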


I am not a Linux expert (or frankly, even an amateur one) but I’ll try.


Running out of memory is my guess. You can try

free -h

Maybe also

du -hs /var/discourse/shared/standalone/*

Memory (RAM) or disk space?

Should be loads of both, I would think - Droplet is: 8 GB Memory / 4 Intel vCPUs / 160 GB Disk + 200 GB / Ubuntu 18.04.3 (LTS) x64

Is it “safe” to open another SSH session and run those while this db:migrate is still running?

If I were you I would start by getting a new VPS with an OS that still has active support.

Yes.

Okay - please understand I am NOT a Linux expert - your post intimates that the current Ubuntu build is super out of date?

Yes, the OS dates from April 2018 and was supported for 5 years, so that ended over 6 months ago.


Appreciate the info.

As someone who freely admits I’m an amateur doing his best - any recommendations for what I should do next?

The db:migrate failed - message was:

client_loop: send disconnect: Connection reset

On logging back in, you’re quite right:

New release ‘20.04.6 LTS’ available.
Run ‘do-release-upgrade’ to upgrade to it.

Considering my forum is currently down, am I safe to do the upgrade and then worry about fixing the forum? Or should I try to get that back online first?

:thinking: That’s an SSH error…
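A dropped connection killing a long rebuild is also a good argument for running it inside tmux; a rough sketch, assuming tmux isn’t already installed:

sudo apt install tmux
tmux new -s rebuild
cd /var/discourse && ./launcher rebuild app
# if SSH drops, log back in and reattach:
tmux attach -t rebuild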

Did you take a backup before upgrading? If so, it would be easiest to get a brand new server with Ubuntu 22, install Discourse and restore the backup.
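If you restore from the command line rather than the admin UI, it looks roughly like this (a sketch, assuming the backup file has already been copied into /var/discourse/shared/standalone/backups/default/ on the new server; the filename is a placeholder):

cd /var/discourse
./launcher enter app
discourse enable_restore                    # allow restores, same as the site setting
discourse restore <backup-filename>.tar.gz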


Alas, one of my admin folks pressed the upgrade button on the site, and it seems to have failed and then broken everything. :smiley:

Last backup was taken yesterday - so not too bad.

I take it an upgrade to the existing server would wipe the existing installation then?

(Thanks for your patience @RGJ btw)

Hard to tell, but since things are failing I wouldn’t take a chance. At least not before making sure the backup is stored in a safe place.

There’s a decent chance that you could spin up a new VM, stop the container (sounds like it’s not running anyway), then rsync everything to the new server and try again there. That can likely get you back up and running without losing any data; a sketch of that follows below.
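Sketched out, and assuming the new server already has Docker installed (NEW_IP is a placeholder for its address):

# on the old server
cd /var/discourse
./launcher stop app                                # make sure nothing is writing
rsync -avz /var/discourse/ root@NEW_IP:/var/discourse/

# on the new server
cd /var/discourse
./launcher rebuild app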

It all sounds so simple, but man am I feeling out of my depth here. It’s currently running on a DigitalOcean droplet. So spinning up a new VM - that’s a loaded sentence? On the same droplet? On a new one? :woozy_face:

A VM is the common term for what DigitalOcean calls a droplet.

Sounds like you might want to take a look at my profile and consider managed hosting though :wink:

systemd+  7660  0.0  0.3 352060 28284 ?        S    22:31   0:00 /usr/lib/postgresql/13/bin/postmaster -D /etc/postgresql/13/main
systemd+  7667  0.0  0.1 352588  9628 ?        Ss   22:31   0:00 postgres: 13/main: checkpointer 
systemd+  7668  0.3  0.9 352060 78796 ?        Ss   22:31   0:03 postgres: 13/main: background writer 
systemd+  7669  0.0  0.1 352060 13696 ?        Ss   22:31   0:00 postgres: 13/main: walwriter 
systemd+  7670  0.0  0.1 352728 11892 ?        Ss   22:31   0:00 postgres: 13/main: autovacuum launcher 
systemd+  7671  0.0  0.0  67996  5320 ?        Ss   22:31   0:00 postgres: 13/main: stats collector 
systemd+  7672  0.0  0.0 352612  6640 ?        Ss   22:31   0:00 postgres: 13/main: logical replication launcher 
systemd+ 10900 82.0  3.8 487164 317728 ?       Rs   22:33  10:42 postgres: 13/main: discourse discourse [local] DELETE
systemd+ 10901  0.0  0.1 353432 13804 ?        Ss   22:33   0:00 postgres: 13/main: discourse discourse [local] idle

htop shows that the discourse [local] DELETE is what’s eating 100% CPU. The droplet has 8GB of RAM, and right now <1GB is in use (not counting buffers).

The OS is out of date, but this seems very strange to me. There’s plenty of RAM and disk, and that postgres DELETE task has been running for >12 minutes. There are fewer than 600K posts and <4K users, so the database isn’t huge. Oh. Wait. The postgres_data directory is 28GB.
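To see which tables account for that, something like this inside the container should list the ten biggest (a sketch, assuming the default discourse database name):

su postgres -c 'psql discourse -c "
  SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) AS total_size
  FROM pg_statio_user_tables
  ORDER BY pg_total_relation_size(relid) DESC
  LIMIT 10;"'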

I ran a VACUUM VERBOSE ANALYZE;.

I did this:

discourse=# SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter; 
 checkpoints_timed | checkpoints_req 
-------------------+-----------------
                 6 |              20

I’m now trying to reindex concurrently. Perhaps the vacuum and reindex will help.
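Spelled out, those two maintenance steps are roughly this from inside the container (REINDEX DATABASE CONCURRENTLY needs PostgreSQL 12+, which this PG 13 instance satisfies):

su postgres -c 'psql discourse -c "VACUUM VERBOSE ANALYZE;"'
su postgres -c 'psql discourse -c "REINDEX DATABASE CONCURRENTLY discourse;"'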

Thanks Jay. Let me know if there’s anything you need from me.

Please share the whole SQL of the long running query.

Where do I find that?

The migration isn’t printing any logs. The last line in the rebuild is

I, [2023-12-04T22:33:33.759401 #1]  INFO -- : > cd /var/www/discourse && su discourse -c 'bundle exec rake db:migrate' 

I"m working on a full log for the one that I just restarted.

Enter the container, switch to the postgres user, enter psql and run

SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE state <> 'idle' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start DESC;
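For the record, the “enter the container, switch to the postgres user, enter psql” part translates to something like this (assuming a standard install):

cd /var/discourse
./launcher enter app    # shell inside the container
su postgres             # switch to the postgres user
psql discourse          # connect to the discourse database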