'bundle exec rake db:migrate' taking a long time due to calendar migration

Sikamikanico · December 4, 2023, 1:58pm

Hi all, hoping for some help!

Has anyone experienced this before?

./launcher rebuild app is running, get’s to this point and just hangs here.

Now it is doing something, but it’s doing it extremely slowly. I’ve been running the forum via a self installed digital ocean droplet for 3 years now, but this is new and is causing a lot of downtime. Is there a way to smooth this out? Is it to do with images on the forum or something?

Falco · December 4, 2023, 2:04pm

Can you please open another SSH session and try to find which migration is causing trouble?

Something like ps aux | grep postgres should show the start of it.

Sikamikanico · December 4, 2023, 2:22pm

I am not a Linux expert (or frankly, even an amateur one) but I’ll try.

pfaffman · December 4, 2023, 4:01pm

Running out of memory is my guess. You can try

free -h

Maybe also

du -hs /var/discourse/shared/standalone/*

Sikamikanico · December 4, 2023, 4:02pm

Memory (RAM) or Diskspace?

Should be loads of both I would think - Droplet is: 8 GB Memory / 4 Intel vCPUs / 160 GB Disk + 200 GB / Ubuntu 18.04.3 (LTS) x64

Is it “safe” to open another SSH session and run those while this db:migrate is still running?

RGJ · December 4, 2023, 4:14pm

If I were you would start by getting a new VPS with an OS that still has active support.

Yes.

Sikamikanico · December 4, 2023, 4:15pm

Okay - please understand I am NOT a linux expert - your post intimates that the current ubuntu build is super out of date etc?

RGJ · December 4, 2023, 4:22pm

Yes, the OS dates from April 2018 and was supported for 5 years, so that ended over 6 months ago.

Sikamikanico · December 4, 2023, 4:44pm

Appreciate the info.

As someone who freely admits I’m an amateur doing his best - any recommendations for what I should do next?

The db:migrate failed - message was:

client_loop: send disconnect: Connection reset

On logging back in, you’re quite right:

New release ‘20.04.6 LTS’ available.
Run ‘do-release-upgrade’ to upgrade to it.

Considering my forum is currently down, am I safe to do the upgrade, and then worry about fixing the forum? or should I try and get that back online first?

RGJ · December 4, 2023, 4:48pm

That’s an SSH error…

Did you take a backup before upgrading? If so, it would be easiest to get a brand new server with Ubuntu 22, install Discourse and restore the backup.

Sikamikanico · December 4, 2023, 4:51pm

Alas, one of my admin folks pressed the upgrade button on the site, and it seems to have failed and then broken everything.

Last backup was taken yesterday - so not too bad.

I take it an upgrade to the existing server would wipe the existing installation then?

(Thanks for your patience @RGJ btw)

RGJ · December 4, 2023, 5:21pm

Hard to tell, but since things are failing I wouldn’t take a chance. At least not before making sure the backup is stored in a safe place.

pfaffman · December 4, 2023, 6:06pm

There’s a decent chance that you could spin up a new VM, stop the container (sounds like it’s not runnnig anyway) then rsync everything to the new server and try again there. That can likely get you back up and running without losing any data.

Sikamikanico · December 4, 2023, 6:08pm

It all sounds so simple, but man am I feeling out of my depth here. It’s currently running on a digital ocean droplet. So spinning up a new VM - that’s a loaded sentence? On the same droplet? On a new one?

RGJ · December 4, 2023, 7:46pm

A VM is the common term for what DigitalOcean calls a droplet.

Sounds like you might want to take a look at my profile and considered managed hosting though

pfaffman · December 4, 2023, 10:51pm

ystemd+  7660  0.0  0.3 352060 28284 ?        S    22:31   0:00 /usr/lib/postgresql/13/bin/postmaster -D /etc/postgresql/13/main
systemd+  7667  0.0  0.1 352588  9628 ?        Ss   22:31   0:00 postgres: 13/main: checkpointer 
systemd+  7668  0.3  0.9 352060 78796 ?        Ss   22:31   0:03 postgres: 13/main: background writer 
systemd+  7669  0.0  0.1 352060 13696 ?        Ss   22:31   0:00 postgres: 13/main: walwriter 
systemd+  7670  0.0  0.1 352728 11892 ?        Ss   22:31   0:00 postgres: 13/main: autovacuum launcher 
systemd+  7671  0.0  0.0  67996  5320 ?        Ss   22:31   0:00 postgres: 13/main: stats collector 
systemd+  7672  0.0  0.0 352612  6640 ?        Ss   22:31   0:00 postgres: 13/main: logical replication launcher 
systemd+ 10900 82.0  3.8 487164 317728 ?       Rs   22:33  10:42 postgres: 13/main: discourse discourse [local] DELETE
systemd+ 10901  0.0  0.1 353432 13804 ?        Ss   22:33   0:00 postgres: 13/main: discourse discourse [local] idle

htop shows that the discourse [local] delete is what’s eating 100% cpu. The droplet has 8GB of RAM, and right now <1GB is in use (not counting buffers).

The OS is out of date, but this seems very strange to me. There’s plenty of RAM and disk, and that postgres delete task has been running for >12 minutes. There are less than 600K posts and <4K users, so the database isn’t huge. Oh. Wait. the postgres_data directory is 28GB.

I ran a VACUUM VERBOSE ANALYZE;.

I did this:

discourse=# SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter; 
 checkpoints_timed | checkpoints_req 
-------------------+-----------------
                 6 |              20

I’m now trying to reindex concurrently. Perhaps the vacuum and reindex will help.

Sikamikanico · December 4, 2023, 11:21pm

Thanks Jay. Let me know if there’s anything you need from me.

Falco · December 4, 2023, 11:39pm

Please share the whole SQL of the long running query.

pfaffman · December 5, 2023, 12:02am

Where do I find that?

The migration isn’t printing any logs. THe last line in the rebuild is

I, [2023-12-04T22:33:33.759401 #1]  INFO -- : > cd /var/www/discourse && su discourse -c 'bundle exec rake db:migrate'

I"m working on a full log for the one that I just restarted.

Falco · December 5, 2023, 12:03am

Enter the container, switch to the postgres user, enter psql and run

SELECT pid, age(clock_timestamp(), query_start), usename, query 
FROM pg_stat_activity 
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%' 
ORDER BY query_start desc;

Topic		Replies	Views
Database migration chokes on huge value of a "calendar-details" item in table "post_custom_fields" Bug	13	2005	December 7, 2020
Discourse installation has been getting slower and slower and slower Installation server-resources	37	1591	May 15, 2023
Help! Upgrade led to complete fail Installation	8	1306	March 9, 2021
PostgreSQL Stuck During Rebuilding Installation	52	940	October 4, 2024
Rebuild taking ~3hours Installation server-resources	35	1569	March 16, 2025

'bundle exec rake db:migrate' taking a long time due to calendar migration

Related topics