PostgreSQL Stuck During Rebuilding

Having the same issue… DO Droplet on Ubuntu 20.04. I tried to upgrade Docker from within Discourse first, but it kept failing with error code 137. So then I tried to rebuild Discourse from the command line and it hung on “The database is ready to accept connections”. Ctrl+C wouldn’t do anything, so I closed SSH, opened a new session, and started Discourse again; it was still working but not updated. I rebooted the droplet, tried upgrading Docker again from Discourse, and this time it worked! So I tried rebuilding Discourse again, but it still hung in the same place. I closed SSH again, reconnected, and started Discourse, but now I get the Oops screen! So now my Discourse site is down, and the only way I’ve ever been able to recover from the Oops screen before is by rebuilding the app, which I can’t do!

So now I’m at a loss, as my Discourse and Droplet experience is very limited and I’m not sure what to do next. docker_manager is the only plugin in my app.yml, so I can only assume the error is due to the newer Docker version not playing nicely with my Discourse version? I don’t know. I last updated Discourse in January, so it’s not that out of date…

So my site is down until this issue can be figured out… unless I spin up a new Droplet, set everything up again, and restore the Discourse backup I took? Is that my only option at this point? :tired_face:

Error 137 means out of memory. I would try adding more swap. If you have only 1 GB of RAM, I’d resize to 2 GB and perhaps also add 3 or 4 GB of swap.
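
If you need to add swap, here’s a minimal sketch on Ubuntu, assuming a 4 GB swapfile at /swapfile (adjust the size to what your droplet can spare):

# create and enable a 4 GB swapfile
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# persist it across reboots
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab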

You might try a

./launcher start app

But I suspect that the database has migrated too far for the old container.

If you’re stuck and want paid support see Contact Us - Literate Computing

Edit: But here’s what I’d do:

Hello, same error here. The workaround for now is to force the version param in app.yml to v3.3.0. Arch AMD64, Ubuntu 18.04. Strange that a minor version fails when the update to v3.3.0 went through without a problem last week :neutral_face:
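
In practice that means setting a version line under the existing params section of containers/app.yml, something like this (v3.3.0 being the last release that rebuilt cleanly for me):

params:
  db_default_text_search_config: "pg_catalog.english"   # whatever is already there stays
  version: v3.3.0                                        # pin to the last known-good release

then rebuilding as usual with ./launcher rebuild app.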

1 Like

For anyone running into this problem who is comfortable giving me access to your server, please PM me so that I can debug the problem on a server that actually exhibits it. I have tried multiple ways and cannot reproduce this problem, which makes it harder to push a fix.

5 Likes

I don’t see a way in my profile to PM you…

You need to be at trust level 1 to send messages

Looking at the stats in your profile, you’re pretty close already

2 Likes

For anyone stuck with this issue and Discourse down, I’ve found you can at least get the old version of the forum back up by restarting the VM and then running ./launcher start app, as sketched below. The command won’t work after a failed rebuild unless you restart your instance/VM first.
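
In other words, assuming the standard install path of /var/discourse, the sequence is roughly:

sudo reboot
# once the VM is back up:
cd /var/discourse
./launcher start app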

I should be able to bump the Ubuntu version on our affected VM on Monday, so will keep everyone posted on the outcome.

1 Like

Ctrl+C doesn’t work when it’s stuck; I have to reboot the system.

This command also doesn’t do anything:

/var/discourse# ./launcher start app

x86_64 arch detected.

WARNING: containers/app.yml file is world-readable. You can secure this file by running: chmod o-rwx containers/app.yml

+ /usr/bin/docker run --shm-size=512m -d --restart=always -e LANG=en_US.UTF-8 -e RAILS_ENV=production -e UNICORN_WORKERS=2 -e UNICORN_SIDEKIQS=1 -e RUBY_GC_HEAP_GROWTH_MAX_SLOTS=40000 -e RUBY_GC_HEAP_INIT_SLOTS=400000 -e RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=1.5 -e DISCOURSE_DB_SOCKET=/var/run/postgresql -e DISCOURSE_DB_HOST= -e DISCOURSE_DB_PORT= -e LETSENCRYPT_DIR=/shared/letsencrypt -e DISCOURSE_FORCE_HTTPS=true -e LC_ALL=en_US.UTF-8 -e LANGUAGE=en_US.UTF-8 -e DISCOURSE_HOSTNAME=techoforum.com -e DISCOURSE_DEVELOPER_EMAILS=techoforumd@gmail.com -e DISCOURSE_SMTP_ADDRESS=smtp.sendgrid.net -e DISCOURSE_SMTP_PORT=587 -e DISCOURSE_SMTP_USER_NAME=apikey -e DISCOURSE_SMTP_PASSWORD=[redacted] -e DISCOURSE_SMTP_DOMAIN=gmail.com -e DISCOURSE_NOTIFICATION_EMAIL=techoforumd@gmail.com -e LETSENCRYPT_ACCOUNT_EMAIL=me@example.com -h discourseonubuntu2004-s-1vcpu-1gb-sfo3-01-app -e DOCKER_HOST_IP=172.17.0.1 --name app -t -p 80:80 -p 443:443 -v /var/discourse/shared/standalone:/shared -v /var/discourse/shared/standalone/log/var-log:/var/log --mac-address 02:f8:99:7d:c3:d6 local_discourse/app /sbin/boot

Unable to find image 'local_discourse/app:latest' locally

docker: Error response from daemon: pull access denied for local_discourse/app, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.

See 'docker run --help'.

I have another forum on another droplet and that one doesn’t give any issues when updating. It’s weird that, with the same server configuration, one droplet has problems while the other doesn’t.

That sounds like a RAM issue. How much RAM and swap do you have? I would add a GB or two of swap space (and maybe add RAM if you have only 1 GB).

How much RAM and swap do you have on those systems? What is the output of

free -h

And a RAM shortage would explain why @tgxworld has been unable to replicate it.

I’m fairly certain RAM/swap is the issue.

Btw for anyone running into this problem, you can work around it for now by adding base_image: discourse/base:2.0.20240708-0023 to the top of the containers/app.yml file.
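
So the very top of containers/app.yml ends up looking something like this (the template list shown here is just the stock default; keep whatever yours already has):

base_image: discourse/base:2.0.20240708-0023

templates:
  - "templates/postgres.template.yml"
  - "templates/redis.template.yml"
  - "templates/web.template.yml"
  # ...rest of your existing file unchanged

Then rebuild as usual with ./launcher rebuild app.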

5 Likes

Not sure it’s a RAM issue in my case, as the affected VM has 125 GiB allocated and 78 GiB available.

              total        used        free      shared  buff/cache   available
Mem:           125G         14G        940M         31G        110G         78G
Swap:            0B          0B          0B

The dev server with the same OS that upgraded successfully without this issue only has 16 GiB of RAM.

1 Like

Darn. That would have explained everything. :person_shrugging:

1 Like

Could it be a db size issue?

The database on our prod server is pretty large, but dev is very small. That’s the only real difference between the VMs that have successfully upgraded and the affected one (in my case).

Maybe. Have you changed the memory config for the database?

How big is the database?
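
A rough way to check, assuming the standard standalone install and the default database name of discourse:

# size of the Postgres data directory on the host
du -sh /var/discourse/shared/standalone/postgres_data

# or ask Postgres directly: open a shell inside the container...
cd /var/discourse && ./launcher enter app
# ...then, inside that shell:
su postgres -c "psql -c \"SELECT pg_size_pretty(pg_database_size('discourse'));\""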

1 Like

I’ll check it out and see if it’s been changed

This is the only thing that worked for me. Thank you for sharing this!! My customers thank you, too :slight_smile:

Hoping we can get a proper fix for this soon.

1 Like

Hello,
I just upsized the Droplet by doubling the RAM and increasing the disk size, but I’m still facing the same issue.
Anything else to try?

# free -h
              total        used        free      shared  buff/cache   available
Mem:          1.9Gi       289Mi        83Mi        11Mi       1.6Gi       1.5Gi
Swap:         2.0Gi       3.0Mi       2.0Gi

# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            941M     0  941M   0% /dev
tmpfs           198M  1.1M  197M   1% /run
/dev/vda1        34G   14G   21G  39% /
tmpfs           986M     0  986M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           986M     0  986M   0% /sys/fs/cgroup
/dev/vda15      105M  7.4M   97M   8% /boot/efi
/dev/loop1       56M   56M     0 100% /snap/core18/2829
/dev/loop2       56M   56M     0 100% /snap/core18/2823
/dev/loop3       92M   92M     0 100% /snap/lxd/29619
/dev/loop0       64M   64M     0 100% /snap/core20/2264
/dev/loop4       64M   64M     0 100% /snap/core20/2318
/dev/loop5       39M   39M     0 100% /snap/snapd/21465
/dev/loop6       92M   92M     0 100% /snap/lxd/24061
/dev/loop7       39M   39M     0 100% /snap/snapd/21759
tmpfs           198M     0  198M   0% /run/user/0
overlay          34G   14G   21G  39% /var/lib/docker/overlay2/3c7ebf42647de2b5df34cba2b047079fd3454ea7fe9b04c7b70f227df1e7eafe/merged

1 Like

OMG! Why didn’t I read this solution before? It worked for me too.
So what is the solution going forward? Do we need to keep specifying this base image in the future, or change it at some point to get an updated image?