Last night I was pushing through the Discourse upgrades and rebuilt the app, which resulted in a host of Postgres errors. I realized this was a result of the recent Postgres 13 upgrade, but I kept getting permission denied errors, among other things (and yes, I ran chmod 700 on everything, so it wasn’t world-accessible). So I moved my original /var/discourse somewhere that was supposed to be temporary and reinstalled a fresh instance of Discourse to try to at least get Postgres up to date.
Here’s where it gets fun. I had a backup of the site (DB only, since uploads are saved to a different volume) generated from the UI three days ago. Or at least, I thought I did. What I have now is a file called wacky-writers-forum-2021-04-06-033906-v20210328233843.sql.gz, which I think I’ve learned is not, in fact, the .tar.gz file the actual backup should be in.
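For anyone else staring at one of these files: the backup filename itself tells you what you have. Here’s a minimal Python sketch, assuming the naming pattern seen above (site slug, then a YYYY-MM-DD-HHMMSS timestamp, then “v” plus what appears to be the database migration version, then the extension). `parse_backup_name` is a hypothetical helper, not part of Discourse.

```python
import re
from datetime import datetime

# Hypothetical helper. Assumed filename layout:
#   <site-slug>-<YYYY-MM-DD-HHMMSS>-v<version>.<sql.gz | tar.gz>
BACKUP_RE = re.compile(
    r"^(?P<site>.+)-(?P<stamp>\d{4}-\d{2}-\d{2}-\d{6})"
    r"-v(?P<version>\d+)\.(?P<ext>sql\.gz|tar\.gz)$"
)


def parse_backup_name(name):
    """Split a backup filename into its parts; return None if it doesn't match."""
    m = BACKUP_RE.match(name)
    if m is None:
        return None
    return {
        "site": m.group("site"),
        "taken_at": datetime.strptime(m.group("stamp"), "%Y-%m-%d-%H%M%S"),
        "version": m.group("version"),
        # sql.gz = database only; tar.gz = database plus uploads
        "kind": "database only" if m.group("ext") == "sql.gz" else "database + uploads",
    }


info = parse_backup_name(
    "wacky-writers-forum-2021-04-06-033906-v20210328233843.sql.gz"
)
print(info["taken_at"], "|", info["kind"])  # 2021-04-06 03:39:06 | database only
```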
I have everybody redirected to a landing page for now, and I’m hoping someone can tell me whether it’s still possible to retrieve the actual .tar.gz file from the server from three days ago, and how exactly I should go about doing that.
I have my backups and uploads saving to DigitalOcean block storage, and I still have the discourse folder from my old, working install, but moving or copying it back to /var/discourse just breaks everything all over again, including throwing Postgres errors. I’ve been working on this for 9 hours straight and I’m just about at my wits’ end. Can anybody help me, or at least point me in the right direction? We just hit our 1k user mark and I would really, really like to avoid losing all of that.
If you have your S3 configuration in app.yml, then you can just do a command-line restore and it’ll pull the backup from S3.
Since you have your assets in S3, the backup contains only the database.
You should just be able to clone a new /var/discourse, copy your yml file, rebuild, and do the command line restore.
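The command-line restore suggested above boils down to a handful of commands. The sketch below is a hypothetical dry-run helper that only returns those commands as strings so you can eyeball them before typing anything; it assumes a standard /var/discourse install and the `discourse enable_restore`, `discourse restore <file>` and `discourse disable_restore` commands run inside the container after `./launcher enter app`.

```python
import shlex


def restore_plan(backup_filename):
    """Hypothetical dry-run helper: return the restore commands, don't run them."""
    if not backup_filename.endswith((".tar.gz", ".sql.gz")):
        raise ValueError("expected a .tar.gz or .sql.gz backup: " + backup_filename)
    return [
        # on the host
        "cd /var/discourse",
        "./launcher enter app",
        # inside the container
        "discourse enable_restore",
        "discourse restore " + shlex.quote(backup_filename),
        "discourse disable_restore",
        "exit",
    ]


for command in restore_plan(
    "wacky-writers-forum-2021-04-06-033906-v20210328233843.sql.gz"
):
    print(command)
```

This assumes the backup file already sits in the container’s backups directory (/var/discourse/shared/standalone/backups/default on the host in a standard install; elsewhere if your shared folder lives on another volume).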
I’ll amend that: my uploads and backups aren’t local to the main discourse folder (that’s partially how this all got started; I was working on moving us to DigitalOcean Spaces). So, no, unfortunately I don’t have any S3 configuration done, since I was just saving everything to mounted storage.
The backups were being saved in /mnt/my_storage/shared/standalone, but when I look for backups in there, all I have is the wacky-writers-forum-2021-04-06-033906-v20210328233843.sql.gz file. I did actually try to restore from that for lack of a better idea (which was probably wrong), but I got a permission denied error. I’m sure it has something to do with how those backups are actually generated.
So in that case you should be able to restore the SQL file, and then re-mount the block storage volume to get your uploads back.
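Before rebuilding against the re-mounted volume, it’s worth confirming the volume is actually mounted; otherwise the mount point is just an empty directory on the droplet’s own disk and Discourse would write there instead. A minimal sketch, assuming the /mnt/my_storage mount point and shared/standalone layout mentioned earlier in the thread; `volume_ready` is a hypothetical helper.

```python
import os


def volume_ready(mount_point, expected_subdir):
    """Hypothetical pre-rebuild check that a block-storage volume is mounted.

    Returns (ok, message).
    """
    if not os.path.ismount(mount_point):
        return False, mount_point + " is not a mount point (volume not mounted?)"
    target = os.path.join(mount_point, expected_subdir)
    if not os.path.isdir(target):
        return False, target + " is missing on the mounted volume"
    return True, "ok"


# Paths taken from this thread; adjust to your own volume.
print(volume_ready("/mnt/my_storage", "shared/standalone"))
```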
There are two kinds of backups: sql.gz which does not include uploads, and tar.gz which does include uploads. So you had the wrong kind of backup but the fact that you had the uploads on an external volume saved your butt.
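If you’re ever unsure which kind you have, the extension tells you, and a .tar.gz can be opened to check for uploads. A hedged sketch using Python’s tarfile module: it assumes a full backup stores the database dump as a .sql or .sql.gz file and the uploads under a top-level uploads/ directory. That layout is an assumption, not something verified against Discourse’s source.

```python
import tarfile


def summarize_members(names):
    """Given archive member names, report whether a DB dump and uploads are present."""
    names = [n[2:] if n.startswith("./") else n for n in names]
    return {
        # assumed: the dump is a .sql or .sql.gz file inside the archive
        "database": any(n.endswith((".sql", ".sql.gz")) for n in names),
        # assumed: uploads live under a top-level uploads/ directory
        "uploads": any(n.startswith("uploads/") for n in names),
    }


def describe_backup(path):
    if path.endswith(".sql.gz"):
        # a bare SQL dump never contains uploads
        return {"database": True, "uploads": False}
    if path.endswith(".tar.gz"):
        with tarfile.open(path, "r:gz") as tar:
            return summarize_members(tar.getnames())
    raise ValueError("not a .sql.gz or .tar.gz backup: " + path)
```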
EXCEPTION: lib/discourse.rb:93:in `exec': Failed to copy archive to tmp directory.
cp: cannot open '/var/www/discourse/public/backups/default/wacky-writers-forum-2021-04-06-033906-v20210328233843.sql.gz' for reading: Permission denied
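One plausible cause of that Permission denied, given the 700 permissions mentioned at the start of the thread: the restore presumably copies the file as the discourse user inside the container, and with mode 700 only the file’s owner can read it. The sketch below is a simplified model of the POSIX read check (it ignores ACLs and root’s override); treating the container’s discourse user as UID/GID 1000 is an assumption, not something confirmed in this thread.

```python
import stat


def can_read(mode, owner_uid, owner_gid, uid, gids):
    """Simplified POSIX read check: ignores ACLs and root's override.

    Only one class of permission bits applies: owner if uid matches,
    else group if the file's group is one of the user's groups, else other.
    """
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


# Assumed: backup owned by root (0:0) on the host, read by the container's
# discourse user, taken here to be uid/gid 1000.
print(can_read(0o700, 0, 0, 1000, {1000}))  # False: only root may read it
print(can_read(0o644, 0, 0, 1000, {1000}))  # True: "other" has read permission
```

To check a real file, pass in `st_mode`, `st_uid` and `st_gid` from `os.stat(path)`. If it comes back False, loosening the mode on the backup file (for example `chmod 644`) should let the restore read it.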
The restore was successful from that .sql.gz file. (hooray! Thanks again Richard.)
I ensured app.yml was the same setup as before everything died
./launcher rebuild app
Rebuild is successful with Postgres 13 (finally)
However, the site itself is still down. I use Cloudflare, but I have Development Mode on right now, and I’ve flushed the DNS cache. Everything is pointed where it’s supposed to go, and the Cloudflare template is in app.yml.
DNS is resolving correctly, the hostname is up to date, the Discourse install was done with the right URL, and I’m running out of ideas.
https://forum.wackywriters.com is the URL, I’m just getting “server unavailable” errors. I feel like I’m going around in circles here (sorry) but any suggestions?
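Since the site sits behind Cloudflare, “DNS is resolving correctly” is worth pinning down: with Cloudflare’s proxy (orange cloud) enabled, the hostname resolves to Cloudflare’s addresses, not the droplet’s. A minimal sketch using Python’s standard socket module to list what a hostname resolves to from wherever you run it; the IP in the usage line below is a placeholder.

```python
import socket


def resolved_ips(hostname):
    """Return the sorted, de-duplicated addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def points_at(hostname, expected_ip):
    """True if expected_ip is among the addresses hostname resolves to."""
    return expected_ip in resolved_ips(hostname)
```

For example, `points_at("forum.wackywriters.com", "203.0.113.10")`, where 203.0.113.10 is a documentation-range stand-in for the droplet’s real IP. With the proxy on, expect this to be False even when the Cloudflare DNS record is correct.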
Edit: When I run ./discourse-doctor, I see that there are two instances of the app running in Docker:
Is this normal? (It seems like it wouldn’t be, but everything I thought I knew about Discourse has been thrown out the window in the last 24 hours.)
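A standard standalone install normally runs a single container named app, and two containers built from the same image that are both trying to bind ports 80/443 could be one way to end up with an unreachable site. A hypothetical sketch for spotting that: feed it the output of `docker ps --format '{{.Names}} {{.Image}}'` and it counts running containers per image. The sample output below is made up for illustration.

```python
from collections import Counter


def containers_by_image(ps_output):
    """Count containers per image from 'name image' lines."""
    counts = Counter()
    for line in ps_output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counts[parts[1]] += 1
    return counts


def duplicates(ps_output):
    """Images with more than one running container."""
    return {image: n for image, n in containers_by_image(ps_output).items() if n > 1}


# Hypothetical output, for illustration only:
sample = "app local_discourse/app\nold_app local_discourse/app\n"
print(duplicates(sample))  # {'local_discourse/app': 2}
```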
Edit2: I’ve been putting this off as a last resort, but I’m going to try and set up an entirely new server with a clean Discourse install. I’m worried something has gotten fubared with all my mucking around and I can’t figure out what’s broken. Thankfully I still have the backup and all the uploads on block storage, so if I’m lucky, I should be able to connect that to a new droplet and move things over from there. If anyone has additional suggestions or tips, I’d still appreciate more tenured expertise than mine.
Edit3: Even with a new server and the IP propagating (nslookup and ping both look good, whatsmydns.net looks good), the forum won’t load. Still getting connection errors. It’s like it isn’t connecting the IP address to the Discourse instance and instead is trying to load a static page, which, of course, doesn’t exist in this case.
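nslookup and ping only show that the name reaches the machine; a browser connection error can still mean nothing is accepting TCP connections on the web ports (for example, if the web server inside the container failed to start). A minimal sketch using Python’s standard `socket.create_connection` to probe a port; `port_open` is a hypothetical helper.

```python
import socket


def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False
```

Run it from a machine outside the droplet, e.g. `port_open("forum.wackywriters.com", 443)` and the same for port 80. If DNS looks right but both come back False, the problem is on the server side rather than in DNS.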
So after almost 24 hours of fighting, I figured out why the site refused to load after I got the restore going.
Because of so many resets and reinstallations and god knows what else, I hit the Let’s Encrypt rate limit, so I’ve temporarily commented out the SSL templates in app.yml and will get them going again in a week.
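To work out when SSL can go back on, here’s a sketch of a rolling-window limit. It assumes 5 certificates for the same set of hostnames per 7 days; treat those numbers as an assumption and check Let’s Encrypt’s current rate-limit documentation. `next_allowed` is a hypothetical helper and the issuance times are made up.

```python
from datetime import datetime, timedelta

# Assumed limit: 5 certificates for the same set of hostnames per rolling
# 7 days. Check Let's Encrypt's current rate-limit docs before relying on it.
LIMIT = 5
WINDOW = timedelta(days=7)


def next_allowed(issued_at, now):
    """Earliest time another certificate could be issued under the assumed limit."""
    recent = sorted(t for t in issued_at if now - t < WINDOW)
    if len(recent) < LIMIT:
        return now
    # Enough of the oldest in-window issuances must age out to drop below LIMIT.
    return recent[len(recent) - LIMIT] + WINDOW


issued = [datetime(2021, 4, 8, h) for h in range(9, 14)]  # hypothetical times
print(next_allowed(issued, datetime(2021, 4, 9)))  # 2021-04-15 09:00:00
```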
The site is “functioning” while I rebake all posts to fix the broken images. I really appreciate Jay and Richard helping me out today; you got me through the parts I just couldn’t figure out.
Now to get a real backup downloaded so I can get S3 set up this week without worrying about this again.
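When downloading that backup, it’s worth confirming the local copy matches the one on the server before relying on it. A minimal sketch using Python’s hashlib: compute the SHA-256 of the downloaded file and compare it with what `sha256sum` prints for the same file on the server. The helper names are hypothetical.

```python
import hashlib


def sha256_of_chunks(chunks):
    """Hex SHA-256 digest of an iterable of bytes chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of(path, chunk_size=1 << 20):
    """Hash a (possibly large) file in 1 MiB chunks without loading it into memory."""
    with open(path, "rb") as f:
        return sha256_of_chunks(iter(lambda: f.read(chunk_size), b""))
```

If `sha256_of("<path to your downloaded .tar.gz>")` doesn’t match the server’s `sha256sum` output, the download is incomplete or corrupted and should be fetched again.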