Help restoring - system hung at midnight

Thanks. My site has crashed, unfortunately, so I’m only left with the S3 uploads and backups. I’m assuming there’s no way for me to migrate any leftover local files to S3 now.

So I’m wondering: what are my options right now? Is there a way to restore from the S3 backups and ignore the local files? I found a way to have it ignore the S3 uploads, but then pretty much all the posts have broken links/images (90%+ of them are probably in S3, because I set up the S3 uploads many, many years ago).

So, an update for folks who may be struggling with the same issue (in short: I’m unable to restore from a backup, and the server crashed due to a faulty system upgrade).

From what I understand, the root cause of the issue is that there are local uploads AND S3 uploads, so the restore tool bugs out because it doesn’t know how to handle local and S3 restores at the same time (maybe it’s time for Discourse to take another look at backup/restore).

Thanks to @RGJ for this tip; he suggested forcing Discourse to ignore the S3 uploads while restoring:

  1. Add a line to your app.yml: DISCOURSE_ENABLE_S3_UPLOADS=false
  2. Rebuild Discourse: ./launcher rebuild app
  3. Attempt a restore (either from the GUI Backup page or using the CLI)
  4. Then, after restoring, remove that line from app.yml and rebuild one more time
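For reference, a minimal sketch of where that line goes in app.yml (note that inside the `env:` section the `KEY: value` YAML syntax is used; the surrounding keys are illustrative, your file will have more entries):

```yaml
# /var/discourse/containers/app.yml (fragment)
env:
  LANG: en_US.UTF-8
  # Temporarily force Discourse to ignore S3 uploads during the restore:
  DISCOURSE_ENABLE_S3_UPLOADS: false
```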

While this worked, note that the forum was still badly broken: the categories, settings and posts were restored, but all the images, links, embedded documents, etc. were broken and errored out.

The hail-mary solution:
I managed to salvage the old server, extracted the /var/discourse directory (tar/gz), copied it onto the new server, and did a ./launcher rebuild app. This completely restored the operation of the forum; however, the fundamental problem still remains - the backups will NOT work because they have a mix of local and S3 uploads.
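The salvage steps above look roughly like this (the hostname and archive name are placeholders; this assumes a standard /var/discourse install on both machines):

```shell
# On the old server: archive the whole Discourse directory
tar -czf discourse-salvage.tar.gz -C /var discourse

# Copy it across (placeholder hostname)
scp discourse-salvage.tar.gz root@new-server:/tmp/

# On the new server: unpack and rebuild
tar -xzf /tmp/discourse-salvage.tar.gz -C /var
cd /var/discourse && ./launcher rebuild app
```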

So I really need some advice on the best way to fix this issue once and for all. Is it better/easier to move all the uploads from local to S3, or from S3 to local, and how does one do it? The entire point of a backup is to help out in situations like this one, but it’s failed me, so I need your help to get it straightened out.

1 Like

If you configure as described in Using Object Storage for Uploads (S3 & Clones) you should be able to

 rake uploads:migrate_to_s3
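That rake task runs inside the container; assuming a standard /var/discourse install, something like:

```shell
cd /var/discourse
./launcher enter app          # drop into the running container
rake uploads:migrate_to_s3    # moves local uploads into the configured S3 bucket
```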

If you want to stop using s3, then you can enter the rails console and set


Then take a backup, make sure that you do not have s3 configured in your app.yml and restore the backup. I think that this will restore backups to local.
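The console command itself got lost from the post above; judging from context it is presumably the site setting that controls S3 uploads, along these lines (the exact setting name is an assumption, so verify it against your Discourse version):

```ruby
# Inside the container: ./launcher enter app, then `rails c`
# Assumed setting name - check it in your install
SiteSetting.enable_s3_uploads = false
```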

But in either case, I would still recommend setting your keys and backup bucket in ENV variables in your app.yml file and then check that you can restore it to a new site.


Okay I think I got a little mixed up here.

I’m thinking the ideal thing to do would probably be to have all uploads local and save the backups (zips) to S3. This way the backup is available on S3 should anything happen to the server, but the backup itself is self-contained with no dependencies, so it should be easy to restore it on a new server.

So if I understood correctly, I should follow these instructions:

If you want to stop using s3, then you can enter the rails console and set

Then take a backup, make sure that you do not have s3 configured in your app.yml and restore the backup. I think that this will restore backups to local.

and then

  1. disable the enable upload to S3 option in Admin → Settings → Files
  2. enable the backup to S3 option in the Admin → Settings → Backups page

Is that correct?

This is the part that confused me, why would I want to put the S3 configuration in the app.yml file?

So that you have access to your backups via a command line restore before you restore your backup. Otherwise, you have to set up an admin account and then configure S3 and then restore. Similarly, whatever settings you put in your database get overwritten when you restore the database.

I think that best practice is to configure S3 only via ENV variables in the app.yml file. It would probably make sense to make them hidden settings, if not for the hundreds of people who would be surprised that they had disappeared.
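A sketch of what that looks like in app.yml (the variable names follow the standard Discourse S3 env vars; the bucket, region and keys are placeholders):

```yaml
# /var/discourse/containers/app.yml (fragment)
env:
  DISCOURSE_BACKUP_LOCATION: s3
  DISCOURSE_S3_BACKUP_BUCKET: example-bucket/backups
  DISCOURSE_S3_REGION: us-east-1
  DISCOURSE_S3_ACCESS_KEY_ID: "<your key id>"
  DISCOURSE_S3_SECRET_ACCESS_KEY: "<your secret key>"
```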

1 Like

Because you will have trouble restoring otherwise.


How would one restore a backup from S3 using the command line? According to the instructions here: Restore a backup from command line
It says you can drop the backup file into the /var/discourse/shared/standalone/backups/default folder and then start a restore from the CLI. This is what I had done with your earlier suggestion (which eventually led to broken links, unfortunately), but that approach does work.

How does one restore directly from S3 using the CLI?

cd /var/discourse
./launcher enter app
discourse restore

It’ll print the available backups; you can then copy/paste the filename of the one you want to restore.


Thanks, so it’ll read the S3 backups and list them as options.

Jay, to follow up on a suggestion you had made to move assets local:

I think you can set a hidden setting include_s3_uploads_in_backups to true and then make a backup and restore it when s3 is turned off to stop using S3.

Having S3 backups with them configured in app.yml means that you can do a command line restore with only the app.yml file (after cloning discourse and installing docker).
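Hidden settings like the one mentioned above are set from the rails console rather than the admin UI; presumably something like this (setting name taken from the suggestion above, so double-check it exists in your version):

```ruby
# Inside the container: ./launcher enter app, then `rails c`
SiteSetting.include_s3_uploads_in_backups = true
```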

For the first step, would I need to back up the S3 buckets, or is this a bucket-safe operation?

Well, at least I figured out why my server crashed last night (and again today after a complete rebuild :frowning:); see this topic for details: Ubuntu 20.04 kernel update with docker causing a crash


So to get it up and running from a backup, I had to

  1. disable Settings → Files → enable s3 uploads
  2. set Settings → Backups → backup location to S3
  3. enable Settings → Backups → backup with uploads
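The first two of those toggles can presumably also be flipped from the rails console; the setting names below are assumptions inferred from the UI labels, so verify them in your install (the third, "backup with uploads", is easiest to change in the Backups settings page):

```ruby
# Inside the container: ./launcher enter app, then `rails c`
# Assumed setting names based on the UI labels
SiteSetting.enable_s3_uploads = false
SiteSetting.backup_location = "s3"
```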

Then I took a backup and I was able to restore it successfully. However, one thing did break: all the attachment (file) links are now invalid. The images are all good, but the attachment links now give me an error:

Oops! That page doesn’t exist or is private.

Is there a way to fix these short-url links?

You might try doing an HTML rebuild (aka rebake) on one of those topics to see if it fixes it.

Thanks. Is there a guide somewhere on how to issue the command to bake specific topics?

Rebake all posts matching a pattern

1 Like

Sadly that did not work; the URL didn’t change after Rebuilding the HTML and it still leads to a

Oops! That page doesn’t exist or is private.

Any other ideas or thoughts?

Did rebuilding from the UX work or not?

Clicking on the “Rebuild HTML” button didn’t work. The link didn’t change and it still leads to the error page.

There’s a second issue I noticed after the restore. I had a look at the error logs and I noticed this. The link isn’t the same as the one in the post I rebuilt:
Failed to process hijacked response correctly : Errno::ENOENT : No such file or directory @ rb_sysopen - /

The odd thing is that when I put the above URL into the browser it does actually serve up the icon.

Here’s the backtrace

Message (5 copies reported)

Failed to process hijacked response correctly : Errno::ENOENT : No such file or directory @ rb_sysopen - /


/var/www/discourse/app/controllers/static_controller.rb:160:in `read'
/var/www/discourse/app/controllers/static_controller.rb:160:in `block (2 levels) in favicon'
/var/www/discourse/lib/distributed_memoizer.rb:16:in `block in memoize'
/var/www/discourse/lib/distributed_mutex.rb:33:in `block in synchronize'
/var/www/discourse/lib/distributed_mutex.rb:29:in `synchronize'
/var/www/discourse/lib/distributed_mutex.rb:29:in `synchronize'
/var/www/discourse/lib/distributed_mutex.rb:14:in `synchronize'
/var/www/discourse/lib/distributed_memoizer.rb:12:in `memoize'
/var/www/discourse/app/controllers/static_controller.rb:138:in `block in favicon'
/var/www/discourse/lib/hijack.rb:56:in `instance_eval'

Then you didn’t need to figure out how to rebuild from the command line.

I’m not sure, but it sounds like it’s treating that as a filename rather than a bucket, but I’d need to look at the source to know for sure.

Any other ideas/thoughts on how to fix the /short-url broken links?

That happened to me as well, on an Oracle Cloud server. The kernel panicked, and so did I. I thought I was toast. But after about six or eight reboots from the cloud console, some of which were “pull-the-plug” reboots, and after about half-an-hour of waiting, the server came up long enough for me to edit grub.cfg and revert to the previous kernel.

I was able to save my instance thereby. A day later, there was another new kernel update offered, and that’s when I grew more certain my theory about kernel trouble was true. And I found the bug description to confirm it. Yeah, pretty nasty.

I devised a Stupid Grub Trick, as I call it, that I will try to find time to post anon, so one would be able to avoid such a calamity in the future.

Good luck with your restore, @RBoy. I must say this thread is giving me a queasy stomach after my own near disaster last – when was it, Wednesday?

By the way, you said you regained access to your old server. If you still have it or can get access once more – for me it took some hard reboots and some waiting – well, go in and upgrade one more time, as there is another new kernel that doesn’t have the bug. Or revert to the previous kernel.

1 Like