Awesome - is there a way to remove the backups for now so that the Dashboard works again until this is fixed?
If you have S3 backups enabled, you can try running this from the rails console:
./launcher enter app
rails c
SiteSetting.backup_location = 'local'
Then refresh the site and see if you can access the dashboard.
Instead of doing that I just did it from the Settings and reloaded the page and Dashboard stats work again - but I can’t rely on that as I need S3 backups… BUT this is a good workaround so that we can check stats until the full fix is in.
Ah sorry, I thought that you were unable to enter the dashboard at all. The result is the same.
Then you need to wait for the official fix before reactivating the setting.
I can confirm that changing ‘Settings / Backup / Backup location’ to ‘local’ from ‘Amazon S3’ means the dashboard loads quickly again.
Thanks for looking at it, and we’ll await the official fix at the next update.
Does that mean that the new Backup location is broken? Or does it mean that if you did it the old way it’s broken because of the change?
I’m not sure what’s actually causing the timeout…
How many objects are in your backup bucket? Does Admin -> Backups load without problems when backup_location is set to S3?
Our /admin/backups page (or rather the underlying request for .backups.json) takes about 50 seconds to come up each time, so that looks like the reason the dashboard times out.
There are six entries in the ‘filename’ list for us, as we only keep that amount of backups rolling in S3 for 7 days.
At a guess, calling S3 for a file list or file size is taking about 10 seconds per entry. Perhaps something changed in the AWS S3 client library recently?
The backups to S3 seem to be working ok, which is good, as I did a manual backup and it worked.
One ‘funny’ is that there is a new entry at the bottom of the list, perhaps showing some mismatch between what’s in the DB and what is actually in S3?
Note: the odd ‘3a97e…gz’ entry at the bottom.
Our AWS S3 store looks like this:
Hmm, I wonder if it’s because we store our backups and uploaded files/images in the same bucket? I can see S3 enumerating the bucket list and then taking a long time, due to all the root level keys.
Let me try giving it a separate clean bucket of its own and seeing if that works better.
Yes, that was it - problem solved. I created a clean new S3 bucket for the backup location and now the backup/index works quickly.
Problem Summary: With the introduction of direct backups to S3 using the new Backup Location feature, listing the objects in S3 at the ‘S3 backup location’ setting can take a long time and cause the /admin and /admin/backups pages to time out once the request exceeds 30 seconds. This didn’t use to matter, as the local file system was used as an interim storage location, but with a direct S3 backup the backup/index needs to list all the S3 objects it finds.
Solution: Choose an S3 backup bucket with fewer existing objects in it, i.e. do not share your S3 Uploads location with your Backup Location S3 bucket.
Thanks for info all - all working great now.
Not sure if this is helpful, but one potential tweak would be to store the S3 backups under a ‘discoursebackup/’ prefix and then pass that prefix here to filter the object list call to AWS. That way the listing would return quickly without requiring an otherwise empty bucket:
@s3_helper.list(S3BACKUP_PREFIX).each do |obj|
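To illustrate why a prefix filter helps, here is a rough sketch in plain Ruby. It does not talk to AWS: FakeBucket is a hypothetical in-memory stand-in (not Discourse's S3Helper), the key names are made up, and the only real-world fact it relies on is that S3 list calls return at most 1000 keys per request.

```ruby
# Hypothetical in-memory stand-in for an S3 bucket; not Discourse's S3Helper.
class FakeBucket
  PAGE_SIZE = 1000 # S3 list calls return at most 1000 keys per request

  attr_reader :requests

  def initialize(keys)
    @keys = keys.sort
    @requests = 0
  end

  # Returns the keys that start with the prefix and counts one simulated
  # request per page of results (at least one request per call).
  def list(prefix = "")
    matching = @keys.select { |key| key.start_with?(prefix) }
    pages = [(matching.size / PAGE_SIZE.to_f).ceil, 1].max
    @requests += pages
    matching
  end
end

# Made-up keys: many uploads plus a handful of backups in the same bucket.
uploads = (1..5000).map { |i| format("original/1X/%040x.png", i) }
backups = (1..6).map { |i| "discoursebackup/backup-#{i}.tar.gz" }
bucket = FakeBucket.new(uploads + backups)

all_keys = bucket.list
requests_unfiltered = bucket.requests

backup_keys = bucket.list("discoursebackup/")
requests_filtered = bucket.requests - requests_unfiltered

puts "unfiltered: #{all_keys.size} keys in #{requests_unfiltered} requests"
puts "filtered:   #{backup_keys.size} keys in #{requests_filtered} requests"
```

In this toy setup the unfiltered listing has to page through every upload before the backups are known, while the prefixed listing only touches the backup keys.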
Hmm maybe @gerhard can chime in on that observation.
I think it’s just an unlucky edge-case Jeff, in that:
The S3 backups go in the root of the bucket chosen for backups, while ideally a location could be chosen within the bucket (a prefix for the S3 object keys).
Some Discourse stand-alone installs will have the same bucket specified for their S3 uploads as well, and the nature of the Uploads storage hierarchy means that there are 230+ existing root objects in it (the hashing of upload locations uses many root prefixes or ‘directories’ in filesystem speak).
S3 AWS .listObjects is notoriously slow without some sort of prefix filter.
The new Backup Location gets its info direct from S3 now, which is why it became an issue now and not before.
I doubt there are many like us that kept both the backups and uploads in the same bucket, and it was super easy to just create a new S3 backup bucket anyway (which can then have its own version lifecycle, Glacier etc) which is probably the most sensible deployment.
s3_backup_bucket supports prefixes. You can use my-bucket-for-discourse-stuff/backups without problems.
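As a minimal sketch of how a value like that could be read, the snippet below splits a "bucket/prefix" string into its two parts. The helper name split_bucket_setting is hypothetical and this is not necessarily how Discourse parses the setting internally.

```ruby
# Hypothetical helper: splits "bucket/some/prefix" into [bucket, prefix].
# The prefix is nil when the value contains only a bucket name.
def split_bucket_setting(value)
  bucket, prefix = value.split("/", 2)
  prefix = prefix.to_s.chomp("/")
  [bucket, prefix.empty? ? nil : prefix]
end

p split_bucket_setting("my-bucket-for-discourse-stuff/backups")
p split_bucket_setting("my-bucket-for-discourse-stuff")
```

With a prefix in place, every backup key shares the same leading path, so a listing can be limited to that path instead of walking the whole bucket.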
As you have experienced yourself, putting everything into the same bucket without using a prefix is a really bad idea. The solution is to either use the bucket exclusively for backups or to use a prefix for backups.
Aside from this issue, you’re effectively combining data you’re publishing to the web with backups of everything including personally identifying user data.
With all the work done to secure the download of backups, I would be quite open to the idea of Discourse explicitly blocking the same bucket from being used in both cases.
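A check like that could look roughly like the sketch below. It is only an illustration: bucket_name and same_bucket? are hypothetical helpers, not Discourse's actual validators, and it assumes both settings hold a bucket name optionally followed by "/prefix".

```ruby
# Hypothetical helper: returns the bucket part of a "bucket/prefix" value.
def bucket_name(setting)
  setting.to_s.split("/", 2).first.to_s
end

# Hypothetical check: true when uploads and backups point at the same bucket,
# regardless of any prefix. Blank settings never count as a match.
def same_bucket?(upload_setting, backup_setting)
  name = bucket_name(backup_setting)
  !name.empty? && name == bucket_name(upload_setting)
end

puts same_bucket?("my-bucket", "my-bucket/backups")
puts same_bucket?("my-uploads", "my-backups")
```

Comparing only the bucket part is the strict reading of the suggestion; a looser variant could allow a shared bucket as long as the backups use a prefix.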
You should put that in the help text for the setting? At the moment it just says the bucket name, implying the opposite. “The Amazon S3 bucket name that files will be uploaded into. WARNING: must be lowercase, no periods, no underscores.”
I’m glad I figured out why our (and others’) Discourse dashboards were broken for so long.
Let me know when the fix is in so that listing the backups doesn’t break viewing the dashboard.
AWS S3 bucket boundaries are no more or less secure than S3 object ACLs. If the backup files are not marked public, then there is no more or less security risk than with a different bucket with different ACLs.
In my S3 bucket, which I share between backups and uploads, I have the last 30 days of backups plus the uploads bits:
My backups option errors out just like the Dashboard does. When I adjust to a new bucket it automatically errors out, doesn’t even wait
Update: I removed a lot of old backups, then set up a new bucket for backups, and now the Dashboard shows nice and quick again (probably because there aren’t any backups since I have a new location for backups and it hasn’t backed up yet) … However the “backups” tab still gives the error… but it’s a start (and I didn’t have to do an update)
Just to be clear on this, we didn’t stop working on a fix, for the dashboard at least. I don’t want a simple error like this to be able to take down the full dashboard, even if it’s not technically a report. We will get a fix for this.
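One general way to keep a slow or failing section from taking down a whole page is to wrap it in a timeout with a fallback value. The sketch below shows that idea in plain Ruby; with_fallback is a hypothetical helper and this is not the fix that went into Discourse.

```ruby
require "timeout"

# Hypothetical wrapper: runs the block, but returns the fallback if the block
# raises or takes longer than the given number of seconds.
def with_fallback(seconds, fallback)
  Timeout.timeout(seconds) { yield }
rescue StandardError # Timeout::Error is a StandardError too
  fallback
end

# Simulated slow backup listing: gives up after 0.1s and shows an empty list.
backups = with_fallback(0.1, []) do
  sleep 1
  ["backup.tar.gz"]
end
puts backups.inspect
```

The rest of the page can then render with an empty or placeholder section instead of the whole request timing out.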