One of my sites won’t come up after a recent update.
It’s a DigitalOcean droplet set up via the standard Docker install, and it has been up & running for about 1.5 years.
Today it failed to restart. Here’s the rebuild attempt after an apt-get upgrade -y & reboot:
There are no plugins installed.
It was set up to back up to S3 (my other sites aren’t), which may be relevant.
16287:C 28 May 02:33:33.481 * DB saved on disk
16287:C 28 May 02:33:33.482 * RDB: 18 MB of memory used by copy-on-write
155:M 28 May 02:33:33.532 * Background saving terminated with success
rake aborted!
Aws::S3::Errors::PermanentRedirect: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.
/var/www/discourse/vendor/bundle/ruby/2.3.0/gems/seed-fu-2.3.5/lib/seed-fu/runner.rb:46:in `eval'
/var/www/discourse/vendor/bundle/ruby/2.3.0/gems/aws-sdk-core-2.5.3/lib/aws-sdk-core/plugins/s3_sse_cpk.rb:19:in `call'
Whilst this error is far from perfectly helpful, I’m guessing that AWS has deprecated (and now removed) whatever style of endpoint URL you’ve configured, and now everything is awful. Check your S3-related site settings and compare them against the S3 documentation for correctness.
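To make the endpoint guess above concrete, here is a minimal sketch of a region check, not anything Discourse itself runs. It assumes the standard AWS partition, where regional endpoints look like s3.<region>.amazonaws.com, and it assumes you have the headers of some S3 response for the bucket, since S3 generally reports the bucket's real region in the x-amz-bucket-region header. The function names are hypothetical.

```python
from typing import Mapping, Optional


def regional_endpoint(region: str) -> str:
    """Hostname of the regional S3 endpoint (standard AWS partition only)."""
    return f"s3.{region}.amazonaws.com"


def region_mismatch(configured_region: str,
                    response_headers: Mapping[str, str]) -> Optional[str]:
    """Return a hint if the bucket lives in another region, else None.

    response_headers: headers from any S3 response for the bucket.
    Both function names are hypothetical helpers for illustration.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    headers = {k.lower(): v for k, v in response_headers.items()}
    actual = headers.get("x-amz-bucket-region")
    if actual is None or actual == configured_region:
        return None
    return (f"bucket is in {actual}, but {configured_region} is configured; "
            f"use endpoint {regional_endpoint(actual)}")
```

If the hint comes back non-empty, the region in the site settings most likely does not match where the bucket actually lives, which is the usual cause of a PermanentRedirect.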
I restored the entire image so our site is up, but I have no reason to think I won’t hit the same failure on the next one-click upgrade or ./launcher rebuild app.
I’m looking here:
I don’t see a smoking gun as to the problem. Any pointers would be appreciated.
OK, I’ve spent enough time troubleshooting this to be certain:
If the s3 backup settings are incorrect, and enable_s3_backups is true, then a normal rebuild will fail.
I manage the site for a friend who entered his s3 keys incorrectly from day 1 and I’ve never had a problem rebuilding his container before.
An update to discourse/discourse_docker that skips trying to talk to S3, or at least gracefully accepts failure, would mean that people aren’t blindsided by an S3 backup failure during a rebuild.
The backtrace indicates that the rebuild attempted to upload an avatar to S3, because it was told that was where avatars are stored for this site. Chances are everyone’s avatar uploads have been exploding but nobody noticed, so in a way it’s nice you’ve been told it’s broken now, so it can be fixed…
It’ll be rarer still once discourse warns of s3 failures.
The problem is that once failed, I was left without a way to get the system up & running again… I had to roll back to a system level snapshot from a few days ago, so I could get around the inability to rebuild.
I think that is a major problem, from a design standpoint. In my books, a rebuild should only fail if something is screwed up on my box (e.g. something broken, or outdated), or I can’t reach git to get the newest version. If my box isn’t broken, then rebuild shouldn’t break it. If rebuild notices that I’ve messed up my S3 config then it should say “I can proceed if you want, but your uploads and backups are/will be missing. Do you want to abort or proceed, Meatbag?”
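The behaviour I’m asking for could look roughly like the sketch below. This is not how discourse_docker works today. It is a hypothetical illustration in which every step is marked required or optional, and a failing optional step (such as talking to S3) produces a warning instead of aborting the rebuild. All names here are made up for the example.

```python
from typing import Callable, List, Tuple


class RebuildAborted(Exception):
    """Raised when a step the rebuild cannot do without has failed."""


# Each step is (name, action, required). The layout is hypothetical.
Step = Tuple[str, Callable[[], None], bool]


def run_steps(steps: List[Step]) -> List[str]:
    """Run steps in order; return warnings collected from optional failures."""
    warnings: List[str] = []
    for name, action, required in steps:
        try:
            action()
        except Exception as exc:
            if required:
                # Something on the box really is broken: stop here.
                raise RebuildAborted(f"{name} failed: {exc}") from exc
            # Optional step (e.g. an S3 check): note it and carry on.
            warnings.append(f"{name} failed, continuing: {exc}")
    return warnings
```

With that split, a bad S3 config would show up as a warning at the end of the rebuild, while a genuinely broken box would still stop it.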