Can't rebuild due to AWS SDK gem bump and new AWS Data Integrity Protections

The AWS SDK maintainers broke compatibility. It is up to your S3 clone provider to get up to speed and implement better compatibility so you can remove the workarounds.

Just to clarify, this only affects JS/map/CSS files in assets, and won’t affect uploads, right?
I mean, will it impact clearing orphaned files?

By the way, I assume this issue might affect updating Discourse from the admin panel?
Actually, updating via the admin panel failed for me, which is why I performed a rebuild.

Yes, it is only for assets.

This rake task? No.

But the whole AWS SDK change may have also broken that for people on non-compatible clones.

I’m pretty sure that’s wrong. So we probably need to turn off clean up uploads too. That’ll just cause errors when you’re running, though. It won’t cause you to not be able to rebuild.

Sounds likely. Maybe we need some “skip_s3_delete” setting to solve this until the other providers catch up? And maybe automatically set it for providers we know are broken?

Is that rake task the only way that expired assets get removed?

I’m just wondering, why not add an option to keep the assets on the Discourse core server (I mean, without storing them on S3)?

As long as it doesn’t affect uploads or the process of clearing orphaned files, it seems like a viable solution.

Yes. It is not like this is a big deal. Normal sites updating every once in a while won’t see much difference.

People can set up their own lifecycle rules if they care about this stuff.

Isn’t setting clean up uploads = false already that?

Lol no. Discourse officially supports S3. While I went out of my way starting the whole “Configure an S3 compatible object storage provider for uploads” wiki and adding a few toggles to increase clone compatibility, we have zero plans to invest more time in that today.

If the community wants to send a couple of PRs that increase compatibility and are off by default, that is pr-welcome, but don’t expect to see official support in core for every clone any time soon.

FWIW, it looks like Digital Ocean has no trouble deleting backups or expiring missing assets.

For providers that are broken, it’d take a long time before unneeded assets caused much of a problem. Keeping a whole bunch of backups, including a huge database and all uploads, could be quite a problem if you’re paying for storage.

Hi - I’m Pat Patterson, Chief Technical Evangelist at Backblaze; I arrived on this thread because I have a self-hosted proof-of-concept Discourse forum, and I happened to bump into this exact issue today while configuring my forum to use Backblaze B2 for backups and uploads.

Setting AWS_REQUEST_CHECKSUM_CALCULATION and AWS_RESPONSE_CHECKSUM_VALIDATION to WHEN_REQUIRED is a helpful workaround for basic cases of uploading and downloading files, but it doesn’t cover a number of scenarios, including:

  • Deleting files - Discourse is using the DeleteObjects S3 operation to delete multiple files in a single API call, as it should.
  • Uploading files to buckets with object lock enabled.
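As a rough illustration of the workaround itself, here is a minimal Ruby sketch that reads and validates the two settings from the environment. The helper names `checksum_modes` and `normalize_mode` are hypothetical, and the `when_supported` fallback is an assumption about the SDK's default after the change. Note that the AWS SDK documentation spells the response-side variable AWS_RESPONSE_CHECKSUM_VALIDATION.

```ruby
# Hypothetical helpers, not part of Discourse or the AWS SDK.
VALID_MODES = %w[when_supported when_required].freeze

# Lowercase the value and fall back to the assumed SDK default when unset.
def normalize_mode(value)
  mode = (value || "when_supported").downcase
  raise ArgumentError, "unknown checksum mode: #{value}" unless VALID_MODES.include?(mode)
  mode
end

# Accepts ENV or any Hash-like object, which keeps the sketch easy to try out.
def checksum_modes(env = ENV)
  {
    request_checksum_calculation: normalize_mode(env["AWS_REQUEST_CHECKSUM_CALCULATION"]),
    response_checksum_validation: normalize_mode(env["AWS_RESPONSE_CHECKSUM_VALIDATION"]),
  }
end

p checksum_modes(
  "AWS_REQUEST_CHECKSUM_CALCULATION" => "WHEN_REQUIRED",
  "AWS_RESPONSE_CHECKSUM_VALIDATION" => "WHEN_REQUIRED"
)
```

Even with both values set to `when_required`, operations that require a checksum, such as the two listed above, would still get one.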

The problem is that a checksum (either the Content-MD5 header or one of the new checksum headers) is required (rather than just supported) for these operations, and this causes the current AWS SDKs to provide the new checksum header. As far as I know, there is no way to override this and have the SDK provide Content-MD5 as it used to.
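To make the difference between the two header styles concrete, here is a minimal sketch using only the Ruby standard library. It computes a Content-MD5 value and an `x-amz-checksum-crc32` value for the same body. CRC32 is used purely as an example algorithm (which one a given SDK version picks is an assumption here), and the function names are hypothetical. A real request would carry only one of the two headers.

```ruby
require "digest/md5"
require "zlib"

# Older style: Content-MD5 is the Base64 encoding of the raw 16-byte MD5 digest.
def content_md5(body)
  Digest::MD5.base64digest(body)
end

# Newer style: the CRC32 value packed as 4 big-endian bytes, then Base64-encoded
# without line breaks ("m0" is strict Base64 in Array#pack).
def checksum_crc32(body)
  [[Zlib.crc32(body)].pack("N")].pack("m0")
end

# Shows both headers side by side for comparison only.
def checksum_headers(body)
  {
    "Content-MD5" => content_md5(body),
    "x-amz-checksum-crc32" => checksum_crc32(body),
  }
end

p checksum_headers("123456789")
```

A provider that only understands Content-MD5 has nothing to validate against when it receives just the second header, which is consistent with the failures described above.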

Our engineers are working on resolving all this; in the meantime, the best mitigation is to use version 1.177.0 or earlier of the aws-sdk-s3 gem.

I did try to downgrade the AWS SDK gem versions in my PoC deployment by editing the Gemfile and replacing

gem "aws-sdk-s3", require: false
gem "aws-sdk-sns", require: false

with

gem "aws-sdk-core", "~> 3.215.1", require: false
gem "aws-sdk-kms", "~> 1.96.0", require: false
gem "aws-sdk-s3", "~> 1.177.0", require: false
gem "aws-sdk-sns", "~> 1.92.0", require: false

but my bundle-fu is not strong, and I only succeeded in breaking my deployment with the error:

/var/www/discourse/config/initializers/100-sidekiq.rb:69:in `<main>': undefined method `logger=' for module Sidekiq (NoMethodError)

  Sidekiq.logger = Logger.new(nil)
         ^^^^^^^^^
	from /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/railties-7.2.2.1/lib/rails/engine.rb:689:in `load'
	from /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/railties-7.2.2.1/lib/rails/engine.rb:689:in `block in load_config_initializer'
...

I guess I missed some vital step.
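As a side note on the version constraints used in that Gemfile edit, here is a small sketch of what a pessimistic constraint like `~> 1.177.0` actually admits, using RubyGems' own `Gem::Requirement`. The `allowed?` helper is hypothetical, and this says nothing about the cause of the Sidekiq error above.

```ruby
require "rubygems"

# "~> 1.177.0" means >= 1.177.0 and < 1.178.0.
PIN = Gem::Requirement.new("~> 1.177.0")

# Hypothetical helper: does the given version string satisfy the requirement?
def allowed?(version, requirement = PIN)
  requirement.satisfied_by?(Gem::Version.new(version))
end

%w[1.176.0 1.177.0 1.177.1 1.178.0].each do |v|
  puts "#{v}: #{allowed?(v)}"
end
```

So the pin accepts 1.177.x patch releases but rejects both 1.176.0 and 1.178.0. A constraint such as `"<= 1.177.0"` would be the literal way to express "1.177.0 or earlier".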

Without wishing to cast shade at our friends at DO, they did this by updating their service to simply ignore the new checksum headers rather than rejecting the API call due to the unsupported checksum.

Their incident report says:

Note that Spaces does not currently verify data integrity checksums sent by the AWS CLI and AWS SDKs as part of upload requests

We decided that simply accepting and storing data that may not match the checksum that the API client supplied was a bad thing.

Thanks for posting!

Yeah, and AWS SDK maintainers are only giving us the cold shoulder on this