I’m trying to migrate an existing Discourse server to a new AWS-based environment, storing uploads in S3 in a bucket using server-side encryption with customer-managed keys (SSE-C). During the restore process, uploads aren’t making it into S3 – every single upload fails. With the judicious use of tap|p
debugging, I’ve discovered that the upload is being made, but the validation of the returned etag is failing because the etag returned from S3 after upload is different every time. For example, this is the return value of the put_object
call on two different attempts to restore:
# Run 1
#<struct Aws::S3::Types::PutObjectOutput expiration=nil, etag="\"d49ec2006cfd6fe957af2f711edd9a4b\"", checksum_crc32=nil, checksum_crc32c=nil, checksum_sha1=nil, checksum_sha256=nil, server_side_encryption="aws:kms", version_id="xAF23wQ.zwpoxVmmGiTjxfX0svMZbHAe", sse_customer_algorithm=nil, sse_customer_key_md5=nil, ssekms_key_id="**redacted**", ssekms_encryption_context=nil, bucket_key_enabled=true, request_charged=nil>
# Run 2:
#<struct Aws::S3::Types::PutObjectOutput expiration=nil, etag="\"05edffee421c6aef950b3d4418ada293\"", checksum_crc32=nil, checksum_crc32c=nil, checksum_sha1=nil, checksum_sha256=nil, server_side_encryption="aws:kms", version_id="H2_8SVh.Yx2LKB4GIjhyPbVoj_.Vc1E2", sse_customer_algorithm=nil, sse_customer_key_md5=nil, ssekms_key_id="**redacted**", ssekms_encryption_context=nil, bucket_key_enabled=true, request_charged=nil>
(I know these are the same files because I’m also printing out the client-side MD5 sum and the put_object
request options, which include the filename)
It turns out that the ETag response header behaves… differently when using SSE-C:
Objects encrypted by server-side encryption with customer-provided keys (SSE-C) or AWS Key Management Service (AWS KMS) keys (SSE-KMS) have ETags that are not an MD5 digest of their object data.
The only way I can find to do integrity verification of uploads with SSE-C is to send the Content-MD5
request header, and let S3 do the corruption detection. Note also that the already-uploaded check will also break when using SSE-C, but at least that can be disabled using SKIP_ETAG_VERIFY
.
I’m not submitting a PR straight off because there’s two different ways to approach this:
- just extend the
SKIP_ETAG_VERIFY
to encompass post-upload verification, which is cheap and hacky, and requires users to know that their use of SSE-C means they’ll have to turn that on; or - Switch to using the Content-MD5 header (preferably always) to do upload integrity protection, which has the benefit of working for everyone, at the cost of a far larger PR.
(As an aside, I’m rather disturbed that nobody has hit this before – is nobody using Discourse with SSE-C for uploads?!?)