UpdateGravatar job when the gravatar hasn't changed but the upload file is inaccessible

I’ve been racking my brain trying to work out why I’ve got lots of “generic head” avatars while the UpdateGravatar job was not updating the UserAvatar, yet not logging ANYTHING… I managed to track it down to:

        if tempfile
          upload = UploadCreator.new(tempfile, 'gravatar.png', origin: gravatar_url, type: "avatar").create_for(user_id)

          if gravatar_upload_id != upload.id
            gravatar_upload&.destroy!
            self.gravatar_upload = upload
            save!
          end
        end

UploadCreator returns the existing upload record when the gravatar hasn’t changed, so the id comparison matches and we just abort.
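For context, the dedup is sha1-based; a simplified sketch of that behaviour (illustrative only, not the actual Discourse source):

    require "digest"

    # Simplified sketch (not the real UploadCreator). The re-downloaded gravatar
    # hashes to the same sha1 as the record already in the database, so the
    # existing Upload is returned untouched; nothing checks that its url still
    # points at a readable file in the current store.
    def find_or_create_upload(tempfile)
      sha1 = Digest::SHA1.file(tempfile.path).hexdigest
      existing = Upload.find_by(sha1: sha1)
      return existing if existing # early return: a stale local url survives here
      # ... otherwise store the file and create a fresh Upload record ...
    end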

BUT:

[46] pry(main)> Upload.find(3)
=> #<Upload:0x0000561fc5b5b8d8
 id: 3,
 user_id: 1,
 original_filename: "gravatar.png",
 filesize: 166131,
 width: 360,
 height: 360,
 url: "/uploads/default/original/1X/0430db64b0848c22047072b62e52bf733b105010.png",
 created_at: Wed, 01 Aug 2018 19:06:17 UTC +00:00,
 updated_at: Wed, 01 Aug 2018 19:06:17 UTC +00:00,
 sha1: "0430db64b0848c22047072b62e52bf733b105010",
 origin: "https://www.gravatar.com/avatar/e8b7922236865bfc10fb22eb13f9c394.png?s=360&d=404",
 retain_hours: nil,

This looks all well and good until you realize I’m working in the :cloud: but the upload record is pointing at the local filesystem instead of the :cloud:. So the whole time, Discourse can’t read the file, yet the update job won’t put a new one into place.

As for how it got into this state in the first place, that’s something else I’m investigating.

Perhaps nothing needs to change in the UpdateGravatar job itself; perhaps something should be done to recover from this state when retrieving the user avatar (e.g. invalidating the upload)… or maybe this is an UploadCreator bug and it should verify that the existing upload is accessible rather than assuming it is.

I bet that would fix… Other things :slight_smile:

I think a big pile of pain here can be removed once we run the migrate-local-to-the-cloud job.

I am pretty sure @tgxworld rewrote a lot of the process to avoid this particular failure case. Can we close this @tgxworld ?

By :cloud:, do you mean that you have S3 uploads enabled yet the upload is still being stored on the local file system? If that is the case, S3 uploads might not have been configured correctly, since the URL is generated based on the store Discourse is currently configured with.

In this particular case it was Azure, but yeah same idea.

I took a backup from a metal site, restored it to an Azure-enabled site, then had to run the uploads:migrate rake task.

I think I was running into this before running the rake task, but I can easily recreate the scenario.

The right answer is probably “UploadCreator should just re-store the file if it’s being uploaded again but can’t be accessed.”

I think I know what is happening here. When you upload the same upload again, the sha1 is the same so UploadCreator will just return the existing upload record that matches the sha1. The challenge here is how we’re going to determine that the store has changed so that we can decide if we need to re-store the file.
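One rough way to approximate “the store has changed”, purely as an illustration (the local-prefix check and the config flag are assumptions, not existing Discourse API):

    # Illustrative only: guess whether an existing upload's url belongs to a
    # different store than the one currently in use. `external_store_enabled`
    # and `local_prefix` would come from site configuration.
    def stored_elsewhere?(upload, external_store_enabled:, local_prefix: "/uploads/")
      looks_local = upload.url.start_with?(local_prefix)
      external_store_enabled ? looks_local : !looks_local
    end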

@sam Do we want this to go into the 2.1 release or is after OK?

Yep, exactly.

I don’t think we need to determine that; we could just try accessing the existing path. If we can’t access the file, re-store it under the current store’s path.
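Something like this minimal sketch, assuming the standard public-directory layout for local uploads (`re_store!` is a placeholder, not a real Discourse method):

    # Sketch of the "just try to read it" approach for locally stored uploads.
    def ensure_readable!(upload, public_dir: Rails.public_path.to_s)
      return upload unless upload.url.start_with?("/uploads/") # external urls need a different check
      path = File.join(public_dir, upload.url)
      return upload if File.exist?(path)
      re_store!(upload) # placeholder: push the freshly downloaded file into the current store
    end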

I think the correct thing to do here is to make sure we have a proper rake task to fix this stuff. If you are using S3 and somehow hit a situation where some uploads are both local and missing, we should have a task to clean up the mess. The simplest thing to do here would be a task for “clean up local” which nukes all local uploads that are missing.
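A minimal sketch of what such a task might look like (the task name, the DRY_RUN flag, and the public-path layout are all assumptions, not an existing task):

    # Sketch only: destroy upload records whose url points at a local file
    # that no longer exists. Run with DRY_RUN=1 to preview.
    desc "Destroy upload records whose local file is missing"
    task "uploads:destroy_missing_local" => :environment do
      dry_run = ENV["DRY_RUN"] == "1"

      Upload.where("url LIKE '/uploads/%'").find_each do |upload|
        path = File.join(Rails.public_path.to_s, upload.url)
        next if File.exist?(path)

        puts "#{dry_run ? 'would destroy' : 'destroying'} upload #{upload.id} (#{upload.url})"
        upload.destroy! unless dry_run
      end
    end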

Any rake tasks we want are totally fine for 2.1

We already have the rake task to migrate local uploads to S3. The problem @supermathie is describing is that he feels UploadCreator should re-upload the file to the store if it detects that the current Upload#url is no longer valid.

Yeah BUT … we do not have a rake task for bin/rake nuke_bad_local_uploads_from_orbit (and a dry run for that as well) … this is what I want us to run here on sites exhibiting this issue.

We also need to fix the migrate-to-S3 job so it deals more cleanly with the “oh no, this file is missing from local” case.

I don’t think it is a bad local file in this case. On the :cloud: the uploads aren’t shared across instances, so depending on which instance the update gravatar job runs on, the result of trying to read a local file is different.

Oh, they absolutely are: the gravatars are all stored on S3. Once you shift to S3, everything must be on S3, always, under all conditions.

Technically this is an anti-pattern:

When on S3 we should always be 1000% on S3… not S3 for some files and local disk for others.

The Gravatar job uses UploadCreator, which stores stuff on S3.

That is not what @supermathie is describing/requesting. Based on what I’m reading here, the feature being requested is that UploadCreator should re-store the file if the store for the upload has changed, regardless of whether all uploads have already been migrated properly to the new store.

Yeah, I am pushing back here for now. I would prefer to first have a rake task that “cleans up any giant mess” for 2.1.

For 2.2 we can refactor; perhaps we add a storage-engine column to Upload and OptimizedImage and then build some extra resiliency into these processes, but it is a far bigger change.
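A first step for that refactor would presumably be a migration along these lines (purely illustrative, not a committed design):

    # Illustrative only; not a committed schema change.
    class AddStorageEngineToUploads < ActiveRecord::Migration[5.2]
      def change
        add_column :uploads, :storage_engine, :string
        add_column :optimized_images, :storage_engine, :string
      end
    end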

For now, all I want is a big-mess-cleanup rake task.

There is an open question around when/how we validate that our S3 pointers are still good and not 404ing. But that is also a hard question to answer.

There are other things we could potentially do:

  • when we try to access an upload but fail (e.g. the avatar), we could mark it as known-bad
  • (by the way, when we send back the “default” avatar we should probably set cache headers to only cache for, say, 1 hour; we should do that)

This all started with me taking a metal backup and restoring it onto an Azure site, so let me do that again, do a proper uploads:migrate, and see how it looks after that.

Never mind… we already do the right thing (of course we do! we know what we’re doing!). Should have checked.

a “default image” result looks like:

HTTP/1.1 200 OK
Cache-Control: max-age=600, public
Content-Length: 2608
Content-Type: image/png
Last-Modified: Fri, 31 Aug 2018 03:48:45 GMT (current time)

vs.

a good result:

HTTP/1.1 200 OK
Cache-Control: max-age=31556952, public, immutable
Content-Length: 1478
Content-Type: image/png
Last-Modified: Fri, 31 Aug 2018 03:30:06 GMT

A post was split to a new topic: Do not set current time as ‘Last Modified’ when “not found” avatar is used

@supermathie I’m going to close this for now. I don’t think it is the application’s job to determine if the store has changed. If the site has been moved to an S3 store, there shouldn’t be any upload records that contain a URL that belongs to a different store.
