Troubles with uploads after DO Spaces datacenter change

Dear all,
after having searched the form at the best of my abilities without finding a solving answer, I’m asking support for an odd situation raised after a recent change of Digital Ocean Datacenter.
So, we had all our uploads stored on a Digiatal Ocean Spaces Bucket, in the ams3 datacenter.
After 2 huge HW issues and consequent service disruption in a little more than month, last weekend we decided to move all our files to fra1 datacenter.

Here are the steps I followed:

  1. In preparation for the transfer I uploaded all the files we had on ams3 (the 3 classic dir originals, optimized and tombstone) to the new bucket on fra1 using s3cmd.
  2. I went on the forum settings and set the new endpoint for attachments, cdnl and backup bucket.
  3. I launched a full post rebake, expecting it to fix all things in one go.

Unfotunately this was not the case. Most of the attachments were “ported” correctly, but a few hundreds were not. It’s not clear to me what happened, but these missing attachments were moved in the tombstone directory.

I thought that launching the rake task rake uploads:recover_from_tombstone would have taken care of that, but nope. The files are seen, but at the end of the task no attachments are recovered, images are still not visible in posts.

I started to dig a bit deeper and I found out that by running UploadRecovery.new(dry_run: true).recover (found digging in meta) in the rails console was giving me precious information, such as the post URL as well as the short or logn URL of the problematic image.

For the URLs returned in the short form, so I wrote bit of python code to “translate” the short upload filename into the long form, so that I could go and check for the presence of the file in the bucket.
I did, and I can confirm all the missing files are there, in the new bucket as well as in the old. Part of the missing uploads I found sitting in the tombstone directory, as expected, but some others are oddly still in the original directory. The files are not corrupted. If I access them from url they open correctly in both datacenters, and if I dump them locally on my linuxbox I can open them with no errors.

Somehow the upload recovery process fails to pick them up and fix whatever is messed up in the DB. :man_shrugging:

So my questions are:

  • is there a way to understand why even if the uploads files are in tombstone (or in original), the rake task is failing to recover them?
  • what would be the correct set of steps to ensure that in case of bucket change or even transition from DO to another aws compatible environment, all attachments are moved and prepared correctly for the swap? More in general, what should one do, step by step, in such case? Clearly a simpe rebake is not enough. :confused:
  • what does the task posts:invalidate_broken_images do? I mean, what does invalidate mean?

Thanks in advance, I am struggling with this since a week and I really need to put this to rest or I will get crazy :smiley: :stuck_out_tongue:
FYI suggestion to re-load all the 800+ attachments by hand is not considered a valid answer. There must be an algorithmic reason… :laughing:

2 Likes

I think that you missed a DbHelper.remap('oldbucketurl', 'newbucketurl') between steps 2 and 3.

4 Likes

Hi @falco, thanks for your response.
Yes, initially I did forget.
I ran that after I found it digging here on meta. :confused: and it helped to recover some of the files.
I did a full rebake, by the way, after running it.

What else could I try?

1 Like

So, I might have a clue on what is going on here.
I’ve not thought to mentioned a fact related to the rake uploads: recover_from_tombstone rake task output, that might point to some interesting hint.

It seems that the task is actually finding the uploads files in tombstone, but throws me a warning about something (it is the upload full filename) being incorrect. Like this:

Warning /t/i-miei-modellini-volanti/28272/212 had an incorrect 487b613752a0c338646fecc942512e5de9afeb3f should be c87c4f08d1a9aac3f43d19722cfd5a94f2544272 storing in custom field 'rake uploads:fix_relative_upload_links' can fix this

Running a find command on my local copy of the uploads directories, it turns out I do have a file called 487b613752a0c338646fecc942512e5de9afeb3f.jpeg.

The shortlink belonging to this specific upload is upload://alcIv6jVlmjiEOEBh8fNDJyRms7.jpeg, and applying the base62 algorithm that calculates the full filename corresponding to it, it turns out that the value is 487b613752a0c338646fecc942512e5de9afeb3f, precisely the filename the recover_from_tombstone rake task warns me to be wrong. :thinking:

Why is the tool claiming it’s wrong, and should be c87c4f08d1a9aac3f43d19722cfd5a94f2544272 instead?

Just in any case I run the rake uploads:fix_relative_upload_links task several time, and then re-run rake uploads: recover_from_tombstone but nothing seems to change.

Edit:
Searching for 487b613752a0c338646fecc942512e5de9afeb3f in a database backup I made before changing bucket, I can see that the record in the uploads table belonging to this image was showing exactly this hex filename, so even more I cannot understand why the rake task complains about it.

This is one of the oldest misunderstandings on Meta.
You do not need to rebake after a well-targeted remap.

2 Likes

You might be right, but the thing is, it’s difficult to know exactly what to do and not to do in this cases without a tutorial/guide from the devs.
One has always the feeling that should have done something more or in a different order, sort of finding out a recipe that works distilling it out of tens of posts written across the last 3/4 years. :stuck_out_tongue:
Rebaking seems to be the panacea for many things and harmless for existing posts.

It’s a complicated way to say that given how many times people were kinda stumbling on issues with uploads management and such, a nice official guide from the Staff would be an important reference. :wink:

1 Like

Sorry, neet to reemerge this one.
During last week I spent some time reading the uploads rake tasks code trying to understand what goes on under the hood of the recover_from_tombstone and recover one.
It’s a difficult thing because of the encapsulation of the classes, so I’d say I mostly failed.

What I understood though (please @Falco correct me if I am wrong) is that the file name on disk of an upload is created combining its sha1 and its original extension. Then it’s stored on disk/aws in a directory whose path depends on the first and sometimes second letter in its name, within 1X or 2X or 3X… (how are these determined I do not understand).
Finally the sha1 and file name are stored, among other things, in records of the uploads table in PostgreSQL.

Going back to what happened during our change of Digital Ocean datacenter, this is what happened at the best of my understanding:

  1. we copied all the files from ams3 to fra1
  2. we failed to perform DbHelper.remap('oldbucketurl', 'newbucketurl') as suggested by @falco but it was not clear to us we had to in this case
  3. we launched a global rebake. At this stage thousands of images “broke” and many were moved to tombstone. It’s not completely clear to me why.
  4. I realised something was wrong, interrupted the on-going rebake and found about the remap command searching here in meta. We launched the DbHelper.remap('oldbucketurl', 'newbucketurl') task
  5. In order to recover the images that were moved to tombstone at step 3 we launched a rake uploads:recover_from_tombstone which recovered some, but left hundreds of others unrecovered, and showed errors about the sha1 of the files such as Warning /t/eclisse-parziale-di-sole-04-01-2011/14456/50 had an incorrect 3f5a1c136b97aebac4a188432c8e3ab7487f3bca should be ec88ee9eea18f3b8424bfef796345c68582911b5 storing in custom field 'rake uploads:fix_relative_upload_links' can fix this as if the file was somehow changed and hence the sha1 is now different. The recovery of such files fails.

We never changed the files while moving them between the two datacentres. Using s3cmd we were literally dumping them locally from the old bucket and immediately re-uploading them in the new one.

Why should the sha1 calculated by Discourse be different at all?

Would it be possible to just force the recover task to ignore the sha1 discrepancy and just adapt import in DB what is there or rename the existing files with the new sha1 while recovering them?

Am I missing something obvious? Thanks all for your help.

So, just to give this thread a closure that might be useful to somebody else, this is how we solved the situation.

Essentially, since it was impossible to recover the missing attachments through the various uploads recovery rake tasks, I have put together a Ruby script (apologies in advance, I am definitively NOT a Ryby or Rails developer, so I bet the code is inefficient and ugly, but that’s beside the point :stuck_out_tongue: ) that:

  1. Finds all posts containing the string upload://
  2. Extracts the shortlink of each upload and transforms it in its long form sha1 hash
  3. Queries the Uploads table
  4. If an attachment having the sha1 hash is found in Uploads, that upload is skipped, else the URL of that upload is checked in the old Digital Ocean bucket/spaces.
  5. If the uplink is found in the old bucket/spaces then the shortlink is replaced with the URL to the same upload in the old bucket.
  6. If modified, trigger a rebake of the original post, to let Discourse do the heavy lifting of re-downloading locally the “lost” upload and re-create all it needs in the DB.

To avoid blacklisting and to reduce the load on the server, an interval of 20 seconds is introduced every time a rebake is requested.


def remoteFileExist(url, retries=3)
    puts "Requesting #{url} ..."
    uri = URI(url)
    response = nil
    res = Net::HTTP.get_response(uri)
    puts res['content-type']
    if res.code[0,1] == "2" and res['content-type'].include? 'image'
        return true
    else
        return false
    end
    rescue Net::ReadTimeout => e
        puts "TRY #{retries}/n ERROR: timed out while trying to connect #{e}"
        if retries <= 1
            raise
        end
        remoteFileExist(url, retries - 1)
    end
end


####################################################################


posts=Post.where("raw like '%upload://%' " ).order('topic_id ASC, post_number DESC');
idx = 0;
posts.each do |p|
    idx = idx + 1;
    puts ""

    matches = p.raw.scan(/(!\[(.)*\]\(upload:\/\/([a-zA-Z0-9]+)\.(jpeg|jpg|png|gif|pdf|mp3|mp4|mov)\))/)

    new_raw = p.raw

    matches.each do |m|  
        short_url = m[0];
        short_sha = m[2];
        ext = m[3];
        long_sha = Base62.decode(short_sha).to_s(16).rjust(40,"0")

        upload = Upload.where('sha1 = ?', long_sha)

        puts "#{short_url} -> #{long_sha}\n"

        if upload.all.count == 0
            puts "#{long_sha} not found in DB. Recovering from ams3...\n"

            subdir1 = long_sha[0]
            subdir2 = long_sha[1]

            new_url1 = "https://discourse-data.ams3.digitaloceanspaces.com/original/3X/#{subdir1}/#{subdir2}/#{long_sha}.#{ext}"
            test1 = remoteFileExist(new_url1)
            if test1
                new_raw = new_raw.gsub(short_url, "\n#{new_url1}")
            else
                new_url2 = "https://discourse-data.ams3.digitaloceanspaces.com/original/2X/#{subdir1}/#{long_sha}.#{ext}"
                if remoteFileExist(new_url2)
                    new_raw = new_raw.gsub(short_url, "\n#{new_url2}")
                end
            end
            puts ""
            sleep 5
        end
    end

    if p.raw != new_raw
        puts "OLD\n"
        puts p.raw
        puts "-----------"
        puts "NEW\n"
        puts new_raw
        puts "-----------"
        puts "UPDATING!"
        # goahead = gets
        p.raw = new_raw
        p.cooked = ''
        p.save
        p.rebake!(invalidate_broken_images: true);
        puts "*******************************************"
        sleep 30
    else
        puts "SKIP!"
        puts "*******************************************"
        sleep 1
    end
end

1 Like