Broken Images and Their S3 URLs

@schleifer Hey, can we please get guidance after this?

OK, that’s something we can work with. First, move all of the files in /default/* into /original/1X/*.

That’s known to us but not to the database. The next step will adjust the path for all the uploads in the DB. Before we change anything else, though, let’s sanity check.

Start a database console:

cd /var/discourse
./launcher enter app   # "app" is the default container name; substitute yours
rails db

Run this query to see which uploads fall outside the expected path:

select id,url from uploads where id > 0 and url not like '//PREFIX/original/%'

You’ll need to replace PREFIX with BUCKET + ‘.s3.dualstack.’ + REGION + ‘.amazonaws.com’, which would give something like //pesioforum.s3.dualstack.us-west-2.amazonaws.com/original/%.
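To make the substitution concrete, here is a minimal sketch of how that prefix is assembled. The helper names `s3_prefix` and `sanity_check_pattern` are hypothetical and only illustrate the string concatenation described above; they are not part of Discourse.

```python
def s3_prefix(bucket: str, region: str, dualstack: bool = True) -> str:
    """Build the protocol-relative host prefix as it appears in uploads.url."""
    stack = ".dualstack" if dualstack else ""
    return f"//{bucket}.s3{stack}.{region}.amazonaws.com"


def sanity_check_pattern(bucket: str, region: str) -> str:
    """LIKE pattern to paste into the sanity-check query in place of //PREFIX/original/%."""
    return s3_prefix(bucket, region) + "/original/%"


print(sanity_check_pattern("pesioforum", "us-west-2"))
```

Passing `dualstack=False` yields the IPv4-only form of the host, which is handy if some rows were written with that endpoint.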

That should output (0 rows). If not, then we have extra steps.

There are 7 uploads for your query and none of them are from S3.
So all S3 links point to original only, right? The 7 uploads are from before we started using s3 for uploads (Oct 2018).

Of the S3 links (2614),
2368 use //pesioforum.s3.dualstack.ap-south-1.amazonaws.com
246 use //pesioforum.s3.ap-south-1.amazonaws.com

Both links work, just mentioning it here as it might affect any regex we might use.
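For illustration, a pattern that accepts both host forms could look like the sketch below. It is written against Python's `re` module, not the database's regex engine, and the names `HOST_RE`, `is_bucket_url` and `uses_dualstack` are made up for this example.

```python
import re

# Matches the bucket host with or without the ".dualstack" label.
HOST_RE = re.compile(r"^//pesioforum\.s3\.(dualstack\.)?ap-south-1\.amazonaws\.com/")


def is_bucket_url(url: str) -> bool:
    """True if the URL points at the bucket through either endpoint."""
    return HOST_RE.match(url) is not None


def uses_dualstack(url: str) -> bool:
    """True only for URLs that go through the dualstack endpoint."""
    m = HOST_RE.match(url)
    return bool(m and m.group(1))


print(uses_dualstack("//pesioforum.s3.ap-south-1.amazonaws.com/original/1X/abc.png"))
```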

@schleifer Please help us finish this. :slight_smile:

OK, so you should be clear to move files from /default/ to /original/1X.

You can migrate those into S3 by running rake s3:upload_assets

The dualstack endpoint works for IPv6 and IPv4. The other is IPv4-only.

There’s a script in the image for remapping strings in the database. Before running it, ALWAYS TAKE A BACKUP via /admin/backups (Hamburger → Admin → Backups).

This should fix up the 246:

discourse remap '//pesioforum.s3.ap-south-1.amazonaws.com/original/' '//pesioforum.s3.dualstack.ap-south-1.amazonaws.com/original/'

After moving everything from /default/ into /original/1X/ we can remap those files in the DB. But before that, we should make sure everything in /original/2X is actually there.

Does this query return the same number of rows as the count of actual objects under that path in the bucket?
select url from uploads where url like '//pesioforum.s3.dualstack.ap-south-1.amazonaws.com/original/2X/%'
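Beyond comparing the two counts, it can help to see exactly which rows have no object behind them. The sketch below assumes you have exported the query result and a bucket listing into two Python lists; `missing_objects` is a hypothetical helper and the file names are made-up sample data, not real objects.

```python
def missing_objects(upload_urls, bucket_keys, host):
    """Return the upload URLs whose object key is absent from the bucket listing."""
    prefix = host + "/"
    present = set(bucket_keys)
    missing = []
    for url in upload_urls:
        if not url.startswith(prefix):
            continue  # not stored under this bucket host, so not our concern here
        if url[len(prefix):] not in present:
            missing.append(url)
    return missing


# Made-up sample data standing in for the query result and the bucket listing.
HOST = "//pesioforum.s3.dualstack.ap-south-1.amazonaws.com"
urls = [HOST + "/original/2X/a/aaaa.png", HOST + "/original/2X/b/bbbb.png"]
keys = ["original/2X/a/aaaa.png", "default/bbbb.png"]

print(missing_objects(urls, keys, HOST))
```

An empty result means every row under that host has a matching object; anything listed is a candidate for having landed under a different prefix.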

Hey @schleifer

You can migrate those into S3 by running rake s3:upload_assets

I ran this, and it uploaded assets (js, css, etc.) for the website. The 7 files were not uploaded.
I found rake uploads:migrate_to_s3 but wanted to confirm that it was the right task for this.

There’s a script in the image for remapping strings in the database

This worked well and there aren’t any old non-dualstack links in the uploads table anymore.

But before that, we should make sure everything in /original/2X is actually there.

Sadly, this isn’t the case. There are 521 files in the S3 bucket, but 2186 records in the uploads table.
I checked a few of the files that are missing from /original/2X/, and they are all in /default/.

Example: From the uploads table,
https://pesioforum.s3.dualstack.ap-south-1.amazonaws.com/original/2X/8/806a660beb158e9f06d07ffcd2370b389bbd250b.jpeg doesn’t exist, but the same file is in
https://pesioforum.s3.dualstack.ap-south-1.amazonaws.com/default/806a660beb158e9f06d07ffcd2370b389bbd250b.jpeg

At this point, as a one-time hack, we are okay with just moving all the files from /original/2X/{}/ into /original/1X/ and updating the posts, etc. with the new links.
New uploads are being placed properly into 2X anyway.

Aha, yes that was the one I actually intended. It should push those last seven up.

Indeed, that’s the best option at this point. Copy all the files out of their /2X/ sub-prefix and move everything into /1X/.

After everything is in place, here’s a command that should update all the database entries:

discourse remap --regex "//pesioforum\.s3\.dualstack\.ap-south-1\.amazonaws\.com/original/[1234]X/([0-9a-f]/){0,}" "//pesioforum.s3.dualstack.ap-south-1.amazonaws.com/original/1X/"

(Remember the previous warning about taking a backup.)
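Before touching the database, it may be worth previewing what a regex of this shape does to a few sample URLs. The sketch below uses Python's `re` module, which only approximates how the database evaluates the pattern, and it assumes the host is the dualstack one quoted earlier in the thread; `preview_remap` is a hypothetical helper.

```python
import re

HOST = "//pesioforum.s3.dualstack.ap-south-1.amazonaws.com"

# Same shape as the remap regex: the host, a tier (1X..4X), then any number
# of single-hex-character sub-prefixes such as "8/" or "6/b/".
PATTERN = re.compile(re.escape(HOST) + r"/original/[1234]X/([0-9a-f]/){0,}")
REPLACEMENT = HOST + "/original/1X/"


def preview_remap(url: str) -> str:
    """Show what a URL would look like after the regex remap."""
    return PATTERN.sub(REPLACEMENT, url)


print(preview_remap(HOST + "/original/2X/8/806a660beb158e9f06d07ffcd2370b389bbd250b.jpeg"))
```

URLs that are already flat under /original/1X/, and URLs on other hosts, come back unchanged.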

After that, some posts may need their HTML rebuilt via the wrench menu. If there are more than a few, you can rebuild everything with rake posts:rebake.

@schleifer that worked! With a modified regex and rebake of all posts, most of the images and uploads work well.
There are a few images outside of posts (e.g. logos in different themes) that still link to /optimized/, but we can fix those manually.

Thanks a lot for your help!

Hello, we have hit a similar problem in our environment, and we were hoping to get a hand resolving it too.

Our problem is similar to this one in many ways:

  1. We have the same value listed under s3 upload bucket and s3 backup bucket
  2. We hit this problem when we upgraded Discourse:
Old version: v2.3.0.beta3
New version: v2.5.0.beta6
  3. I exec’d into the Discourse container and queried the database:
SELECT id,url FROM uploads where id > 0 and url not like '//acme-forum.s3.dualstack.us-west-2.amazonaws.com/original/%';
 id |                                                url
----+----------------------------------------------------------------------------------------------------
  1 | /uploads/default/original/1X/eb17xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxc33.png
  2 | /uploads/default/original/1X/b87fxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxv21.png
 78 | //acme-forum.s3-us-west-2.amazonaws.com/original/1X/1205xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxv045.png
(3 rows)
  4. In the same container, I ran a second query:
select url from uploads where url like '//acme-forum.s3.dualstack.us-west-2.amazonaws.com/original/3X/%';
 //acme-forum.s3.dualstack.us-west-2.amazonaws.com/original/3X/6/2/6267xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxf607c.jpeg
(7953 rows)
  5. I checked how many items are in ./original/3X/; the answer is 251 items

Question:

  1. We’re using dualstack and I don’t want to remap our URLs to stop using it.
  2. Our folder structure looks different: we have paths like 3X/X/Y (e.g. 3X/7/a), so even if we move everything from default to 3X/*, it will still not map correctly.

My current thinking is to write a script that will reference the output from Step 4 to find out where to move the file back into the ./original/3X/X/Y folder.
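A rough sketch of the path computation such a script would need is below. It assumes, based only on the example URLs in this thread, that the sub-folders are the leading characters of the file name (two levels under 3X, one under 2X). `original_key` and its `tier` and `depth` parameters are hypothetical, and the file name in the usage line is made up.

```python
import posixpath


def original_key(filename: str, tier: str = "3X", depth: int = 2) -> str:
    """Guess the object key under original/ for a file currently in default/.

    The first `depth` characters of the file name become one sub-folder each,
    e.g. "6b6....png" -> original/3X/6/b/6b6....png when depth is 2.
    """
    name = posixpath.basename(filename)
    if len(name) < depth:
        raise ValueError(f"file name too short to shard: {name!r}")
    return "/".join(["original", tier, *name[:depth], name])


print(original_key("default/6b6f00a001.png"))
```

Checking the computed key against the URL stored in the uploads table for the same file name is a cheap way to validate the assumption before copying anything.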

The only problem is that when I did that, the file did not seem to be served from the dualstack endpoint. What I mean is, after I put the file back into original/3X/X/Y, I could see it at one URL but not the other:
Broken: https://acme-forum.s3.dualstack.us-west-2.amazonaws.com/original/3X/6/b/6b6xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxa001.png

Works: https://acme-forum.s3-us-west-2.amazonaws.com/original/3X/6/b/6b6xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxa001.png

Update: it turns out the dualstack endpoint was never broken like I thought. I made a mistake when I originally copied the image file into ./original/3X/6/b: I forgot to allow read access for everyone.

So my question is:
Would it be a viable option for me to put the image files from ./default back to ./original/3X/x/y and then not modify the database at all?

Ok so I have an update.
It looks like I can predict where the ./original images are supposed to go but I don’t know how to fix the ./optimized/ images

In our forum, if you browse to one of our posts, it tries to display the ./optimized image.

Is there any way to know which is an optimized image?

My thinking is that optimized images end with something like _2_10x10.png. Would that be a fair assumption? If so, would it be a viable solution to use a script that copies anything with a suffix like _2_10x10.png over to optimized, and anything without one straight to the ./original folder?

example:

GET https://acme-forum.s3.dualstack.us-west-2.amazonaws.com/optimized/3X/c/c/ccaxxxxxxxxxxxxxxxxxxxxxxxxxxxx85_2_690x268.png
[HTTP/1.1 403 Forbidden 0ms]
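The file-name test described above could be sketched as follows. It takes the poster's assumption at face value, namely that optimized images end in `_<number>_<width>x<height>.<ext>`; that naming rule is not confirmed anywhere in this thread, and the helper names and sample file names are made up.

```python
import re

# Assumed suffix of an optimized image: _<number>_<width>x<height>.<ext>
OPTIMIZED_SUFFIX = re.compile(r"_\d+_\d+x\d+\.\w+$")


def is_optimized(filename: str) -> bool:
    """True if the file name carries the assumed optimized-image suffix."""
    return OPTIMIZED_SUFFIX.search(filename) is not None


def target_folder(filename: str) -> str:
    """Top-level folder the file would be copied to under this assumption."""
    return "optimized" if is_optimized(filename) else "original"


for name in ("cca0f185_2_690x268.png", "6b6f00a001.png"):
    print(name, "->", target_folder(name))
```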

thanks!

@41821 If the urls in the uploads table are correct and working, but the posts still try to load the optimized images, then clearing the optimized_images table and rebaking all posts should do it: discourse=> delete from optimized_images;

Thanks so much for the feedback. I actually ended up solving it (if you could call it that) by writing a script to move the images from the /default directory back to /optimized based on the file name. This seems to have worked and I don’t have any problems any more.

If this happens again in the future though, I’ll do what you suggested and trash everything from optimized_images and rebake.

thanks!
