Broken Images and Their S3 URLs

@schleifer Hey, can we please get guidance after this?

OK, that’s something we can work with. First, move all of the files in /default/* into /original/1X/*.

That’s known to us but not to the database. The next step will adjust the path for all the uploads in the DB. Before we change anything else, though, let’s sanity check.

Start a database console:

cd /var/discourse
./launcher enter
rails db

Run this query and take a look at the results:

select id,url from uploads where id > 0 and url not like '//PREFIX/original/%'

You’ll need to replace PREFIX with BUCKET + '.s3.dualstack.' + REGION + '.amazonaws.com', which would give something like //pesioforum.s3.dualstack.us-west-2.amazonaws.com/original/%.
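
As a rough illustration of how PREFIX is assembled, here is a small Python sketch. The helper name `s3_url_prefix` is made up for this example and is not part of Discourse; the bucket and region are the sample values from the post above.

```python
def s3_url_prefix(bucket: str, region: str, dualstack: bool = True) -> str:
    """Build the protocol-relative host prefix as it appears in uploads.url."""
    # Dualstack hosts carry an extra ".dualstack" label after "s3".
    stack = ".dualstack" if dualstack else ""
    return f"//{bucket}.s3{stack}.{region}.amazonaws.com"


prefix = s3_url_prefix("pesioforum", "us-west-2")
# LIKE pattern to paste into the sanity-check query in place of //PREFIX/original/%
like_pattern = f"{prefix}/original/%"
print(like_pattern)
```

Swapping in your own bucket and region gives the exact string to use in the query.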

That should output (0 rows). If not, then we have extra steps.

There are 7 uploads for your query and none of them are from S3.
So all S3 links point to original only, right? The 7 uploads are from before we started using s3 for uploads (Oct 2018).

Of the S3 links (2614),
2368 use //pesioforum.s3.dualstack.ap-south-1.amazonaws.com
246 use //pesioforum.s3.ap-south-1.amazonaws.com

Both links work, just mentioning it here as it might affect any regex we might use.

@schleifer Please help us finish this. :slight_smile:

OK, so you should be clear to move files from /default/ to /original/1X.

You can migrate those into S3 by running rake s3:upload_assets

The dualstack endpoint works for IPv6 and IPv4. The other is IPv4-only.

There’s a script in the image for remapping strings in the database. Before running it, ALWAYS TAKE A BACKUP via /admin/backups (Hamburger → Admin → Backups).

This should fix up the 246:

discourse remap '//pesioforum.s3.ap-south-1.amazonaws.com/original/' '//pesioforum.s3.dualstack.ap-south-1.amazonaws.com/original/'
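
To make the effect of that remap concrete, here is a minimal Python sketch of the same substitution applied to individual values. It only mimics a plain string replacement on sample strings; the `remap` function and the file name `1X/example.png` are hypothetical, and this is not how the discourse script itself is implemented.

```python
OLD = "//pesioforum.s3.ap-south-1.amazonaws.com/original/"
NEW = "//pesioforum.s3.dualstack.ap-south-1.amazonaws.com/original/"


def remap(value: str, old: str = OLD, new: str = NEW) -> str:
    """Replace every occurrence of the old prefix with the new one."""
    return value.replace(old, new)


# Hypothetical sample URLs: one IPv4-only link and one that is already dualstack.
ipv4_only = OLD + "1X/example.png"
already_dualstack = NEW + "1X/example.png"

print(remap(ipv4_only))          # rewritten to the dualstack host
print(remap(already_dualstack))  # left untouched, since OLD does not occur in it
```

Because the old prefix is not a substring of the new one, running the replacement twice does no further harm.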

After moving everything from /default/ into /original/1X/ we can remap those files in the DB. But before that, we should make sure everything in /original/2X is actually there.

Does this query return the same number of rows as the count of actual objects under that path in the bucket?
select url from uploads where url like '//pesioforum.s3.dualstack.ap-south-1.amazonaws.com/original/2X/%'

Hey @schleifer

You can migrate those into S3 by running rake s3:upload_assets

I ran this, and it uploaded assets (js, css, etc.) for the website. The 7 files were not uploaded.
I found rake uploads:migrate_to_s3 but wanted to confirm that it was the right task for this.

There’s a script in the image for remapping strings in the database

This worked well and there aren’t any old non-dualstack links in the uploads table anymore.

But before that, we should make sure everything in /original/2X is actually there.

Sadly, this isn’t the case. There are 521 files in the S3 bucket, but 2186 records in the uploads table.
I checked a few of the files that are missing from /original/2X/, and they are all in /default/.

Example: From the uploads table,
https://pesioforum.s3.dualstack.ap-south-1.amazonaws.com/original/2X/8/806a660beb158e9f06d07ffcd2370b389bbd250b.jpeg doesn’t exist, but the same file is in
https://pesioforum.s3.dualstack.ap-south-1.amazonaws.com/default/806a660beb158e9f06d07ffcd2370b389bbd250b.jpeg
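
A check like this can be scripted rather than done file by file. The sketch below is a hedged illustration in Python: it assumes you have already exported the URLs from the uploads table and a listing of object keys in the bucket (how you obtain the listing is not shown), and the function names are invented for this example.

```python
from posixpath import basename

HOST = "//pesioforum.s3.dualstack.ap-south-1.amazonaws.com/"


def url_to_key(url: str, host: str = HOST) -> str:
    """Strip the host prefix from an uploads.url value to get the S3 object key."""
    if not url.startswith(host):
        raise ValueError(f"unexpected host in {url!r}")
    return url[len(host):]


def find_misplaced(db_urls, bucket_keys, host: str = HOST) -> dict:
    """Map each expected key that is missing to the default/ key holding the same file."""
    keys = set(bucket_keys)
    misplaced = {}
    for url in db_urls:
        expected = url_to_key(url, host)
        if expected in keys:
            continue  # the object is where the database says it is
        fallback = "default/" + basename(expected)
        if fallback in keys:
            misplaced[expected] = fallback
    return misplaced


# The example from this post: the database points at 2X/8/, the object sits in default/.
name = "806a660beb158e9f06d07ffcd2370b389bbd250b.jpeg"
report = find_misplaced([HOST + "original/2X/8/" + name], ["default/" + name])
print(report)
```

Any URL that appears in neither place is simply left out of the report, so it is worth comparing the report size against the number of missing rows.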

At this point, as a one-time hack, we are okay with just moving all the files from /original/2X/{}/ into /original/1X/ and updating the posts, etc. with the new links.
New uploads are being placed properly into 2X anyway.

Aha, yes that was the one I actually intended. It should push those last seven up.

Indeed, that’s the best option at this point. Copy all the files out of their /2X/ sub-prefix and move everything into /1X/.

After everything is in place, here’s a command that should update all the database entries:

discourse remap --regex "//pesioforum\.s3\.dualstack\.ap-south-1\.amazonaws\.com/original/[1234]X/([0-9a-f]/){0,}" "//pesioforum.s3.dualstack.ap-south-1.amazonaws.com/original/1X/"

(Remember the previous warning about taking a backup.)
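
To preview what a pattern of this shape does before touching the database, it can be tried on a few sample URLs first. This is only a sketch using Python’s `re` module as a stand-in; the database’s regex engine may differ in details, and the host is written the way it appears in the uploads table earlier in this topic (`s3.dualstack`). The 1X file name is hypothetical.

```python
import re

HOST = "//pesioforum.s3.dualstack.ap-south-1.amazonaws.com"

# Match the host, the 1X..4X tier, and any single-hex-character sub-folders.
PATTERN = re.compile(re.escape(HOST) + r"/original/[1234]X/([0-9a-f]/)*")
REPLACEMENT = HOST + "/original/1X/"


def flatten(url: str) -> str:
    """Rewrite a tiered upload URL so that it points directly into /original/1X/."""
    return PATTERN.sub(REPLACEMENT, url)


name = "806a660beb158e9f06d07ffcd2370b389bbd250b.jpeg"
before = HOST + "/original/2X/8/" + name
print(flatten(before))

# A URL that is already flat (hypothetical file name) passes through unchanged.
already_flat = HOST + "/original/1X/example.png"
print(flatten(already_flat))
```

The sub-folder group only consumes a hex character when it is followed by a slash, so the first character of the file name itself is never eaten.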

After that, some posts may need the HTML version rebuilt via the wrench menu. If there are more than a few, you can rebuild everything with rake posts:rebake.

@schleifer that worked! With a modified regex and a rebake of all posts, most of the images and uploads work well.
There are a few images (not in posts) which still link to /optimized/, but we can fix these manually, e.g. logos in different themes.

Thanks a lot for your help!

Hello. We have hit a similar problem in our environment and were hoping to get a hand resolving it too.

Our problem is similar to this one in many ways:

  1. We have the same value listed under s3 upload bucket and s3 backup bucket.
  2. We hit this problem when we upgraded Discourse:
Old version: v2.3.0.beta3
New version: v2.5.0.beta6
  3. I exec’d into the Discourse container and queried the database:
SELECT id,url FROM uploads where id > 0 and url not like '//acme-forum.s3.dualstack.us-west-2.amazonaws.com/original/%';
 id |                                                url
----+----------------------------------------------------------------------------------------------------
  1 | /uploads/default/original/1X/eb17xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxc33.png
  2 | /uploads/default/original/1X/b87fxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxv21.png
 78 | //acme-forum.s3-us-west-2.amazonaws.com/original/1X/1205xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxv045.png
(3 rows)
  4. In the same container, I ran a second query:
select url from uploads where url like '//acme-forum.s3.dualstack.us-west-2.amazonaws.com/original/3X/%';
 //acme-forum.s3.dualstack.us-west-2.amazonaws.com/original/3X/6/2/6267xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxf607c.jpeg
(7953 rows)
  5. I checked how many items are in ./original/3X/: there are 251.

Question:

  1. We’re using dualstack and I don’t want to remap our URLs to stop using it.
  2. Our folder structure looks different: we have things like 3X/X/Y (e.g. 3X/7/a), so even if we move everything from default to 3X/*, it will still not map correctly.

My current thinking is to write a script that will reference the output from Step 4 to find out where to move the file back into the ./original/3X/X/Y folder.
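
A script along those lines might start from something like the following Python sketch. It only plans the moves and does not touch S3. It assumes, as in the earlier example in this topic, that each file under default/ keeps the same base name as in the database URL; the function name `plan_moves` and the sample file name are hypothetical.

```python
from posixpath import basename

HOST = "//acme-forum.s3.dualstack.us-west-2.amazonaws.com/"


def plan_moves(psql_output: str, host: str = HOST) -> list:
    """Turn the query output into (source_key, destination_key) pairs."""
    moves = []
    for line in psql_output.splitlines():
        url = line.strip()
        if not url.startswith(host):
            continue  # skips column headers, separators and the "(N rows)" footer
        destination = url[len(host):]
        source = "default/" + basename(destination)
        moves.append((source, destination))
    return moves


# Hypothetical sample shaped like the output of the 3X query.
sample_output = """
 //acme-forum.s3.dualstack.us-west-2.amazonaws.com/original/3X/6/2/6267example.jpeg
(1 row)
"""
moves = plan_moves(sample_output)
print(moves)
```

Each pair can then be fed to whatever copy tool you use, remembering to keep the objects publicly readable.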

The only problem is that when I tried that, the dualstack endpoint did not serve the file. That is, after I put the file back at original/3X/X/Y, one endpoint fails and the other works:
Broken: https://acme-forum.s3.dualstack.us-west-2.amazonaws.com/original/3X/6/b/6b6xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxa001.png

Works: https://acme-forum.s3-us-west-2.amazonaws.com/original/3X/6/b/6b6xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxa001.png

Update: it turns out the dualstack endpoint was never broken as I thought. I made a mistake when I originally copied the image file into ./original/3X/6/b: I forgot to allow read access for everyone.

So my question is:
Would it be a viable option for me to put the image files from ./default back to ./original/3X/x/y and then not modify the database at all?

OK, so I have an update.
It looks like I can predict where the ./original images are supposed to go, but I don’t know how to fix the ./optimized/ images.

In our forum, if you browse to one of our posts, it tries to display the ./optimized image.

Is there any way to know which is an optimized image?

My thinking is that optimized image names end with something like _2_10x10.png; would that be a fair assumption? If that’s the case, would it be a viable solution to use a script to copy anything with a suffix like _2_10x10.png over to optimized and anything without one straight to the ./original folder?

example:

GET https://acme-forum.s3.dualstack.us-west-2.amazonaws.com/optimized/3X/c/c/ccaxxxxxxxxxxxxxxxxxxxxxxxxxxxx85_2_690x268.png
[HTTP/1.1 403 Forbidden 0ms]
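
The file-name test described above could look roughly like this in Python. It is a sketch built on the poster’s assumption that optimized names have the form hash, underscore, number, underscore, WIDTHxHEIGHT, extension; it has not been checked against how Discourse actually names files, and the sample names are illustrative.

```python
import re

# Assumed shape of an optimized file name: <hex hash>_<number>_<width>x<height>.<ext>
OPTIMIZED_RE = re.compile(r"^[0-9a-f]+_\d+_\d+x\d+\.\w+$")


def classify(filename: str) -> str:
    """Return "optimized" for names with a size suffix, otherwise "original"."""
    return "optimized" if OPTIMIZED_RE.match(filename) else "original"


sha = "806a660beb158e9f06d07ffcd2370b389bbd250b"
print(classify(sha + "_2_690x268.png"))  # name with a size suffix
print(classify(sha + ".jpeg"))           # plain hash name
```

A script would still need to work out the sub-folder (such as 3X/c/c/) for each file, which this sketch leaves out.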

thanks!

@41821 If the urls in the uploads table are correct and working, but the posts still try to load the optimized images, then clearing the optimized_images table and rebaking all posts should do it: discourse=> delete from optimized_images;

Thanks so much for the feedback. I actually ended up solving it (if you could call it that) by writing a script to move the images from the /default directory back to /optimized based on the file name. This seems to have worked and I don’t have any problems anymore.

If this happens again in the future though, I’ll do what you suggested and trash everything from optimized_images and rebake.

thanks!
