Broken Images and Their S3 URLs

How do you have S3 set up? Did you set any values in app.yml or just in the admin UI? It looks like DISCOURSE_S3_BUCKET or DISCOURSE_S3_BACKUP_BUCKET has an unexpected value.

Yes, like I explained, everything was fine until the recent update. We have followed all the steps.
In particular, we did something like this:

As for these values, I can’t seem to find any variable set in app.yml. It wasn’t previously required. Did something change?

Indeed. Something may not be working right in the code, but we don’t know what or why, so we need more information.

There are two different ways to set up S3: you can configure the environment in app.yml, or you can enter the values in the admin UI. The variables are named slightly differently in each.

The topic you linked to describes how to set it up in the admin UI. If that is how you set up your site, can you tell us what you have for s3_upload_bucket and s3_backup_bucket?

@schleifer Same for me. I haven’t changed anything in those settings, either. I also verified my AWS credentials are working.

Oh, that explains it.

The problem happened because those settings are not supposed to be the same value – search the setup topic for the line “Do I really need to use separate buckets for uploads and backups?”
A maintenance job ran on the contents of the backup bucket and affected the uploads because their values overlapped.

@sam we should probably enforce that in code, not just documentation
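
A check of that kind could look roughly like the following. This is a minimal sketch in Python, not Discourse's actual validation: it assumes each setting has the form "bucket" or "bucket/some/prefix", and the function names are made up for illustration. It flags a conflict only when the backup location is the same as the upload location or contains it, which is the situation described above.

```python
from typing import List, Tuple


def split_setting(setting: str) -> Tuple[str, List[str]]:
    """Split 'bucket/optional/prefix' into (bucket, [prefix segments])."""
    parts = [part for part in setting.strip().split("/") if part]
    if not parts:
        raise ValueError("bucket setting is empty")
    return parts[0], parts[1:]


def backup_location_covers_uploads(upload_setting: str, backup_setting: str) -> bool:
    """Return True if a job scoped to the backup location would also see uploads."""
    upload_bucket, upload_prefix = split_setting(upload_setting)
    backup_bucket, backup_prefix = split_setting(backup_setting)
    if upload_bucket != backup_bucket:
        return False
    # Same bucket: conflict when the backup prefix is empty or is a leading
    # part of the upload prefix (the backup location contains the uploads).
    return upload_prefix[: len(backup_prefix)] == backup_prefix


if __name__ == "__main__":
    print(backup_location_covers_uploads("pesioforum", "pesioforum"))
    print(backup_location_covers_uploads("pesioforum", "pesioforum/backups"))
```

With identical values the check reports a conflict; with the /backups suffix added to the backup setting it does not.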

To fix up your site, there are two steps:

First, you’ll need to change the backups prefix. Adding /backups to the end, as described in the setup topic, is sufficient.

Then move everything in the S3 bucket back into the correct place. Everything in the top “default” folder should get put back at the top level.
E.g. there’s probably a “default/originals” folder there which should be moved up.

You’ll have to use the AWS web console or some other tool to browse the bucket.

Alright, thank you so much! I shall try the steps out and come back here in case anything ever happens. Any idea why this happened all of a sudden? :sweat_smile:

Hey @schleifer, that makes sense. I’ve added the backups prefix to the bucket name.

As for the existing uploads, all the uploads are in /default/ (not in sub-folders). The image urls in posts (and everywhere else) use /original/* or /optimized/*.

If we move everything in the default folder up one level (to the root), then the images will be in /*.
And no, there aren’t any folders within default, just uploaded files. It seems to contain files with a standard 40-character hash filename as well as some with suffixes like “_2_10x10” (which I presume are optimized copies).

How do you suggest fixing this? Fixing all posts with new links will take time. Is it possible to somehow group the files in the correct folders based on this filename?
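
Grouping by filename looks feasible in principle. Here is a minimal sketch in Python, assuming originals are named with a bare 40-character hex hash plus an extension and optimized copies carry a suffix such as "_2_10x10". The regex, the function name guess_destination and the flat target folder are illustrative assumptions, not Discourse's own logic.

```python
import re
from typing import Optional

# Assumed name shape: 40 hex chars, optional "_<n>_<W>x<H>" suffix, optional extension.
UPLOAD_NAME = re.compile(
    r"^(?P<sha>[0-9a-f]{40})(?P<variant>_\d+_\d+x\d+)?(?P<ext>\.[A-Za-z0-9]+)?$"
)


def guess_destination(filename: str, folder: str = "1X") -> Optional[str]:
    """Return a guessed key such as 'original/1X/<name>', or None if the name is unfamiliar."""
    match = UPLOAD_NAME.match(filename)
    if match is None:
        return None  # leave unknown files alone and review them by hand
    kind = "optimized" if match.group("variant") else "original"
    return f"{kind}/{folder}/{filename}"


if __name__ == "__main__":
    for name in (
        "806a660beb158e9f06d07ffcd2370b389bbd250b.jpeg",
        "806a660beb158e9f06d07ffcd2370b389bbd250b_2_10x10.png",
        "readme.txt",
    ):
        print(name, "->", guess_destination(name))
```

Files that do not match the assumed pattern come back as None, so they can be reviewed by hand rather than moved blindly.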

@schleifer any updates on this?

That’s…unexpected. We’ll have to fix up a bunch of stuff by hand, then.

The most important question would be “are new uploads going to the correct location?”

Assuming that’s true, you can put the old uploads in a known location and adjust their database entry. How many files are there in /default/?

The new uploads are working as expected into sub-folders, fortunately. And the links in posts point to the correct location.

There are >10k files in /default/. Editing each post manually seems like a lot of work. Is there any way to script this? Maybe with a regex replace on all posts?

That’s the plan, yes. The next thing to do is put all the AWOL files in a known location. In the bucket, what sub-directories are there under /original/? There should be /1X/ and there may be others.

This is the complete folder structure. New uploads (which work fine) are stored in /original/2X/*/*.jpg

/
|---backups
    |---default
        |---(a bunch of backup .tar.gz files)
|---default
    |---(10719 uploaded files pdf, images, etc.)
|---inventory
    |---1/pesuioforum/{optimized,original}/[DATE]/manifest.{checksum,json}
|---optimized
    |---1X
        |---(2 files)
    |---2X
        |---{0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f}
            |---(uploaded files that start with the folder's name char above) (1055 files in total)
|---original
    |---2X
        |---(Same as optimized/2X, 520 files)
|---tombstone
    |---optimized
        |---1X
            |---(2 files)
    |---original
        |---2X
            |---(Same as optimized/2X, 1 file)

The next thing to do is put all the AWOL files in a known location.

The /default/ already seems to have all the images.

@schleifer Hey, can we please get guidance after this?

OK, that’s something we can work with. First, move all of the files in /default/* into /original/1X/*.

That puts them in a location known to us but not to the database. The next step will adjust the path for all the uploads in the DB. Before we change anything else, though, let’s sanity check.

Start a database console:

cd /var/discourse
./launcher enter app
rails db

Run this query to take a look at the uploads:

select id, url from uploads where id > 0 and url not like '//PREFIX/original/%';

You’ll need to replace PREFIX with BUCKET + ‘.s3.dualstack.’ + REGION + ‘.amazonaws.com’, which would give something like //pesioforum.s3.dualstack.us-west-2.amazonaws.com/original/%.

That should output (0 rows). If not, then we have extra steps.
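
To avoid typos when substituting PREFIX, the string can be assembled programmatically. This is a small sketch in Python that follows the BUCKET + '.s3.dualstack.' + REGION + '.amazonaws.com' recipe above; the function names are hypothetical, and the bucket and region values are only the examples used in this topic.

```python
def s3_url_prefix(bucket: str, region: str, dualstack: bool = True) -> str:
    """Build the protocol-relative host prefix used in upload URLs."""
    middle = ".s3.dualstack." if dualstack else ".s3."
    return f"//{bucket}{middle}{region}.amazonaws.com"


def non_original_query(bucket: str, region: str) -> str:
    """Return the sanity-check query with PREFIX filled in."""
    prefix = s3_url_prefix(bucket, region)
    return (
        "select id, url from uploads "
        f"where id > 0 and url not like '{prefix}/original/%';"
    )


if __name__ == "__main__":
    # Paste the printed statement into the database console.
    print(non_original_query("pesioforum", "us-west-2"))
```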

There are 7 uploads for your query and none of them are from S3.
So all S3 links point to original only, right? The 7 uploads are from before we started using s3 for uploads (Oct 2018).

Of the S3 links (2614),
2368 use //pesioforum.s3.dualstack.ap-south-1.amazonaws.com
246 use //pesioforum.s3.ap-south-1.amazonaws.com

Both links work, just mentioning it here as it might affect any regex we might use.

@schleifer Please help us finish this. :slight_smile:

OK, so you should be clear to move files from /default/ to /original/1X.
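
With more than 10k objects, the move is easier to script than to do in the web console. Below is a hedged sketch in Python. plan_moves is a pure function that only computes the renames, and apply_moves assumes a boto3-style S3 client passed in by the caller (for example boto3.client("s3")), using its copy_object and delete_object calls. The function names and the dry-run default are illustrative choices.

```python
from typing import Iterable, List, Tuple


def plan_moves(
    keys: Iterable[str],
    src_prefix: str = "default/",
    dst_prefix: str = "original/1X/",
) -> List[Tuple[str, str]]:
    """Map every file directly under src_prefix to the same name under dst_prefix."""
    moves = []
    for key in keys:
        if not key.startswith(src_prefix):
            continue  # e.g. backups/default/... is left untouched
        name = key[len(src_prefix):]
        if not name or "/" in name:
            continue  # skip the folder placeholder and anything nested
        moves.append((key, dst_prefix + name))
    return moves


def apply_moves(client, bucket: str, moves: List[Tuple[str, str]], dry_run: bool = True) -> None:
    """Copy then delete each object; with dry_run=True only print the plan."""
    for src, dst in moves:
        if dry_run:
            print(f"would move {src} -> {dst}")
            continue
        client.copy_object(
            Bucket=bucket, Key=dst, CopySource={"Bucket": bucket, "Key": src}
        )
        client.delete_object(Bucket=bucket, Key=src)
```

Running it with dry_run=True first shows the plan without touching the bucket. An S3 copy does not necessarily carry over the source object's ACL, so it is worth confirming that a moved file is still publicly readable before moving the rest.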

You can migrate those into S3 by running rake s3:upload_assets

The dualstack endpoint works for IPv6 and IPv4. The other is IPv4-only.

There’s a script in the image for remapping strings in the database. Before running it, ALWAYS TAKE A BACKUP via /admin/backups (Hamburger → Admin → Backups).

This should fix up the 246:

discourse remap '//pesioforum.s3.ap-south-1.amazonaws.com/original/' '//pesioforum.s3.dualstack.ap-south-1.amazonaws.com/original/'
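
The effect of that replacement on a single URL can be previewed outside the database. A minimal sketch in Python, assuming the remap is a plain string substitution; the function name to_dualstack is hypothetical, and the bucket and region are the ones from this topic.

```python
def to_dualstack(text: str, bucket: str, region: str) -> str:
    """Rewrite IPv4-only S3 hosts for this bucket to the dualstack host."""
    old = f"//{bucket}.s3.{region}.amazonaws.com/"
    new = f"//{bucket}.s3.dualstack.{region}.amazonaws.com/"
    return text.replace(old, new)


if __name__ == "__main__":
    url = "//pesioforum.s3.ap-south-1.amazonaws.com/original/2X/8/example.jpeg"
    print(to_dualstack(url, "pesioforum", "ap-south-1"))
```

URLs that already use the dualstack host do not contain the old string, so applying the replacement a second time changes nothing.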

After moving everything from /default/ into /original/1X/, we can remap those files in the DB. But before that, we should make sure everything in /original/2X is actually there.

Does this query return the same number of rows as there are objects under that path in the bucket?
select url from uploads where url like '//pesioforum.s3.dualstack.ap-south-1.amazonaws.com/original/2X/%';

Hey @schleifer

You can migrate those into S3 by running rake s3:upload_assets

I ran this, and it uploaded assets (js, css, etc.) for the website. The 7 files were not uploaded.
I found rake uploads:migrate_to_s3 but wanted to confirm that it was the right task for this.

There’s a script in the image for remapping string in the database

This worked well and there aren’t any old non-dualstack links in the uploads table anymore.

But before that, we should make sure everything in /original/2X is actually there.

Sadly, this isn’t the case. There are 521 files in the S3 bucket, but 2186 records in the uploads table.
I tested a few of the files that are missing from /original/2X/, and they are all in /default/.

Example: From the uploads table,
https://pesioforum.s3.dualstack.ap-south-1.amazonaws.com/original/2X/8/806a660beb158e9f06d07ffcd2370b389bbd250b.jpeg doesn’t exist, but the same file is in
https://pesioforum.s3.dualstack.ap-south-1.amazonaws.com/default/806a660beb158e9f06d07ffcd2370b389bbd250b.jpeg
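
Spot checks like this can be extended to the whole table. The sketch below, in Python, assumes two inputs exported beforehand: the url column of the uploads table and a listing of object keys in the bucket. The function name find_misplaced and the way the key is cut out of the URL are illustrative assumptions.

```python
from typing import Dict, Iterable, Optional


def find_misplaced(
    upload_urls: Iterable[str],
    bucket_keys: Iterable[str],
    stray_prefix: str = "default/",
) -> Dict[str, Optional[str]]:
    """Map each expected-but-missing key to a same-named key under stray_prefix, if any."""
    keys = list(bucket_keys)
    present = set(keys)
    # Index the stray files by bare filename for quick lookup.
    stray_by_name = {
        key[len(stray_prefix):]: key for key in keys if key.startswith(stray_prefix)
    }
    report: Dict[str, Optional[str]] = {}
    for url in upload_urls:
        # Everything after the host is taken to be the object key.
        expected = url.split(".amazonaws.com/", 1)[-1]
        if expected in present:
            continue
        name = expected.rsplit("/", 1)[-1]
        report[expected] = stray_by_name.get(name)  # None means not found anywhere
    return report
```

Entries with a value are files that exist only under /default/. Entries mapped to None are referenced by the database but were not found in either place.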

At this point, as a one-time hack, we are okay with just moving all the files from /original/2X/{}/ into /original/1X/ and updating the posts, etc. with the new links.
New uploads are being placed properly into 2X anyway.