Quick AWS S3 Files Storage Question - Structure?

Hi!

Had a search in #support but couldn’t find an existing answer, so forgive me for a quick support question:

We’ve been using S3 to store Discourse files since about 2016. When I look at the root of the S3 bucket I was surprised to see the directory structure ‘above’ the ones I expected to be there, e.g. optimized/ original/ etc.

Do people think it is safe to remove the numbered directories at root, e.g. 99/? It’s possible they were copied into the wrong place a long time ago. Could posts have old baked-in paths to those locations that I don’t want to break?

Here’s how it looks, and my goal is to clean it up (if it needs it at all):

[image: screenshot of the S3 bucket root]

Put another way, for anyone already using S3: what are the names of the objects at the root of your storage, please?

Thanks for any help. Cheers :beers:


Good question, who would know the history of this @sam?


I think we might also be a bit weird :slight_smile: in that we’ve been running Discourse since about 2015, and have changed storage locations over the years.

We started out using local storage for files, got some growth, and then moved over to S3-stored uploads. At that time I don’t think we moved the existing files or rebaked the posts, so the oldest posts still use the local storage URLs.

One thing I should point out is that we’re not going to remove anything as it is right now, because even if the organization has changed over the years, we’re talking about small numbers of files, so it’s safest to leave what we have anyway.

Honestly I don’t know how these 96->99 folders came about, they are not where we store uploads.


Thanks for looking @sam, good to know - I wonder if we manually copied up to S3 what we had, and it went wrong. It hasn’t caused any harm.

At the S3 bucket root we have:

_emoji/
_optimized/
optimized/
original/
tombstone/

Plus a whole bunch of numbered directories going from 1/ to 225/. Each numbered directory has a single image file in it, with a name like ‘874c0706216382af.jpg’.
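For anyone wanting to check their own bucket, here is a minimal sketch that sorts a root listing like the one above into the expected directories and the numbered ones. The set of "known" names is simply copied from my listing, not from any official Discourse reference, and `classify_root_prefixes` and `list_root_prefixes` are names I made up for illustration. The listing helper assumes boto3 is installed and AWS credentials are configured.

```python
import re

# Directory names copied from the bucket listing in this thread (an assumption,
# not an authoritative list of what Discourse creates).
KNOWN_PREFIXES = {"_emoji", "_optimized", "optimized", "original", "tombstone"}
NUMBERED = re.compile(r"\d+")


def classify_root_prefixes(prefixes):
    """Sort top-level bucket prefixes into known, numbered and other."""
    groups = {"known": [], "numbered": [], "other": []}
    for prefix in prefixes:
        name = prefix.rstrip("/")
        if name in KNOWN_PREFIXES:
            groups["known"].append(name)
        elif NUMBERED.fullmatch(name):
            groups["numbered"].append(name)
        else:
            groups["other"].append(name)
    # Sort numerically so that 2/ comes before 10/.
    groups["numbered"].sort(key=int)
    return groups


def list_root_prefixes(bucket):
    """List the top-level 'directories' of a bucket (needs boto3 + credentials)."""
    import boto3  # imported lazily so the classifier works without it

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    prefixes = []
    # Delimiter="/" makes S3 group keys by their first path segment.
    for page in paginator.paginate(Bucket=bucket, Delimiter="/"):
        prefixes.extend(cp["Prefix"] for cp in page.get("CommonPrefixes", []))
    return prefixes
```

Calling `classify_root_prefixes(list_root_prefixes("my-bucket"))` (bucket name hypothetical) only reads the listing; it does not delete or move anything.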

Tombstone has an S3 lifecycle rule to mark objects as deleted after 30 days.

So guessing, but is it just optimized/ original/ and tombstone/ that are used?


Yes, that sounds about right.


Those files are always there on very old forums (around 2014). I think they predate optimized and original and I suspect they are still being referred to.


I couldn’t resist finding this out. This is indeed an old upload scheme. It was abandoned later than I suspected, in May 2015, with this commit.

https://github.com/discourse/discourse/commit/9ded21e4c61d4c1e71f57a778d519ddea26c96e2

These uploads are indeed still being used, so do not remove them!


Thanks Michael. As these date from when we first got going in 2014, the file numbers are small and we’ll keep them where they are. :slight_smile:

Interestingly enough, we did move servers recently, and went for a Discourse backup / restore route (rather than upgrading the base Unix version in place), and I think (although I’m not 100% sure) that the restore did not put these local files in place properly. They were contained in the backup archive, but the restore process only seemed to work for optimized/ and original/ and below.

It wasn’t a big deal, because we could tar -x them ourselves from the backup archive (when we noticed the old and new servers were different in their uploads/ contents), but something that might trip someone up, so wanted to mention here.
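In case it helps someone in the same spot, here is a rough Python equivalent of what we did by hand with tar: pull only the numbered upload directories out of the backup archive. It is a sketch under assumptions: I am assuming the uploads sit under `uploads/default/` inside the archive (check yours with `tar -t` first), and `is_legacy_upload` and `extract_legacy_uploads` are hypothetical helper names, not part of Discourse.

```python
import re
import tarfile

# Assumption: uploads live under this path inside the backup archive.
UPLOADS_PREFIX = "uploads/default/"
LEGACY_PATH = re.compile(r"\d+/.+")


def is_legacy_upload(member_name, uploads_prefix=UPLOADS_PREFIX):
    """True for paths like uploads/default/99/874c0706216382af.jpg."""
    if not member_name.startswith(uploads_prefix):
        return False
    # Refuse anything that tries to climb out of the destination directory.
    if ".." in member_name.split("/"):
        return False
    rest = member_name[len(uploads_prefix):]
    return LEGACY_PATH.fullmatch(rest) is not None


def extract_legacy_uploads(archive_path, dest_dir, uploads_prefix=UPLOADS_PREFIX):
    """Extract only files in numbered upload directories; return their names."""
    with tarfile.open(archive_path) as tar:
        members = [
            m for m in tar.getmembers()
            if m.isfile() and is_legacy_upload(m.name, uploads_prefix)
        ]
        tar.extractall(dest_dir, members=members)
        return [m.name for m in members]
```

The returned list makes it easy to compare what was extracted against the old server’s uploads/ contents before copying anything into place.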

Even though 99.9% of our uploads are served from S3 now (we switched from local to S3 fairly early on), I think that we must have copied up the local files when we manually created the S3 bucket initially. In retrospect we probably should have rebaked the posts, but it has always worked well enough, since only a very small number of (old) posts have the local file upload URL.
