Is there any reason why daily backups would see a reduction in size?

Last few days have been 3.4GB and today’s backup is reporting 3.3GB - is this normal or does it sound like something’s up? I don’t recall any large (or image) threads being deleted…


Yes, this is very normal.

Recently @tgxworld did a massive data refactor that removed a large amount of bloat around the email logs table. This data saving is expected.

One interesting thing, though, is how image-heavy your forum is (if it were not, the savings would have been even more noticeable). Images really do eat up the vast majority of backups in so many cases :)


Thanks Sam.

I keep meaning to ask about images - is there any way we can set Discourse to resize images to something like 1024px wide?

Also… what about storing images outside of the DB? (Though would we then need to use something like rsync for backups?)

I remember there was some discussion around these topics a few years ago, not sure if they’re possible yet?

Images are stored outside the DB already!

Going forward you can lower max image size and megapixels on your forum to avoid the large images. But fixing the history here… I am not sure if we have a rake task for that quite yet.


Ah nice!

Will that mean they get resized automatically, or will users be told the images are too big and they need to resize themselves first?

That would be awesome! I’m not sure how we got to using that much since I don’t think we have ‘that’ many images… I wonder if we could have something in the ACP that lets us see all uploads? (And maybe resize them individually or by batch or something?)

Try it out, we attempt to downsize on upload automatically to match the megapixel value you are allowing.
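To make the megapixel idea concrete, here is a minimal sketch of how a downsize target could be computed from a megapixel budget. This is not Discourse's actual code; the function name `fit_to_megapixels` and the rounding choice are assumptions for illustration only.

```python
import math


def fit_to_megapixels(width: int, height: int, max_megapixels: float) -> tuple[int, int]:
    """Return (width, height) scaled down to fit a pixel budget.

    Hypothetical helper: keeps the aspect ratio and never upscales.
    """
    max_pixels = max_megapixels * 1_000_000
    pixels = width * height
    if pixels <= max_pixels:
        # Already within the budget, leave the image alone.
        return width, height
    # Pixel count scales with the square of the linear scale factor.
    scale = math.sqrt(max_pixels / pixels)
    # Round down so the result never exceeds the budget.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


print(fit_to_megapixels(8000, 6000, 12))
```

With a 12 megapixel budget, a 48 megapixel upload has to shrink to a quarter of its pixel count, which means halving each side.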

The images are resized for display but the original is always stored for retrieval, so it is not going to help your storage.

Your best solution is to store the images in an online bulk storage like S3.

Automatic downsizing of originals AND history is definitely an opt-in feature we want to fully support.


What do you mean? Set a max image size in your site settings and that is indeed the max image size.

That’s true. That setting will prevent a large image from being uploaded at all.

However, I think @AstonJ may be asking whether there is a feature in the system to:

  1. Accept an image upload from the user (up to some reasonable upper limit of course)
  2. Auto-resize that image to some max size (e.g. 1024)
  3. Not keep that original large image (in order to save storage space)

That’s already the case… if you specify a max allowed, the images can’t be larger than that. I think it is in megapixels, so it’s about dimensions rather than absolute filesize, but the effect is the same. Rule 3 is met.

There’s only so big a 128 x 128 image file can be, in bytes :wink:
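As a rough illustration of why a dimension cap also caps file size: the uncompressed pixel data of an image is bounded by width × height × bytes per pixel. The sketch below assumes 8-bit RGBA and ignores container headers and metadata, which add a little on top; `max_raw_bytes` is a made-up helper name.

```python
def max_raw_bytes(width: int, height: int, channels: int = 4, bits_per_channel: int = 8) -> int:
    """Size of the uncompressed pixel data, in bytes (headers and metadata excluded)."""
    return width * height * channels * bits_per_channel // 8


# A 128 x 128 RGBA image holds 16,384 pixels at 4 bytes each.
print(max_raw_bytes(128, 128))
```

Compressed formats such as JPEG or PNG will normally land well below this figure, so a limit on pixel dimensions works as an indirect limit on bytes.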


I can definitely see a real use case for:

Hell no… we are not an image storage service, maximal size of every image on the forum is 500K, we will drop fidelity till we get there on all of history… sorry everyone.

I get that many forums that tire of giant old backups may want to go down this path.

Right, but we have exactly that feature; it is just gated on pixel dimensions, not filesize per se.

Remember we have to deal with pasted-in images too.


Sam, I’ve been thinking about why our backup is so big when we don’t really have that many images uploaded… I think it might be because of the welcome DiscourseBot.

From what I remember, it asks users to upload photos to help familiarise them with the upload system. I’m wondering, would it be a good idea to have Discourse automatically delete these PMs (and all uploads inside them) 2 or 3 months after registration?

It seems a waste hosting these uploads when they are not going to be seen by anyone and are effectively for testing purposes only… what do you think?


I like this idea. :slight_smile:

It might be worth updating one of the welcome bot’s text strings to let people know that, as they’re only testing the feature, their images won’t be stored long-term unless they re-upload them.

It could be spun in an encouraging manner; “feel free to pick any image - this is just a test run!” or similar would convey the information, while focusing on the positive - in this case the user not needing to worry about finding the best image, and instead just using whatever file is convenient. This is one of the things I really like about the Discourse team - their messaging always feels friendly and welcoming.


Can you run a data-explorer query to give some real world data about how much space DiscourseBot uploads really takes?


I don’t have that plugin installed but can add it :smiley: Could you give me the query you would like me to execute please?

Even though this is an old topic, it has been on my to-do list for ages, so I’ll just post here as this is the newest topic relevant to it. Pictures might be easier for someone wondering about this in 2021 or something. Discourse forums really are long term archives for me instead of discussion forums far too often :frowning: .

In autumn 2018 I noticed my backups for one site went through the roof (relatively speaking, small site :slight_smile: ).

Today I just happened to see the old note about checking it out… and noticed the backup size is far less. Which got me curious.

I wonder what happened in October when sizes increased a lot :slight_smile: . Unfortunately I don’t have those backups anymore, and I am a bit too lazy to go digging through changelogs, especially as I don’t know when I upgraded or to what.

Looking at the newest tar, uploads are about 50M (14M optimized) and sql is 150M (28M packed). All this makes sense.
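For anyone wanting to repeat that kind of breakdown, here is a small sketch that sums file sizes inside a backup archive by top-level entry, using only Python's standard `tarfile` module. The archive layout (a SQL dump next to an `uploads/` directory) and the file name `backup.tar.gz` are assumptions, so adjust them to whatever your backup actually contains.

```python
import tarfile
from collections import Counter
from pathlib import PurePosixPath


def size_by_top_level(archive_path: str) -> Counter:
    """Sum uncompressed file sizes per top-level entry of a tar archive."""
    totals = Counter()
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz or none).
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # PurePosixPath drops a leading "./", so "./uploads/a.png" -> "uploads".
            parts = PurePosixPath(member.name).parts
            if parts:
                totals[parts[0]] += member.size
    return totals


if __name__ == "__main__":
    # Hypothetical file name; point this at a real backup to try it.
    for name, size in size_by_top_level("backup.tar.gz").most_common():
        print(f"{name}: {size / 1_000_000:.1f} MB")
```

Note that these are the sizes of the files as stored inside the tar, so an already-compressed dump is counted at its packed size.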

We don’t back up the resized / thumbnailed images by default any more, just the original image.

This means all backups are smaller, but restoring a backup is a bit more work in the short term, since after restore the system now has to auto-regenerate the “optimized” (resized, thumbnailed, etc) copies of the images from the originals. That’s the tradeoff.

Did I state this correctly @sam?

There is also a site setting if you want to retain all resized images in the backups to make restoring a bit more efficient.


Awesome, I like software with sensible decisions. Or justifiable support. Ah, that’s why I use Discourse…

Still wondering why backup sizes skyrocketed in one month for me. Maybe a lot of uploads, dunno.

Personally I pretty much always resize good pictures from 6M to 500K or around that with click-resizing (no auto, too risky).

The few times I really want to keep resolution I keep a copy myself.

Which of course begs the question, why not have an option to always “downgrade” uploaded images and never even keep the original? Site owner’s preference (percentage or whatever).

Optionally use software that uses smart downgrading (best option - image quality really is important sometimes).

Or to be really “nice”, keep the best image for a certain period of time and, unless the original poster (or possibly some reader) marks it as “not good enough”, toss the original.