Raw_email post field stores images as Base64 strings

Hi,

Since I’m thinking of changing my CDN (Stackpath) and I know that Discourse dynamically sets the CDN URLs in links, I did nonetheless a search for the string “stackpath” in my forum’s posts to see how many occurrences there would be.

It returned between 300 and 400 posts.

I downloaded the data, and I was surprised that it’s almost a 100 MB file!

Apparently, it’s because images in posts sent by email seem to be present as base64 strings in the raw_email field.

So… As I see it, it doesn’t seem to be an issue really, but I’d like to have some clarifications about this.

Why are images stored like this in the raw_email field? Is there no better way to do so?
Can it impact performances in any way? What about forums where a lot of people reply by email and with images involved? Could we consider that it takes too much space for not much info?

Any explanation or additional information about that is welcome. :slight_smile:

1 Like

This is going to sound a bit reductive, but that’s how images are sent over email. And since we’re storing the raw_email… that’s what we store.

Think of it like a 100MB Inbox. Which is a drop in the bucket for most sites.

You can find the storage size with:

select sum(pg_column_size(raw_email)) / 1048576 as raw_email_MB from posts;

We would no longer be storing the raw_email in that case… I suppose there’s an argument to made for stripping large base64 content from raw_emails over a certain age…

Not likely; it’ll sit on disk but should not be referenced unless you look at it.

:thinking:

5 Likes

Thank you for the detailed answer :slight_smile:

1 Like

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.