S3 backup questions

I’m not a developer or an expert on this subject, so please bear with me here…

A few months ago, when I installed Discourse (I had to uninstall it for the time being, but will return some time next year), I noticed that my daily backups were very small (of course, I was just building/designing it and had no members, so no real traffic or new content). Each backup would take around 20MB.

I asked ChatGPT how much I would pay per month for a backup service for that particular case (I know that with an active community, the backup size will quickly increase, but this is just to be used as an example).

On AWS I would be expected to pay around $0.45-$0.60 a month for those daily 20MB backups (so 600MB a month). Is that a realistic cost or am I missing something?

Then I asked about Backblaze B2. To my surprise, I was told that they use something compatible with S3. Before asking ChatGPT about it, I thought S3 was something only AWS used, like the name of their service (which it also is), but it seems that it’s more like a “protocol” that other companies can use. Interesting… I was told that Backblaze has a flat fee of $0.005 per GB per month.

So for those 20MB backups a day (600MB a month), the estimated monthly costs would be:

  • AWS S3: $0.45-$0.60
  • Backblaze B2: $0.15-$0.30.

Is this a realistic scenario for that amount of space?
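
For anyone who wants to redo this arithmetic for other backup sizes, here is a minimal Python sketch of a storage-only estimate. The $0.005 per GB rate is the Backblaze figure quoted above. The $0.023 per GB rate for AWS S3 Standard is an assumption, and request, download, and payment fees are ignored, so check each provider's current pricing before relying on the result. The function name `monthly_storage_cost` is made up for illustration.

```python
# Rough storage-only estimate. The rates below are assumptions, not quotes.
AWS_S3_STANDARD_PER_GB = 0.023  # assumed USD per GB per month
BACKBLAZE_B2_PER_GB = 0.005     # USD per GB per month, as quoted in this post


def monthly_storage_cost(backup_mb: float, backups_kept: int, rate_per_gb: float) -> float:
    """Cost in USD of storing `backups_kept` backups of `backup_mb` each for one month."""
    stored_gb = backup_mb * backups_kept / 1000  # treat 1 GB as 1000 MB for simplicity
    return stored_gb * rate_per_gb


if __name__ == "__main__":
    # 20MB a day, all 30 daily backups kept (600MB stored)
    for name, rate in [("AWS S3", AWS_S3_STANDARD_PER_GB), ("Backblaze B2", BACKBLAZE_B2_PER_GB)]:
        print(f"{name}: ${monthly_storage_cost(20, 30, rate):.4f} per month")
```

With these assumed rates, 600MB of storage comes out to around a cent a month or less, so the storage charge alone would be well below the figures listed above.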

I just want to have a good understanding of how things work (and cost), so I can then adapt to other amounts of data as the community grows.

I remember seeing the S3 option in the Discourse settings (when I thought it was just related to AWS, which always seemed super confusing to a non-expert like myself). So, that means I can use Backblaze B2, correct?

Another question: if I’m paying with PayPal, I also have to pay their fees, which are something like 30 cents per transaction plus a percentage of the total amount paid. So, I asked ChatGPT if both companies allowed me to deposit a certain amount once and then let that be used as credit, avoiding monthly fees. I was told that they do. Can anyone confirm this?

After doing some more research, I found this other topic and the OP said they have a 3GB backup, while another person said they are at 8GB. I believe I successfully created a system where I was downloading it directly to my Dropbox (I uninstalled Discourse a few months ago, so I can’t confirm it 100%, but I believe it was working properly), so my question is: should I go through the hassle of setting yet another service for backups? How big can a backup really be if 99.9% of it is just text and occasionally some images that are already being optimized by Discourse itself? Since I have my own backup disks with daily backups (and snapshots), I don’t have to keep all backups on Dropbox anyway, and since they have 30 day history, even if something goes wrong after deleting a backup file, I still have 30 days to recover it.

Any help on this subject is greatly appreciated. Thank you!

For backups that small at this frequency, yes, fees will be minimal.

From what I remember, when I used S3 for my backups, it cost about $60 a year for 30 GB backups twice a week (3 backups kept). Even if I’m wrong about the number of backups, the order of magnitude of the costs is correct enough.

S3 will cost you almost nothing. :smiley:

Thank you for confirming! Yes, I’m aware that this wouldn’t be a practical cost on a busy community, but at least this gives me an idea of what things cost on average.

Do you mind sharing why you don’t use it anymore? What are you using now? And you were using AWS S3, right? Now that I understand that there’s the S3 “protocol” (or whatever the technical term for that is), I just want to confirm that you were using AWS.

Let me see if I get this right: each backup was 30GB, you were doing it twice a week (so 60GB a week) and you were keeping 3x30GB at all times? You were still being charged for all backups, even if you deleted them, right? So technically speaking, you were using 60GB a week x4 weeks (give or take), that’s 240GB, plus an additional 30GB at the end of the month, giving you 270GB used over a month?

Maybe my math is wrong or I don’t fully understand the whole process, but if this is right, it’s still very cheap for $60 a year. I would assume that 30GB was not for Discourse? Would a community get to that point of space needed per backup?
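
To make the difference between "uploaded" and "stored" concrete, here is a small Python sketch that simulates a month of backups with a retention limit. It assumes that storage is billed on the average amount held over the month (GB-month) and that deleted backups stop counting once removed; the 3-day interval is only an approximation of "twice a week", and the function name `simulate_month` is made up for illustration.

```python
def simulate_month(backup_gb: float, every_days: int, keep: int, days: int = 30):
    """Return (total GB uploaded, average GB stored per day) over `days` days.

    Assumes a new backup every `every_days` days and that only the
    `keep` most recent backups are retained; older ones are deleted.
    """
    stored = []     # sizes of the backups currently kept
    uploaded = 0.0  # cumulative GB uploaded during the period
    gb_days = 0.0   # sum of GB stored on each day
    for day in range(days):
        if day % every_days == 0:
            stored.append(backup_gb)
            uploaded += backup_gb
            stored = stored[-keep:]  # drop everything but the newest `keep`
        gb_days += sum(stored)
    return uploaded, gb_days / days


if __name__ == "__main__":
    uploaded, average = simulate_month(backup_gb=30, every_days=3, keep=3)
    print(f"uploaded: {uploaded:.0f} GB, average stored: {average:.0f} GB")
```

Under these assumptions the uploaded total grows with every backup, but the amount actually stored never exceeds 3 x 30GB = 90GB, and that stored amount is what the storage charge would be based on.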

Also, regarding my Dropbox comment at the end, I have no clear idea how I ended up with my Dropbox backup plan (I remember going through the process of dealing with Dropbox’s API or something, but I don’t remember if it was for Discourse?), because I was using Digital Ocean, but I can definitely see the folder there.

I have 22GB on Dropbox from back when they launched and I invited a lot of friends to join. Since I barely use Dropbox for my daily work (I mostly use iCloud), and I have Keyboard Maestro to remove those backups from Dropbox into another folder on my computer on a daily basis, most of my Dropbox space is available. Wouldn’t this be enough for a fairly busy community in the future? How big is your community’s backup file these days, if you don’t mind sharing that info?

Yes, it was AWS S3.

I don’t have the exact numbers unfortunately as I can’t see the S3 data usage more than 1 year old in Amazon’s interface, but as I said I think the order of magnitude is right.
I do know that my backups were ~30 GB each, and that I was keeping about 3 backups max. As for the frequency, I don’t remember if it was once every two days, or every week.

I still have the bills, and they say between 5 and 7$ a month. They don’t show any data-related information.

I stopped using S3 because of the cost. It was not that expensive, as you say, but I was trying to balance costs between the different services I subscribe to (hosting, emails, CDN, backups…), and decided to sync my backups to my Google Drive with rclone instead, for free.

The drawback is that while I trust Discourse to use S3 reliably, I don’t have as much trust in my rclone setup, and I must make sure the backups are properly synced to Google Drive. I continue to monitor it from time to time, especially since I’ve noticed that at least once the Google token wasn’t properly refreshed and my backups stopped being synced to Drive.

I might change the way I manage my backups in the future, I don’t know.

Yes, it is for Discourse :slight_smile:
Backup size was like 27 GB two years ago, and now 30 GB.

As on many instances, most of the backup’s size is uploads. The uncompressed database is 23 GB, though, but text compresses efficiently.

I believe this could be easily fixed with some type of automation tool? I use Keyboard Maestro and if I’m expected to get 1 file on my Dropbox with a certain name, for example, I can automate that to run daily or whenever I’m supposed to get the file and if nothing is there, it could show a notification.

Would something like this work for you instead of manually checking it?
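
The same idea can be sketched without Keyboard Maestro. Below is a minimal Python example that checks whether the newest backup file in a locally synced folder is recent enough. The folder path, the `*.tar.gz` pattern, and the 26-hour threshold are all assumptions for illustration, as are the function names; it only looks at a local folder and does not talk to Dropbox, Google Drive, or rclone.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup_age_hours(folder: Path, pattern: str = "*.tar.gz",
                            now: Optional[float] = None) -> Optional[float]:
    """Age in hours of the most recently modified matching file, or None if there is none."""
    mtimes = [f.stat().st_mtime for f in folder.glob(pattern) if f.is_file()]
    if not mtimes:
        return None
    current = time.time() if now is None else now
    return (current - max(mtimes)) / 3600


def backup_is_fresh(folder: Path, max_age_hours: float = 26,
                    pattern: str = "*.tar.gz") -> bool:
    """True if at least one matching backup is newer than `max_age_hours`."""
    age = newest_backup_age_hours(folder, pattern)
    return age is not None and age <= max_age_hours


if __name__ == "__main__":
    backup_folder = Path.home() / "Dropbox" / "discourse-backups"  # hypothetical path
    if not backup_is_fresh(backup_folder):
        print("WARNING: no recent backup found in", backup_folder)
```

Run daily from a scheduler, a check like this would flag a missed sync (for example after an expired token) instead of relying on someone remembering to look.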

Wow, that’s a lot! I can see it getting to that in communities with lots of engagement, lots of uploads, and lots of years of activity. So, for now, I guess I will be OK with very small backup files that fit my Dropbox.

And even if I decide to use AWS or Backblaze, 5-7 dollars a month is good enough to bring some peace of mind. If there’s budget for that, I value my mind more than my wallet :wink:

Really appreciate your time and help on this (and checking your bills and stuff)! :+1:
