I’d be curious to know the average size ratio between compressible and already-compressed data in a Discourse backup[1], and how much data (in %) would be saved by using zstd.
It’s not the same feature request, but it’s also about backup compression so I’m crossposting this:
I wouldn’t be surprised if the percentage was about the same on all my Discourse forums.
Of course, some forums rely very much on image uploads, and some won’t even allow file uploads ↩︎
Occasionally the backup process causes us some availability issues due to the additional load. So I did a quick experiment with zstd today.
Here are my results from compressing the same 73GiB dump.sql file with gzip (level 4, as used in the Discourse backup) and zstd (its default level 3, out of 19):
Compressed size: 15.8% smaller (the .zst was 84% of the .gz size)
Compression time (-T1): 71% faster (29% of gzip time)
Compression time (-T0): 89% faster (11% of gzip time)
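For anyone who wants to reproduce this, a minimal sketch of the comparison looks something like the following. The file name and the synthetic input are illustrative stand-ins (the real test above used an actual 73GiB dump.sql); prefix each compression command with `time` to compare speed as well as size:

```shell
f=dump.sql
seq 1 200000 > "$f"          # stand-in for a real SQL dump

gzip -4 -k -f "$f"           # gzip level 4 (Discourse's setting) -> dump.sql.gz
zstd -3 -T0 -k -f "$f"       # zstd level 3, all cores (-T1 for one) -> dump.sql.zst

ls -l "$f" "$f.gz" "$f.zst"  # compare the resulting sizes
```

The `-k` flags keep the input file around so both tools compress the exact same data.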
YMMV: I didn’t run multiple trials, it was my own machine (6 cores), and it was doing other things at the time, so I wasn’t aiming for precision. Still, I think the benefits are clear.
I’m not sure -T0 would be a good choice for everyone, since leaving some headroom for Discourse itself seems like a good idea; hence the -T1 sample for a more apples-to-apples comparison with single-threaded gzip.
Feels like a win-win, and it would likely have a significant impact on Discourse’s own hosting infrastructure too. That said, I don’t have the chops for a PR, so I’m just sharing what I found.