@mpalmer The --rsyncable option helps a lot, and should probably just be turned on for everyone.
The cost of --rsyncable
7 daily backups created by Discourse:
- Total with current gzip options (none): 306294397 bytes
- Total after decompression and recompression with gzip --rsyncable: 307156949 bytes
That is an increase of about 0.28% (862552 bytes), which is negligible.
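For anyone who wants to check the overhead figure, here is a minimal sketch that derives it from the two byte totals listed above. The variable names are only illustrative.

```python
# Byte totals quoted in the list above.
default_total = 306_294_397    # 7 tarballs, gzip with no options
rsyncable_total = 307_156_949  # same 7 tarballs, recompressed with gzip --rsyncable

# Extra space used by --rsyncable, in bytes and as a percentage of the default total.
extra_bytes = rsyncable_total - default_total
overhead_pct = 100 * extra_bytes / default_total

print(f"extra bytes: {extra_bytes}")
print(f"overhead: {overhead_pct:.2f}%")
```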
The benefit of --rsyncable
For those same 7 tarballs, compressed with default gzip, neither borg nor tarsnap can deduplicate the data (savings are under 1%).
And with gzip --rsyncable:
- borg is able to deduplicate the 7 tarballs to 1.187x
- tarsnap is able to deduplicate the 7 tarballs to 1.165x
(x = average size of a single tarball)
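To put those ratios in more familiar terms, the sketch below converts "7 tarballs stored in the space of N average tarballs" into a percentage of space saved. The helper name space_saved_pct is made up for illustration; only the 1.187x and 1.165x figures come from the measurements above.

```python
def space_saved_pct(stored_in_tarballs: float, n_backups: int = 7) -> float:
    """Percent of space saved when n_backups tarballs are stored in the
    space of `stored_in_tarballs` average-sized tarballs."""
    return 100 * (1 - stored_in_tarballs / n_backups)


# Ratios reported above for the 7 --rsyncable tarballs.
for tool, ratio in [("borg", 1.187), ("tarsnap", 1.165)]:
    print(f"{tool}: {space_saved_pct(ratio):.1f}% saved versus storing all 7 in full")
```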
The case for having Discourse create uncompressed backups
TLDR: Not a huge win.
Uncompressing the tarballs increases their size on disk by ~51%. I have to assume this would vary for other installations based on what percentage of the tarball is sql/text vs already-compressed uploads. (Uploads comprise ~60% of my tarballs, measured with du when extracted.)
Both borg and tarsnap deduplicate a bit better with an uncompressed tarball than with --rsyncable:
- With borg set to do no compression of its own, the 7 uncompressed tarballs deduplicate to only 77.25 MB.
- With borg set to do chunk-wise zlib compression, that shrinks to 47.29 MB.
- Tarsnap always does chunk-wise compression, totaling 46.43 MB.
For backups of all of /var/discourse (including the 7 tarballs, but also non-tarred copies of the uploads, and postgres’ files on disk, etc…):
- /var/discourse with tarballs compressed with gzip defaults, tarsnap compressed total: 371428106 bytes
- /var/discourse with tarballs compressed with gzip --rsyncable, tarsnap compressed total: 114474052 bytes
- /var/discourse with uncompressed tarballs, tarsnap compressed total: 96140976 bytes
Because the savings between the last two is less than half the size of my uploads (as extracted from a tarball), I’m assuming that deduplication of uploads between on-disk and in-tarball is not happening, which is the main thing I hoped to gain by having uncompressed tarballs.
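The comparison in the previous paragraph can be roughly reconstructed from numbers already given in this post. This is only a back-of-the-envelope sketch: the upload size is an estimate built from the approximate 51% and 60% figures and from the average size of the default-gzip tarballs, not a direct measurement.

```python
# tarsnap compressed totals for /var/discourse, from the list above.
gzip_default = 371_428_106
gzip_rsyncable = 114_474_052
uncompressed = 96_140_976

# What uncompressed tarballs save over --rsyncable ones.
saving = gzip_rsyncable - uncompressed

# Rough estimate of the uploads inside one extracted tarball (assumption):
# average default-gzip tarball, grown by ~51% when decompressed, ~60% of it uploads.
avg_compressed_tarball = 306_294_397 / 7
est_uploads = avg_compressed_tarball * 1.51 * 0.60

print(f"saving from uncompressed tarballs: {saving} bytes")
print(f"half of estimated uploads: {est_uploads / 2:.0f} bytes")
print(f"saving is smaller than half the uploads: {saving < est_uploads / 2}")

# For comparison, how much --rsyncable alone shrinks the tarsnap total.
print(f"default gzip vs --rsyncable: {gzip_default / gzip_rsyncable:.2f}x")
```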
So, forget about the uncompressed option, at least for now, but please do turn on --rsyncable!
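For anyone who wants to repeat the comparison on their own backups, here is a minimal sketch of the decompress-and-recompress step. It assumes a gzip binary on the PATH that supports --rsyncable (recent GNU gzip does; other builds may not). The function name and the file paths in the usage note are hypothetical, and this is not how Discourse itself creates backups.

```python
import gzip
import subprocess


def recompress_rsyncable(src: str, dst: str) -> None:
    """Decompress the gzip file at src and recompress it to dst with gzip --rsyncable."""
    with gzip.open(src, "rb") as f:
        data = f.read()  # reads the whole archive into memory; fine for a sketch
    with open(dst, "wb") as out:
        # check=True raises CalledProcessError if this gzip rejects --rsyncable.
        subprocess.run(["gzip", "--rsyncable", "-c"], input=data, stdout=out, check=True)
```

Usage would look something like recompress_rsyncable("backup.tar.gz", "backup-rsyncable.tar.gz"), after which the two files can be fed to borg or tarsnap to compare deduplication.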