Reduce local disk space needs by not (redundantly) gzipping backups

Ed_S · November 16, 2022, 12:38pm

The backup process creates a tar file and then applies gzip to it. There are two types of things in the tar file: an already gzipped SQL dump and the contents of uploads (if requested). In my case every upload file is already compressed: gz, gzip, gif, jpeg, png, zip. So the final gzip pass reduces the size by only about 1%.
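
A minimal sketch of the effect, using Ruby's standard Zlib library. The random bytes are only a stand-in for already-compressed uploads and the repeated text is only a stand-in for an uncompressed SQL dump; neither comes from a real backup.

```ruby
require "zlib"
require "securerandom"

# Return the gzipped size as a fraction of the original size.
def gzip_ratio(data)
  Zlib.gzip(data).bytesize.to_f / data.bytesize
end

# Assumption: random bytes behave like the payload of a jpeg, png, or zip,
# i.e. they are effectively incompressible.
already_compressed = SecureRandom.random_bytes(1_000_000)

# Assumption: repetitive text behaves like an uncompressed SQL dump.
plain_text = "INSERT INTO posts VALUES (1, 'hello world');\n" * 20_000

puts format("already compressed: %.3f", gzip_ratio(already_compressed))
puts format("plain text:         %.3f", gzip_ratio(plain_text))
```

The first ratio should come out at roughly 1 (no gain), while the second should be a small fraction.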

I believe it would be better to skip that final pass, so that a backup demands less free disk space.

A previous topic from 2016 mentions disabling backup compression, but it looks like the sql dump was at that time not compressed, which shifted the tradeoffs.

Add option to disable backup compression

gerhard · November 17, 2022, 2:50pm

I’m already working on a new backup format that removes the double compression. My hope is that it will be ready within a month or two.

Ed_S · November 17, 2022, 3:45pm

Sounds great @gerhard!

tumbano · April 20, 2023, 7:52am

Any update on this? Thanks

Ed_S · October 4, 2023, 9:11am

Not to bug you too much, but how is this progressing?

gerhard · October 4, 2023, 9:22am

Development of that feature is currently paused and it isn’t on our current roadmap. I hope we will get to it in 2024.

Isambard · August 30, 2024, 6:44pm

If I wrote a patch to accept a 0 in the compression rate to disable gzip, would that be something that you would accept?

Ed_S · August 30, 2024, 7:00pm

(I’m guessing that you’d save CPU time that way, but not space, because the gzipped tar file would still be created.)

Isambard · August 30, 2024, 7:08pm

I’m aiming to save CPU time. Actually, I was thinking of using the 0 as a flag that would change the code path so that it doesn’t gzip (sadly, zero is not a valid compression level supported across all gzip versions, afaik).
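
A rough sketch of that flag idea, for illustration only. The helper name `archive_commands` and its arguments are hypothetical and are not taken from the Discourse code base; it only builds the command lists and does not run them.

```ruby
# Hypothetical helper: list the commands a backup run would execute.
# A level of 0 is treated as a flag meaning "skip gzip entirely"; it is
# never passed to gzip as a compression level.
def archive_commands(tar_filename, target_name, level)
  commands = [["tar", "--create", "--file", tar_filename, target_name]]
  commands << ["gzip", "-#{level}", tar_filename] unless level.zero?
  commands
end

archive_commands("backup.tar", "backup", 0).each { |cmd| puts cmd.join(" ") }
archive_commands("backup.tar", "backup", 5).each { |cmd| puts cmd.join(" ") }
```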

Ed_S · August 30, 2024, 8:02pm

Hmm that wouldn’t help me at all! (Likewise others who’ve had the same problem with limited disk space.)

If tar were being used, it could be run with its z or j option. If a subshell were being used, the output of tar could be piped into gzip. But I think some higher-level Ruby functions may in fact be in use.
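
For the piping variant, a minimal sketch using `Open3.pipeline` from Ruby's standard library, assuming `tar` and `gzip` are on the PATH. The helper name `tar_pipe_gzip` is made up for this example and is not how Discourse does it.

```ruby
require "open3"
require "tmpdir"

# Stream tar's output straight into gzip. Only the compressed file is
# written to disk; no intermediate .tar file is created.
def tar_pipe_gzip(source_dir, target_name, output_path)
  statuses = Open3.pipeline(
    ["tar", "--create", "--file", "-", "--directory", source_dir, target_name],
    ["gzip", "-c"],
    out: output_path,
  )
  raise "Failed to create archive." unless statuses.all?(&:success?)
  output_path
end

# Small demo in a throwaway directory.
Dir.mktmpdir do |dir|
  Dir.mkdir(File.join(dir, "uploads"))
  File.write(File.join(dir, "uploads", "a.txt"), "hello\n")
  archive = tar_pipe_gzip(dir, "uploads", File.join(dir, "uploads.tar.gz"))
  puts "#{File.basename(archive)}: #{File.size(archive)} bytes"
end
```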

RGJ · August 30, 2024, 10:06pm

cough

https://github.com/discourse/discourse/blob/7b89fdead98606d4f47ceb0a1d240d0f6e5f589e/lib/compression/tar.rb#L13-L21

          Discourse::Utils.execute_command(
            "tar",
            "--create",
            "--file",
            tar_filename,
            target_name,
            failure_message: "Failed to tar file.",
          )

Ed_S · August 31, 2024, 7:27am

Maybe it shouldn’t be too difficult… I appreciate that changes to backup and restore must be made with great care, but I think just inlining the compression would greatly reduce the space requirement without raising any compatibility question.

From tar --help:

-a, --auto-compress use archive suffix to determine the compression
-z, --gzip, --gunzip, --ungzip filter the archive through gzip

Isambard · September 1, 2024, 10:49pm

Does -z actually do an in-place compression? I always assumed that it just ran gzip after the tar file is completed.

Ed_S · September 2, 2024, 8:34am

An unwise assumption, in this case! With -z, the bytes which represent the uncompressed tar file never hit the disk.

MentalNomad · May 6, 2025, 2:35pm

Are you saying we can simply add
"--gzip",

And it will stop requiring fully double the space actually used by the data?

Ed_S · May 6, 2025, 3:38pm

Yes, that’s the change to the tar command.
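
As a hedged sketch of what that would look like: the arguments below mirror the snippet quoted earlier, with "--gzip" added. `execute_command` here is a local stand-in for `Discourse::Utils.execute_command` (which is not available outside Discourse), `create_gzipped_tar` is a made-up name, and "--directory" is added only so the demo can run anywhere. The separate gzip step described at the top of the topic would presumably have to be dropped as well; that part is not shown.

```ruby
require "tmpdir"

# Stand-in for Discourse::Utils.execute_command (an assumption for this
# sketch): run the command and raise if it fails.
def execute_command(*command, failure_message:)
  raise failure_message unless system(*command)
end

# Same arguments as the quoted snippet, plus "--gzip" so that tar
# compresses while writing and no uncompressed .tar reaches the disk.
def create_gzipped_tar(source_dir, target_name, archive_filename)
  execute_command(
    "tar",
    "--create",
    "--gzip",
    "--file",
    archive_filename,
    "--directory",
    source_dir,
    target_name,
    failure_message: "Failed to tar file.",
  )
end

# Small demo in a throwaway directory.
Dir.mktmpdir do |dir|
  Dir.mkdir(File.join(dir, "uploads"))
  File.write(File.join(dir, "uploads", "a.txt"), "hello\n")
  create_gzipped_tar(dir, "uploads", File.join(dir, "uploads.tar.gz"))
  puts Dir.children(dir).sort.inspect
end
```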
