Why are backups so large?

We have about 250,000 posts across 25,000 topics, and our backup storage is about 4.5 GB (I have it set up to keep 2 backups, so I think each backup of the post database is about 2.25 GB).

That seems really large for what are essentially short, simple text files.

Is there any way that can be made smaller? How about including a simple compression algorithm so that we can save space on our backups?

1 Like

pg_dump supports compression:

-Z, --compress=0-9

You just need to balance time spent vs. file size.

Maybe patch this file to add a small compression level (2 or 3) and measure the impact on your workload.
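
For example (just a sketch; the database name and output file names here are placeholders, not anything Discourse-specific), you could time a plain dump against a lightly compressed one:

```
# Placeholder database name; adjust user/host options for your setup.
time pg_dump -U postgres discourse > back_no_compression
time pg_dump -U postgres -Z 2 discourse > back_Z_2

# Compare the resulting file sizes.
ls -lh back_no_compression back_Z_2
```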

In my starting database (10 posts, 50 users) I see the following:

```
-rw-r--r-- 1 postgres postgres 2,2M Jul 24 16:04 back_Z_2
-rw-r--r-- 1 postgres postgres  11M Jul 24 16:04 back_no_compression
```

```
Compression (Z=2)
real    0m0.557s
user    0m0.072s
sys     0m0.032s

No Compression
real    0m0.837s
user    0m0.280s
sys     0m0.036s
```

But you need to test with some real data before really committing to this; Meta would be a good example.

1 Like

Backups are already compressed; that’s what the .tar.gz extension means. It is basically .zip for UNIX.

For example, a 742 MB backup is compressed down to a 415 MB file.

It does include uploaded images; if you extract it, the uploads folder is 314 MB across 6,947 files and 116,168 folders (!).
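
If you want to check this on your own backup, a rough sketch would be something like this (the archive file name is a placeholder):

```
# Extract the backup archive (file name is a placeholder) into a scratch folder.
mkdir backup-contents
tar -xzf my-site-backup.tar.gz -C backup-contents

# Size of the uploads folder, plus file and folder counts.
du -sh backup-contents/uploads
find backup-contents/uploads -type f | wc -l
find backup-contents/uploads -type d | wc -l
```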

You can get smaller backups by opting not to include the uploaded images, but then it won’t be a complete archive of the content (text and images) that makes up the topics on your site!

Remember that images are not very compressible, but text is. If I compress just dump.sql alone with 7zip, it goes from 339 MB to 60 MB.
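
A rough way to reproduce that comparison yourself (assuming you extracted the backup as above; exact tools and ratios will vary):

```
# Compare compressors on the database dump alone.
cp backup-contents/dump.sql .
gzip -c -6 dump.sql > dump.sql.gz   # roughly what the .tar.gz already does
7z a dump.sql.7z dump.sql           # needs p7zip installed
ls -lh dump.sql dump.sql.gz dump.sql.7z
```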

2 Likes

Images in posts are usually a very large percentage of backup size.

3 Likes