Streaming backups with minio-client?

I don’t like that the most frequent backup schedule available is only once per day. That seems like potentially a lot of people’s careful writing to throw away in case of disaster. However, I don’t want to set up an HA cluster; that’s too resource-heavy for my use case. I just want a high degree of confidence that I can recover nearly everything written on my Discourse, even if recovery takes a while. This Discourse is a free resource for a non-paying community, so I’m sensitive to the value the community puts into it, but also sensitive to additional cost. Downtime is more acceptable than data loss.

On one Discourse I help manage, I’ve set up locally-served images (moving images to S3 is a one-way door, as I discovered the hard way) and use minio-client, running on the host, to stream uploads to S3 in near real time with its --watch facility. (This doesn’t work with Digital Ocean Spaces, though; Ceph limitations there cause it to fail.) It also makes recovery simple: one minio-client command copies all the files back, up to the instant, even though the database backup might be nearly a day old. I really like this paradigm.
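For reference, the host-side piece is roughly this (the alias `s3backup`, the bucket name, and the endpoint are placeholders; the uploads path is the usual discourse_docker shared directory):

```
# One-time setup: register the S3-compatible endpoint under an alias
mc alias set s3backup https://s3.example.com ACCESS_KEY SECRET_KEY

# Stream new and changed uploads to the bucket as they appear
mc mirror --watch /var/discourse/shared/standalone/uploads s3backup/discourse-uploads

# Recovery: one command copies everything back
mc mirror --overwrite s3backup/discourse-uploads /var/discourse/shared/standalone/uploads
```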

It seems to me that the same thing could be done for streaming WAL archiving of the database, not with a watch but with an archive_command that invokes minio-client on each WAL file as it is finalized. In that case, though, minio-client would have to live in the container with postgres, not on the host.
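Something along these lines is what I’m imagining, assuming mc and its config are available to the postgres user inside the container (alias and bucket names are again placeholders, and error handling is omitted):

```
# postgresql.conf inside the container
archive_mode = on
archive_command = 'mc cp %p s3backup/discourse-wal/%f'

# and for point-in-time recovery later, roughly
restore_command = 'mc cp s3backup/discourse-wal/%f %p'
```

Postgres substitutes %p with the path of the finalized WAL segment and %f with its file name, so each segment gets shipped off as soon as it’s complete.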

That got me thinking…

While I could use exec: in my container yaml files to add minio-client and do all of this myself (rough sketch below), what would folks here think about discourse/discourse_docker having an image/base/install-minio-client script and a standard template that puts a .mc/config.json in place, adds a runit file, and allows fairly easy streaming backup and recovery based on container configuration? It would obviously be an advanced configuration that comes with a :warning: in the documentation and isn’t turned on by default, but since I’ll probably do the work somewhere at some point, doing it in discourse/discourse_docker would make it more accessible than just hacking up my data.yml file. The cost would be minio-client in the base image, which is about 21MB, roughly twice the size of the redis-server binary.
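For the DIY exec: route, I’d expect something like this as an after_code hook in the container yaml (download URL and install path are from memory, so treat them as approximate):

```
hooks:
  after_code:
    - exec:
        cmd:
          - wget -qO /usr/local/bin/mc https://dl.min.io/client/mc/release/linux-amd64/mc
          - chmod +x /usr/local/bin/mc
```

The proposed install-minio-client script plus a template would just bake that step (along with the .mc/config.json and runit pieces) into something you can enable from app.yml instead of everyone rolling their own.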

Not a promise to do it, just curious whether it might be accepted should I actually do the work. :relaxed:

Edit: An alternative would be to copy the files to a separate directory and then either use a tool like minio-client outside the container to stream the data to the remote location, or mount a remote filesystem like s3fs at the location the files are copied to. That might be a simpler, more flexible configuration with the same end result, and without the weight of carrying minio-client in every discourse image.
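In that variant, archive_command only has to copy each finalized WAL segment into a bind-mounted directory, and whatever runs on the host takes it from there; a rough sketch, with placeholder directory names:

```
# Inside the container (postgresql.conf): drop each finalized WAL file
# into a directory that is bind-mounted from the host
archive_command = 'cp %p /shared/wal-archive/%f'

# On the host: stream anything that lands there to the remote bucket
mc mirror --watch /var/discourse/shared/standalone/wal-archive s3backup/discourse-wal
```

A real archive_command should be a bit more careful (e.g. refuse to overwrite an existing file, and return non-zero on failure so postgres retries), but that’s the shape of it.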
