While doing research, I found that the Gem::Package::TarWriter class (Ruby 2.0.0) is already available in Discourse via require 'rubygems/package'.
Using this should allow Discourse to take backups without needing more than double the required disk space, because the entire archive is streamed to disk through an in-process tar and an in-process gzip.
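As a rough illustration of that layering, here is a standalone sketch (not Discourse code; the `stream_tar_gz` and `read_tar_gz` helpers and their arguments are made up for the example). Each file is copied straight through the tar writer into the gzip stream, so no intermediate .tar file is written, and the second helper reads the archive back to check the result:

```ruby
require "rubygems/package"
require "zlib"

# Hypothetical helper: streams files into a .tar.gz at `target`.
# `files` maps archive-relative names to paths on disk.
def stream_tar_gz(target, files)
  File.open(target, "wb") do |destination|
    gz = Zlib::GzipWriter.new(destination, 5)
    tar = Gem::Package::TarWriter.new(gz)
    files.each do |name, path|
      # The size is declared up front, so the writer never has to seek back.
      tar.add_file_simple(name, 0644, File.size(path)) do |entry|
        File.open(path, "rb") { |src| IO.copy_stream(src, entry) }
      end
    end
    tar.close # writes the end-of-archive blocks
    gz.finish # flushes the gzip trailer without closing `destination`
  end
end

# Hypothetical helper: reads the archive back as { entry name => contents }.
def read_tar_gz(target)
  entries = {}
  Zlib::GzipReader.open(target) do |gz|
    Gem::Package::TarReader.new(gz) do |tar|
      tar.each { |entry| entries[entry.full_name] = entry.read if entry.file? }
    end
  end
  entries
end
```

At any moment only one copy buffer is held in memory, and the only thing written to disk is the final compressed archive.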
Usage should look like the following:
require "rubygems/package"
require "zlib"

destination = File.open(target_filename, "wb")
gz_stream = Zlib::GzipWriter.new(destination, 5) # compression level 5
@tar_writer = Gem::Package::TarWriter.new(gz_stream)
log "Archiving data dump..."
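# add_file rewinds the underlying IO to rewrite the header, which a GzipWriter
# cannot do, so declare the size up front with add_file_simple instead.
@tar_writer.add_file_simple "dump.sql.gz", 0644, File.size(@dump_filename) do |tf|
  File.open(@dump_filename, "rb") { |df| IO.copy_stream(df, tf) }
end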
rel_directory = File.join(Rails.root, "public")
upload_directory = File.join(rel_directory, "uploads", @current_db)
log "Archiving uploads..."
last_progress = Time.now
files_since_progress = 0
Dir[File.join(upload_directory, "**/*")].each do |file|
  stat = File.stat(file)
  # Strip the prefix together with its trailing slash so archive paths stay relative.
  relative = file.delete_prefix("#{rel_directory}/")
  if stat.directory?
    @tar_writer.mkdir relative, stat.mode
  else
    files_since_progress += 1
    if files_since_progress > 100 || last_progress < 15.seconds.ago
      log "Archiving #{file}"
      files_since_progress = 0
      last_progress = Time.now
    end
    # add_file_simple takes the size up front, so the gzip stream never has to seek.
    @tar_writer.add_file_simple relative, stat.mode, stat.size do |tf|
      File.open(file, "rb") { |df| IO.copy_stream(df, tf) }
    end
  end
end
log "Finishing up archive..."
@tar_writer.close
gz_stream.close # GzipWriter#close also closes the underlying destination file
remove_tmp_directory
The above code does not have:
- proper error reporting
- progress indicators
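One way to approach both gaps, sketched under assumptions: the `archive_with_reporting` helper, its `log:` callable and the `every:` interval below are all hypothetical and not part of Discourse. The idea is to wrap the streaming in rescue/ensure so that a failure is logged, the streams are always closed, and a partial archive is not left behind:

```ruby
require "rubygems/package"
require "zlib"

# Hypothetical helper. `files` maps archive names to paths on disk; `log` is
# any callable taking a message string. Returns true on success, false otherwise.
def archive_with_reporting(target, files, log:, every: 100)
  destination = File.open(target, "wb")
  gz = Zlib::GzipWriter.new(destination, 5)
  tar = Gem::Package::TarWriter.new(gz)
  done = 0
  files.each do |name, path|
    tar.add_file_simple(name, 0644, File.size(path)) do |entry|
      File.open(path, "rb") { |src| IO.copy_stream(src, entry) }
    end
    done += 1
    # Report every `every` files, and once more when the last file is done.
    if (done % every).zero? || done == files.size
      log.call("Archived #{done}/#{files.size} files")
    end
  end
  tar.close
  gz.close # also closes `destination`
  success = true
rescue StandardError => e
  log.call("Archiving failed: #{e.class}: #{e.message}")
  false
ensure
  # Close whatever is still open, then drop a partial archive on failure.
  begin
    gz.close if gz && !gz.closed?
  rescue StandardError
    nil
  end
  destination.close if destination && !destination.closed?
  File.delete(target) if !success && File.exist?(target)
end
```

A caller could pass `log: method(:log)` to reuse an existing logger. Reporting by file count is only one option; tracking bytes copied would give a smoother indicator for sites with a few very large uploads.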