While researching this, I found that Gem::Package::TarWriter is already available to Discourse via require 'rubygems/package': http://ruby-doc.org/stdlib-2.0.0/libdoc/rubygems/rdoc/Gem/Package/TarWriter.html
Using it should let Discourse take backups without needing more than double the required disk space, because the entire archive is streamed to disk through an in-process tar and an in-process gzip instead of being assembled in intermediate files.
Usage should look like the following:
destination = File.open(target_filename, "wb")
gz_stream = Zlib::GzipWriter.new(destination, 5)
@tar_writer = Gem::Package::TarWriter.new(gz_stream)
log "Archiving data dump..."
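FileUtils.cd(File.dirname(@dump_filename)) do
  # GzipWriter cannot seek, so use add_file_simple (size known up front)
  # rather than add_file, which rewinds the stream to fill in the header.
  @tar_writer.add_file_simple "dump.sql.gz", 0644, File.size(@dump_filename) do |tf|
    File.open(@dump_filename, "rb") do |df|
      IO.copy_stream(df, tf)
    end
  end
end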
rel_directory = File.join(Rails.root, "public")
upload_directory = File.join(rel_directory, "uploads", @current_db)
log "Archiving uploads..."
last_progress = Time.now
files_since_progress = 0
Dir[File.join(upload_directory, "**/*")].each do |file|
  stat = File.stat(file)
  # Strip the separator as well, so archive paths stay relative ("uploads/...").
  relative = file.delete_prefix("#{rel_directory}/")
  if stat.directory?
    @tar_writer.mkdir relative, stat.mode
  else
    files_since_progress += 1
    if files_since_progress > 100 || last_progress < 15.seconds.ago
      log "Archiving #{file}"
      files_since_progress = 0
      last_progress = Time.now
    end
    # The gzip stream is not seekable, so pass the file size up front.
    @tar_writer.add_file_simple relative, stat.mode, stat.size do |tf|
      File.open(file, "rb") { |df| IO.copy_stream(df, tf) }
    end
  end
end
log "Finishing up archive..."
@tar_writer.close
# GzipWriter#close also closes the underlying destination file.
gz_stream.close
remove_tmp_directory
The above code does not have:
- proper error reporting
- progress indicators