But I can’t list the files, because they are in your bucket and I’m pretty sure I need credentials to get a listing.
rake uploads:fix_missing_s3
seems to have pulled (most?) things to the local filesystem (uploads are not yet on S3 for this site).
So I did this to fix up the uploads:
def fix_bad_uploads(bad_uploads)
  fixed = 0
  retrieved = 0
  missing = 0
  bad_bucket = "//discourse-cloud-file-uploads.s3.dualstack.us-west-2.amazonaws.com/business6/uploads/forumosa"
  bad_uploads.each do |upload|
    url = URI.parse("https:" + upload.url)
    upload.url = upload.url.gsub(bad_bucket, "/uploads/default")
    if File.exist?("/shared#{upload.url}")
      # file is already on the local filesystem; just fix the record
      fixed += 1
      print "1"
      upload.save
      # posts = Post.where("raw like '%#{upload.short_url}%'")
      # posts.each do |post|
      #   post.rebake!
      #   print "."
      # end
    else
      begin
        # retrieve the missing file from the old bucket
        filename = "/shared#{upload.url}"
        dirname = File.dirname(filename)
        FileUtils.mkdir_p(dirname) unless File.directory?(dirname)
        Net::HTTP.start(url.host, url.port, use_ssl: true) do |http|
          resp = http.get(url.path)
          File.open(filename, "wb") do |file|
            file.write(resp.body)
          end
        end
        retrieved += 1
        print "+"
        upload.save if File.exist?(filename)
      rescue => e
        puts "bad: #{e}"
        missing += 1
        sleep 1
        print "0"
      end
    end
  end
end
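For context, the bad_uploads passed in can be collected with something like this (the LIKE pattern is just an assumption — any Upload whose url still points at the old bucket):

bad_uploads = Upload.where("url LIKE ?", "%discourse-cloud-file-uploads.s3.dualstack.us-west-2.amazonaws.com%")
fix_bad_uploads(bad_uploads)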
This fixed up most of them. But there seem to be some posts that have an upload:// entry for which there isn’t an Upload record in the database. Rebaking those ends up with a transparent.png.
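To see which ones those are, a rough sketch like this should list the upload:// short URLs in raw that have no matching Upload record (the regex is an assumption about the short URL format):

short_urls = Post.where("raw LIKE '%upload://%'").find_each.flat_map do |post|
  post.raw.scan(%r{upload://[A-Za-z0-9]+\.[A-Za-z0-9]+})
end.uniq

orphaned = short_urls.select do |short_url|
  sha1 = Upload.sha1_from_short_url(short_url)
  sha1.nil? || Upload.find_by(sha1: sha1).nil?
end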
So then I tried something like this:
def get_missing_short_url(short_url)
  prefix = "https://discourse-cloud-file-uploads.s3.dualstack.us-west-2.amazonaws.com/business6/uploads/forumosa/original/3X"
  remove_url = "https://discourse-cloud-file-uploads.s3.dualstack.us-west-2.amazonaws.com/business6/uploads/forumosa/"
  sha1 = Upload.sha1_from_short_url(short_url)
  extension = short_url.split(".").last
  upload = Upload.find_by(sha1: sha1)
  if !upload
    # no Upload record -- try to find the file in s3
    one = sha1[0]
    two = sha1[1]
    url_link = "#{prefix}/#{one}/#{two}/#{sha1}.#{extension}"
    puts "URL: #{url_link}"
    sleep 1
    url = URI.parse(url_link)
    filename = "/tmp/#{File.basename(url_link.gsub(remove_url, "/shared/uploads/default/"))}"
    dirname = File.dirname(filename)
    FileUtils.mkdir_p(dirname) unless File.directory?(dirname)
    # download the file from the old bucket
    File.open(filename, "wb") do |file|
      Net::HTTP.start(url.host, url.port, use_ssl: true) do |http|
        resp = http.get(url.path)
        file.write(resp.body)
      end
    end
    # make an Upload record for the file
    File.open(filename, "rb") do |file|
      upload = UploadCreator.new(
        file,
        File.basename(filename),
      ).create_for(Discourse.system_user.id)
    end
    if upload.persisted?
      puts "We did it! #{upload.id}"
    else
      puts "darn. #{upload.errors.full_messages}"
      sleep 5
    end
  end
  upload
end
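And then something like this drives it, assuming the orphaned list from the sketch above, and rebakes the affected posts afterwards:

orphaned.each do |short_url|
  upload = get_missing_short_url(short_url)
  next unless upload&.persisted?
  Post.where("raw LIKE ?", "%#{short_url}%").find_each(&:rebake!)
end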
That mostly works, but in my tests I sometimes fail to infer the correct S3 URL from the sha1 that I infer from the short URL, and I’m not sure how to fix that. Also, one of them somehow ended up with a sha1 that was different from the one in the filename of the S3 path.
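A check like this inside get_missing_short_url, right after the download (plain Digest::SHA1, nothing Discourse-specific), would at least surface those mismatches instead of quietly creating an Upload under a different sha1:

require "digest"

expected_sha1 = Upload.sha1_from_short_url(short_url)
actual_sha1 = Digest::SHA1.file(filename).hexdigest
if actual_sha1 != expected_sha1
  puts "sha1 mismatch for #{short_url}: expected #{expected_sha1}, got #{actual_sha1}"
end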
My current thinking now is to start by going through all of the cooked posts, getting all of the https://discourse-cloud-file-uploads URLs, and then updating the Upload records that refer to them and creating the ones that are missing.
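Something like this for the first step (a sketch; the host and the regex are just based on the URLs above):

bad_host = "discourse-cloud-file-uploads.s3.dualstack.us-west-2.amazonaws.com"
bad_urls = []
Post.where("cooked LIKE ?", "%#{bad_host}%").find_each do |post|
  bad_urls.concat(post.cooked.scan(%r{https://#{Regexp.escape(bad_host)}[^"'\s)]+}))
end
bad_urls.uniq!
puts "#{bad_urls.size} distinct old-bucket URLs still referenced in cooked"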
Am I missing something obvious?