Migrating S3/Spaces non-image uploads to local

I have read the following page:

So I looked at lib/tasks/uploads.rake:migrate_from_s3 and found:

    .where("raw LIKE '%.s3%.amazonaws.com/%' OR raw LIKE '%(upload://%'")

However, I have noticed that video uploads don’t get the raw upload:// pseudo-protocol, but instead just end up as literal links into the storage provider (in my case, DigitalOcean Spaces).

It seems obvious that I’ll have to modify this task to succeed.

Would it make more sense to look at SiteSetting.s3_endpoint and SiteSetting.s3_upload_bucket either instead of, or in addition to, the literal reference to Amazon?
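
To make the question concrete, here is a minimal standalone sketch of an endpoint-aware matcher. It is not the Discourse implementation: `upload_url_regex` is a hypothetical helper, and the endpoint value is passed in as a plain string rather than read from SiteSetting. It assumes the endpoint may be stored as a full URL, so it keeps only the host before embedding it in the pattern.

```ruby
require "uri"

# Hypothetical helper, not part of Discourse: builds a matcher for upload URLs
# hosted on AWS S3 or on a custom S3-compatible endpoint.
def upload_url_regex(endpoint)
  # The endpoint is assumed to possibly be a full URL such as
  # "https://nyc3.digitaloceanspaces.com"; keep only its host.
  host = URI.parse(endpoint).host || endpoint

  hosts = ["amazonaws\\.com"]
  # Skip an empty endpoint so the alternation never matches every host.
  hosts << Regexp.escape(host) unless host.to_s.empty?

  # Scheme-less URL, optional subdomains (for example the bucket name),
  # then original/ or optimized/, nested directories, and a 40-char hex SHA1.
  %r{//[\w.-]*(?:#{hosts.join("|")})/(?:original|optimized)/(?:[a-z0-9]+/)+\h{40}[\w.-]*}i
end

# Usage with made-up bucket and endpoint names.
regex = upload_url_regex("https://nyc3.digitaloceanspaces.com")
sha1 = "a" * 40
raw = "video: https://mybucket.nyc3.digitaloceanspaces.com/original/2X/a/#{sha1}.mp4"
puts raw[regex]
```

Stripping the scheme matters because the pattern itself starts at `//`; interpolating a value that still begins with `https://` after the subdomain part would never match.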

Are there tests for the tasks? I don’t see any. I have what might be something like the obvious fix, but no way to augment existing tests, and no easy way to non-destructively test. Which makes me uneasy…

index 0761c4712a..63f49155f3 100644
--- a/lib/tasks/uploads.rake
+++ b/lib/tasks/uploads.rake
@@ -129,12 +129,12 @@ def migrate_from_s3
 
   Post
     .where("user_id > 0")
-    .where("raw LIKE '%.s3%.amazonaws.com/%' OR raw LIKE '%(upload://%'")
+    .where("raw LIKE '%.s3%.amazonaws.com/%' OR raw LIKE '%#{SiteSetting.Upload.absolute_base_url}%' OR raw LIKE '%(upload://%'")
     .find_each do |post|
     begin
       updated = false
 
-      post.raw.gsub!(/(\/\/[\w.-]+amazonaws\.com\/(original|optimized)\/([a-z0-9]+\/)+\h{40}([\w.-]+)?)/i) do |url|
+      post.raw.gsub!(/(\/\/[\w.-]+(amazonaws\.com|#{Regexp.quote(SiteSetting.s3_endpoint)})\/(original|optimized)\/([a-z0-9]+\/)+\h{40}([\w.-]+)?)/i) do |url|
         begin
           if filename = guess_filename(url, post.raw)
             file = FileHelper.download("http:#{url}", max_file_size: max_file_size, tmp_file_name: "from_s3", follow_redirect: true)

Also, I expect from experience that even though all these images have already been optimized, the task will decide it needs to re-optimize all 50GB of images (96GB of total files, original plus optimized) as it moves them. That would take about 10 days, with email notifications turned off for our entire site the whole time. Since I don’t have a good way to test, I thought I’d ask whether that’s the case and, if it is, whether there is a way around it, so that I can just copy the already-optimized images down.

I can easily copy all the files to the local system using MinIO Client. I’m curious how hard it would be to just drop the files into place and modify the database to point to the new location, without re-optimizing all those images…
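
The "point to the new location" part could look roughly like the sketch below. It only shows the string rewrite for a single URL. `local_path_for` and `LOCAL_PREFIX` are hypothetical names, and the `/uploads/default` prefix is an assumption about the local store layout that should be checked against a real local upload. Which database columns would need the rewritten value is not covered here.

```ruby
# Assumed local layout; verify against an upload stored locally on your site.
LOCAL_PREFIX = "/uploads/default"

# Hypothetical helper: maps a bucket URL to a site-relative local path by
# keeping everything from "/original/" or "/optimized/" onward.
def local_path_for(remote_url, local_prefix: LOCAL_PREFIX)
  key = remote_url[%r{/(?:original|optimized)/.+\z}]
  # Returns nil when the URL does not look like an upload key.
  key && "#{local_prefix}#{key}"
end

# Usage with a made-up bucket URL.
puts local_path_for("//mybucket.nyc3.digitaloceanspaces.com/original/1X/abc123.mp4")
```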

https://github.com/discourse/discourse/pull/9809

I haven’t tested it at this point, but at least it’s shared as a PR instead of just a meta post.

Many more related fixes, now validated by a real migration, are in a new PR:

https://github.com/discourse/discourse/pull/10093
