How can I migrate files from the old upload scheme (default/XXX) to the new one (default/original/{1,2}X)?

I’ve been running a self-hosted Discourse site for about four and a half years now (since September 2014), and I just received a report of several broken image links on my site. After looking into the issue, I noticed that the paths to all of the offending images were in the format uploads/default/XXX/, where XXX is a three-digit number (e.g. uploads/default/240/ and uploads/default/247/). Sure enough, the corresponding directories under /var/discourse/shared/standalone/uploads/default/ were empty, but the problem was more extensive than I realized: almost all the numbered directories under /var/discourse/shared/standalone/uploads/default appeared to be empty, as well!

Thankfully, it looks like most of the images had been moved to their corresponding folders in the “tombstone” directory (i.e. /var/discourse/shared/standalone/uploads/tombstone/default/XXX/) and should thus theoretically be recoverable by following @tgxworld’s instructions:

./launcher enter app
rails c
require_dependency "upload_recovery"
UploadRecovery.new.recover

The call to UploadRecovery.new.recover completed fairly quickly, but unfortunately it did not restore the uploaded files to their original locations. Running UploadRecovery.new(dry_run: true).recover did not return any entries, either, so I was left with the impression that there isn’t a problem with the database (though I could be wrong!).

Anyway, after looking around a bit more I noticed that many of the newer files seemed to be uploaded under the uploads/default/original/1X/ or uploads/default/original/2X/ directories rather than uploads/default/XXX/. I wonder…could it just be that the canonical location for uploads has changed, and Discourse is now simply discarding anything that isn’t stored under uploads/default/original/{1,2}X?

For what it’s worth, all of the affected directories (i.e. /var/discourse/shared/standalone/uploads/default/XXX/ and /var/discourse/shared/standalone/uploads/tombstone/default/XXX/) and their files appear to have the same “last-modified” timestamp—11:14 a.m. on January 18—which just so happens to be a few hours after I upgraded to v2.2.0.beta8 (though I couldn’t tell you the exact commit number). Interestingly enough, I don’t see this timestamp on any of the subdirectories of uploads/default/original/{1,2}X/.

In summary:

  • Is there any reason that images stored in directories like /var/discourse/shared/standalone/uploads/default/240/ would be automatically moved to the tombstone directory even if they are still being actively referenced by existing posts?

  • Is it safe to just copy the missing images from /var/discourse/shared/standalone/uploads/tombstone/default/XXX/ to /var/discourse/shared/standalone/uploads/default/XXX/, or is there another command I should run to accomplish this?

  • If it isn’t safe to copy the missing images and there isn’t a command to accomplish this, how can I restore the images and/or update all the posts that reference them? (Should I migrate them to uploads/default/original/{1,2}X/ somehow?)

Thanks!

Didn’t you run into this too @sam? Old installs have this issue.

Looks like you have images that were not migrated to the new scheme.
Have you disabled the migrate to new scheme site setting?

It’s super safe, except it won’t last long :wink: If you don’t also update the database, then they’ll be moved back to the tombstone.

If you want to restore images, it’s better to use the uploads:recover_from_tombstone rake task.


Interesting…I couldn’t find a “migrate to new scheme” site setting via the web interface; I assume you’re talking about a setting that can only be accessed from a Rails console within the Docker container? i.e.

cd /var/discourse/
sudo ./launcher enter app
rails c
> SiteSetting.migrate_to_new_scheme

In that case, you’re right: for some reason SiteSetting.migrate_to_new_scheme was false, but—having never heard of that setting before—I certainly never disabled it myself! Should it have automatically been changed to true at some point?

Back in July 2016, it looks like you could fix the problem by following these steps:

# copy the "deleted" images from their to-be-deleted staging area
cd /var/discourse
./launcher enter app
rake uploads:recover_from_tombstone

# rebake to see that it doesn't happen again
rake posts:rebake

Then, in October 2018, @tgxworld said that those instructions were “stale” and that this is the new procedure:

./launcher enter app
rails c
require_dependency "upload_recovery"
UploadRecovery.new.recover

Now, just three days ago, @sam suggested the following:

./launcher enter app
rails c
> SiteSetting.migrate_to_new_scheme = true
# ... wait a day
rake posts:rebake

Which of these three methods is considered best practice?


My bad, it’s a manual process.

So, the UploadRecovery class that @tgxworld built is an improvement over the uploads:recover_from_tombstone rake task. They both do the same thing, but UploadRecovery works with both local and S3 storage, whereas the rake task only works with local storage.

What @sam suggested won’t restore uploads that were put in the tombstone but is needed to be able to restore them.

So, here’s what you should do:

./launcher enter app
rails c
SiteSetting.migrate_to_new_scheme = true
Jobs::MigrateUploadScheme.new.execute(nil)

You might need to execute the last line several times, as it only migrates 50 uploads at a time.
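
Since the job handles a limited batch per call, the repeated runs can be wrapped in a small loop. The following is only a sketch: `run_until_done`, `pending`, `run_batch`, and `max_runs` are hypothetical names introduced here, not part of Discourse. The helper itself is plain Ruby; the Discourse calls appear only in the comments and are the ones quoted in this topic.

```ruby
# Hypothetical helper (not part of Discourse): call `run_batch` until
# `pending` reports nothing left, or until `max_runs` is reached. The cap
# guards against looping forever if some records can never be migrated.
# Returns the number of batches that were run.
def run_until_done(pending:, run_batch:, max_runs: 1_000)
  runs = 0
  while pending.call && runs < max_runs
    run_batch.call
    runs += 1
  end
  runs
end

# Inside the Discourse Rails console, the two callables might look like
# this (query and job call as quoted in this topic):
#
#   pending   = -> { Upload.by_users.where("url NOT LIKE '//%' AND url NOT LIKE '/uploads/default/original/_X/%'").exists? }
#   run_batch = -> { Jobs::MigrateUploadScheme.new.execute(nil) }
#   run_until_done(pending: pending, run_batch: run_batch)
```

If the loop stops only because it hit `max_runs`, some uploads probably cannot be migrated by the job and need a closer look.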

Then, you can do

./launcher enter app
rake posts:rebake
rake uploads:recover

Oh, okay. Does this mean that I will also need to disable the setting (i.e. run SiteSetting.migrate_to_new_scheme = false) after the migration process is complete?

Oh, neat! Thanks for the clarification. :+1:

I think I understand what you mean, but just to be clear: are you saying that changing the value of SiteSetting.migrate_to_new_scheme to true will tell Discourse to migrate all uploads to the new storage scheme, but it won’t touch any uploads that have already been placed in the tombstone directory (i.e. you need to move uploads out of the tombstone directory before they can be migrated)?

Hmm…do I need to run Jobs::MigrateUploadScheme.new.execute(nil) manually, or will it automatically be scheduled to run eventually (hence @sam’s advice to “wait a day”)?

In any event, I changed SiteSetting.migrate_to_new_scheme to true from the Rails console in my Discourse container yesterday. Interestingly enough, all of the numbered subdirectories under /var/discourse/shared/standalone/uploads/default/ (e.g. 100, 101, 102, 103, 104, and 105) still appear to exist, and I found seven messages like the following in my error logs:

Job exception: undefined method `unlink' for #<File:0x00007f368703ab00>

Here’s a backtrace, in case you’re interested:

/var/www/discourse/app/models/optimized_image.rb:410:in `block in migrate_to_new_scheme'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/activerecord-5.2.2/lib/active_record/relation/delegation.rb:71:in `each'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/activerecord-5.2.2/lib/active_record/relation/delegation.rb:71:in `each'
/var/www/discourse/app/models/optimized_image.rb:367:in `migrate_to_new_scheme'
/var/www/discourse/app/jobs/scheduled/migrate_upload_scheme.rb:28:in `execute'
/var/www/discourse/app/jobs/base.rb:196:in `block (2 levels) in perform'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/rails_multisite-2.0.6/lib/rails_multisite/connection_management.rb:63:in `with_connection'
/var/www/discourse/app/jobs/base.rb:185:in `block in perform'
/var/www/discourse/app/jobs/base.rb:181:in `each'
/var/www/discourse/app/jobs/base.rb:181:in `perform'
/var/www/discourse/app/jobs/base.rb:243:in `perform'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/mini_scheduler-0.9.1/lib/mini_scheduler/manager.rb:82:in `process_queue'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/mini_scheduler-0.9.1/lib/mini_scheduler/manager.rb:30:in `block in initialize'

At this point, would it still be safe for me to run the following commands or should I hold off until I have investigated the errors above?

cd /var/discourse/
sudo ./launcher enter app
rails c
> Jobs::MigrateUploadScheme.new.execute(nil)
[...]
> Jobs::MigrateUploadScheme.new.execute(nil)
> exit
rake posts:rebake
rake uploads:recover

Thanks again for all your help!


Won’t hurt if you leave it enabled.

Exactly :ok_hand:

Both will work, I just suggested a way to fix it ASAP :wink:

That job is scheduled every 10 minutes and only works when the migrate_to_new_scheme site setting is enabled.

Thanks, I fixed that bug.

Yes, it’s safe. The error only happened for “optimized images” which you can always regenerate if you have the original :wink:


Could you explain what this is? Should everyone do it, or only people with older installs? How would we know?

Only very old (> 4 years old) Discourse instances.

You can check by doing

./launcher enter app
rails c
Upload.where("url NOT LIKE '//%' AND url NOT LIKE '/uploads/default/original/_X/%'").exists?

If this returns true, then you have images using the old scheme.
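
To see which URLs that query counts as "old scheme", here is a rough plain-Ruby equivalent of its two LIKE patterns. It is a sketch for illustration only: `NEW_SCHEME`, `EXTERNAL`, and `old_scheme_url?` are names made up here, and the `//cdn.example.com` URL is hypothetical.

```ruby
# In SQL LIKE, `%` matches any run of characters and `_` matches exactly
# one character, so '_X' covers '1X', '2X', and so on.
NEW_SCHEME = %r{\A/uploads/default/original/.X/} # '/uploads/default/original/_X/%'
EXTERNAL   = %r{\A//}                            # '//%'

# True when a URL is neither protocol-relative nor stored under the
# new original/{1,2}X/ layout, i.e. when the query above would match it.
def old_scheme_url?(url)
  !url.match?(EXTERNAL) && !url.match?(NEW_SCHEME)
end

old_scheme_url?("/uploads/default/247/232ac9c9d98b6458.jpg")
# => true
old_scheme_url?("/uploads/default/original/1X/6a8bd2193a5c4703620836334fd47bb4a54b9005.jpg")
# => false
old_scheme_url?("//cdn.example.com/some-image.png")
# => false
```

Note that anything else that is not under original/_X/ also counts as "old" by this test, whether or not it was ever a numbered-directory upload.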


Thanks for the clarification, @zogstrip! I just ran rake posts:rebake and rake uploads:recover in my Docker container, but so far it doesn’t look like any of my older uploads have been migrated to the new storage scheme. Of course, even though the rake posts:rebake command only took a few minutes to complete, I suppose it may have simply been queueing up rebake tasks—if so, will I have to wait a while for those to complete?

In the meantime, I’d like to be perfectly clear about what I’m trying to accomplish here. I have a lot of lines like the following in the Markdown source of older posts on my site:

<img src="/uploads/default/247/232ac9c9d98b6458.jpg" width="536" height="500"> 

(This will load the file saved at /var/discourse/shared/standalone/uploads/default/247/232ac9c9d98b6458.jpg.)

As far as I can tell, none of these image URLs have been changed to the following format, which can be found in the Markdown source of newer posts:

<img src="/uploads/default/original/1X/6a8bd2193a5c4703620836334fd47bb4a54b9005.jpg" width="375" height="500"> 

(This will load the file saved at /var/discourse/shared/standalone/uploads/default/original/1X/6a8bd2193a5c4703620836334fd47bb4a54b9005.jpg.)

It seems to me that there are a few things to be done here:

  1. Rehash older uploads and store them under uploads/default/original/1X/.
  2. Rewrite the Markdown source of older posts to reference the newly hashed files under uploads/default/original/1X/.
  3. Regenerate the HTML versions of these older posts with the new image URLs.
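
Steps 1 and 2 can be sketched in plain Ruby as follows. This is an illustration of the idea, not how Discourse implements it. It assumes that the new file name is the SHA1 hex digest of the file's contents (the 40-character name above is consistent with that, but it is an assumption) and it always uses 1X. `new_scheme_url`, `rewrite_raw`, and `mapping` are hypothetical names.

```ruby
require "digest"

# Step 1 (ASSUMPTION: name = SHA1 of the contents, always under 1X):
# build the new-scheme URL for a file's bytes and extension.
def new_scheme_url(data, extension)
  "/uploads/default/original/1X/#{Digest::SHA1.hexdigest(data)}.#{extension}"
end

# Step 2: replace old URLs with new ones in a post's raw source.
# `mapping` is a hypothetical Hash of old URL => new URL.
def rewrite_raw(raw, mapping)
  mapping.reduce(raw) do |text, (old_url, new_url)|
    # Block form so the replacement is inserted literally.
    text.gsub(old_url) { new_url }
  end
end

# Usage with placeholder bytes standing in for the real image file:
old_url = "/uploads/default/247/232ac9c9d98b6458.jpg"
raw     = %(<img src="#{old_url}" width="536" height="500">)
puts rewrite_raw(raw, { old_url => new_scheme_url("image bytes", "jpg") })
```

Step 3 is not sketched here; in this topic, regenerating the HTML is what rake posts:rebake is used for.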

I had thought that rake posts:rebake would rewrite these URLs in the Markdown source and regenerate the resulting HTML, and that either rake uploads:recover or Jobs::MigrateUploadScheme.new.execute(nil)—possibly both—would rehash and relocate the uploaded files, but perhaps I’m misunderstanding how the process works. If so, could you help me figure out what’s really going on here? :wink:

Interestingly enough, when I try to run this I encounter the following error:

NameError: undefined local variable or method `db' for main:Object
from (pry):2:in `__pry__'

Am I missing something here?

It’s my bad, I copy-pasted directly from the code and forgot to remove the db variable interpolation.

The query should be

Upload.by_users.where("url NOT LIKE '//%' AND url NOT LIKE '/uploads/default/original/_X/%'").exists?

As for your issue, I’ll have to dig into the code to see exactly what’s going on.


Oh, no worries! In hindsight, I probably should have noticed that myself.

You know, it’s funny…I just ran the updated query and Discourse seems to think that I don’t have any images using the old upload scheme:

[1] pry(main)> Upload.where("url NOT LIKE '//%' AND url NOT LIKE '/uploads/default/original/_X/%'").exists?
=> false

Of course, that’s despite the fact that older posts still have a lot of lines like the following in their Markdown source:

<img src="/uploads/default/247/232ac9c9d98b6458.jpg" width="536" height="500">

¯\_(ツ)_/¯

Awesome, thanks! :sparkles:


But that’s true for new sites as well, because of stock images like:

=> [#<Upload:0x000055f79be02ac0
  id: -1,
  user_id: -1,
  original_filename: "d-logo-sketch.png",
  filesize: 14461,
  width: nil,
  height: nil,
  url: "/images/d-logo-sketch.png",
  created_at: Thu, 14 Mar 2019 12:13:08 UTC +00:00,
  updated_at: Thu, 14 Mar 2019 12:13:08 UTC +00:00,
  sha1: nil,
  origin: nil,
  retain_hours: nil,
  extension: "png",
  thumbnail_width: nil,
  thumbnail_height: nil,
  etag: nil>,

So I think that test should be:

Upload.where("url NOT LIKE '//%' AND url NOT LIKE '/uploads/default/original/_X/%' and id>0").exists?

Best I can tell, running Jobs::MigrateUploadScheme.new.execute(nil) won’t fix them. And that keeps rake uploads:migrate_from_s3 from running.


Any updates on this, @zogstrip? Specifically…

Does this sound correct to you, or am I misunderstanding how Discourse handles the migration of uploaded files?

Thanks!

Yes, this was recently added by @tgxworld. I’ve fixed the query in my post and in the uploads:migrate_to_s3 rake task.

Upload.by_users.where("url NOT LIKE '//%' AND url NOT LIKE '/uploads/default/original/_X/%'").exists?

Didn’t get a chance to look at it just yet…


But I installed Discourse last year. However, when I ran your code, it returned true.

I have these kinds of links:

The team has made some great progress with this:


Thanks, @watchmanmonitor! I’ll post a reply in that topic. :+1:
