Re-adding missing uploads to the database

I’ve got a site where a bunch of uploads seem to have been removed from the uploads table in the database but still exist in the filesystem, which leaves the links to them broken. I’ve fixed a few by re-uploading the files into a new topic to recreate their records in the DB (at the time I thought the files were missing too, but @angus graciously pointed out why I wasn’t finding them; one day I’ll ask why we have both sha1 and base62 names for all of these assets) and then rebaking the posts that include those uploads, like this:

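def rebake_posts_with_uploads(topic_id, post_number)
  # The post that the files were re-uploaded into
  p = Post.find_by(topic_id: topic_id, post_number: post_number)
  return unless p # `exit` would end the whole console session
  # Capture each short-URL filename, e.g. "abc123.png" from upload://abc123.png)
  re = /upload:\/\/(.+?)\)/
  shas = p.raw.scan(re)

  shas.each do |sha|
    # Bind the value instead of interpolating it into the SQL, and skip p itself
    posts = Post.where("raw LIKE ?", "%#{sha[0]}%").where.not(id: p.id)
    puts "Found #{posts.count} #{sha[0]}"
    posts.each do |post|
      puts "rebake #{post.id}--#{Discourse.base_url}/t/-/#{post.topic_id}/#{post.post_number}"
      post.rebake!
    end
  end
end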

My new plan, since it seems that the files exist in the file system but not in the database, is to do something like this:

Dir.glob("/shared/uploads/default/original/1X/**/*").each do |file|
  next unless File.file?(file)
  next if Upload.exists?(sha1: Upload.generate_digest(file)) # already in the uploads table
  # TODO: create upload record
  # TODO: rebake each post that includes that upload
end

Does that seem right? I’m looking at uploads.rake and don’t see anything that seems to do this already. This is sort of the opposite of

but instead of FileUtils.rm(file_path) I’d do an Upload.create, I think.

If this seems really stupid or there is a much better solution, I’d love to hear it before I go down this little rabbit hole.

Thanks.

I don’t know how this happened. I was hoping to pin the blame on a custom plugin, but I’m afraid that’s not the case. It may be related to another discussion, in which someone said:


Yeah, our history here with uploads is quite spotty, and it is our fault… @sam, can you recommend someone to give a quick bit of advice?


This can work; you just have to be careful to test locally first. Also look at the 2X / 3X directories: there are uploads everywhere.


I’m trying something like this:

def add_missing_files_to_uploads(pattern = "*.pdf", debug: false)
  public_directory = Rails.root.join("public").to_s
  db = RailsMultisite::ConnectionManagement.current_db
  # Only original uploads; optimized images are tracked in a separate table
  uploads_directory = File.join(public_directory, "uploads", db, "original")
  missing = 0
  matched = 0
  Dir.glob("#{uploads_directory}/**/#{pattern}").each do |file_path|
    sha1 = Upload.generate_digest(file_path)
    url = file_path.split(public_directory, 2)[1]
    # Matched if the uploads table knows this file by digest or by URL
    if Upload.where(sha1: sha1).or(Upload.where(url: url)).exists?
      matched += 1
    else
      puts "MISSING #{file_path}" if debug
      missing += 1
    end
  end
  puts "MISS: #{missing}. Match #{matched}"
end

Does that look sort of close? My first test seems to have failed, but I might have screwed something up.
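For comparison, the core of that check (walk every depth directory, hash each file, keep the ones whose digest the uploads table doesn't know) can be sketched in plain Ruby without Rails. This is a minimal sketch: `find_orphans` is a hypothetical helper, and it assumes the known digests have already been pulled from the database (in a Rails console, something like `Upload.pluck(:sha1)`) and that the stored sha1 is the plain SHA1 of the file contents.

```ruby
require "digest"
require "set"

# Hypothetical helper: returns the files under `root` whose SHA1 digest
# is not in `known_sha1s` (e.g. the sha1 column of the uploads table).
def find_orphans(root, known_sha1s)
  known = known_sha1s.to_set
  # "**" descends into every depth directory (1X, 2X, 3X, ...)
  Dir.glob(File.join(root, "**", "*")).sort.each_with_object([]) do |path, orphans|
    next unless File.file?(path)
    sha1 = Digest::SHA1.file(path).hexdigest
    orphans << path unless known.include?(sha1)
  end
end
```

Keeping the hashing pass separate from anything that writes to the database means the list of orphans can be reviewed before a single record is created.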

Hi @pfaffman. I was reading this topic because it looks like a promising solution for my problem here.
Did you manage to get to a good result in the end?

Sorry, I can’t remember.

It looks to Sam and me like it’ll work.

I just solved a similar problem by creating a post, inserting links to all of the images into it, and then rebaking all of the posts that contain transparent.gif.

This looks like a somewhat more elegant solution.

I would say make a database-only backup and give it a try. I’m sort of on vacation, but if you have a budget I can see what I can do.

And I too have an important client on ams3; I haven’t moved them to AWS yet, but I think that’s happening soon. A friend who worked for DigitalOcean recommended ams3 because it was their best data center and was largely underutilized (this was long ago now). That didn’t work out as I’d hoped.


I wonder whether this is still the best method nowadays?

I’m trying to recover uploads that are on the filesystem but not in the database after a bad S3 migration (with the old rake:s3_migrate).

There isn’t really a good method. :crying_cat:

But, yeah. That’s still the idea. I don’t think anything has changed in Discourse since then. Typically, I’d do a couple by hand to make sure that it does what’s expected. Also, make a backup first, and maybe put the site in read-only mode so that if you do need to restore the backup you won’t have to throw away any posts made while you were mucking with things.

Without knowing a good bit more than is easy to communicate in a forum, I can’t tell if that’s really the best way or if there’s something simpler. You might be able to just gsub the paths of the URLs, for example. If you’ve got a budget, you can contact me or ask in Marketplace.
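To make the "just gsub the paths" idea concrete, here is a minimal sketch that rewrites old S3 URLs in a post's raw text to local upload paths and reports how many it changed. Both prefixes are hypothetical placeholders (the real ones depend on the bucket and the site), and on a live site the rewritten raw would still need to be saved and the post rebaked.

```ruby
# Hypothetical prefixes: substitute the site's real bucket URL and local path.
OLD_PREFIX = "https://example-bucket.s3.amazonaws.com/original/"
NEW_PREFIX = "/uploads/default/original/"

# Returns [new_raw, count]. String patterns are matched literally,
# so dots in the bucket hostname need no escaping.
def remap_upload_urls(raw, old_prefix = OLD_PREFIX, new_prefix = NEW_PREFIX)
  count = raw.scan(old_prefix).size
  [raw.gsub(old_prefix, new_prefix), count]
end
```

Returning the count makes it easy to dry-run over many posts and only touch the ones where it is non-zero.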


Okay, I solved it with Claude and a lot of praise. I’m sharing what I did in order to help anyone else with a similar or the same issue.

I’m not sure it’s the cleverest or most efficient method, just the one that worked for me.

Please be careful, and keep in mind that I’m not an expert, just a novice who is always learning.

The issue (S3 → local filesystem)

After migrating from AWS S3 to the local filesystem, a lot of images displayed as transparent.png. The files were on disk all along, but Discourse couldn’t resolve them.

The root cause was a broken chain:

  1. Posts reference uploads by upload:// short URLs (the base62-encoded SHA1).
  2. The uploads table in the database maps each SHA1 to a local file path.
  3. The filesystem stores each file under its SHA1 hash.

The migration moved files to disk correctly, but no uploads DB records existed. Without a matching record, Discourse falls back to transparent.png.
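The first link in that chain is just an encoding: the short URL is the file's SHA1 read as a big integer and written in base 62. A standalone sketch of both directions can help when checking by hand which file an upload:// link points to. The alphabet (digits, then A-Z, then a-z) is an assumption about what Discourse's Base62 dependency uses, so compare against Upload.sha1_from_short_url on a known upload before relying on it; the helper names are hypothetical.

```ruby
# Assumed alphabet: digits, then uppercase, then lowercase.
ALPHABET = (("0".."9").to_a + ("A".."Z").to_a + ("a".."z").to_a).join.freeze
SHA1_HEX_LENGTH = 40

def base62_encode(int)
  return ALPHABET[0] if int.zero?
  digits = []
  while int.positive?
    int, rem = int.divmod(62)
    digits << ALPHABET[rem]
  end
  digits.reverse.join
end

def base62_decode(str)
  str.each_char.reduce(0) { |acc, ch| acc * 62 + ALPHABET.index(ch) }
end

# Hypothetical: build "upload://<base62>.<ext>" from a hex SHA1.
def short_url_for(sha1_hex, ext)
  "upload://#{base62_encode(sha1_hex.hex)}.#{ext}"
end

# Hypothetical: recover the hex SHA1, restoring leading zeros lost in the integer.
def sha1_from_short_url(url)
  m = url.match(%r{\Aupload://([0-9A-Za-z]+)})
  return nil unless m
  hex = base62_decode(m[1]).to_s(16)
  hex.length > SHA1_HEX_LENGTH ? nil : hex.rjust(SHA1_HEX_LENGTH, "0")
end
```

The left-padding step matters: a SHA1 that starts with zeros encodes to a shorter base62 string, and without padding it would never match the sha1 column.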

The solution (create records and rebake)

# Enter container
./launcher enter app
rails c

Create missing upload records from orphan files:

dir = Rails.root.join("public", "uploads", "default", "original").to_s
created = 0

Dir.glob(File.join(dir, "**", "*")).select { |f| File.file?(f) }.each do |path|
  # Original uploads are stored as <sha1>.<ext>
  sha = File.basename(path, File.extname(path))
  next unless sha.match?(/\A\h{40}\z/)  # skip anything not named by a SHA1
  next if Upload.exists?(sha1: sha)     # record already present

  ext = File.extname(path).delete(".")
  relative = path.sub("#{Rails.root}/public", "")

  u = Upload.new
  u.sha1 = sha
  u.url = relative
  u.original_filename = File.basename(path)
  u.filesize = File.size(path)
  u.extension = ext
  u.user_id = -1 # system user; also used below to find (and undo) these records
  u.save!(validate: false)

  created += 1
  puts "Created upload #{u.id}: #{sha}"
end

puts "Total created: #{created}"
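One extra check that may be worth doing before trusting a filename: confirm the file's contents actually hash to the SHA1 in its name, since a renamed or truncated file would otherwise get a record pointing at the wrong bytes. A minimal plain-Ruby sketch; `sha1_name_matches?` is a hypothetical helper, and it assumes the <sha1>.<ext> naming used above, which may not hold for every upload (secure uploads, for instance, are stored under a randomized SHA1).

```ruby
require "digest"

SHA1_NAME = /\A\h{40}\z/

# Hypothetical helper: true when `path` is named <sha1>.<ext> and the
# file's contents really hash to that sha1.
def sha1_name_matches?(path)
  name = File.basename(path, File.extname(path))
  return false unless SHA1_NAME.match?(name)
  Digest::SHA1.file(path).hexdigest == name.downcase
end
```

Inside the loop above, this would slot in as `next unless sha1_name_matches?(path)`, at the cost of reading every file once.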

Rebake posts that reference restored uploads:

# Load the restored digests once, then scan each post once
# (instead of once per restored upload), rebaking each post at most once
restored = Upload.where(user_id: -1).pluck(:sha1).to_set
fixed_posts = 0

Post.where("raw LIKE '%upload://%'").find_each do |p|
  urls = p.raw.scan(/upload:\/\/[^\s\]\)]+/)
  next unless urls.any? { |url| restored.include?(Upload.sha1_from_short_url(url)) }

  p.rebake!
  fixed_posts += 1
  puts "Rebaked post #{p.id}"
end

puts "Total rebaked: #{fixed_posts}"

Regenerate missing optimized:

After fixing the original files, the optimized images (1X, 2X, etc.) need to be regenerated.

Run this with rake from a shell inside the Discourse container; it does not work from the Rails console.

rake uploads:regenerate_missing_optimized

[OPTIONAL] Still missing optimized

If rake uploads:regenerate_missing_optimized did not solve all of the file issues and there are still missing files:

# Enter container
./launcher enter app
rails c
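missing = 0
OptimizedImage.find_each do |oi|
  # Assumes local storage, where oi.url is a path such as /uploads/...
  path = "#{Rails.root}/public#{oi.url}"
  unless File.exist?(path)
    missing += 1
    oi.delete # delete skips callbacks: only the database row is removed
  end
end
puts "Deleted #{missing} broken optimized records"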

Then exit rails and run again:

rake uploads:regenerate_missing_optimized

Safe rollback (just in case)

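All created records use user_id: -1, and delete_all skips callbacks, so the files on disk are left untouched. Note that the system user (-1) may already own uploads that predate this fix, so compare Upload.where(user_id: -1).count with the "Total created" figure before running this. To undo:

Upload.where(user_id: -1).delete_all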

I had previously used destroy_all by mistake, and it triggered the callbacks that moved the files to the tombstone directory.

I recovered the individual file I had been testing with and then reworked my approach.
