Re-adding missing uploads to the database

I’ve got a site where a bunch of uploads seem to have been removed from the uploads table in the database but still exist in the filesystem, which leaves the links to them broken. I’ve fixed a few by re-uploading the files into a new topic to create the records in the DB and then re-baking the posts that include those uploads. (At the time I thought the files were missing too, but @angus graciously pointed out why I wasn’t finding them. One day I’ll ask: why do we have both sha1 and base62 names for all of these assets?) The re-baking looks like this:

def rebake_posts_with_uploads(topic_id, post_number)
  p = Post.find_by(topic_id: topic_id, post_number: post_number)
  return unless p # `exit` here would quit the whole Rails console
  # Capture whatever follows upload:// up to the closing parenthesis
  re = /upload:\/\/(.+?)\)/
  shas = p.raw.scan(re).flatten.uniq

  shas.each do |sha|
    # Other posts whose raw references the same upload (the source post is excluded)
    posts = Post.where("raw LIKE ?", "%#{sha}%").where.not(id: p.id)
    next if posts.empty?
    puts "Found #{posts.count} #{sha}"
    posts.each do |post|
      puts "rebake #{post.id}--#{Discourse.base_url}/t/-/#{post.topic_id}/#{post.post_number}"
      post.rebake!
    end
  end
end

My new plan, since it seems that the files exist in the file system but not in the database, is to do something like this:

for each file in /shared/uploads/default/original/1x
  unless file is in uploads table
    create upload record
    for each post that includes that upload record
      rebake

Does that seem right? I’m looking at uploads.rake and don’t see anything that seems to do this already. This is sort of the opposite of

but instead of FileUtils.rm(file_path) I’d instead do an Upload.create, I think.

If this seems really stupid or there is a much better solution, I’d love to hear it before I go down this little rabbit hole.

Thanks.

I don’t know how this happened. I was hoping to pin the blame on a custom plugin, but I’m afraid that’s not the case. It may be related to another discussion, in which someone said:

Yeah our history here with uploads is quite spotty, and it is our fault… @sam can you recommend someone to give a quick bit of advice?

This can work; you just have to be careful to test locally first. Also look at the 2x / 3x directories, there are uploads everywhere.

I’m trying something like this:

def add_missing_files_to_uploads(debug: false)
  public_directory = Rails.root.join("public").to_s
  db = RailsMultisite::ConnectionManagement.current_db
  uploads_directory = File.join(public_directory, 'uploads', db)
  missing = 0
  matched = 0
  # Only PDFs for this first test; widen the glob to cover other file types
  Dir.glob("#{uploads_directory}/**/*.pdf").each do |file_path|
    sha1 = Upload.generate_digest(file_path)
    # Path relative to public/, which is how local upload URLs are stored
    url = file_path.split(public_directory, 2)[1]
    if Upload.where(sha1: sha1).empty? && Upload.where(url: url).empty?
      puts "MISSING #{file_path}" if debug
      missing += 1
    else
      matched += 1
    end
  end
  puts "MISS: #{missing}. Match #{matched}"
end

Does that look sort of close? My first test seems to have failed, but I might have screwed something up.

Hi @pfaffman. I was reading this topic as it looks like a promising solution for my problem here.
Did you manage to get a good result in the end?

Sorry, I can’t remember.

It looks to Sam and me like it’ll work.

I just solved a similar problem by creating a post, inserting links to all of the images into it, and then rebaking all of the posts that contain transparent.gif.

This looks like a somewhat more elegant solution.

I would say make a database-only backup and give it a try. I’m sort of on vacation, but if you have a budget I can see what I can do.

And I too have an important client on ams3; I’ve not yet moved them to AWS, but I think that’s happening soon. I have a friend who worked for DigitalOcean who recommended ams3 because it was their best data center and was largely underutilized (this was long ago now). That didn’t work out as I’d hoped.

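I’m wondering whether this is still the best approach today?

I’m trying to recover uploads that exist in the filesystem but not in the database after a failed S3 migration (using the old rake:s3_migrate).

There really isn’t a good way. :crying_cat:

But, yes. The idea is still the same. I don’t think anything in Discourse has changed since then. Usually, I do a few by hand to make sure it works as expected. Also, make a backup first, and maybe put the site in read-only mode so that if you need to restore the backup you won’t have to throw away any posts that were made while you were fiddling with things.

Without knowing more than is easy to convey on a forum, I can’t tell whether that really is the best way or whether there’s an easier one. For example, you might just need a gsub on the path of the URLs. If you have a budget, you can contact me or ask in Marketplace.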
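To make the gsub idea concrete, here is a minimal sketch in plain Ruby. Both prefixes are hypothetical (the bucket host and the local path are made up for illustration), and whether a plain path rewrite is enough depends on how the URLs are actually stored on your site.

```ruby
# Hypothetical values: neither prefix comes from this thread.
OLD_PREFIX = "//example-bucket.s3.amazonaws.com/original/"
NEW_PREFIX = "/uploads/default/original/"

# Rewrite an S3-style upload URL to the assumed local upload path.
# URLs that do not contain OLD_PREFIX are returned unchanged.
def rewrite_upload_path(url)
  url.gsub(OLD_PREFIX, NEW_PREFIX)
end

puts rewrite_upload_path(
  "//example-bucket.s3.amazonaws.com/original/1X/da39a3ee5e6b4b0d3255bfef95601890afd80709.png"
)
```

Trying it on a handful of strings like this before touching any records fits the "do a few by hand first" advice.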

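OK, I solved this with Claude and a lot of praise. I’ll share what I did in the hope that it helps others who run into a similar or identical problem.

I’m not sure this is the smartest or best way to do it, but it worked for me.

Please be careful and keep in mind that I’m not an expert, just a newbie who keeps learning.

The problem (S3 → local filesystem)

After migrating from AWS S3 to the local filesystem, many images displayed as transparent.png. The files were on disk the whole time, but Discourse couldn’t resolve them.

The root cause was a broken chain:

  1. Posts with upload:// short URLs (base62-encoded SHA1)
  2. uploads records in the database that map the SHA1 to a local file path
  3. Files on the filesystem named after their SHA1 hash

The migration moved the files to disk correctly, but the uploads database records did not exist. Without a matching record, Discourse falls back to transparent.png.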
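For anyone wondering how the first and third links of that chain relate, here is a small plain-Ruby sketch of the SHA1 ↔ base62 round trip. The alphabet order (digits, uppercase, lowercase) is an assumption about the encoding behind the short URLs, and the helper names are made up for illustration; on a real site you would use Discourse’s own helpers instead.

```ruby
# Assumed alphabet order: digits, then uppercase, then lowercase.
BASE62_ALPHABET = (("0".."9").to_a + ("A".."Z").to_a + ("a".."z").to_a).join.freeze

# Encode a 40-character hex SHA1 as a base62 string.
def sha1_to_base62(sha1)
  n = sha1.hex
  return "0" if n.zero?
  out = +""
  while n > 0
    n, rem = n.divmod(62)
    out << BASE62_ALPHABET[rem]
  end
  out.reverse
end

# Decode a base62 string back to a zero-padded 40-character hex SHA1.
def base62_to_sha1(encoded)
  n = encoded.each_char.reduce(0) { |acc, ch| acc * 62 + BASE62_ALPHABET.index(ch) }
  n.to_s(16).rjust(40, "0")
end

sha1 = "da39a3ee5e6b4b0d3255bfef95601890afd80709" # SHA1 of the empty string
puts "upload://#{sha1_to_base62(sha1)}.png"
puts base62_to_sha1(sha1_to_base62(sha1)) == sha1
```

The point is only that the short URL and the on-disk filename carry the same hash in two encodings, which is why a missing uploads row breaks the lookup even though the file is present.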

The solution (create records and rebake)

Create the missing upload records from the orphaned files:

dir = Rails.root.join("public", "uploads", "default", "original")
created = 0

Dir.glob(dir.join("**", "*")).select { |f| File.file?(f) }.each do |path|
  sha = File.basename(path, File.extname(path))
  next if Upload.find_by(sha1: sha)

  ext = File.extname(path).delete(".")
  relative = path.sub("#{Rails.root}/public", "")

  u = Upload.new
  u.sha1 = sha
  u.url = relative
  u.original_filename = File.basename(path)
  u.filesize = File.size(path)
  u.extension = ext
  u.user_id = -1
  u.save!(validate: false)

  created += 1
  puts "Created upload #{u.id}: #{sha}"
end

puts "Total created: #{created}"
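The script above takes the SHA1 from the filename rather than from the file’s content. A cheap sanity check before creating any records is to confirm the two agree. This is a sketch in plain Ruby, and `mismatched_names` is a made-up helper name, not something from Discourse.

```ruby
require "digest/sha1"
require "tmpdir"

# Return the paths whose basename (minus extension) is NOT the SHA1 of the
# file's content. A record built from such a name would carry the wrong hash.
def mismatched_names(paths)
  paths.reject do |path|
    File.basename(path, File.extname(path)) == Digest::SHA1.file(path).hexdigest
  end
end

# Tiny demonstration in a throwaway directory.
Dir.mktmpdir do |dir|
  content = "hello"
  good = File.join(dir, "#{Digest::SHA1.hexdigest(content)}.txt")
  bad = File.join(dir, "not-a-sha.txt")
  File.write(good, content)
  File.write(bad, content)
  puts mismatched_names([good, bad]).map { |p| File.basename(p) }.inspect
end
```

On a real site you would feed it the same list of paths the glob produces and look at anything it reports before going further.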

Rebake the posts that reference the restored uploads:

fixed_posts = 0

Upload.where(user_id: -1).find_each do |u|
  short = u.short_url
  next unless short

  Post.where("raw LIKE '%upload://%'").find_each do |p|
    # Short URL runs until whitespace, "]" or ")"; a stray ^ after "//" would never match
    urls = p.raw.scan(/upload:\/\/[^\s\]\)]+/)
    urls.each do |url|
      decoded = Upload.sha1_from_short_url(url)
      if decoded == u.sha1
        p.rebake!
        fixed_posts += 1
        puts "Rebaked post #{p.id}"
        break
      end
    end
  end
end

puts "Total rebaked: #{fixed_posts}"
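Before rebaking anything, it can help to check which short URLs a pattern like this actually picks out of a post’s raw text. Here is a plain-Ruby sketch; the sample raw text and the short URLs in it are made up.

```ruby
# Matches upload:// followed by everything up to whitespace, "]" or ")".
SHORT_URL_RE = %r{upload://[^\s\]\)]+}

# Made-up raw markdown with one image and one attachment.
raw = <<~RAW
  Here is a diagram: ![diagram|690x388](upload://abc123XYZ.png)
  And a PDF: [manual.pdf|attachment](upload://9fQ2bLk.pdf) (1.2 MB)
RAW

urls = raw.scan(SHORT_URL_RE)
puts urls.inspect
```

If the list comes back empty for a post you know contains uploads, the pattern is the first thing to look at.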

Regenerate the missing optimized files:

After fixing the originals, we need to populate the optimized files (1X, 2X, etc.).

rake uploads:regenerate_missing_optimized

Safe rollback (just in case)

All of the created records use user_id: -1. To undo:

Upload.where(user_id: -1).delete_all

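delete_all skips callbacks, so the files on the filesystem are not touched.

Earlier I mistakenly used destroy_all, which triggered the callback that moves files to the tombstone.

I restored the single file I had used for testing and reworked my approach.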
