Restore help - system hung at midnight

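Well, this was bound to happen eventually; things had been going too smoothly. For years the system has been updating itself automatically, and all I had to do was update Discourse every few weeks. At midnight last night, Amazon reported the system as unresponsive: Discourse was down and CPU usage shot up to 100% until it had used up the instance's AWS CPU credits. I couldn't log in. The one time I briefly managed to, after several reboots, htop showed the following process using a lot of CPU: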

snap lxd activate

If anyone has run into this, could you help explain why it would happen on its own? It would be very helpful for future reference.

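Back to the pressing problem: I built a new server on AWS running Ubuntu 20 LTS, and setting up Discourse was remarkably easy. I had a copy of my app.yml file and used it to recreate the Discourse forum. The old server used S3 to store backups as well as content (images, etc.).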

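After creating the server, I downloaded the latest Discourse backup file from S3, uploaded it manually to the Discourse server, and clicked the "Restore" button. A few minutes later I got the following error: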

[2022-06-09 09:01:56] ALTER TABLE
[2022-06-09 09:01:56] ALTER TABLE
[2022-06-09 09:01:56] Migrating the database...
[2022-06-09 09:02:11] == 20220308201942 CreateUploadReferences: migrating ===========================
-- create_table(:upload_references, {})
   -> 0.0486s
-- add_index(:upload_references, [:upload_id, :target_type, :target_id], {:unique=>true, :name=>"index_upload_references_on_upload_and_target"})
   -> 0.0030s
== 20220308201942 CreateUploadReferences: migrated (0.0580s) ==================

== 20220309132719 CopyPostUploadsToUploadReferences: migrating ================
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT post_uploads.upload_id, 'Post', post_uploads.post_id, uploads.created_at, uploads.updated_at\nFROM post_uploads\nJOIN uploads ON uploads.id = post_uploads.upload_id\nON CONFLICT DO NOTHING\n")
   -> 0.0595s
== 20220309132719 CopyPostUploadsToUploadReferences: migrated (0.0602s) =======

== 20220309132720 CopyPostUploadsToUploadReferencesForSync: migrating =========
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT upload_id, 'Post', post_id, NOW(), NOW()\nFROM post_uploads\nON CONFLICT DO NOTHING\n")
   -> 0.0076s
== 20220309132720 CopyPostUploadsToUploadReferencesForSync: migrated (0.0080s)

== 20220330160747 CopySiteSettingsUploadsToUploadReferences: migrating ========
-- execute("WITH site_settings_uploads AS (\n  SELECT id, unnest(string_to_array(value, '|'))::integer upload_id\n  FROM site_settings\n  WHERE data_type = 17\n  UNION\n  SELECT id, value::integer\n  FROM site_settings\n  WHERE data_type = 18 AND value != ''\n)\nINSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT site_settings_uploads.upload_id, 'SiteSetting', site_settings_uploads.id, uploads.created_at, uploads.updated_at\nFROM site_settings_uploads\nJOIN uploads ON uploads.id = site_settings_uploads.upload_id\nON CONFLICT DO NOTHING\n")
   -> 0.0034s
== 20220330160747 CopySiteSettingsUploadsToUploadReferences: migrated (0.0038s)

== 20220330160751 CopyBadgesUploadsToUploadReferences: migrating ==============
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT badges.image_upload_id, 'Badge', badges.id, uploads.created_at, uploads.updated_at\nFROM badges\nJOIN uploads ON uploads.id = badges.image_upload_id\nWHERE badges.image_upload_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0006s
== 20220330160751 CopyBadgesUploadsToUploadReferences: migrated (0.0010s) =====

== 20220330160754 CopyGroupsUploadsToUploadReferences: migrating ==============
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT groups.flair_upload_id, 'Group', groups.id, uploads.created_at, uploads.updated_at\nFROM groups\nJOIN uploads ON uploads.id = groups.flair_upload_id\nWHERE groups.flair_upload_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0050s
== 20220330160754 CopyGroupsUploadsToUploadReferences: migrated (0.0055s) =====

== 20220330160757 CopyUserExportsUploadsToUploadReferences: migrating =========
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT user_exports.upload_id, 'UserExport', user_exports.id, uploads.created_at, uploads.updated_at\nFROM user_exports\nJOIN uploads ON uploads.id = user_exports.upload_id\nON CONFLICT DO NOTHING\n")
   -> 0.0013s
== 20220330160757 CopyUserExportsUploadsToUploadReferences: migrated (0.0041s)

== 20220330164740 CopyThemeFieldsUploadsToUploadReferences: migrating =========
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT theme_fields.upload_id, 'ThemeField', theme_fields.id, uploads.created_at, uploads.updated_at\nFROM theme_fields\nJOIN uploads ON uploads.id = theme_fields.upload_id\nWHERE type_id = 2\nON CONFLICT DO NOTHING\n")
   -> 0.0006s
== 20220330164740 CopyThemeFieldsUploadsToUploadReferences: migrated (0.0010s)

== 20220404195635 CopyCategoriesUploadsToUploadReferences: migrating ==========
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT categories.uploaded_logo_id, 'Category', categories.id, uploads.created_at, uploads.updated_at\nFROM categories\nJOIN uploads ON uploads.id = categories.uploaded_logo_id\nWHERE categories.uploaded_logo_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0095s
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT categories.uploaded_background_id, 'Category', categories.id, uploads.created_at, uploads.updated_at\nFROM categories\nJOIN uploads ON uploads.id = categories.uploaded_background_id\nWHERE categories.uploaded_background_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0004s
== 20220404195635 CopyCategoriesUploadsToUploadReferences: migrated (0.0103s) =

== 20220404201949 CopyCustomEmojisUploadsToUploadReferences: migrating ========
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT custom_emojis.upload_id, 'CustomEmoji', custom_emojis.id, uploads.created_at, uploads.updated_at\nFROM custom_emojis\nJOIN uploads ON uploads.id = custom_emojis.upload_id\nWHERE custom_emojis.upload_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0032s
== 20220404201949 CopyCustomEmojisUploadsToUploadReferences: migrated (0.0036s)

== 20220404203356 CopyUserProfilesUploadsToUploadReferences: migrating ========
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT user_profiles.profile_background_upload_id, 'UserProfile', user_profiles.user_id, uploads.created_at, uploads.updated_at\nFROM user_profiles\nJOIN uploads ON uploads.id = user_profiles.profile_background_upload_id\nWHERE user_profiles.profile_background_upload_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0017s
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT user_profiles.card_background_upload_id, 'UserProfile', user_profiles.user_id, uploads.created_at, uploads.updated_at\nFROM user_profiles\nJOIN uploads ON uploads.id = user_profiles.card_background_upload_id\nWHERE user_profiles.card_background_upload_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0011s
== 20220404203356 CopyUserProfilesUploadsToUploadReferences: migrated (0.0033s)

== 20220404204439 CopyUserAvatarsUploadsToUploadReferences: migrating =========
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT user_avatars.custom_upload_id, 'UserAvatar', user_avatars.id, uploads.created_at, uploads.updated_at\nFROM user_avatars\nJOIN uploads ON uploads.id = user_avatars.custom_upload_id\nWHERE user_avatars.custom_upload_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0200s
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT user_avatars.gravatar_upload_id, 'UserAvatar', user_avatars.id, uploads.created_at, uploads.updated_at\nFROM user_avatars\nJOIN uploads ON uploads.id = user_avatars.gravatar_upload_id\nWHERE user_avatars.gravatar_upload_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0069s
== 20220404204439 CopyUserAvatarsUploadsToUploadReferences: migrated (0.0276s)

== 20220404212716 CopyThemeSettingsUploadsToUploadReferences: migrating =======
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT theme_settings.value::int, 'ThemeSetting', theme_settings.id, uploads.created_at, uploads.updated_at\nFROM theme_settings\nJOIN uploads ON uploads.id = theme_settings.value::int\nWHERE data_type = 6 AND theme_settings.value IS NOT NULL AND theme_settings.value != ''\nON CONFLICT DO NOTHING\n")
   -> 0.0025s
== 20220404212716 CopyThemeSettingsUploadsToUploadReferences: migrated (0.0030s)

== 20220526203356 CopyUserUploadsToUploadReferences: migrating ================
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT users.uploaded_avatar_id, 'User', users.id, uploads.created_at, uploads.updated_at\nFROM users\nJOIN uploads ON uploads.id = users.uploaded_avatar_id\nWHERE users.uploaded_avatar_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0227s
== 20220526203356 CopyUserUploadsToUploadReferences: migrated (0.0234s) =======


[2022-06-09 09:02:11] Reconnecting to the database...
[2022-06-09 09:02:12] Reloading site settings...
[2022-06-09 09:02:12] Disabling outgoing emails for non-staff users...
[2022-06-09 09:02:14] Disabling readonly mode...
[2022-06-09 09:02:14] Clearing category cache...
[2022-06-09 09:02:14] Reloading translations...
[2022-06-09 09:02:14] Remapping uploads...
[2022-06-09 09:02:14] Restoring uploads, this may take a while...
[2022-06-09 09:03:05] EXCEPTION: 509 of 1823 uploads are not migrated to S3. S3 migration failed for db 'default'.
[2022-06-09 09:03:05] /var/www/discourse/lib/file_store/to_s3_migration.rb:132:in `raise_or_log'
/var/www/discourse/lib/file_store/to_s3_migration.rb:79:in `migration_successful?'
/var/www/discourse/lib/file_store/to_s3_migration.rb:373:in `migrate_to_s3'
/var/www/discourse/lib/file_store/to_s3_migration.rb:66:in `migrate'
/var/www/discourse/lib/file_store/s3_store.rb:328:in `copy_from'
/var/www/discourse/lib/backup_restore/uploads_restorer.rb:62:in `restore_uploads'
/var/www/discourse/lib/backup_restore/uploads_restorer.rb:44:in `restore'
/var/www/discourse/lib/backup_restore/restorer.rb:61:in `run'
/var/www/discourse/script/spawn_backup_restore.rb:23:in `restore'
/var/www/discourse/script/spawn_backup_restore.rb:36:in `block in <main>'
/var/www/discourse/script/spawn_backup_restore.rb:4:in `fork'
/var/www/discourse/script/spawn_backup_restore.rb:4:in `<main>'

Can someone tell me what the problem is, and how to restore the server from the backup on Amazon S3?

Hi,

After recreating the new server from app.yml, can you see your backups in the https://your.domain/admin/backups section?

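No. Recreating from app.yml just gave me a brand-new, empty setup. I downloaded the last backup from S3, uploaded it manually to the local Discourse, and clicked restore.

When I started the restore, I saw all the settings come back (including S3, the credentials, and everything on the settings pages), all the categories appeared, the posts appeared, and everything looked fine. Then suddenly, a few minutes later, I got a logged-out message, all the categories and topics disappeared, and the log showed that error (it seems to have rolled back).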

:thinking: So the S3 configuration isn't in app.yml (as described in https://meta.discourse.org/t/using-object-storage-for-uploads-s3-clones/148916), but was set up as in https://meta.discourse.org/t/setting-up-file-and-image-uploads-to-s3/7229#discourse-configuration-7 ?

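I don't see that in my app.yml.

All the S3 settings are defined on the Admin → Settings page, and they worked fine for a year until last night, when the server went down and had to be restored.

Yes, that's what I used to set up S3 backups and uploads.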

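I think you should try adding your settings to app.yml; then you should be able to see your backups in the admin section and restore from there, without downloading and uploading the file manually. You shouldn't need to restore those uploads, but they seem to be included in the backup. I don't know why it fails, though...

I think this may be because your backup contains a mix of S3 and local uploads. I'm afraid this is beyond my expertise, but there is some discussion and a workaround in this topic that may help you get past the failure. Note, though, that the number of errors there was much smaller, so you may need to take that into account: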

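Unfortunately my site crashed, so all I have left are the S3 uploads and backups. I guess I can no longer migrate any remaining local files to S3.

So what are my options now? Is there a way to restore from the S3 backup and ignore the local files? I found a way to ignore the S3 uploads, but then almost every post would have broken links/images (probably over 90% of them are on S3, since I set up S3 uploads years ago).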

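Here is an update for anyone who runs into the same problem (basically, I couldn't restore from a backup after the server crashed because of a faulty system upgrade).

As far as I can tell, the root cause is that there are both local uploads and S3 uploads, so when the restore tool tries to restore them it errors out, because it can't handle local and S3 uploads in the same restore (maybe it's time for Discourse to revisit backup/restore).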
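To see whether a site is in this mixed local/S3 state before taking a backup, one could count upload URLs by storage location from the rails console. This is a minimal sketch, not part of Discourse, assuming that locally stored uploads have URLs beginning with /uploads/ and that everything else (for example protocol-relative S3 URLs such as //my-bucket.s3.amazonaws.com/...) is remote. The helper name classify_upload_urls and the sample URLs are hypothetical.

```ruby
# Hypothetical helper: count upload URLs by where the file appears to live.
# Assumption: local uploads start with "/uploads/"; anything else
# (e.g. "//my-bucket.s3.amazonaws.com/original/1X/abc.png") counts as remote.
def classify_upload_urls(urls)
  counts = Hash.new(0)
  urls.each do |url|
    kind = url.to_s.start_with?("/uploads/") ? :local : :remote
    counts[kind] += 1
  end
  counts
end
```

Inside the container (./launcher enter app, then rails c), something like classify_upload_urls(Upload.pluck(:url)) should show how many uploads are still local; non-zero counts for both :local and :remote indicate the mix that the restore's "not migrated to S3" check failed on.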

Thanks to @RGJ for this trick, which forces Discourse to ignore S3 uploads during the restore:

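  1. Add the line DISCOURSE_ENABLE_S3_UPLOADS=false to your app.yml
  2. Rebuild Discourse: ./launcher rebuild app
  3. Try the restore (from the Backups page in the GUI or from the CLI)
  4. After the restore, remove that line from app.yml and rebuild once more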
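Steps 1 and 4 can be scripted so the temporary line is not forgotten. This hedged sketch uses hypothetical helper names and assumes the stock app.yml layout, where env: is a top-level key whose entries are indented by two spaces in YAML KEY: value form. It edits the file as plain text so existing comments survive.

```ruby
# The temporary override, written in app.yml's YAML "KEY: value" form.
# Assumption: env entries are indented by two spaces, as in the stock file.
S3_OVERRIDE_LINE = "  DISCOURSE_ENABLE_S3_UPLOADS: false"

# Insert the override right after the top-level "env:" line (idempotent).
def disable_s3_uploads(yml)
  lines = yml.lines.map(&:chomp)
  unless lines.include?(S3_OVERRIDE_LINE)
    env_index = lines.index { |line| line.rstrip == "env:" }
    raise ArgumentError, "no top-level env: section found" if env_index.nil?
    lines.insert(env_index + 1, S3_OVERRIDE_LINE)
  end
  lines.join("\n") + "\n"
end

# Remove the override again once the restore has finished.
def restore_s3_uploads(yml)
  lines = yml.lines.map(&:chomp).reject { |line| line == S3_OVERRIDE_LINE }
  lines.join("\n") + "\n"
end
```

For example, with path set to /var/discourse/containers/app.yml, File.write(path, disable_s3_uploads(File.read(path))) before ./launcher rebuild app, and the same call with restore_s3_uploads after the restore.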

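This worked, but be aware that the forum was badly broken: categories, settings, and posts were restored, but all images, links, embedded documents, etc. were broken and threw errors.

Final solution:
I managed to salvage the old server, archived the /var/discourse directory (tar/gz), copied it to the new server, and ran ./launcher rebuild app. That fully restored the forum, but the underlying problem remains: backups won't work because they contain a mix of local and S3 uploads.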

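So I really need some advice on the best way to fix this once and for all. Is it better/easier to move all uploads from local to S3, or from S3 to local, and how do I do it? The whole point of backups is to help in exactly this situation, but they let me down, so I need to get this resolved.

If you configure it as described in Using Object Storage for Uploads (S3 & Clones), you should be able to run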

 rake uploads:migrate_to_s3

If you want to stop using S3, you can enter the rails console and set

  SiteSetting.include_s3_uploads_in_backups=true

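Then take a backup, make sure your app.yml does not configure S3, and restore the backup. I think that this will restore the backup to local storage.

Either way, I'd still recommend setting your keys and backup bucket as environment variables in the app.yml file, and then checking that you can restore to a new site.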
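A quick check like the following could confirm that the backup-related S3 variables really are in app.yml before an emergency. It is a sketch, not an official tool: the variable names are assumed from the Using Object Storage for Uploads (S3 & Clones) guide and should be verified against its current version, and s3_backup_env_problems is a hypothetical name.

```ruby
require "yaml"

# Assumed variable names (from the object storage guide); verify before use.
S3_BACKUP_KEYS = %w[
  DISCOURSE_S3_REGION
  DISCOURSE_S3_ACCESS_KEY_ID
  DISCOURSE_S3_SECRET_ACCESS_KEY
  DISCOURSE_S3_BACKUP_BUCKET
  DISCOURSE_BACKUP_LOCATION
].freeze

# Returns a list of human-readable problems; an empty list means all
# assumed keys are present and the backup location points at S3.
def s3_backup_env_problems(yml_text)
  config = YAML.safe_load(yml_text) || {}
  env = config.fetch("env", nil) || {}
  problems = S3_BACKUP_KEYS.select { |key| env[key].to_s.strip.empty? }
                           .map { |key| "missing #{key}" }
  location = env["DISCOURSE_BACKUP_LOCATION"].to_s
  unless location.empty? || location == "s3"
    problems << "DISCOURSE_BACKUP_LOCATION should be s3, got #{location}"
  end
  problems
end
```

Calling s3_backup_env_problems(File.read("/var/discourse/containers/app.yml")) should return an empty list when everything needed for a command-line restore from S3 is set.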

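OK, I think I got a bit mixed up.

I think the ideal setup is to keep all uploads local and save the backups (the .tar.gz files) to S3. That way, if anything goes wrong with the server, the backups are available on S3, and each backup is self-contained with no external dependencies, so it should be easy to restore on a new server.

So, if I understand correctly, I should follow these instructions: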

If you want to stop using s3, then you can enter the rails console and set

  SiteSetting.include_s3_uploads_in_backups=true
Then take a backup, make sure that you do not have s3 configured in your app.yml and restore the backup. I think that this will restore backups to local.

And then:

  1. Disable the "enable s3 uploads" option under Admin → Settings → Files
  2. Enable the "backup to S3" option on the Admin → Settings → Backups page

Is that right?

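This is the part that confuses me: why do I need to put the S3 configuration in the app.yml file?

So that you can get to your backups from the command line before you restore. Otherwise you have to set up an admin account first, then configure S3, and only then restore. Also, any settings you store in the database are overwritten when the database is restored.

I think the best practice is to configure S3 through ENV variables in the app.yml file. If it weren't for the hundreds of people who would be surprised to see them disappear, it might make sense to turn them into hidden settings.

Because otherwise you'll run into trouble when restoring.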

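How do I restore a backup from S3 using the command line? According to the instructions here: Restore a backup from the command line
you can put the backup file in the /var/discourse/shared/standalone/backups/default folder and then start the restore from the command line. That's what I did earlier on your suggestion (unfortunately it ended up with broken links), but it did work.

How do I restore directly from S3 using the command line?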

cd /var/discourse
./launcher enter app
discourse restore

It will print the available backups; then you can copy/paste the one you want to restore.
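When there are many backups, picking the newest by hand is error-prone. This sketch assumes backup files are named like <site>-YYYY-MM-DD-HHMMSS-v<migration>.tar.gz (an assumption based on typical Discourse backup names, not a documented format) and returns the newest one, which could then be passed to discourse restore. latest_backup is a hypothetical helper.

```ruby
# Assumed filename shape: <site>-YYYY-MM-DD-HHMMSS-v<migration>.tar.gz
BACKUP_NAME = /-(\d{4}-\d{2}-\d{2}-\d{6})-v\d+\.tar\.gz\z/

# Returns the filename with the latest embedded timestamp, or nil if none match.
def latest_backup(filenames)
  dated = filenames.filter_map do |name|
    match = BACKUP_NAME.match(name)
    [match[1], name] if match # the timestamp string sorts chronologically
  end
  dated.max_by(&:first)&.last
end
```

For local files, latest_backup(Dir.children("/var/discourse/shared/standalone/backups/default")) would give the name to paste after discourse restore, wherever Ruby is available.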

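OK, so it will read the S3 backups and list them as options.

Jay, regarding your suggestion to move the assets to local storage:

I think you can set the hidden setting include_s3_uploads_in_backups to true, then take a backup and restore it with S3 turned off, and you'll no longer be using S3.

Combining S3 backups with the configuration in app.yml means you can do a command-line restore with nothing but the app.yml file (after cloning discourse and installing docker).

As a first step, do I need to back up the S3 bucket, or is this operation safe for the bucket?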

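Well, at least I figured out why my server crashed last night (it crashed again today after a full rebuild :frowning:; see this topic for details: Ubuntu 20.04 kernel update with docker causing a crash on EC2 and Lightsail).

To get it working from a backup, I had to do the following: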

  1. Disable Settings → Files → enable s3 uploads
  2. Set Settings → Backups → backup location to S3
  3. Enable Settings → Backups → backup with uploads
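The same three changes could be made from the rails console instead of the admin UI. This hedged sketch assumes the UI labels correspond to the setting names enable_s3_uploads, backup_location (value "s3") and backup_with_uploads; apply_settings and RESTORE_FRIENDLY_SETTINGS are hypothetical names. It returns the previous values so the change can be undone.

```ruby
# Assumed setting names matching the three admin UI changes above.
RESTORE_FRIENDLY_SETTINGS = {
  enable_s3_uploads: false,
  backup_location: "s3",
  backup_with_uploads: true
}.freeze

# Apply each setting via its writer and collect the old values,
# so calling apply_settings(target, previous) reverts the change.
def apply_settings(target, settings)
  settings.each_with_object({}) do |(name, value), previous|
    previous[name] = target.public_send(name)
    target.public_send("#{name}=", value)
  end
end
```

In rails c this would be previous = apply_settings(SiteSetting, RESTORE_FRIENDLY_SETTINGS), and later apply_settings(SiteSetting, previous) to go back.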

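Then I took a backup and restored it successfully. One thing broke, though: all attachments (files) now have invalid links. Images are fine, but attachment links such as https://domain.com/uploads/short-url/phu1HOLvkE8LWpkKYfnMPSWsvHh.zip now give me this error:

Oops! That page doesn't exist or is private.

Is there a way to fix these short links?

You could try doing an HTML rebuild (aka a rebake) of one of those topics and see whether that fixes it.

Thanks. Is there a guide on the command to rebake a specific topic?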
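One way to rebake a single topic is from the rails console. This sketch assumes, as with Discourse's Topic and Post models, that a topic exposes its posts through #posts and that each post responds to #rebake!; rebake_topic is a hypothetical wrapper, and the topic ID in the usage note is a placeholder.

```ruby
# Hypothetical wrapper: re-render (rebake) every post in one topic and
# return how many posts were processed.
def rebake_topic(topic)
  count = 0
  topic.posts.each do |post|
    post.rebake! # regenerates the post's cooked HTML
    count += 1
  end
  count
end
```

After ./launcher enter app and rails c, rebake_topic(Topic.find(123)) would rebake every post in topic 123 (replace 123 with the ID from the topic's URL).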