Help with recovery - system hung at midnight

Okay, this was bound to happen; things were just too good for too long. For years the system ran on cruise control: it would update itself automatically, and I would update Discourse every few weeks. At midnight last night, Amazon showed the system was unresponsive, Discourse was down, and the CPU was pegged at 100% until it ran out of AWS CPU resources. I couldn't log in to the system; the one time I was able to log in momentarily, after several reboots, I saw this in htop taking up a lot of CPU:

snap lxd activate

If anyone has seen this and can shed some light on why it may have happened by itself, it would be much appreciated for future reference.

Coming to the pressing issue at hand: I built a new server on AWS using Ubuntu 20.04 LTS, and the Discourse setup was exceedingly easy. I had a copy of the app.yml file, which I used to recreate the Discourse forum. The old server was using S3 for the backups AND for the content (images etc.).

After creating the server, I downloaded the latest Discourse backup file from S3, manually uploaded it to the Discourse server, and hit the Restore button. After a few minutes I got this error:

[2022-06-09 09:01:56] ALTER TABLE
[2022-06-09 09:01:56] ALTER TABLE
[2022-06-09 09:01:56] Migrating the database...
[2022-06-09 09:02:11] == 20220308201942 CreateUploadReferences: migrating ===========================
-- create_table(:upload_references, {})
   -> 0.0486s
-- add_index(:upload_references, [:upload_id, :target_type, :target_id], {:unique=>true, :name=>"index_upload_references_on_upload_and_target"})
   -> 0.0030s
== 20220308201942 CreateUploadReferences: migrated (0.0580s) ==================

== 20220309132719 CopyPostUploadsToUploadReferences: migrating ================
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT post_uploads.upload_id, 'Post', post_uploads.post_id, uploads.created_at, uploads.updated_at\nFROM post_uploads\nJOIN uploads ON uploads.id = post_uploads.upload_id\nON CONFLICT DO NOTHING\n")
   -> 0.0595s
== 20220309132719 CopyPostUploadsToUploadReferences: migrated (0.0602s) =======

== 20220309132720 CopyPostUploadsToUploadReferencesForSync: migrating =========
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT upload_id, 'Post', post_id, NOW(), NOW()\nFROM post_uploads\nON CONFLICT DO NOTHING\n")
   -> 0.0076s
== 20220309132720 CopyPostUploadsToUploadReferencesForSync: migrated (0.0080s) 

== 20220330160747 CopySiteSettingsUploadsToUploadReferences: migrating ========
-- execute("WITH site_settings_uploads AS (\n  SELECT id, unnest(string_to_array(value, '|'))::integer upload_id\n  FROM site_settings\n  WHERE data_type = 17\n  UNION\n  SELECT id, value::integer\n  FROM site_settings\n  WHERE data_type = 18 AND value != ''\n)\nINSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT site_settings_uploads.upload_id, 'SiteSetting', site_settings_uploads.id, uploads.created_at, uploads.updated_at\nFROM site_settings_uploads\nJOIN uploads ON uploads.id = site_settings_uploads.upload_id\nON CONFLICT DO NOTHING\n")
   -> 0.0034s
== 20220330160747 CopySiteSettingsUploadsToUploadReferences: migrated (0.0038s) 

== 20220330160751 CopyBadgesUploadsToUploadReferences: migrating ==============
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT badges.image_upload_id, 'Badge', badges.id, uploads.created_at, uploads.updated_at\nFROM badges\nJOIN uploads ON uploads.id = badges.image_upload_id\nWHERE badges.image_upload_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0006s
== 20220330160751 CopyBadgesUploadsToUploadReferences: migrated (0.0010s) =====

== 20220330160754 CopyGroupsUploadsToUploadReferences: migrating ==============
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT groups.flair_upload_id, 'Group', groups.id, uploads.created_at, uploads.updated_at\nFROM groups\nJOIN uploads ON uploads.id = groups.flair_upload_id\nWHERE groups.flair_upload_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0050s
== 20220330160754 CopyGroupsUploadsToUploadReferences: migrated (0.0055s) =====

== 20220330160757 CopyUserExportsUploadsToUploadReferences: migrating =========
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT user_exports.upload_id, 'UserExport', user_exports.id, uploads.created_at, uploads.updated_at\nFROM user_exports\nJOIN uploads ON uploads.id = user_exports.upload_id\nON CONFLICT DO NOTHING\n")
   -> 0.0013s
== 20220330160757 CopyUserExportsUploadsToUploadReferences: migrated (0.0041s) 

== 20220330164740 CopyThemeFieldsUploadsToUploadReferences: migrating =========
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT theme_fields.upload_id, 'ThemeField', theme_fields.id, uploads.created_at, uploads.updated_at\nFROM theme_fields\nJOIN uploads ON uploads.id = theme_fields.upload_id\nWHERE type_id = 2\nON CONFLICT DO NOTHING\n")
   -> 0.0006s
== 20220330164740 CopyThemeFieldsUploadsToUploadReferences: migrated (0.0010s) 

== 20220404195635 CopyCategoriesUploadsToUploadReferences: migrating ==========
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT categories.uploaded_logo_id, 'Category', categories.id, uploads.created_at, uploads.updated_at\nFROM categories\nJOIN uploads ON uploads.id = categories.uploaded_logo_id\nWHERE categories.uploaded_logo_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0095s
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT categories.uploaded_background_id, 'Category', categories.id, uploads.created_at, uploads.updated_at\nFROM categories\nJOIN uploads ON uploads.id = categories.uploaded_background_id\nWHERE categories.uploaded_background_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0004s
== 20220404195635 CopyCategoriesUploadsToUploadReferences: migrated (0.0103s) =

== 20220404201949 CopyCustomEmojisUploadsToUploadReferences: migrating ========
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT custom_emojis.upload_id, 'CustomEmoji', custom_emojis.id, uploads.created_at, uploads.updated_at\nFROM custom_emojis\nJOIN uploads ON uploads.id = custom_emojis.upload_id\nWHERE custom_emojis.upload_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0032s
== 20220404201949 CopyCustomEmojisUploadsToUploadReferences: migrated (0.0036s) 

== 20220404203356 CopyUserProfilesUploadsToUploadReferences: migrating ========
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT user_profiles.profile_background_upload_id, 'UserProfile', user_profiles.user_id, uploads.created_at, uploads.updated_at\nFROM user_profiles\nJOIN uploads ON uploads.id = user_profiles.profile_background_upload_id\nWHERE user_profiles.profile_background_upload_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0017s
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT user_profiles.card_background_upload_id, 'UserProfile', user_profiles.user_id, uploads.created_at, uploads.updated_at\nFROM user_profiles\nJOIN uploads ON uploads.id = user_profiles.card_background_upload_id\nWHERE user_profiles.card_background_upload_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0011s
== 20220404203356 CopyUserProfilesUploadsToUploadReferences: migrated (0.0033s) 

== 20220404204439 CopyUserAvatarsUploadsToUploadReferences: migrating =========
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT user_avatars.custom_upload_id, 'UserAvatar', user_avatars.id, uploads.created_at, uploads.updated_at\nFROM user_avatars\nJOIN uploads ON uploads.id = user_avatars.custom_upload_id\nWHERE user_avatars.custom_upload_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0200s
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT user_avatars.gravatar_upload_id, 'UserAvatar', user_avatars.id, uploads.created_at, uploads.updated_at\nFROM user_avatars\nJOIN uploads ON uploads.id = user_avatars.gravatar_upload_id\nWHERE user_avatars.gravatar_upload_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0069s
== 20220404204439 CopyUserAvatarsUploadsToUploadReferences: migrated (0.0276s) 

== 20220404212716 CopyThemeSettingsUploadsToUploadReferences: migrating =======
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT theme_settings.value::int, 'ThemeSetting', theme_settings.id, uploads.created_at, uploads.updated_at\nFROM theme_settings\nJOIN uploads ON uploads.id = theme_settings.value::int\nWHERE data_type = 6 AND theme_settings.value IS NOT NULL AND theme_settings.value != ''\nON CONFLICT DO NOTHING\n")
   -> 0.0025s
== 20220404212716 CopyThemeSettingsUploadsToUploadReferences: migrated (0.0030s) 

== 20220526203356 CopyUserUploadsToUploadReferences: migrating ================
-- execute("INSERT INTO upload_references(upload_id, target_type, target_id, created_at, updated_at)\nSELECT users.uploaded_avatar_id, 'User', users.id, uploads.created_at, uploads.updated_at\nFROM users\nJOIN uploads ON uploads.id = users.uploaded_avatar_id\nWHERE users.uploaded_avatar_id IS NOT NULL\nON CONFLICT DO NOTHING\n")
   -> 0.0227s
== 20220526203356 CopyUserUploadsToUploadReferences: migrated (0.0234s) =======


[2022-06-09 09:02:11] Reconnecting to the database...
[2022-06-09 09:02:12] Reloading site settings...
[2022-06-09 09:02:12] Disabling outgoing emails for non-staff users...
[2022-06-09 09:02:14] Disabling readonly mode...
[2022-06-09 09:02:14] Clearing category cache...
[2022-06-09 09:02:14] Reloading translations...
[2022-06-09 09:02:14] Remapping uploads...
[2022-06-09 09:02:14] Restoring uploads, this may take a while...
[2022-06-09 09:03:05] EXCEPTION: 509 of 1823 uploads are not migrated to S3. S3 migration failed for db 'default'.
[2022-06-09 09:03:05] /var/www/discourse/lib/file_store/to_s3_migration.rb:132:in `raise_or_log'
/var/www/discourse/lib/file_store/to_s3_migration.rb:79:in `migration_successful?'
/var/www/discourse/lib/file_store/to_s3_migration.rb:373:in `migrate_to_s3'
/var/www/discourse/lib/file_store/to_s3_migration.rb:66:in `migrate'
/var/www/discourse/lib/file_store/s3_store.rb:328:in `copy_from'
/var/www/discourse/lib/backup_restore/uploads_restorer.rb:62:in `restore_uploads'
/var/www/discourse/lib/backup_restore/uploads_restorer.rb:44:in `restore'
/var/www/discourse/lib/backup_restore/restorer.rb:61:in `run'
/var/www/discourse/script/spawn_backup_restore.rb:23:in `restore'
/var/www/discourse/script/spawn_backup_restore.rb:36:in `block in <main>'
/var/www/discourse/script/spawn_backup_restore.rb:4:in `fork'
/var/www/discourse/script/spawn_backup_restore.rb:4:in `<main>'

Can anyone advise on what the problem is and how I can restore the server from the Amazon S3 backups?

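Hello,

After recreating the new server from app.yml, can you access the backups in the https://your.domain/admin/backups section?

No. After recreating it from app.yml, all I got was a brand-new, empty setup. I downloaded the last backup from S3, manually uploaded it to the local Discourse, and clicked Restore.

I started the restore and saw all the settings come back (including S3, the credentials, and everything on the settings page). I saw all the categories appear, the posts appear, and everything looked fine. Then, a few minutes later, I suddenly got a logout message, all the categories and topics disappeared, and the log showed that error (it appears to have rolled back).

:thinking: So the S3 configuration is not in app.yml (as in https://meta.discourse.org/t/using-object-storage-for-uploads-s3-clones/148916)? It was configured as in https://meta.discourse.org/t/setting-up-file-and-image-uploads-to-s3/7229#discourse-configuration-7 instead?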


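I don't see that in my app.yml.

All the S3 settings are defined on the Admin → Settings page, and they had been working fine for a year until the server went down last night and needed restoring.

Yes, that is what I used to set up S3 backups and uploads.

I think you should try editing app.yml with your settings; you should then see your backups in the admin section and be able to restore from there, without importing and uploading manually. You shouldn't need to restore those uploads, but they seem to be included in the backup. I don't know why it fails, though…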


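I think this may be because your backup contains a mix of S3 and local uploads. I'm afraid this is outside my area of expertise, but there is some discussion and a workaround in this topic that may help you get past the failure. The number of errors there was much smaller, though, so you may want to take that into account: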


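Sorry, my site has unfortunately crashed, so all I have left are the S3 uploads and backups. I guess I can no longer migrate any remaining local files to S3.

So I'm wondering what my options are now. Is there a way to restore from the S3 backup and ignore the local files? I found a way to ignore the S3 uploads, but then nearly every post would have broken links/images (probably 90%+ are in S3, since I set up S3 uploads years ago).

Here's an update for anyone who may run into the same problem (basically, I couldn't restore from a backup, and the server had crashed because of a failed system upgrade).

As far as I can tell, the root cause is that there are both local and S3 uploads, so when the restore tool runs it errors out because it doesn't know how to handle local and S3 restores together (maybe it's time for Discourse to revisit backup/restore).

Thanks to @RGJ for this tip; he suggested forcing Discourse to ignore S3 uploads while restoring: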

  1. Add the line DISCOURSE_ENABLE_S3_UPLOADS=false to your app.yml
  2. Rebuild Discourse: ./launcher rebuild app
  3. Try the restore (from the GUI backups page or using the CLI)
  4. After the restore, remove that line from app.yml and rebuild once more
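
For anyone who wants to script the first step, here is a minimal sketch rather than a tested procedure. The helper name `add_env_override` is hypothetical. It assumes the stock layout of `containers/app.yml`, where overrides sit under a top-level `env:` key and are written as `KEY: value`; the tip above writes the setting as `DISCOURSE_ENABLE_S3_UPLOADS=false`, so the YAML form used here is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: add an env override to a Discourse container definition.
# Assumes the file has a top-level "env:" line, as the stock app.yml does.
# If no such line exists, the file is left unchanged.

add_env_override() {
  local file="$1" key="$2" value="$3"

  # Do nothing if the key is already set, so re-running is harmless.
  if grep -qE "^[[:space:]]+${key}:" "$file"; then
    return 0
  fi

  # Insert "  KEY: value" directly below the first "env:" line.
  awk -v line="  ${key}: ${value}" '
    { print }
    /^env:[ \t]*$/ && !inserted { print line; inserted = 1 }
  ' "$file" > "${file}.tmp" \
    && cat "${file}.tmp" > "$file" \
    && rm -f "${file}.tmp"
}

# Usage on a real host (not run here):
#   cd /var/discourse
#   cp containers/app.yml containers/app.yml.bak
#   add_env_override containers/app.yml DISCOURSE_ENABLE_S3_UPLOADS false
#   ./launcher rebuild app
```

After the restore, deleting the inserted line (or copying the `.bak` file back) and rebuilding again corresponds to step 4.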

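While this worked, note that the forum was badly broken: the categories, settings, and posts were restored, but all the images, links, embedded documents, and so on were broken and showed errors.

Final solution:
I managed to rescue the old server, extracted the /var/discourse directory (tar/gz), copied it to the new server, and ran ./launcher rebuild app. That brought the forum fully back, but the underlying problem remains: backups won't work because they mix local and S3 uploads.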
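
A minimal sketch of that directory-copy approach, assuming tar with gzip support and root access on both machines. The function names and the /tmp archive path are hypothetical. Stopping the container first (`./launcher stop app`) is an added precaution not mentioned above, so that the database files are not changing while they are archived.

```shell
#!/usr/bin/env bash
# Sketch: pack a Discourse install directory into a tarball and unpack it
# under a parent directory on another machine.

archive_install() {
  local src="$1" out="$2"
  # -C makes the paths inside the archive relative to the parent directory,
  # so the archive contains "discourse/..." rather than "/var/discourse/...".
  tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")"
}

restore_install() {
  local archive="$1" dest_parent="$2"
  tar -xzf "$archive" -C "$dest_parent"
}

# Usage on real hosts (not run here):
#   old server:  cd /var/discourse && ./launcher stop app
#                archive_install /var/discourse /tmp/discourse.tar.gz
#   (copy /tmp/discourse.tar.gz to the new server, e.g. with scp)
#   new server:  restore_install /tmp/discourse.tar.gz /var
#                cd /var/discourse && ./launcher rebuild app
```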

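So I'd really like some advice on the best way to fix this once and for all. Is it better/easier to move all the uploads from local to S3, or from S3 to local, and how do I do it? The whole point of a backup is to help in a situation like this, but it let me down, so I need your help to fix it.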


If you configure it as described in Using Object Storage for Uploads (S3 & Clones), you should be able to run

 rake uploads:migrate_to_s3

If you want to stop using S3, then you can enter the rails console and set

  SiteSetting.include_s3_uploads_in_backups=true

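Then take a backup, make sure that you do not have S3 configured in your app.yml, and restore the backup. I think that this will restore backups to local.

But in either case, I still recommend setting your keys and backup bucket as environment variables in the app.yml file, and then checking that you can restore to a new site.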


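OK, I think I got a bit mixed up.

I think the ideal setup is to keep all uploads local and save the backups (zip files) to S3. That way, if anything happens to the server, the backups are available on S3, and each backup is self-contained with no dependencies, so it should be easy to restore on a new server.

So, if I understand correctly, I should follow these instructions: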

If you want to stop using s3, then you can enter the rails console and set

  SiteSetting.include_s3_uploads_in_backups=true
Then take a backup, make sure that you do not have s3 configured in your app.yml and restore the backup. I think that this will restore backups to local.

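Then

  1. Disable the "enable S3 uploads" option under Admin → Settings → Files
  2. Enable the "backup to S3" option on the Admin → Settings → Backups page

Is that correct?

This is the part that confuses me: why do I need to put the S3 configuration in the app.yml file?

So that you can access your backups from the command line before restoring one. Otherwise, you have to set up an admin account first, then configure S3, and then restore. Likewise, any settings you make in the database are overwritten when you restore the database.

I think the best practice is to configure S3 via ENV variables in the app.yml file. If it weren't for the hundreds of people who would be surprised to see them disappear, making them hidden settings would probably make sense.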


Because otherwise you will run into trouble when restoring.


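How do I restore a backup from S3 using the command line? According to the instructions here: Restore a backup from the command line
it says you can put the backup file in the /var/discourse/shared/standalone/backups/default folder and then start the restore from the command line. That is what I did earlier, following your suggestion (unfortunately it ended up with broken links), but it did work.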
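
For reference, the manual route described above could be sketched like this. The helper name `stage_backup` and the `BACKUP_DIR` variable are hypothetical, and the backup filename is left as a placeholder. The `discourse enable_restore` step in the usage comment is how I recall the linked guide, so treat it as an assumption and check the guide before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: copy a downloaded backup archive into the folder that the
# command-line restore reads from on a standard standalone install.

BACKUP_DIR="${BACKUP_DIR:-/var/discourse/shared/standalone/backups/default}"

stage_backup() {
  local archive="$1"
  if [ ! -f "$archive" ]; then
    echo "no such backup file: $archive" >&2
    return 1
  fi
  mkdir -p "$BACKUP_DIR"
  cp "$archive" "$BACKUP_DIR/"
  # Print the bare filename, which is what gets passed to the restore command.
  basename "$archive"
}

# Usage on a real host (not run here):
#   stage_backup /tmp/<backup-file>.tar.gz
#   cd /var/discourse && ./launcher enter app
#   discourse enable_restore                  # inside the container (assumed step)
#   discourse restore <backup-file>.tar.gz    # inside the container
```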

How do I restore directly from S3 using the command line?

cd /var/discourse
./launcher enter app
discourse restore

It will print the available backups, and you can then copy/paste the one you want to restore.


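OK, so it will read the S3 backups and list them as options.

Jay, regarding your suggestion to move the assets to local:

I think you can set a hidden setting, include_s3_uploads_in_backups, to true, then take a backup and restore it with S3 turned off to stop using S3.

Combining S3 backups with the configuration in app.yml means that you can do a command-line restore with nothing but your app.yml file (after cloning discourse and installing docker).

As a first step, do I need to back up the S3 bucket, or is this a bucket-safe operation?

Well, at least I've figured out why my server crashed last night (and it crashed again today after the full rebuild :frowning:). For details, see this topic: Ubuntu 20.04 kernel update with docker causing a crash on EC2 and Lightsail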


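To get it running from a backup, I had to do the following:

  1. Disable Settings → Files → enable s3 uploads
  2. Set Settings → Backups → backup location to S3
  3. Enable Settings → Backups → backup with uploads

Then I took a backup and restored it successfully. One thing broke, however: all the attachments (files) now have invalid links. The images are all fine, but attachment links such as https://domain.com/uploads/short-url/phu1HOLvkE8LWpkKYfnMPSWsvHh.zip now give me an error:

Oops! That page doesn’t exist or is private.

Is there a way to fix these short links?

You could try an HTML rebuild (also known as a rebake) on one of the topics and see whether that fixes it.

Thanks. Is there a guide on how to issue the command to rebake a specific topic?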