Restore process cancelled at the "Migrating uploads to S3" step

I’ve been running into issues trying to run a restore on our Staging Discourse instance. Staging is running v2.4.0.beta1 +36. Any idea where the breakdown might be or where to look? Thanks in advance!

Below is the end of the log output:

[2019-07-16 20:08:12] ALTER TABLE
[2019-07-16 20:08:12] ALTER TABLE
[2019-07-16 20:08:12] ALTER TABLE
[2019-07-16 20:08:12] ALTER TABLE
[2019-07-16 20:08:12] Migrating the database...
[2019-07-16 20:08:16] Reconnecting to the database...
[2019-07-16 20:08:16] Reloading site settings...
[2019-07-16 20:08:16] Disabling outgoing emails for non-staff users...
[2019-07-16 20:08:16] Clearing emoji cache...
[2019-07-16 20:08:16] Disabling readonly mode...
[2019-07-16 20:08:16] Clear theme cache
[2019-07-16 20:08:22] Extracting uploads...
[2019-07-16 20:08:40] Migrating uploads to S3...
[2019-07-16 20:08:46] Restore process was cancelled!
[2019-07-16 20:08:46] Trying to rollback...
[2019-07-16 20:08:46] Rolling back...
[2019-07-16 20:08:47] Cleaning stuff up...
[2019-07-16 20:08:47] Removing tmp '/var/www/discourse/tmp/restores/default/2019-07-16-200516' directory...
[2019-07-16 20:08:48] Unpausing sidekiq...
[2019-07-16 20:08:48] Marking restore as finished...

Is something wrong with your S3 config?

Do you see more output running discourse restore BACKUP_FILENAME from the command line?

I will check this next and report back. Thank you.

Below is the output after running discourse restore BACKUP_FILENAME from the command line. Any feedback is appreciated, thanks!

Disabling outgoing emails for non-staff users...
Clearing emoji cache...
Disabling readonly mode...
Clear theme cache
Extracting uploads...
Migrating uploads to S3...
Checking if default already migrated...
524 of 9474 uploads are not migrated to S3. S3 migration failed for db 'default'.
321 posts are not remapped to new S3 upload URL. S3 migration failed for db 'default'.
Looking for missing uploads on: default
Fixing missing uploads:
..........................................................................................................
116 post uploads are missing.
116 uploads are missing.
106 of 116 are old scheme uploads.
98 of 83342 posts are affected.
rake posts:missing_uploads identified 98 issues. S3 migration failed for db 'default'.
No posts require rebaking
Migrating uploads to S3 for 'default'...
Some uploads were not migrated to the new scheme. Please run these commands in the rails console
SiteSetting.migrate_to_new_scheme = true
Jobs::MigrateUploadScheme.new.execute(nil)
Restore process was cancelled!
Trying to rollback...
Rolling back...
Cleaning stuff up...
Removing tmp '/var/www/discourse/tmp/restores/default/2019-07-22-172918' directory...
Unpausing sidekiq...
Marking restore as finished...
Notifying 'system' of the end of the restore...
Finished!
[FAILED]
Restore done.

That’s a known problem. I’ll fix it tomorrow.

Following up on this front to see if the fix has been implemented? Thanks!

No, it’s not fixed yet. But, as a workaround, you could temporarily disable the enable_s3_uploads site setting before creating the backup.

Following up on the long term fix for this challenge. Thanks!

Was this ever resolved? I’m getting the same issue when trying to migrate to a new server. I’m going to try the workaround.

This just bit me for a relatively large migration.

I’m pretty sure I’ve just been hit by this too, and I believe this should be marked as a bug.
(If it’s a different issue, feel free to move this to a separate new topic.)

Kinda important that restoring backups works.


Note that the reason for the failure is not obvious when you use the admin UI and click Restore next to the backup name. You get:

...
Migrating uploads to S3...
Restore process was cancelled!
...

Completing the restore via the command line gets you more detail:

./launcher enter app
discourse restore example-net-2020-01-02-033557-v20191219112000.tar.gz
...
Reconnecting to the database...
Reloading site settings...
Disabling outgoing emails for non-staff users...
Clearing emoji cache...
Disabling readonly mode...
Clear theme cache
Extracting uploads...
Remapping uploads...
Remapping '//forum-example-net.s3.dualstack.eu-west-2.amazonaws.com/' to '/uploads/default/'
optimized_images=3
uploads=1
Migrating uploads to S3...
Checking if default already migrated...
6 of 12 uploads are not migrated to S3. S3 migration failed for db 'default'.
1 posts are not remapped to new S3 upload URL. S3 migration failed for db 'default'.
Looking for missing uploads on: default

0 post uploads are missing.
  Please provide the following environment variables
    - DISCOURSE_S3_BUCKET
    - DISCOURSE_S3_REGION
    and either
    - DISCOURSE_S3_ACCESS_KEY_ID
    - DISCOURSE_S3_SECRET_ACCESS_KEY
    or
    - DISCOURSE_S3_USE_IAM_PROFILE
Restore process was cancelled!
Trying to rollback...
Rolling back...
Cleaning stuff up...
Dropping function from the discourse_functions schema
Removing tmp '/var/www/discourse/tmp/restores/default/2020-01-06-222212' directory...
Unpausing sidekiq...
Marking restore as finished...
Notifying 'system' of the end of the restore...
Finished!
[FAILED]
Restore done.

I added a little debug code to uploads.rake, just before the "Please provide the following environment variables" message, to dump the environment variables:

puts "ENV: " + ENV.inspect

ENV contained no DISCOURSE_S3_* variables.
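
For anyone who wants to check this before starting a restore, here is a minimal sketch that applies the rule stated in the error message (bucket and region, plus either the key pair or the IAM profile flag) to the current environment. The method names `env_set?` and `missing_s3_env` are made up for illustration and are not part of Discourse. Treating any non-empty value of DISCOURSE_S3_USE_IAM_PROFILE as "set" is a simplifying assumption.

```ruby
# Hypothetical helper, modeled only on the error message in the log above.
# It is NOT the check Discourse itself performs.

# True when the variable exists and is not blank.
def env_set?(env, name)
  !env[name].to_s.strip.empty?
end

# Returns the names of the DISCOURSE_S3_* variables that are still missing.
# `env` can be ENV or any Hash with String keys.
def missing_s3_env(env = ENV)
  always_required = %w[DISCOURSE_S3_BUCKET DISCOURSE_S3_REGION]
  key_pair = %w[DISCOURSE_S3_ACCESS_KEY_ID DISCOURSE_S3_SECRET_ACCESS_KEY]

  missing = always_required.reject { |name| env_set?(env, name) }

  # The key pair is only needed when the IAM profile flag is not set
  # (assumption: any non-empty value counts as set).
  unless env_set?(env, "DISCOURSE_S3_USE_IAM_PROFILE")
    missing += key_pair.reject { |name| env_set?(env, name) }
  end

  missing
end

puts "Missing S3 variables: #{missing_s3_env.inspect}"
```

An empty list means the environment satisfies the rule from the message. It says nothing about whether the credentials are valid.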

Is there a meaningful reason this isn’t pulling this data from the settings?

I think that the notion is that if you have uploads on S3 then you’ll do a database-only backup and then it won’t fail because it won’t have uploads.

Totally, but that doesn’t help when the backup you have is one that includes the uploads.

To be clear - not critically important for me right now, I can comment out the offending lines and complete the restore but others can’t.

Agreed. And getting all uploads moved to S3 is a fairly complicated chore and requires an S3 CDN.

No need to convert this into a bug. It’s on my plate and I’ve already spent a huge amount of time refactoring the restore process, adding lots of tests and making it more reliable. I’ll make a couple more tweaks to make restoring to S3 less likely to break and to output more information in the admin UI.

AFAIK, backup/restore has been reworked, but I just found out that this is still an issue.
An attempt on beta11 to restore a backup with enable s3 uploads turned on still fails with:

[2020-02-18 09:51:38] Restoring uploads, this may take a while...
[2020-02-18 09:51:38] EXCEPTION: Please provide the following environment variables:
  - DISCOURSE_S3_BUCKET
  - DISCOURSE_S3_REGION
  and either
  - DISCOURSE_S3_ACCESS_KEY_ID
  - DISCOURSE_S3_SECRET_ACCESS_KEY
  or
  - DISCOURSE_S3_USE_IAM_PROFILE

[2020-02-18 09:51:38] /var/www/discourse/lib/file_store/to_s3_migration.rb:34:in `s3_options_from_env'

So S3 uploads are enabled in the database, but S3 backups are not?

Correct, this is about migrating uploads.

The S3 access credentials are present in the restored database, so there is no need to require them in an environment variable as well.

Providing the environment variables also results in a failure:

Restoring uploads, this may take a while...
Checking if db8015 already migrated...
200 of 206 uploads are not migrated to S3. S3 migration failed for db 'db8015'.
5 posts are not remapped to new S3 upload URL. S3 migration failed for db 'db8015'.
No posts require rebaking
Migrating uploads to S3 for 'db8015'...
Uploading files to S3...
 - Listing local files
 => 21 files
 - Listing S3 files
. => 16 files
 - Syncing files to S3
.....................
Updating the URLs in the database...
Removing old optimized images...
Flagging all posts containing lightboxes for rebake...
5 posts were flagged for a rebake
EXCEPTION: 183 of 206 uploads are not migrated to S3. S3 migration failed for db 'db8015'.
/var/www/discourse/lib/file_store/to_s3_migration.rb:127:in `raise_or_log'
/var/www/discourse/lib/file_store/to_s3_migration.rb:74:in `migration_successful?'
/var/www/discourse/lib/file_store/to_s3_migration.rb:350:in `migrate_to_s3'
/var/www/discourse/lib/file_store/to_s3_migration.rb:61:in `migrate'
/var/www/discourse/lib/file_store/s3_store.rb:203:in `copy_from'
/var/www/discourse/lib/backup_restore/uploads_restorer.rb:48:in `restore_uploads'
/var/www/discourse/lib/backup_restore/uploads_restorer.rb:30:in `restore'
/var/www/discourse/lib/backup_restore/restorer.rb:58:in `run'
script/discourse:143:in `restore'

I have no idea why this fails.

Most of the uploads were already on S3, so the messages “200 of 206 uploads are not migrated to S3” and “183 of 206 uploads are not migrated to S3” are wrong. The count of 21 local files is correct, and there are roughly 200 uploads on S3 (it could well be 206). I don’t recognize any of the other numbers (183, 16).

I also have no idea why the restore process is trying to move more uploads to S3. Shouldn’t it just take the local images from the backup and leave the uploads on S3 untouched? Or am I overlooking something?

In the end, I manually changed the enable_s3_uploads setting in the database dump to false, but that caused everything to be remapped back to local. And since some images were still local, it took a lot of work to figure out which ones needed to be remapped back to S3 and which didn’t.

All of this on version 2.4.0 beta11.

Mixing local uploads with uploads stored on S3 is not supported. Yes, I know, it’s possible to end up in that state when someone switches from local to S3 and doesn’t migrate the existing uploads to S3, but that’s another story…

Restoring a backup always includes remapping uploads if the system detects any change that affects upload URLs. This includes switching between standalone and multisite, switching between local and S3 uploads, as well as changes to the S3 and CDN settings. All uploads are restored to the correct location based on the settings, either locally or on S3.
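
To make that rule a bit more concrete, here is a rough sketch of the idea: derive the upload base URL from the settings on each side and remap only when the two differ. This is not the actual restore code. The method names and the keys of the settings hash are invented for illustration, and the S3 URL shape is simply copied from the "Remapping '//forum-example-net.s3.dualstack.eu-west-2.amazonaws.com/' to '/uploads/default/'" line in the log earlier in this topic.

```ruby
# Hypothetical sketch of "remap when the upload base URL changes".
# Names and hash keys are assumptions, not Discourse's API.

# Base URL under which uploads are referenced for a given configuration.
def upload_base_url(settings)
  if settings[:s3_enabled]
    # A CDN URL, when configured, replaces the raw bucket URL.
    settings[:s3_cdn_url] ||
      "//#{settings[:s3_bucket]}.s3.dualstack.#{settings[:s3_region]}.amazonaws.com"
  else
    # Local uploads live under the database name, which is 'default' on a
    # standalone site and differs per site on multisite.
    "/uploads/#{settings[:db_name]}"
  end
end

# A remap is needed whenever the base URL of the backup differs from
# the base URL of the site being restored to.
def remap_needed?(backup_settings, current_settings)
  upload_base_url(backup_settings) != upload_base_url(current_settings)
end

# Literal (non-regex) replacement of the old base URL in a post's raw text.
def remap_uploads(raw, from, to)
  raw.gsub(from, to)
end

backup  = { s3_enabled: true, s3_bucket: "forum-example-net", s3_region: "eu-west-2", db_name: "default" }
current = { s3_enabled: false, db_name: "default" }

if remap_needed?(backup, current)
  # The upload path below is a made-up example.
  raw = "![img](#{upload_base_url(backup)}/original/1X/abc.png)"
  puts remap_uploads(raw, upload_base_url(backup), upload_base_url(current))
end
```

With these example settings the script prints the post text with the S3 host replaced by /uploads/default. The real process also has to move the files themselves, which is the part that fails in the logs above.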

Every now and then we come across backups where the automatic remapping and the migration to S3 fail for various reasons. You can expect to see more improvements early in the 2.5 development cycle.