No timeout for backups (stuck since September)

I would like to report the following issues:

  1. There seems to be no timeout for backups, so an automatic backup had been stuck since September.
  2. Once it was canceled, all our users received the email digests and password resets that had accumulated since September.

Here’s the log:

[2019-09-26 03:35:25] pg_dump: creating INDEX "public.idx_tag_users_ix1"
[2019-09-26 03:35:25] pg_dump: creating INDEX "public.idx_tag_users_ix2"
[2019-09-26 03:35:25] pg_dump: creating INDEX "public.idx_topic_id_public_type_deleted_at"
[2019-09-26 03:35:25] pg_dump: creating INDEX "public.idx_topics_front_page"
[2019-09-26 03:35:25] pg_dump: creating INDEX "public.idx_topics_user_id_deleted_at"
[2019-09-26 03:35:25] pg_dump: creating INDEX "public.idx_unique_actions"
[2019-09-26 03:35:25] pg_dump: creating INDEX "public.idx_unique_flags"
[2019-09-26 03:35:25] Finalizing backup...
[2019-09-26 03:35:25] Creating archive: our-community-2019-09-26-033520-v20171214040346.tar.gz
[2019-09-26 03:35:25] Making sure archive does not already exist...
[2019-09-26 03:35:25] pg_dump: creating INDEX "public.idx_unique_post_uploads"
[2020-01-24 16:02:39] Backup process was cancelled!
[2020-01-24 16:02:39] Notifying 'system' of the end of the backup...
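
For illustration only, here is a minimal Python sketch of the kind of timeout the report asks for: a wrapper that gives up on a long-running command after a fixed number of seconds. It is not how Discourse runs backups. The names `run_with_timeout` and `DEMO_SLOW_CMD` are hypothetical, and the demo command is a stand-in for a real `pg_dump` call.

```python
import subprocess
import sys

# Hypothetical stand-in for a long-running backup command such as pg_dump.
DEMO_SLOW_CMD = [sys.executable, "-c", "import time; time.sleep(30)"]


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and return (finished, returncode).

    If the command does not finish within timeout_seconds, subprocess.run
    kills the child process and raises TimeoutExpired; in that case this
    returns (False, None). Only the direct child is killed, so a real
    backup wrapper would also need to clean up any grandchildren.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_seconds, check=False)
    except subprocess.TimeoutExpired:
        return False, None
    return True, completed.returncode
```

A caller could then log a failure and release any locks when `finished` is false, instead of leaving the backup marked as running for months.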

While this isn’t great, do you have any idea why your backup took so long? Is your database huge?


No, it’s small: the .dump files from Postgres were 112 MB.

How was your server installed? Where is it deployed?


This is very weird. There’s a stray log message from “pg_dump” after the “Finalizing backup…” message.

Also, I’m not sure why the system stopped sending emails. Sidekiq should already be unpaused when “Finalizing backup…” appears in the logs.

https://github.com/discourse/discourse/blob/3b7f5db5ba9d4db23593fe116499f9583fed271f/lib/backup_restore/backuper.rb#L48-L51

What version of Discourse are you using? I assume you didn’t run any upgrades since September, otherwise I’m quite sure that would have stopped the backup. :thinking:


Thanks for reading and responding! I noticed the issue while inspecting an old server, which has now been replaced. I still have access to the old instance.

  • It happened on 1.8
  • The only emails sent were new_version emails. No password resets, no notifications.
  • Logs are often written to and flushed in different threads/processes, so I’m not sure if that can explain the stray pg_dump message… but it’s certainly very suspicious.
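
The stray message can be spotted mechanically. The following is a rough Python sketch, assuming log lines in the `[timestamp] message` shape shown in the log above; the function name `find_stray_pg_dump_lines` is made up for this example and is not part of Discourse.

```python
import re

# Matches lines such as: [2019-09-26 03:35:25] Finalizing backup...
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] (?P<msg>.*)$")


def find_stray_pg_dump_lines(lines, marker="Finalizing backup..."):
    """Return pg_dump log lines that appear after the marker message."""
    seen_marker = False
    stray = []
    for raw in lines:
        line = raw.strip()
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip anything that is not a timestamped log line
        msg = match.group("msg")
        if msg.startswith(marker):
            seen_marker = True
        elif seen_marker and msg.startswith("pg_dump:"):
            stray.append(line)
    return stray
```

Run against the log in the first post, this would flag only the `public.idx_unique_post_uploads` line. That is consistent with output from separate processes being flushed out of order, but it does not prove it.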

I think another real issue may be that there are no sanity checks when a pile of queued emails is sent out. What if a password reminder is 3 months old? Should we assume that the system time has changed, or that emails have indeed failed to send for a considerable time?
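
To make the suggestion concrete, here is a small Python sketch of such a sanity check: before sending, split the queue into emails that are still fresh and emails that are too old to be useful. Everything in it is an assumption for illustration (the `QueuedEmail` type, the `MAX_AGE` limits, the function name). It does not reflect how Discourse or Sidekiq actually store or process jobs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical per-kind age limits; real values would be a policy decision.
MAX_AGE = {
    "password_reset": timedelta(hours=24),
    "digest": timedelta(days=7),
}
DEFAULT_MAX_AGE = timedelta(days=30)


@dataclass(frozen=True)
class QueuedEmail:
    kind: str
    recipient: str
    enqueued_at: datetime


def split_by_freshness(queue, now):
    """Return (fresh, stale) lists based on each email's age at time now."""
    fresh, stale = [], []
    for email in queue:
        limit = MAX_AGE.get(email.kind, DEFAULT_MAX_AGE)
        if now - email.enqueued_at <= limit:
            fresh.append(email)
        else:
            stale.append(email)
    return fresh, stale
```

With a check like this, a password reset queued in September would land in the stale list in January and could be dropped or reported to an admin rather than delivered.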

Our new server is now running the latest version of Discourse, but it would of course be a shame if unattended backups got stuck again. https://community.learningequality.org/


Well, that’s extremely old. It wouldn’t surprise me if that version had problems with pausing/unpausing Sidekiq that have been fixed since then.

There are no unattended upgrades in Discourse.


> There are no unattended upgrades in Discourse.

That was a typo

I’m quite sure there is no bug in the current version that could be causing a stuck backup and a stalled email queue.

I’m closing this topic. Please flag to reopen if you are experiencing it on the latest version of Discourse and can provide steps to reproduce.
