Backups fail to upload to S3 several times- works eventually

Backups to S3 have been working for several years. Starting a month ago, I often get several notifications that the backup failed. Then it retries, and about an hour later fails again, so I get a second notification, and often up to six notifications in one day before it finally succeeds.

The log from the notification says it’s failing to upload the zipped tar file to S3. I can’t find any errors in my S3 account.

Log The first part of the log looks normal, then:

[2025-05-20 07:11:38] Finalizing backup…
[2025-05-20 07:11:38] Creating archive: 506-investor-group-2025-05-20-070428-v20250513161753.tar.gz
[2025-05-20 07:11:38] Making sure archive does not already exist…
[2025-05-20 07:11:38] Creating empty archive…
[2025-05-20 07:11:38] Archiving data dump…
[2025-05-20 07:12:17] Archiving uploads…
[2025-05-20 07:15:48] Removing tmp ‘/var/www/discourse/tmp/backups/default/2025-05-20-070428’ directory…
[2025-05-20 07:15:48] Gzipping archive, this may take a while…
[2025-05-20 07:32:51] Uploading archive…
[2025-05-20 07:34:28] EXCEPTION: Sidekiq::Shutdown
[2025-05-20 07:34:28] /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/aws-sdk-s3-1.182.0/lib/aws-sdk-s3/multipart_file_uploader.rb:199:in value' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/aws-sdk-s3-1.182.0/lib/aws-sdk-s3/multipart_file_uploader.rb:199:in map’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/aws-sdk-s3-1.182.0/lib/aws-sdk-s3/multipart_file_uploader.rb:199:in upload_in_threads' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/aws-sdk-s3-1.182.0/lib/aws-sdk-s3/multipart_file_uploader.rb:82:in upload_parts’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/aws-sdk-s3-1.182.0/lib/aws-sdk-s3/multipart_file_uploader.rb:59:in upload' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/aws-sdk-s3-1.182.0/lib/aws-sdk-s3/file_uploader.rb:42:in block in upload’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/aws-sdk-core-3.219.0/lib/aws-sdk-core/plugins/user_agent.rb:69:in metric' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/aws-sdk-s3-1.182.0/lib/aws-sdk-s3/file_uploader.rb:40:in upload’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/aws-sdk-s3-1.182.0/lib/aws-sdk-s3/customizations/object.rb:477:in block in upload_file' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/aws-sdk-core-3.219.0/lib/aws-sdk-core/plugins/user_agent.rb:69:in metric’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/aws-sdk-s3-1.182.0/lib/aws-sdk-s3/customizations/object.rb:476:in upload_file' /var/www/discourse/lib/backup_restore/s3_backup_store.rb:48:in upload_file’
/var/www/discourse/lib/backup_restore/backuper.rb:351:in upload_archive' /var/www/discourse/lib/backup_restore/backuper.rb:41:in run’
/var/www/discourse/lib/backup_restore.rb:13:in backup!' /var/www/discourse/app/jobs/regular/create_backup.rb:10:in execute’
/var/www/discourse/app/jobs/base.rb:316:in block (2 levels) in perform' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/rails_multisite-6.1.0/lib/rails_multisite/connection_management/null_instance.rb:49:in with_connection’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/rails_multisite-6.1.0/lib/rails_multisite/connection_management.rb:21:in with_connection' /var/www/discourse/app/jobs/base.rb:303:in block in perform’
/var/www/discourse/app/jobs/base.rb:299:in each' /var/www/discourse/app/jobs/base.rb:299:in perform’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/processor.rb:220:in execute_job' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/processor.rb:185:in block (4 levels) in process’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/middleware/chain.rb:180:in traverse' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/middleware/chain.rb:183:in block in traverse’
/var/www/discourse/lib/sidekiq/discourse_event.rb:6:in call' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/middleware/chain.rb:182:in traverse’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/middleware/chain.rb:183:in block in traverse' /var/www/discourse/lib/sidekiq/pausable.rb:131:in call’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/middleware/chain.rb:182:in traverse' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/middleware/chain.rb:183:in block in traverse’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/job/interrupt_handler.rb:9:in call' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/middleware/chain.rb:182:in traverse’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/middleware/chain.rb:183:in block in traverse' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/metrics/tracking.rb:26:in track’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/metrics/tracking.rb:134:in call' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/middleware/chain.rb:182:in traverse’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/middleware/chain.rb:173:in invoke' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/processor.rb:184:in block (3 levels) in process’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/processor.rb:145:in block (6 levels) in dispatch' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/job_retry.rb:118:in local’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/processor.rb:144:in block (5 levels) in dispatch' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/config.rb:39:in block in class:Config
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/processor.rb:139:in block (4 levels) in dispatch' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/processor.rb:281:in stats’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/processor.rb:134:in block (3 levels) in dispatch' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/job_logger.rb:15:in call’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/processor.rb:133:in block (2 levels) in dispatch' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/job_retry.rb:85:in global’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/processor.rb:132:in block in dispatch' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/job_logger.rb:40:in prepare’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/processor.rb:131:in dispatch' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/processor.rb:183:in block (2 levels) in process’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/processor.rb:182:in handle_interrupt' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/processor.rb:182:in block in process’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/processor.rb:181:in handle_interrupt' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/processor.rb:181:in process’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/processor.rb:86:in process_one' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/processor.rb:76:in run’
/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/component.rb:10:in watchdog' /var/www/discourse/vendor/bundle/ruby/3.3.0/gems/sidekiq-7.3.9/lib/sidekiq/component.rb:19:in block in safe_thread’
[2025-05-20 07:34:28] Deleting old backups…
[2025-05-20 07:34:28] Cleaning stuff up…
[2025-05-20 07:34:28] Removing archive from local storage…
[2025-05-20 07:34:28] Removing ‘.tar’ leftovers…
[2025-05-20 07:34:28] Marking backup as finished…
[2025-05-20 07:34:28] Notifying ‘system’ of the end of the backup…

I have plenty of disk space (6X as much free space as the backup size).

CPU usage from today- the backup should take under an hour but when you have to try 9 times it takes four hours:

Could you be running out of memory? That exception makes me think that Sidekiq (which runs the automatic backups) is either killed by the OS or crashes for some other reason.

Did you try rebuilding the Docker container (./launcher rebuild app) to see if that fixes it?

I don’t think so. The server is overkill for our community with 16GB RAM. top shows 11GB free memory, and that hardly changes if I manually trigger a backup.

I’ll check /var/log/syslog tomorrow for anything about memory, kill, or OOM. (Can’t check today because I had some unrelated verbose logging and the backup event scrolls out of the syslog buffer.)

Yes, I updated to 3.5.0.beta5-dev a few days ago and the issue persists.