Problems with self-hosted upgrade to 3.x: cannot roll back

This update is failing for me, and right now, all 3 of my Discourse forums are down.

The console window on the Upgrade screen is not populating with anything, but the rest of the UI suggests the upgrade is still in progress (and it has already taken much longer than any update I’ve ever done).

Anyone else experiencing this? Any ideas for how I can get the upgrade to complete successfully?

@pearsonified because this includes an update to the Rails component, this is one of those releases that needs an update from the command line.

Ahh, is that the issue with the ruby 3.0 error and web-push-3.0.0?

The admin upgrade web UI should tell the user that, rather than letting them click “upgrade all” and end up with a completely broken forum. It would save a lot of users some heart palpitations.

No idea; I never use the GUI for updates, only the command line. I didn’t see any errors like that, but then I wouldn’t have.

Most of the time, when you see a generic error message, it’s because the specific scenario wasn’t foreseen; if it had been foreseen well enough to write a specific error message for it, it would have been avoided instead.

I guess it’s a good thing that the “help, my GUI update failed” problem is so rare that most people don’t realize the first thing to do is simply to run git pull; ./launcher rebuild app (for the default deployment). It is almost always the answer here.

In general, it’s the thing to do whenever the GUI update fails.

I’ve been using Discourse for many years now. Like seven? Anyway, I’ve never had a web upgrade fail catastrophically before. Upgrades have failed, of course, but the forum always came back to life because it rolled back and started up the old container.

It isn’t a generic error; the Ruby message is pretty clear:

web-push-3.0.0 requires ruby version >= 3.0, which is incompatible with the
current version, 2.7.6
Docker Manager: FAILED TO UPGRADE
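The failure above is a plain version-constraint check: web-push 3.0.0 declares that it needs Ruby 3.0 or newer, while the Ruby inside the already-running container is 2.7.6. As I understand it, the web upgrader only updates code inside that existing container and cannot change its Ruby version; a command-line rebuild is what brings in a newer base image. The Python sketch below mimics that comparison plus a hypothetical pre-flight check a GUI upgrader could run before offering an upgrade. The function names are made up for illustration, it assumes plain numeric versions like "2.7.6", and it is not Bundler’s or docker_manager’s actual code.

```python
"""Hypothetical sketch of a minimum-Ruby-version check.

Not Bundler's or docker_manager's real implementation; names are illustrative.
Assumes plain numeric versions such as "2.7.6" (no pre-release tags).
"""


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version like "2.7.6" into a tuple of ints (2, 7, 6)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(current: str, minimum: str) -> bool:
    """Return True if `current` >= `minimum`, comparing component by component."""
    cur = parse_version(current)
    req = parse_version(minimum)
    # Pad with zeros so "3.0" compares the same as "3.0.0".
    width = max(len(cur), len(req))
    cur += (0,) * (width - len(cur))
    req += (0,) * (width - len(req))
    return cur >= req


def gui_upgrade_allowed(running_ruby: str, required_ruby: str) -> tuple[bool, str]:
    """Hypothetical pre-flight check a web upgrader could run before upgrading."""
    if satisfies_minimum(running_ruby, required_ruby):
        return True, "OK to upgrade from the web UI"
    return False, (
        f"This update needs Ruby >= {required_ruby} but the container runs "
        f"{running_ruby}; please rebuild from the command line instead."
    )


if __name__ == "__main__":
    print(satisfies_minimum("2.7.6", "3.0"))  # False, same situation as the error above
    print(gui_upgrade_allowed("2.7.6", "3.0")[0])  # False
```

Padding with zeros keeps "3.0" and "3.0.0" equal. Real RubyGems version handling also covers pre-release versions (for example beta tags), which this sketch deliberately ignores.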

I would be complaining pretty loudly about this if I paid for Discourse. Not because there’s a huge bug that killed our forum (bugs happen), but because it hasn’t been fixed in at least the past three days. As it is, it was just a bit of excitement at the end of the day.

From the docker manager’s perspective, the code that is already running was written before this problem existed, so it could not have known that it would happen. I hear that you are frustrated, but what you are asking for might not make sense technically.

If you paid for Discourse, they would be managing this process and you’d never see it. :relaxed:

Even if it’s technically impossible to fix, you could email users to warn them to upgrade via the CLI. Make some noise about it. Breaking forums with no rollback is bad.

I thought Discourse also sold support for on-prem usage? Either way, I can’t complain too loudly as I’m not a paying customer, but geez, 3 days is not great.

I’m happy for you that this is the first time in seven years that you’ve had to fall back to this, but:

I’m going to call that out as a mischaracterization. CDCK does actually recommend taking backups before updates as a best practice. I’ve seen the note many times here on Meta that if the GUI update fails, you should fall back to rebuilding from the command line; that is part of the normal expectations for administering your own forum. They did not break forums with no rollback, so your implication is false.

I’m not going to sit here arguing for hours. I just think you aren’t being completely reasonable here.

Furthermore, upgrading a production server without running said upgrade through some form of staging environment is just asking for pain.

It doesn’t have to be a full-scale copy; a $5 VPS will suffice for most people. If you want to avoid downtime and having to turn to your backups, it’s the cheapest insurance you can get.

I’ve just tested this out on a test DO instance that hasn’t been updated in yonks:

and as reported it errored out:

Fetching gem metadata from https://rubygems.org/.........
Fetching https://github.com/rails/sprockets
web-push-3.0.0 requires ruby version >= 3.0, which is incompatible with the
current version, 2.7.6
Docker Manager: FAILED TO UPGRADE
#<RuntimeError: RuntimeError>
/var/www/discourse/plugins/docker_manager/lib/docker_manager/upgrader.rb:209:in `run'
/var/www/discourse/plugins/docker_manager/lib/docker_manager/upgrader.rb:93:in `upgrade'
/var/www/discourse/plugins/docker_manager/scripts/docker_manager_upgrade.rb:19:in `block in <main>'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activesupport-7.0.3.1/lib/active_support/fork_tracker.rb:20:in `block in fork'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activesupport-7.0.3.1/lib/active_support/fork_tracker.rb:18:in `fork'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activesupport-7.0.3.1/lib/active_support/fork_tracker.rb:18:in `fork'
/var/www/discourse/plugins/docker_manager/scripts/docker_manager_upgrade.rb:6:in `<main>'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/railties-7.0.3.1/lib/rails/commands/runner/runner_command.rb:43:in `load'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/railties-7.0.3.1/lib/rails/commands/runner/runner_command.rb:43:in `perform'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/thor-1.2.1/lib/thor/command.rb:27:in `run'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/thor-1.2.1/lib/thor/invocation.rb:127:in `invoke_command'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/thor-1.2.1/lib/thor.rb:392:in `dispatch'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/railties-7.0.3.1/lib/rails/command/base.rb:87:in `perform'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/railties-7.0.3.1/lib/rails/command.rb:48:in `invoke'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/railties-7.0.3.1/lib/rails/commands.rb:18:in `<main>'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/bootsnap-1.13.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:32:in `require'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/bootsnap-1.13.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:32:in `require'
bin/rails:18:in `<main>'
Spinning up 3 Unicorn worker(s) that were stopped initially

and the upgrade page now shows:

[Screenshot: the upgrade page after the failed upgrade]

and the main site 500s. In /logs I see this error:

NoMethodError (undefined method `navigation_menu' for #<Class:0x00007fdffcb1b2f8>)
lib/wizard/builder.rb:98:in `block in build'
lib/wizard.rb:25:in `append_step'
lib/wizard/builder.rb:61:in `build'
lib/wizard.rb:110:in `requires_completion?'
lib/wizard.rb:117:in `user_requires_completion?'
app/serializers/site_serializer.rb:171:in `include_wizard_required?'
…

@pearsonified @mcdanlj Can you paste what errors you were getting?

I don’t know if we can do anything now to prevent this for future users, but we’ll look into it.

In the meantime, did the following steps help?

[Screenshot: suggested recovery steps]

Gonna run the command-line upgrades this morning and see how that goes. Forums have been down 18 hours now :pensive:

If you were asking me: yes, the CLI rebuild worked fine. I’m a decent SA and found the fix on the forums here, but it looks like other forum owners, like @pearsonified, were severely inconvenienced. 18 hours of downtime, oof.

People talking about staging and production environments are off their gourds. There are lots of non-corporate people hosting Discourse forums on little VMs. “You should be testing in a staging environment” is an insulting, sanctimonious response which does not in any way justify breaking their install. Nothing justifies that.

I just want to emphasize that I appreciate this excellent software is completely free. I’m grateful for that, and the occasional stumbling blocks don’t change my feelings.

Discourse is free; servers, domains, and our time are not.

This has nothing to do with corporate environments; it’s just common sense when there are so many variables at play. A lot of work goes into making upgrades seamless, and if you would rather give your time for free to resolve significant blips, that’s your call. Personally, though, I prefer knowing that the nonprofit and volunteer communities I host will be down once for a painless upgrade, rather than for a protracted period.

The smallest VPS at DO will suffice to de-risk upgrades as long as the test is representative. If you’re a decent SA, that $5 is the cost of a cup of coffee and a fraction of your hourly market rate.

The staging VM would have to hold the forum’s data too, which would double our hosting costs. It isn’t necessary anyway: it’s a video game forum, and a bit of downtime (not 18 hours!) is OK. Worst case, I had a VM snapshot from before the upgrade, so I could have just rolled back.

I do find it concerning that this has remained broken for four days. Upgrades failing without rollback should trigger a real “holy crap, this needs to be fixed NOW” moment. That should not be a controversial statement.

I too got the web-push Ruby error. Normal forum, no custom plugins.

web-push-3.0.0 requires ruby version >= 3.0, which is incompatible with the
current version, 2.7.6

In the past, the upgrade page has sometimes said that an update must be done via the command line. Maybe that notice could be enabled now for whichever version is causing this issue?

Edit: Oh, it does actually say that!

The problem happens when you do a docker_manager update and then just click the tab at the top instead of refreshing the page. Could it be changed to reload the page over HTTP rather than just switching tabs with JS?
