Updating / Rebuilding Discourse to latest version irreversibly breaks website

Hey, we’ve been stuck on a very nasty bug lately that broke our entire website, with seemingly no feasible way out of it. We’ve tried everything from rebuilding to removing/adding plugins, to no avail. We have a working backup from a couple of months ago, but any attempt to rebuild it with a newer version just straight up breaks the site.

The main message that gets displayed reads:

Oops
The software powering this discussion forum encountered an unexpected problem. We apologize for the inconvenience.

Detailed information about the error was logged, and an automatic notification generated. We'll take a look at it.

No further action is necessary. However, if the error condition persists, you can provide additional detail, including steps to reproduce the error, by posting a discussion topic in the site's feedback category.

This is a generic Discourse error message that gives nothing useful to go on when troubleshooting. Checking the logs of our Docker container, I was able to find this:

Completed 500 Internal Server Error in 152ms (ActiveRecord: 0.0ms | Allocations: 17980)
ActionView::Template::Error (undefined method `[]' for nil:NilClass)
lib/svg_sprite/svg_sprite.rb:502:in `block in custom_icons'
lib/svg_sprite/svg_sprite.rb:500:in `each'
lib/svg_sprite/svg_sprite.rb:500:in `custom_icons'
lib/svg_sprite/svg_sprite.rb:275:in `block in all_icons'
lib/distributed_cache.rb:25:in `defer_get_set'
lib/svg_sprite/svg_sprite.rb:517:in `get_set_cache'
lib/svg_sprite/svg_sprite.rb:268:in `all_icons'
lib/svg_sprite/svg_sprite.rb:337:in `bundle'
lib/svg_sprite/svg_sprite.rb:285:in `block in version'
lib/distributed_cache.rb:25:in `defer_get_set'
lib/svg_sprite/svg_sprite.rb:517:in `get_set_cache'
lib/svg_sprite/svg_sprite.rb:284:in `version'
lib/svg_sprite/svg_sprite.rb:290:in `path'
app/helpers/application_helper.rb:586:in `client_side_setup_data'
app/views/layouts/application.html.erb:61
lib/topic_list_responder.rb:13:in `block (2 levels) in respond_with_list'
lib/topic_list_responder.rb:9:in `respond_with_list'
app/controllers/list_controller.rb:103:in `block (2 levels) in <class:ListController>'
app/controllers/application_controller.rb:387:in `block in with_resolved_locale'
app/controllers/application_controller.rb:387:in `with_resolved_locale'
lib/middleware/omniauth_bypass_middleware.rb:71:in `call'
lib/content_security_policy/middleware.rb:12:in `call'
lib/middleware/anonymous_cache.rb:356:in `call'
config/initializers/100-quiet_logger.rb:23:in `call'
config/initializers/100-silence_logger.rb:31:in `call'
lib/middleware/enforce_hostname.rb:23:in `call'
lib/middleware/request_tracker.rb:198:in `call'

So, naturally, I thought I could fix the issue by simply adding a begin/rescue (Ruby’s try/catch) at line 501, where it errors. This is the method in question:

  def self.custom_icons(theme_id)
    # Automatically register icons in sprites added via themes or plugins
    icons = []
    custom_svg_sprites(theme_id).each do |item|
      begin
        svg_file = Nokogiri::XML(item[:sprite])
        svg_file.css('symbol').each do |sym|
          icons << sym.attributes['id'].value if sym.attributes['id'].present?
        end
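      rescue StandardError => e
        # Log the failure and skip this sprite instead of failing the whole page
        puts "Skipping custom SVG sprite: #{e.message}"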
      end
    end
    icons
  end

While this does restore functionality to the site, it unfortunately does so by not loading any icons or images on the entire website, so ultimately this doesn’t fix anything.
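For anyone landing here with the same trace, a narrower guard than a blanket rescue might look like the sketch below. This is not Discourse's code: `collect_icon_ids`, `SYMBOL_ID`, and the sample data are hypothetical, and a regex stands in for the Nokogiri parsing so the snippet runs on its own. The idea is to skip entries that are nil or have no `:sprite`, and to record which ones were skipped so the broken theme or plugin sprite can be tracked down.

```ruby
# Hypothetical helper, not part of Discourse: collects <symbol> ids from a
# list of sprite entries shaped like { sprite: "<svg>...</svg>" }.
SYMBOL_ID = /<symbol\b[^>]*\bid=["']([^"']+)["']/

def collect_icon_ids(sprites)
  icons = []
  skipped = []

  sprites.each_with_index do |item, index|
    sprite = item.is_a?(Hash) ? item[:sprite] : nil

    # Calling item[:sprite] on a nil entry would raise
    # "undefined method `[]' for nil:NilClass", so guard before using it.
    if !sprite.is_a?(String) || sprite.strip.empty?
      skipped << index
      next
    end

    icons.concat(sprite.scan(SYMBOL_ID).flatten)
  end

  [icons, skipped]
end

# Made-up sample data: one valid sprite, one nil entry, one entry without markup.
sample = [
  { sprite: '<svg><symbol id="fa-star" viewBox="0 0 1 1"/></svg>' },
  nil,
  { sprite: nil }
]

icons, skipped = collect_icon_ids(sample)
puts "icons: #{icons.inspect}"
puts "skipped entries: #{skipped.inspect}"
```

Knowing which entries were skipped points at the data that needs fixing, rather than hiding the problem the way a catch-all rescue does.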

We’re using the following plugins:

          - git clone https://github.com/discourse/docker_manager.git
          - git clone https://github.com/discourse/discourse-bbcode.git
          - git clone https://github.com/discourse/discourse-follow.git
          - git clone https://github.com/discourse/discourse-user-notes.git

From what I recall, we’ve used some plugins such as discourse-follow before they were officially supported, so perhaps sidegrading them broke something in the process.

The original version before the update was 2.8.0.beta2; we are now on 2.9.0.beta1.

I’ve tried just about everything I can think of. I wasn’t sure if I should post here, because this feels like too generic a bug for anyone to be able to help with, but if someone has any ideas as to what’s causing this, I’d appreciate the assistance.

This could be a Postgres issue, where Postgres is trying to upgrade from version 10 to version 13.

Check the logs to see if there is anything related to Postgres.

Are you referring to the logs contained in /shared/standalone/log/rails? Because they mention nothing relating to Postgres. As for the logs contained in /shared/standalone/log/var-log/postgres, at around the time of the error, 19:51:08, there’s a request that does not seem out of the ordinary. No errors or anything that seems to fail. As for the time of the update, I don’t think that gets logged here.

Do you have something in particular I can look for?

Make sure your system packages are up to date (apt-get update && apt-get upgrade),
and then run ./launcher rebuild app

Check for errors during the rebuild, or look in /var/discourse/shared/standalone/log/rails

I’ve run both apt-get update and apt-get upgrade, did a full rebuild, and unfortunately am met with the very same errors. From what I can see, nothing changed.

Some questions:

  1. Does the rebuild complete without errors?
  2. Have you tried loading the site via safe mode?
  3. Can you share the URL of the site so we can take a look?

  1. Yes, the rebuild completes without errors.
  2. Yes, and much like when I fix the error mentioned above, it boots the website in a sort of “no icon / image” mode where nothing loads.
  3. Yes and no: the site contains some explicit content, and I’m going to have to ask around if people are comfortable with me sharing it.

In the meantime, I’ve found out something else: when browsing the website in safe mode, I noticed that DiscoTOC throws an error. Upon disabling it, the website works normally again without the need for safe mode; however, the icons/images still do not load properly.

All images are throwing 404s in the console, most icons like the favicon are throwing 500.
Naturally, they were all working fine before.

To add to this, could it be that the rebuild has somehow cleared the database of all image references?
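One way to tell the two cases apart (database references gone versus files gone) is to take a handful of upload paths the database still references and check whether those files exist on disk. Below is a minimal Ruby sketch of that check. `missing_uploads` is a hypothetical helper, and the directory layout and file names are made up for illustration; on a standard install the uploads directory would typically live under /var/discourse/shared/standalone/uploads, but that is an assumption about this setup.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: given the uploads directory and the relative paths the
# database still references, return the paths that have no file on disk.
def missing_uploads(uploads_dir, relative_paths)
  relative_paths.reject { |path| File.file?(File.join(uploads_dir, path)) }
end

# Demonstration against a temporary directory, so nothing real is touched.
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, "default/original/1X"))
  FileUtils.touch(File.join(dir, "default/original/1X/present.png"))

  referenced = [
    "default/original/1X/present.png",
    "default/original/1X/gone.png"
  ]

  # Only the second path is reported, because its file was never created.
  puts missing_uploads(dir, referenced).inspect
end
```

If the references are still in the database but the files are missing, the problem is on the filesystem side (for example, an uploads folder that was not carried over), not in the database.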

To keep everyone updated, I’ve since tried out several suggestions from different threads, but just to name two:

None of the suggestions helped: rake posts:rebake, rake posts:missing_uploads, and rake uploads:recover_from_tombstone all did not work, unfortunately. I’ve also checked the Sidekiq dashboard, but nothing there is restoring anything either.

If anyone has any idea how to restore the now-missing images, I’d be grateful to hear it.

After a lot of trial and error, I was able to successfully restore the images after all, so I’ll consider this issue resolved.

Was there something that you wish you’d known at the start?

Perhaps having a more thorough look at the website in safe mode from the start would’ve helped. I also have to admit, I was only involved in the restoration of the site after some work had already been done on it. It turned out that the images were in a separate backed-up folder and only had to be moved over.

Hard for anyone to guess that! Glad you figured it out.
