Forum gone haywire - possibly after upgrade

Long post…

A couple of days ago I migrated my forum to a new server. The migration went well and users were able to log in and use it as normal. However, after 2-3 hours logged-in users found that they were unable to use it. I didn’t see the problem initially, but within 10-15 minutes I found my account was affected too. What I was seeing was the forum page being output repeatedly down the page but with the content missing: header, a bit of text, header again, a bit of text again, header again, and so on.

I assumed something had gone wrong with the migration. As there had been very few posts since the migration and I still had the old server available, I fired up the forum on the old server and pointed DNS back to it. That would give me time to figure out what went wrong and try the migration again.

Then today, my ‘old’ forum prompted me to say there were critical updates needed. I applied them and things seemed to be OK. However, an hour or two later the old forum started exhibiting the same weird display problem that my migrated forum had.

So now I’m thinking that there’s something wrong after the upgrade. The migrated server would have picked up the latest code when built, so it exhibited the problem straight away. The old forum only got the updates today and started to go ‘haywire’ soon after.

So far I’ve unsuccessfully tried:

  1. ./launcher rebuild app
  2. Commenting out all the plugins in app.yml and running ./launcher rebuild app
  3. Running in safe mode with all options turned on
  4. Clearing browser cache
  5. Using different browsers (Chrome, Edge, Firefox and Opera)

The problem seems to be progressive for logged-in users. I logged in with a test account today and it didn’t exhibit the problem, but when I tried the same thing on the migrated server the test account eventually showed the same problems. Unfortunately I’m unable to try anything more at the moment because my admin account is showing the problem, so it’s unusable.

I have a backup from before the upgrade, but I suspect that’s not going to help. If I rebuild the forum it’s going to pick up the latest updates, so if I restore the content into that it’s probably going to go haywire again within a few hours.

Server setup:

Debian 12 running Docker Swarm v26.1.4. 120GB of disk space available. 64GB RAM with only about 20GB currently in use. Connections in to the server are over Cloudflare Tunnels. The old server has less available disk space and memory, but neither is maxed out.

I’m trying to think what else I can do now to try to get things back online. I’m open to suggestions!

Auto-minify perhaps?

Thanks for the suggestion.

It was turned on. I’m not sure why though because I don’t normally turn it on. I’ve turned it off (JS, CSS and HTML) but I’m still seeing the repeating pages scrolling down the screen.

PS. I note that Cloudflare auto-minify is being deprecated in early August.

Have you followed the instructions to clear the Cloudflare cache as well?

Yes. Cloudflare cache (and browser cache) has been cleared after auto-minify was turned off.

In that case, I’d recommend checking the browser developer console to see if there are any errors when the issue occurs.

You may also like to try Safe mode again now that you’ve fixed the Cloudflare issue.

Safe mode is working now. No errors in the JS console other than a timezone deprecation warning. I’m going to switch back to the default theme and disable all the theme components to see if it works OK outside safe mode, and then re-enable them one at a time to see if I can narrow it down.

I’ll be a little while before I can confirm how things are going because Dad’s Taxi has to go on a journey!

It does appear that auto-minify was the culprit. I don’t know how/when it got enabled. I didn’t do it knowingly, having been bitten by similar problems with other systems in the past. My suspicion is that Cloudflare recently (I think) added a button to activate a set of basic settings to improve the way that browsers interacted with web sites. I reviewed the suggested changes and enabled those that looked sensible and safe. I’ve gone back to look at the options again and don’t see any reference to auto-minify, but perhaps it got enabled as part of the ‘basic settings’?

The Cloudflare cache also explains why this problem suddenly appeared hours after I’d applied Discourse updates. I have Cloudflare browser TTL caching set to 4 hours. I didn’t purge the Cloudflare cache after updating, so for a few hours afterwards people were still getting the old ‘good’ files that they’d been getting for weeks since the last Discourse update. Then after 4 hours, Cloudflare started updating its own cache, saw the new Discourse files and minified them before adding them to its cache. Then as browsers requested updates for their own caches they got the corrupted files. As browsers updated their own caches at different times, each user saw things go awry at different times.
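
For anyone who wants to see why the breakage was staggered, here is a rough toy model of the explanation above. It is only a sketch under simplifying assumptions: both the Cloudflare cache and each browser cache are treated as refreshing on a fixed 4-hour cycle, and the function names and example times are made up for illustration. It is not a description of how Cloudflare actually schedules revalidation.

```python
import math

TTL_HOURS = 4.0  # the 4-hour TTL mentioned in the post


def next_refresh(cached_at: float, not_before: float, ttl: float = TTL_HOURS) -> float:
    """First refresh time (cached_at + k * ttl, k >= 1) that is at or after not_before."""
    cycles = max(1, math.ceil((not_before - cached_at) / ttl))
    return cached_at + cycles * ttl


def first_bad_time(update_at: float, edge_cached_at: float, browser_cached_at: float) -> float:
    """Hour at which a browser first receives the minified (broken) files.

    Assumption: the edge cache only picks up the new files on its first refresh
    after the update, and a browser only picks them up on its first refresh
    after the edge has done so.
    """
    edge_goes_bad = next_refresh(edge_cached_at, update_at)
    return next_refresh(browser_cached_at, edge_goes_bad)


# Hypothetical example: update applied at hour 0, edge last cached 1 hour earlier.
users = {"A": -3.5, "B": -0.5, "C": -2.0}  # when each browser last cached the files
for name, cached_at in users.items():
    print(name, first_bad_time(update_at=0.0, edge_cached_at=-1.0, browser_cached_at=cached_at))
```

In this made-up example the three users hit the problem at different hours even though the update happened once, which matches what I saw: nothing for a few hours, then accounts failing one after another.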

Lessons learned:

  1. Purge the Cloudflare cache after applying a Discourse update. Seems obvious now, but I’d not thought about it before!
  2. Don’t enable auto-minify. I must have done it but I don’t know how/when. I already knew it wasn’t a good idea having broken a WordPress site with it some time ago, but this has reinforced the message.
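
The first lesson can be scripted so it doesn't get forgotten after an update. Below is a minimal sketch using only the Python standard library and Cloudflare's zone `purge_cache` API endpoint with `purge_everything`. The zone ID and API token are placeholders you would supply yourself, the function names are my own, and the token is assumed to have cache purge permission for the zone.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_purge_request(zone_id: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a 'purge everything' request for one zone."""
    body = json.dumps({"purge_everything": True}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/zones/{zone_id}/purge_cache",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


def purge_everything(zone_id: str, api_token: str) -> bool:
    """Send the purge request and report whether the API response says it succeeded."""
    request = build_purge_request(zone_id, api_token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return bool(json.load(response).get("success"))
```

Calling `purge_everything(zone_id, api_token)` straight after `./launcher rebuild app` (or after an update from the admin UI) would cover lesson 1. Purging everything is the blunt option; it is used here because it matches what I did manually from the Cloudflare dashboard.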

Many thanks to @JammyDodger and @david for helping me to solve this :smiley:
