UI randomly fails during a short period

gmoirod · July 8, 2022, 10:12am

Hi there!

I installed my own ‘stable’ Discourse with external Postgres and Redis.
To be precise about the architecture: in Azure, 1 load balancer, 1 VM hosting the Discourse container with an NFS share for backups and pictures, 1 Postgres, 1 Redis.

I customized it with my own logo and the discourse-calendar and discourse-news plugins (and other things too, but they are irrelevant here).

Randomly, for a period of about 30 minutes, parts of the UI fail:

Main logo reverts to default one
Favicon reverts to default one
Page “upcoming-events” generated by discourse-calendar disappears (no link, and a 404 response when going to it by URL)
Custom logo given to discourse-news (with a URL) disappears

Then it comes back.

I have nothing in the logs about that.
My browser console shows nothing.
One thing I can tell is that during this period, I can see an increase in Redis cache misses.

Can anybody help me troubleshoot that? I do not even know where I can find the relevant logs…
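For anyone wanting to put a number on the Redis cache misses mentioned above, here is a minimal sketch. It assumes you can capture the output of `redis-cli INFO stats` twice (once before and once during a failure window) and compares the `keyspace_hits` and `keyspace_misses` counters. The function names and the sample values are made up for illustration; only the two counter names come from Redis itself.

```python
def parse_info_stats(info_text: str) -> dict[str, int]:
    """Parse the text output of `redis-cli INFO stats` into integer counters.

    Section headers (lines starting with '#') and non-integer values are skipped.
    """
    stats: dict[str, int] = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        if value.isdigit():
            stats[key] = int(value)
    return stats


def miss_ratio(before: dict[str, int], after: dict[str, int]) -> float:
    """Fraction of key lookups that missed between two INFO snapshots."""
    hits = after["keyspace_hits"] - before["keyspace_hits"]
    misses = after["keyspace_misses"] - before["keyspace_misses"]
    total = hits + misses
    return misses / total if total > 0 else 0.0


# Illustrative snapshots (invented numbers, not taken from a real server).
SNAPSHOT_BEFORE = "# Stats\r\nkeyspace_hits:1000\r\nkeyspace_misses:100\r\n"
SNAPSHOT_DURING = "# Stats\r\nkeyspace_hits:1300\r\nkeyspace_misses:200\r\n"

ratio = miss_ratio(parse_info_stats(SNAPSHOT_BEFORE), parse_info_stats(SNAPSHOT_DURING))
print(f"miss ratio in window: {ratio:.2%}")
```

Comparing the ratio in a normal window with the ratio in a failure window would show whether the misses really spike or only look that way on a dashboard.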

JammyDodger · July 22, 2022, 8:37am

I’m afraid this is too far from the standard install for me to know anything about.

Have you managed to find what you needed?

gmoirod · July 22, 2022, 8:41am

No, I did not.
And my problem still occurs occasionally.
I do not know where to look for a clue…

pfaffman · July 22, 2022, 9:04am

The easiest thing would be to switch to a stable/standard install. It would be cheaper too. I can’t imagine what it could be.

gmoirod · July 22, 2022, 10:55am

@pfaffman From my point of view, I did use the standard installation.
Except that I used the provided feature to use an external DB and Redis.
But I use the app.yml and the Docker build and run described in the Standard install.

I did that to be able to provide high availability and different scaling strategies: with a full standalone deployment, you can only scale vertically (scale up your node) and it is not highly available.

pfaffman · July 22, 2022, 11:45am

I see. That does sound like it should work. My best guess is that you’re scaling down to zero virtual machines and what you see is the cached site in your browser. Or the load balancer is somehow not connecting to the host. Or Discourse isn’t getting the real IP and it’s rate limiting (but usually you would see an error).

But your high availability features are providing low availability. Unless you’re going from having tens of users most of the time to thousands at other times (as for a sports site), scaling is likely to cause more problems than it solves.

So the first thing I’d do is get rid of the load balancer and see if that fixes it. Then decide what to do from there. If it happens once a month it won’t be easy to diagnose.
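Since an intermittent failure like this is hard to catch by hand, one option is to poll the affected page and log when it breaks. The sketch below is only an illustration of that idea: it requests the “upcoming-events” page mentioned in the first post and records the HTTP status with a timestamp. The base URL `https://forum.example.com` is a placeholder, and the function names and the polling interval are assumptions, not anything Discourse provides.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone

BASE_URL = "https://forum.example.com"  # placeholder, replace with your forum
PATHS = ["/upcoming-events"]            # page reported as returning 404


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if the request failed entirely."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 during a failure window
    except OSError:
        return 0  # DNS failure, refused connection, timeout, ...


def probe(base_url: str, paths: list[str], fetch=fetch_status) -> dict[str, int]:
    """Fetch every path once and return {path: status}."""
    return {path: fetch(base_url + path) for path in paths}


def failing(results: dict[str, int]) -> list[str]:
    """Paths whose status is anything other than 200."""
    return [path for path, status in results.items() if status != 200]


def run(iterations: int, interval_seconds: float = 60.0) -> None:
    """Poll repeatedly and print one timestamped line per round."""
    for _ in range(iterations):
        results = probe(BASE_URL, PATHS)
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        print(stamp, results, "FAILING" if failing(results) else "ok", flush=True)
        time.sleep(interval_seconds)
```

Calling something like `run(iterations=1440)` from a machine outside the load balancer, and again from the VM itself, would give timestamps to line up with the Redis graphs and would show whether the failure depends on the path the request takes.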
