Real-time updating of topics freezes under high activity

Please clarify - are you experiencing freezes (topic not updated for new posts) or are you getting extreme load error messages?

There are tweaks in this thread that provide some improvements for the freezing, but they also increase the system load, so you are more likely to get extreme load scenarios.

3 Likes

We sometimes experience topic freezing in the situations I’ve reported, but when that happens the system also shows extreme load warnings, so I can’t tell which is which.

We don’t mind extreme load as long as it doesn’t kick people from topics or interrupt update for new posts. We would actually prefer in that case to have it slowly loading stuff (the wheel could spin for 15 seconds for each user to read/post and we would prefer that to freezing or user being kicked out).

4 Likes

I have to agree. The extreme load UX is confusing for the end user.

  • How many concurrent users do you have?
  • What kind of hardware?
  • Link to your forum stats?

@sam

As we are now on the CDCK SaaS platform, I can only observe this from the UX point of view.

We have had some good heat in the games during the last couple of weeks. The “freezes” have pretty much disappeared with the platform change, but there is still some fluctuation in the way the topic gets updated, which may be confusing to some. But the audience has mostly (90%) stopped complaining and is focusing on the games, which is a good sign.

There is, however, a scenario which I can reproduce with fairly high (again 90%) confidence. The platform has occasional issues resuming the session when the game topic is in a background tab (Android) or under a locked screen. When I get back to the busy topic, usually due to an interesting event in the game, the topic view is sometimes not updated. I can see user avatars blinking at the bottom of the topic, but no posts are appearing. One needs to refresh the browser to fully recover.

The repro pattern is not the easiest, as you need:

  • A busy topic
  • Some good action in the game → more heat to the topic
  • Keep the topic under a locked screen or in a background browser tab.

3 Likes

We suffer from that too.

Another thing is, when jumping to the first unread post, it can repeat this behaviour a few times (going to the same “unread post” a few times, although the first unread post position should have changed in each occasion).

To exemplify:

  1. I jump to the first unread post
  2. scroll and read the 100 unread posts
  3. then go to another topic or homepage…
  4. after a minute or so, there are like 30 new unread posts, but when I click on the icon, I’m thrown once again to the position from step 1 (meaning 130 posts backwards and not just the 30 new unread posts).

But, once more, it only happens in very very busy topics during some minutes at the greatest peak of refresh and posting by every user all in the same topic at the same time. Kind of annoying but not a dealbreaker so far.

1 Like

I would consider that a success.

Can you provide a repro here on meta? Probably not since it requires a large number of active users idling in the same topic at the same time?

My current thinking is we should build a live chat feature and instantiate it just-in-time, when you have…

  • lots of users

  • in the same topic

  • at the same time

  • then, and only then, instantiate a live chat box overlay and strongly push users into using that instead of replies, maybe even disable the ability to reply to the topic with

    :loudspeaker: Hey, it looks like what you really wanted was a chatroom… here it is, have fun! :speech_balloon:

15 Likes

Yeah, I know what you mean, but it’s so limited to those occasions that I guess it’s not worth the effort. We usually have matches like that once or twice a week, and it’s mostly in the 5-minute period right after the match finishes. But I’ve actually thought about it several times (that it would be nice to have a temporary chatroom function, or a way to switch to one, for the 90-minute period of a football match). :laughing:

Still, I’ll try to repro one of these days by recording the screen for a while.

1 Like

Our instance has been showing some 429s now that the playoff games have started. @staff should be able to see some in the last 3.5 hours of our logs, and more are expected when the deciding goal is scored (the game is going to a second OT as I type this).

In any case, if you are still logging and tracing this, there are not many opportunities left, as the finals and the following off-season are getting close.

2 Likes

I just wanted to add my name to the thread here so I can follow this. We are a new gymnastics forum. We experienced the above along with “freezing” last night during US Olympic Trials. Here is the thread…

We had 4 unicorns last night.

I resized the server to 4 Intel vCPUs & 8 GB memory at Digital Ocean and did…

unicorn_workers: 8
db_shared_buffers: "2GB"
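
As a rough cross-check, the two values above line up with a sizing rule of thumb often quoted for self-hosted installs: about two Unicorn workers per CPU core (capped at 8) and about a quarter of RAM for PostgreSQL shared buffers. The sketch below only encodes that heuristic. The function name `suggest_settings` and the cap are assumptions for illustration, not an official formula.

```python
def suggest_settings(ram_gb: int, cpu_cores: int, max_workers: int = 8) -> dict:
    """Heuristic sizing (an assumption, not an official Discourse formula)."""
    if ram_gb <= 0 or cpu_cores <= 0:
        raise ValueError("ram_gb and cpu_cores must be positive")

    # Assumed rule of thumb: two web workers per core, capped at max_workers.
    workers = min(2 * cpu_cores, max_workers)

    # Assumed rule of thumb: give PostgreSQL about 25% of total memory.
    shared_mb = ram_gb * 1024 // 4
    if shared_mb % 1024 == 0:
        buffers = f"{shared_mb // 1024}GB"
    else:
        buffers = f"{shared_mb}MB"

    return {"unicorn_workers": workers, "db_shared_buffers": buffers}


if __name__ == "__main__":
    # The 4 vCPU / 8 GB droplet described in the post.
    print(suggest_settings(ram_gb=8, cpu_cores=4))
```

For the 4 vCPU / 8 GB droplet this yields 8 workers and 2GB, the same values as in the post. Treat it as a starting point and watch memory use under real load.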

We are expecting much higher traffic during the Olympics. What else can we do to optimize the server for “chat like” traffic during the competition?

3 Likes

If you have hundreds of users in a single topic using Discourse as a chat and it’s a limited time event, I’d suggest bumping the server temporarily a bit more.

The larger Premium AMD droplet at Digital Ocean costs $54.85 for the 16 days of the Olympics, and should be more than enough for a community of your size.

7 Likes

I do not have these lines in my app.yml. Do I just add them?

Yes. Add them in the env section.

1 Like

If this is still on the staff’s radar, our blast off is tonight at 18:30 (UTC+3) and again tomorrow at the same time.

There is much anticipation after two COVID-ruined seasons, so I am expecting heavy traffic spikes at tappara.co

1 Like

@ljpp
What is your current situation? Did Redis 6 help you?

We are now on CDCK SaaS, which is why I gave the staff a heads-up. We are kind of a test bench for this matter.

3 Likes

The 1st game weekend went alright from a technical point of view (less so if we look at how the team played).

One phenomenon is still reproducible. If you multitask with your phone, switch to other apps during the game chat and then come back, you sometimes end up in a situation where the topic is not updated. Status avatars do blink, indicating there is a constant flow of posts.

A browser refresh or a visit via the index page is needed to see the missed posts.

All this using an up-to-date Android device with Chrome.
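
One way a client could recover from this without a full refresh is to compare the last post it rendered with the topic's current highest post number when the tab becomes visible again, and then fetch only the gap. The sketch below shows just that gap calculation. It is not how Discourse implements session resuming, and `missing_post_numbers` and its parameters are hypothetical names.

```python
from typing import List


def missing_post_numbers(last_rendered: int, highest_on_server: int) -> List[int]:
    """Return the post numbers a resumed client has not rendered yet.

    Hypothetical helper: assumes post numbers in a topic increase by one
    and that the client remembers the last post number it displayed.
    """
    if last_rendered < 0 or highest_on_server < 0:
        raise ValueError("post numbers must be non-negative")
    # An empty list means the view is already up to date.
    return list(range(last_rendered + 1, highest_on_server + 1))


if __name__ == "__main__":
    # The tab was backgrounded at post 100; the topic has since reached 103.
    print(missing_post_numbers(100, 103))
```

A non-empty result while the avatars are blinking would be the signal that the view has fallen behind and needs to reload the missing range.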

2 Likes

Thanks for helping us figure these incidents out – it’s a process.

2 Likes

We had a derby game last evening, so it was a great opportunity to monitor this:

  • One senior user logged the whole game with Chrome Dev Tools network inspector, and no error statuses were reported on the client :+1:
  • I ran a quick poll during the game and got a few dozen responses, which were 94% positive (no issues). This is a very good result, and I am not sure if a clean 100% is even possible with this survey method (random users, random HW, random network conditions). So I consider anything above 90% a success. :+1:

To sum it up, on the CDCK SaaS infra this is now a marginal issue and the features generally work well even under a heavy/spiky load. Give me a nod if you wish to deep dive into this, and I’ll provide a potential date and time. After all you never know what happens in a game, and how the audience reacts to the events.

COVID restrictions have eased up, so currently the arenas can be sold to full capacity. This has an impact on our audience, as the HC fans will be on-site at the games rather than furiously chatting online.

2 Likes

We merged a performance patch that will affect sites that are seeing lots of activity, specifically when many users are in a single topic and posts receive likes. During those events one of the possible outcomes was this exact “freezing” behavior.

On a single site we host, this change resulted in 12M fewer requests to the posts endpoint per event day. Those requests had been competing with the live updates of newer replies.
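
To put the 12M figure in perspective, a back-of-the-envelope conversion to requests per second is sketched below. It assumes the saved requests are spread evenly over the chosen window, which real match-day traffic is not, and the 3-hour window is a purely hypothetical example rather than a measured value.

```python
def average_rate(requests: int, window_hours: float) -> float:
    """Average requests per second if `requests` are spread evenly over the window."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return requests / (window_hours * 3600)


SAVED_REQUESTS = 12_000_000  # figure quoted in the post above

if __name__ == "__main__":
    # Spread over a full 24-hour event day.
    print(f"{average_rate(SAVED_REQUESTS, 24):.0f} req/s")
    # Hypothetical: the same volume concentrated in a 3-hour game window.
    print(f"{average_rate(SAVED_REQUESTS, 3):.0f} req/s")
```

Even the evenly spread case works out to roughly 139 requests per second that no longer contend with live updates, and the load during an actual game would be more concentrated than that.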

4 Likes

Noted. We have a back-to-back derby on Fri-Sat, so there will be a buzz. That said, 13,000 people will be at the grand opening of our new arena, which may have an impact on our load, positive or negative.

Will monitor, short & long term :+1:

4 Likes