In my experience, this is not directly solved by any current approach, nor does it have a straightforward solution. In fact, separating them onto different machines is not an instant fix for the issue.
We also experience heavy drops and "the site is extremely busy, so you are seeing it as someone that isn't logged in" messages when a big event happens (such as a game, like @ljpp said), and that drags down the whole site, not only the people inside that topic.
So I tried two different things, a separated setup and a "big machine", and both have this type of issue. My instances are monitored with Prometheus and the logs are visible in Grafana, so I have very granular visibility into hardware and container performance, and I can confirm that it really doesn't matter what you do: the issue happens anyway.
If you put a big machine behind it you may delay it a little, but you will still get the errors and session drops while the machine shows almost no usage, be it disk, CPU, or RAM. And this happens with both the "default install" and the "two container" install.
With different machines the issue is the same, regardless of whether the machines are of the same type or one is "CPU-Optimized" and the other "Disk-Optimized", etc. On top of this, you add an extra possible point of failure: the connection between the two machines, which will inevitably introduce lag. The amount of lag depends on how you set up that connection and how "far away" the two machines are from each other, but you will get the same behavior either way.
As a note, this type of behavior also happens with things like the Babel plugin. However, it seems to me that the Babel plugin can handle a lot more "simultaneous" writes, even though its "chats" are actually hidden topics; the difference is in how they are presented and "refreshed"/"pulled". This difference in behavior has led me to conclude that this is an application-level problem stemming from some kind of front-end issue "crashing" the app (front end is not my area of expertise, unlike back end), combined with the load of people posting while others stay on a topic waiting for it to "self update" with tens of messages in a single minute.
To that you also have to add the human factor: when people feel the site is "sluggish" or that a topic "isn't updating as fast as it should be", they will F5 the hell out of it, adding more load. But good luck "educating" anyone in that regard.
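
To put rough numbers on how a busy topic plus impatient reloading can add up, here is a back-of-the-envelope sketch in Python. It is not a description of how the app works internally: it assumes, purely for illustration, that every new post makes each viewer of the topic fetch one update, and that every manual reload costs a fixed number of requests. The function name and all of the figures are hypothetical.

```python
def estimated_requests_per_minute(
    viewers: int,
    posts_per_minute: int,
    refreshers: int = 0,
    refreshes_per_minute: int = 0,
    requests_per_page_load: int = 1,
) -> int:
    """Rough request rate generated by one busy topic.

    Assumptions (illustrative only, not measured):
    - each new post causes every viewer to fetch one update;
    - each manual reload (F5) costs `requests_per_page_load` requests.
    """
    # Fan-out from the topic "self updating" for everyone watching it.
    live_update_fetches = viewers * posts_per_minute
    # Extra load from people hammering the reload key.
    manual_reload_requests = (
        refreshers * refreshes_per_minute * requests_per_page_load
    )
    return live_update_fetches + manual_reload_requests


if __name__ == "__main__":
    # Hypothetical event: 500 viewers, 30 posts per minute, nobody reloading.
    calm = estimated_requests_per_minute(500, 30)
    # Same event, but 100 people reload 6 times a minute at 20 requests each.
    impatient = estimated_requests_per_minute(500, 30, 100, 6, 20)
    print(calm, impatient)
```

Under those made-up numbers, the reloads alone come close to doubling the request rate, even though disk, CPU, and RAM may look idle while requests queue up in the application.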