Agree 100% with this. Sometimes they add value to the community, but not to the extent that it necessitates code changes IMO.
Could you expand a bit on this performance penalty? This is for closed and archived topics? Closing topics at 10k is fine, deleting them would be something else entirely.
My community loves Discourse and has been forum-based for over 15 years. They won’t use a chat room and they would react very negatively to having old topics deleted. If there is going to be a serious and growing performance issue from these topics simply existing then I am going to need to either render them out as static pages or migrate to another platform.
I realize our community doesn’t fit well with how you envision Discourse being used, but that’s the community I am responsible for and there are some changes I can’t force them to make. We’ve actually never been stronger as a community than we are now using Discourse. I would hate to have to move to a different platform when everybody is so happy with our current setup.
Megatopics need to be mostly hidden – even if they are closed and/or archived, the more users that hit a megatopic, the worse your server will perform. Ideally, megatopics should be deleted so you only have one active at any given time; that's my recommendation. The more megatopics you have, the more risk you incur.
If you can throw a bunch of money at the problem you could massively overprovision your server and support more megatopics – but it’ll still impact median performance for all topics.
Even when a topic is closed it generates data, traffic and load.
Remember that every user's read position is recorded against every topic. Every post can be liked, and interaction with megatopics is still possible after they are closed – not to mention the amount of noise they can throw into your search results.
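To get a feel for that bookkeeping, here is a Data Explorer query sketch (assuming the standard Discourse schema, where `topic_users` stores each user's per-topic read state) that counts how many read-position rows a single topic carries:

```sql
-- [params]
-- int :topic_id = 107216

-- One row per user who has ever opened the topic; each row tracks
-- that user's last read post, notification level, and so on.
SELECT COUNT(*) AS tracked_readers
FROM topic_users
WHERE topic_id = :topic_id
```

On a busy megatopic this can be tens of thousands of rows for one topic, all of which get touched as users scroll.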
That still doesn’t explain why megatopics in particular cause issues. Why is one 10,000 post topic worse than ten 1,000 post topics? With the latter there’s the same number of total posts to be potentially liked or searched for, but ten times as many read positions and topics to be searched for. Based on your explanation alone, I would conclude that a greater number of smaller topics is worse. So there must be more to it.
Because you're loading just one topic at a time. You can pick up ten 1,000-post topics one at a time without trouble, but picking up a 10,000-post topic all at once is much harder.
I’m curious about the specifics though. Only a certain number of posts are loaded by default, before you scroll, so it clearly isn’t because of the number of visible posts. Is it due to the timeline? Due to the topic summary? Just in general due to various linear or greater-than-linear algorithms based around the total number of posts in the topic?
I don’t really care about megatopics, depending on what you consider “mega”. In the section I frequent most in the community I use, the topic with the most posts is about 3.6k posts, but the 10th highest is only 600. The 25th highest is only 300 posts. I’m more curious from a technical perspective at this point.
Here’s a data explorer query that I wrote to attempt an answer to your question. You can try it with different topics and offsets.
```sql
-- [params]
-- int :topic_id = 107216
-- int :offset = 10000

SELECT "posts"."id"
FROM "posts"
WHERE "posts"."deleted_at" IS NULL
  AND "posts"."topic_id" = :topic_id
  AND "posts"."post_type" IN (1, 2, 3)
ORDER BY "posts"."sort_order" ASC
LIMIT 20 OFFSET :offset
```
Here’s a normal sized topic and a stupid-big topic:
Normal-sized topic, example time 3.4 ms:

```
-> Index Scan using index_posts_on_topic_id_and_sort_order on posts (cost=0.43..1925.22 rows=477 width=8)
```

Stupid-big topic, example times 353.9 ms and 739.6 ms (time varies depending on database caching):

```
-> Index Scan using index_posts_on_topic_id_and_sort_order on posts (cost=0.43..605155.88 rows=161255 width=8)
```
I think that I have seen times longer than 750ms.
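Most of that cost comes from the `OFFSET`: the index scan still has to walk past all 10,000 preceding rows before returning anything. As an aside, a keyset-style variant of the same query – a sketch, not what Discourse actually runs – seeks directly past the last post seen instead, so the scan starts in the right place regardless of topic size:

```sql
-- [params]
-- int :topic_id = 107216
-- int :last_sort_order = 10000

-- Seek past the last post already seen rather than counting rows
-- with OFFSET; the index scan begins at :last_sort_order directly.
SELECT "posts"."id"
FROM "posts"
WHERE "posts"."deleted_at" IS NULL
  AND "posts"."topic_id" = :topic_id
  AND "posts"."sort_order" > :last_sort_order
  AND "posts"."post_type" IN (1, 2, 3)
ORDER BY "posts"."sort_order" ASC
LIMIT 20
```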
Here are the median and 99th-percentile times. Median time is surprisingly good, but the nature of a median is that you can't tell whether, at the 60th percentile, it's much, much worse than the median case.
Here's another server without megatopics (it has a stupid number of categories, which gives it a performance hit of its own, so it's not quite apples to apples), but you can see the median performance is half that, and the worst case is much better too:
Me too, just out of curiosity.
I understand that navigating in a megatopic could perform badly, following this explanation:
However I don’t understand how one or several megatopics can impact the navigation on other pages in the forum, even the topics list for example.
By adding a lot of load to the server. Every single megatopic load is equivalent to 100x the perf penalty! See the post directly above yours.
The forum I’m currently importing to Discourse has a lot of mega topics:
(and posts haven’t been imported all yet!)
Since mega topics don't work well with Discourse, what should I do in practice? (I intend to suggest a chat for the community in the future, maybe Discord, but I want to know what to do with the current topics.)
Split and close them?
If I split them, how many messages per topic? Is the default value of 10,000 enough, or would you advise decreasing it?
Splitting into 10k chunks should be good enough.
Additionally, most of those look OK. The real bleeding starts at 10k, and most of what is pictured in the screenshot is far less than that.
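If you want to see which topics are actually past that threshold, a Data Explorer query along these lines should work (a sketch assuming the standard schema, where `topics.posts_count` is Discourse's cached per-topic post count and `archetype = 'regular'` excludes personal messages):

```sql
-- [params]
-- int :threshold = 10000

SELECT id AS topic_id, title, posts_count
FROM topics
WHERE deleted_at IS NULL
  AND archetype = 'regular'
  AND posts_count >= :threshold
ORDER BY posts_count DESC
```

Running it with a lower `:threshold` (say 2,000) also gives you an early-warning list of topics heading toward megatopic territory.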
What is the latest on mega topic performance impact? Our COVID pandemic follow-up topic is approaching 10k posts, and we have recently been analyzing possible slowdowns.
I don’t know what server stats “should” be, but I can share ours for a community with lots of megatopics. We currently have 15 closed topics at 10k posts and well over 50 open with 2k+ posts. Most of the forum activity happens in a relatively small number of very active topics at any given time.
We currently run on a DigitalOcean server with 4 virtual CPUs, 8 GB memory and 160 GB disk, which costs $40/month. Maybe once every few months some users will very briefly get the “extreme load” message. This only happens when there is some live event happening and lots of users are all posting at once - like averaging multiple posts per minute in a single topic over the course of an hour or two.
At all other times performance is smooth sailing with no issues. We’re currently on track to needing more disk space long before we’ll need to upgrade anything else.
That’s a fine number, 2k posts is no big deal, a few dozen 10k+ topics is probably fine (especially if they are closed), the danger zone is when you have lots of megatopics which are active. I’d define “lots” as more than a few dozen.
While mega topics are generally not an issue here at Meta, they seem to be a natural way of organizing discussions for many communities. In other words, the discussion has no natural break point.
Inderes is a Finnish company that provides financial analysis for the stock market. They recently launched their Discourse community and it has been a massive success, considering the region and niche.
The discussion is primarily organized per stock or investment vehicle – $AAPL or $TSLA, as examples. Now, in just two years or so, many of these topics are approaching the 10k mark. A fantastic proof of concept for Discourse (a buzzing community, built from scratch), but it also emphasizes the issue of mega topics.
If nothing else, you can break it up by year. This is covered in the first post. Megatopics can work for a while, but if there are too many of them, they are going to eventually make your site fall over.
(Also, search becomes a nightmare when you have tens of thousands of posts in the same “topic”, etc – basically you’ve built a chat room.)
It’d be great if, in addition to automatically creating a new topic once 10k posts have been reached, site admins could also set up a time interval at which megatopics are automatically archived and continued in a new topic, e.g. at the end of each month, quarter, half-year, or year.
We need to crosslink this topic
So, the earlier requests…
were satisfied by
I’m sorry I didn’t close the loop here, but now I have!