Mini-update: I tried tweaking the Unicorn workers setting, but I don’t really know whether there’s a one-core-per-worker relation or whether it’s more of a shared-CPU arrangement (CPU % × seconds). Is there a best practice for this?
Also, I’ve been reading up on PostgreSQL to figure out how to get that size down, but even though I’m backed up, I don’t know how the app will react in the long run if I just go around snipping things out.
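For anyone in the same spot, a sketch of the kind of query (via Data Explorer or psql) that shows which tables are actually eating the space; pg_total_relation_size counts indexes too:
-- ten largest tables, indexes included
SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10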
Quick edit: I removed the swap file and am now working out how to proceed with the 10K+ topics. Taking the opportunity to ask: how much of a danger are they to the bigger picture? (I assume some, given the recommendation, but if there’s anything else behind it I’d like to learn it if possible.)
I strongly advise you not to override the Discourse default, which auto-closes topics once they pass 10k replies. There’s a reason we have this setting on by default.
Beyond that, the problem of the massive post_timings table is on our radar and we are currently brainstorming ways to deal with it, perhaps even in the current 2.5 release. cc @sam @eviltrout
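To get a feel for how lopsided that table is on your own instance, a Data Explorer sketch along these lines (assuming the standard post_timings columns) shows which topics dominate it:
-- which topics account for the most post_timings rows
SELECT topic_id, COUNT(*) AS timing_rows
FROM post_timings
GROUP BY topic_id
ORDER BY COUNT(*) DESC
LIMIT 10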
We have around 80 megatopics active around the clock (128k posts on the largest one).
We see the occasional 502 (maybe related to this, I can’t tell), but otherwise everything has run smoothly since we adjusted the db_shared_buffers parameter to something above the database size.
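If anyone wants to check the same numbers on their own instance, a quick sanity check via psql on the server:
-- what Postgres is actually running with
SHOW shared_buffers;
-- the figure to compare it against
SELECT pg_size_pretty(pg_database_size(current_database()));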
I’ve got a site with a handful of >100K post mega topics. I told them that it was a mistake, but they wanted them anyway. They are complaining about performance issues. I’m hoping that I’ll soon be able to change the value back to the 10K default.
It did cause me to learn some stuff about database optimization, so that’s a bonus.
I know what you mean, but I have to admit it can be awkward to split a topic by year, especially because a lot of our conversation revolves around people.
Imagine a topic about Trump in general, or one about the United States in general. On our forum we have a topic about one of our club’s ex-presidents, one for the current president, and one for each football team player. You get the idea. It’s easy to split other topics by season, but not these. Sure, it’s not impossible, but it’s quite impractical.
Imagine, instead, a category about Trump in general or the United States in general.
No one is going to read >10000 posts in a topic straight through. They just aren’t. At some point, you can just take a breath and start anew. I suspect that these topics are more like chat than discussion anyway, and that few people read what happened yesterday, much less last week, last month, or last year.
Here is a Data Explorer query that mimics the query used to fetch a page of posts:
-- [params]
-- int :topic_id = 107216
-- int :offset = 10000
SELECT "posts"."id" FROM "posts"
WHERE ("posts"."deleted_at" IS NULL)
AND "posts"."topic_id" = :topic_id
AND "posts"."post_type" IN (1,2,3) ORDER BY "posts"."sort_order" ASC LIMIT 20
OFFSET :offset
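Note that the cost of this query grows with :offset: even with an index on sort_order, Postgres still has to walk past the first 10,000 entries before it can return the 20 requested, which is a big part of why deep pages in megatopics stay expensive.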
I believe you guys; it’s just that our users are used to things this way and it’ll take a bit of effort to get there… but it’s doable.
Related to this, I was just reading @codinghorror’s post Natural breakpoints or "chapters" for long topics? and thought that this kind of table of contents, but across topics, would be a nice solution: a way to gather and keep visible a sequence of topics while browsing and replying in only one of them.
We do have concrete near-term plans to tame the post_timings table, but megatopics will be a source of considerable ongoing pain for many, many years.
And remember, that’s after @tgxworld did great work about a year ago to reduce the overall amount of work we do on megatopics, essentially switching to a “low power mode” on these topics so they don’t hurt as much.
But don’t kid yourself: megatopics still hit HARD.
Following your recommendations, and while I figure out the other questions (acknowledging that you’re working on the post_timings table), I’m restoring the defaults on topic size (and apologizing for my stupidity while I’m at it).
With this in mind, I have some questions that I hope you could answer:
Once the setting is restored, those topics will be closed and new ones will be created. But is the mere existence of those older big topics dangerous to the site as a whole? That is, should I split them into smaller topics, or is it OK to leave them as long as they’re not active?
I saw the point @Paracelsus raised about convenience for users. With that in mind, would it be possible to have a setting to auto-continue topics that are closed at the max topic size? That is, when a megatopic is automatically closed, could a new one be opened with the same author and the same title (perhaps with a number appended), its only content being a link to the previous one, while a link to the new one is added to the closed one?
I know it’s kind of convoluted and maybe not wanted by everyone (that’s why I’m raising it as an optional setting), but for high-traffic sites with long-lived relevant topics it could be useful, I guess. wdyt?
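For the first question, a Data Explorer sketch along these lines (assuming the standard topics.posts_count column) would enumerate the topics still above the default limit:
-- topics still above the 10k default
SELECT id, title, posts_count
FROM topics
WHERE deleted_at IS NULL
  AND posts_count > 10000
ORDER BY posts_count DESC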
This is like asking “should we encourage people to smoke cigarettes by auto-buying them a new pack when they run out?”
Not unless they get a lot of traffic, which they shouldn’t if they’re closed, I would think? See how it goes, but stopping the bleeding from the active megatopics is step zero, so you’re on a good path here.
Just a follow-up… We’re now trying to deal with these megatopics by splitting them into chunks of 10k posts each. However, the system doesn’t seem willing to cooperate with our strategy:
I get a “502 Bad Gateway” when selecting 8000 or more posts at a time and moving them to a new topic (I haven’t tried with smaller batches). Is there a way to increase the capacity, or another/better way to accomplish this? @codinghorror @pfaffman
But… I think you might have to put this in a loop to create the new post numbers. You’ll want to test your solution on a staging site; if you screw something up, you’ll be in a world of hurt.
If that’s not enough for you to figure out how to do it, then you’ll probably need to post in #marketplace. But I bet no one is going to read the old posts anyway and that you should just close them and leave them huge, perhaps even deleting or unlisting them unless you think they have some SEO value.
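If you do try the split-in-batches route, a Data Explorer sketch along these lines (with :last_moved_post_number as a hypothetical high-water mark you update after each move) would pull the next batch of ids to feed into each move, whether you do the move from the post-selection UI or the console:
-- [params]
-- int :topic_id = 107216
-- int :last_moved_post_number = 0
SELECT id
FROM posts
WHERE topic_id = :topic_id
  AND deleted_at IS NULL
  AND post_number > :last_moved_post_number
ORDER BY post_number ASC
LIMIT 1000
Smaller batches keep each move short enough that the gateway is less likely to time out before it finishes.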
Since we reported back and agreed on closing everything above the mark (and re-establishing the limits), performance seems better, a lot better. We still get the occasional “Due to extreme load, this is temporarily being shown to everyone as a logged out user would see it”, but considerably less often.
With that said:
Thanks for the hard work looking into how to reduce that monster table, really appreciate it.
Are there any other performance tips, or even setups, that we may be missing? Even if they’re “advanced”, any help is appreciated.
Again, a huge thanks to everyone for helping and giving feedback.