Memory ballooning when adding thousands of categories

(Michael John Kirk) #1

Continuing the discussion from Patterns for managing a lot of “private forums”:

We’re adding about 5k read-restricted categories and 5k corresponding groups.

I was hesitant about adding this as a bug, as it’s possible we’re using categories outside of their intended purpose. But I don’t know of any reason not to have thousands of categories.

We’re creating the categories in the background via Sidekiq while the app continues to serve web traffic. We’re seeing the app processes balloon upon receiving some traffic. They boot up at around 200MB and then grow to 800MB over the course of 5 mins/100 requests.

If we stop Sidekiq, memory stabilizes, so it seems pretty clearly related to this process. Also, it seems to be compounding with the number of categories. E.g., shortly after starting the import, we’d see them grow to 400MB after 10 mins/200 requests.

I’m really not sure what’s causing it. My shot in the dark is the categories message bus broadcast, which happens after creating each category and includes the complete list of categories. The serialized message is currently about 400k.

Pursuing that angle now - but I’m really not confident that it’s the cause. Or what to do about it if it is.
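To put rough numbers on that suspicion: if the full list is rebroadcast after every category creation, the total published grows roughly quadratically with the number of categories. This is a back-of-the-envelope sketch only. The 80 bytes per category is an assumption (a 400k message spread over roughly 5k categories); I haven't measured the real per-category size, and the method names are made up for illustration.

```ruby
# Back-of-the-envelope estimate, not a measurement.
# BYTES_PER_CATEGORY is an assumption: 400k / ~5,000 categories.
BYTES_PER_CATEGORY = 80

# Size of a single broadcast when `count` categories exist.
def broadcast_bytes(count)
  count * BYTES_PER_CATEGORY
end

# Total bytes published if the full list is rebroadcast after each of
# `total` category creations: 80 * (1 + 2 + ... + total).
def cumulative_broadcast_bytes(total)
  BYTES_PER_CATEGORY * total * (total + 1) / 2
end

puts broadcast_bytes(5_000)            # size of the last message
puts cumulative_broadcast_bytes(5_000) # total across the whole import
```

Under those assumptions the last message is 400,000 bytes and the whole import publishes about 1 GB in total, before counting how many clients receive each message.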

Happy Friday night!

(Michael John Kirk) #2

It does indeed seem like the memory issue is with re-broadcasting all the categories to every user whenever there’s a change to any category.

I don’t know enough about MessageBus yet, but maybe we can publish categories per user so they fetch only the ones they can view.

Another bottleneck, which wasn’t written to handle thousands of groups, is the CategorySerializer (Group.all - [a, b, c]).
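One possible way around the serializer cost, sketched in plain Ruby: only build the "all groups minus this category’s groups" list when a single category is being edited, and skip it when serializing the full list. Everything here is a hypothetical stand-in (the struct, the method name, the `include_available_groups` flag); it is not how Discourse’s CategorySerializer is actually written.

```ruby
# Hypothetical stand-in for a category record; not Discourse's model.
SerializableCategory = Struct.new(:name, :group_names)

# Serializes one category. The expensive "every group minus mine" list
# is only built when the caller explicitly asks for it (e.g. for an
# edit screen), so list views never pay that cost per category.
def serialize_category(category, all_group_names, include_available_groups: false)
  data = { name: category.name, group_names: category.group_names }
  if include_available_groups
    data[:available_groups] = all_group_names - category.group_names
  end
  data
end

all_group_names = %w[staff wingnutz acme_gym]
category = SerializableCategory.new("Wingnutz", %w[wingnutz])

list_item = serialize_category(category, all_group_names)
edit_view = serialize_category(category, all_group_names, include_available_groups: true)

p list_item
p edit_view
```

With 5k categories and 5k groups, the point is to do the subtraction once for the category being edited instead of once per category in the list.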

(Jeff Atwood) #3

I’m going to go out on a limb here and say that thousands of categories is a bad idea, as in “it hurts when I do this crazy painful thing.”

I cannot even think of any forum I have ever seen with thousands of top level categories. Please share any real live forums you can point me to if there are examples of this in the wild.

(Michael John Kirk) #4

I agree that thousands of categories hurts.

What I’m coming up against is a problem with Discourse’s tight coupling of topics of conversation (categories) with access control.

It’s not that people want to talk about thousands of different categories, it’s that small personal (private!) communities are really important in my forum. Categories are just the only way you can currently do this.

I don’t think having 1000’s of different sub-communities is ridiculous (think private subreddits, game clans).

For now, we’ve got a crappy workaround in place which only publishes public categories (of which we have 8) on the message bus, but I’d like to think on something more scalable for our community.
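The workaround amounts to something like the following. This is a simplified sketch rather than our actual patch: the struct and method name are hypothetical, and the step that hands the payload to the message bus is left out.

```ruby
# Hypothetical stand-in for a category record; not Discourse's model.
CategoryRecord = Struct.new(:name, :read_restricted)

# Keep only categories anyone may see, so the broadcast payload stays
# small no matter how many private categories exist.
def public_category_payload(categories)
  categories.reject(&:read_restricted).map { |c| { name: c.name } }
end

categories = [
  CategoryRecord.new("Recipes", false),
  CategoryRecord.new("Wingnutz", true),
  CategoryRecord.new("ACME Gym", true)
]

payload = public_category_payload(categories)
p payload
```

The obvious downside is that changes to private categories are no longer pushed to clients at all, which is why it’s only a stopgap.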

(Sam Saffron) #5

That is definitely way beyond what I would have expected we would hit at the moment. I am grappling with the exact use case you have; can you explain a little bit more about your site, your use case, etc.?

As to the general scalability fixes, I think it is plenty sane to cut down on the broadcasting, though I would be quite worried about a system that has the category list in constant flux. It seems like you are trying to use a feature for something it is not.

(Michael John Kirk) #6

I definitely agree with you - but it’s the best way I could find to achieve what I need. So I’m making do for now, with my eye on a better solution for myself and anyone interested in making it easier to build intimate communities within Discourse.

My specific use case is based around a health and fitness game I work on. People follow certain exercise regimes and pretty strict dietary rules for 8 weeks and score themselves daily based on their ability to follow these rules.

A lot of folks in the game know each other in real life. Sometimes one person will hear about the game and will invite a few of their friends to join. Other times larger groups of people sign up together (e.g. an entire gym signs up its 100 members). Our communities range in size anywhere from 5 (the group of friends who signed up together) to 150 (an entire gym). In total we have about 15k users and 5k communities. The median community size is about 20 members.

The game requires commitment and ongoing effort. People create and join communities to support themselves and make the game more fun. We have some social tools built around these communities within our application. E.g. Rank your score within your community. Leave a comment for your community members.

In the past we had a homebuilt forum solution. It was pretty spartan, and we replaced it this fall with Discourse rather than build it out. Discourse has been received pretty well - but many folks have been requesting a private forum for their community. This was a feature we gave up when we moved away from our homebuilt forum.

Our public forum is good for topics like

  1. Is it normal to have a headache after cutting sugar out of my diet?
  2. Awesome eggplant recipe

But it’s not good at topics like

  1. I’m having a rough day
  2. Who wants to come over for dinner tonight?

Since we launched the private forums a few hours ago, there have been 7 posts in the private community categories and as many in the public (It’s the middle of the night where most of our users live).

edit: I think this is more usefully classified as a feature than a bug.

(Sam Saffron) #7

Very interesting use case. I think I understand now.

So, in an ideal unicorn world we would define 1000s of groups, then have a special flag on a topic (perhaps an archetype) that is similar to a private message except that it shows up in the list of topics.

I feel that you still may want to categorize your stuff in a sane form, so the number of categories would be small and reusable for these private conversations.

Just brainstorming here, perhaps if there was a checkbox when creating topics that a user ticked:

[ ] make this topic private to my friends

Clearly this kind of feature is not something we would ship with core, but we can add the hooks you need to work it in a plugin.

Technically this is fairly complicated, but the category solution really feels like the wrong solution because it is clearly against its design. Why does a mod need to have a drop-down with 5000 categories and a completely bust categories page?

(Jeff Atwood) #8

Private in what sense? Private like

Joe on Eagle street is having a house party this Thursday, please drop by!

I think any truly private sub-area should have a strong sense of locality, to the point that every topic is (almost) tied to a calendar and a map.

If an entire gym signs up to use your app, they have more in common with using your app than they do with anything else. That implies the forum should be mostly public, tied around interest in this shared app and game, and rarely if ever private. If they just want to chat about the gym, they should use the Bill’s Gym Facebook page or other official outlet, shouldn’t they?

(Michael John Kirk) #9

Yes, this could be one use of privacy in Discourse. But it’s only part of why I think this is important, and why I’m discussing it here at length. I suspect that you Discourse guys value social dynamics, and I think community dynamics is an important subset of that, so I’m trying to be deliberate about the value proposition of small group communication within a greater forum.

That’s possible. I’m not currently interested in “Facebook events” as a feature in Discourse. Maybe I would use it, but it doesn’t capture the intent of what I’m getting at.

I think you’re missing the picture here - I’ll try to explain better.

Our users don’t want the private groups to talk “generally about the gym”, though they could. The private-per-gym-category is not a “category” in the ontological sense. It’s an abuse of the term based on my technical limitations. In fact few conversations since launching the private forums have had much to do with the gym at all. The private forum per gym just functions as an arbitrary smaller group which fosters intimate communication (aka a community).

This is Dunbar’s number in action (see also communication in small groups). It’s kind of an interesting reflection of the whole point of a gym anyway. The naive utilitarian might hypothesize that a gym exists as a way to share the capital investment in expensive workout equipment.

For the most part you could do a totally adequate workout routine at home, without special equipment. The real purpose of a gym is leveraging a social/communal experience to improve your health. You have an excuse to interact with people, people who you know to keep you accountable.

Looking at the most recent posts, 25 are public and 19 are private. People seem to be using both. I’m going to continue trying to figure out what causes people to use one vs. the other.

It is interesting that a lot of the private posts are re-hashes of similar public posts (e.g. “Nutrition tips for ACME gym” vs. “Nutrition tips”). The nutrition tips are not specific to ACME gym - they’re not building a fine grained ontology. It’s just spawning a general discussion on Nutrition within a social group whose magnitude you can conceptualize, rather than the faceless harsh/inane forum monster.

(Bill Ayakatubby) #10

So, if I understand correctly, you’re basically re-creating the multi-recipient PM feature so that the messages appear on the topic list? If that’s the case, it sounds like you’re actually asking for two distinct features:

  1. A user preference that lists PMs on the Latest page. (I don’t agree with this one. PMs are private and should be listed apart from topics which are generally publicly available.)

  2. User-defined friend/PM groups. (“Steve generally sends PMs to the same 5 people. He wants to create a group that contains them to make the process of composing a PM a bit faster.”)

(Michael John Kirk) #11

Thanks for the thoughts @BhaelOchon. A “private message” by some abstract definition might be the right idea - but the current interface for private messages would be a step away from the user experience I’m looking for.

Discourse does allow you to private message a group, but groups aren’t easily discoverable. Also, I’m not convinced that inserting private topics into the front page is a bad thing. That’s how private categories work now.

Maybe this is a good question - how are other people using private categories? Apart from this private-category-per-group craziness, we also have one private group for internal communication within the company. I imagine this second case makes up the majority use case for private categories.

One (I think) nice thing we’ve done with the private-categories-as-subforums is to display your private categories in the header navigation. Discourse makes it easy to link to a category in the topnav, but making that dynamic required a (not-too-horrible™) hack to core. I’m going to propose a hook for this (per-user-navigation PR hopefully forthcoming!).

You can easily see that your group “Wingnutz” has an unread post. As a forum user, this is more relevant to you than the fact that there is a new message in the public forum. I think psychologically it falls somewhere between a personal private message and a public message in terms of urgency, commensurate with the size of the message’s audience.

The other nice thing about (mis?)using categories in this way is a discoverable way to message your community.

“+ Create Wingnutz Topic” is reasonably clear.

(Jeff Atwood) #12

I agree with @BhaelOchon.

I feel like you are in permanent fork territory and you should really think about the future of what you are doing. If you are OK with being forked forever, go for it.

(Sam Saffron) #13

Personally I think PMs are a much better fit for your problem; the main issue here is about exposing the functionality better and possibly yanking it into the topic list and/or the top level nav.

Would prefer polishing / extending the PM functionality using hooks etc. rather than adding a gazillion categories. It feels like a way better fit.

(Michael John Kirk) #14

I appreciate your concern @codinghorror. Believe me, I’m not in the least bit excited about maintaining a bastardized Discourse. I am hopeful that my use case can be incorporated into core from a feature perspective (fostering small group communication), if not necessarily from a technical one (using categories), and am willing to do some legwork to make it palatable.

Thanks for constructively suggesting a possible way forward @sam.

(Dan Haecker) #15

Hi @mkirk, where are you at with this, six months on?

I am considering Discourse but have a similar use case. Specifically, we have tens of thousands of users who are part of one (or more) shared cause(s) but in our app they are organized by “community” (political boundaries) such as city, county, state legislative district, state, and country. These segments are all critically important to the shared cause and our existing app is built around leadership levels, messaging, and activities respecting all of these levels.

Looking at using Discourse, we’d need to be able to add read restricted categories for each user to coincide with their communities. So rather than a new user seeing tens of thousands of places to have discussions - as there are tens of thousands of users in tens of thousands of communities - they would see just their six communities, for example, 1) National: USA, 2) State: Texas, 3) House District: Texas 01, 4) Senate District: Texas 13, 5) County: Tarrant, 6) City: Fort Worth. Structured this way, it would be clean to a user and allow discussions where they need to be had with only those they need to have them with.

We’ve got a homegrown solution for this which is just posts with community tags and community “filters” that correspond to the user’s political boundaries. It’s not nearly as robust as Discourse for sure.

@mkirk, is this somewhat similar to your use case? Anyone else have ideas as to how best to implement this in Discourse? Thanks!

(Adam Capriola) #16

What if when you send a private message to multiple people, Discourse asked if you would like to save that as a “group”? (And then I guess Discourse would kind of need to keep track of “contacts” for each user, to reference the group again and edit it if need be.)

Or do something similar to how Gmail will ask if you want to add other participants to a message (based on who you’ve included before)?

Something along those lines might be a good solution.

(Rikki Tooley) #17

I think a solution here would be to make a profile page for groups… then all private messages you receive that are addressed to that group are listed there. Same expectation on privacy but a lot easier to manage.

Then @mkirk could just link to the user’s “app group” in the navigation bar.

There would be some issues around what a user gets to see if they join the group after a message thread is present, but those issues are already there… maybe add a setting to message threads to “allow anyone in this thread to see all messages” rather than just the ones they receive while they are present.

edit: just realised that groups already have a page, my suggestion would just boil down to having a ‘messages’ tab like user profile pages.

(Kane York) #18

The current situation is that after someone is added to a group private message, they can see all the posts in the entire PM.

Private messages to a group, or other means to handle private support requests

(Rikki Tooley) #19

Oh ok, so if you add someone to a group they can see previous message threads to a group? If so, then all that’s needed is the message listing on the group page.

(Kane York) #20

No… the current implementation of PMing a group is that the group members are expanded at creation time.

If you manually add them to the PM, though, they will be able to see all of the messages.
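A toy model of that distinction, in case it helps. This is a sketch of the behaviour as described above, not Discourse code; the class and method names are made up for illustration.

```ruby
# Hypothetical model of the behaviour described above; not Discourse code.
class GroupMessage
  attr_reader :participants, :posts

  def initialize(group_members)
    # Members are copied at creation time; later changes to the group
    # do not affect who can see this message.
    @participants = group_members.dup
    @posts = []
  end

  def add_post(text)
    @posts << text
  end

  # Manually inviting someone makes them a participant of the whole PM.
  def invite(user)
    @participants << user unless @participants.include?(user)
  end

  # Participants see every post, including ones made before they joined.
  def visible_posts(user)
    @participants.include?(user) ? @posts.dup : []
  end
end

group = %w[alice bob]
pm = GroupMessage.new(group)
pm.add_post("Dinner tonight?")

group << "carol"                      # joins the group after the PM exists
before_invite = pm.visible_posts("carol")

pm.invite("carol")                    # added to the PM by hand
after_invite = pm.visible_posts("carol")

p before_invite
p after_invite
```

So joining the group later changes nothing for existing PMs, while being invited to a specific PM exposes its full history.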