Server scaling and load created by the plugin

(Stephen) #1

We recently included Discourse in our APM, having noticed that performance was beginning to drag a little.

Our existing site is around 15/16 months old and sits on a 2GB/2core DO box (with swap). The instance isn’t massive; we have around 15k threads, or thereabouts.

This seems to be the problem: [screenshot not preserved]

I was tempted to sling it onto a larger host, but after a bit more digging I noticed that the wp-discourse plugin on our front-end site, which is quite large and reasonably high-traffic, is by far the biggest consumer, making anywhere between 1k and 3k calls per minute. We don’t have debugging enabled, so Discourse shouldn’t be hammered by every post view, but judging from the native performance reports in Discourse, the wordpress route is by far the most heavily trafficked, by several orders of magnitude.

Is there any way to throttle this behavior? I can throw more resources at our Discourse instance, or attempt to move it into AWS so it’s adjacent to our front end servers, but at the same time would like to understand the cause before I begin treating symptoms.

(Sam Saffron) #2

I thought we had some caching in place

How many topics total do you have on WordPress? Is that 15k?

Simplest workaround would be to hack that file to only sync hourly or something.

(Stephen) #3

It’s probably around 12k, we use Discourse for our comments on every article.

I thought I should flag this because it’s going to be a real resource drain for any site with significant article volume that uses the plugin. Discourse outstrips every other plugin on our site past and present by more than 3x.

(Sam Saffron) #4

The whole strategy the plugin uses to synchronise comments needs to be optimised. There are 2 fixes that could be applied.

  1. Simple fix: make this call to topic/wordpress.json SUPER cheap if nothing changed.

  2. More complex fix: instead of calling topic/wordpress.json, ask once every N minutes for whatever changed and update the post metadata with that info.
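
The second fix above can be sketched roughly as follows. This is a minimal Python sketch, not the plugin's PHP and not an existing Discourse API: `fetch_changed` is a hypothetical callable standing in for a single request that returns comment counts for topics changed since a given timestamp, and `BatchCommentSync` is an illustrative name. The point is only that page views read from a local cache, while at most one remote call is made per interval regardless of traffic.

```python
import time
from typing import Callable, Dict, Optional


class BatchCommentSync:
    """Refresh cached comment counts with one batched request per interval."""

    def __init__(
        self,
        fetch_changed: Callable[[float], Dict[int, int]],  # hypothetical: since -> {topic_id: count}
        interval_seconds: float = 600.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.fetch_changed = fetch_changed
        self.interval_seconds = interval_seconds
        self.clock = clock
        self.last_sync: Optional[float] = None
        self.comment_counts: Dict[int, int] = {}
        self.remote_calls = 0

    def maybe_sync(self) -> bool:
        """Call the remote side only if the interval has elapsed."""
        now = self.clock()
        if self.last_sync is not None and now - self.last_sync < self.interval_seconds:
            return False
        since = 0.0 if self.last_sync is None else self.last_sync
        changed = self.fetch_changed(since)  # one request covers every changed topic
        self.remote_calls += 1
        self.comment_counts.update(changed)
        self.last_sync = now
        return True

    def comment_count(self, topic_id: int) -> int:
        """What a page view would call: served from the local cache."""
        self.maybe_sync()
        return self.comment_counts.get(topic_id, 0)
```

With this shape, the number of remote calls depends on the interval rather than on the number of posts or page views, which is the property the per-post polling lacks.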

(Stephen) #5

Is anyone actively working on the plugin though? It seems to have gone very quiet over the last 9 months.

We’ve experienced a number of issues, some fairly critical, such as missing comment counts, but haven’t had much success moving them forward.

(Sam Saffron) #6

Nope, only reactive work, we don’t have any customers urging us to improve it and lots of other stuff that customers are pushing us to build.

(Stephen) #7

I will experiment with tweaking the interval in the PHP. I guess the most obvious fix would be to poll only posts from the past day or week every 10 minutes, and make the rest an hour or more.
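
The age-based idea in this post could look something like the sketch below. It is written in Python rather than the plugin's PHP, and the function names and the one-week cutoff are assumptions chosen for illustration (the post says "day or week"). The staleness test mirrors the shape of the plugin's `$last_sync + interval < $time` comparison quoted later in the thread.

```python
DAY = 24 * 60 * 60  # seconds


def sync_interval(post_age_seconds: float, recent_cutoff: float = 7 * DAY) -> int:
    """Return the polling interval in seconds for a post of the given age.

    Assumed policy: posts younger than the cutoff sync every 10 minutes,
    everything older syncs hourly.
    """
    if post_age_seconds < recent_cutoff:
        return 10 * 60
    return 60 * 60


def needs_sync(last_sync: float, now: float, published_at: float) -> bool:
    """True when the cached comments are older than the post's interval allows."""
    return last_sync + sync_interval(now - published_at) < now
```

For example, a post published yesterday and last synced 700 seconds ago would be refreshed, while a month-old post with the same sync age would not.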

(Stephen) #8

I changed the interval to one hour an hour ago; load is unchanged.

(Jeff Atwood) #9

Ok so the setting does not work?

(Stephen) #10

The interval isn’t a setting in the plugin. I’ve altered the time value of sync_comments in discourse.php to:

$last_sync + 60 * 60 < $time

That changes it from 10 minutes to 60. Thus far there hasn’t been a notable drop-off.

I’ve enabled debug briefly to see whether it leads to a significant increase in calls. If it doesn’t, then that setting may not be working, no.

If it does, then my next step will be to temporarily push it to 60 * 600 (ten hours) for a few hours to see if it has any effect.

(Stephen) #11

Ok, so debug has led to a significant uptick in call time, but no change to the call count.

I don’t fully understand how the calls are structured (I’m guessing @benword could be of assistance there), but I had expected that the number of calls would increase too if every page view was triggering a check against the relevant thread.

Does this suggest that each page view is checking every thread? That would explain the above, where the number of calls remains unchanged but the number of threads polled, and therefore the call length, increases.

(Stephen) #12

Comparing two hours of similar traffic levels I can see that forcing updates to be 10x less frequent:

  • halves the time consumed
  • halves the average call length
  • has no significant effect on call volume.

So in answer to your question, @codinghorror, the time interval does work in a sense, but it doesn’t behave as we would expect. If I’m asking the plugin to wait 10x longer before refreshing a post, I would expect the number of calls to drop in proportion. This is probably why it’s not exposed in the plugin UI: there’s stuff going on with the WordPress route that makes this trickier to solve.

Either way, my big concern here is that the success of our front end site is going to mandate upgrades to our discourse instance long before our community growth requires it.

Hourly checks: [screenshot not preserved]

Ten-hourly checks: [screenshot not preserved]

(Simon Cossar) #13

I’m curious to know if this has improved with the more recent versions of the plugin. I think that one reason for the huge number of calls to Discourse was related to syncing the Discourse comment numbers on WordPress index pages. The sync period for comment numbers on index pages has been increased to 1 day. This should somewhat reduce the number of calls being made on busy sites, but it’s still not very efficient, and it also results in the comment numbers on WordPress being out of sync with the comment numbers on Discourse.

One thing that could quite easily be done to increase the efficiency with which comments and comment numbers are synced would be to add an option to use a Discourse webhook for syncing comments. This way comments would only be synced when a change happens on the Discourse site. This ties in quite well with the WordPress REST API.
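
A rough sketch of the receiving side of such a webhook is below. It is in Python rather than WordPress PHP, and it rests on assumptions worth checking against the Discourse webhook documentation: that the request carries a signature of the form `sha256=<hex HMAC-SHA256 of the raw body>`, and that the payload contains a `topic` object with `id` and `posts_count` fields. `handle_webhook` and `comment_counts` are hypothetical names, and treating the first post as the article (so comments = posts minus one) is also an assumption.

```python
import hashlib
import hmac
import json
from typing import Dict


def signature_for(secret: str, body: bytes) -> str:
    """Compute the assumed signature format: 'sha256=' + hex HMAC of the body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def handle_webhook(
    secret: str,
    body: bytes,
    signature_header: str,
    comment_counts: Dict[int, int],
) -> bool:
    """Update the local comment count only when a verified event arrives."""
    expected = signature_for(secret, body)
    # Constant-time comparison on bytes, so odd header contents cannot raise.
    if not hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8")):
        return False
    topic = json.loads(body)["topic"]  # assumed payload shape
    # Assumption: the first post is the article itself, the rest are comments.
    comment_counts[topic["id"]] = max(topic["posts_count"] - 1, 0)
    return True
```

Under this model WordPress makes no outbound requests for unchanged topics at all; the stored count changes only when Discourse reports an event.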

The UI for it could be fairly straightforward. (This screenshot is from the shortcodes plugin.)

There are also some efficiency issues related to displaying avatars. I’m not sure what can be done about that, other than adding an option to not display avatars for the comments displayed on WordPress.

(Jeff Atwood) #14

This would be a nice change, I support this.

(Stephen) #15

Not significantly, no. Even with the 10x increase in interval above it only halved the load.

For sites with high post volumes (such as news sites like ours), the Discourse plugin became the number one resource drain. I’ve worked on a couple of other projects since posting this ~15 months ago, and we had to use local comments rather than Discourse for this reason. Comment counts being out of sync really harms overall engagement; it seems users flock towards topics which already have some form of lively debate. I suggested a while back that the polling interval should vary with the age of the post: it’s only critical in the first 24-48hrs.

The current approach is unsustainable for large sites with tens of thousands of posts.

Still hoping we will see a webhook approach implemented. For the time being I’m very reluctant to suggest Discourse in these scenarios.

(Simon Cossar) #16

Are you using a recent version of the plugin?

(Stephen) #17

Yes, all up to date. I haven’t looked at load figures in a couple of weeks, but when I audited this stuff in April it was still top of the list.

(Simon Cossar) #18

Thanks, it won’t have changed much since then. The webhook approach is very doable. Right now it’s making a lot of requests to fetch stale information.

(Simon Cossar) #19

Something I’ve wondered about for a while in terms of the plugin’s efficiency is the way it puts together the comments template. For each comment that’s being displayed for a post, it calls 10 PHP str_replace functions. For each participant in a topic, it calls 6 str_replace functions. To assemble the comments, it calls another 5 str_replace functions.

To see how this way of putting the comments together compares to a more standard approach, I created a basic php template with HTML markup for the comments, and echo calls to add the Discourse content. I set a timer at the start and the end of the template, and then did the same thing with the str_replace template that the plugin is currently using.

I ran this in my local development environment, for a post displaying 9 comments with 3 participants that wasn’t making any calls to Discourse to sync the comments. The current str_replace template was averaging around 0.047s, and the template that was using echo calls was averaging around 0.013s.
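
Going by the counts above, the 9-comment, 3-participant case works out to 9 × 10 + 3 × 6 + 5 = 113 str_replace calls per render. One way to measure a comparison like this more robustly is to repeat each renderer many times and keep the best run, which reduces noise from other processes. The sketch below does that in Python with `timeit`; it is an analogy to the PHP templates, not the plugin's code, and the template, field names, and function names are all made up for illustration.

```python
import timeit
from typing import Callable, Dict, List

Comment = Dict[str, str]

TEMPLATE = "<li><b>{username}</b>: {body}</li>"


def render_replace(comments: List[Comment]) -> str:
    """Placeholder substitution, analogous to chained str_replace calls."""
    parts = []
    for c in comments:
        html = TEMPLATE
        html = html.replace("{username}", c["username"])
        html = html.replace("{body}", c["body"])
        parts.append(html)
    return "".join(parts)


def render_direct(comments: List[Comment]) -> str:
    """Direct interpolation, analogous to a template with echo calls."""
    return "".join(f"<li><b>{c['username']}</b>: {c['body']}</li>" for c in comments)


def compare(comments: List[Comment], repeat: int = 5, number: int = 2000) -> Dict[str, float]:
    """Return the best observed seconds-per-render for each approach."""
    renderers: Dict[str, Callable[[List[Comment]], str]] = {
        "replace": render_replace,
        "direct": render_direct,
    }
    results = {}
    for name, fn in renderers.items():
        timings = timeit.repeat(lambda: fn(comments), repeat=repeat, number=number)
        results[name] = min(timings) / number  # min is the least noisy estimate
    return results
```

Calling `compare` with a list of nine sample comments gives a per-render figure for each approach; the absolute numbers will differ from the PHP timings quoted above, so only the relative difference within one environment is meaningful.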

Is this something worth looking at? Is there a better way to measure this?

(Jeff Atwood) #20

Doubtful with those numbers, I would not bother.