How much is Discourse affected by a faster CPU?

I recently moved my own Discourse from Digital Ocean to a colocated mini-PC I purchased from AliExpress. This makes for a good apples-to-apples comparison of what increased CPU performance does for the exact same Discourse instance.

Digital Ocean Droplet specs

  • E5-2630L v2 (Ivy Bridge, 2.4 - 2.8 GHz, 2 cores)
  • 2 GB RAM
  • 40 GB disk (unknown SSD)

cost $20/month

AliExpress box specs

cost $700 plus colocation hosting fees ($15/month)

CPU benchmarks

sysbench --test=cpu --cpu-max-prime=20000 run
sysbench --test=cpu --cpu-max-prime=40000 --num-threads=8 run

35.6s vs 21.3s
41.9s vs 15.7s

(1.7x faster single core, 2.7x faster multiple core)
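
The speedup figures above are simply the ratio of the two run times. Here is a minimal shell sketch of that arithmetic, assuming `awk` is available; the `speedup` function name is made up for illustration and is not part of sysbench:

```shell
# speedup OLD_SECONDS NEW_SECONDS
# Prints how many times faster the second run is, rounded to one decimal.
# (Hypothetical helper for illustration only.)
speedup() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", old / new }'
}

speedup 35.6 21.3   # single core timings from above
speedup 41.9 15.7   # multiple core timings from above
```

Note that the two sysbench runs use different `--cpu-max-prime` values, so the ratios are only meaningful within each row, not across rows.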

Disk benchmarks

ioping -RD -w 10 .

12.8k iops, 50.0 MiB/sec vs 13.6k iops, 53.2 MiB/sec

(roughly the same disk performance)

Discourse benchmarks

Before and after I did the move, I compared performance in three places:

  1. Topic back button (same topic)
  2. Topic refresh (same topic)
  3. Homepage refresh (latest)

I did this 10+ times in the same browser. Here are my results, with the Digital Ocean droplet on the left and the AliExpress mini-PC on the right.

1.6x faster, 1.8x faster, 2x faster

As to the question in the topic title, Discourse scales more or less linearly with the CPU speed you throw at it. If you want a faster Discourse, have enough memory, sure, but you want the fastest possible single threaded performance.

44 Likes

Where exactly does “have enough memory” start? When you don’t have any swapping at all going on?

1 Like

The existing guidelines of 1GB RAM for “small” Discourse and 2GB RAM for “medium” Discourse are quite reasonable and valid.

Swap is always recommended for Discourse. You can check free -m to determine how much swap is in use. In typical use you should see low swap use, like so:

              total        used        free      shared  buff/cache   available
Mem:           2001        1343         107         218         550         253
Swap:          1023         175         848

If you see high swap use (say, if used was 900 instead of 175), that indicates a problem: the instance may need more physical RAM.
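
To put a number on "low" versus "high" swap use, something like the following sketch could turn the `Swap:` line into a percentage. It assumes the English-language, procps-style `free -m` layout shown above (label, total, used, free); the `swap_used_percent` name is hypothetical, and no particular threshold is implied:

```shell
# swap_used_percent: reads `free -m` output on stdin and prints the
# share of swap in use as a whole-number percentage (truncated).
# Assumes the "Swap: total used free" column order shown above.
# Usage on a live system: free -m | swap_used_percent
swap_used_percent() {
  awk '/^Swap:/ {
    if ($2 > 0) printf "%d\n", ($3 * 100) / $2   # used / total
    else print 0                                 # no swap configured
  }'
}
```

Fed the sample output above, this would print 17; with used at 900 it would print 87.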

7 Likes

Are there any rules of thumb for what defines a small vs. medium community? Would it be based on the number of monthly page views?

2 Likes

My results using the same tests on a $8.99/month VPS from quickpacket:

CPU benchmark:

single thread: 28.8s
multi thread:  18.9s

Disk benchmark:

10.6 k requests completed in 9.82 s, 41.3 MiB read, 1.08 k iops, 4.21 MiB/s
generated 10.6 k requests in 10.0 s, 41.3 MiB, 1.06 k iops, 4.13 MiB/s

1 Like

Have you got details on how someone else might get this deal? I remember something about this a while back. The $15/month requires that you first drop a few hundred bucks on the box, right?

Still, I’m interested. Where to get box? How to arrange colocation?

1 Like

Yes, swap activity is the best baseline measure, as in “if you see consistent swap activity, you definitely don’t have enough memory”. You can get a bit more performance by ensuring you also have enough memory to store the entire working set of disk pages in memory (I’ve talked about this before, specifically the paragraph that starts, “As far as disk cache goes”), but definitely if you’re swapping, your performance is going to be viciously destroyed.

BTW, for anyone comparing the costs and thinking, “OMFG that’s not worth it”, consider that the colocated server is significantly more powerful than the droplet (8x RAM, etc). The closest equivalent droplet is $160/month, so if you were replacing that droplet size with a colo box you’d make your money back in about five months… sure, time is money, etc etc, but stable hardware doesn’t take that much time to keep an eye on.
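
The "about five months" estimate can be checked with a quick calculation. This is a rough sketch that assumes the $700 box price from the first post and the $15/month colocation figure quoted elsewhere in this thread; `breakeven_months` is a made-up helper name, and it ignores things like your own time and hardware failures:

```shell
# breakeven_months BOX_COST DROPLET_PER_MONTH COLO_PER_MONTH
# Prints how many months of savings it takes to pay off the box.
# (Hypothetical helper for illustration only.)
breakeven_months() {
  awk -v box="$1" -v droplet="$2" -v colo="$3" 'BEGIN {
    saving = droplet - colo          # money saved each month
    if (saving <= 0) print "never"   # colo is not cheaper per month
    else printf "%.1f\n", box / saving
  }'
}

breakeven_months 700 160 15
```

With those inputs the result comes out a little under five months.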

8 Likes

you want the fastest possible single threaded performance

To compensate for the crappiness of RoR on multithreading

1 Like

The multithreading is fine, that covers concurrent requests. But no individual request will go through any faster.

4 Likes

Some command line rebuild numbers:

cd /var/discourse
git pull
./launcher rebuild app

I did this in two consoles, triggered by pressing enter at the exact same time on the last command. I stopped the clock when the command line prompt returned from the rebuild.

Digital Ocean droplet

(different droplet, but I verified same E5-2630L CPU as in first post, 2GB $20/month droplet)

37:45 → 46:51 = 9 minutes, 6 seconds (546 seconds)

AliExpress box

37:45 → 41:38 = 3 minutes, 53 seconds (233 seconds)


So a rebuild is 2.3x faster.
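
For anyone who wants to double-check the stopwatch arithmetic, here is a small sketch that converts the minute:second readings above into elapsed seconds. It assumes both readings fall within the same hour; `elapsed_seconds` is a hypothetical helper name:

```shell
# elapsed_seconds START END   (both as MM:SS within the same hour)
# Prints the number of seconds between the two clock readings.
# (Hypothetical helper for illustration only.)
elapsed_seconds() {
  awk -v s="$1" -v e="$2" 'BEGIN {
    split(s, a, ":"); split(e, b, ":")
    print (b[1] * 60 + b[2]) - (a[1] * 60 + a[2])
  }'
}

elapsed_seconds 37:45 46:51   # Digital Ocean droplet
elapsed_seconds 37:45 41:38   # mini-PC

# Ratio of the two rebuild times, rounded to one decimal.
awk 'BEGIN { printf "%.1f\n", 546 / 233 }'
```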

I know @sam has been working on rebuild speed improvements for the last 2 days so I thought he might be interested as well.

15 Likes

That sounds pretty darn good.

FWIW,

time ./launcher rebuild app

will save you from using your watch. :wink:

18 Likes

Do you mind me asking where you colocate a server for $15/month? Last time I colocated in 2008 or so I was paying $350/month for a 1U server.

2 Likes

It's not really a server per se; it's a mini-PC, more analogous to Mac mini hosting services. But the perf, as you can see, is excellent.

2 Likes

Nice comparison, thanks!

It is worth mentioning that Digital Ocean’s VPSs are on the slow side when you compare them head to head with alternatives at a similar price. There are reputable hosting providers that offer roughly twice the single-core performance.

1 Like

The “name brands” such as Linode and AWS and Azure and Digital Ocean are mostly pretty close in CPU perf. The weirder new providers can be faster – or a whole lot slower.

Which one do you have in mind here? “A lot faster” I can see, also “more cores for less money”, but twice as fast?

I think that is one of the sad things: it means that with the current Ruby runtime, or Discourse, that is about as fast as we can get, since we have more or less reached peak single-thread performance.

The 4.2 GHz Skylake and 4.5 GHz Kaby Lake are significantly faster than this, as they have a bit more cache and of course a clock rate higher than 3.5 GHz. The AliExpress box is 15 W TDP compared to the 90 W TDP of those.

Speed Shift, aka hardware CPU clock control, is better on Kaby Lake as well, so it goes faster… faster, when it needs to. It will hit higher turbo more often and more quickly.

1 Like

On the upside, with all the speed improvements which are apparently coming down the pipe in Ruby 3, we’ll automatically get some tidy speed ups. Single-threaded performance almost always has a strong impact on web application performance, whatever the language or framework, because it’s a problem that strongly resists parallelisation. I can’t think of any languages or frameworks that do much, if any, of the page generation in parallel.

2 Likes

Here are a couple of threads with VPSBench results and threads about specific hosting companies.

UpCloud.com is the fastest I have personally tried and they are nearly twice as fast (single core perf, rebuild times). LeaseWeb also performs nicely in terms of CPU speed.

1 Like