How much is Discourse affected by a faster CPU?

codinghorror · March 5, 2017, 11:11am

I recently moved my own Discourse from Digital Ocean to a colocated mini-PC I purchased from Ali Express. This makes for a good apples-to-apples comparison of what more CPU performance does for the exact same Discourse instance.

Digital Ocean Droplet specs

E5-2630L v2 (Ivy Bridge, 2.4 - 2.8 GHz, 2 cores)
2 GB RAM
40 GB disk (unknown SSD)

cost $20/month

AliExpress box specs

i7-7500U (Kaby Lake, 2.7 - 3.5 GHz, 2 cores / 4 threads): $400 on Ali Express
16 GB RAM: Crucial 16GB DDR3, $100
500 GB disk: Samsung 850 Evo SSD, $170

cost $700 plus colocation hosting fees ($29/month)

CPU benchmarks

sysbench --test=cpu --cpu-max-prime=20000 run
sysbench --test=cpu --cpu-max-prime=40000 --num-threads=8 run

35.6s vs 21.3s
41.9s vs 15.7s

(1.7x faster single core, 2.7x faster multiple core)
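For anyone who wants to reproduce the ratios, they are just the old time divided by the new time. A minimal sketch, assuming awk is installed; `speedup` is a hypothetical helper name for illustration, not something sysbench provides:

```shell
# speedup OLD_SECONDS NEW_SECONDS
# Prints how many times faster the new run is, rounded to one decimal place.
speedup() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", old / new }'
}

speedup 35.6 21.3   # single-thread run, prints 1.7
speedup 41.9 15.7   # 8-thread run, prints 2.7
```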

Disk benchmarks

ioping -RD -w 10 .

12.8k iops, 50.0 MiB/sec vs 13.6k iops, 53.2 MiB/sec

(roughly the same disk performance)

Discourse benchmarks

Before and after I did the move, I compared performance in three places:

Topic back button (same topic)
Topic refresh (same topic)
Homepage refresh (latest)

I did this 10+ times in the same browser. Here are my results. Digital Ocean droplet on the left, Ali Express mini-pc on the right.

1.6x faster, 1.8x faster, 2x faster

As to the question in the topic title, Discourse scales more or less linearly with the CPU speed you throw at it. If you want a faster Discourse, have enough memory, sure, but you want the fastest possible single threaded performance.

tophee · March 5, 2017, 11:30am

Where exactly does “have enough memory” start? When you don’t have any swapping at all going on?

codinghorror · March 5, 2017, 11:36am

The existing guidelines of 1GB RAM for “small” Discourse and 2GB RAM for “medium” Discourse are quite reasonable and valid.

Swap is always recommended for Discourse. You can check free -m to determine how much swap is in use. In typical use you should see low swap use, like so:

              total        used        free      shared  buff/cache   available
Mem:           2001        1343         107         218         550         253
Swap:          1023         175         848

If you see high swap use, for example if used were 900 instead of 175, that indicates a problem: the instance may need more physical RAM.
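If you would rather get a single number than read the table by eye, something like the following could work. It is only a sketch: `swap_used_pct` is a made-up helper name, it assumes the `Swap:` row layout shown above (total in the second column, used in the third), and it deliberately does not pick a threshold for you.

```shell
# swap_used_pct: reads `free -m` output on stdin and prints the
# percentage of swap currently in use (0 if no swap is configured).
swap_used_pct() {
  awk '/^Swap:/ { if ($2 > 0) printf "%d\n", $3 * 100 / $2; else print 0 }'
}

# Sample values from the output above; prints 17.
# On a live host you would run: free -m | swap_used_pct
swap_used_pct <<'EOF'
              total        used        free      shared  buff/cache   available
Mem:           2001        1343         107         218         550         253
Swap:          1023         175         848
EOF
```

With used at 900 instead of 175, the same function prints 87, which is the kind of reading that suggests the instance needs more RAM.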

Steve_Pavlina · March 5, 2017, 4:56pm

Are there any rules of thumb for what defines a small vs. medium community? Would it be based on the number of monthly page views?

Jon_Rurka · March 5, 2017, 10:30pm

My results using the same tests on a $8.99/month VPS from quickpacket:

CPU benchmark:

single thread: 28.8s
multi thread:  18.9s

Disk benchmark:

10.6 k requests completed in 9.82 s, 41.3 MiB read, 1.08 k iops, 4.21 MiB/s
generated 10.6 k requests in 10.0 s, 41.3 MiB, 1.06 k iops, 4.13 MiB/s

pfaffman · March 5, 2017, 10:31pm

Have you got details on how someone else might get this deal? I remember something about this a while back. The $15/month requires that you first drop a few hundred bucks on the box, right?

Still, I’m interested. Where to get box? How to arrange colocation?

mpalmer · March 5, 2017, 10:44pm

Yes, swap activity is the best baseline measure, as in “if you see consistent swap activity, you definitely don’t have enough memory”. You can get a bit more performance by ensuring you also have enough memory to store the entire working set of disk pages in memory (I’ve talked about this before, specifically the paragraph that starts, “As far as disk cache goes”), but definitely if you’re swapping, your performance is going to be viciously destroyed.

BTW, for anyone comparing the costs and thinking, “OMFG that’s not worth it”, consider that the colocated server is significantly more powerful than the droplet (8x RAM, etc). The closest equivalent droplet is $160/month, so if you were replacing that droplet size with a colo box you’d make your money back in about five months… sure, time is money, etc etc, but stable hardware doesn’t take that much time to keep an eye on.
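The break-even estimate is simple enough to check yourself. A rough sketch using the $700 hardware cost and $29/month colocation fee from the first post and the $160/month droplet mentioned here; `breakeven_months` is a hypothetical helper, and it ignores things like your time, spare parts, and resale value:

```shell
# breakeven_months HARDWARE_COST CLOUD_MONTHLY COLO_MONTHLY
# Prints the number of months until the up-front hardware cost is
# recovered by the lower monthly fee, or "never" if colo is not cheaper.
breakeven_months() {
  awk -v hw="$1" -v cloud="$2" -v colo="$3" 'BEGIN {
    if (cloud <= colo) { print "never"; exit }
    printf "%.1f\n", hw / (cloud - colo)
  }'
}

breakeven_months 700 160 29   # prints 5.3
```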

sercasti · March 6, 2017, 3:07am

you want the fastest possible single threaded performance

To compensate for the crappiness of RoR on multithreading

codinghorror · March 6, 2017, 3:08am

The multithreading is fine, that covers concurrent requests. But no individual request will go through any faster.

codinghorror · March 11, 2017, 12:01am

Some command line rebuild numbers:

cd /var/discourse
git pull
./launcher rebuild app

I did this in two consoles, triggered by pressing enter at the exact same time on the last command. I stopped the clock when the command line prompt returned from the rebuild.

Digital Ocean droplet

(different droplet, but I verified same E5-2630L CPU as in first post, 2GB $20/month droplet)

37:45 → 46:51 = 9 minutes, 6 seconds (546 seconds)

Ali Express box

37:45 → 41:38 = 3 minutes, 53 seconds (233 seconds)

So a rebuild is 2.3x faster.
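For anyone checking the arithmetic, here is a small sketch that turns the two clock readings into elapsed seconds. `elapsed_seconds` is a hypothetical helper name, and it assumes both readings are MM:SS values within the same hour:

```shell
# elapsed_seconds START END
# Both arguments are MM:SS clock readings taken within the same hour.
elapsed_seconds() {
  awk -v s="$1" -v e="$2" 'BEGIN {
    split(s, a, ":"); split(e, b, ":")
    print (b[1] * 60 + b[2]) - (a[1] * 60 + a[2])
  }'
}

elapsed_seconds 37:45 46:51   # droplet, prints 546
elapsed_seconds 37:45 41:38   # mini-PC, prints 233
```

546 / 233 is roughly 2.3, which is where the rebuild speedup comes from.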

I know @sam has been working on rebuild speed improvements for the last 2 days so I thought he might be interested as well.

pfaffman · March 11, 2017, 12:34am

That sounds pretty darn good.

FWIW,

time ./launcher rebuild app

will save you from using your watch.

moparisthebest · March 12, 2017, 4:32pm

Do you mind me asking where you colocate a server for $15/month? Last time I colocated in 2008 or so I was paying $350/month for a 1U server.

codinghorror · March 12, 2017, 5:29pm

Not really a server per se, a mini pc more analogous to Mac mini hosting services. But the perf as you can see is excellent.

ljpp · March 12, 2017, 7:33pm

Nice comparison, thanks!

It is worth mentioning that Digital Ocean’s VPSs are on the slow side, when you compare head to head with alternatives that have a similar price tag. There are reputable hosting providers that offer roughly twice as fast single core performance.

codinghorror · March 12, 2017, 9:55pm

The “name brands” such as Linode and AWS and Azure and Digital Ocean are mostly pretty close in CPU perf. The weirder new providers can be faster – or a whole lot slower.

tophee · March 12, 2017, 10:06pm

Which one do you have in mind here? “A lot faster” I can see, also “more cores for less money”, but twice as fast?

ksec · March 13, 2017, 4:15am

I think that is one of the sad things: it means that with the current Ruby runtime, or Discourse, that is about as fast as we can get, since we have more or less reached peak single-thread performance.

codinghorror · March 13, 2017, 4:20am

The 4.2 GHz Skylake and 4.5 GHz Kaby Lake are significantly faster than this, as they have a bit more cache and of course a clock rate higher than 3.5 GHz. The Ali Express box is 15 W TDP compared to the 90 W TDP of those.

Speed Shift, aka hardware CPU clock control, is better on Kaby Lake as well, so it goes faster… faster, when it needs to. It will hit higher turbo more often and more quickly.

mpalmer · March 13, 2017, 4:27am

On the upside, with all the speed improvements which are apparently coming down the pipe in Ruby 3, we’ll automatically get some tidy speed ups. Single-threaded performance almost always has a strong impact on web application performance, whatever the language or framework, because it’s a problem that strongly resists parallelisation. I can’t think of any languages or frameworks that do much, if any, of the page generation in parallel.

ljpp · March 13, 2017, 7:47am

Here are a couple of threads with VPSBench results and threads about specific hosting companies.

UpCloud.com is the fastest I have personally tried and they are nearly twice as fast (single core perf, rebuild times). LeaseWeb also performs nicely in terms of CPU speed.
