The Discourse Servers

Originally published at: http://blog.discourse.org/2013/04/the-discourse-servers/
When we moved to our new datacenter, I didn’t elaborate on exactly what sort of hardware we have in the rack. But now I will. There are 13 servers here now, all variously numbered Tie Fighters — derived from our internal code name for the project while it was a secret throughout 2012. Tie Fighter…

5 Likes

Did you do a price/performance comparison against cloud vs. self-hosted, or is this a case where you want total control for your own reasons?

1 Like

Performance-wise there is no comparison; building your own modern servers is crazy fast compared to what you’ll get “in the cloud”.

One brand new top-of-the-line server costs about as much as two months of EC2 web application hosting – Building Servers for Fun and Prof… OK, Maybe Just for Fun

Unless you pay many thousands of dollars per month.

Our hosting bill at he.net is $600/month, for the record. That includes “unlimited” 100 Mbps on a gigabit Ethernet port, 15 amps, and a full 42U locked rack cabinet.

7 Likes

I’ve wondered more than once lately if I’m one of the last people on the planet who still has actual physical servers and only selectively uses VMs, for performance reasons.

For most of the things I need, cloud-hosted servers offer more than adequate performance.

On the other hand, I’m not hosting anything that has to handle serious volume at the moment…

2 Likes

Interesting post; I’m looking forward to the blog post about the software stack.

Though there are quite a few things missing from your setup (or you just didn’t mention them). How about backups, a firewall, and multiple hosting locations?

Another interesting calculation would be a TCO comparison with cloud hosting, taking into account the fact that you bought an expensive server you’re not really using anymore because of architectural decisions (with cloud hosting you would have just canceled it), the chance of having to replace a mainboard, and the fact that you’re currently quite overprovisioned.

Don’t forget that you’re pretty confident you will grow and be successful (and I think you will be), but for a company that isn’t so sure it will be able to grow fast, buying servers might be a bit too optimistic.

So basically, I do admire your guts and the fact that you dare to do something that is exactly the opposite of what everybody thinks one should be doing. But I do wonder if it’s the right thing to do :slight_smile:

1 Like

Turns out that Ruby is … kind of hard to virtualize effectively

Could you elaborate on that? What kind of problems did you have that made you abandon virtualisation?

3 Likes

Does the money you’re saving by building and servicing your own servers include the value of your time? And is the time you spend on the hardware maybe taking away from the time you might be using to work on the software, recruit hosting partners, etc? Just asking.

We saw 20% to 40% performance loss running Discourse benchmarks under Xen and KVM on multiple servers. We tried and tried, and could not do better than that. A “mere” twenty percent performance loss is equivalent to downgrading from a 3 GHz CPU to a 2.4 GHz one.

Ruby in general eats CPU and I/O for breakfast; it is extremely resource intensive. It’ll take everything you can throw at it, and more, and clock rate matters more than the number of cores by a wide margin.
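If anyone wants to reproduce that kind of comparison themselves, here is a minimal sketch of the idea: a single-threaded, CPU-bound Ruby script you run once on bare metal and once inside the VM, then compare throughput. The hashing workload is purely illustrative; it is not the actual Discourse benchmark.

```ruby
# Minimal single-threaded CPU benchmark sketch: run once on bare metal and
# once inside the VM, then compare the two numbers. Workload is illustrative.
require 'benchmark'
require 'digest'

ITERATIONS = 200_000

elapsed = Benchmark.realtime do
  ITERATIONS.times { |i| Digest::SHA256.hexdigest("payload-#{i}") }
end

puts format("%.0f hashes/sec", ITERATIONS / elapsed)
```

The ratio between the bare-metal number and the VM number is your virtualization overhead for CPU-bound work; anything involving I/O will usually fare worse.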

3 Likes

Well, this is a startup, so my time isn’t worth that much. Also, I’m pretty good with hardware in general, having built maybe 40 computers from parts in my lifetime.

Hosting is a core business function at Discourse; if we get out-hosted by other people who understand hardware better than we do, there goes a big part of our future business model. So it behooves us to have world-class hosting and extremely fast, cutting-edge hardware, not just the same cloud servers anybody with a credit card can spin up. :smile:

4 Likes

Just curious, have you tried different interpreters (MRI/Jruby) and application servers (unicorn/passenger/puma)?

1 Like

We are hosted on MRI 2.0, heavily tuned with tcmalloc. I tried JRuby locally and was barely able to get it running; when I did, it was much slower than 1.9.3, mostly due to missing native implementations of various functionality (oj, for example, is pretty damn fast).
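To make the oj point concrete, here is a small sketch of the kind of comparison I mean: serializing the same payload with the stdlib json gem and with oj. The payload and iteration counts are made up; numbers will vary by machine.

```ruby
# Rough comparison of stdlib JSON vs the native oj gem on the same payload.
# Illustrative only; this is not the Discourse benchmark suite.
require 'benchmark'
require 'json'
require 'oj'   # gem install oj

payload = { "posts" => Array.new(500) { |i| { "id" => i, "raw" => "x" * 200 } } }

Benchmark.bm(6) do |bm|
  bm.report("json") { 1_000.times { JSON.generate(payload) } }
  bm.report("oj")   { 1_000.times { Oj.dump(payload) } }
end
```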

Considering GitHub, Shopify, and many other high-scale Rails outfits are sticking with MRI and improving it, I am comfortable with our decision here.

With regards to web servers, I intend to shift us to Unicorn with oobgc, and complement it with Thin for long polling. It complicates things a bit, so I have held off, but I will get to it. We work on Passenger, Thin, and Puma at the moment, with some minor changes needed around Redis and forking to work on Unicorn.

I do not really intend to use Puma because our next biggest perf win is oobgc, something that is out of the question with Puma. Also, long polling is already implemented and works fine on Thin, so there is no urgency in moving it to a threaded model.
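For anyone who has not run into oobgc before, the rough shape of it with the middleware that ships with Unicorn looks something like the sketch below; the interval and the app constant are placeholders, not our actual config.

```ruby
# config.ru (sketch). Unicorn's bundled OobGC middleware triggers GC between
# requests, so collection happens while the worker is idle instead of in the
# middle of a response. The interval of 5 requests is an assumed example.
require 'unicorn/oob_gc'

use Unicorn::OobGC, 5        # run GC after every 5th request
run MyApp::Application       # hypothetical Rack/Rails app constant
```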

3 Likes

This is an area where Ruby has been bucking the trend (or at least, not avidly following it) for some time now. The language has some support for various forms of lightweight concurrency, but its support for native threads, while present, is somewhat crippled, as I recall. The result is that the only easy way to take advantage of multiple cores in Ruby is to fork independent processes.
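The forking approach looks roughly like this minimal sketch: each child process gets its own interpreter (and its own global lock), so CPU-bound work genuinely runs on separate cores.

```ruby
# Minimal sketch: one forked process per chunk of CPU-bound work.
work = (1..4).map { |n| n * 10_000_000 }

pids = work.map do |limit|
  fork do
    sum = 0
    (1..limit).each { |i| sum += i }   # CPU-bound busy work in its own process
    exit!(0)
  end
end

pids.each { |pid| Process.wait(pid) }
puts "all children finished"
```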

This article, while primarily about Rubinius and not MRI, starts out discussing concurrency issues in MRI as a prelude to describing how Rubinius is trying to move past them, and seems to summarize the issues pretty well.

I think a lot of people assume that because Ruby is so popular, all of the available interpreters (especially the canonical one) must be up to date with all the latest tricks for taking advantage of our multi-core future. Unfortunately, that’s just not true! Hopefully it will get better, but right now, MRI is pretty old-fashioned…

Let us just diplomatically say “there is lots of room for improvement”. :wink: It’s all upside with Ruby!

3 Likes

Can’t argue with that :smile: I really like Ruby as a language, but I think there are very few people, including Matz, who would argue there isn’t still a lot to do to improve the environment!

What’s with the naming (numbering?) convention? Will the next Tie Fighters be numbered 12, 13, 14, 15, 20, 21, …?

We wanted to leave room for scaling to more Tie Fighters “above” the routers, e.g. Tie 6, 7, 8, 9.

What are the reasons behind this? Surely PHP, Perl, and Python don’t lose anywhere close to that much in a VM. It is also interesting that you point this out, since I have never run Ruby outside of a VM / cloud.

I don’t think the issues we observed were Ruby specific; I think they were Turbo Boost specific.

In particular, we saw the typical 2-5% perf loss when we virtualized under Xen, with the hypervisor disabling Turbo Boost (Xen has only recently been patched to sort of support it).

To take advantage of full clock speeds we changed to KVM, but noticed much bigger perf losses (even though the clocks were running at the right speed). The loss was not Ruby specific, as far as I recall.

My suspicion is that most places that host VMs keep their machines heavily utilized and Turbo Boost disabled.
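If you want to sanity-check whether a guest is actually reaching boosted clocks, a rough Linux-only sketch is to keep one core busy and sample the frequency the kernel reports, with the caveat that some hypervisors expose only a static nominal MHz in /proc/cpuinfo, so treat the output with suspicion.

```ruby
# Rough check (Linux only): spin one core, then read the clock speeds the
# kernel reports. Some VMs show a fixed nominal frequency here regardless.
spinner = Thread.new { loop { Math.sqrt(rand) } }
sleep 2   # give the frequency governor time to ramp up

mhz = File.readlines("/proc/cpuinfo")
          .grep(/^cpu MHz/)
          .map { |line| line.split(":").last.to_f.round }

puts "reported clock speeds: #{mhz.join(', ')} MHz"
spinner.kill
```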

I do not think this is correct; in fact, I am sure it is almost certainly wrong. I will go look up the chat logs, but we saw “only” 20 percent perf loss in Xen, and to get there we had to lose Intel Xeon Turbo Boost, which made it even worse performance-wise.

I never saw any virtualization results in the 2 to 5 percent range, not for a full benchmark of real Ruby code anyway. Synthetic CPU benchmarks under a VM could hit that, perhaps, but anything with I/O is going to be worse.