The Discourse Servers

Jeff:

if they do not ramp up under, say, prime95 or some other CPU torture test, it is bad

it seems Debian is ultra-conservative?

it’s supposed to ramp clock speed under load

3.5 GHz vs 2.7 GHz is leaving 30% of perf on the table

Sam:

if I need to move to kvm I will


All our focus around Xen testing was on trying to get it to run turbo boost; in fact, a few days later:

now you see why I hate Xen, not supporting turbo is lame… it is NOT that new with sandy bridge xeons

the numbers you gave, looking at the chat logs, were 10.39 seconds for KVM and 8.48 seconds for Xen


It’s impossible to untangle all of this, but for us Xen was a non-starter, not even worth testing methodically unless it supported turbo boost.
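
For anyone who hits the same symptom, here is a minimal sketch for checking whether the clocks are actually ramping, assuming a Linux host that exposes the standard sysfs cpufreq interface (run it while something like prime95 is loading the CPU):

```python
# Print the scaling governor plus current and max frequency for each CPU,
# straight from the kernel's sysfs cpufreq interface. If "cur" stays far
# below "max" while the machine is under load, the governor (or the
# hypervisor) is holding the clocks back.
import glob

for cpu in sorted(glob.glob("/sys/devices/system/cpu/cpu[0-9]*")):
    cpufreq = cpu + "/cpufreq"
    try:
        governor = open(cpufreq + "/scaling_governor").read().strip()
        cur_khz = int(open(cpufreq + "/scaling_cur_freq").read())
        max_khz = int(open(cpufreq + "/scaling_max_freq").read())
    except FileNotFoundError:
        continue  # cpufreq not exposed here (common inside a VM)
    name = cpu.rsplit("/", 1)[-1]
    print(f"{name}: governor={governor} "
          f"cur={cur_khz / 1e6:.2f} GHz max={max_khz / 1e6:.2f} GHz")
```

If the governor is something like “conservative” or “powersave” and cur stays near the bottom even under load, that matches the ultra-conservative behavior described above; actual turbo residency above the nominal clock is easier to see with a tool like turbostat.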

You mention IPMI. How does that compare to technology like Intel’s vPro (specifically AMT)? I’ve got experience with vPro and it was a fascinating technology, but have never heard of IPMI.

We’ve been researching things like that to provide a lights-out server where the nodes are shut down when not in use, and we were going to use a Raspberry Pi to provide the VPN access. That TomatoUSB looks amazingly useful.

vPro includes a hardware KVM in a similar fashion, even remote viewing of the BIOS: YouTube

There is a pretty comprehensive review at tomshardware, and howtogeek had an article a while back.

I just realized the screenshots from the Supermicro IPMI screens are the same ones from the hilarious videos at http://www.thewebsiteisdown.com/

Do not click ‘Recompute Base Encryption Hash’…

IPMI works great in my experience; you can just click the link in the article to learn more.

Jeff, other than the cost of the actual servers, what are the costs involved in setting up the rack? You have a router, then your servers and db servers, but what do you need to connect them all together: switches, power supply, etc.?

Gigabit Ethernet hubs, cat6 cables, and a 1u power strip are all relatively inexpensive. I do recommend having two switches racked with one as a hot spare because if your switch dies, you are in big trouble!

Say a switch goes down, realistically how long would it take for you guys to drive down and fix it?

Also, with your experience running SO and SE, how often did you need to get physical access to the servers to fix a failed drive etc?

I’m just trying to compare this with going with something like EC2 or a managed dedicated box. Obviously you get all the benefits of simply buying a powerhouse server for $1.5-2.5K instead of paying $50/month for 1GB of RAM, etc., but there are also real downsides: when something does go wrong, you have to drive down and diagnose the issue yourself.

The datacenter, he.net, does offer remote hands for $100/hour. So if it were very urgent I would call them, and they would disconnect and reconnect all the network cables to the hot spare secondary switch in the same order. Pretty easy, since our live and hot spare switches are the exact same model and stacked right on top of each other.

If I had to drive down, it is about an hour to get there. (Berkeley to San Jose)

The main things that fail are hard drives and power supplies. Failure of new, burned-in server hardware is not that common… I never saw any failures at all among the ~10 servers we built in the 3 years after we deployed server hardware for Stack Exchange.

However, in my experience, while you are getting the servers initially set up and configured, you will need physical access a LOT in the beginning. Not because things are failing, but because you always forget something in the configuration. After racking the servers, plan for a few weeks of visiting the datacenter once a week. Once that is over, you’ll barely ever go back.

(And IPMI, aka remote KVM-over-IP, works amazingly well; you can reboot and edit the BIOS over the internet… as long as the server has power, it can be managed using IPMI, which is basically a dedicated little ARM computer with its own networking inside the server.)
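
For a concrete flavor of that, here is a minimal sketch that drives a BMC over the network with ipmitool; the host and credentials below are placeholders, not our actual setup, and it assumes ipmitool is installed and the server's IPMI LAN interface is configured:

```python
# Query and control a server's BMC over the network with ipmitool.
# BMC_HOST / BMC_USER / BMC_PASS are placeholders for your own IPMI
# interface; "lanplus" is the IPMI v2.0 transport most BMCs speak.
import subprocess

BMC_HOST = "10.0.0.50"   # hypothetical BMC address
BMC_USER = "admin"
BMC_PASS = "changeme"

def ipmi(*args):
    cmd = ["ipmitool", "-I", "lanplus",
           "-H", BMC_HOST, "-U", BMC_USER, "-P", BMC_PASS, *args]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

print(ipmi("chassis", "power", "status"))   # e.g. "Chassis Power is on"
print(ipmi("sensor", "list"))               # temperatures, fans, voltages
# ipmi("chassis", "power", "cycle")         # hard power cycle, even if the OS is hung
```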

Just my experience!

Do you know of any good write-ups/tutorials where people outline exactly how they set up their co-location rack? A detailed account of exactly what they bought, plus tips and tricks, etc.

@codinghorror I’m curious about your decision to configure your HAProxy servers, Tie Fighter 10 and 11, in a single chassis sharing one power supply. I understand having two HAProxy instances would allow for high-availability, but what about a scenario in which the power supply fails in that chassis? That seems to imply both servers will go down, and in your own words, “nothing will be accessible.” In choosing the Iris 1125, is downtime caused by PSU failure something you decided was acceptable? Or am I missing something from your configuration that makes this a non-issue?

We saw 20% to 40% performance loss running Discourse benchmarks under Xen and KVM on multiple servers. We tried and tried, and could not do better than that.

So, maybe this is obvious, but did you make sure that the guest CPU configuration (in KVM) is the same as the host CPU configuration? This isn’t the default because it reduces portability (that is, live migration between different CPU types) but leaving it general can indeed cut performance by the percentages you’re talking about.

It was a long time ago, but we tried quite a lot here, including raw images, CPU pinning, and so on. It may be worth testing again.
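
If anyone does retest, one quick thing to verify first is the guest CPU model, per the suggestion above. A minimal sketch that inspects a libvirt domain's XML for it (the domain name is a placeholder; this assumes virsh is available on the KVM host):

```python
# Check what CPU model a KVM/libvirt guest is configured with.
# "discourse-vm" is a hypothetical domain name; substitute your own.
import subprocess
import xml.etree.ElementTree as ET

DOMAIN = "discourse-vm"

xml_desc = subprocess.run(["virsh", "dumpxml", DOMAIN],
                          capture_output=True, text=True, check=True).stdout
cpu = ET.fromstring(xml_desc).find("cpu")

if cpu is None:
    print("No <cpu> element: the guest gets a generic CPU model.")
else:
    # 'host-passthrough' (or 'host-model') exposes the host's CPU features
    # to the guest; a generic named model can hide newer instructions.
    print("cpu mode:", cpu.get("mode"), "model:",
          cpu.findtext("model", default="(none)"))
```

If that prints a generic model, switching it to host-passthrough is the first thing to try before re-running the benchmarks.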

Have you guys tried LXC containers yet? You get the nice separation without the VM overhead; it’s a good middle ground with many benefits.

We have a spare PSU on-site in the cage (we actually have a few spare PSUs and SSDs in the cage, as mentioned in the article). So the time it would take me to drive down there, and install it, is acceptable versus likelihood of PSU failure.

LXC containers and Docker are something both @supermathie and I are very interested in, but I don’t know of anyone who has set up a good Discourse container.

How are those Samsung SSD disks doing? Have any burnt out yet? I used Jeff’s server blueprints to build a couple of db servers (SQL Server), but with the 840 Pro disks. The performance was pretty damn good; I’m just wondering how long they’ll hold out :smiley:

This depends entirely on the I/O rate on the disks, which depends entirely on what you’re doing on that server. For “typical” server use, barring any random unlucky failures, I think it’s safe to expect ~3 years before I’d even remotely be worried.

However, it is a very good idea to get SSDs much larger than what you need, so the drive has lots of space to reassign used-up cells. I would never, ever run a server with a 128GB drive that is always near capacity, for example. (Drives do reserve some space internally that you can’t use, and the more of that reserved space a drive has, the more “enterprisey” the SSD is, because it is more tolerant of the most common SSD failure mode: worn-out cells.)
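
As a rough back-of-the-envelope illustration of why size buys you headroom (the capacities, usage, and internal reserve figure below are made-up examples, not measurements of any particular drive):

```python
# Rough model: the spare area available for wear leveling is whatever the
# controller reserves internally plus whatever capacity you leave unused.
# All numbers here are illustrative only.
def spare_fraction(capacity_gb, used_gb, internal_reserve=0.07):
    spare_gb = capacity_gb * internal_reserve + (capacity_gb - used_gb)
    return spare_gb / capacity_gb

print(f"128 GB drive, 110 GB used: {spare_fraction(128, 110):.0%} spare")  # ~21%
print(f"512 GB drive, 110 GB used: {spare_fraction(512, 110):.0%} spare")  # ~86%
```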

Probably pretty good. Unfortunately they don’t support the SMART wear levelling indicator, but I can get a fairly generic “sense” of how they’re doing, as they expose a different attribute:

server  ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
live db 177 Wear_Leveling_Count     0x0013   092   092   000    Pre-fail  Always       -       286
live db 177 Wear_Leveling_Count     0x0013   085   085   000    Pre-fail  Always       -       535

back db 177 Wear_Leveling_Count     0x0013   093   093   000    Pre-fail  Always       -       247
back db 177 Wear_Leveling_Count     0x0013   084   084   000    Pre-fail  Always       -       550

webonly 177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       12
webonly 177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       14

webonly 177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       11
webonly 177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       15

So the live and replica database servers have more wear on them. No surprise there. I should graph this. :smile:
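
If I do get around to graphing it, collection could be as simple as this minimal sketch, which pulls the raw value of attribute 177 out of smartctl (the device paths are placeholders; it assumes smartmontools is installed and enough privileges to read SMART data):

```python
# Collect the raw value of SMART attribute 177 (Wear_Leveling_Count)
# for each drive, one line per device, ready to feed into a grapher.
# Device paths are placeholders for your own SSDs.
import subprocess

DEVICES = ["/dev/sda", "/dev/sdb"]

for dev in DEVICES:
    out = subprocess.run(["smartctl", "-A", dev],
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        fields = line.split()
        if fields and fields[0] == "177":
            # the last column of the attribute table is the raw value
            print(dev, "Wear_Leveling_Count raw =", fields[-1])
```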

There’s another excellent reason for this, and that’s performance. For another customer, I was evaluating 128GB and 256GB “value” drives (i.e., not overprovisioned like the enterprise drives) as replacements for 50GB SSDs that had reached end of life.

The overprovisioned 50GB SSDs gave you VERY consistent performance on a workload. You knew you were getting the IOPS and latency you needed:

The “value” drives, on the other hand, let you use all that space, but you have to manually enforce overprovisioning if you want to avoid high write-completion latency and maintain high IOPS:

(yes, the graphs are slightly different things but the 50GB drive maintains that red line like it was aimed by NASA)

That’s awesome! Thanks for taking the time to supply all this data… really interesting!

Hubs? :grin:

Why aren’t you using NIC teams on all the servers and taking advantage of the fact that you have two switches? You could just set them to active/passive if you insist on only one switch being active at a time.

A single switch failure would then mean zero seconds of downtime, instead of driving down there or renting hands at the colo.
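
For reference, once an active-backup bond is configured on Linux, the kernel reports its state under /proc/net/bonding; here is a minimal sketch that prints the mode, link status, and currently active slave (assuming the bonding driver is loaded and the bond is named bond0):

```python
# Print the key lines from the kernel's bonding status file: the bonding
# mode, the currently active slave, and each slave's link (MII) status.
# Assumes an existing bond named bond0.
BOND_STATUS = "/proc/net/bonding/bond0"

INTERESTING = ("Bonding Mode:", "Currently Active Slave:",
               "MII Status:", "Slave Interface:")

with open(BOND_STATUS) as f:
    for line in f:
        line = line.strip()
        if line.startswith(INTERESTING):
            print(line)
```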