Meta is moving to the Cloud 🌩

codinghorror · October 26, 2017, 11:01pm

We might have to just live with massively delayed notifications for a while, as @sam is working on another much more urgent internal problem.

sam · October 27, 2017, 1:50am

Well, this one is a doozie. But I can explain it and it makes perfect sense. Ever since we moved meta to AWS, AWS has been playing “silly buggers” with all our outbound mail. This is a known issue due to:

We already opened an issue with AWS about this, but they are super slow to respond. To circumvent it we could use a different port (port 587, for example); however, that would require extensive reconfiguration of some public infrastructure we have. ~~So instead, @supermathie is just going to move outgoing mail from meta to a third party provider.~~ We were able to get this working.
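For anyone who wants to check which SMTP ports they can reach from their own instance, here is a minimal Python sketch. The function names are made up for illustration and this is not what Discourse runs. It only tells you whether a TCP connection opens within a timeout; it says nothing about rate limiting once the connection is open.

```python
import socket

def port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False

def first_reachable_port(host, ports=(25, 587), timeout=5.0):
    """Return the first port in `ports` that accepts a connection, or None."""
    for port in ports:
        if port_reachable(host, port, timeout):
            return port
    return None
```

You would call it as `first_reachable_port("smtp.example.com")`, where the host name is a placeholder for your own mail relay.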

TLDR

Amazon clogged our mail by making mail jobs take forever.
Our mail is constantly retrying, clogging our Sidekiq.
All jobs are massively delayed, including onebox, notifications and so on.
Profit

Should be fixed soon.
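To make the TLDR concrete, here is a toy model in Python (not Discourse's actual Ruby/Sidekiq code) of a fixed pool of workers draining a FIFO queue. The worker count, job counts and durations are made-up numbers, and it ignores retries entirely; it only shows how slow jobs that hold on to workers delay everything queued behind them.

```python
import heapq

def start_times(jobs, workers):
    """Simulate a FIFO queue drained by a fixed number of workers.

    jobs is a list of (name, duration_seconds), all enqueued at time 0.
    Returns a list of (name, start_time_seconds) in queue order.
    """
    free_at = [0.0] * workers          # min-heap: when each worker is next free
    heapq.heapify(free_at)
    starts = []
    for name, duration in jobs:
        start = heapq.heappop(free_at)  # earliest available worker takes the job
        starts.append((name, start))
        heapq.heappush(free_at, start + duration)
    return starts

def notification_wait(mail_seconds, mail_jobs=10, workers=2):
    """How long one notification job waits behind a backlog of mail jobs."""
    queue = [("mail", mail_seconds)] * mail_jobs + [("notification", 0.1)]
    return start_times(queue, workers)[-1][1]

print(notification_wait(1.0))    # mail jobs take 1s each: prints 5.0
print(notification_wait(60.0))   # mail jobs hang for 60s each: prints 300.0
```

In this model the wait grows linearly with how long each mail job hangs, which is why unrelated jobs like onebox and notifications suffer even though nothing is wrong with them.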

Illustrated explanation:

supermathie · October 27, 2017, 1:53am

Durrrrrrrrrrrrrrr oh man so obvious when you realize it.

Nice one!

AstonJ · October 27, 2017, 1:57am

That explains why the notification for a reply came three hours later. Well done in sorting it, Sam n co.

sam · October 27, 2017, 2:12am

Well, actually not, it is peanuts. We are not even considering this style of hosting for our regular standard/business/enterprise customers. For them we plan to continue hosting on bare metal.

We are investigating a new region with bare metal in Europe at the moment to service people who must be hosted in Europe. They will get the same excellent performance and reliability our current hosting in SF provides.

AWS customers are in the “super enterprise” level. They are the type of customer that MUST be hosted on AWS, cause of mantras such as “Nobody ever got fired for choosing IBM”. For them they must be on AWS cause they simply must be. They are used to the very high costs of AWS, it is a fact of life.

Our “super enterprise” customers get an isolated “rack” in the cloud. This includes extensive monitoring, extensive levels of failover (multiple AZs), large EC2 instances and large DB instances, logs forwarded to Elasticsearch, and the list goes on and on. This means that for this “modest” meta config we have something like 12-15 EC2 instances plus dedicated database and ElastiCache instances.

Yes, there are economies of scale if we host a giant multisite in the cloud, at which point we get to share the monitoring aspect of our cloud infrastructure and cut costs. However, this is not in our plans for 2018. A rack in Europe is, though.

sam · October 27, 2017, 9:52am

Thanks to @supermathie, mail is A-OK again and notifications will arrive nice and quick, the AWS is over.

mshappe · October 27, 2017, 3:43pm

As someone who has been following the “cloud vs. dedicated hardware” argument since Discourse’s early days, I’m seriously interested in a detailed discussion of the differences/costs/etc. What’s here is already instructive.

codinghorror · July 9, 2018, 10:56pm

We’ve slowly been working costs down as we go, and we have hosting meta down to around $1000/month on AWS – that’s with multiple tweaks over the last 6-8 months. When we started this it was closer to $3000/month. Really!

Before: $2,717 / month

After: $1,030.06 / month

Note that meta is deployed as “super enterprisey” in our testing so it is somewhat … overprovisioned.

We could do better in the long run by buying long-term reserved instances, essentially pre-paying for multiple years of service, which could cut the cost in half.
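For reference, the arithmetic on the two figures above, as a small Python sketch. The "half" used for reserved instances is just the rough estimate from this post, not a quoted AWS price.

```python
before = 2717.00   # USD per month, the "Before" figure above
after = 1030.06    # USD per month, the "After" figure above

monthly_saving = before - after
reduction = monthly_saving / before

# Rough guess only: assumes long-term reserved instances really do halve the bill.
reserved_estimate = after / 2

print(f"saving: ${monthly_saving:,.2f}/month ({reduction:.1%})")
print(f"with reserved instances: about ${reserved_estimate:,.2f}/month")
```

That works out to roughly $1,687 a month saved, a reduction of about 62%.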

Any other thoughts this far in, @sam?

rorycb · July 9, 2018, 11:14pm

I am curious what changes/tweaks you’ve been doing to cut the cost in half? Is it more paying attention to AWS requirements, or is it tweaking code? A combination of many factors? Would the tweaks you’re making be useful information for others who are, or may be thinking of, hosting on AWS?

codinghorror · July 9, 2018, 11:18pm

The main piece of advice I have is … don’t do it. Don’t take on a complex, “enterprisey” cloud install unless you have to. It’s extremely expensive for what you get. Compare to a simple monolithic Digital Ocean droplet running our standard Docker image, which can get you a very long way even at the $40 and $80 per month price points.

rorycb · July 9, 2018, 11:21pm

Fair. I never plan to use AWS for it; as I said, it was more of a curiosity for me.

sam · July 10, 2018, 12:10am

Not really, I think you nailed it. Note we do have 1-year reserved instances for a few of our EC2 VMs, so the cost is probably closer to $1300 a month once amortized.

We can probably reduce cost a bit by moving Redis to an EC2 instance and rolling our own vs using ElastiCache, which is a bit of a premium.

Overall, we have been very happy with our AWS experience, but it is certainly a bit pricey compared to our bare metal setup. We also squeeze a tiny bit more performance out of our bare metal setup than AWS, but we are not talking 2x difference, more like 5-30% difference on the server side.

Note it is important to have full perspective on costs here, cause even if you can do $80 on digital ocean, you miss out on:

Auto scaling, which helps us a fair bit on some super enterprise setups
Accounting for Prometheus-based monitoring, which we have. For context, New Relic would be sitting at, say, $100 per server, and then you would also need Datadog, which is another $50 or so
We also ship with ELK, so you would need something like logit https://logit.io/pricing which is yet more money
Our PG setup has automatic failover, so you would need 2x instances on Digital Ocean to account for something like this, plus a complicated setup.
Our Redis setup has automatic failover (so another 2x instances for that)

The bottom line is that $10-$80 is perfectly fine for an unmonitored monolithic setup. But once you need to start talking SLAs and need to know this thing will be rock solid and survive random failures… well, costs start mounting.
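A back-of-the-envelope tally of that list, as a Python sketch. The per-server monitoring prices are the rough figures quoted above. Everything else is an assumption made for illustration: every box is an $80 droplet, every box carries both monitoring products, and hosted logging is left at zero because no price is given here.

```python
from dataclasses import dataclass

@dataclass
class Setup:
    app_servers: int = 1
    pg_instances: int = 2            # dedicated database boxes: primary + failover
    redis_instances: int = 2         # dedicated Redis boxes: primary + failover
    droplet_cost: float = 80.0       # assumption: every box is an $80 droplet
    app_monitoring: float = 100.0    # per server, rough figure from the post
    server_monitoring: float = 50.0  # per server, rough figure from the post
    logging_cost: float = 0.0        # hosted ELK price not given; plug in your own

    def servers(self):
        return self.app_servers + self.pg_instances + self.redis_instances

    def monthly_total(self):
        hosting = self.servers() * self.droplet_cost
        monitoring = self.servers() * (self.app_monitoring + self.server_monitoring)
        return hosting + monitoring + self.logging_cost

# One unmonitored droplet running everything, no dedicated DB or Redis boxes.
monolith = Setup(pg_instances=0, redis_instances=0,
                 app_monitoring=0.0, server_monitoring=0.0)

print(monolith.monthly_total())   # prints 80.0
print(Setup().monthly_total())    # prints 1150.0, before logging and auto scaling
```

Under these assumptions the monitored, failover-capable version lands in four figures a month before logging is even counted.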

Fabio_Machado_de_Oli · July 10, 2018, 9:18pm

Does meta require all this power, or are you just using it to test this hosting option or something?

codinghorror · July 10, 2018, 9:32pm

It’s mostly for testing. Meta is moderately busy; you can view its stats (or any Discourse site’s stats) by going to /about

sam · July 11, 2018, 12:45am

Note, our hosting infrastructure on AWS offers an economy of scale. Hosting the first site is fairly costly, but subsequent sites on the same virtual cloud get a substantial discount cause we reuse monitoring/access and logging infrastructure. Not “Digital Ocean” cheap, but adding one more site, say to the meta cloud, would be a few hundred dollars vs a thousand dollars.
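A tiny Python sketch of that economy of scale. The $1000 first-site figure is the rough number above; $300 for each additional site is a stand-in for "a few hundred dollars", not a real price.

```python
def total_monthly_cost(sites, first_site=1000.0, additional_site=300.0):
    """Monthly cost of `sites` sites sharing one virtual cloud.

    additional_site=300.0 is an illustrative assumption, not a quoted price.
    """
    if sites < 0:
        raise ValueError("sites must be non-negative")
    if sites == 0:
        return 0.0
    return first_site + (sites - 1) * additional_site

for n in (1, 2, 5):
    total = total_monthly_cost(n)
    print(n, total, total / n)   # number of sites, total, average per site
```

With these made-up numbers the average per-site cost drops from $1000 for one site to $440 across five.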

bryzaguy · July 24, 2018, 8:56pm

<— Disclaimer: New Relic developer here.

What a great and insightful response!

Just out of curiosity, what aspects of Datadog make it a critical ingredient? I work on our Insights product. Would love to know if there’s a way I can help reduce costs. Also, I’m just generally curious.

sam · July 25, 2018, 2:09am

Comparing Insights to Datadog is well out of scope of what I can do here. I have some experience using New Relic app monitoring and some with Datadog server monitoring.

What I am getting at here is that in order to run a properly monitored service you need both application level monitoring and server level monitoring. Meaning… you want to know when a server dies or goes to 100% CPU. You also want to know when Discourse has a ton of web requests queueing or if somehow database time for /latest became 4x slower.
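As a rough illustration of the two layers, here is a Python sketch of the kind of checks I mean. The `Sample` fields, the queue limit and the alert strings are all hypothetical; this is not our Prometheus setup or any vendor's API.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    server_up: bool
    cpu_percent: float
    queued_requests: int
    latest_db_ms: float   # database time spent serving /latest

def check(sample, baseline_latest_db_ms, queue_limit=100, slowdown_factor=4.0):
    """Return a list of alert strings for one metrics sample."""
    # Server level: is the box alive, and is the CPU pegged?
    if not sample.server_up:
        return ["server: down"]
    alerts = []
    if sample.cpu_percent >= 100.0:
        alerts.append("server: CPU at 100%")
    # Application level: request queueing and slow database time.
    if sample.queued_requests > queue_limit:
        alerts.append("app: web requests queueing")
    if sample.latest_db_ms >= slowdown_factor * baseline_latest_db_ms:
        alerts.append("app: /latest database time degraded")
    return alerts

print(check(Sample(True, 35.0, 3, 20.0), baseline_latest_db_ms=20.0))     # prints []
print(check(Sample(True, 100.0, 500, 80.0), baseline_latest_db_ms=20.0))  # three alerts
```

The point is that neither layer alone is enough: a healthy-looking server can still be serving a slow /latest, and a fast app tells you nothing once the box is gone.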

Apologies if both comprehensive server and application monitoring can be covered by New Relic and I put some misinformation out there. Looking through your site it looks like you have enough coverage here.

DannyUSBP · January 27, 2019, 4:39pm

Sam, how has AWS been for Meta, more than a year after you moved it there? Do you guys need to pay for overage bandwidth?

Falco · January 27, 2019, 7:05pm

Meta is hosted in the same setup we use for our Enterprise customers, so it certainly doesn’t fit the free tier. See this previous post:

The only change so far is moving away from ElastiCache, since the service had some rough edges. We run Redis ourselves.

DannyUSBP · January 28, 2019, 8:52am

Thank you, Rafael. I don’t know your bare-metal setup, but it looks to me that any dedicated cloud colocated in any Tier 4 data center would work better in the long run than running all the services with any major cloud (AWS or any other).
