Durrrrrrrrrrrrrrr oh man so obvious when you realize it.
Nice one!
That explains why the notification to a reply came three hours later
well done in sorting it Sam n co 
Well actually not, it is peanuts, we are not even considering this style of hosting for our regular standard/business/enterprise customers. For them we will plan to continue hosting on bare metal.
We are investigating a new region with bare metal in Europe at the moment to service people who must be hosted in Europe. They will get the same excellent performance and reliability our current hosting in SF provides.
AWS customers are at the "super enterprise" level. They are the type of customer that MUST be hosted on AWS, cause of mantras such as "Nobody ever got fired for choosing IBM". They must be on AWS because they simply must be. They are used to the very high costs of AWS; it is a fact of life.
Our "super enterprise" customers get an isolated "rack" in the cloud. This includes extensive monitoring, extensive levels of failover (multiple AZs), large EC2 instances and large DB instances, logs forwarded to Elasticsearch, and the list goes on and on. This means that for this "modest" meta config we have something like 12-15 EC2 instances plus dedicated database and ElastiCache instances.
Yes, there are economies of scale if we host a giant multisite in the cloud at which point we get to share the monitoring aspect of our cloud infrastructure and cut costs, however this is not on our plans for 2018. A rack in Europe is though.
Thanks to @supermathie mail is A-OK again and notification will arrive nice and quick, the AWS
is over.
As someone who has been following the "cloud vs. dedicated hardware" argument since Discourse's early days, I'm seriously interested in a detailed discussion of the differences/costs/etc. What's here is already instructive.
We've slowly been working costs down as we go, and we have hosting meta down to around $1000/month on AWS. That's with multiple tweaks over the last 6-8 months. When we started this it was closer to $3000/month. Really!
Before: $2,717 / month

After: $1,030.06 / month

Note that meta is deployed as "super enterprisey" in our testing so it is somewhat ... overprovisioned.
We could do better long term by buying long-term reserved instances, essentially pre-paying for multiple years of service, which could cut the cost in half.
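For anyone who wants the arithmetic spelled out, here is a rough sketch using the two monthly figures above. The 50% reserved-instance discount is just the "cut the cost in half" estimate from this post, not a quoted AWS price, and the function names are made up for illustration.

```python
# Monthly AWS bills quoted above for hosting meta.
BEFORE = 2717.00
AFTER = 1030.06


def percent_saved(before, after):
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


def with_reserved_discount(monthly, discount=0.5):
    """Monthly cost after an ASSUMED reserved-instance discount (0.5 = half)."""
    return monthly * (1 - discount)


print(f"{percent_saved(BEFORE, AFTER):.1f}% saved so far")
print(f"${with_reserved_discount(AFTER):.2f}/month if reserved instances halve it")
```

So the tweaks so far took roughly 62% off the bill, and a halving on top of that would land a little over $500/month.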
Any other thoughts this far in @sam?
I am curious what changes/tweaks you've been making to cut the cost in half. Is it more paying attention to AWS requirements, or is it tweaking code? A combination of many factors? Would the tweaks you're making be useful information for others who are, or may be thinking of, hosting on AWS?
The main piece of advice I have is ... don't do it. Don't take on a complex, "enterprisey" cloud install unless you have to. It's extremely expensive for what you get. Compare to a simple monolithic Digital Ocean droplet running our standard Docker image, which can get you a very long way even at the $40 and $80 per month price points.
Fair, I never plan to use AWS for it, as I said was more of a curiosity for me. 
Not really, I think you nailed it. Note we do have 1 year reserved instances for a few of our EC2 VMs so the cost is probably closer to $1300 a month once amortized.
We can probably reduce cost a bit by moving Redis to an EC2 instance and rolling our own vs using ElastiCache, which comes at a bit of a premium.
Overall, we have been very happy with our AWS experience, but it is certainly a bit pricey compared to our bare metal setup. We also squeeze a tiny bit more performance out of our bare metal setup than AWS, but we are not talking a 2x difference, more like a 5-30% difference on the server side.
Note it is important to have full perspective on costs here, cause even if you can do $80 on digital ocean, you miss out on:
Auto scaling, which helps us a fair bit on some super enterprise setups
Accounting for Prometheus based monitoring which we have. For context NewRelic would be sitting at say $100 per server and then you would also need DataDog which is another $50 or so
We also ship with ELK, so you would need something like Logit.io hosted ELK, which is yet more money
Our PG setup has automatic failover so you would need 2x instances on digital ocean to account for something like this plus a complicated setup.
Our Redis setup has automatic failover (so another 2x instances for that)
The bottom line is that $10-$80 is perfectly fine for an unmonitored monolithic setup. But once you need to start talking SLAs and need to know this thing will be rock solid and survive random failures... well, costs start mounting.
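To make the "costs start mounting" point concrete, here is a back-of-the-envelope sketch. It is not our actual bill: it assumes every node is an $80 droplet, that both the roughly $100 New Relic and roughly $50 DataDog figures apply per server, and it leaves hosted ELK as a parameter since I have not quoted a price for it. Auto scaling is not modelled at all.

```python
def monitored_monthly_cost(
    droplet=80.0,            # per-node hosting price (from the $80 figure above)
    app_servers=1,           # ASSUMPTION: a single app server
    pg_nodes=2,              # 2x instances for Postgres failover
    redis_nodes=2,           # 2x instances for Redis failover
    apm_per_server=100.0,    # rough application monitoring price per server
    infra_per_server=50.0,   # rough server monitoring price, ASSUMED per server
    hosted_elk=0.0,          # hosted ELK price, left for you to fill in
):
    """Rough monthly cost of a monitored setup with DB and Redis failover."""
    servers = app_servers + pg_nodes + redis_nodes
    hosting = servers * droplet
    monitoring = servers * (apm_per_server + infra_per_server)
    return hosting + monitoring + hosted_elk


print(monitored_monthly_cost())  # 1150.0 before any hosted ELK cost
```

Under those assumptions the "cheap" setup is already past $1000 a month before logging, which is the perspective I am trying to get across.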
Does meta require all this power, or are you just using it to test this hosting option or something?
It's mostly for testing. Meta is moderately busy; you can view its stats (or any Discourse site's stats) by going to /about
Note, our hosting infrastructure on AWS offers an economy of scale. Hosting the first site is fairly costly, but subsequent sites on the same virtual cloud get a substantial discount cause we reuse monitoring/access and logging infrastructure. Not "Digital Ocean" cheap, but adding one more site, say to the meta cloud, would be a few hundred dollars vs a thousand dollars.
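A quick sketch of how that averages out. The $1000 first-site figure is the rough number above; the $300 for each extra site is only a stand-in for "a few hundred dollars", so treat it as an assumption.

```python
def average_cost_per_site(sites, first_site=1000.0, extra_site=300.0):
    """Average monthly cost per site when sites share one cloud.

    `extra_site` is an ASSUMED value standing in for "a few hundred dollars".
    """
    if sites < 1:
        raise ValueError("need at least one site")
    total = first_site + extra_site * (sites - 1)
    return total / sites


for n in (1, 2, 4):
    print(n, average_cost_per_site(n))
```

With those numbers the average drops from $1000 for a single site to $650 for two and $475 for four.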
<- Disclaimer: New Relic developer here.
What a great and insightful response!
Just out of curiosity, what aspects of Datadog make it a critical ingredient? I work on our Insights product. Would love to know if there's a way I can help reduce costs. Also, I'm just generally curious.
Comparing insights to datadog is so out of scope of what I can do here. I have some experience around using the newrelic app monitoring and some around datadog server monitoring.
What I am digging at here is that in order to run a properly monitored service you need both application level monitoring and server level monitoring. Meaning... you want to know when a server dies or goes to 100% CPU. You also want to know when Discourse has a ton of web requests queueing or if somehow database time for /latest became 4x slower.
Apologies if both comprehensive server and application monitoring can be covered by Newrelic and I put some misinformation out there. Looking through your site it looks like you have enough coverage here.
Sam, how has AWS been for Meta, more than a year after you moved it there? Do you guys need to pay for bandwidth overages?
Meta is hosted in the same setup we use for our Enterprise customers, so it certainly doesn't fit the free tier. See this previous post:
The only change so far is moving away from ElastiCache, since the service had some rough edges. We run Redis ourselves.
Thank you Rafael. I don't know your bare-metal setup, but it looks to me like any dedicated cloud colocated in a Tier 4 data center would work better in the long run than running all the services with a major cloud provider (AWS or any other).
I am not sure I am following your conclusion here at all.
Maybe I am missing something or just don't understand something!? Based on Rafael's answer, I just thought that having an owned dedicated cloud setup, instead of using leased infrastructure, is more cost-effective in the long run.