Meta is moving to the Cloud 🌩


(Jeff Atwood) #18

Also @trash and @tobiaseigen have reported massively delayed oneboxing. @hawk has reported big delays in desktop notifications as well.


(AstonJ) #19

Wow that’s a lot.

Can you share your monthly page views please Jeff? Around a million a month?


#20

More than double that.


(Stefano Maffulli) #21

Congratulations on the basically flawless move! I’d love to read what your architecture looks like on AWS.


(Dean Taylor) #22

Notification of this post just took 14 minutes to arrive:

UPDATE: It then took ~16 minutes to update this post to Onebox the above post reference.


(Jeff Atwood) #23

We might have to just live with massively delayed notifications for a while, as @sam is working on another much more urgent internal problem.


(Sam Saffron) #24

Well, this one is a doozy. But I can explain it, and it makes perfect sense. Ever since we moved meta to AWS, AWS has been playing “silly buggers” with all our outbound mail. This is a known issue due to:

https://aws.amazon.com/premiumsupport/knowledge-center/ec2-port-25-throttle/

We already opened an issue with AWS about this, but they are super slow to respond. To circumvent it we could use a different port (port 587, for example), but that would require extensive reconfiguration of some public infrastructure we have. So instead, @supermathie is just going to move outgoing mail from meta to a third-party provider. We were able to get this working.
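For reference, pointing a self-hosted Discourse at a third-party relay on the submission port is an `app.yml` change; a minimal sketch, where the relay hostname and credentials are placeholders:

```yaml
env:
  DISCOURSE_SMTP_ADDRESS: smtp.example.com      # third-party relay, not direct port-25 delivery
  DISCOURSE_SMTP_PORT: 587                      # submission port; not throttled like port 25
  DISCOURSE_SMTP_USER_NAME: postmaster@example.com
  DISCOURSE_SMTP_PASSWORD: "(relay password)"
```

After editing, a `./launcher rebuild app` picks up the new settings.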

TLDR

  • Amazon clogged our mail by making mail jobs take forever.
  • Our mail jobs are constantly retrying, clogging our Sidekiq queues
  • All jobs are massively delayed, including onebox, notifications, and so on
  • Profit

Should be fixed soon.
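To see why stuck mail clogs everything else: each failed delivery gets rescheduled with an exponential backoff, so throttled jobs keep cycling through the queue for hours. The formula below approximates Sidekiq’s documented default retry delay; it is an illustration, not code from Discourse.

```ruby
# Approximation of Sidekiq's default retry backoff: delay (in seconds)
# before attempt `retry_count + 1`, growing with the fourth power of the
# retry count plus a small random jitter.
def retry_delay(retry_count)
  (retry_count ** 4) + 15 + (rand(10) * (retry_count + 1))
end

cumulative = 0
(0...10).each do |n|
  delay = retry_delay(n)
  cumulative += delay
  puts "retry #{n}: ~#{delay}s later (cumulative ~#{cumulative / 60} min)"
end
```

A few thousand throttled mail jobs each cycling on this schedule is plenty to starve the queues that onebox and notification jobs share.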

Illustrated explanation:

image


(Michael Brown) #25

Durrrrrrrrrrrrrrr oh man so obvious when you realize it.

Nice one!


(AstonJ) #26

That explains why the notification to a reply came three hours later :laughing: well done sorting it, Sam and co :clap:


(Sam Saffron) #27

Well, actually not; it is peanuts. We are not even considering this style of hosting for our regular standard/business/enterprise customers. For them we plan to continue hosting on bare metal.

We are investigating a new region with bare metal in Europe at the moment to service people who must be hosted in Europe. They will get the same excellent performance and reliability our current hosting in SF provides.

AWS customers are at the “super enterprise” level. They are the type of customer that MUST be hosted on AWS, because of mantras such as “Nobody ever got fired for choosing IBM”. They must be on AWS because they simply must be. They are used to the very high costs of AWS; it is a fact of life.

Our “super enterprise” customers get an isolated “rack” in the cloud. This includes extensive monitoring, extensive levels of failover (multiple AZs), large EC2 instances, large DB instances, logs forwarded to Elasticsearch, and the list goes on and on. This means that for this “modest” meta config we have something like 12-15 EC2 instances plus dedicated database and ElastiCache instances.

Yes, there are economies of scale if we host a giant multisite in the cloud, at which point we get to share the monitoring aspect of our cloud infrastructure and cut costs; however, this is not in our plans for 2018. A rack in Europe is, though.


(Sam Saffron) #28

Thanks to @supermathie, mail is A-OK again and notifications will arrive nice and quick; the AWS :sneezing_face: is over.


(Michael Scott Shappe) #29

As someone who has been following the “cloud vs. dedicated hardware” argument since Discourse’s early days, I’m seriously interested in a detailed discussion of the differences/costs/etc. What’s here is already instructive.


(Jeff Atwood) #30

We’ve slowly been working costs down as we go, and we have hosting meta down to around $1000/month on AWS – that’s with multiple tweaks over the last 6-8 months. When we started this it was closer to $3000/month. Really!

Before: $2,717 / month

image

After: $1,030.06 / month

image

Note that meta is deployed as “super enterprisey” in our testing so it is somewhat … overprovisioned. :wink:

We could do better long term with reserved instances, which could cut the cost in half by essentially pre-paying for multiple years of service.
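Using the figures above, the saving is simple arithmetic; the 50% discount is the rough estimate from this post, not an AWS quote:

```ruby
# Back-of-envelope reserved-instance math using the "After" figure above.
# The 50% discount is this thread's own rough estimate, not an AWS price.
on_demand_monthly = 1030.06
reserved_monthly  = on_demand_monthly * 0.5   # "cut the cost in half"
three_year_prepay = reserved_monthly * 36     # pre-pay ~3 years up front

puts format("reserved: ~$%.2f/month, ~$%.2f committed over 3 years",
            reserved_monthly, three_year_prepay)
```

The trade-off is the up-front commitment: you lock in the instance types for the term.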

Any other thoughts this far in @sam?


(Rory Craig-Barnes) #31

I am curious what changes/tweaks you’ve been making to cut the cost in half. Is it more a matter of paying attention to AWS requirements, or of tweaking code? A combination of many factors? Would the tweaks you’re making be useful information for others who are hosting, or thinking of hosting, on AWS?


(Jeff Atwood) #32

The main piece of advice I have is … don’t do it. Don’t take on a complex, “enterprisey” cloud install unless you have to. It’s extremely expensive for what you get. Compare to a simple monolithic Digital Ocean droplet running our standard Docker image, which can get you a very long way even at the $40 and $80 per month price points.


(Rory Craig-Barnes) #33

Fair, I never plan to use AWS for it, as I said was more of a curiosity for me. :slight_smile:


(Sam Saffron) #34

Not really, I think you nailed it. Note we do have 1-year reserved instances for a few of our EC2 VMs, so the cost is probably closer to $1,300 a month once amortized.

We can probably reduce cost a bit by moving Redis to an EC2 instance and rolling our own, versus using ElastiCache, which carries a bit of a premium.

Overall, we have been very happy with our AWS experience, but it is certainly a bit pricey compared to our bare metal setup. We also squeeze a tiny bit more performance out of our bare metal setup than out of AWS, but we are not talking a 2x difference; more like a 5-30% difference on the server side.

Note it is important to have full perspective on costs here, because even if you can do $80 on Digital Ocean, you miss out on:

  • Auto scaling, which helps us a fair bit on some super enterprise setups

  • Accounting for the Prometheus-based monitoring we have. For context, New Relic would run around $100 per server, and then you would also need DataDog, which is another $50 or so

  • We also ship with ELK, so you would need something like Logit.io, which is yet more money

  • Our PG setup has automatic failover, so you would need 2x instances on Digital Ocean to account for something like this, plus a complicated setup.

  • Our Redis setup has automatic failover (so another 2x instances for that)

The bottom line is that $10-$80 is perfectly fine for an unmonitored monolithic setup. But once you need to start talking SLAs and need to know the thing will be rock solid and survive random failures… well, costs start mounting.


(Fábio Machado De Oliveira) #35

Does meta require all this power, or are you just using it to test this hosting option or something?


(Jeff Atwood) #36

It’s mostly for testing. Meta is moderately busy; you can view its stats (or any Discourse site’s stats) by going to /about


(Sam Saffron) #37

Note, our hosting infrastructure on AWS offers economies of scale. Hosting the first site is fairly costly, but subsequent sites on the same virtual cloud get a substantial discount because we reuse the monitoring/access/logging infrastructure. Not “Digital Ocean” cheap, but adding one more site to, say, the meta cloud would cost a few hundred dollars rather than a thousand.