Slow rebuild on a cloud server?

I have two instances of Discourse, both on their own droplets at DigitalOcean. The droplets are both running Ubuntu 18.04 with 1 vCPU and 1GB of memory and configured similarly. The instances are very similar in configuration as well (themes and plugins). Both appear to be on the same CPU type.

I did a rebuild on both instances this morning and one finished much faster than the other. For comparison, the slow instance took 57 minutes to rebuild and be available again, while the fast instance took 7 minutes. I had noticed this slowness before, both when rebuilding the slow instance and after rebooting it, but without a direct comparison I hadn’t realized how slow it was.

This is most noticeable during the compression stage of the rebuild, but the whole process is really slow. For the slow instance, the compression steps took between 16 and 70 seconds. For the fast instance, it was between 1 and 12 seconds.

Is there anything I can check to see what might be slowing this down so much? I would suspect there is an issue with the VM somewhere, but I am not sure where to start looking. Or should I open a ticket with DigitalOcean asking them to investigate?

Are the rebuild logs stored anywhere? I wasn’t having any luck finding them.

Thanks

There are more than a dozen reasons why this might be happening.

Prime Suspects:

  1. CPU cluster load. (This is the load on the whole physical host, not just your instance. No matter how good the isolation a virtualization layer offers, a CPU whose cores are all under load at the same time will perform worse than one that is not.)

  2. Region where your droplet is hosted. (Yes, I’ve noticed it too: DO does not have the same kind of CPUs in every region. Sometimes you may get a slower 2.7 GHz CPU and sometimes a faster 3.5 GHz one, and that definitely has an impact on rebuild times.)

  3. Overall traffic being served by the instance. If one site is busier than the other, that could be causing some slowdown, although it shouldn’t impact the rebuild times.

If you could, please add the CPU information (model number if possible) and the region(s) your droplets are hosted in. That would help show whether one region performs better than the other.
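For anyone who wants to collect the CPU model quickly, here is a minimal sketch. It assumes a Linux droplet where /proc/cpuinfo exposes a "model name" line (true on x86, not guaranteed on every architecture). The function name `cpu_model` is just a name picked for this example.

```shell
# Print the distinct CPU model name(s) found in a cpuinfo-style file.
# Reads /proc/cpuinfo by default; a different path can be passed as $1.
cpu_model() {
  sed -n 's/^model name[[:space:]]*:[[:space:]]*//p' "${1:-/proc/cpuinfo}" | sort -u
}
```

Running `cpu_model` on each droplet and pasting the output next to the region name should be enough for the comparison.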

Both instances are in NYC3.

/proc/cpuinfo lists Intel® Xeon® CPU E5-2650L v3 @ 1.80GHz for both instances.

Then I can only recommend contacting DO about the terrible performance of the slower node.

It’s just chance which nodes have noisy neighbors.
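One rough way to look for noisy neighbors from inside the droplet is CPU steal time, the share of time the hypervisor spent running something else while your VM wanted the CPU. The sketch below is an assumption-laden check rather than a diagnosis: it assumes a Linux guest whose /proc/stat reports the steal field, and the function names are made up for this example. A high value hints at contention on the host, but a low value does not rule it out.

```shell
# Print "<steal_ticks> <total_ticks>" from the aggregate "cpu" line of a
# /proc/stat-style file. Fields after the label are:
# user nice system idle iowait irq softirq steal (guest time is already
# counted inside user/nice, so it is not added again).
read_cpu_ticks() {
  awk '$1 == "cpu" { printf "%.0f %.0f\n", $9, $2+$3+$4+$5+$6+$7+$8+$9; exit }' \
    "${1:-/proc/stat}"
}

# Percentage of CPU ticks stolen between two samples.
# Arguments: steal1 total1 steal2 total2
steal_pct() {
  awk -v s1="$1" -v t1="$2" -v s2="$3" -v t2="$4" 'BEGIN {
    dt = t2 - t1
    if (dt <= 0) { print "0.0"; exit }
    printf "%.1f\n", 100 * (s2 - s1) / dt
  }'
}

# Sample steal over an interval in seconds (default 5).
sample_steal() {
  interval="${1:-5}"
  before=$(read_cpu_ticks)
  sleep "$interval"
  after=$(read_cpu_ticks)
  # Word splitting is intentional: each sample is "steal total".
  steal_pct $before $after
}
```

Running `sample_steal 10` while a rebuild is in progress on each droplet gives two numbers that can be compared directly.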

I think that NYC2 may be a bit better.

Quick update. To speed up comparison and testing of performance, I used sysbench to compare droplets. The command I ran is below. On my fast droplet, I was seeing 250 events/second. On the slow droplet, I was seeing 50 events/second. That fits with the differences in rebuild times as well.

sysbench cpu --cpu-max-prime=20000 run
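To compare droplets without reading the full report each time, the result line can be pulled out with a small helper. This is only a sketch: it assumes sysbench 1.0-style output, which prints an "events per second:" line for the cpu test, and the helper names are invented for this example.

```shell
# Extract the "events per second" figure from sysbench output on stdin.
events_per_sec() {
  awk '/events per second/ { print $NF; exit }'
}

# Ratio between the fast and the slow result, to one decimal place.
# Arguments: fast_events_per_sec slow_events_per_sec
speed_ratio() {
  awk -v fast="$1" -v slow="$2" 'BEGIN {
    if (slow <= 0) { print "n/a"; exit }
    printf "%.1f\n", fast / slow
  }'
}

# Example use on each droplet:
#   sysbench cpu --cpu-max-prime=20000 run | events_per_sec
```

With the figures above, `speed_ratio 250 50` prints 5.0, so the slow droplet had about a fifth of the CPU throughput.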

I worked with DigitalOcean support for a bit and we tried a few things but didn’t see any improvement. Since creating a new droplet and migrating Discourse to a new server is easy, I decided to try creating a new droplet to see if that ended up on different hardware and avoided the performance issue. Thankfully it did, so I migrated the site to the new droplet and the rebuild times are below 10 minutes now.

This can be closed now.
