Just once, and a second time now. And that was immediately after acquiring a new IP address.
Still the same outcome I’m afraid.
New day, same issue.
In the end, I completely cloned the AWS instance that Discourse was running on. The clone started up without any issue. The lesson here is: never shut this instance down, or we will most likely have this issue again. This could be a problem in the future, obviously.
So it seems that the answer here is “sometimes Discourse issues too many requests to rubygems in too short a time, and therefore doesn’t work”. So we were stuck with a non-functioning Discourse installation and therefore no way of continuing conversations that were in progress with our customers. This is disappointing - clearly we never should have used Discourse to begin with.
Thanks everyone for the suggestions and help, however.
I’ve never, ever been rate limited by rubygems during a rebuild in literally years of running Discourse droplets on Digital Ocean. I wonder if AWS shares IP addresses among large numbers of machines that would also be hitting rubygems?
It’s frustrating indeed.
Given that you have a separate data container (it appears that you do), you can do this:
./launcher bootstrap app
That will build a new container without first trashing your old one.
Given that it rebuilds properly, you can then
./launcher destroy app
./launcher start app
and have less than a minute of downtime.
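The steps above could be wrapped in a small script so that the running container is only replaced when the new build succeeds. This is a minimal sketch, not part of the launcher itself: the `swap_container` function and the `LAUNCHER` and `CONTAINER` variables are hypothetical names for illustration, and only the `bootstrap`, `destroy` and `start` subcommands come from the post above.

```shell
#!/bin/sh
# Assumed defaults; adjust to your own install path and container name.
LAUNCHER="${LAUNCHER:-./launcher}"
CONTAINER="${CONTAINER:-app}"

# Hypothetical helper: build first, swap only if the build succeeded.
swap_container() {
    if "$LAUNCHER" bootstrap "$CONTAINER"; then
        # The build worked, so the old container can be replaced.
        "$LAUNCHER" destroy "$CONTAINER" && "$LAUNCHER" start "$CONTAINER"
    else
        # The build failed (for example on a 429); the old container is never touched.
        echo "bootstrap failed; leaving the running container untouched" >&2
        return 1
    fi
}
```

Calling `swap_container` from the directory that holds the launcher would then run the three commands in order and stop after the first one if the build fails.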
Do you think this behavior is somehow specific to the data container? Are multiple containers being rebuilt simultaneously and all hitting rubygems at the same time, and triggering rate limits?
I think that if he knew that he could rebuild just the web container without first destroying it (since it appears that there is a separate data container), he could have solved the problem (which remains a mystery) without first taking his site down.
The actual problem of the rubygems rate limiting is entirely befuddling. You guys claim it’s never happened to you, even when building tons of sites. I’ve seen it several times and cannot explain why it happens or why it goes away.
I’ve literally never seen it, on the Digital Ocean / Discourse sites I run, in three solid years of rebuilding at least once a week.
If someone has repro steps that’d be excellent.
It’s happened to me about 3 times, in building dozens of sites. I can’t figure out how to repro or how it goes away. I’ve contacted rubygems and gotten no response. My only guess, which is totally unfounded, is that Digital Ocean gets lumped into a single rate-limit bin for them.
From what I understand the rate limits were only introduced relatively recently.
Thanks Jay - I’ll refer back here if this problem ever comes up for me again.
I have the same issue: every attempt to run ./launcher rebuild <container>
fails with the «429 Too Many Requests» from rubygems.org:
Here is the only solution I have found: “The only solution I have found to work around the «429 Too Many Requests» failure from rubygems.org”
I’m running into this issue for the second time in a couple of weeks. Both instances were running on Vultr’s Ubuntu 18.04 x64 1GB RAM packages.
I tried going back to the previous version as described by Discourse.PRO above without any joy.
Maybe Ubuntu 18 is too fast?
If you are getting rate limited, going back to a previous version won’t help. Quite the opposite actually. Your best option is to wait X minutes before issuing another rebuild.
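The wait-and-retry advice could be automated along these lines. This is only a sketch under assumptions: `retry_with_backoff` is a hypothetical helper, not something the launcher provides, and the doubling delay is one arbitrary choice of schedule. It will not help if the rate limit never lifts for your IP address.

```shell
#!/bin/sh
# Hypothetical helper: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# doubling the wait between attempts.
retry_with_backoff() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Run the command; stop as soon as it succeeds.
        "$@" && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        echo "attempt $attempt failed; waiting ${delay}s before retrying" >&2
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```

For example, `retry_with_backoff 4 600 ./launcher bootstrap app` would try up to four times, waiting 10, 20 and then 40 minutes between attempts. Retrying `bootstrap` rather than `rebuild` keeps the existing container running while the attempts are made.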
I’ve tried leaving it days before retrying and the same issue occurs.
I’ve also tried it on an Ubuntu 17 instance instead. The initial install of Discourse is fine, but the rebuilds always fail due to the RubyGems rate limiting. Surely there should be some way for Discourse to check that the required items have been downloaded and retry those that haven’t been?
In that case I would try to contact RubyGems and ask them if they’ve somehow blacklisted your IP address.
I did that last summer and in April they emailed to ask if I still needed help.
We will be releasing a new base image in the next few days that will help here.
Hmm, interesting. So as the base image gets older, it makes more requests to rubygems and is thus more likely to trigger this?
Yes, and we just went to Rails 5.2, so there are lots of updates.
Meta is running this new image. If the weekend goes all right, I will push it to all self-hosters (and force a terminal rebuild for everyone).