This question somewhat follows on from the new “Weekly Summaries” (which I think are a great idea!): Weekly Discourse Summary?. These summaries are going to cover the beta releases, while the default setup for self-installers is tests-passed.
I know there’s a lot of discussion about the different branches on Meta, so I’ll try and keep this as specific as possible.
This is from 2014:
Discourse has 3 official branches:
tests-passed: updated most frequently after our test suite runs. This is the branch we deploy most of our customers on.
beta: updated weekly(ish), a snapshot of tests-passed.
stable: where our major releases live (with security fixes backported as they arise)
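For context, a self-hosted Docker install selects one of these branches through the `version` parameter in `containers/app.yml` (a minimal excerpt as I understand the standard discourse_docker layout; your file will contain many more settings):

```yaml
## containers/app.yml (excerpt)
params:
  ## which branch the container builds: tests-passed, beta, or stable
  version: tests-passed
```

After changing it, the container has to be rebuilt, e.g. `cd /var/discourse && ./launcher rebuild app`.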
I’m currently running on tests-passed, which obviously has the advantage of getting features as soon as they land. The disadvantage is that small functional bugs are often present in tests-passed. They get fixed extremely quickly, but a bug will still be present on my forum until I manually go and update again. Often I’ll only do that once someone has complained, which is too late!
I’m curious how the team deals with this on your own hosting. I know that meta/try track the latest commit to tests-passed exactly, but what about your customers?
Do you wait until tests-passed is in a “good” state, and then deploy to customers? Or is the beta branch a good representation of what gets deployed?
Our customers follow tests-passed, but unlike Meta/try they do not update automatically as soon as we commit. We update customers regularly, but not on a specific schedule; we deploy “as needed”. For example, if we get significant reports about a bug, or the bug is just really bad, we’ll generally deploy sites as soon as we know the bug is fixed. We’ll also deploy after we finish a new feature, or when a plugin update requires Discourse to be updated alongside it. And we’ll deploy if we haven’t deployed in a while, to ensure everyone stays up to date.
The beta branch is not a good representation of what our customers run.
Why tests-passed and not beta?
Most site admins update their sites when they receive the new beta notification. This notification is sent to all sites on tests-passed and beta. When they update they receive the most recent tests-passed commit, but generally will not update again until the next beta email. However, should they need to update between betas they can. If instead they tracked beta, they would not be able to update their site until we release another beta.
This makes sense, I guess it’s an extension of the “complaint driven development” model. Unfortunately that method doesn’t really work at the scale of small self-hosted forums like my own. By the time a bug is reported and I’ve hit upgrade, it has already disrupted the majority of users.
I wonder if there is some way your “current deployed version” could be recorded on GitHub (through a tag), so that us self-hosters can be notified when a serious bug has been fixed. Maybe that tag could also trigger the “update required” admin panel UI and email notification…
There are many #support requests on Meta from people on the “latest version” that are generally solved by
This is fixed in latest - visit /admin/upgrade and update your forum
If the “update required” logic followed your customer deployment then I think it could help to reduce these kind of complaints.
Unfortunately a lot of my users view Discourse as “buggy software”. I’m not sure the idea above is perfect, but I think it could help.
That’d be tough - not all our customers are guaranteed to be on the same version at the same time. Updating one cluster of customers could be done for a very specific reason that we would not want to record on GitHub. Waiting for all customers would also not work, since some customers follow a custom deployment schedule.
That’s certainly an interesting point. I for one do not wish for Discourse to be viewed as buggy. However, a decision like this is well above my pay grade. This branch system very much predates my time on the team.
Thanks @mcwumbly, I think that’s a good methodology. I tend to try and update as soon as there’s a new feature (e.g. the composer redesign), but maybe I should wait a little bit to let things settle down first.
I think you’re right here, and it’s probably not too much overhead for me.
In an ideal world though, being an active member of Meta shouldn’t be a prerequisite for having a smooth Discourse experience. If we had some way to automate this “strategic updating” for all the people that don’t spend much time on Meta then that would be great!
I suspect you’re right that some of it is down to the audience; they’re fairly technical people, so they’ll notice things that the “average person” may not.
One recent example was this - I happened to update to tests-passed just before the fix was made. The problem only existed on meta for a few minutes, but persisted on my forum for about 24 hours before I realised.
Sure, this one is not a show-stopping bug, but it is something that people notice:
Aside from the audience differences, I guess maybe the team thinks carefully before deploying a certain commit to customers? Or you update customers quickly enough that it’s not a big deal - knowledge of the software is what they’re paying you for, after all.
I suspect from the way @mcwumbly and @Falco described their strategic upgrading process, that they also experienced things like this, and learnt to become more careful about updating?
Maybe I’ll try using the “beta” branch, but I worry that I’ll just end up with small bugs “frozen” for the whole week… At least on tests-passed I can pretty much just click upgrade and it’ll be fixed.
The way in which people complain is definitely related to the community dynamic, but the presence of a bug like that text colour one is an objective fact - I can see it myself.
I 100% agree here. I have never encountered major bugs when using Meta, or any other communities that you host. What I’m trying to work out with this topic is how you manage to achieve that. Do you ever think “oh, tests-passed is a little bit risky at the moment, let’s hold back on upgrading customer x”?
When I upgrade my self-hosted forum to tests-passed it feels like a bit of a lottery whether there will be any bugs. Any bugs that do creep in then generally remain for a week, until the next upgrade notification (or if there’s a bad bug I update manually).
If there is no “strategic” upgrading with your customers, then maybe I’m just being particularly unlucky with the exact times I update.
What am I missing? I’ve long believed in running more than a single environment, plain and simple: one for “live”, and one or more others for experimenting to varying degrees. I also keep backups for a while, just in case.
Perhaps the main question should be: what stops people from running more than just a live server?
In other words, is the problem less-than-optimal communication, or is there some factor that prevents testing before moving changes to live?
run a second instance for staging / testing new releases before rolling them to your production site
it all depends what your tolerance for pain / bugs is, but I have to say, the level described here is quite unusual in my personal experience!
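For anyone wanting to follow that advice with the standard Docker install, a second container can be cloned from the production one (a sketch; the exact edits depend on your setup):

```shell
cd /var/discourse

# Start from the production definition...
cp containers/app.yml containers/staging.yml

# ...then edit containers/staging.yml so staging cannot collide with
# production: change DISCOURSE_HOSTNAME, the exposed ports, and the
# volume paths, and ideally point it at a restored backup of your data.

# Build and start the staging container
./launcher rebuild staging
```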
I can tell you that adversarial sites, where people were forcefully switched to a software they actively did not want, is devastatingly bad. It actually caused mental problems for me. I strongly STRONGLY advise anyone reading to avoid that particular kind of situation at all costs.
This seems like the “responsible” thing for people who are hyper-concerned about things working. But then what? Copy over the docker image? (which, I think, is what you guys do.) Stick the commit hash in app.yml?
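If I understand the mechanics correctly, that pinning looks something like this: the `version` parameter in app.yml accepts a commit hash as well as a branch name (the hash below is purely illustrative):

```yaml
## containers/app.yml (excerpt): pin to a known-good commit instead of a branch
params:
  version: "1a2b3c4d"   ## illustrative hash, not a real recommendation
```

followed by `./launcher rebuild app` to build that exact commit.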
It’ll always be the case that someone was forcefully moved to a new platform. A former secretary of state ran her own email server so she could keep her Blackberry (I think that was it).
Mostly, we hear that people are elated to move to Discourse. On a site I’m moving now, when we explained to a “trusted” user why a particular type of “discussion” (OMG reverse-chron?!) was being eliminated, he said “good riddance!”
But I’m with you, if you can avoid people who are unhappy, that’s what to do.
I think one practical thing to consider here is to publish the schema versions on the upgrade page as well, and enable sites to update to any commit that has a compatible schema version (by “compatible” I mean the simplest thing that could possibly work: for starters, only consider them compatible if they are equal). If the database schema hasn’t changed between my current build and latest, it should be safe to roll back if I find something unexpected and there isn’t a fix yet.
Enabling this sort of thing through the upgrade (er… upgrade/downgrade) UI would take some effort, but might be worth it. It would also make it more clear how to safely switch “tracks” from beta to tests-passed and vice versa.
This feature idea may not be a top priority for the team, but if there’s openness to it and interest, perhaps someone with this concern in the community could explore it.
(also… maybe I’m oversimplifying and there are other forward/backward compatibility issues aside from the database schema, but I’m not aware of any obvious ones)
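To make the “simplest thing that could possibly work” rule concrete, here is a minimal sketch. The function name and the migration lists are illustrative, not Discourse’s actual API; Rails apps like Discourse track applied migrations by timestamp in the `schema_migrations` table, which is the kind of list being compared here.

```python
# Hypothetical sketch of the "compatible schema" rule: only treat two
# builds as compatible when their migration sets are exactly equal.

def rollback_is_safe(current_migrations, target_migrations):
    """Return True when both builds contain exactly the same migrations,
    so switching between them should not touch the database schema."""
    return set(current_migrations) == set(target_migrations)

# Illustrative migration timestamps, not real Discourse migrations:
current = ["20240101000000", "20240102000000"]
older_same_schema = ["20240102000000", "20240101000000"]  # same set
newer_schema = current + ["20240103000000"]               # added migration

print(rollback_is_safe(current, older_same_schema))  # True: safe to roll back
print(rollback_is_safe(current, newer_schema))       # False: schema changed
```

A stricter real implementation would also have to consider data migrations and serialized-format changes, but as a first cut equality of the migration sets is easy to check and hard to get wrong.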
In my relatively less mature experience: we run all three forums for our network on self-hosted Discourse, both for the flexibility and because we had ample server capacity for two moderately sized forums and one small one. Updates (as we’re discussing here) have been the tricky part, so after brainstorming for a few months I decided to set up replica test instances for all three forums. Here is what I do:
All three forums run different plugins, and whenever there is demand for a new plugin it gets tested in the sandbox first; if everything goes well, it is pushed to the public server.
For this, after all the testing, I simply drop a message to the admin of that forum via a Slack group and they run the install and upgrade.
If something breaks one forum while the others are working fine, we try to identify whether it’s caused by a plugin or by core, and then check the GitHub changelog.
Once a fix is available (usually a matter of hours, as Meta has those bugs fixed quickly and the tests-passed branch should have them merged shortly afterwards), another upgrade is carried out, either by me or by the respective forum admin.
We do not try to keep all the instances uniform; instead we watch for what runs stably, then do a scheduled upgrade to the latest tests-passed build every Sunday. Discourse usually doesn’t push many commits on a Sunday, so there’s a strong chance we’re good to go for the coming week unless some major bug is identified.
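A Sunday routine like this could even be semi-automated with a cron entry (a sketch assuming a standard /var/discourse install and that an unattended rebuild, with its few minutes of downtime, is acceptable):

```shell
# /etc/cron.d/discourse-rebuild (illustrative): rebuild to the latest
# tests-passed build at 04:00 every Sunday, logging the output
0 4 * * 0  root  cd /var/discourse && ./launcher rebuild app >> /var/log/discourse-rebuild.log 2>&1
```

That said, testing on a staging copy first and upgrading manually is probably the safer default.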
This approach has usually worked for us, and it seems like a viable option for any full-time sysadmin who is okay with dedicating an hour or two to the servers every Sunday.
PS: we also upgrade the containers and hosts every 15 days, just to make sure they’re patched with the latest stable software from the apt repos.