2.7.0.beta2 upgrade failed

My apologies @geoff777 - without much thinking I reclassified your “like” comment (my intent was to keep the number of my posts small, thus lowering the chance of miscommunication)


I believe that I have enough space:

 System information as of Fri Jan 22 20:56:56 UTC 2021

  System load:  0.02               Users logged in:          0
  Usage of /:   39.7% of 24.06GB   IPv4 address for docker0: 172.17.0.1
  Memory usage: 50%                IPv4 address for eth0:    xxx.xxx.xxx.xxx
  Swap usage:   1%                 IPv4 address for eth0:    
  Processes:    107                IPv4 address for eth1:    

However, I suspect that my problem has to do with the Digital Ocean console: it times out very quickly, so it is possible that my update succeeded and I am just not aware of it. I will contact DO support and report my findings back here.

Thanks.

You can check on your forum’s dashboard whether you have successfully upgraded.
I hope it worked.

Your wish / hope, @geoff777, did not help. I tried to log in, and the Discourse server was not responding.

I decided to run discourse-doctor from the DO console, launched from the PuTTY tool (I am on a Windows 10 machine), and my console stopped at the same place.

Note the beginning of this run: app not running!

root@discourse-server:/var/discourse# ./discourse-doctor
DISCOURSE DOCTOR Fri Jan 22 22:14:45 UTC 2021
OS: Linux discourse-server 5.4.0-62-generic #70-Ubuntu SMP Tue Jan 12 12:45:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux


Found containers/app.yml

==================== YML SETTINGS ====================
DISCOURSE_HOSTNAME=forum.congral.tech
SMTP_ADDRESS=smtp.mailgun.org
DEVELOPER_EMAILS=admin@congral.com
SMTP_PASSWORD=3a22be2a4ba5ce9b0865199dc7083871-xxxxxx
SMTP_PORT=587
SMTP_USER_NAME=postmaster@forum.congral.tech
LETSENCRYPT_ACCOUNT_EMAIL=nikolaj.ivancic@congral.com

==================== DOCKER INFO ====================
DOCKER VERSION: Docker version 20.10.2, build 2291f61

DOCKER PROCESSES (docker ps -a)

CONTAINER ID   IMAGE                              COMMAND                  CREATED          STATUS                      PORTS     NAMES
4e0150995f6a   discourse/base:2.0.20201221-2020   "/bin/bash -c 'cd /p…"   16 minutes ago   Exited (1) 14 minutes ago             mystifying_fermat
271aff6b3bce   discourse/base:2.0.20201221-2020   "/bin/bash -c 'cd /p…"   5 hours ago      Exited (1) 5 hours ago                modest_brown
30ed32bab133   discourse/base:2.0.20201221-2020   "/bin/bash -c 'cd /p…"   5 hours ago      Exited (1) 5 hours ago                laughing_lalande
add2d921333a   local_discourse/app                "/sbin/boot"             2 weeks ago      Exited (5) 5 hours ago                app

==================== SERIOUS PROBLEM!!!! ====================
app not running!
Attempting to rebuild
==================== REBUILD LOG ====================
Ensuring launcher is up to date
Fetching origin
Launcher is up-to-date
Stopping old container
+ /usr/bin/docker stop -t 60 app
...

Here is the complete log up to the point of failure, saved in my GitHub repository to save space here.

I repeated this upgrade several times, and every attempt failed (reported to me as a “console network failure”), so it seems obvious that this upgrade kills the existing Discourse instance.

Please advise. I am happy to pass you my certs in case you want to run this yourselves.

Both times you got that PuTTY Fatal Error?

That seems like a PuTTY problem, though I can’t imagine why.

Yes, @pfaffman, I do get that same error. If running ./discourse-doctor causes some catastrophic failure, is it not possible that this failure shows up as the PuTTY fatal error, at least in my own (remote) view of this failure?

It does not seem very likely, but I will create a support ticket for DO, hoping that they might have a better view of this problem.

I guess I would try the Digital Ocean console next (actually, I would use a terminal in Ubuntu, but that’s not what I’d recommend to you).

Since yesterday, I have observed a lot of weird behavior. Before sharing it here, please do let me know if continuing this thread is useful for anyone (the alternative is that I am just stepping on my own d… and this is all a waste of everyone’s time). I found that:

  • resetting the root password (from the DO control panel) made it possible to use Digital Ocean’s console (as @pfaffman suggested above)
  • next I ran discourse-doctor in that console and it found nothing wrong (before the password reset, https://forum.congral.tech failed to respond; now all works just fine)
  • all of my attempts to upgrade Discourse (like this one) failed, with the PuTTY console showing Network Error as the reason, and today I can verify that the upgrade failed:
content="Discourse 2.7.0.beta1 - https://github.com/discourse/discourse version 1cf92310456fb6e6424f6b532770461c56378d53"
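
For anyone who wants to repeat this check from a terminal, here is a minimal sketch. It assumes the line above comes from the "generator" meta tag in the forum's home page HTML; the helper name discourse_version is made up for this illustration.

```shell
# Hypothetical helper: read HTML on stdin and print the Discourse version
# found in a content="Discourse <version> - ..." attribute.
discourse_version() {
  sed -n 's/.*content="Discourse \([^ ]*\) - .*/\1/p'
}

# Usage against a live site (the forum URL from this thread):
#   curl -s https://forum.congral.tech | discourse_version
```

If the printed version is still the old one after a rebuild, the upgrade did not complete.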

Changing the root password and then using Digital Ocean’s console is a significant change that the Discourse team might be interested in understanding better. Should I continue digging and sharing my findings here, @pfaffman?

Could you be running out of RAM and then having oom killer knock off some random processes such as sshd, thus disconnecting you and causing problems?

There should be something in dmesg output, /var/log/messages or the journalctl output if oom killer is running.
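
A rough sketch of that check, assuming a systemd-based droplet such as the Ubuntu host shown earlier. The helper name oom_lines and the exact search patterns are illustrative choices, not an official list of kernel messages.

```shell
# Hypothetical filter: keep only log lines that usually accompany an
# OOM-killer event.
oom_lines() {
  grep -i -E 'out of memory|oom-killer|killed process'
}

# Usage on the server (as root):
#   dmesg -T | oom_lines
#   journalctl -k --no-pager | oom_lines
```

No output from either command suggests the OOM killer has not fired since the logs began.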

I doubt that running out of RAM is my problem, it’s rather me stubbornly using PuTTY console instead of the DO console.

So, I am going to switch to @pfaffman’s suggestion above, particularly since my discovery that PuTTY console gets disconnected even when I am using it for something completely different from Discourse management.

RAM could be it. Do you have swap?

Have you tried adjusting the keep alive setting, something like this?
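
For readers using the OpenSSH client rather than PuTTY, a comparable keep-alive can be sketched as follows. The 60-second default and the helper name keepalive_stanza are assumptions made for illustration; ServerAliveInterval and ServerAliveCountMax are standard OpenSSH client options.

```shell
# Hypothetical helper: print an OpenSSH client config stanza that sends a
# keep-alive probe every <interval> seconds (default 60) to one host.
keepalive_stanza() {
  host="$1"
  interval="${2:-60}"
  printf 'Host %s\n    ServerAliveInterval %s\n    ServerAliveCountMax 3\n' "$host" "$interval"
}

# Usage (the host name is a placeholder):
#   keepalive_stanza your-droplet.example.com >> ~/.ssh/config
# One-off alternative without touching the config file:
#   ssh -o ServerAliveInterval=60 root@your-droplet.example.com
```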

@pfaffman I did not specify anything - using all droplet defaults, relying on DO folks to ensure reasonable behavior.

First, I am going to try to set the keep-alive as @omarfilip suggested above, then I will look at the swap space situation, followed by @pfaffman’s suggestion to use the original DO console (if the keep-alive setting allows me to continue using the PuTTY console, I will use it, as it’s a lot more user friendly than DO’s equivalent).

With so many friendly people helping me, I wish to restate my reasons for trying to put together Ghost and Discourse: it is, in my view, an ideal tool for someone to write technical documents and offer the best support for discussing these documents. My plan is to address Identity and Account Management (IAM) using several interesting PaaS IAM providers; this subject is not sufficiently well documented (at least in my opinion, based on years of using such services myself).

In order to “beta test” my integrated Ghost/Discourse tool, I decided to describe in detail the process of creating and testing it. So, all the people helping me should know that this effort is intended to help the Discourse, Ghost and Digital Ocean communities.

If you used the Digital Ocean 1-click install and not the official Discourse Standard Installation, then you likely don’t have swap configured, so when you rebuild you run out of RAM unless you have > 2GB.

You can try doing

cd /var/discourse
./discourse-setup

and it will create swap for you if it is needed.
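
To see beforehand whether swap is already configured, something like the sketch below should work on a Linux host. The helper name swap_total_kb is invented for this example; it only reads the SwapTotal field that Linux exposes in /proc/meminfo.

```shell
# Hypothetical helper: print total swap in kB from /proc/meminfo-style
# input on stdin. A value of 0 means no swap is configured.
swap_total_kb() {
  awk '/^SwapTotal:/ { print $2 }'
}

# Usage on the server:
#   swap_total_kb < /proc/meminfo
# The standard tools show the same information in a friendlier form:
#   swapon --show
#   free -h
```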

If you want help with Digital Ocean’s one-click installation and want to “rely on DO folks to ensure reasonable behavior”, then you should rely on them to support you.

But since I’m already posting, an easier way to confirm that it’s not PuTTY that’s at fault (which might save a long time in fussing with PuTTY’s parameters for naught) is to try the console. If you haven’t run discourse-setup then I’m pretty sure that the issue is swap space.

@pfaffman I did follow the official Discourse installation process to the letter, including the use of PuTTY referenced in that official installation document.

You may have formed the impression that I used the DO 1-click install and should therefore depend on DO:

If you want help with Digital Ocean’s one-click installation and want to “rely on DO folks to ensure reasonable behavior”, then you should rely on them to support you.

My reference to the 1-click install comes from the recent email from Discourse:

Hooray, a new version of Discourse is available!

Your version: 2.7.0.beta1
New version: 2.7.0.beta2

So, the current situation is that

  • I genuinely appreciate your help, Jay :revolving_hearts: knowing that you make your living from helping people with Discourse. Despite me not looking like a potential customer, you spent time pulling me out of the weeds.
  • I managed to install the 2.7.0.beta2 upgrade, easy as a breeze, once I concluded that PuTTY was misbehaving regardless of the keep-alive settings I applied. So, I switched from SSH-based authentication to a userid / password pair, logged in to the droplet host, and ran the ./launcher rebuild app command successfully.

Many thanks to everyone who provided parts of the solution.

Oh! Mea culpa! So sorry!

Then you do have swap and my guess is all wrong. That’s too bad, because it was going to be an easy fix. :wink:

That’s a relief after I falsely accused you!

And it’s terrible to learn that PuTTY is so bad. I don’t understand how it continues to be the recommended SSH client for Windows.

I think that there’s now some client that’s part of that Linux subsystem thing, but the last version of Windows I used regularly was Windows 98.

Glad you got it sorted!

All modern operating systems ship with SSH clients out of the box, so there should be no need for third-party clients. Even in Windows Terminal I can just type ssh. It should work provided your Windows is up to date.

Oh my word. Really? Last I (thought I) looked it required some installation that looked hard.

And it can use a normal ssh key and not that pem nonsense?

This long story has a big happy-end and could be categorized as a storm in a teacup. I learned a lot about Discourse and as a consequence plan to stay around indefinitely. Here is the itemized happy-end description:

  1. My problem upgrading from beta1 to beta2 showed up as the PuTTY console timing out. I interpreted this as a colossal crash in the Discourse upgrade task and spent a lot of time learning Discourse “internals”, a rabbit hole I was very happy to follow, even if needlessly.

  2. The solution to my problem is extremely simple (once you know where to “push”) - simple as 1, 2, 3 below
    image
    (The fact that I started with too big a “keep-alive” interval led me to believe that PuTTY is really crappy software, and I spent a lot of time switching from SSH-based droplet access to the [id, password] authentication needed for Digital Ocean’s own console, which is really bad.) Note that this experiment completely rehabilitates the PuTTY tool.

  3. @Falco opened our “collective eyes” by pointing out that one can simply use Windows 10’s built-in OpenSSH (thanks to Scott Hanselman).


Since I already promised @codinghorror that I would write the best-ever document presenting Discourse to the world, as a thank you for his (and his team’s) help in getting me to understand Discourse, @pfaffman, let me make @Falco’s suggestion the first part of my document.

I recommend using tmux, this way you can reconnect to the session running rebuild even if your client times out.
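
A minimal sketch of that workflow, assuming tmux is installed on the server. The helper name run_in_tmux and the session name rebuild are arbitrary choices for this example.

```shell
# Hypothetical helper: start a command in a detached tmux session so it
# keeps running even if the SSH connection drops.
run_in_tmux() {
  session="$1"
  shift
  tmux new-session -d -s "$session" "$*"
}

# Usage on the server (the trailing "exec bash" keeps the window open so
# the rebuild output can still be read after the command finishes):
#   run_in_tmux rebuild 'cd /var/discourse && ./launcher rebuild app; exec bash'
# Reattach after a dropped connection:
#   tmux attach -t rebuild
```

Detaching on purpose is Ctrl-b followed by d; the rebuild keeps running in the background either way.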
