Upgrade from within China fails due to git issues

We are running a Discourse instance on an Aliyun/Alibaba Ubuntu 20.04 server, and as with everything related to Git, we face issues due to the Great Firewall. Upgrading manually with launcher rebuild app most of the time fails due to GnuTLS errors (various types). It’s not related to the Git versions installed on the server, it’s in fact related to the handshake handling inside the GFW; of course we do not understand the details, but several sources discuss this matter in detail. So compiling Git manually with OpenSSL instead is also not an option.

Sometimes the pull process makes it past the core, even manages to clone the Docker Manager plugin, but after 2-3 plugin pulls, there usually is a timeout or another error.

Example:

$ ./launcher rebuild app
Ensuring launcher is up to date
Fetching origin
Launcher is up-to-date
Stopping old container
+ /usr/bin/docker stop -t 60 app
app
cd /pups && git pull && git checkout v1.0.3 && /pups/bin/pups --stdin
fatal: unable to access 'https://github.com/discourse/pups.git/': gnutls_handshake() failed: The TLS connection was non-properly terminated.
76630913bae18d6b45b6b3ecc3ec390c1e69222a493f2ecf424ee06adf9d1002
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.

This one is also rather common:

fatal: unable to access 'https://github.com/discourse/discourse.git/': GnuTLS recv error (-54): Error in the pull function.

potential solution 1
Usually when cloning from GitHub, doing so via SSH instead of HTTPS gives better results or does not fail, but due to Discourse’s unique rebuild task, I have no clue where to configure what, so that the launcher pulls via SSH instead of HTTPS. Is it possible to set up the Discourse instance to do so?
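
For what it’s worth, Git itself can rewrite HTTPS URLs to SSH through its url.<base>.insteadOf setting. Below is a minimal sketch in a throwaway repository. It only shows the rewrite. Whether the launcher’s clones would pick it up is an assumption: the setting would have to be applied inside the container (for example from a hook), together with an SSH key that GitHub accepts.

```shell
#!/usr/bin/env bash
# Sketch: rewrite GitHub HTTPS URLs to SSH via url.<base>.insteadOf.
# Runs in a temporary repo, so no global config is touched.
set -euo pipefail

repo="$(mktemp -d)"
git init -q "$repo"

# Add the remote with the HTTPS URL, as it appears in the error messages.
git -C "$repo" remote add origin https://github.com/discourse/discourse.git

# Substitute the SSH form for every URL that starts with the HTTPS prefix.
git -C "$repo" config url."ssh://git@github.com/".insteadOf "https://github.com/"

# `git remote get-url` expands insteadOf, so this is the URL git would contact.
rewritten="$(git -C "$repo" remote get-url origin)"
echo "$rewritten"
```

With --global instead of the per-repo config, the same rewrite would apply to every clone made by that user.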

potential solution 2
As another option, I have a SOCKS5 proxy available to access blocked resources from inside China, and I know Git can be configured to use the socks:// protocol. Unfortunately I do not understand how and where to set this up in Discourse, so that the pull processes of the Discourse launcher use the proxy. I would like to avoid doing this with git config --global for the root user, and would rather keep this information in a config for the Discourse repos. Can anybody point me to how to achieve this?

It’s cumbersome, as we use this Discourse instance in our intranet and it has basically been dead for more than a month, which of course has a severe impact on our operations.


Does passing the proxy environment variables in the app.yml under the env section make it work?
And here is a solution for RubyGems behind the GFW:

Did you see Replace rubygems.org with taobao mirror to resolve network error in China

Thank you very much. The Ruby gems do not cause issues, as I have incorporated the template mentioned in your post in our app.yml from the beginning, which works like a charm.

It is about cloning the main repository and the plugin repositories.

I need to check out the env variables for Git flags, unfortunately I suck at Docker and especially at docker-compose files. Do you have any source to point me to?

Discourse doesn’t use docker-compose, as far as I know.

I believe adding the command below to the before_web hook can make it work, similar to what web.china.template.yml does.

git config --global http.proxy socks5://yourproxy:port

And if you don’t need the proxy anymore after the build, add the line below to the after_web hook:

git config --global --unset http.proxy

All hooks run inside the container, so I don’t think it would be a problem.
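
If proxying all of Git’s HTTP traffic is broader than needed, Git also accepts URL-scoped settings of the form http.<url>.proxy. The sketch below writes such a setting to a scratch file standing in for the container’s ~/.gitconfig (the file and the example.com URL are placeholders) and asks Git which proxy it would pick for two URLs.

```shell
#!/usr/bin/env bash
# Sketch: scope the SOCKS5 proxy to GitHub URLs only.
set -euo pipefail

cfg="$(mktemp)"   # stand-in for ~/.gitconfig inside the container

# Only requests whose URL starts with https://github.com/ get the proxy.
git config --file "$cfg" "http.https://github.com/.proxy" "socks5://172.17.0.1:1080"

# Ask git which proxy applies to a GitHub URL and to an unrelated URL.
github_proxy="$(git config --file "$cfg" --get-urlmatch http.proxy \
  https://github.com/discourse/discourse.git)"
other_proxy="$(git config --file "$cfg" --get-urlmatch http.proxy \
  https://example.com/some/repo.git || true)"

echo "github: ${github_proxy:-none}"
echo "other:  ${other_proxy:-none}"
```

In a hook, the equivalent line would presumably be git config --global http.https://github.com/.proxy socks5://yourproxy:port.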


Another proof for how stupid I am about Docker. Yep, it’s obviously not a docker-compose file. Is it called “Docker file”? Or does that term refer to the config.json? Whatever, your advice pointed me to the right direction, just the hook should be called before_code instead of before_web.

In short: Set up a socks5 with shadowsocks-libev, listen on the local machine on 172.17.0.1 (not localhost), pass on the proxy info as in your message, rebuild app.

I will write an in-depth guide here, as I assume there are more people out there going through this painful experience. Right now I am still facing issues with theme component repositories, so my rebuild has not succeeded yet, but I have passed at least all the plugin fetches.

Slightly off-topic, but my real pain is that I cannot start the app at all, as the existing redis-server config (Redis runs on another machine in our setup) no longer matches the current state of the app. So I can’t start the container and use the GUI to deactivate the theme components that time out when being cloned.

Thank you very much for pointing me to your explanation, but I would like to add a few remarks, as the example does not fit perfectly.

  1. I don’t understand this, sorry? The env command gives me a lot of info, but none related to my gitconfig
  2. As I don’t understand 1), I couldn’t figure out which variables to pass on. I also didn’t add the git flags to the env section in app.yml, but called them via a hook.
  3. This was not necessary, as I don’t want the whole container to go through the socks, but only the git fetch process, but I suppose this point was more particular about the original use case in the thread you referred to.

But thanks again, your input pointed me into the right direction. Thumbs up for the Discourse team! :ok_hand:


Did you purchase Aliyun and choose a region within mainland China?

HK/international Aliyun will not have this issue.


Also, maybe you have already discussed this detail, but in case you didn’t find it: here is a script that you can run to install Git built with OpenSSL

Painless manual upgrade from within China

steps

  1. create SOCKS5 proxy outside of China
  2. set up and configure proxy connection on CN server
  3. create a template for simpler editing
  4. add git proxy settings to template
  5. include template in app.yml
  6. rebuild app

1 - remote SOCKS5

For ease of use (and their friendly pricing) I recommend setting up a DigitalOcean server in e.g. Singapore. Just use a standard Ubuntu server, go through the basic security configuration (SSH keypairs, UFW, et cetera), then install Shadowsocks:

on remote machine
$ sudo apt install shadowsocks-libev

Configure the proxy settings:

$ cd /etc/shadowsocks-libev

# I like to keep the original files
$ sudo cp config.json orig.config.json
$ sudo nano config.json

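Pay close attention to timeout and method. Here server is the remote server’s IP, server_port is up to you, and the timeout value is essential. JSON has no comment syntax, so keep such notes out of the actual file:

{
    "server":"123.123.123.123",
    "server_port":8400,
    "local_port":1080,
    "password":"Swordfish",
    "timeout":600,
    "method":"chacha20-ietf-poly1305"
}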

Make sure to double-check all the settings in the systemd configuration (/lib/systemd/system/shadowsocks-libev-local@.service). Enable the shadowsocks-libev-local@.service, reboot, check for running service.

2 - set up the proxy connection on the CN server

on Discourse machine

$ sudo apt install shadowsocks-libev

If you are on Aliyun, search for the Firewall settings in their weird console and check for the respective port settings.

You do not need to fuss around with the systemd settings on the client machine, but keep separate config files for Docker and regular use: you might want to use the SOCKS5 proxy outside the context of Docker, in which case you would use 127.0.0.1 instead of the Docker-accessible network address.

$ cd /etc/shadowsocks-libev
$ sudo cp config.json local.json
$ sudo cp config.json docker.json

Adapt the config to something like this. The server entry is the remote machine’s IP, and make sure timeout is set. The notation (e.g. the mode key) differs from the remote config because of different shadowsocks-libev versions in my setup. These notes are kept outside the file, since JSON does not allow comments.

$ sudo nano local.json

{
    "server":["123.123.123.123"],
    "mode":"tcp_and_udp",
    "server_port":8400,
    "local_address":"127.0.0.1",
    "local_port":1080,
    "password":"Swordfish",
    "timeout":600,
    "method":"chacha20-ietf-poly1305"
}

For the sake of convenience, let us add an alias to our .bashrc :

$ nano ~/.bashrc

# paste
alias localshadow='ss-local -c /etc/shadowsocks-libev/local.json'

adapt the other config to let Docker go through the host machine’s network

$ sudo nano docker.json

{
    "server":["123.123.123.123"],
    "mode":"tcp_and_udp",
    "server_port":8400,
    "local_address":"172.17.0.1",
    "local_port":1080,
    "password":"Swordfish",
    "timeout":600,
    "method":"chacha20-ietf-poly1305"
}

set the alias for using the Docker specific config:

alias dockershadow='ss-local -c /etc/shadowsocks-libev/docker.json'

3 & 4 - create a template for keeping your app.yml tidy

This is absolutely optional and depends on your taste; I prefer keeping the app.yml readable and short, and instead maintain components elsewhere. Give it any name according to your taste, I chose web.git.template.yml.

$ nano templates/web.git.template.yml
# paste:

hooks:
  before_code:
    - exec:
       cmd:
         # git reads http.proxy for both http:// and https:// remotes
         - git config --global http.proxy socks5://172.17.0.1:1080
         # turns off TLS certificate verification; leave it out if clones work without it
         - git config --global http.sslVerify false

# optional
  after_code:
    - exec:
       cmd:
         - git config --global --unset http.proxy
         - git config --global --unset http.sslVerify

I have tested it with the hook after_web, but that didn’t do the trick.

5 - adapt the app.yml

Call the template in your app.yml:

$ cd /<discourse dir>
$ sudo nano containers/app.yml


templates:
  - "templates/web.template.yml"
  - "templates/web.china.template.yml"
  - "templates/web.ratelimited.template.yml"
  - "templates/web.socketed.template.yml"
  - "templates/web.git.template.yml"

Your template section most likely looks different, just make sure to include web.china and the web.git-blabla (or whatever you named it) templates.
Do not expose 1080:1080 in your app.yml!

6 - rebuilding

Before rebuilding verify that your proxy settings are workable when cloning with git.

$ git config --global http.proxy socks5://172.17.0.1:1080
$ git config --global http.sslVerify false

This of course writes the proxy settings to your user’s .gitconfig in the home directory, so remember to remove them after testing.
Select a random large repo on GitHub with a ton of files and check your cloning speed. If your configuration is correct, you should be able to clone at ~12-15 MB/s, depending on your Aliyun setup. If your connection speed slowly crawls up from 200 KB/s to about 10 MB/s, then your efforts were not successful.
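
As an alternative that leaves ~/.gitconfig untouched, the proxy can be passed for a single command with git’s -c flag. A small sketch, assuming the same proxy address as above. The clone target in the comment is only an example path.

```shell
#!/usr/bin/env bash
# Sketch: apply the proxy to one git invocation only, via `git -c`.
set -euo pipefail

proxy="socks5://172.17.0.1:1080"

# Record the global setting before and after, to show that -c does not persist.
before="$(git config --global --get http.proxy || true)"
effective="$(git -c http.proxy="$proxy" config --get http.proxy)"
after="$(git config --global --get http.proxy || true)"

echo "effective for this one command: $effective"

# The actual speed test would then be a clone with the same flag, e.g.:
#   git -c http.proxy="$proxy" clone --depth 1 \
#       https://github.com/discourse/discourse.git /tmp/clone-test
```

Nothing needs to be unset afterwards, because the setting only lives for that one invocation.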

finally rebuild:

$ cd /<discourse directory>

# run the proxy in the background, using the alias we set before
# (or run it in a second terminal; ss-local stays in the foreground otherwise)
$ dockershadow &
$ ./launcher rebuild app

The rebuild process will fail often, so you need patience (and possibly Baijiu). The fewer plugins you have set in your app.yml, the more likely it is that your rebuild will succeed.
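
Since repeated attempts are part of the routine, a small retry wrapper can take over the waiting. This is a generic sketch, not part of the Discourse launcher. The function name retry and the numbers in the usage comment are arbitrary choices.

```shell
#!/usr/bin/env bash
# retry MAX DELAY CMD...: run CMD until it succeeds or MAX attempts are used up.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Intended use, from the Discourse directory with the proxy already running:
#   retry 10 60 ./launcher rebuild app
```

Keep in mind that every attempt stops the old container first, so the site stays down until one of the attempts succeeds.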

7 - remarks

I still consider this a workaround, not a production-ready procedure, so maybe somebody has an idea of how to mirror the GitHub repos in China to make this less painful. And as we all know, the opaque mechanisms inside the GFW keep changing.
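
On the mirroring idea, the Git side could look roughly like the sketch below. It uses local directories as stand-ins for GitHub and for a mirror host, so it runs offline. A real mirror location inside China is an assumption and not something I have set up, and the launcher would still need to be pointed at it somehow (perhaps with url.<base>.insteadOf).

```shell
#!/usr/bin/env bash
# Sketch: mirror a repository, then clone from the mirror instead of upstream.
set -euo pipefail

work="$(mktemp -d)"

# Stand-in for the upstream GitHub repository: a local repo with one commit.
git init -q "$work/upstream"
git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# On a machine with good connectivity: take a full mirror (all refs).
git clone -q --mirror "$work/upstream" "$work/mirror.git"
# Later refreshes of the mirror would use: git -C mirror.git remote update --prune

# On the Discourse host: clone from the mirror rather than from upstream.
git clone -q "$work/mirror.git" "$work/checkout"

upstream_head="$(git -C "$work/upstream" rev-parse HEAD)"
checkout_head="$(git -C "$work/checkout" rev-parse HEAD)"
echo "upstream=$upstream_head checkout=$checkout_head"
```

The mirror would have to be refreshed before each rebuild, and every plugin and theme component repository would need its own mirror.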

Of course a SOCKS5 proxy is just one of many options, but I like to have multi-use solutions at hand.

If anybody has an idea how to make this workaround production-ready, I appreciate your input. Discourse is fantastic software, but I assume one of the reasons it is not widely used in China is the cumbersome installation and maintenance process. Trying to upgrade via the GUI gave me a 100% failure rate within the last year, no matter which timeout settings I had configured in my nginx reverse proxy.

Chinese translation will follow


Exactly. Since the main purpose of this instance is being part of a company intranet framework, HK unfortunately is not an option due to latency. Also, the (upcoming) client facing instances will be dealing with mainland users - as soon as I have figured out the Weixin authentication, so I need a workable solution for the mainland Aliyun zones.

Thank you very much. I have checked several guides around this, but as the main cause of the issue is not Git’s TLS authentication per se, but the handshake check in the GFW’s packet inspection, I abstained from this approach. From what I have read, compiling Git with OpenSSL can open doors to a new world full of pain.

Most theme components are also pulled from GitHub when building (or at container start?), so possibly there is another hook where adding the git proxy may help. And don’t remove the proxy if you want it to work with the GUI. Also, redis-server doesn’t seem like something that could cause this.

The redis-server was just another issue, that had added complexity to the failure of rebuilding. It was kind of a loop: The external redis configuration had changed, while the pre-rebuild app state needed that specific redis-config to start. It couldn’t rebuild though, as the theme component fetching didn’t work.
I got lucky after roughly 20 rebuild runs, though, and the theme component updates finally got fetched.

In the overall app design context, it would be nice to have documentation on how to rebuild in a “safe mode”, i.e. rebuilding the app independently of plugins or themes. I couldn’t find a hook to process the theme components, or a way to deactivate (as opposed to uninstall) plugins either, which was a bummer.

Edit: Whoa, now I see the link to the safe mode. I couldn’t find that before (no Google for us in China, try to find anything relevant with Bing…). Gosh, that would have helped a lot!

So you specified a managed redis-server, as in “Discourse with DO managed Redis - #3 by Falco”, and the rebuild failed?

The redis problem was of secondary importance, but added significant complexity to the overall git problem. As you can tell from my in-depth post above, I have resolved the issues.

And yes, we had hooked a distributed redis cluster to our Discourse from the very beginning. It’s not managed though, just on other machines.

A failure in connection to the redis-server had caused the app not to start, thus I couldn’t deactivate the theme components in the GUI.
Applying a new redis configuration required an app rebuild, which couldn’t be done due to failure to fetch from the GitHub repos.