Letsencrypt issues

Apparently something changed with certificate issuance as of August 1st. My certificate didn’t expire until today, so I was only affected by the change today. acme.sh, the script that discourse uses to manage letsencrypt certificates, was updated to use a different CA called ZeroSSL by default instead of letsencrypt. Unfortunately, ZeroSSL apparently requires registration with an email address before it will work, so this morning when my existing certificate expired, the site wouldn’t work.

After quite a bit of investigation I figured out what the issue was, and while I’ve sort of circumvented it, I don’t think what I did to fix it is necessarily the “right” fix. First, here are the error messages that I was getting in the log:

[Wed 01 Sep 2021 05:33:58 PM UTC] Reload error for :
[Wed 01 Sep 2021 05:34:03 PM UTC] Using CA: https://acme.zerossl.com/v2/DV90
[Wed 01 Sep 2021 05:34:04 PM UTC] No EAB credentials found for ZeroSSL, let's get one
[Wed 01 Sep 2021 05:34:04 PM UTC] acme.sh is using ZeroSSL as default CA now.
[Wed 01 Sep 2021 05:34:04 PM UTC] Please update your account with an email address first.
[Wed 01 Sep 2021 05:34:04 PM UTC] acme.sh --register-account -m my@example.com
[Wed 01 Sep 2021 05:34:04 PM UTC] See: https://github.com/acmesh-official/acme.sh/wiki/ZeroSSL.com-CA
[Wed 01 Sep 2021 05:34:04 PM UTC] Please check log file for more details: /shared/letsencrypt/acme.sh.log

I tried manually registering the account from within the container, and while it said it registered, when I restarted the container I got the same error. So then I tracked down the scripts, and the one that does this is inside the container at /etc/runit/1.d/letsencrypt. Here is the original script:

#!/bin/bash
/usr/sbin/nginx -c /etc/nginx/letsencrypt.conf

issue_cert() {
  LE_WORKING_DIR="${LETSENCRYPT_DIR}" /shared/letsencrypt/acme.sh --issue $2 -d mudtoemanor.baronshire.org --keylength $1 -w /var/www/discourse/public
}

cert_exists() {
  [[ "$(cd /shared/letsencrypt/mudtoemanor.baronshire.org$1 && openssl verify -CAfile ca.cer fullchain.cer | grep "OK")" ]]
}

########################################################
# RSA cert
########################################################
issue_cert "4096"

if ! cert_exists ""; then
  # Try to issue the cert again if something goes wrong
  issue_cert "4096" "--force"
fi

LE_WORKING_DIR="${LETSENCRYPT_DIR}" /shared/letsencrypt/acme.sh \
  --installcert \
  -d mudtoemanor.baronshire.org \
  --fullchainpath /shared/ssl/mudtoemanor.baronshire.org.cer \
  --keypath /shared/ssl/mudtoemanor.baronshire.org.key \
  --reloadcmd "sv reload nginx"

########################################################
# ECDSA cert
########################################################
issue_cert "ec-256"

if ! cert_exists "_ecc"; then
  # Try to issue the cert again if something goes wrong
  issue_cert "ec-256" "--force"
fi

LE_WORKING_DIR="${LETSENCRYPT_DIR}" /shared/letsencrypt/acme.sh \
  --installcert --ecc \
  -d mudtoemanor.baronshire.org \
  --fullchainpath /shared/ssl/mudtoemanor.baronshire.org_ecc.cer \
  --keypath /shared/ssl/mudtoemanor.baronshire.org_ecc.key \
  --reloadcmd "sv reload nginx"

if cert_exists "" || cert_exists "_ecc"; then
  grep -q 'force_https' "/var/www/discourse/config/discourse.conf" || echo "force_https = 'true'" >> "/var/www/discourse/config/discourse.conf"
fi

/usr/sbin/nginx -c /etc/nginx/letsencrypt.conf -s stop

Since the email registration didn’t work, I decided to make the acme.sh script go back to using the original letsencrypt instead of ZeroSSL. Instructions for doing that are on this page: https://community.letsencrypt.org/t/the-acme-sh-will-change-default-ca-to-zerossl-on-august-1st-2021/144052

First I tried putting this as the third line in the script right after the “/usr/sbin/nginx -c /etc/nginx/letsencrypt.conf” statement:

/shared/letsencrypt/acme.sh --set-default-ca --server letsencrypt

The log showed a message that the default was changed, but right after that I got the same error about no email registered for ZeroSSL. There was a delay of about 6 seconds in the log between that message and the error messages, which makes me think that the acme.sh script must be maintaining state information via environment variables, and that the subsequent executions of the command were running in a different context and thus lost the variable. So what I ended up having to do was change all the invocations of acme.sh in the letsencrypt script to add the option “--server letsencrypt” to the command. When I did that and restarted the container, a new certificate was generated by letsencrypt instead of ZeroSSL and the site came back up.
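For anyone weighing the environment-variable theory: every acme.sh call in the original script is prefixed with LE_WORKING_DIR="${LETSENCRYPT_DIR}", while the added --set-default-ca line is not. Whether that is the reason the default did not stick is not established in this thread, but the shell mechanics are easy to check. The following is only a sketch: the child sh -c processes stand in for acme.sh, and /tmp/le-demo is a made-up path.

```shell
#!/bin/sh
# Sketch: a VAR=value prefix is visible only to the one command it precedes.
# The child `sh -c` processes stand in for acme.sh; /tmp/le-demo is hypothetical.

unset LE_WORKING_DIR

# With the prefix, the child process sees the variable.
with_prefix=$(LE_WORKING_DIR=/tmp/le-demo sh -c 'echo "${LE_WORKING_DIR:-unset}"')

# Without the prefix, the next child process does not inherit it.
without_prefix=$(sh -c 'echo "${LE_WORKING_DIR:-unset}"')

echo "with prefix:    $with_prefix"
echo "without prefix: $without_prefix"
```

If the two calls really did use different working directories, a setting saved by one would not be seen by the other, which would be consistent with the behaviour described above.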

Here is the modified version of the letsencrypt script that I used:

#!/bin/bash
/usr/sbin/nginx -c /etc/nginx/letsencrypt.conf
/shared/letsencrypt/acme.sh --set-default-ca --server letsencrypt

issue_cert() {
  LE_WORKING_DIR="${LETSENCRYPT_DIR}" /shared/letsencrypt/acme.sh --server letsencrypt --issue $2 -d mudtoemanor.baronshire.org --keylength $1 -w /var/www/discourse/public
}

cert_exists() {
  [[ "$(cd /shared/letsencrypt/mudtoemanor.baronshire.org$1 && openssl verify -CAfile ca.cer fullchain.cer | grep "OK")" ]]
}

########################################################
# RSA cert
########################################################
issue_cert "4096"

if ! cert_exists ""; then
  # Try to issue the cert again if something goes wrong
  issue_cert "4096" "--force"
fi

LE_WORKING_DIR="${LETSENCRYPT_DIR}" /shared/letsencrypt/acme.sh \
  --installcert \
  -d mudtoemanor.baronshire.org \
  --fullchainpath /shared/ssl/mudtoemanor.baronshire.org.cer \
  --keypath /shared/ssl/mudtoemanor.baronshire.org.key \
  --server letsencrypt \
  --reloadcmd "sv reload nginx"

########################################################
# ECDSA cert
########################################################
issue_cert "ec-256"

if ! cert_exists "_ecc"; then
  # Try to issue the cert again if something goes wrong
  issue_cert "ec-256" "--force"
fi

LE_WORKING_DIR="${LETSENCRYPT_DIR}" /shared/letsencrypt/acme.sh \
  --installcert --ecc \
  -d mudtoemanor.baronshire.org \
  --fullchainpath /shared/ssl/mudtoemanor.baronshire.org_ecc.cer \
  --keypath /shared/ssl/mudtoemanor.baronshire.org_ecc.key \
  --server letsencrypt \
  --reloadcmd "sv reload nginx"

if cert_exists "" || cert_exists "_ecc"; then
  grep -q 'force_https' "/var/www/discourse/config/discourse.conf" || echo "force_https = 'true'" >> "/var/www/discourse/config/discourse.conf"
fi

/usr/sbin/nginx -c /etc/nginx/letsencrypt.conf -s stop

I also left the original command I had inserted to try to set the default service just in case, but it may not be necessary.

So this script needs to change in one of three ways: explicitly specify the letsencrypt service on every invocation of acme.sh; figure out how acme.sh saves its state information so that a single invocation of the set-default command takes effect; or add support for ZeroSSL, including collecting and saving an email address.
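The first option could be done in one place rather than on every call. This is a rough sketch, not the actual Discourse script: the acme helper and the ACME_SH variable are hypothetical names introduced here, while the path, the domain, and the flags are copied from the script above.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of the Discourse script: every acme.sh call
# goes through acme(), so the working directory and CA are set in one place.
ACME_SH="${ACME_SH:-/shared/letsencrypt/acme.sh}"

acme() {
  LE_WORKING_DIR="${LETSENCRYPT_DIR}" "$ACME_SH" --server letsencrypt "$@"
}

issue_cert() {
  # $1 = key length, $2 = optional extra flag such as --force
  # ($2 is left unquoted on purpose so it vanishes when empty)
  acme --issue $2 -d mudtoemanor.baronshire.org --keylength "$1" -w /var/www/discourse/public
}
```

The two --installcert calls could then become acme --installcert ... with the rest of their arguments unchanged.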

I’m assuming that what I did will be overwritten the next time I upgrade versions and I’ll have to do it again, if this isn’t done.

If I’ve missed anything here or this was somehow addressed in another way that doesn’t require a change to the scripts, please let me know.

Just add your email address to the config file:

/var/discourse/containers/app.yml

## If you added the Lets Encrypt template, uncomment below to get a free SSL certificate
 LETSENCRYPT_ACCOUNT_EMAIL: gavin@truecode.co.za

Then run ./launcher rebuild app and you should be good to go.

We fixed this a couple of weeks ago, did you try a rebuild?

No, I haven’t. I’m running build 2.8.0.beta4, and it says that I’m up to date. Do I have to do a rebuild after updates, or are things being downloaded on the fly between updates that would require a rebuild? I do have an email address in LETSENCRYPT_ACCOUNT_EMAIL .

I can do a rebuild. I’ll have to put back the original version of the letsencrypt script first, or will the rebuild refresh it? Am I supposed to do a rebuild after every update, or is there some method to notify me when that’s necessary? To date I’ve never done a rebuild after I got the site initially built about six months ago, although I have been applying updates.

While you can update the app from the web interface, from time to time we have to ship an update to the underlying environment Discourse runs in (the container image). Those environmental updates require a rebuild.

For undoing your local changes, you may be able to stash those with

cd /var/discourse
git stash
./launcher rebuild app

Are these “environmental updates” the “docker-manager” updates that I see after clicking on the “perform upgrades here” link on the admin settings page? Guess I’m a little confused now about which updates I’m supposed to get from where, and how I know when to do a rebuild. Although I see an upgrade on that page, the previous page still says that I’m up to date.

No, to get those you must access the server via command line and issue a rebuild.

OK. I did the rebuild but I’m not certain if it’s going to work or not in 90 days as the certificate hasn’t expired now, so it’s not trying to get a new one. I checked the script letsencrypt and it shows a new modified date (today) but when I compare it to the old version of the script (the original, not after my modifications), they are identical. I tried invoking the script manually from within the container using --force per one of the output message comments, but that didn’t work. So at this point I’m just going to have to take it on faith that when the certificate expires it will renew OK.
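Short of waiting 90 days, one thing that can be checked now is which CA issued the installed certificate and when it expires. This is a small sketch using standard openssl x509 options; the function names are made up for illustration, and the certificate path in the usage note is the one from the script earlier in this thread.

```shell
#!/bin/sh
# Print who issued a certificate and when it expires.
cert_summary() {
  openssl x509 -in "$1" -noout -issuer -enddate
}

# Exit status 0 if the certificate is still valid $2 days from now,
# non-zero if it will have expired by then.
cert_valid_for_days() {
  openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" >/dev/null
}
```

Inside the container, cert_summary /shared/ssl/mudtoemanor.baronshire.org.cer would show the issuer and end date of the RSA certificate. It does not prove that the next renewal will succeed, only which CA the current certificate came from.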