Letsencrypt not renewing


#1

It’s attempting to renew the cert, and verification apparently times out. This has been happening for several days. Here’s /shared/letsencrypt/acme.sh.log; I’m replacing encrypted material strings with “IBO*” and “mz2*”; and changing “https” to “hs” and “http” to “hp” so that I don’t overflow the new-user link limit.

[Tue Jul  4 07:38:05 UTC 2017] url='hs://acme-v01.api.letsencrypt.org/acme/challenge/IBO*'
[Tue Jul  4 07:38:05 UTC 2017] payload='{"resource": "challenge", "keyAuthorization": "mz2*"}'
[Tue Jul  4 07:38:05 UTC 2017] POST
[Tue Jul  4 07:38:05 UTC 2017] url='hs://acme-v01.api.letsencrypt.org/acme/challenge/lBO*'
[Tue Jul  4 07:38:05 UTC 2017] _CURL='curl -L --silent --dump-header /shared/letsencrypt/http.header '
[Tue Jul  4 07:38:06 UTC 2017] _ret='0'
[Tue Jul  4 07:38:06 UTC 2017] code='202'
[Tue Jul  4 07:38:06 UTC 2017] sleep 2 secs to verify
[Tue Jul  4 07:38:08 UTC 2017] checking
[Tue Jul  4 07:38:08 UTC 2017] GET
[Tue Jul  4 07:38:08 UTC 2017] url='hs://acme-v01.api.letsencrypt.org/acme/challenge/lBO*'
[Tue Jul  4 07:38:08 UTC 2017] timeout

then three instances of

[Tue Jul  4 07:38:06 UTC 2017] sleep 2 secs to verify
[Tue Jul  4 07:38:08 UTC 2017] checking
[Tue Jul  4 07:38:08 UTC 2017] GET
[Tue Jul  4 07:38:08 UTC 2017] url='hs://acme-v01.api.letsencrypt.org/acme/challenge/lBO*'
[Tue Jul  4 07:38:08 UTC 2017] timeout
[Tue Jul  4 07:38:08 UTC 2017] _CURL='curl -L --silent --dump-header /shared/letsencrypt/http.header '
[Tue Jul  4 07:38:08 UTC 2017] ret='0'
[Tue Jul  4 07:38:08 UTC 2017] Pending

and finally

[Tue Jul 4 07:38:13 UTC 2017] discussion.ambridgereporter.org.uk:Verify error:Fetching hp://discussion.ambridgereporter.org.uk/.well-known/acme-challenge/mz2*: Timeout

Oddly, the cert/key files are timestamped to match this, but they contain the old cert/key.

More information on request. This is my own server, and the Discourse nginx instance is exposed directly.


(Jay Pfaffman) #2

Are these logs from rebuilding? What version are you on? Latest?


#3

That was from a “/var/discourse/launcher rebuild app” this morning.

Discourse claims to be 1.9.0.beta2.

Automated renewal from a couple of days ago similarly reports timeouts, and sometimes a code 400. Munged as before:

[Mon Jul  3 00:00:10 UTC 2017] writing token:n9d* to /var/www/discourse/public/.well-known/acme-challenge/n9d*
[Mon Jul  3 00:00:10 UTC 2017] Changing owner/group of .well-known to discourse:discourse
[Mon Jul  3 00:00:10 UTC 2017] url='hs://acme-v01.api.letsencrypt.org/acme/challenge/6TG*'
[Mon Jul  3 00:00:10 UTC 2017] payload='{"resource": "challenge", "keyAuthorization": "n9d*"}'
[Mon Jul  3 00:00:10 UTC 2017] POST
[Mon Jul  3 00:00:10 UTC 2017] url='hs://acme-v01.api.letsencrypt.org/acme/challenge/6TG*'
[Mon Jul  3 00:00:10 UTC 2017] _CURL='curl -L --silent --dump-header /shared/letsencrypt/http.header '
[Mon Jul  3 00:00:18 UTC 2017] _ret='0'
[Mon Jul  3 00:00:18 UTC 2017] code='202'
[Mon Jul  3 00:00:18 UTC 2017] sleep 2 secs to verify
[Mon Jul  3 00:00:20 UTC 2017] checking
[Mon Jul  3 00:00:20 UTC 2017] GET
[Mon Jul  3 00:00:20 UTC 2017] url='hs://acme-v01.api.letsencrypt.org/acme/challenge/6TG*'
[Mon Jul  3 00:00:20 UTC 2017] timeout
[Mon Jul  3 00:00:20 UTC 2017] _CURL='curl -L --silent --dump-header /shared/letsencrypt/http.header '
[Mon Jul  3 00:00:23 UTC 2017] ret='0'
[Mon Jul  3 00:00:23 UTC 2017] Pending
[Mon Jul  3 00:00:23 UTC 2017] sleep 2 secs to verify
[Mon Jul  3 00:00:25 UTC 2017] checking
[Mon Jul  3 00:00:25 UTC 2017] GET
[Mon Jul  3 00:00:25 UTC 2017] url='hs://acme-v01.api.letsencrypt.org/acme/challenge/6TG*'
[Mon Jul  3 00:00:25 UTC 2017] timeout
[Mon Jul  3 00:00:25 UTC 2017] _CURL='curl -L --silent --dump-header /shared/letsencrypt/http.header '
[Mon Jul  3 00:00:29 UTC 2017] ret='0'
[Mon Jul  3 00:00:29 UTC 2017] discussion.ambridgereporter.org.uk:Verify error:Fetching hp://discussion.ambridgereporter.org.uk/.well-known/acme-challenge/n9d*: Timeout
[Mon Jul  3 00:00:29 UTC 2017] pid
[Mon Jul  3 00:00:29 UTC 2017] No need to restore nginx, skip.
[Mon Jul  3 00:00:29 UTC 2017] _clearupdns
[Mon Jul  3 00:00:29 UTC 2017] skip dns.
[Mon Jul  3 00:00:29 UTC 2017] _on_issue_err
[Mon Jul  3 00:00:29 UTC 2017] Please check log file for more details: /shared/letsencrypt/acme.sh.log
[Mon Jul  3 00:00:29 UTC 2017] url='hs://acme-v01.api.letsencrypt.org/acme/challenge/6TG*'
[Mon Jul  3 00:00:29 UTC 2017] payload='{"resource": "challenge", "keyAuthorization": "n9d*"}'
[Mon Jul  3 00:00:29 UTC 2017] POST
[Mon Jul  3 00:00:29 UTC 2017] url='hs://acme-v01.api.letsencrypt.org/acme/challenge/6TG*'
[Mon Jul  3 00:00:29 UTC 2017] _CURL='curl -L --silent --dump-header /shared/letsencrypt/http.header '
[Mon Jul  3 00:00:32 UTC 2017] _ret='0'
[Mon Jul  3 00:00:32 UTC 2017] code='400'
[Mon Jul  3 00:00:32 UTC 2017] Return code: 1
[Mon Jul  3 00:00:32 UTC 2017] Error renew discussion.ambridgereporter.org.uk.
[Mon Jul  3 00:00:32 UTC 2017] ===End cron===

#4

Still no joy. I tried installing netcat inside the docker container and stopping nginx, but apart from suppressing the “install netcat” warning it didn’t help:

I’m changing the server name to “DAOU” so that I don’t trip the too-many-links warning.

(root prompt) /shared/letsencrypt/acme.sh --cron --home /shared/letsencrypt/
[Wed Jul 12 10:31:02 UTC 2017] ===Starting cron===
[Wed Jul 12 10:31:02 UTC 2017] Installing from online archive.
[Wed Jul 12 10:31:02 UTC 2017] Downloading https://github.com/Neilpang/acme.sh/archive/master.tar.gz
[Wed Jul 12 10:31:04 UTC 2017] Extracting master.tar.gz
[Wed Jul 12 10:31:04 UTC 2017] Installing to /shared/letsencrypt/
[Wed Jul 12 10:31:04 UTC 2017] Installed to /shared/letsencrypt//acme.sh
[Wed Jul 12 10:31:04 UTC 2017] Installing alias to '/root/.profile'
[Wed Jul 12 10:31:04 UTC 2017] OK, Close and reopen your terminal to start using acme.sh
[Wed Jul 12 10:31:04 UTC 2017] Good, bash is found, so change the shebang to use bash as preferred.
[Wed Jul 12 10:31:04 UTC 2017] OK
[Wed Jul 12 10:31:04 UTC 2017] Install success!
[Wed Jul 12 10:31:04 UTC 2017] Upgrade success!
[Wed Jul 12 10:31:04 UTC 2017] Auto upgraded to: 2.7.3
[Wed Jul 12 10:31:04 UTC 2017] Renew: 'DAOU'
[Wed Jul 12 10:31:04 UTC 2017] Single domain='DAOU'
[Wed Jul 12 10:31:04 UTC 2017] Getting domain auth token for each domain
[Wed Jul 12 10:31:04 UTC 2017] Getting webroot for domain='DAOU'
[Wed Jul 12 10:31:04 UTC 2017] Getting new-authz for domain='DAOU'
[Wed Jul 12 10:31:06 UTC 2017] The new-authz request is ok.
[Wed Jul 12 10:31:06 UTC 2017] Verifying:DAOU
[Wed Jul 12 10:31:10 UTC 2017] Pending
[Wed Jul 12 10:31:15 UTC 2017] Pending
[Wed Jul 12 10:31:23 UTC 2017] DAOU:Verify error:Fetching hp://DAOU/.well-known/acme-challenge/t1L*: Timeout
[Wed Jul 12 10:31:23 UTC 2017] Please check log file for more details: /shared/letsencrypt/acme.sh.log
[Wed Jul 12 10:31:24 UTC 2017] Error renew DAOU.
[Wed Jul 12 10:31:24 UTC 2017] ===End cron===

Anyone have a clue? In a normal system I could try a different cert-renewal scheme, like the one I wrote that works on all my other servers, but here I’m stuck with acme unless I move completely outside the discourse ecosystem and run it manually.


#5

Aha!

I tried moving outside discourse and using my own client, which does proper logging, and it appears that (a) there’s an IPv6 routeing problem and (b) LE is trying v6 and, rather than failing over to v4 as one might expect and as test clients do, is simply timing out on the v6. While I try to fix the v6 problem, I’ve temporarily removed the v6 address from DNS, so we’ll see if tomorrow morning’s auto renewal works.
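
A pre-flight check for the failure mode described in this post, added here as an example and not part of the thread, Discourse or acme.sh. It resolves every A and AAAA record of the host and fetches the challenge URL over each address separately, without the fallback that ordinary HTTP clients do, so a dead IPv6 route shows up as a "timeout" on that address rather than being masked by a working IPv4 one. It assumes the validator connects to a single published address and reports Timeout if that address does not answer; the script name `check_http01.py`, the `resolver`/`fetcher` injection points and the addresses in the usage below are this example's own.

```python
"""Pre-flight check for an HTTP-01 challenge: fetch the challenge file over
every published address of the host, one address at a time, with no
fallback between address families.

Added example, not code from the thread, Discourse or acme.sh. Assumes the
validator behaves as described in the post: it picks one address (here the
AAAA record) and reports Timeout if that address does not answer, instead
of retrying over IPv4. For HTTP-01 the file body must be the key
authorization, i.e. "<token>.<account key thumbprint>".
"""
import socket
import sys
from typing import Callable, Dict, List, Tuple

ACME_PATH = "/.well-known/acme-challenge/"
FAMILY_NAME = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def resolve(host: str) -> List[Tuple[int, str]]:
    """Every distinct (family, address) the name publishes, A and AAAA alike."""
    seen: List[Tuple[int, str]] = []
    for family, _, _, _, sockaddr in socket.getaddrinfo(host, 80, 0, socket.SOCK_STREAM):
        pair = (family, sockaddr[0])
        if pair not in seen:
            seen.append(pair)
    return seen


def fetch_over_address(host: str, family: int, addr: str, token: str,
                       timeout: float = 10.0) -> Tuple[str, str]:
    """Connect to exactly one address and GET the challenge file.

    Deliberately avoids http.client/urllib: they walk the getaddrinfo() list
    and quietly fall back to IPv4 when IPv6 fails, which hides exactly the
    failure the CA sees. Returns (verdict, body).
    """
    request = ("GET %s%s HTTP/1.0\r\nHost: %s\r\n\r\n" % (ACME_PATH, token, host)).encode()
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect((addr, 80))
            sock.sendall(request)
            raw = b""
            while True:
                chunk = sock.recv(4096)
                if not chunk:
                    break
                raw += chunk
    except socket.timeout:
        return "timeout", ""
    except ConnectionRefusedError:
        return "refused", ""
    except OSError as exc:
        return "error: %s" % exc, ""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii", "replace")
    fields = status_line.split()
    code = fields[1] if len(fields) > 1 else "?"
    return ("ok" if code == "200" else "http %s" % code), body.decode("utf-8", "replace")


def check_host(host: str, token: str, key_authorization: str,
               resolver: Callable = resolve,
               fetcher: Callable = fetch_over_address) -> Dict[str, Tuple[str, str]]:
    """Map each published address to (family, verdict).

    resolver and fetcher are injectable so the logic can be exercised
    without network access (see the usage note).
    """
    results: Dict[str, Tuple[str, str]] = {}
    for family, addr in resolver(host):
        verdict, body = fetcher(host, family, addr, token)
        if verdict == "ok" and body.strip() != key_authorization:
            verdict = "wrong body"
        results[addr] = (FAMILY_NAME.get(family, str(family)), verdict)
    return results


def failing_families(results: Dict[str, Tuple[str, str]]) -> List[str]:
    """Address families in which no published address served the token.

    A family listed here is a renewal failure waiting to happen: the CA may
    pick that record and will not fall back to the other family.
    """
    by_family: Dict[str, List[str]] = {}
    for family, verdict in results.values():
        by_family.setdefault(family, []).append(verdict)
    return sorted(f for f, verdicts in by_family.items() if "ok" not in verdicts)


if __name__ == "__main__":
    if len(sys.argv) != 4:
        sys.exit("usage: check_http01.py HOST TOKEN KEY_AUTHORIZATION")
    res = check_host(*sys.argv[1:4])
    for address, (fam, verdict) in res.items():
        print("%-5s %-40s %s" % (fam, address, verdict))
    bad = failing_families(res)
    if bad:
        sys.exit("no working address in: %s" % ", ".join(bad))
    print("challenge reachable over every published address")
```

Run it as `python3 check_http01.py HOST TOKEN KEY_AUTHORIZATION` with the token acme.sh wrote to `.well-known/acme-challenge/` while the file is still in place; with an unroutable AAAA record it prints `ok` for the IPv4 address and `timeout` for the IPv6 one, which is the situation that produced the "Fetching ... Timeout" lines above. A `http 301` verdict is not itself a failure (the CA follows redirects), but the redirect target then has to pass the same per-address test. The fix is to make the AAAA address actually reachable on port 80, or to drop the record until it is; this check is cheap enough to run from cron before each renewal attempt.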


#6

Yes, that did the trick.