Installation generated a corrupt, zero-byte certificate

Let’s call my site example.com for this post; it uses an FQDN and everything was working before. While debugging some e-mail problems, I have redeployed and rebuilt several times over the last 24 hours.

I installed Discourse following docs/INSTALL-cloud.md in the discourse/discourse repository on GitHub.

Now my site does not load (no response on ports 80 or 443). The nginx log says:

2019/05/11 14:49:14 [emerg] 7866#7866: cannot load certificate "/shared/ssl/example.com.cer": PEM_read_bio_X509_AUX() failed (SSL: error:0906D06C:PEM routines:PEM_read_bio:no start line:Expecting: TRUSTED CERTIFICATE)

Entering the app container and checking the file shows that it is empty, at zero bytes:

-rw-r--r-- 1 root root    0 May 11 13:59 /shared/ssl/example.com.cer
-rw------- 1 root root 3243 May 11 13:59 /shared/ssl/example.com.key
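
The same check can be scripted. This is a minimal sketch, not part of the Discourse tooling: `check_cert` is a hypothetical helper name, and it only looks for a missing file, a zero-byte file, or a missing PEM start line (the condition nginx complains about above). It does not validate the certificate itself.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Discourse): classify a certificate
# file before nginx tries to load it.
check_cert() {
  cert="$1"
  if [ ! -e "$cert" ]; then
    echo "missing"
    return 1
  fi
  # -s is true only when the file exists and is larger than zero bytes.
  if [ ! -s "$cert" ]; then
    echo "empty"
    return 1
  fi
  # nginx expects a PEM start line; accept both forms named in the error.
  if ! grep -Eq -- '-----BEGIN (TRUSTED )?CERTIFICATE-----' "$cert"; then
    echo "no PEM start line"
    return 1
  fi
  echo "ok"
}
```

Inside the container it could be called as `check_cert /shared/ssl/example.com.cer`, which for the listing above would report the file as empty.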

I am lost now and have not found any solutions, so I am asking here:

Can I trigger a certificate renewal using some built-in tool of the Discourse Docker setup? If not, can I do something once to fix this and then be sure that renewals will be handled automatically by the setup, as intended?

Is there an installation log? I searched but found no mention of one. I expect to find some Let’s Encrypt related errors and would like to investigate. Maybe I hit some limit.


Are both the Let’s Encrypt and SSL templates loaded in your app.yml?

The easiest thing is usually to delete (or rename) your app.yml, make sure the container isn’t running, and run discourse-setup again.

Are you behind a reverse proxy or something like Cloudflare?

Yes.

Did that, same result. :frowning:

Nope, just a normal cloud server.

It’s possible that you reached the Let’s Encrypt rate limits, though if the cert were already there it wouldn’t be trying again.

You might try removing the ssl and letsencrypt directories in shared/standalone.

Done, rebuilt, same.

Is there really no log of the initial installation stored somewhere?

Did you unpublish :80 in some way? Either by commenting out the line in the expose: block, altering the firewall on the server, or something along those lines?

Nope, I did nothing manually to the configuration. The server is a plain, updated Ubuntu 18.04.

Ha! I did not know that ./launcher logs app would show much more than the production or nginx log.

Look at this beauty, I did indeed run into rate limiting:

run-parts: executing /etc/runit/1.d/letsencrypt
[Sat May 11 22:58:13 UTC 2019] Create account key ok.
[Sat May 11 22:58:13 UTC 2019] Registering account
[Sat May 11 22:58:15 UTC 2019] Registered
[Sat May 11 22:58:15 UTC 2019] ACCOUNT_THUMBPRINT='STRIPPED'
[Sat May 11 22:58:15 UTC 2019] Creating domain key
[Sat May 11 22:58:15 UTC 2019] The domain key is here: /shared/letsencrypt/example.com/example.com.key
[Sat May 11 22:58:15 UTC 2019] Single domain='example.com'
[Sat May 11 22:58:15 UTC 2019] Getting domain auth token for each domain
[Sat May 11 22:58:16 UTC 2019] Getting webroot for domain='example.com'
[Sat May 11 22:58:16 UTC 2019] Verifying: example.com
[Sat May 11 22:58:19 UTC 2019] Success
[Sat May 11 22:58:19 UTC 2019] Verify finished, start to sign.
[Sat May 11 22:58:19 UTC 2019] Lets finalize the order, Le_OrderFinalize: https://acme-v02.api.letsencrypt.org/acme/finalize/STRIPPED/STRIPPED
[Sat May 11 22:58:20 UTC 2019] Sign failed, finalize code is not 200.
[Sat May 11 22:58:20 UTC 2019] {
  "type": "urn:ietf:params:acme:error:rateLimited",
  "detail": "Error finalizing order :: too many certificates already issued for exact set of domains: example.com: see https://letsencrypt.org/docs/rate-limits/",
  "status": 429
}
[Sat May 11 22:58:20 UTC 2019] Please check log file for more details: /shared/letsencrypt/acme.sh.log
Error loading file ca.cer
140536865126040:error:02001002:system library:fopen:No such file or directory:bss_file.c:175:fopen('ca.cer','r')
140536865126040:error:2006D080:BIO routines:BIO_new_file:no such file:bss_file.c:178:
140536865126040:error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib:by_file.c:253:
usage: verify [-verbose] [-CApath path] [-CAfile file] [-purpose purpose] [-crl_check] [-no_alt_chains] [-attime timestamp] [-engine e] cert1 cert2 ...
recognized usages:
	sslclient 	SSL client
	sslserver 	SSL server
	nssslserver	Netscape SSL server
	smimesign 	S/MIME signing
	smimeencrypt	S/MIME encryption
	crlsign   	CRL signing
	any       	Any Purpose
	ocsphelper	OCSP helper
	timestampsign	Time Stamp signing
[Sat May 11 22:58:21 UTC 2019] Single domain='example.com'
[Sat May 11 22:58:21 UTC 2019] Getting domain auth token for each domain
[Sat May 11 22:58:23 UTC 2019] Getting webroot for domain='example.com'
[Sat May 11 22:58:23 UTC 2019] example.com is already verified, skip http-01.
[Sat May 11 22:58:23 UTC 2019] Verify finished, start to sign.
[Sat May 11 22:58:23 UTC 2019] Lets finalize the order, Le_OrderFinalize: https://acme-v02.api.letsencrypt.org/acme/finalize/STRIPPED/STRIPPED
[Sat May 11 22:58:24 UTC 2019] Sign failed, finalize code is not 200.
[Sat May 11 22:58:24 UTC 2019] {
  "type": "urn:ietf:params:acme:error:rateLimited",
  "detail": "Error finalizing order :: too many certificates already issued for exact set of domains: example.com: see https://letsencrypt.org/docs/rate-limits/",
  "status": 429
}
[Sat May 11 22:58:24 UTC 2019] Please check log file for more details: /shared/letsencrypt/acme.sh.log
[Sat May 11 22:58:24 UTC 2019] Installing key to:/shared/ssl/example.com.key
[Sat May 11 22:58:24 UTC 2019] Installing full chain to:/shared/ssl/example.com.cer
cat: /shared/letsencrypt/example.com/fullchain.cer: No such file or directory
Started runsvdir, PID is 1928
ok: run: redis: (pid 1940) 0s
ok: run: postgres: (pid 1937) 0s
nginx: [emerg] cannot load certificate "/shared/ssl/example.com.cer": PEM_read_bio_X509_AUX() failed (SSL: error:0906D06C:PEM routines:PEM_read_bio:no start line:Expecting: TRUSTED CERTIFICATE)
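
For anyone who wants to pull the relevant line out of a long log like the one above, here is a rough sketch. `acme_rate_limited` is a hypothetical function name; it assumes the log contains the JSON error in the same shape as shown here, with the "detail" line following the "type" line.

```shell
#!/bin/sh
# Hypothetical filter: reads log text on stdin and prints the "detail" line
# that follows an ACME rateLimited error type. Exit status is 0 when at
# least one such error was found, 1 otherwise.
acme_rate_limited() {
  awk '
    /urn:ietf:params:acme:error:rateLimited/ { hit = 1; next }
    hit && /"detail"/ { print; found = 1; hit = 0 }
    END { exit (found ? 0 : 1) }
  '
}
```

It can be fed from the command mentioned above, for example `./launcher logs app 2>&1 | acme_rate_limited`.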

/shared/letsencrypt/acme.sh.log is a bit more verbose but hey, this problem is clear enough now. I will salvage a previous cert from a backup and see if Discourse will pick it up on a rebuild.

However, these lines hint that errors are not handled cleanly and instead bleed into the commands that follow:

usage: verify [-verbose] [-CApath path] [-CAfile file] [-purpose purpose] [-crl_check] [-no_alt_chains] [-attime timestamp] [-engine e] cert1 cert2 ...

and

cat: /shared/letsencrypt/example.com/fullchain.cer: No such file or directory

That should probably get some error handling?
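
To illustrate the kind of guard I mean, here is a sketch only. I have not read the actual letsencrypt script, so `install_cert_if_valid` is a hypothetical name and the assumption that the zero-byte file comes from a redirect creating the destination before `cat` fails is just my guess from the output above.

```shell
#!/bin/sh
# Hypothetical guard: copy a freshly issued chain into place only when the
# source exists and is non-empty; otherwise leave the old file untouched.
install_cert_if_valid() {
  src="$1"
  dst="$2"
  if [ ! -s "$src" ]; then
    echo "skip: $src is missing or empty, leaving $dst untouched" >&2
    return 1
  fi
  # Write to a temporary file first so a failed copy never truncates $dst.
  tmp="$dst.tmp.$$"
  if cat "$src" > "$tmp"; then
    mv "$tmp" "$dst"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

With something like this, a failed signing step would keep whatever certificate was already installed instead of replacing it with an empty file, and nginx could keep serving until the rate limit expires.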
