Installation produced broken, zero byte certificate

Let’s call my site example.com for this post, it uses a FQDN and things worked before. Due to some debugging of mail issues I have been redeploying and rebuilding many times in the past 24 hours.

I installed Discourse following discourse/INSTALL-cloud.md at master · discourse/discourse · GitHub

Now my site does not load (no reply on port 80 or 443). nginx’s log says:

2019/05/11 14:49:14 [emerg] 7866#7866: cannot load certificate "/shared/ssl/example.com.cer": PEM_read_bio_X509_AUX() failed (SSL: error:0906D06C:PEM routines:PEM_read_bio:no start line:Expecting: TRUSTED CERTIFICATE)

Entering the app and looking at the file, it is empty with zero/0 bytes:

-rw-r--r-- 1 root root    0 May 11 13:59 /shared/ssl/example.com.cer
-rw------- 1 root root 3243 May 11 13:59 /shared/ssl/example.com.key

I am lost now and found no solutions so here I ask:

Can I trigger a renewal of the certificate using some built-in tool of the Discourse docker setup? If not, can I do something once to fix this and then be assured that renewals will be handled automatically by the setup as intended by it?

Is there an installation log? I searched but found no mention. I expect some errors related to letsencrypt and would like to investigate. Maybe I reached some limit.

Are both the let’s encrypt and ssl templates loaded in your app.yml?

The easiest thing is usually to delete (or rename) your app.yml, see that the container isn’t running, and run discourse-setup again

Are you behind a reverse proxy or something like Cloudflare?

Yes.

Did that, same result. :frowning:

Nope, just a normal cloud server.

It’s possible that you reached keys encrypt rate limits, though if the cert was there it wouldn’t be trying again.

You might try removing the ssl and letsencrypt directories in shared/standalone

Done, rebuilt, same.

Is there really no log of the initial installation stored somewhere?

Did you unpublish :80 in some way? Either by commenting out the line in the expose: block, altering the firewall on the server, or something along those lines?

Nope, I did nothing manually to the configuration. The server is a plain, updated Ubuntu 18.04.

Ha! I did not know that ./launcher logs app would show much more than the production or nginx log.

Look at this beauty, I got into rate-limiting indeed:

run-parts: executing /etc/runit/1.d/letsencrypt
[Sat May 11 22:58:13 UTC 2019] Create account key ok.
[Sat May 11 22:58:13 UTC 2019] Registering account
[Sat May 11 22:58:15 UTC 2019] Registered
[Sat May 11 22:58:15 UTC 2019] ACCOUNT_THUMBPRINT='STRIPPED'
[Sat May 11 22:58:15 UTC 2019] Creating domain key
[Sat May 11 22:58:15 UTC 2019] The domain key is here: /shared/letsencrypt/example.com/example.com.key
[Sat May 11 22:58:15 UTC 2019] Single domain='example.com'
[Sat May 11 22:58:15 UTC 2019] Getting domain auth token for each domain
[Sat May 11 22:58:16 UTC 2019] Getting webroot for domain='example.com'
[Sat May 11 22:58:16 UTC 2019] Verifying: example.com
[Sat May 11 22:58:19 UTC 2019] Success
[Sat May 11 22:58:19 UTC 2019] Verify finished, start to sign.
[Sat May 11 22:58:19 UTC 2019] Lets finalize the order, Le_OrderFinalize: https://acme-v02.api.letsencrypt.org/acme/finalize/STRIPPED/STRIPPED
[Sat May 11 22:58:20 UTC 2019] Sign failed, finalize code is not 200.
[Sat May 11 22:58:20 UTC 2019] {
  "type": "urn:ietf:params:acme:error:rateLimited",
  "detail": "Error finalizing order :: too many certificates already issued for exact set of domains: example.com: see https://letsencrypt.org/docs/rate-limits/",
  "status": 429
}
[Sat May 11 22:58:20 UTC 2019] Please check log file for more details: /shared/letsencrypt/acme.sh.log
Error loading file ca.cer
140536865126040:error:02001002:system library:fopen:No such file or directory:bss_file.c:175:fopen('ca.cer','r')
140536865126040:error:2006D080:BIO routines:BIO_new_file:no such file:bss_file.c:178:
140536865126040:error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib:by_file.c:253:
usage: verify [-verbose] [-CApath path] [-CAfile file] [-purpose purpose] [-crl_check] [-no_alt_chains] [-attime timestamp] [-engine e] cert1 cert2 ...
recognized usages:
	sslclient 	SSL client
	sslserver 	SSL server
	nssslserver	Netscape SSL server
	smimesign 	S/MIME signing
	smimeencrypt	S/MIME encryption
	crlsign   	CRL signing
	any       	Any Purpose
	ocsphelper	OCSP helper
	timestampsign	Time Stamp signing
[Sat May 11 22:58:21 UTC 2019] Single domain='example.com'
[Sat May 11 22:58:21 UTC 2019] Getting domain auth token for each domain
[Sat May 11 22:58:23 UTC 2019] Getting webroot for domain='example.com'
[Sat May 11 22:58:23 UTC 2019] example.com is already verified, skip http-01.
[Sat May 11 22:58:23 UTC 2019] Verify finished, start to sign.
[Sat May 11 22:58:23 UTC 2019] Lets finalize the order, Le_OrderFinalize: https://acme-v02.api.letsencrypt.org/acme/finalize/STRIPPED/STRIPPED
[Sat May 11 22:58:24 UTC 2019] Sign failed, finalize code is not 200.
[Sat May 11 22:58:24 UTC 2019] {
  "type": "urn:ietf:params:acme:error:rateLimited",
  "detail": "Error finalizing order :: too many certificates already issued for exact set of domains: example.com: see https://letsencrypt.org/docs/rate-limits/",
  "status": 429
}
[Sat May 11 22:58:24 UTC 2019] Please check log file for more details: /shared/letsencrypt/acme.sh.log
[Sat May 11 22:58:24 UTC 2019] Installing key to:/shared/ssl/example.com.key
[Sat May 11 22:58:24 UTC 2019] Installing full chain to:/shared/ssl/example.com.cer
cat: /shared/letsencrypt/example.com/fullchain.cer: No such file or directory
Started runsvdir, PID is 1928
ok: run: redis: (pid 1940) 0s
ok: run: postgres: (pid 1937) 0s
nginx: [emerg] cannot load certificate "/shared/ssl/example.com.cer": PEM_read_bio_X509_AUX() failed (SSL: error:0906D06C:PEM routines:PEM_read_bio:no start line:Expecting: TRUSTED CERTIFICATE)

/shared/letsencrypt/acme.sh.log is a bit more verbose but hey, this problem is clear enough now. I will salvage a previous cert from a backup and see if Discourse will pick it up on a rebuild.

However these lines hint at errors not being handled in a nice way but bleeding into following commands:

usage: verify [-verbose] [-CApath path] [-CAfile file] [-purpose purpose] [-crl_check] [-no_alt_chains] [-attime timestamp] [-engine e] cert1 cert2 ...

and

cat: /shared/letsencrypt/example.com/fullchain.cer: No such file or directory

That should probably get some error handling?

1 Like