Rebuild failing due to MaxMind DB

My launcher rebuild app has failed multiple times because of an error related to the MaxMind DB:

Done compressing application-d5be6ae5cb1fddec6f1ddadfdb8fa2e99cbefcb56633aff5b5341fde6c39c33e.js : 23.41 secs

Done compressing all JS files : 79.32 secs

184:M 10 Jun 2019 17:44:00.087 * 10 changes in 300 seconds. Saving...
184:M 10 Jun 2019 17:44:00.088 * Background saving started by pid 1148
1148:C 10 Jun 2019 17:44:00.097 * DB saved on disk
1148:C 10 Jun 2019 17:44:00.097 * RDB: 0 MB of memory used by copy-on-write
184:M 10 Jun 2019 17:44:00.189 * Background saving terminated with success
#<Thread:0x000055ffabca0ed0@/var/www/discourse/lib/tasks/assets.rake:214 run> terminated with exception (report_on_exception is true):
/var/www/discourse/lib/discourse.rb:31:in `execute_command': /var/www/discourse/lib/discourse_ip_info.rb:38:in `mmdb_download':  (RuntimeError)
gzip: /tmp/GeoLite2-City.gz20190610-491-1j7nws4.gz: unexpected end of file
	from /var/www/discourse/lib/discourse_ip_info.rb:38:in `mmdb_download'
	from /var/www/discourse/lib/tasks/assets.rake:217:in `block (3 levels) in <top (required)>'
	from /var/www/discourse/lib/tasks/assets.rake:216:in `each'
	from /var/www/discourse/lib/tasks/assets.rake:216:in `block (2 levels) in <top (required)>'
rake aborted!
/var/www/discourse/lib/discourse_ip_info.rb:38:in `mmdb_download': 
gzip: /tmp/GeoLite2-City.gz20190610-491-1j7nws4.gz: unexpected end of file
/var/www/discourse/lib/discourse.rb:31:in `execute_command'
/var/www/discourse/lib/discourse_ip_info.rb:38:in `mmdb_download'
/var/www/discourse/lib/tasks/assets.rake:217:in `block (3 levels) in <top (required)>'
/var/www/discourse/lib/tasks/assets.rake:216:in `each'
/var/www/discourse/lib/tasks/assets.rake:216:in `block (2 levels) in <top (required)>'
Tasks: TOP => assets:precompile
(See full trace by running task with --trace)
I, [2019-06-10T17:44:47.244706 #14]  INFO -- : Downloading MaxMindDB...
Compressing Javascript and Generating Source Maps

I, [2019-06-10T17:44:47.245661 #14]  INFO -- : Terminating async processes
I, [2019-06-10T17:44:47.245978 #14]  INFO -- : Sending INT to HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/10/bin/postmaster -D /etc/postgresql/10/main pid: 68
I, [2019-06-10T17:44:47.246283 #14]  INFO -- : Sending TERM to exec chpst -u redis -U redis /usr/bin/redis-server /etc/redis/redis.conf pid: 184
2019-06-10 17:44:47.246 UTC [68] LOG:  received fast shutdown request
184:signal-handler (1560188687) Received SIGTERM scheduling shutdown...
2019-06-10 17:44:47.248 UTC [68] LOG:  aborting any active transactions
2019-06-10 17:44:47.252 UTC [68] LOG:  worker process: logical replication launcher (PID 77) exited with exit code 1
2019-06-10 17:44:47.255 UTC [72] LOG:  shutting down
2019-06-10 17:44:47.268 UTC [68] LOG:  database system is shut down
184:M 10 Jun 2019 17:44:47.333 # User requested shutdown...
184:M 10 Jun 2019 17:44:47.333 * Saving the final RDB snapshot before exiting.
184:M 10 Jun 2019 17:44:47.341 * DB saved on disk
184:M 10 Jun 2019 17:44:47.342 # Redis is now ready to exit, bye bye...


FAILED
--------------------
Pups::ExecError: cd /var/www/discourse && su discourse -c 'bundle exec rake assets:precompile' failed with return #<Process::Status: pid 489 exit 1>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params {"cd"=>"$home", "hook"=>"assets_precompile", "cmd"=>["su discourse -c 'bundle exec rake assets:precompile'"]}
c13084f0c50befc27d34645224f4b1680c28eda7e05030e8eb0114ff0e311d96
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one

If I download it on that server using wget, it untars fine.

EDIT: nope, I was downloading from the wrong path (https://geolite.maxmind.com/download/geoip/database/GeoLite2-Country.tar.gz), whereas the rebuild uses:

○ → wget https://geolite.maxmind.com/geoip/databases/GeoLite2-City/update
--2019-06-10 14:36:54--  https://geolite.maxmind.com/geoip/databases/GeoLite2-City/update
Resolving geolite.maxmind.com (geolite.maxmind.com)... 104.17.201.89, 104.17.200.89, 2606:4700::6811:c959, ...
Connecting to geolite.maxmind.com (geolite.maxmind.com)|104.17.201.89|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28565904 (27M) [application/gzip]
Saving to: ‘update’

update                                    41%[===============================>                                                ]  11.17M  67.4KB/s    eta 4m 27s 

… which is evidently throttled to roughly 64 KB/s. That’s harsh on rebuild times.

EDIT: it seems the file is no longer throttled. I was able to pull it from multiple places at full speed, and the rebuild succeeded as well.

(we should still fix the fact that a failed download makes the whole build :boom:)
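The gzip error in the log ("unexpected end of file") means the archive on disk was truncated, most likely because the slow download was cut off partway. One way to keep a flaky download from killing the build is to verify that the archive decompresses to the end and retry a few times before giving up. The following is a minimal Python sketch of that idea, not Discourse's actual code (which is Ruby, in lib/discourse_ip_info.rb); the function names, retry count, and delay are hypothetical:

```python
import gzip
import shutil
import time
import urllib.request
import zlib


def gzip_is_complete(path: str) -> bool:
    """Return True if the gzip file at `path` decompresses all the way to the end."""
    try:
        with gzip.open(path, "rb") as f:
            # Read in 1 MiB chunks until EOF; a truncated stream raises EOFError.
            while f.read(1 << 20):
                pass
        return True
    except (EOFError, OSError, zlib.error):
        # EOFError: stream ended early (gzip(1) reports "unexpected end of file").
        # OSError: includes gzip.BadGzipFile, i.e. not a gzip file at all.
        return False


def download_with_retries(url: str, dest: str, attempts: int = 3, delay: float = 5.0) -> None:
    """Download `url` to `dest`, retrying until a complete gzip archive arrives.

    Hypothetical helper; raises RuntimeError only after every attempt has failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=60) as resp, open(dest, "wb") as out:
                shutil.copyfileobj(resp, out)
            if gzip_is_complete(dest):
                return
        except OSError:
            pass  # network error (URLError and timeouts are OSError subclasses); retry
        if attempt < attempts:
            time.sleep(delay)
    raise RuntimeError(f"could not fetch a complete archive from {url}")
```

Whether the build should then fail or fall back to the DB already in the image after the last attempt is exactly the policy question discussed below; the sketch only makes a single truncated download non-fatal.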

To me, the only fix left here is to stop refreshing the MaxMind DB during precompile by default and rely on the somewhat stale DB in the base image.

Maybe have an environment variable for people who really want it fresh? It seems like some people really care but others, not so much.

It gets updated during runtime by a scheduled job so it doesn’t matter if it’s a bit stale during build.

The problem is that we would be allowing inconsistent state: the location shows up right, you rebuild, and then the location is wrong.

I much prefer consistency.

I’ll take a working instance with a slightly stale database over a failed rebuild any day of the week.

You can already do this today:

https://github.com/discourse/discourse/blob/7b17eb06da6f83350f6ed8e6c523e77022cdc970/config/discourse_defaults.conf#L237-L237

Set DISCOURSE_REFRESH_MAXMIND_DB_DURING_PRECOMPILE_DAYS to taste.

Set to 0 for… just don’t do anything during precompile; rely on the base image for the MaxMind DB.

Set to 100 for… I don’t care, this can be pretty old, but not SUPER old.
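Going by the description above, the setting acts as a maximum age: refresh during precompile only if the DB is older than N days, and never when N is 0. A minimal Python sketch of that decision, assuming that reading of the setting (the function name is hypothetical, treating a missing file as "refresh" is also an assumption, and the real check lives in Discourse's Ruby code):

```python
import os
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def should_refresh(db_path: str, max_age_days: int, now: Optional[float] = None) -> bool:
    """Decide whether precompile should re-download the MaxMind DB.

    Assumed semantics (not confirmed from Discourse's source):
      * max_age_days <= 0 -> never refresh; keep the DB shipped in the base image
      * DB file missing   -> refresh (assumption)
      * otherwise         -> refresh only if the file is at least max_age_days old
    """
    if max_age_days <= 0:
        return False
    if not os.path.exists(db_path):
        return True
    if now is None:
        now = time.time()
    age_days = (now - os.path.getmtime(db_path)) / SECONDS_PER_DAY
    return age_days >= max_age_days
```

In a build script, the value passed as max_age_days would come from DISCOURSE_REFRESH_MAXMIND_DB_DURING_PRECOMPILE_DAYS. A larger value means fewer downloads during rebuilds, and therefore fewer chances for a slow or truncated download to fail the build.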


The open discussion here is:

  1. Should we add an “I don’t care if the MaxMind update fails during precompile” option?

  2. Should we add a “scheduled job” that updates the MaxMind DB once it is N days old?
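Option (2) could be sketched roughly as follows, as a minimal Python illustration rather than how Discourse would implement it. The fetch callback, the function name, and the return convention are hypothetical. The point of the sketch is that the new file is written to a temporary path and swapped in with os.replace, so a failed or truncated download never clobbers the DB that is already working:

```python
import os
import tempfile
import time
from typing import Callable

SECONDS_PER_DAY = 86_400


def refresh_if_stale(db_path: str, max_age_days: int, fetch: Callable[[str], None]) -> bool:
    """Replace db_path with a fresh copy if it is at least max_age_days old.

    fetch(tmp_path) is a hypothetical download callback: it must write the new
    database to tmp_path and raise on failure. Returns True only if the file
    was actually replaced.
    """
    if os.path.exists(db_path):
        age_days = (time.time() - os.path.getmtime(db_path)) / SECONDS_PER_DAY
        if age_days < max_age_days:
            return False  # still fresh enough; don't hit the download server
    # Create the temp file next to the target so os.replace stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(db_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        fetch(tmp_path)
        os.replace(tmp_path, db_path)  # atomic rename on POSIX within one filesystem
        return True
    except Exception:
        # Keep serving the old database; a failed update is not fatal for a job.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        return False
```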

I am against (1), because it leads to “inconsistent state post rebuild”. We are used to having a very consistent state after rebuilds, and this adds a wild card.

I am not strongly against (2), but one issue for our own hosting is that we could not even use it, because it would likely get us banned by MaxMind.

So I am not sure what more to do here.

If self-hosters were complaining a lot about rebuilds failing due to MaxMind, I would be open to changing the default for DISCOURSE_REFRESH_MAXMIND_DB_DURING_PRECOMPILE_DAYS to 0.

Looks like this is such a complaint:
https://meta.discourse.org/t/restore-db-problem/120563/7

This appears to be MMDB-related as well. Pardon the screenshot, but it’s what the client sent. It appears he tried again and the upgrade worked.