Rebuild failing due to MaxMind DB

Running ./launcher rebuild app has failed multiple times with an error related to the MaxMind DB download:

Done compressing application-d5be6ae5cb1fddec6f1ddadfdb8fa2e99cbefcb56633aff5b5341fde6c39c33e.js : 23.41 secs

Done compressing all JS files : 79.32 secs

184:M 10 Jun 2019 17:44:00.087 * 10 changes in 300 seconds. Saving...
184:M 10 Jun 2019 17:44:00.088 * Background saving started by pid 1148
1148:C 10 Jun 2019 17:44:00.097 * DB saved on disk
1148:C 10 Jun 2019 17:44:00.097 * RDB: 0 MB of memory used by copy-on-write
184:M 10 Jun 2019 17:44:00.189 * Background saving terminated with success
#<Thread:0x000055ffabca0ed0@/var/www/discourse/lib/tasks/assets.rake:214 run> terminated with exception (report_on_exception is true):
/var/www/discourse/lib/discourse.rb:31:in `execute_command': /var/www/discourse/lib/discourse_ip_info.rb:38:in `mmdb_download':  (RuntimeError)
gzip: /tmp/GeoLite2-City.gz20190610-491-1j7nws4.gz: unexpected end of file
	from /var/www/discourse/lib/discourse_ip_info.rb:38:in `mmdb_download'
	from /var/www/discourse/lib/tasks/assets.rake:217:in `block (3 levels) in <top (required)>'
	from /var/www/discourse/lib/tasks/assets.rake:216:in `each'
	from /var/www/discourse/lib/tasks/assets.rake:216:in `block (2 levels) in <top (required)>'
rake aborted!
/var/www/discourse/lib/discourse_ip_info.rb:38:in `mmdb_download': 
gzip: /tmp/GeoLite2-City.gz20190610-491-1j7nws4.gz: unexpected end of file
/var/www/discourse/lib/discourse.rb:31:in `execute_command'
/var/www/discourse/lib/discourse_ip_info.rb:38:in `mmdb_download'
/var/www/discourse/lib/tasks/assets.rake:217:in `block (3 levels) in <top (required)>'
/var/www/discourse/lib/tasks/assets.rake:216:in `each'
/var/www/discourse/lib/tasks/assets.rake:216:in `block (2 levels) in <top (required)>'
Tasks: TOP => assets:precompile
(See full trace by running task with --trace)
I, [2019-06-10T17:44:47.244706 #14]  INFO -- : Downloading MaxMindDB...
Compressing Javascript and Generating Source Maps

I, [2019-06-10T17:44:47.245661 #14]  INFO -- : Terminating async processes
I, [2019-06-10T17:44:47.245978 #14]  INFO -- : Sending INT to HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/10/bin/postmaster -D /etc/postgresql/10/main pid: 68
I, [2019-06-10T17:44:47.246283 #14]  INFO -- : Sending TERM to exec chpst -u redis -U redis /usr/bin/redis-server /etc/redis/redis.conf pid: 184
2019-06-10 17:44:47.246 UTC [68] LOG:  received fast shutdown request
184:signal-handler (1560188687) Received SIGTERM scheduling shutdown...
2019-06-10 17:44:47.248 UTC [68] LOG:  aborting any active transactions
2019-06-10 17:44:47.252 UTC [68] LOG:  worker process: logical replication launcher (PID 77) exited with exit code 1
2019-06-10 17:44:47.255 UTC [72] LOG:  shutting down
2019-06-10 17:44:47.268 UTC [68] LOG:  database system is shut down
184:M 10 Jun 2019 17:44:47.333 # User requested shutdown...
184:M 10 Jun 2019 17:44:47.333 * Saving the final RDB snapshot before exiting.
184:M 10 Jun 2019 17:44:47.341 * DB saved on disk
184:M 10 Jun 2019 17:44:47.342 # Redis is now ready to exit, bye bye...


FAILED
--------------------
Pups::ExecError: cd /var/www/discourse && su discourse -c 'bundle exec rake assets:precompile' failed with return #<Process::Status: pid 489 exit 1>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params {"cd"=>"$home", "hook"=>"assets_precompile", "cmd"=>["su discourse -c 'bundle exec rake assets:precompile'"]}
c13084f0c50befc27d34645224f4b1680c28eda7e05030e8eb0114ff0e311d96
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one

If I download it on that server using wget, it untars fine.

EDIT: nope, I was downloading the wrong path (https://geolite.maxmind.com/download/geoip/database/GeoLite2-Country.tar.gz) whereas we use:

○ → wget https://geolite.maxmind.com/geoip/databases/GeoLite2-City/update
--2019-06-10 14:36:54--  https://geolite.maxmind.com/geoip/databases/GeoLite2-City/update
Resolving geolite.maxmind.com (geolite.maxmind.com)... 104.17.201.89, 104.17.200.89, 2606:4700::6811:c959, ...
Connecting to geolite.maxmind.com (geolite.maxmind.com)|104.17.201.89|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28565904 (27M) [application/gzip]
Saving to: ‘update’

update                                    41%[===============================>                                                ]  11.17M  67.4KB/s    eta 4m 27s 

… which is evidently throttled to about 64 KB/s. That’s harsh on rebuild times.
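
To put a number on that, here is a quick back-of-the-envelope sketch. It takes the file size from the wget output above and assumes the throttle is exactly 64 KiB per second; `download_seconds` is just a helper name made up for this example.

```ruby
# Hypothetical helper: how long a download takes at a fixed transfer rate.
def download_seconds(size_bytes, bytes_per_second)
  size_bytes.to_f / bytes_per_second
end

size_bytes = 28_565_904   # "Length" reported by wget above
throttled  = 64 * 1024    # assumption: "64 KB/s" means 64 KiB per second

seconds = download_seconds(size_bytes, throttled)
puts format("%.1f minutes", seconds / 60)   # prints "7.3 minutes"
```

So under that assumption every rebuild spends several minutes on this one download alone.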

EDIT: seems that file is no longer throttled, I was able to pull it from multiple places at full speed and the rebuild succeeded as well.

(we should still fix the fact that it makes the build :boom:)
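
One way a truncated download could be kept from blowing up the build is to only swap in the new file once it has decompressed cleanly. The following is a minimal sketch of that idea and not what discourse_ip_info.rb actually does: `install_if_intact` and the file names are hypothetical, and it assumes the download is a plain gzip file, as the gzip error in the log suggests.

```ruby
require "zlib"
require "fileutils"
require "tmpdir"

# Hypothetical helper: decompress downloaded_gz into current_db, but only
# replace current_db if the whole gzip stream was read without errors.
# Returns true on success, false if the download was truncated or corrupt.
def install_if_intact(downloaded_gz, current_db)
  tmp = "#{current_db}.tmp"
  Zlib::GzipReader.open(downloaded_gz) do |gz|
    File.open(tmp, "wb") do |out|
      # Read in chunks; a truncated stream raises a Zlib::Error subclass.
      while (chunk = gz.read(64 * 1024))
        out.write(chunk)
      end
    end
  end
  FileUtils.mv(tmp, current_db) # only reached when decompression succeeded
  true
rescue Zlib::Error
  FileUtils.rm_f(tmp)           # discard the partial output, keep the old DB
  false
end

Dir.mktmpdir do |dir|
  good = File.join(dir, "download.gz")
  Zlib::GzipWriter.open(good) { |gz| gz.write("new database " * 1000) }

  # Simulate an interrupted download by cutting the file in half.
  truncated = File.join(dir, "truncated.gz")
  File.binwrite(truncated, File.binread(good)[0, File.size(good) / 2])

  db = File.join(dir, "current.db")
  File.write(db, "old database")

  puts install_if_intact(truncated, db) # false, the old file is left alone
  puts File.read(db)                    # old database
  puts install_if_intact(good, db)      # true, the file is replaced
end
```

Whether the build should then carry on with the old file or still fail loudly is a separate decision; the sketch only shows that a bad download does not have to clobber anything.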

To me the only fix left here is to stop downloading the MaxMind DB during precompile by default and rely on the somewhat stale DB in the base image.

Maybe have an environment variable for people who really want it fresh? It seems like some people really care but others, not so much.

It gets updated during runtime by a scheduled job so it doesn’t matter if it’s a bit stale during build.

The problem is that we would be allowing inconsistent state: location shows up right, you rebuild, and then location is wrong.

I much prefer consistency

I’ll take a working instance with a slightly stale database over a failed rebuild any day of the week.

You can already do this today:

Set DISCOURSE_REFRESH_MAXMIND_DB_DURING_PRECOMPILE_DAYS to taste.

Set to 0 for… just don’t do anything during precompile, rely on base image for maxmind db.

Set to 100 for… I don’t care this can be pretty old, but not SUPER old.
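
Roughly, the setting boils down to an age check like the one below. This is a sketch under assumptions and not the real Discourse code: `refresh_maxmind_db?` is a made-up name, the file name is only illustrative, and treating a missing file as stale is my own guess.

```ruby
require "tmpdir"
require "fileutils"

SECONDS_PER_DAY = 86_400

# Hypothetical helper, not Discourse's actual implementation.
# refresh_days stands in for DISCOURSE_REFRESH_MAXMIND_DB_DURING_PRECOMPILE_DAYS.
# Returns true when the file at db_path should be downloaded again.
def refresh_maxmind_db?(db_path, refresh_days, now: Time.now)
  return false if refresh_days <= 0        # 0: never refresh during precompile
  return true unless File.exist?(db_path)  # assumption: a missing file counts as stale

  age_in_days = (now - File.mtime(db_path)) / SECONDS_PER_DAY
  age_in_days > refresh_days
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "GeoLite2-City.mmdb") # illustrative file name
  FileUtils.touch(path, mtime: Time.now - 10 * SECONDS_PER_DAY) # pretend it is 10 days old

  puts refresh_maxmind_db?(path, 0)    # false: rely on whatever is already there
  puts refresh_maxmind_db?(path, 100)  # false: 10 days is still fresh enough
  puts refresh_maxmind_db?(path, 7)    # true: older than a week, download again
end
```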


The open discussion here is:

  1. Should we add an “I don’t care if the maxmind update fails during precompile” option?

  2. Should we add a “scheduled job” that updates maxmind DB if it is N days old?

I am against (1), because it leads to “inconsistent state post rebuild”. We are used to having a very consistent state after rebuilds and this adds a wild card.

I am not strongly against (2), but one issue for our own hosting is that we could not even use (2), because it would likely get us banned from maxmind.

So I am not sure what more to do here.

If self hosters were complaining a lot about “rebuilds” failing due to maxmind I would be open to changing the default for DISCOURSE_REFRESH_MAXMIND_DB_DURING_PRECOMPILE_DAYS to 0.

Looks like this is such a complaint:
