Rebuild failing due to MaxMind DB

supermathie · June 10, 2019, 5:56pm

Running launcher rebuild app has failed multiple times because of an error related to the MaxMind DB:

Done compressing application-d5be6ae5cb1fddec6f1ddadfdb8fa2e99cbefcb56633aff5b5341fde6c39c33e.js : 23.41 secs

Done compressing all JS files : 79.32 secs

184:M 10 Jun 2019 17:44:00.087 * 10 changes in 300 seconds. Saving...
184:M 10 Jun 2019 17:44:00.088 * Background saving started by pid 1148
1148:C 10 Jun 2019 17:44:00.097 * DB saved on disk
1148:C 10 Jun 2019 17:44:00.097 * RDB: 0 MB of memory used by copy-on-write
184:M 10 Jun 2019 17:44:00.189 * Background saving terminated with success
#<Thread:0x000055ffabca0ed0@/var/www/discourse/lib/tasks/assets.rake:214 run> terminated with exception (report_on_exception is true):
/var/www/discourse/lib/discourse.rb:31:in `execute_command': /var/www/discourse/lib/discourse_ip_info.rb:38:in `mmdb_download':  (RuntimeError)
gzip: /tmp/GeoLite2-City.gz20190610-491-1j7nws4.gz: unexpected end of file
	from /var/www/discourse/lib/discourse_ip_info.rb:38:in `mmdb_download'
	from /var/www/discourse/lib/tasks/assets.rake:217:in `block (3 levels) in <top (required)>'
	from /var/www/discourse/lib/tasks/assets.rake:216:in `each'
	from /var/www/discourse/lib/tasks/assets.rake:216:in `block (2 levels) in <top (required)>'
rake aborted!
/var/www/discourse/lib/discourse_ip_info.rb:38:in `mmdb_download': 
gzip: /tmp/GeoLite2-City.gz20190610-491-1j7nws4.gz: unexpected end of file
/var/www/discourse/lib/discourse.rb:31:in `execute_command'
/var/www/discourse/lib/discourse_ip_info.rb:38:in `mmdb_download'
/var/www/discourse/lib/tasks/assets.rake:217:in `block (3 levels) in <top (required)>'
/var/www/discourse/lib/tasks/assets.rake:216:in `each'
/var/www/discourse/lib/tasks/assets.rake:216:in `block (2 levels) in <top (required)>'
Tasks: TOP => assets:precompile
(See full trace by running task with --trace)
I, [2019-06-10T17:44:47.244706 #14]  INFO -- : Downloading MaxMindDB...
Compressing Javascript and Generating Source Maps

I, [2019-06-10T17:44:47.245661 #14]  INFO -- : Terminating async processes
I, [2019-06-10T17:44:47.245978 #14]  INFO -- : Sending INT to HOME=/var/lib/postgresql USER=postgres exec chpst -u postgres:postgres:ssl-cert -U postgres:postgres:ssl-cert /usr/lib/postgresql/10/bin/postmaster -D /etc/postgresql/10/main pid: 68
I, [2019-06-10T17:44:47.246283 #14]  INFO -- : Sending TERM to exec chpst -u redis -U redis /usr/bin/redis-server /etc/redis/redis.conf pid: 184
2019-06-10 17:44:47.246 UTC [68] LOG:  received fast shutdown request
184:signal-handler (1560188687) Received SIGTERM scheduling shutdown...
2019-06-10 17:44:47.248 UTC [68] LOG:  aborting any active transactions
2019-06-10 17:44:47.252 UTC [68] LOG:  worker process: logical replication launcher (PID 77) exited with exit code 1
2019-06-10 17:44:47.255 UTC [72] LOG:  shutting down
2019-06-10 17:44:47.268 UTC [68] LOG:  database system is shut down
184:M 10 Jun 2019 17:44:47.333 # User requested shutdown...
184:M 10 Jun 2019 17:44:47.333 * Saving the final RDB snapshot before exiting.
184:M 10 Jun 2019 17:44:47.341 * DB saved on disk
184:M 10 Jun 2019 17:44:47.342 # Redis is now ready to exit, bye bye...


FAILED
--------------------
Pups::ExecError: cd /var/www/discourse && su discourse -c 'bundle exec rake assets:precompile' failed with return #<Process::Status: pid 489 exit 1>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params {"cd"=>"$home", "hook"=>"assets_precompile", "cmd"=>["su discourse -c 'bundle exec rake assets:precompile'"]}
c13084f0c50befc27d34645224f4b1680c28eda7e05030e8eb0114ff0e311d96
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one

If I download it on that server using wget, it untars fine.

EDIT: nope, I was downloading the wrong path (https://geolite.maxmind.com/download/geoip/database/GeoLite2-Country.tar.gz) whereas we use:

○ → wget https://geolite.maxmind.com/geoip/databases/GeoLite2-City/update
--2019-06-10 14:36:54--  https://geolite.maxmind.com/geoip/databases/GeoLite2-City/update
Resolving geolite.maxmind.com (geolite.maxmind.com)... 104.17.201.89, 104.17.200.89, 2606:4700::6811:c959, ...
Connecting to geolite.maxmind.com (geolite.maxmind.com)|104.17.201.89|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28565904 (27M) [application/gzip]
Saving to: ‘update’

update                                    41%[===============================>                                                ]  11.17M  67.4KB/s    eta 4m 27s

… which is evidently throttled to about 64 KB/s. That’s harsh on rebuild times.
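For a rough sense of what that throttle costs, here is a back-of-the-envelope estimate. It assumes the cap is a flat 64 KiB/s for the whole transfer (the wget output above shows about 67 KB/s, so this is only approximate) and uses the Content-Length that wget reported.

```ruby
# Rough estimate of the throttled download time for GeoLite2-City.
# Assumption: a constant 64 KiB/s cap; the size is the Content-Length
# reported by wget above.
size_bytes = 28_565_904
rate_bytes_per_sec = 64 * 1024

seconds = size_bytes.to_f / rate_bytes_per_sec
minutes = seconds / 60

puts format("%.0f s (about %.1f min)", seconds, minutes)
```

Under that assumption the download alone adds roughly seven minutes to every rebuild.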

EDIT: seems that file is no longer throttled, I was able to pull it from multiple places at full speed and the rebuild succeeded as well.

(we should still fix the fact that it makes the build fail)
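The `gzip: unexpected end of file` error in the log is what a truncated download looks like. As a minimal sketch of how a truncated archive could be detected before anything tries to unpack it, the helper below decompresses the whole stream with Ruby's standard `Zlib` library and reports whether that succeeded. The name `complete_gzip?` is hypothetical and this is not what `mmdb_download` does.

```ruby
require "zlib"
require "stringio"

CHUNK_SIZE = 64 * 1024

# Hypothetical helper, not part of Discourse: returns true only if the whole
# gzip stream (header, data and footer) can be read without a Zlib error.
def complete_gzip?(io)
  gz = Zlib::GzipReader.new(io)
  # Read in chunks so the decompressed database is never held in memory at once.
  loop { break if gz.read(CHUNK_SIZE).nil? }
  true
rescue Zlib::Error
  # Covers truncated data, a missing footer and non-gzip input.
  false
end

# Demo with in-memory data instead of a real download.
data = Zlib.gzip("GeoLite2 placeholder payload " * 1000)
puts complete_gzip?(StringIO.new(data))
puts complete_gzip?(StringIO.new(data[0, data.bytesize / 2]))
```

A check like this only detects the problem; whether the build should then fail or fall back to the existing database is a separate decision.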

sam · June 10, 2019, 9:16pm

To me the only fix left here is to stop downloading the MaxMind DB on precompile by default and rely on the somewhat stale DB in the base image.

pfaffman · June 10, 2019, 9:20pm

Maybe have an environment variable for people who really want it fresh? It seems like some people really care but others, not so much.

supermathie · June 10, 2019, 9:31pm

It gets updated during runtime by a scheduled job so it doesn’t matter if it’s a bit stale during build.

sam · June 10, 2019, 9:55pm

The problem is that we would be allowing inconsistent state: location shows up right, you rebuild, and then location is wrong.

I much prefer consistency

Stephen · June 11, 2019, 5:49am

I’ll take a working instance with a slightly stale database over a failed rebuild any day of the week.

sam · June 11, 2019, 6:31am

You can already do this today:

https://github.com/discourse/discourse/blob/7b17eb06da6f83350f6ed8e6c523e77022cdc970/config/discourse_defaults.conf#L237-L237

Set DISCOURSE_REFRESH_MAXMIND_DB_DURING_PRECOMPILE_DAYS to taste.

Set to 0 for… just don’t do anything during precompile, rely on base image for maxmind db.

Set to 100 for… I don’t care this can be pretty old, but not SUPER old.
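To make the semantics of that setting concrete, here is a minimal sketch of an age-threshold check, assuming the value means "refresh during precompile only if the local copy is older than N days, and never when N is 0". The helper name, the database path and the fallback value are hypothetical; this is not Discourse's actual code.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical helper illustrating the threshold described above;
# this is not Discourse's actual implementation.
def refresh_maxmind_on_precompile?(db_path, max_age_days, now: Time.now)
  # 0 (or less) means: never download during precompile, use the base image copy.
  return false if max_age_days <= 0
  # No local copy at all: a download is the only way to get one.
  return true unless File.exist?(db_path)
  # Otherwise refresh only when the local copy is older than the threshold.
  (now - File.mtime(db_path)) > max_age_days * SECONDS_PER_DAY
end

# The fallback of "0" is an arbitrary choice for this sketch, not Discourse's
# default, and the path is a placeholder.
days = ENV.fetch("DISCOURSE_REFRESH_MAXMIND_DB_DURING_PRECOMPILE_DAYS", "0").to_i
puts refresh_maxmind_on_precompile?("/tmp/GeoLite2-City.mmdb", days)
```

With a value of 100, a copy that is 30 days old would be left alone and a copy that is 101 days old would be refreshed.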

The open discussion here is:

1. Should we add an “I don’t care if the maxmind update fails during precompile” option?
2. Should we add a “scheduled job” that updates the maxmind DB if it is N days old?

I am against (1) because it leads to “inconsistent state post rebuild”. We are used to having a very consistent state after rebuilds, and this adds a wild card.

I am not strongly against (2), but one issue for our own hosting is that we could not even use (2) because it would likely get us banned from MaxMind.

So I am not sure what more to do here.

If self hosters were complaining a lot about “rebuilds” failing due to maxmind I would be open to changing the default for DISCOURSE_REFRESH_MAXMIND_DB_DURING_PRECOMPILE_DAYS to 0.

RGJ · June 17, 2019, 9:01pm

Looks like this is such a complaint:
https://meta.discourse.org/t/restore-db-problem/120563/7

pfaffman · June 21, 2019, 4:58pm

This appears to be MMDB related as well. Pardon the screen shot, but it’s what the client sent and it appears that he tried again and the upgrade worked.
