Finding and eliminating a DatetimeFieldOverflow post

(Andrew Waugh) #1

After an import I have this in my log:

Job exception: PG::DatetimeFieldOverflow: ERROR: date/time field value out of range: “273790698683-03-22 15:29:32.618208 BC”

Now… most of my users are retired, but they’re not that old.

I tried searching for, and deleting, posts older than 1993, but the error is still there.

How do I go about finding this user/post?

(Jeff Atwood) #2

Any ideas here @techapj?

(Andrew Waugh) #3

It is causing users:change_post_ownership and rebake to bomb out, which is kind of a bother as I am working on merging users with multiple historical accounts.

Would it be possible to change the year to some specific value? In my case the year part is the overflow, but the month and day seem to be intact, so if some db job could find those timestamps and rewrite the year, they would at least stop the change and rebake from bombing out.

Yes, it is an artifact from the import, so strictly speaking it isn't Discourse's problem. But the problem remains: if there is an item in the db which causes an overflow, you still need some mechanism for clearing that up. (And I suspect we aren't the only people who've imported and ended up with a bad field.)
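A repair job along those lines might look like the following Python sketch. This is a hypothetical helper, not anything in Discourse: it keeps the month, day, and time of an implausible timestamp but resets the year to a chosen floor.

```python
from datetime import datetime

# Hypothetical clean-up helper sketching the idea above: keep the
# month/day/time of an implausible timestamp, but reset the year.
FLOOR_YEAR = 1990  # oldest plausible year for this forum's content (assumption)

def clamp_year(dt: datetime, floor_year: int = FLOOR_YEAR) -> datetime:
    """Return dt unchanged if its year is plausible, otherwise rewrite the year."""
    if dt.year < floor_year or dt.year > datetime.now().year:
        return dt.replace(year=floor_year)
    return dt
```

Note that the actual overflowed values ("…BC") are outside Python's datetime range entirely, so a real fix would have to happen in SQL; this only illustrates the intent of "keep month and day, change the year".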

(Arpit Jalan) #4

I’ve never encountered this error before. @JagWaugh, which import script did you use to perform the import?

Can you provide the complete error log?

(Andrew Waugh) #5

I’m not sure how the import was done (I’m late to the game) but I’ll ask.

I’m working on a Virtualbox test machine hosted at home, so I can’t give you access, but I can post logs. Which log do you want to see?

(Arpit Jalan) #6

Provide info, backtrace & env log for this error.

(Andrew Waugh) #7


Job exception: PG::DatetimeFieldOverflow: ERROR:  date/time field value out of range: "273790698683-03-23 21:12:53.718783 BC"
LINE 1: WHERE "users"."active" = 'f' AND (created_at < '273790698...
: SELECT  "users".* FROM "users" INNER JOIN user_stats AS us ON us.user_id = WHERE "users"."active" = 'f' AND (created_at < '273790698683-03-23 21:12:53.718783 BC') AND (NOT admin AND NOT moderator) LIMIT 200


/var/www/discourse/vendor/bundle/ruby/2.3.0/gems/rack-mini-profiler-0.10.1/lib/patches/db/pg.rb:90:in `exec'
/var/www/discourse/vendor/bundle/ruby/2.3.0/gems/rack-mini-profiler-0.10.1/lib/patches/db/pg.rb:90:in `async_exec'
/var/www/discourse/vendor/bundle/ruby/2.3.0/gems/activerecord- `block in exec_no_cache'
/var/www/discourse/vendor/bundle/ruby/2.3.0/gems/activerecord- `block in log'
/var/www/discourse/vendor/bundle/ruby/2.3.0/gems/activesupport- `instrument'
/var/www/discourse/vendor/bundle/ruby/2.3.0/gems/activerecord- `log'
/var/www/discourse/vendor/bundle/ruby/2.3.0/gems/activerecord- `exec_no_cache'
/var/www/discourse/vendor/bundle/ruby/2.3.0/gems/activerecord- `execute_and_clear'
/var/www/discourse/vendor/bundle/ruby/2.3.0/gems/activerecord- `exec_query'
/var/www/discourse/vendor/bundle/ruby/2.3.0/gems/activerecord- `select'
/var/www/discourse/vendor/bundle/ruby/2.3.0/gems/activerecord- `select_all'
/var/www/discourse/vendor/bundle/ruby/2.3.0/gems/activerecord- `select_all'
/var/www/discourse/vendor/bundle/ruby/2.3.0/gems/activerecord- `find_by_sql'
/var/www/discourse/vendor/bundle/ruby/2.3.0/gems/activerecord- `exec_queries'
/var/www/discourse/vendor/bundle/ruby/2.3.0/gems/activerecord- `load'
/var/www/discourse/vendor/bundle/ruby/2.3.0/gems/activerecord- `to_a'
/var/www/discourse/vendor/bundle/ruby/2.3.0/gems/activerecord- `each'
/var/www/discourse/app/models/user.rb:1039:in `purge_unactivated'
/var/www/discourse/app/jobs/scheduled/purge_inactive.rb:6:in `execute'
/var/www/discourse/app/jobs/base.rb:154:in `block (2 levels) in perform'


hostname	JL-test-adw-app
process_id	141
application_version	e7564dc1d74aaa0477a3c488d063dd9d98443784
current_db	default
job	Jobs::PurgeInactive
problem_db	default

(Andrew Waugh) #8

Our old platform was handbuilt; the import was a dump, transform, import process.

Not to nag, but you get my point about the problem of handling an overflow condition if it does happen inside discourse, right?

(Arpit Jalan) #9

What is the value of the purge unactivated users grace period days setting on your site?

(Mittineague) #10

I think what happened is that a date in Unix Time (aka Posix, Epoch) format, i.e. “seconds since Jan 1 1970”, didn’t get converted into the correct format when you ran the import.

TBH I would be surprised if the problem is only a single occurrence; fixing the import script and starting over would probably be the best thing to do.
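For reference, a correct epoch-seconds conversion looks like this Python sketch (the timestamp value is made up for illustration):

```python
from datetime import datetime, timezone

# A Unix timestamp is "seconds since Jan 1 1970 UTC". A 1993-era post is
# roughly 730 million seconds; the import has to convert that number into
# a real timestamp rather than feeding it to Postgres as-is.
epoch_seconds = 731980172  # illustrative value, early 1993

dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
print(dt.year)  # 1993
```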

The Bad News - the Discourse database has a great many timestamp fields.
The Good News - if this is causing a problem while trying to reassign ownership that narrows things down a bit.

I think these queries would be a good start.

SELECT COUNT(id) 
FROM posts 
WHERE CAST(created_at AS text) ILIKE '%BC' 
OR CAST(updated_at AS text) ILIKE '%BC' 
OR CAST(deleted_at AS text) ILIKE '%BC' 
OR CAST(last_version_at AS text) ILIKE '%BC' 
OR CAST(baked_at AS text) ILIKE '%BC' 
OR CAST(hidden_at AS text) ILIKE '%BC' 

SELECT COUNT(id) 
FROM users 
WHERE CAST(created_at AS text) ILIKE '%BC' 
OR CAST(updated_at AS text) ILIKE '%BC' 
OR CAST(last_posted_at AS text) ILIKE '%BC' 
OR CAST(last_seen_at AS text) ILIKE '%BC' 
OR CAST(last_emailed_at AS text) ILIKE '%BC' 
OR CAST(approved_at AS text) ILIKE '%BC' 
OR CAST(previous_visit_at AS text) ILIKE '%BC' 
OR CAST(suspended_at AS text) ILIKE '%BC' 
OR CAST(suspended_till AS text) ILIKE '%BC' 

Hopefully you get a result of “1” at most; you can then change COUNT(id) to * and find what needs to be changed.

(Andrew Waugh) #11

On the live site it was 14 when I took the backup, and it is 14 on the test machine.

We have bumped the value on the live site to 99999999999999 precisely because it was purging the historical accounts which I am trying to figure out how to migrate.

Our messages go back to the early '90s, and many of the older accounts have had multiple emails.

(Arpit Jalan) #12

This is the root cause of the error. Setting it to a sane value like 10000 (~27 years) will fix this issue.
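As a back-of-the-envelope sanity check, the setting really does land in the BC year from the error message: subtracting 99999999999999 days from today reaches back roughly 274 billion years, far outside PostgreSQL's supported timestamp range (4713 BC to 294276 AD), which is why the purge query's created_at comparison overflows.

```python
# How far back does a grace period of 99999999999999 days reach?
GRACE_PERIOD_DAYS = 99999999999999
DAYS_PER_YEAR = 365.2425  # average Gregorian year length

years_back = GRACE_PERIOD_DAYS / DAYS_PER_YEAR
print(f"{years_back:,.0f} years")  # roughly 274 billion years

# That lines up with the "273790698683 ... BC" date in the job exception,
# and PostgreSQL timestamps only cover 4713 BC to 294276 AD.
```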

I am now going ahead and adding a max limit for purge unactivated users grace period setting to prevent this issue from happening in future.

(Andrew Waugh) #13

Could you add -1 to disable purging completely?

(Andrew Waugh) #14

I think you may have it there, the forum has run on pretty much every platform known to man and beast since it started, including a long period on a windows box in someone’s basement in Norway.

(Arpit Jalan) #15

Done via:

No, that is not supported.

(Andrew Waugh) #16

Ran them in Data Explorer.

On the posts query I get (happens immediately):

PG::QueryCanceled: ERROR: canceling statement due to statement timeout

On the users query I get “Count 0”

(Mittineague) #17

I suspect the invalid timestamp comes from that 9999999999 value, and that if you change it you will have better luck.

A 0 for users is better than a 1.

If you have a lot of posts I guess you could add a

AND id > 0 
AND id < 500

and keep tweaking the id values. But I think you should try changing the 99999999 to 10000 (or however many days your oldest is) first
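The id-range tweaking can be automated; here is a small sketch (hypothetical helper) that yields the ranges to plug into the query from post #10:

```python
def id_batches(max_id, batch_size=500):
    """Yield inclusive (low, high) id ranges covering 1..max_id,
    for running the BC-timestamp query in timeout-sized chunks."""
    low = 1
    while low <= max_id:
        yield low, min(low + batch_size - 1, max_id)
        low += batch_size

# Each (low, high) pair becomes "AND id >= low AND id <= high"
# appended to the query's WHERE clause.
```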

(Andrew Waugh) #18

The field was 14 when the backup I used was created, and it’s 14 now… Even if I had put it up to 9999999999999 at some point and changed it back to 14 later, shouldn’t some crawler have changed the timestamps after about 5 ./launcher rebuild app runs and restarts of the VM?

does 1.8 million posts count as a lot?

(Mittineague) #19

OMG, it would take forever tweaking ranges for that many.

Maybe this would run without timing out.

SELECT COUNT(id) 
FROM posts 
WHERE CAST(created_at AS text) ILIKE '%BC' 

(Andrew Waugh) #20

I’m just running a backup right now; once that is done I’ll try the other queries.

On a related note: with 1.8 million posts the performance wasn’t exactly blinding, so I deleted most of the categories, leaving just enough to give me a representative sample. I expected the size of the backup file to go down, but it didn’t (not appreciably). Am I right in believing that nothing actually gets deleted by a deletion; it just doesn’t get shown?