Vbulletin bulk importer: null value in column "pinned_globally" of relation "topics" violates not-null constraint

I’m trying to do an import with the vbulletin bulk importer, and I’ve managed to get it mostly working. It’s created users and posts, but topics are not getting created.

The data getting passed to create_topics(topics) looks right. The data processed in base.rb:create_records looks right (skipped isn’t set). But no topics are getting created.

Here’s the error:

ERROR:  null value in column "pinned_globally" of relation "topics" violates not-null constraint

But if a topic is not pinned globally, what value should it have? I’m trying commenting out that field in TOPIC_COLUMNS in base.rb.

EDIT: I think this may do it but won’t know for a while:

    create_topics(topics) do |row|
      created_at = Time.zone.at(row[5])

      t = {
        imported_id: row[0],
        title: normalize_text(row[1]),
        category_id: category_id_from_imported_id(row[2]),
        user_id: user_id_from_imported_id(row[3]),
        closed: row[4] == 0,
        created_at: created_at,
        views: row[6] || 0,
        visible: row[7] == 1,
        pinned_globally: row[8] == 1 # <============== JP added this:
      }
      t[:pinned_at] = created_at if row[8] == 1

      t
    end

That’s odd, since that column has a default value. Does it have a default value in your database when you do \d topics?

pinned_globally | boolean | | not null | false

Aha. That explains why it’s not in the code, I think.

But it doesn’t solve the mystery.

pinned_globally    | boolean   |     | not null | false

EDIT:

Random AI says:

Perhaps this is a “bulk insert tool”?

I can’t find a source that credibly confirms what the quote above says, but it makes sense: the point of this tool is to go fast, so skipping defaults seems like a thing it would do. It also explains what is happening and, I hope, the fix, which I’m still waiting to confirm.

Here is actual documentation! PostgreSQL: Documentation: 17: COPY

If a column list is specified, COPY TO copies only the data in the specified columns to the file. For COPY FROM , each field in the file is inserted, in order, into the specified column. Table columns not specified in the COPY FROM column list will receive their default values.

So, if I’m reading that right: if a field is in the column list, Postgres blindly copies what you give it, and a blank/null gets inserted instead of the desired default.
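To make that concrete, here is a minimal sketch of the failure mode; this is not the actual base.rb code, and both the `TOPIC_COLUMNS` subset and the `to_copy_line` helper are illustrative, but it shows how a key that is never set in the row hash becomes a literal `\N` in the COPY text stream:

```ruby
# Illustrative subset of a column list fed to COPY FROM (not the real one).
TOPIC_COLUMNS = %i[imported_id title pinned_globally]

# Serialize one row hash into a tab-separated COPY text line.
# A nil/missing value is written as \N, which PostgreSQL reads as NULL.
# Because the column IS in the column list, the table default is NOT applied.
def to_copy_line(row, columns)
  columns.map { |col| row[col].nil? ? '\N' : row[col].to_s }.join("\t")
end

row = { imported_id: 42, title: "Hello" } # pinned_globally never set
puts to_copy_line(row, TOPIC_COLUMNS)
# prints the three fields tab-separated, with \N in the pinned_globally
# slot -- that \N is what trips the not-null constraint instead of
# falling back to the column default of `false`.
```

So the fix is either to keep `pinned_globally` out of the column list entirely, or to always put an explicit value in the hash.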

It’s going slower and slower. Is there a reason not to use LIMIT 1000 like the regular importer does? Seems like maybe 885K topics is a lot to bite off at once?

I checked the script from the last bulk import I did, and it indeed has an explicit pinned_globally: false, so that’s apparently necessary; it’s the only explicitly hard-coded column value in the code.

    create_topics(topics) do |row|
      t = {
        imported_id: row[0],
        title: my_normalize_text(row[1]),
        category_id: category_id_from_imported_id(row[2]),
        user_id: user_id_from_imported_id(row[3]),
        created_at: Time.zone.at(row[4]),
        pinned_globally: false
      }

      t
    end

Weird since it doesn’t have this for other similar not null, default false columns like closed or has_summary.

Last import I did with the bulk importer did 3M+ topics in 2 hours or so. Maybe you have a memory leak? Or maybe your MySQL code (or whatever you use to read the source data) is slow somewhere?

Good news! All the topics were created! Bad news! None of the posts are connected to them, but hopefully that’s because the posts were created before the topics were.

That is odd. It was so sure it had an explanation. :person_shrugging:

The 7 million posts took only a couple of hours, but the <1M topics took more like 4.

It’s an old machine (that they apparently bought just for the job?), and MySQL is remote. Looking at htop, there’s no obvious memory leak at the system level. I’ve wiped all the data and am rebuilding the containers to see if it’ll work this time.

Thanks so very much for your help.


Well, now the user_email import is failing with:

CONTEXT:  COPY user_emails, line 1: "1  \N      @gmail.com     true    2004-03-08 14:12:00 UTC 2004-03-08 14:12:00 UTC"

It’s taken me another several hours, but here’s why: the process_topic function handles all of those default values.

I guess there should be a

topic[:pinned_globally] ||= false

or maybe

topic[:pinned_globally] ||= !topic[:pinned_at].nil?
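As a self-contained sketch of that backfill, deriving pinned_globally from whether pinned_at was set (apply_topic_defaults is a hypothetical helper, not actual importer code):

```ruby
# Hypothetical helper (not the actual importer code): backfill the
# NOT NULL pinned_globally column before the hash is serialized for COPY.
def apply_topic_defaults(topic)
  # ||= is safe here only because the DB default is false: reassigning
  # an explicit false back to false is a no-op, and an explicit true
  # short-circuits and is left alone.
  topic[:pinned_globally] ||= !topic[:pinned_at].nil?
  topic
end

apply_topic_defaults({ title: "plain" })[:pinned_globally]               # false
apply_topic_defaults({ title: "sticky", pinned_at: Time.now })[:pinned_globally] # true
```

Note the negation: `topic[:pinned_at].nil?` by itself would be true for unpinned topics and would pin everything that isn’t pinned, so it has to be inverted.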