Fix published date and author for topics created via wp-discourse-embed plugin

I recently installed the wp-discourse-embed plugin on the Quarter to Three WordPress blog.

As soon as I installed the plugin, there was an influx of new topics being created for existing (old) WP posts.

I ran this rake task to fix the published date and author for topics that are linked to old WP posts:

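require "open-uri"

desc "Fetch WP data"
task "users:fetch_wp_data" => [:environment] do

  puts "Fetching WP data"
  i = 0
  TopicEmbed.find_each do |topic_embed|
    begin
      # Fetch the original WP post; skip embeds whose page cannot be loaded or parsed.
      # (URI.open needs Ruby 2.5+; on older Rubies open-uri exposes this as plain `open`.)
      html = URI.open(topic_embed.embed_url).read
      raw_doc = Nokogiri::HTML(html)
    rescue StandardError
      next
    end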

    topic = topic_embed.topic
    next if topic.nil?

    author_username = raw_doc.at('meta[@name="author"]')
    if author_username.present?
      begin
        # username_lower is stored lowercased, so normalize the meta value before the lookup.
        author = User.find_by(username_lower: author_username[:content].strip.downcase)
        if author.present? && author.username_lower != topic.user.username_lower
          PostOwnerChanger.new(
            post_ids: [topic.first_post.id],
            topic_id: topic.id,
            new_owner: author,
            acting_user: Discourse.system_user,
            skip_revision: true
          ).change_owner!
        end
      rescue
        # handle error
      end
    end

    wp_post_date = raw_doc.at('meta[@property="article:published_time"]')
    if wp_post_date.present?
      begin
        timestamp = DateTime.parse(wp_post_date[:content].strip).to_i
        PostTimestampChanger.new(topic_id: topic.id, timestamp: timestamp).change!
      rescue ArgumentError
        # handle invalid date
      end
    end

    putc "."
    i += 1
  end

  puts "", "#{i} posts fetched and corrected!", ""
end
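
To make the date step easier to follow, here is a minimal standalone sketch of the conversion that feeds PostTimestampChanger: the `content` value of the `article:published_time` meta tag is parsed and turned into a Unix epoch integer. The helper name `published_epoch` is hypothetical and not part of Discourse. Outside Rails, `DateTime` has no `to_i`, so this sketch goes through `to_time` first; inside the rake task ActiveSupport provides `to_i` directly.

```ruby
require "date"

# Hypothetical helper (not part of Discourse): convert the meta tag's
# content string into a Unix epoch integer, or nil if it cannot be parsed.
def published_epoch(content)
  return nil if content.nil?

  # DateTime.parse raises ArgumentError (or its subclass Date::Error)
  # when the string is not a recognizable date.
  DateTime.parse(content.strip).to_time.to_i
rescue ArgumentError
  nil
end

# Example call with an ISO 8601 value, the format assumed here for the meta tag.
puts published_epoch("2015-06-01T12:00:00+00:00").inspect
```

Returning nil instead of swallowing the error silently makes it possible to log which posts were skipped.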

Commands I ran:

cd /var/discourse
./launcher enter app
vim lib/tasks/users.rake
(append the above rake task to `lib/tasks/users.rake` file)
rake users:fetch_wp_data
(remove the appended rake task from `lib/tasks/users.rake` file)

This helps me understand a bit more about how rake works, and seeing PostTimestampChanger is helpful because it’s taking me a long while to figure out those internals.

My question started out as “why not just add this rake task to Discourse?”

My answer is that users:fetch_wp_data isn’t included in lib/tasks/users.rake because it depends on an external plugin. Then I wondered why bother deleting it, but I think that’s because if we don’t remove it when we’re done, the modified file will cause git problems on the next pull?

Yep, your answer is correct! :100:

Hi, I tried your script but removed this part:

author_username = raw_doc.at('meta[@name="author"]')
if author_username.present?
  begin
    # username_lower is stored lowercased, so normalize the meta value before the lookup.
    author = User.find_by(username_lower: author_username[:content].strip.downcase)
    if author.present? && author.username_lower != topic.user.username_lower
      PostOwnerChanger.new(
        post_ids: [topic.first_post.id],
        topic_id: topic.id,
        new_owner: author,
        acting_user: Discourse.system_user,
        skip_revision: true
      ).change_owner!
    end
  rescue
    # handle error
  end
end

because I only wanted to update the dates of the Discourse posts.
The script told me “61 posts fetched and corrected!”, but the dates weren’t changed.

Any idea how to make this work?
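
One thing worth checking first: in the task above, `i += 1` runs for every embed whose page loaded, whether or not the `article:published_time` meta tag was found or its date parsed, so the final count does not confirm that any timestamp changed. The sketch below is only a diagnostic under that assumption. It classifies a page's HTML as `:ok`, `:missing_tag`, or `:invalid_date`. The helper names `meta_content` and `date_status` are hypothetical, and it uses a simple regex scan instead of Nokogiri so it runs with the standard library alone.

```ruby
require "date"

META_TAG  = /<meta\b[^>]*>/i
ATTRIBUTE = /([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/

# Hypothetical helper: return the content attribute of the first <meta>
# tag whose property attribute equals `property`, or nil if none exists.
def meta_content(html, property)
  html.scan(META_TAG).each do |tag|
    attrs = tag.scan(ATTRIBUTE).map { |name, dq, sq| [name.downcase, dq || sq] }.to_h
    return attrs["content"] if attrs["property"] == property
  end
  nil
end

# Hypothetical helper: report why a page would or would not yield a date.
def date_status(html)
  content = meta_content(html, "article:published_time")
  return :missing_tag if content.nil? || content.strip.empty?

  DateTime.parse(content.strip) # raises ArgumentError on an unparseable value
  :ok
rescue ArgumentError
  :invalid_date
end
```

Running `date_status` on the HTML of a few embed URLs (for example, fetched with `URI.open(url).read`) should show whether the WordPress theme emits the meta tag at all. If it mostly returns `:missing_tag`, the task had nothing to pass to PostTimestampChanger.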