Import posts from Facebook group into Discourse

import

(Sander Datema) #1

Latest update: v2.0 (January 22nd 2016)

Note:
Importing posts from users from a Facebook group into your Discourse install might be frowned upon. They never signed up for their data to be exported to a new location. Read: privacy concerns.

Three years after the initial release by @Sander78 there is a huge update, written by @meriksson. A huge thanks to him for this, because it’s not just a little update, it’s a complete rewrite of my original script with a lot more options. And it’s battle tested.


(Nicholas Perry) #2

Thanks! I’ll be trying this out once I get a local Discourse up on my VM.

I’d love to see a cross-directional feature. Bots that do this type of thing were what made Google Wave pretty awesome. I’d love to see a Facebook group cross-poster app.


(Sander Datema) #3

Updated to v1.1:

  • Don’t show total amount of posts, gave errors
  • Script will change Site Settings for you and restore them afterwards
  • Configuration moved to file import_facebook.yml
  • Make better use of Koala gem
  • Use as much data as possible when creating users from Facebook
  • Create Facebook credentials in Discourse so user should be able to
    claim account later on
  • Script is a lot more verbose

(Nicholas Perry) #4

This, all of this.

This means we have a direct 1:1 translation from our current Facebook group to a Discourse instance. Thank you so much!

You have saved me so much work. I can’t wait to get this going at http://praetorlabs.com


(Sander Datema) #7

Still need to test that part though. :slight_smile: Hypothetically a user should be able to log in using Facebook, even if the script created a nonexistent email address.


(Nicholas Perry) #8

The login feature made this #2 in what I’m going to do with my Hyper-V cluster, where #1 is getting Discourse to run to begin with.

I still need to clear out some of my personal files on the secondary machine before I have my VM cluster up and running. But your script will definitely be tested :smile:


(Sander Datema) #9

Updated to v1.5:

  • Added colors for clarity
  • Updated to latest Facebook Graph API
  • Solved lots of bugs
  • Added option to use fake email addresses
  • Added test mode (no changes to Discourse)
  • Refactoring


(Sander Datema) #10

Unfortunately there is something wrong with the import script: it breaks oneboxes.

I have no idea what is causing this. But after using this script, new oneboxes (ones that point to another post on the same forum) will only show the name of the forum and no contents at all.

So better not use it until it’s fixed. Since the API is not yet officially supported and since I’m not even using it :smile:, I doubt it will be fixed soon.

Unless @sam or someone else has the time to see what I am doing wrong here:

# Import Facebook posts into Discourse
# fb_ is Facebook related stuff, dc_ is Discourse related stuff.
# @fb_writers is an array of all the writers of posts and comments in the
# Facebook group
def fb_import_posts_into_dc(dc_category)
  post_count = 0
  @fb_posts.each do |fb_post|
    post_count += 1

    # Create a new topic
    dc_topic = Topic.new

    # Get details of the writer of this post
    fb_post_user = @fb_writers.find {|k| k['id'] == fb_post['actor_id'].to_s}

    # Get the Discourse user id of this writer
    dc_user_id = dc_get_user_id(fb_username_to_dc(fb_post_user['username']))

    # Facebook posts don't have a title, so use first 50 characters of the post as title
    dc_topic.title = fb_post['message'][0,50]
    # Remove new lines and replace with a space
    dc_topic.title = dc_topic.title.gsub( /\n/m, " " )

    # Set ID of user who created the topic
    dc_topic.user_id = dc_user_id

    # Set topic category
    dc_topic.category_id = dc_category.id

    # Set topic create and update time
    dc_topic.created_at = Time.at(fb_post['created_time'])
    dc_topic.updated_at = dc_topic.created_at

    progress = post_count.percent_of(@fb_posts.count).round.to_s
    puts "[#{progress}%]".blue + " Creating topic '" + dc_topic.title.blue + "' (#{dc_topic.created_at})"

    # Everything set, save the topic
    if dc_topic.valid? then
      dc_topic.save!

      # Create the contents of the topic (the first post), using the Facebook post
      dc_post = Post.new

      dc_post.user_id = dc_topic.user_id
      dc_post.topic_id = dc_topic.id
      dc_post.raw = fb_post['message']

      dc_post.created_at = Time.at(fb_post['created_time'])
      dc_post.updated_at = dc_post.created_at

      if dc_post.valid? then
        dc_post.save!
        puts " - First post of topic created".green
      else # Skip if not valid for some reason
        puts "Contents of topic from Facebook post #{fb_post['post_id']} failed to import, #{dc_post.errors.messages[:base]}".red
      end

      # Now create the replies, using the Facebook comments
      unless fb_post['comments']['count'] == 0 then
        fb_post['comments']['comment_list'].each do |comment|
          # Get details of the writer of this comment
          comment_user = @fb_writers.find {|k| k['id'] == comment['fromid'].to_s}

          # Get the Discourse user id of this writer
          dc_user_id = dc_get_user_id(fb_username_to_dc(comment_user['username']))

          dc_post = Post.new

          dc_post.user_id = dc_user_id
          dc_post.topic_id = dc_topic.id
          dc_post.raw = comment['text']

          dc_post.created_at = Time.at(comment['time'])
          dc_post.updated_at = dc_post.created_at

          if dc_post.valid? then
            dc_post.save!
          else # Skip if not valid for some reason
            puts " - Comment (#{comment['id']}) failed to import, #{dc_post.errors.messages[:raw][0]}".red
          end
        end
        puts " - #{fb_post['comments']['count'].to_s} Comments imported".green
      end
    else # In case we missed a validation, don't save
      puts "Topic of Facebook post #{fb_post['post_id']} failed to import, #{dc_topic.errors.messages[:base]}".red
    end
  end
end

(Sam Saffron) #11

I know what it is: you need to port it to use PostCreator (in lib). There is a comment in it that specifies the params you need, and there are some specs.

@eviltrout has started a transition phase to service objects, that way we avoid a lot of the callback soup we have today. It makes diagnosing issues and testing much simpler.


(Sander Datema) #12

Thanks for having a look at the code. I’ll start using PostCreator.

Unfortunately PostCreator won’t allow you to set the created_at and updated_at values needed for imports. So I’ll have to create a solution for that.


(Sam Saffron) #13

that is a bug in it, we need a PR to add that


(Sander Datema) #14

Updated to v1.6:

  • Now correctly implements PostCreator.

#15

Thanks for this work @Sander78 :ok_hand:


(Erlend Sogge Heggen) #16

I might be using this soon.

I’d like to know more about how the user import works.

  • Going from Facebook to Discourse, how do people claim their new account? Must they log in with Facebook?
  • How did the 6000-members import go?

(Sander Datema) #17

Facebook won’t send the mail address when importing, so all users get a unique fake address. However, since their Facebook username is in the database, they can simply log in using their Facebook login.

Caveat is that you need to tell all your users to change their email address in Discourse if they want to be able to receive digests, etc.
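A minimal sketch of how such placeholder addresses could be generated and recognised later, for example to remind users to replace them. The helper names, the `fb_` prefix and the `localhost` domain are assumptions for illustration and are not taken from the import script itself:

```ruby
# Assumed placeholder domain; adjust to whatever the import actually used.
FAKE_EMAIL_DOMAIN = 'localhost'

# Build a unique placeholder address from the Facebook user id
# (hypothetical helper, not part of the import script).
def fake_email_for(fb_id)
  "fb_#{fb_id}@#{FAKE_EMAIL_DOMAIN}"
end

# True when the address still looks like an import placeholder.
# nil is treated as "not a placeholder".
def fake_email?(email)
  email.to_s.downcase.end_with?("@#{FAKE_EMAIL_DOMAIN}")
end
```

Because the Facebook id is unique per user, the generated addresses do not collide, and `fake_email?` gives a simple way to find accounts that still need a real address.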

Haven’t tested the 6000 users yet, cause the group I wanted to test has grown to 10000 now and I think that starting a new forum in that case is a lot better.


(Erlend Sogge Heggen) #18

Thanks for the prompt reply.

Hmm, maybe we’ll write a simple plugin that keeps reminding users to change their e-mail address if they’re still using a @localhost address.

Why do you think it’d be better to start from scratch if you have 10’000 users? Would the script not be able to handle it?


(Juffin) #19

I’m planning to import a 900+ user group with thousands of posts and replies (very active) into my install. Any known issues I should take into consideration?
And thanks for this amazing plugin :wink:


(Sander Datema) #20

Well, definitely use the test mode first. I haven’t used my plugin for quite a while (since one only needs it once).

For the rest: it should work with your amount of people.


(Khoa Nguyen) #23

I use your importer on the latest Discourse 1.20beta3, @Sander78,
and this is the log when I run the importer (in test mode, everything is fine):

** Invoke import:facebook_group (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute import:facebook_group

*** Using fake email addresses

Facebook token accepted
Batch: 98 posts (since 04/09/2014 15:15 until 28/11/2014 10:43)

Amount of posts: 98
Amount of writers: 15
rake aborted!
NoMethodError: undefined method `tr' for nil:NilClass
/var/www/discourse/lib/tasks/import_facebook.rake:359:in `fb_username_to_dc'
/var/www/discourse/lib/tasks/import_facebook.rake:247:in `block in dc_create_users_from_fb_writers'
/var/www/discourse/lib/tasks/import_facebook.rake:245:in `each'
/var/www/discourse/lib/tasks/import_facebook.rake:245:in `dc_create_users_from_fb_writers'
/var/www/discourse/lib/tasks/import_facebook.rake:67:in `block in '
/usr/local/lib/ruby/gems/2.0.0/gems/rake-10.4.0/lib/rake/task.rb:240:in `call'
/usr/local/lib/ruby/gems/2.0.0/gems/rake-10.4.0/lib/rake/task.rb:240:in `block in execute'
/usr/local/lib/ruby/gems/2.0.0/gems/rake-10.4.0/lib/rake/task.rb:235:in `each'
/usr/local/lib/ruby/gems/2.0.0/gems/rake-10.4.0/lib/rake/task.rb:235:in `execute'
/usr/local/lib/ruby/gems/2.0.0/gems/rake-10.4.0/lib/rake/task.rb:179:in `block in invoke_with_call_chain'
/usr/local/lib/ruby/2.0.0/monitor.rb:211:in `mon_synchronize'
/usr/local/lib/ruby/gems/2.0.0/gems/rake-10.4.0/lib/rake/task.rb:172:in `invoke_with_call_chain'
/usr/local/lib/ruby/gems/2.0.0/gems/rake-10.4.0/lib/rake/task.rb:165:in `invoke'
/usr/local/lib/ruby/gems/2.0.0/gems/rake-10.4.0/lib/rake/application.rb:156:in `invoke_task'
/usr/local/lib/ruby/gems/2.0.0/gems/rake-10.4.0/lib/rake/application.rb:112:in `block (2 levels) in top_level'
/usr/local/lib/ruby/gems/2.0.0/gems/rake-10.4.0/lib/rake/application.rb:112:in `each'
/usr/local/lib/ruby/gems/2.0.0/gems/rake-10.4.0/lib/rake/application.rb:112:in `block in top_level'
/usr/local/lib/ruby/gems/2.0.0/gems/rake-10.4.0/lib/rake/application.rb:121:in `run_with_threads'
/usr/local/lib/ruby/gems/2.0.0/gems/rake-10.4.0/lib/rake/application.rb:106:in `top_level'
/usr/local/lib/ruby/gems/2.0.0/gems/rake-10.4.0/lib/rake/application.rb:84:in `block in run'
/usr/local/lib/ruby/gems/2.0.0/gems/rake-10.4.0/lib/rake/application.rb:182:in `standard_exception_handling'
/usr/local/lib/ruby/gems/2.0.0/gems/rake-10.4.0/lib/rake/application.rb:79:in `run'
/usr/local/lib/ruby/gems/2.0.0/gems/rake-10.4.0/bin/rake:33:in `'
/usr/local/bin/rake:23:in `load'
/usr/local/bin/rake:23:in `'
Tasks: TOP => import:facebook_group

Can you help me to make this work?


(Mittineague) #24

Do all of the Facebook usernames meet the Discourse username allowed characters and length limit requirements?
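The `undefined method tr' for nil:NilClass` error in the log above suggests that at least one writer had no Facebook username at all, so a conversion step probably has to cope with `nil` as well as with disallowed characters. Below is a minimal sketch of a nil-safe conversion. The function name, the length limits and the allowed character set are assumptions for illustration; they are not the script's actual `fb_username_to_dc`, and the real limits should be checked against your own Discourse site settings:

```ruby
# Assumed limits, for illustration only; check your Discourse settings.
MIN_USERNAME_LENGTH = 3
MAX_USERNAME_LENGTH = 20

# Turn a Facebook username (possibly nil) into a candidate Discourse username.
# fb_id is used to build a fallback name when nothing usable remains.
def safe_fb_username_to_dc(fb_username, fb_id)
  candidate = fb_username.to_s                      # nil becomes "", no NoMethodError
  candidate = candidate.gsub(/[.-]/, '_')           # dots and dashes become underscores
  candidate = candidate.gsub(/[^A-Za-z0-9_]/, '')   # drop every other character
  candidate = candidate.gsub(/\A_+|_+\z/, '')       # trim leading/trailing underscores
  candidate = "fb_#{fb_id}" if candidate.length < MIN_USERNAME_LENGTH
  candidate[0, MAX_USERNAME_LENGTH]                 # enforce the assumed maximum length
end
```

This sketch does not check for duplicates: two different Facebook usernames can map to the same result after sanitising or truncation, so a real import would still need a uniqueness check before creating the user.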