Import posts from Facebook group into Discourse


(js1) #65

Okay, just had to disable RateLimiter in import_facebook.rake file:

if TEST_MODE then
exit_script # We’re done
# Disable RateLimiter?

# Create users in Discourse

(Andrew ) #66

This seems to have been caused by some incompatibility with the particular token I’d generated, though I can’t say exactly what. Resolved with new token(s).

(Andrew ) #67

Is the script supposed to import “link posts” (in Facebook’s terminology)? In the tests @su_js1 and I ran, both independently and together, it did not: the comments about the link are imported, but there is no trace of the link itself. I’d just like to confirm.

This thread and the documentation were clear that images wouldn’t be imported, and I assumed that if link posts aren’t imported, that disclaimer would/should be included as well.

(Andrew ) #68

Has anyone in this thread figured out a workable bulk change ownership function? The Facebook group I imported dates back almost 3 years and some users have thousands of posts. Merging the FB group with the existing Discourse (imported from pre-existing bbPress, and added to subsequently) is rather cumbersome without a way to do this.

(js1) #69

What’s the correct way to get the import tool to use real email addresses? And, does that require matching up the fb and discourse accounts?

(Jesse Perry) #70

I wouldn’t recommend that you use real email addresses during the import process, because it will quite literally spam them right after import. By default this tool adds a “.fake” to the email addresses to not spam anyone (and I recommend you set up a test Mandrill API token to not even have the emails attempt to go out).

(Dean Taylor) #71

It should be noted that you have other options in terms of disabling emails for the entire site.

For example the standard import scripts adjust a few site settings during the import and then revert them on completion:
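The snippet itself isn’t reproduced here, so as a rough sketch of the general pattern only (override settings, run the import, put everything back): the plain Hash below is a stand-in for the site settings, and the setting names and values are illustrative assumptions rather than a copy of what the standard import scripts do.

```ruby
# Hypothetical stand-in for a settings store; inside Discourse the real
# values live in site settings, not in a plain Hash like this.
SETTINGS = { disable_emails: false, min_topic_title_length: 15 }

# Apply temporary overrides, run the block, then restore the previous
# values even if the block raises.
def with_import_settings(settings, overrides)
  previous = overrides.keys.map { |key| [key, settings[key]] }.to_h
  settings.merge!(overrides)
  yield
ensure
  settings.merge!(previous) if previous
end

with_import_settings(SETTINGS, disable_emails: true, min_topic_title_length: 1) do
  # ... the import would run here, with outgoing email switched off ...
  puts "emails disabled during import: #{SETTINGS[:disable_emails]}"
end

puts "emails disabled after import: #{SETTINGS[:disable_emails]}"
```

The `ensure` clause is the point of the sketch: a crashed import should not leave the site with emails permanently disabled or relaxed title limits.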

(Andrew ) #72

The massive negative ramifications of using real emails with outgoing messages activated are understood. The Mandrill test mode is handy for sure.

I think @su_js1 is asking about something we talked about outside of this thread, wherein all imported email addresses are username@localhost.fake. Our imports, separate and working together, never imported users’ actual email addresses, not even with .fake appended.

So for clarification:

  1. Does anyone currently get the script to import actual email addresses (with or without .fake) with a live group?
  2. If yes, are there other prerequisites (token must be generated by group admin, perhaps?) to get the real email addresses to import?

This is an FB group I started, am still admin of, and engage with frequently. I can notify and direct members of all of this through the group. However, the @localhost addresses greatly lower the utility of this transition/import, and orphaned accounts/content seem to be unavoidable since Discourse needs real email addresses to attach users in a meaningful way.

edit for clarification: I don’t recall using my FB account that has admin in the group for the tests.

(Jeff Atwood) #73

Someone would need to contact Facebook support to figure out if Facebook allows the emails to come out.

Could be they’re just jerks about this now and want to be a roach motel – user info (email) goes into Facebook, but never comes back out again. Hard to say unless someone asks and researches.

I don’t even have a Facebook account so this isn’t a good one for me to research.

(Jesse Perry) #74

Yes, when I imported I got all account email addresses with “fake” added. All I did was grab an API key from FB per instructions with user_groups and read_stream as permission. Also I’m not friends with the group members, but I am admin. So, I don’t believe it’s a security thing (that Facebook won’t give you email addresses because you’re not friends). Yes, definitely getting the API key through your account (that is admin) is a must.

You could play around with the Graph API Explorer on Facebook for Developers.

If that gives you members’ email addresses, then the issue lies somewhere in the import process to Discourse, not with Facebook.

(js1) #75

The NoMethodError seems to be related to a token with admin access. It seems to hint that no data is being received even with read_stream permissions.

(Andrew ) #76

@jesselperry, would you be willing to share how many members are in the group it’s working for? Facebook has a threshold at 250 members that changes what admins see, and our import with an admin token in a group with 1,000+ members fails. With a non-admin token the rake task runs, but it still imports no emails.

Facebook messages and posts in groups with fewer than 250 people are marked as “seen” after your group members have seen them. If your group reaches 250 members or more, you’ll no longer see who’s seen messages and posts.

{edit: the group size doesn’t seem to be the issue}

(Jesse Perry) #77

Well, my group was definitely smaller: fewer than 50 members. Could it be pagination? Perhaps FB doesn’t return all the results in one call, and the script would need to make multiple calls to page through them.
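To make the pagination idea concrete, here is a minimal sketch. Graph API list responses come back as a `data` array with an optional `paging` → `next` link; everything else below (the fake page store, the `fetch_all` helper, the sample names) is made up for illustration and is not how the import script is actually written.

```ruby
# Fake two-page API response, keyed by URL. The shape ("data" plus
# "paging" => "next") mirrors Graph API list responses; the URLs and
# member names are invented for this sketch.
PAGES = {
  "page1" => { "data" => [{ "name" => "Ann" }, { "name" => "Bob" }],
               "paging" => { "next" => "page2" } },
  "page2" => { "data" => [{ "name" => "Cy" }] }
}

# Follow "next" links until a page has none, collecting every item.
def fetch_all(first_url, fetch_page)
  items = []
  url = first_url
  while url
    page = fetch_page.call(url)
    items.concat(page.fetch("data", []))
    url = page.dig("paging", "next")
  end
  items
end

members = fetch_all("page1", ->(url) { PAGES.fetch(url) })
puts members.map { |m| m["name"] }.join(", ")
```

If the importer only ever read the first page, a small group would look complete while a large one would silently lose members, which would fit the symptoms.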

I suppose you could also just import without the emails, then somehow directly alter the Discourse database with the correct emails, if you could get those in a list separately from Facebook. Beyond my knowledge (and I believe yours, from what you were saying).
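For what it’s worth, the matching half of that idea could look something like the sketch below. It assumes you have somehow obtained a CSV of usernames and real addresses; the column names, the sample data, and the `imported` list are all hypothetical, and it deliberately stops short of touching the Discourse database.

```ruby
require "csv"

# Hypothetical export: one row per member, with the username the importer
# created and the real address collected separately.
csv_text = <<~CSV
  username,email
  ann_lee,ann@example.com
  bob_ray,bob@example.com
CSV

# Hypothetical snapshot of imported accounts with placeholder addresses.
imported = [
  { username: "ann_lee", email: "ann_lee@localhost.fake" },
  { username: "cy_doe",  email: "cy_doe@localhost.fake" }
]

real_emails = CSV.parse(csv_text, headers: true)
                 .map { |row| [row["username"], row["email"]] }
                 .to_h

# Split accounts into those with a known real address and those without.
updates, unmatched = imported.partition { |user| real_emails.key?(user[:username]) }
updates = updates.map { |user| user.merge(email: real_emails[user[:username]]) }

puts "to update: #{updates.map { |u| u[:email] }.inspect}"
puts "still placeholder: #{unmatched.map { |u| u[:username] }.inspect}"
```

Keeping the unmatched list around matters, since those are the accounts that would stay orphaned after the change.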

Sorry for the trouble! It will be worth it once it’s done and you have your user data free from the chains of FB.

(Martin Eriksson) #78

I have spent a few weeks rewriting this script and made a lot of progress. I will list some of the specific updates below but let me first of all mention that this version of the script is somewhat battle-tested rather than simply experimental. I reworked it because I wanted to migrate a number of very active groups which have served key purposes in a specific community so I needed it to work well and have a reasonably complete feature set.

Specifically, I have used it to export 25 Facebook groups to a single Discourse instance, including creating 900+ user accounts, 8,000+ topics, 70,000+ comments, 160,000+ likes and 2,000+ images. Along the way I ran into a lot of tricky details and corner cases, and I am happy to report that the current version of the script imports all of these groups flawlessly (I have specifically tested by re-importing all of them from scratch with the latest version to catch remaining issues). Of course, I cannot guarantee that it will work for all other groups, but I think it is now ready for anyone to try out.

What remains are things I consider marginal, like importing polls, which I think would be possible. Also, images are not imported for a small number of posts because I have simply not found any way of getting them from the API. But all of the things that were previously lacking for serious migration efforts are fixed, i.e. those related to importing comments, likes and images. See details about this and more in the readme file:

GitHub - sanderdatema/import_facebook_into_discourse: This rake task will import all posts of a Facebook group into Discourse

I would be excited if someone who manages larger groups would want to try the importer. If you appreciate my hard work with this project, please reward me by letting me know how the script is used. I would also be happy to help if someone has problems, especially if we run into new issues when importing very large groups.

Here are some highlights from the changelog:

  • Import all top-level comments (not just the first 25 comments per thread)

  • Import replies, i.e. comments on comments

  • Import likes for all posts and comments

  • Import images for all comments and almost all posts

  • Import shared links

  • Import user tagging/mentions in messages

  • More sophisticated method for constructing topic titles

  • More detailed terminal output and progress indicators

  • Graceful handling of errors, e.g. expired access tokens and unfetchable objects

  • Detailed reporting when script exits listing e.g. objects which could not be fetched

  • Handling of a large number of special cases and quirks of the Facebook API

  • Many modifications to enable stopping and restarting (necessary for importing large groups)

  • Optional saving of all imported data to disk (useful for importing large groups)

  • Option for doing imports completely from files without accessing API

  • More reliable method for creating Discourse user accounts

  • Optional flag for avoiding API rate limiting

  • Cleaner code and code style
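To illustrate why saving fetched data to disk helps with stopping and restarting, here is a small sketch of the general idea. It is not the script’s actual implementation: `fetch_with_cache`, the one-JSON-file-per-object layout, and the fake fetcher are assumptions made for the example.

```ruby
require "json"
require "tmpdir"

# Return the cached copy of an object if it is already on disk; otherwise
# call the block (the expensive API fetch), save the result, and return it.
def fetch_with_cache(cache_dir, key)
  path = File.join(cache_dir, "#{key}.json")
  return JSON.parse(File.read(path)) if File.exist?(path)

  data = yield(key)
  File.write(path, JSON.generate(data))
  data
end

Dir.mktmpdir do |dir|
  api_calls = 0
  # Stand-in for a real API request; counts how often it is invoked.
  fake_fetch = ->(id) { api_calls += 1; { "id" => id, "message" => "hello" } }

  fetch_with_cache(dir, "post_1", &fake_fetch)
  fetch_with_cache(dir, "post_1", &fake_fetch) # second call is served from disk
  puts "API calls made: #{api_calls}"
end
```

With a cache like this, a run that dies halfway (an expired token, say) can be restarted without re-fetching what was already downloaded, and a later import can run entirely from the saved files.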

Please try it out and let me know how it goes!

:warning: Note: I am unclear about whether or not using the importer is allowed by the Facebook API terms of service. The requirements for interacting with groups through the API are restrictive but also not very clearly defined in my opinion. However, Facebook has a general policy of allowing data exports, e.g. individual users can download a CSV archive with all their timeline activities, so I think that it is pretty clear that this usage does not violate the spirit of related Facebook policies.

(Vu Huynh) #79

Hello, how large is your hosting storage?

(Jesse Perry) #80

WOW great work. I personally don’t have a need for this anymore — but this will help a lot of people with how often Facebook groups are used.

(Vu Huynh) #81

I have a problem installing your plugin.

First, I added a git clone line to my app.yml file,

then rebuilt Discourse, but nothing happened :frowning:

I read your introduction on GitHub but don’t know how to install it via the Gemfile.

Can you help me?

(Sander Datema) #82

Did you follow the instructions in the readme? It’s not a plugin, so it won’t work the way you installed it.

(Vu Huynh) #83

Yes, I followed the readme, but this is the first time I have worked with Discourse/Ruby. I’m trying this because connecting a Facebook group to a forum is exactly what I need :frowning: I don’t understand how to run this code, sorry about that. If you can, please help me. Thank you very much.

(Martin Eriksson) #84

Our nightly backups are about 500 MB in size, including uploads/images.

As @Sander78 mentioned, this is not a plugin, so you do not need to modify app.yml, rebuild Discourse etc. There are basically these steps:

  1. Move the rake task into the rake task folder. Move the config file to the config folder. This is the basic installation.

  2. Add the special dependencies. This is the part which involves the Gemfile. If you are familiar with Ruby dependency management, this part is easy. If you are unsure, read up on how Ruby gems and the Gemfile work. It is not very complicated! Basically, the Gemfile says what libraries your program needs, and bundle reads the Gemfile to make sure they are installed.

  3. Add your specific information to the config file (i.e. group id and name, access token etc).

  4. Run the rake task.

All of this is described in the readme file and those instructions are complete. You do not need to do anything else, i.e. modify some other files, rebuild etc.
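As a rough illustration of step 3, a quick sanity check like the one below can catch an incomplete config before running the rake task. The key names here are hypothetical placeholders chosen for the example; the readme defines the real ones.

```ruby
require "yaml"

# Hypothetical config contents; the real key names are defined in the readme.
config_text = <<~YAML
  group_id: "1234567890"
  group_name: "Example Group"
  access_token: "PASTE_TOKEN_HERE"
YAML

# Keys this sketch treats as mandatory (an assumption, not the script's list).
REQUIRED_KEYS = %w[group_id group_name access_token].freeze

# Return the required keys that are absent or blank in the parsed config.
def missing_keys(config)
  REQUIRED_KEYS.select { |key| config[key].to_s.strip.empty? }
end

config = YAML.safe_load(config_text)
missing = missing_keys(config)
puts missing.empty? ? "config looks complete" : "missing: #{missing.join(', ')}"
```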

If all of this is unfamiliar to you, you should probably learn more about Ruby, Ruby on Rails, gems, rake tasks etc. These things are not specific to Discourse but rather standard Ruby stuff. To run a Discourse site by yourself, you will probably need to have some basic knowledge about these things anyway.

Good luck!