Migration from Yahoo! Groups

I don’t have mbox files, and I’m not aware of any way to get them; Yahoo certainly won’t let me download them. Do you know of something that would convert JSON to mbox? Google shows a number of tools for going in the other direction, but at a quick glance I don’t see anything that covers this one.

I’d expected that, since there were existing scripts designed to migrate Yahoo groups specifically, those scripts would actually work, and that would be the most straightforward way to accomplish this task. It appears my expectation was optimistic–the scripts “work” in that they migrate the messages, and they kind of migrate the users, but missing most of the email addresses and assigning most of the messages to the wrong user is a bit of a problem.

The thing that’s frustrating me is that it seems like this should be a trivial fix for someone who actually knows a thing or two about Ruby–but unfortunately I’m not such a person (I’m trying, but there’s never enough time for everything). My group is small enough that I can probably fix it manually if I need to–but I’d rather not need to, and even more to the point, I’m trying to come up with a general method that other Yahoo groups owners can use.

Edit: I guess I should be glad that I’m managing as much as I am in a language I really don’t know anything about, but I still feel like there’s something major (that should be obvious) that I’m missing. I’ve tried a different method: decoding the HTML entities in the “from” field and parsing the result with the Mail gem. The portion of import_users that I’ve edited reads as follows:

    create_users(profiles.to_a) do |u|
      # user_id is a counter that is assumed to be initialized before this block (not shown here)
      user_id += 1

      # fetch the last message for this profile to pick up the latest user info, as this may have changed
      user_info = @collection.find("ygData.profile": u["_id"]["profile"]).sort("ygData.msgId": -1).limit(1).to_a[0]

      # store the profile => user_id lookup
      @user_profile_map.store(user_info["ygData"]["profile"], user_id)

      puts "User created: #{user_info["ygData"]["profile"]}"

      # the "from" field is HTML-entity encoded; decode it, then let the Mail gem extract the bare address
      user_email = Mail::Address.new(HTMLEntities.new.decode(user_info["ygData"]["from"]))

      {
        id: user_id, # yahoo "userId" sequence appears to have changed mid forum life so generate this
        username: user_info["ygData"]["profile"],
        name: user_info["ygData"]["authorName"],
        email: user_email.address, # mandatory
        created_at: Time.now
      }
    end

And it works! Well, mostly. Of 302 distinct users counted by the script, it imports 289. They show up on the admin page with the correct usernames, full names (when provided), and email addresses. The script says it imports all 302 and reports no errors. But when it starts importing topics, I get this:

    Importing discussions
    Topic: 1 / 12232  (0.01%)  Subject: Newspapers
    Topic: 2 / 12232  (0.02%)  Subject: Ents
    Traceback (most recent call last):
        8: from script/import_scripts/yahoogroup.rb:168:in `<main>'
        7: from /home/dan/discourse/script/import_scripts/base.rb:47:in `perform'
        6: from script/import_scripts/yahoogroup.rb:40:in `execute'
        5: from script/import_scripts/yahoogroup.rb:101:in `import_discussions'
        4: from script/import_scripts/yahoogroup.rb:101:in `each_with_index'
        3: from script/import_scripts/yahoogroup.rb:101:in `each'
        2: from script/import_scripts/yahoogroup.rb:132:in `block in import_discussions'
        1: from /home/dan/discourse/script/import_scripts/base.rb:535:in `create_post'
    /home/dan/.rbenv/versions/2.6.2/lib/ruby/gems/2.6.0/gems/activerecord-6.0.0/lib/active_record/core.rb:177:in `find': Couldn't find User with 'id'=298 (ActiveRecord::RecordNotFound)

…which isn’t surprising, since the highest user id is 290.
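
For anyone else chasing the gap between 302 counted and 289 imported: I don't know why those users were dropped, but one thing that is cheap to check is whether some profiles have no usable address in their “from” value, or whether several profiles share one address. The sketch below is only a diagnostic under those assumptions. The helper names (`extract_email`, `suspicious_profiles`) and the input shape (a hash of profile name to raw “from” string) are made up for illustration, and it uses the standard library's `CGI.unescapeHTML` plus a rough regex instead of the HTMLEntities and Mail gems so that it runs on its own.

```ruby
require "cgi"

# Hypothetical helper: pull a bare address out of an HTML-entity-encoded
# "from" value. Returns nil when nothing that looks like a full address is found.
def extract_email(raw_from)
  decoded = CGI.unescapeHTML(raw_from.to_s)
  match = decoded.match(/[^\s<>"']+@[^\s<>"'@]+\.[a-z]{2,}/i)
  match && match[0].downcase
end

# profiles: hash of profile name => raw "from" value (assumed shape).
# Returns the profiles with no usable address and the addresses
# that more than one profile maps to.
def suspicious_profiles(profiles)
  emails = profiles.transform_values { |from| extract_email(from) }
  missing = emails.select { |_, email| email.nil? }.keys
  shared = emails.reject { |_, email| email.nil? }
                 .group_by { |_, email| email }
                 .select { |_, pairs| pairs.size > 1 }
                 .transform_values { |pairs| pairs.map(&:first) }
  { missing: missing, shared: shared }
end
```

Feeding it the profile and “from” values pulled from the same collection the import script reads would list the profiles worth inspecting by hand. An empty result for both keys would rule these two causes out, not explain the gap.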


Would Discourse have any logs that would indicate which users hadn’t been created and why? Where would those be?


Correction to my earlier post, where I said Yahoo won’t let me download mbox files: it turns out Yahoo does let you download them, but it’s a bit of a process, and nowhere does it tell you that mbox files are what you’ll get. Yahoo has a “Get my data” tool. Go there, log in, submit a request, and wait until they notify you (about a week for me). They’ll send you an email with a URL where you can download a .zip file that appears to contain most of the contents of every group of which you’re a member (the photos appear to be missing). Somewhat surprisingly, the .mbox files contain full email addresses even for groups of which you aren’t a moderator.

So, @gerhard, it appears I was premature in disregarding your suggestion–my apologies.

Edit: Yes, the .mbox process seems to work much better. Some messages are getting skipped (~100 for the apparent lack of a date, for example), but almost all the 38k messages made it, all the users made it (and a spot check indicates they’re all associated with the correct posts), all with the correct email addresses. It isn’t perfect at keeping topics together (the other script wasn’t either), but it’s doing pretty well. And, as a bonus, it makes for a simpler method to document than what I’d been trying to do. Only downside I see so far is the delay for Yahoo to make your stuff available to download.
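
If you want to see ahead of time which messages have no date, a rough pre-check along these lines might help. It is only a sketch: it assumes the usual mbox layout where each message starts with a line beginning `From ` and body lines that start that way have been escaped as `>From `, and it only looks for a `Date:` header. It says nothing about how the Discourse mbox importer itself decides what to skip. The function names are made up for illustration.

```ruby
# Split raw mbox text into messages, using lines that begin with "From "
# as separators. The separator line itself is not kept.
def split_mbox(text)
  messages = []
  current = nil
  text.each_line do |line|
    if line.start_with?("From ")
      messages << current if current
      current = +""
    elsif current
      current << line
    end
  end
  messages << current if current
  messages
end

# True when the header block (everything before the first blank line)
# contains a Date: field.
def date_header?(message)
  headers = message.split(/\r?\n\r?\n/, 2).first.to_s
  headers.match?(/^Date:/i)
end

# Messages whose header block has no Date: field.
def messages_without_date(text)
  split_mbox(text).reject { |message| date_header?(message) }
end
```

For example, `messages_without_date(File.read("group.mbox")).size` (with `group.mbox` standing in for whatever file is in your download) gives a count to compare against the number the importer reports as skipped. Very large archives or ones with odd encodings may need to be read in binary mode or streamed instead.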


Wow! That’s pretty wild. I guess they figure that if you’ve been on the list, you already have the email addresses.

This is good news - I just did the download and it looks like I have a pretty comprehensive archive of messages for my yahoogroup that I’d like to hang on to, in handy and portable mbox format. Sweet!
