Issues with Using disqus.rb Import Script

Greetings,

I’ve just set up Discourse and would like to import my comments from Disqus. I noticed there’s a lovely import script, but it doesn’t seem to be working for me.

UPDATE #2: OK, it seems I had some malformed XML, so I am now having the same issue as noted here. The issue is that Disqus no longer includes email addresses in its XML exports and in fact “hides” them in the dashboard. So it may not be possible to import the comments unless you add some code that makes up email addresses on the fly for the create_users function.

UPDATE: Actually I guess I should step back for a second. Without adjusting the frozen_string_literal item at the top of the script I get:

Traceback (most recent call last):
        6: from script/import_scripts/disqus.rb:228:in `<main>'
        5: from script/import_scripts/disqus.rb:228:in `new'
        4: from script/import_scripts/disqus.rb:21:in `initialize'
        3: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/nokogiri-1.10.10/lib/nokogiri/xml/sax/parser.rb:104:in `parse_file'
        2: from /var/www/discourse/vendor/bundle/ruby/2.6.0/gems/nokogiri-1.10.10/lib/nokogiri/xml/sax/parser.rb:104:in `parse_with'
        1: from script/import_scripts/disqus.rb:176:in `characters'
script/import_scripts/disqus.rb:195:in `record': can't modify frozen String (FrozenError)

So, maybe that issue should be resolved first (before digging into the below)?

root@discourse:/var/www/discourse# su discourse -c "bundle exec ruby script/import_scripts/disqus.rb"
Loading existing groups...
Loading existing users...
Loading existing categories...
Loading existing posts...
Loading existing topics...

importing users...

importing topics...


Updating topic status

Updating bumped_at on topics

Updating last posted at on users

Updating last seen at on users

Updating first_post_created_at...

Updating user post_count...

Updating user topic_count...

Updating topic users

Updating post timings

Updating featured topic users

Updating featured topics in categories
        4 / 4 (100.0%)  [3222 items/min]
Resetting topic counters


Done (00h 00min 00sec)

I don’t know a whole lot of Ruby (actually, I don’t know any), but I know enough to add some debugging output like puts "#{id}" to see whether anything is being fetched. For example, above line 190 I added puts "#{target}" or puts "#{str}", so I can see it’s definitely reading the file.

Enough of the script is working that I know IMPORT_FILE and IMPORT_CATEGORY have been set properly.

Any ideas on what else I can do to troubleshoot/debug?

Thanks! :blue_heart:

I did some work with the Disqus importer a couple of months ago. I think I fixed the frozen_string_literal error by replacing target[sym] << str with target[sym] += str, though just changing true to false in the first line should be fine.
I vaguely remember it doing nothing, as it seems to have for you, but I can’t remember what the cause was and don’t see anything obvious in my changes. I also had to make some other changes because this import needed to integrate into an existing site, so I’m afraid my code won’t help you.
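
For anyone wondering why that swap helps, here is a minimal sketch of the difference between the two operators. The helper names and the target hash are made up for illustration and are not the importer's real code; the string is frozen explicitly with .freeze, which is how string literals behave when the frozen_string_literal: true comment is at the top of a file.

```ruby
# Hypothetical helpers for illustration only; not taken from disqus.rb.

# Appends in place. This mutates the existing String object, so it raises
# FrozenError when that object is frozen.
def append_in_place(target, sym, str)
  target[sym] << str
end

# Builds a new String and stores it back in the hash. The frozen original
# is never modified, so this works either way.
def append_with_copy(target, sym, str)
  target[sym] += str
end

# Stand-in for the record the parser fills in while reading the XML.
target = { message: "".freeze }

begin
  append_in_place(target, :message, "hello")
rescue FrozenError => e
  puts "<< raised #{e.class}"
end

append_with_copy(target, :message, "hello")
append_with_copy(target, :message, " world")
puts target[:message]
```

The copy-based version allocates a new string on every call, which should not matter much for an import that runs once.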

Hey @pfaffman! Thanks for taking the time to respond!

I tried both target[sym] += str and changing the first line to false. I think my update #2 above is correct: the script tries to pass the value in email to another function, but since it’s null or empty, that function flips out and errors :stuck_out_tongue_closed_eyes:

Some sort of new code would need to be introduced to handle the case where email is empty: replace it with a default value, generate a new unique value, or just skip creating a new user.

    @parser.posts.each do |id, p|
      p[:author_email] = "#{p[:author_username]}@nowhere.invalid" unless p[:author_email]
      next if p[:is_spam] == 'true' || p[:is_deleted] == 'true'
      puts "name: #{p[:author_name]}, username: #{p[:author_username]}, email: #{p[:author_email]} "
      by_email[p[:author_email]] = { name: p[:author_name], username: p[:author_username] }
    end

If you fix it and it’s truly broken for Disqus imports, feel free to open a PR to improve it!

I always intend to…

Alright! I’ve gotten this working by using the following:

    @parser.posts.each do |id, p|
      p[:author_username] = "#{p[:author_name]}" unless p[:author_username]
      p[:author_email] = "#{p[:author_username]}@disqus.sucks" unless p[:author_email]
      next if p[:is_spam] == 'true' || p[:is_deleted] == 'true'
      puts "name: #{p[:author_name]}, username: #{p[:author_username]}, email: #{p[:author_email]}"
      by_email[p[:author_email]] = { name: p[:author_name], username: p[:author_username] }
    end

    @parser.threads.each do |id, t|
      t[:author_username] = "#{t[:author_name]}" unless t[:author_username]
      t[:author_email] = "#{t[:author_username]}@disqus.sucks" unless t[:author_email]
      by_email[t[:author_email]] = { name: t[:author_name], username: t[:author_username] }
    end

I also have frozen_string_literal: false set.
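
One caveat with building the address straight from the display name: names with spaces or punctuation may not make a valid email address, and an author with no name at all ends up with an empty local part. A rough sketch of a more defensive fallback follows. The fill_missing_author helper, the disqus_user_ prefix and the example.invalid domain are all assumptions of mine, not anything disqus.rb provides.

```ruby
# Hypothetical helper, not part of disqus.rb. Returns a copy of the record
# with a username and email filled in when the export left them empty.
# fallback_id is any value unique to the record (here, the hash key).
def fill_missing_author(record, fallback_id, domain: "example.invalid")
  name = record[:author_name].to_s.strip
  username = record[:author_username].to_s.strip
  username = name if username.empty?
  username = "disqus_user_#{fallback_id}" if username.empty?

  email = record[:author_email].to_s.strip
  if email.empty?
    # Collapse anything outside a-z, 0-9 and underscore into single dots,
    # then trim dots from both ends.
    local = username.downcase.gsub(/[^a-z0-9_]+/, ".").gsub(/\A\.+|\.+\z/, "")
    local = "disqus_user_#{fallback_id}" if local.empty?
    email = "#{local}@#{domain}"
  end

  record.merge(author_username: username, author_email: email)
end

# Made-up sample records shaped like the hashes in the snippets above.
posts = {
  "1" => { author_name: "Jane Q. Public", author_username: nil, author_email: nil },
  "2" => { author_name: "bob", author_username: "bob42", author_email: "bob@example.com" },
}

posts.each do |id, p|
  filled = fill_missing_author(p, id)
  puts "#{filled[:author_username]} <#{filled[:author_email]}>"
end
```

Two different people with the same display name would still share one placeholder address here, so they would be merged into a single user on import.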

I made it farther, but now something new is nil. I’ll hunt and poke, but I would appreciate help, and I commit to contributing my discoveries back.

I applied the patches in this thread, and then became stuck here when importing topics:

Traceback (most recent call last):
        5: from script/import_scripts/disqus.rb:236:in `<main>'
        4: from /var/www/discourse/script/import_scripts/base.rb:47:in `perform'
        3: from script/import_scripts/disqus.rb:29:in `execute'
        2: from script/import_scripts/disqus.rb:70:in `import_topics_and_posts'
        1: from script/import_scripts/disqus.rb:70:in `each'
script/import_scripts/disqus.rb:84:in `block in import_topics_and_posts': undefined method `topic' for nil:NilClass (NoMethodError)

OK. I don’t think I’m up for this tonight.

I’ve been able to figure out that find_remote() returns nil for a comment and I have no idea what this means nor why it matters. I suspect that it has something to do with HTTP/HTTPS, so out of desperation, I’m going to just change all the URLs from HTTP to HTTPS in my Disqus export files and hope for the best.

Advice is so welcome.

[UPDATE!]

My current conjecture: if the Disqus comment is on a post whose link doesn’t have a Disqus thread any more (such as a tumblr blog from 1853), then this import script dies a horrible death.

By chance, if I’m right, could someone who knows the code better than I suggest how to patch the importer so that it happily skips such threads and comments?

And if I’m not right, well, then I’ll probably reply to this with more discoveries. :stuck_out_tongue:

[UPDATE!]

I deleted, by hand, all the comments from threads with links that no longer go anywhere. This appears to have sufficed to import all my Disqus comments. I believe, therefore, it would be nice if someone who can read the code for this script could patch it accordingly. I don’t think I can do it.
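
In case it helps whoever picks this up, the skip I have in mind looks roughly like the sketch below. It is only an illustration of the nil guard on stand-in data: import_comments and both hashes are hypothetical, and the real script looks things up in the Discourse database rather than in a Hash.

```ruby
# Hypothetical stand-in for the import loop; not the code in disqus.rb.
# Returns the comments that could be attached to a topic and the ids of
# the ones that were skipped because their thread was never imported.
def import_comments(comments, imported_topics)
  imported = []
  skipped = []

  comments.each do |c|
    topic = imported_topics[c[:thread_id]]

    # Guard: without this check, calling a method on the missing topic
    # fails with NoMethodError on nil, like the traceback above.
    if topic.nil?
      skipped << c[:id]
      next
    end

    imported << { topic_title: topic[:title], raw: c[:text] }
  end

  [imported, skipped]
end

# Made-up sample data: one thread that exists and one comment whose
# thread is gone.
imported_topics = { "thread-1" => { title: "Hello world" } }
comments = [
  { id: "c1", thread_id: "thread-1", text: "First!" },
  { id: "c2", thread_id: "thread-gone", text: "Comment on a page that no longer exists" },
]

imported, skipped = import_comments(comments, imported_topics)
puts "imported #{imported.size}, skipped #{skipped.size}: #{skipped.join(', ')}"
```

Collecting the skipped ids instead of dropping them silently makes it possible to check afterwards which comments were left out.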

I hope, at least, that these words help the next poor soul who follows me. Good luck.

Peace.