Forwarding new posts from a Google Group to Discourse

There are a couple of posts on migrating Google Groups to Discourse using scrapers, but I’d also like new messages that arrive in the group to be forwarded to Discourse automatically.

The first steps are easy enough:

  • add the email address of a Discourse category to the Google Group in question (in Google Groups, use the direct-add option so that no confirmation email is needed)
  • go to the Discourse settings and turn off “block auto generated emails”, or all the messages will bounce because Discourse detects that they came via a mailing list (and it will email the Google Group user!)

BUT the emails still don’t get through, because the “To:” field of the emails from googlemail contains the address of the group, whereas the “Delivered-To:” field holds the address we need. I don’t suppose anyone knows a way to get around this?
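To see the mismatch for yourself, a small Python sketch (standard library only) can list the addresses in each destination header of a raw message. The sample message and its addresses are made up for illustration; only the layout (group address in `To:`, category address in `Delivered-To:`) follows the description above.

```python
from email import message_from_string
from email.utils import getaddresses

# Hypothetical message as relayed by Google Groups: the group address is in
# To:, while the Discourse category address only appears in Delivered-To:.
RAW = """\
Delivered-To: psychopy+builder@discoursemail.com
From: Some User <some.user@example.com>
To: psychopy-users@googlegroups.com
Subject: Routine timing question

Body text.
"""


def destination_headers(raw):
    """Return {header name: [lowercased addresses]} for recipient headers."""
    msg = message_from_string(raw)
    result = {}
    for name in ("To", "Cc", "Delivered-To"):
        values = msg.get_all(name, [])
        result[name] = [addr.lower() for _, addr in getaddresses(values) if addr]
    return result


for name, addrs in destination_headers(RAW).items():
    print(name, addrs)
```

A check that only looks at `To:` sees the group address and never the category address.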

Having a robust mechanism for running the Google Group and Discourse in parallel would be ideal during migration.

I’d be happy with a “capture all Google Group content and migrate only” approach, i.e. new Discourse posts would NOT be synced back to the mailing list.

This would help capture rogue emails and recalcitrant list members, and avoid having to just “end” the list overnight.

Any thoughts on this @zogstrip?

Agreed. We don’t need or want two-way sync, just message forwarding.

We’ve been contemplating this for some time now. Initial thoughts by @sam here:

The ideal scenario being:

That would be even better!

I was just thinking that allowing the “Delivered-To:” field to be checked would make this roughly work with very little effort on the dev side (it might not solve every problem, but it would remove the current blocker).

Along similar lines, I wrote a little Python script to forward the old emails from the group using an IMAP/SMTP combination. In theory it was nice:

  • I could detect the intended category of quite a lot of the messages from their content and direct them to the appropriate Discourse mailbox
  • it doesn’t need access to the Docker container or any direct changes on the server
  • the date of the post is used correctly (i.e. we can post in the past!), unlike with the googlegroup scraper(?)

Unfortunately, I ran into a few problems with email rejections and bounces. I guess it looked too much like the forum was being spammed by new users, and I haven’t had time (yet) to go through the settings and work out how to make Discourse temporarily relaxed enough to accept all the posts from new staged users.

In case anyone wants to play with it:

# Python 3 version: forwards Google Group messages (read from a Gmail account
# via IMAP) to Discourse category mail-in addresses via SMTP.
import email
import getpass
import imaplib
import smtplib
import time

testing = False  # True = only print what would be sent

# IMAP account to read from
HOST = 'imap.googlemail.com'
USERNAME = 'user.name'
PASSWORD = getpass.getpass()
# IMAP SEARCH expects dates as dd-Mon-yyyy with a 3-letter month (RFC 3501)
fromDate = "01-Jul-2016"
toDate = "10-Jul-2016"
toSearch = [  # search these mailboxes with associated terms
    {'mbox': 'read_mail', 'term': 'TO "psychopy-users@googlegroups.com"'},
    {'mbox': 'psychopy-dev', 'term': 'TO "psychopy-dev@googlegroups.com"'},
]

# smtp server
# DON'T USE smtp.gmail as it converts the From: field to be that account
outbox = smtplib.SMTP('smtp.server.address:port')

# imap server
client = imaplib.IMAP4_SSL(HOST)
client.login(USERNAME, PASSWORD)
# print(client.list())  # show valid mailboxes


def get_text(message):
    """Subject plus all text/plain parts, lowercased, for keyword matching."""
    chunks = [str(message.get('Subject', ''))]
    for part in message.walk():
        if part.get_content_type() == 'text/plain':
            payload = part.get_payload(decode=True) or b''
            charset = part.get_content_charset() or 'utf-8'
            try:
                chunks.append(payload.decode(charset, errors='replace'))
            except LookupError:  # unknown charset name
                chunks.append(payload.decode('utf-8', errors='replace'))
    return '\n'.join(chunks).lower()


# some helper functions to search for terms
def is_(text, terms, notTerms):
    # notTerms indicate ambiguous text, so they veto any match
    if any(term in text for term in notTerms):
        return False
    return any(term in text for term in terms)


def isBuilder(text):
    return is_(text, ['routine', 'flow', 'builder', 'graphical'], ['coder'])


def isCoder(text):
    return is_(text, ['import', 'iohub', 'script'], ['builder'])


# store topic names that we already categorised,
# or a single topic could end up in different categories
knownTopics = {}
counts = {}  # just for info: target address -> number of messages
sent = []  # (mailbox, message number) pairs forwarded in this run

# do the actual work
for search in toSearch:
    client.select(search['mbox'])  # select that mailbox/folder
    searchPhrase = '(%s SINCE "%s" BEFORE "%s")' % (search['term'], fromDate, toDate)
    status, response = client.search(None, searchPhrase)
    for msgID in response[0].split():  # loop over results
        if (search['mbox'], msgID) in sent:
            continue
        status, email_data = client.fetch(msgID, '(RFC822)')
        env, raw = email_data[0]
        message = email.message_from_bytes(raw)
        subj = str(message.get('Subject', ''))
        topicKey = subj[-15:]  # the end of the subject survives "Re:" prefixes

        # try to determine target
        text = get_text(message)
        if 'dev' in search['term']:
            target = 'psychopy+dev@discoursemail.com'
        elif topicKey in knownTopics:
            target = knownTopics[topicKey]  # keep replies with their topic
        elif isBuilder(text):
            target = 'psychopy+builder@discoursemail.com'
        elif isCoder(text):
            target = 'psychopy+coder@discoursemail.com'
        else:
            target = 'psychopy+other@discoursemail.com'
        knownTopics[topicKey] = target
        counts[target] = counts.get(target, 0) + 1

        # make sure the "To:" field matches the target address
        if 'To' in message:
            message.replace_header('To', target)
        else:
            message['To'] = target
        print('%s: %s - %s' % (msgID.decode(), message['From'], subj))
        print('  -> %s' % target)

        if not testing:
            outbox.sendmail(message['From'], [target], message.as_bytes())
            time.sleep(2.0)  # could increase this to look more "human"
            sent.append((search['mbox'], msgID))

print(counts)
print(sent)  # could use this to store handled messages for next run

client.logout()
outbox.quit()
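The last line of the script hints at remembering which messages were already forwarded. One way to do that, sketched below assuming a local JSON file is acceptable (the `sent_ids.json` name and both helpers are hypothetical), is to store each message’s `Message-ID` header rather than the IMAP search result numbers, which are per-mailbox sequence numbers and can change between sessions.

```python
import json
import os

SENT_FILE = "sent_ids.json"  # hypothetical location for the record of forwarded messages


def load_sent(path=SENT_FILE):
    """Return the set of Message-IDs forwarded on earlier runs (empty if none)."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return set(json.load(f))


def save_sent(sent_ids, path=SENT_FILE):
    """Write the Message-IDs back to disk, sorted so the file is stable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sorted(sent_ids), f, indent=2)
```

In the forwarding loop, the key would then be `message['Message-ID']` instead of `msgID`: skip the message if its key is already in the loaded set, add the key after `sendmail`, and call `save_sent` once at the end.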

Can you PM me the raw version of an email with that Delivered-To header? We’re already supporting this “destination” but I need to test to make sure it works properly.

Just pushed a fix that should support your use case: incoming email is now checked against all of its “destination” fields before being rejected.

https://github.com/discourse/discourse/commit/323bd555c05f04de74110ed02375209e71b57a30
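In rough terms, the fix means checking every destination header rather than just `To:`. The Python sketch below illustrates that idea; it is not Discourse’s own (Ruby) code, and both the header list and the hypothetical `find_destination` helper are assumptions for illustration.

```python
from email import message_from_string
from email.utils import getaddresses

# Headers that may name the address a message was delivered to.
# Which headers Discourse actually inspects is an assumption here.
DESTINATION_HEADERS = ("To", "Cc", "Delivered-To")


def find_destination(msg, incoming_addresses):
    """Return the first recipient matching a known incoming address, else None."""
    known = {a.lower() for a in incoming_addresses}
    for name in DESTINATION_HEADERS:
        for _, addr in getaddresses(msg.get_all(name, [])):
            if addr.lower() in known:
                return addr.lower()
    return None  # no match: the caller would reject the message


raw = (
    "Delivered-To: psychopy+builder@discoursemail.com\n"
    "To: psychopy-users@googlegroups.com\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw)
print(find_destination(msg, ["psychopy+builder@discoursemail.com"]))
```

With only `To:` checked, this message would find no match; scanning `Delivered-To:` as well picks up the category address.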

Cool. Thanks. :slight_smile:

Roughly how often do the hosted Discourse servers get updated with changes like this? We have an open-source hosted package.

Our hosted customers are updated at least every week (unless we’re doing a huge refactor).

Awesome. I’ll turn on the forwarding again in a week then and see what happens [crosses fingers]

I was trying to activate this today with the following setup.

  • Directly added the category’s incoming email address (...@discoursemail.com) to the Google Group
  • Disabled “Block incoming emails identified as being auto generated”
  • Enabled “Accept incoming e-mails from users with no account” for that category

Emails from Google Groups were rejected with the error Email::Receiver::BadDestinationAddress.
If I set the category’s incoming email address to the ...@googlegroups.com address, new topics are created successfully, but then I can’t use the ...@discoursemail.com address for direct mail-in. And I only want to use the forwarding during the transition period.

Looking at the raw e-mail, there is a Delivered-To: field that contains the destination address, while the To: field only contains the ...@googlegroups.com address.

An alternative would be to allow several custom incoming email addresses.

@zogstrip Should I PM you the raw message?

Have you tried adding multiple incoming email addresses separated by a “|”? :wink: (it’s a hidden feature)
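A rough Python illustration of how such a `|`-separated value could be split and matched against a recipient. Discourse’s own parsing may differ (for instance in case handling or whitespace), and the helper names and example value are hypothetical, reusing addresses from the script earlier in the thread.

```python
def parse_incoming_addresses(setting_value):
    """Split a '|'-separated setting value into a set of lowercased addresses."""
    return {part.strip().lower() for part in setting_value.split("|") if part.strip()}


def accepts(setting_value, recipient):
    """True if the recipient matches one of the configured incoming addresses."""
    return recipient.strip().lower() in parse_incoming_addresses(setting_value)


# Hypothetical value: the category's own mail-in address plus the group address.
value = "psychopy+builder@discoursemail.com|psychopy-users@googlegroups.com"
print(accepts(value, "psychopy-users@googlegroups.com"))  # True
print(accepts(value, "someone@example.com"))  # False
```

This would let the category accept both direct mail-in and the forwarded group mail during the transition, then drop the group address afterwards.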

Can this be adapted to import an email chain?