A Topic Auto-Tagger with Machine Learning?

I’m an expert in machine learning, but not in Discourse. I’ve been using Discourse a lot, though, and really enjoying it.

A tool that I think would be very powerful is a script (interacting with the API) that could:

  1. Look at tag usage and auto-tag topics based on that data. For example, if “recipe” is a tag in use and some topics are already tagged “recipe”, a machine learning algorithm could identify additional topics that should also be tagged “recipe.”

  2. Look at topics to propose new tags, and auto-tag relevant topics.
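Idea 1 could start as a per-tag binary text classifier. Below is a minimal sketch using multinomial naive Bayes in pure Python (a swapped-in choice for illustration; any text classifier would do). It assumes each topic is available as a `(text, tags)` pair and, as a simplification, treats every topic lacking the tag as a negative example, even though some of those are exactly the untagged topics we hope to find. `TagClassifier`, `tokenize`, and `threshold` are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; deliberately simple."""
    return re.findall(r"[a-z']+", text.lower())


class TagClassifier:
    """Binary multinomial naive Bayes: does a topic deserve `tag`?"""

    def __init__(self, tag, alpha=1.0):
        self.tag = tag
        self.alpha = alpha  # Laplace smoothing

    def fit(self, topics):
        # topics: iterable of (text, set_of_tags). Assumption: topics without
        # the tag are treated as negatives, which is only an approximation.
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}
        for text, tags in topics:
            label = self.tag in tags
            self.docs[label] += 1
            self.counts[label].update(tokenize(text))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        self.totals = {k: sum(c.values()) for k, c in self.counts.items()}
        return self

    def log_odds(self, text):
        """log P(tag | text) - log P(no tag | text), up to a shared constant."""
        v = len(self.vocab)
        score = math.log((self.docs[True] + self.alpha) / (self.docs[False] + self.alpha))
        for tok in tokenize(text):
            if tok not in self.vocab:
                continue  # ignore words never seen in training
            p_pos = (self.counts[True][tok] + self.alpha) / (self.totals[True] + self.alpha * v)
            p_neg = (self.counts[False][tok] + self.alpha) / (self.totals[False] + self.alpha * v)
            score += math.log(p_pos / p_neg)
        return score

    def suggest(self, untagged, threshold=0.0):
        """Return the untagged texts whose log-odds exceed the threshold."""
        return [t for t in untagged if self.log_odds(t) > threshold]
```

Raising `threshold` trades recall for precision, which is probably what you want before letting a script write tags to a live forum.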

I think the right implementation would be in Python, separate from the Discourse codebase itself. With good software design, the NLP and ML code would be kept apart from the code that talks to Discourse. Interacting with Discourse only through the API would allow arbitrary ML code to be developed, and would let an open-source Python package grow with contributions from the ML community, essentially independently of the Discourse distribution. A solid interface might also enable applying ML to Discourse forum management in other ways.
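One way to keep that separation is to have the ML side depend only on a small interface. A minimal sketch, assuming a hypothetical `ForumClient` protocol with `topics()` and `add_tags()`: a real implementation would wrap the Discourse API, while the in-memory stand-in here lets the tagging logic be tested without a forum. `predict` is any function mapping a topic title to candidate tags; all names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Protocol


@dataclass
class Topic:
    id: int
    title: str
    tags: set = field(default_factory=set)


class ForumClient(Protocol):
    """Hypothetical boundary between the ML code and Discourse."""

    def topics(self) -> Iterable[Topic]: ...

    def add_tags(self, topic_id: int, tags: set) -> None: ...


class InMemoryForum:
    """Test double standing in for a real Discourse API client."""

    def __init__(self, topics):
        self._topics = {t.id: t for t in topics}

    def topics(self):
        return list(self._topics.values())

    def add_tags(self, topic_id, tags):
        self._topics[topic_id].tags |= set(tags)


def auto_tag(client: ForumClient, predict: Callable[[str], Iterable[str]], dry_run: bool = True) -> dict:
    """Propose (and, unless dry_run, apply) tags each topic does not already have."""
    proposals = {}
    for topic in client.topics():
        new_tags = set(predict(topic.title)) - topic.tags
        if new_tags:
            proposals[topic.id] = new_tags
            if not dry_run:
                client.add_tags(topic.id, new_tags)
    return proposals


if __name__ == "__main__":
    forum = InMemoryForum([Topic(1, "Best pancake recipe"), Topic(2, "Login error", {"bug"})])
    predict = lambda title: {"recipe"} if "recipe" in title.lower() else set()
    print(auto_tag(forum, predict))  # {1: {'recipe'}}
```

Defaulting to `dry_run=True` means a moderator can review proposals before anything is written back to the forum.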

So, I’m interested in developing a first pass at the ML/NLP for an auto-tagger and releasing it as an open-source library.

Are there any Discourse developers familiar with the API who would be interested in helping with the Discourse communication component of this project/library? That is a critical skill our team is missing, and we need someone who can handle it before we can get started.
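For the communication component, a starting point might be plain HTTP calls from Python's standard library. A minimal read-only sketch, assuming an API key created in the Discourse admin panel and sent via the `Api-Key` and `Api-Username` headers, and assuming `/latest.json` returns topics under `topic_list.topics`. Field names are worth verifying against your own instance; the tag handling is deliberately defensive because the tag format may differ between versions. The function names are hypothetical.

```python
import json
import urllib.request


def build_request(base_url, path, api_key, api_username):
    """Build an authenticated GET request (assumes header-based API keys)."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={
            "Api-Key": api_key,
            "Api-Username": api_username,
            "Accept": "application/json",
        },
    )


def parse_latest(payload):
    """Extract (id, title, tags) tuples from a /latest.json response body."""
    topics = payload.get("topic_list", {}).get("topics", [])
    result = []
    for t in topics:
        # Defensive: accept tags either as plain names or as objects with a "name".
        tags = {tag if isinstance(tag, str) else tag.get("name") for tag in t.get("tags", [])}
        result.append((t["id"], t["title"], tags))
    return result


def fetch_latest(base_url, api_key, api_username):
    """Fetch and parse the latest topics (performs a real network call)."""
    req = build_request(base_url, "/latest.json", api_key, api_username)
    with urllib.request.urlopen(req) as resp:
        return parse_latest(json.load(resp))
```

Keeping request building and response parsing separate from the network call makes both testable offline.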

Are there any other academics/experts who would like to participate in the ML/NLP development?


Perhaps @samamorgan or @black have some suggestions or would participate?

https://github.com/samamorgan/discourse


I would recommend leaning on webhooks here: you would consume the webhook from your Python app and react to it using our API.
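A minimal sketch of the receiving side, assuming Discourse signs each webhook body with HMAC-SHA256 using the secret configured for the webhook, sends it as `X-Discourse-Event-Signature: sha256=<hex>`, and names the event in `X-Discourse-Event` (worth confirming in the webhook settings of your instance). `handle_webhook` is a hypothetical, framework-agnostic function; in a real app it would sit behind Flask, Django, or `http.server`, and the 200 branch is where the app would queue the topic for tagging and then call back through the API.

```python
import hashlib
import hmac
import json


def verify_signature(secret: str, body: bytes, header: str) -> bool:
    """Check an 'sha256=<hex>' HMAC signature over the raw request body."""
    expected = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information about partial matches
    return hmac.compare_digest(expected, header or "")


def handle_webhook(secret: str, headers: dict, body: bytes):
    """Return (status, topic_id_to_process).

    Assumes exact-case header keys in a plain dict; web frameworks normally
    expose case-insensitive header lookups instead.
    """
    if not verify_signature(secret, body, headers.get("X-Discourse-Event-Signature", "")):
        return 401, None
    event = headers.get("X-Discourse-Event")
    payload = json.loads(body)
    if event in ("topic_created", "topic_edited"):  # assumed event names
        # This is where the app would queue the topic for (re-)tagging.
        return 200, payload.get("topic", {}).get("id")
    return 204, None  # other events are acknowledged and ignored
```

Verifying the signature over the raw bytes, before parsing JSON, matters: re-serialized JSON may not match what was signed.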

Curious to see how well it goes.

Another area that may be interesting is using word2vec, or some other sentence-to-vector approach, to figure out topic similarity in #support… many things are asked many times in many different ways, and gluing that information together can be very beneficial.
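A rough sketch of the sentence-to-vector idea: average the word vectors in a post, then rank earlier questions by cosine similarity. The tiny `EMBEDDINGS` table is made up purely for illustration; in practice the vectors would come from a trained word2vec model, and plain averaging is only the simplest way to turn word vectors into a sentence vector.

```python
import math

# Made-up 3-dimensional vectors for illustration only; a real system would
# load vectors from a trained word2vec model.
EMBEDDINGS = {
    "forgot": [0.8, 0.2, 0.0],
    "password": [0.9, 0.1, 0.0],
    "reset": [1.0, 0.0, 0.0],
    "login": [0.7, 0.3, 0.0],
    "image": [0.0, 0.1, 0.9],
    "upload": [0.0, 0.0, 1.0],
}


def sentence_vector(text, embeddings):
    """Average the vectors of known words; None if no word is known."""
    words = [w for w in text.lower().split() if w in embeddings]
    if not words:
        return None
    dim = len(embeddings[words[0]])
    return [sum(embeddings[w][i] for w in words) / len(words) for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, candidates, embeddings=EMBEDDINGS):
    """Rank earlier questions by cosine similarity to the query, best first."""
    q = sentence_vector(query, embeddings)
    if q is None:
        return []
    scored = []
    for text in candidates:
        v = sentence_vector(text, embeddings)
        if v is not None:
            scored.append((cosine(q, v), text))
    return sorted(scored, reverse=True)
```

Here "Forgot my password" ranks "How to reset password" above "Image upload fails", even though the two password posts share only one word.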


At the moment, I’m thinking that Neo4j might be the way to go…

I’ll also look into webhooks…

@swamidass Sounds like an interesting project! I’d be happy to hop in and help if it’s open-source.


@samamorgan, could you build some interface code to import Discourse data into Neo4j using the API? It turns out this can be done without any Python code: Neo4j has an interface for it. Look at the Twitter and Stack Exchange examples:
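Whichever import route is used, a graph model has to be chosen. One possibility, purely as an assumption: `Topic` and `Tag` nodes joined by `TAGGED` relationships. The sketch below flattens topic records into rows for a parameterized Cypher `UNWIND` query; if the import ends up going through Python after all, the query and rows could be handed to `session.run` in the official `neo4j` driver (not used here, so nothing needs installing). The labels, relationship type, and `to_rows` are hypothetical.

```python
# Hypothetical graph model: (:Topic)-[:TAGGED]->(:Tag)
IMPORT_QUERY = """
UNWIND $rows AS row
MERGE (t:Topic {id: row.topic_id})
SET t.title = row.title
WITH t, row
UNWIND row.tags AS tag_name
MERGE (g:Tag {name: tag_name})
MERGE (t)-[:TAGGED]->(g)
"""


def to_rows(topics):
    """Flatten (id, title, tags) records into parameter rows for IMPORT_QUERY."""
    return [
        # sorted() gives a stable, JSON-friendly list instead of a set
        {"topic_id": topic_id, "title": title, "tags": sorted(tags)}
        for topic_id, title, tags in topics
    ]
```

Using `MERGE` rather than `CREATE` keeps repeated imports idempotent, so the same topics can be re-synced without duplicating nodes.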

This is relevant too:

If you have a project in mind, start a git repository and outline the process. I’m happy to hop on and contribute as time allows if you link it here.


Thanks. Here is the git repository.

https://github.com/swamidass/discourse-machine-learning
