I’m an expert in machine learning, but not in Discourse. I’ve been using Discourse a lot, though, and really enjoying it.
One tool I think would be very powerful is a script, interacting with Discourse through the API, that could:
- Look at tag usage, and auto-tag topics based on data. For example, if “recipe” is an existing tag and some topics already carry it, a machine learning algorithm could identify additional topics that should be tagged “recipe.”
- Look at topics to propose new tags, and auto-tag the relevant topics.
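To make the first idea concrete, here is a toy sketch of what "identify additional topics that should be tagged" could mean. It is not a proposal for the actual model: it just compares raw word counts against the topics that already carry a tag, using only the Python standard library. The function names, the `threshold` value, and the `"id"`/`"text"`/`"tags"` dictionary keys are all made up for illustration, and a real version would presumably use a proper classifier and text features.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_for_tag(tag, topics, threshold=0.3):
    """Return ids of topics lacking `tag` that resemble topics carrying it.

    `topics` is a list of dicts with hypothetical keys "id", "text", "tags".
    The threshold is an arbitrary illustrative cut-off, not a tuned value.
    """
    # Pool the word counts of every topic that already has the tag.
    centroid = Counter()
    for topic in topics:
        if tag in topic["tags"]:
            centroid.update(tokenize(topic["text"]))

    suggestions = []
    for topic in topics:
        if tag in topic["tags"]:
            continue  # already tagged, nothing to suggest
        score = cosine(Counter(tokenize(topic["text"])), centroid)
        if score >= threshold:
            suggestions.append(topic["id"])
    return suggestions


topics = [
    {"id": 1, "text": "Recipe for sourdough bread with flour and water", "tags": ["recipe"]},
    {"id": 2, "text": "My pancake recipe: flour, eggs, milk", "tags": ["recipe"]},
    {"id": 3, "text": "Banana bread with flour, eggs and sugar", "tags": []},
    {"id": 4, "text": "How do I reset my forum password?", "tags": []},
]
print(suggest_for_tag("recipe", topics))  # prints [3]
```

Because this uses raw counts, common words like "with" and "and" inflate the similarity; stop-word removal or TF-IDF weighting would be the obvious first improvement.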
I think the right implementation of this would be in Python, outside the Discourse codebase itself. With good software design, the NLP and ML code would be kept separate from the code that talks to Discourse. Interacting with Discourse only through the API would allow arbitrary ML code to be developed, and an open-source Python package could grow with contributions from the ML community, essentially independently of the Discourse distribution. A solid interface might also enable applying ML to Discourse forum management in other ways.
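Here is a rough sketch of the separation I have in mind. Everything in it is hypothetical: `ForumClient`, `Topic`, `run_autotagger`, and the method names are invented for illustration and are not part of Discourse or any existing library. The point is only that the ML side sees a small interface, and the part that actually calls the Discourse API (which I don't know well enough to write) would sit behind it. An in-memory stand-in takes its place here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Protocol


@dataclass
class Topic:
    id: int
    text: str
    tags: List[str] = field(default_factory=list)


class ForumClient(Protocol):
    """Everything the ML side needs from the forum (hypothetical interface)."""

    def list_topics(self) -> Iterable[Topic]: ...
    def add_tags(self, topic_id: int, tags: List[str]) -> None: ...


# A tagger is any function mapping topics to {topic_id: [proposed tags]}.
Tagger = Callable[[List[Topic]], Dict[int, List[str]]]


def run_autotagger(client: ForumClient, tagger: Tagger) -> Dict[int, List[str]]:
    """Fetch topics, ask the tagger for proposals, and write them back."""
    topics = list(client.list_topics())
    proposals = tagger(topics)
    for topic_id, tags in proposals.items():
        if tags:
            client.add_tags(topic_id, tags)
    return proposals


class InMemoryClient:
    """Stand-in for a real Discourse API client; no network calls are made."""

    def __init__(self, topics: List[Topic]) -> None:
        self._topics = {t.id: t for t in topics}

    def list_topics(self) -> Iterable[Topic]:
        return list(self._topics.values())

    def add_tags(self, topic_id: int, tags: List[str]) -> None:
        topic = self._topics[topic_id]
        topic.tags.extend(t for t in tags if t not in topic.tags)


def keyword_tagger(topics: List[Topic]) -> Dict[int, List[str]]:
    """Trivial placeholder model: tag anything mentioning 'flour' as 'recipe'."""
    return {
        t.id: ["recipe"]
        for t in topics
        if "flour" in t.text.lower() and "recipe" not in t.tags
    }


client = InMemoryClient([
    Topic(1, "Banana bread with flour and eggs"),
    Topic(2, "How do I reset my password?"),
])
proposals = run_autotagger(client, keyword_tagger)
print(proposals)
```

With this shape, ML contributors would only ever write new `Tagger` functions, and a Discourse developer would only need to supply one class with the two client methods.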
So, I’m interested in taking a first pass at the ML/NLP for an auto-tagger and releasing it as an open-source library.
Are there any Discourse developers familiar with the API who would be interested in helping with the Discourse communication component of this project/library? This is the critical piece the team is currently missing, and we need someone who can do it before we can get started.
Are there any other academics/experts who would like to participate in the ML/NLP development?