A Topic Auto-Tagger with Machine Learning?

swamidass · October 31, 2020, 5:04pm

I’m an expert in machine learning, but not in Discourse. I’ve been using Discourse a lot, though, and really enjoying it.

A tool that I think would be very powerful is a script (interacting with the API) that could:

Look at tag usage, and auto-tag topics based on data. For example, if a tag being used is “recipe” and some topics are tagged “recipe”, a machine learning algorithm could identify additional posts that should be tagged “recipe.”
Look at topics to propose new tags, and auto-tag relevant topics.
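To make the first idea concrete, here is a minimal sketch of learning from existing tags. It uses a one-vs-rest multinomial naive Bayes classifier written with only the Python standard library. Naive Bayes is a stand-in baseline chosen for illustration, not a method anyone in this thread proposed, and `TagSuggester`, `tokenize`, and the 0.8 threshold are hypothetical. Each tag gets its own "has tag / lacks tag" model, and a tag is suggested when its probability for a topic clears the threshold.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a real system would also strip quotes, code, stop words, etc.
    return re.findall(r"[a-z0-9']+", text.lower())


class TagSuggester:
    """Hypothetical one-vs-rest multinomial naive Bayes over topic text."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, topics):
        # topics: list of (text, set_of_tags) taken from already-tagged topics
        self.tags = sorted({t for _, tags in topics for t in tags})
        self.vocab = set()
        # For each tag: word counts in topics WITH the tag, and WITHOUT it
        self.counts = {tag: (Counter(), Counter()) for tag in self.tags}
        self.n_docs = {tag: [0, 0] for tag in self.tags}
        for text, tags in topics:
            words = tokenize(text)
            self.vocab.update(words)
            for tag in self.tags:
                idx = 0 if tag in tags else 1
                self.counts[tag][idx].update(words)
                self.n_docs[tag][idx] += 1
        return self

    def _log_likelihood(self, words, counter):
        denom = sum(counter.values()) + self.alpha * len(self.vocab)
        # Words never seen in training are ignored
        return sum(math.log((counter[w] + self.alpha) / denom)
                   for w in words if w in self.vocab)

    def probability(self, text, tag):
        """P(tag | text) under the per-tag two-class model."""
        words = tokenize(text)
        pos, neg = self.counts[tag]
        n_pos, n_neg = self.n_docs[tag]
        total = n_pos + n_neg + 2
        lp = math.log((n_pos + 1) / total) + self._log_likelihood(words, pos)
        ln = math.log((n_neg + 1) / total) + self._log_likelihood(words, neg)
        m = max(lp, ln)  # subtract the max before exponentiating for stability
        return math.exp(lp - m) / (math.exp(lp - m) + math.exp(ln - m))

    def suggest(self, text, existing=(), threshold=0.8):
        return [tag for tag in self.tags
                if tag not in existing and self.probability(text, tag) >= threshold]


training = [
    ("Easy pasta recipe with garlic and olive oil", {"recipe"}),
    ("My favourite bread recipe: flour, yeast, water", {"recipe"}),
    ("How do I configure nginx for my forum", {"admin"}),
    ("Backup settings for the admin panel", {"admin"}),
]
model = TagSuggester().fit(training)
print(model.suggest("A quick garlic bread recipe"))
```

Each tag is scored independently, so a topic can receive several tags or none. A real deployment would need far more training data, and probably a human review step before any tags are written back to the forum.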

I think the right implementation would be in Python, running separately from the Discourse codebase itself. With good software design, the NLP and ML code would be kept apart from the code that talks to Discourse. Interacting with Discourse only through the API would let arbitrary ML code be developed, and would allow an open-source Python package to grow with contributions from the ML community, essentially independently of the Discourse distribution. A solid interface might also make it possible to apply ML to Discourse forum management in other ways.
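One way to keep the ML code segregated from the Discourse code is a narrow interface that the tagger talks to. The sketch below assumes a hypothetical `ForumBackend` protocol with just two methods. `DiscourseBackend` is an untested adapter that authenticates with the `Api-Key`/`Api-Username` headers; the exact endpoints and payloads it uses (`/latest.json`, and `PUT /t/-/{id}.json` with a `tags` list) are assumptions to check against the Discourse API documentation. `InMemoryBackend` lets the ML side be developed and tested without a live forum.

```python
import json
import urllib.request
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Protocol, Set


@dataclass
class Topic:
    id: int
    text: str
    tags: Set[str] = field(default_factory=set)


class ForumBackend(Protocol):
    """Hypothetical boundary: everything Discourse-specific sits behind these methods."""

    def fetch_topics(self) -> Iterable[Topic]: ...

    def set_tags(self, topic_id: int, tags: Set[str]) -> None: ...


# The ML side is just a function: (topic text, current tags) -> suggested tags.
Suggester = Callable[[str, Set[str]], List[str]]


def run_autotagger(backend: ForumBackend, suggest: Suggester) -> dict:
    """Add suggested tags to each topic; return {topic_id: added_tags} for review."""
    changes = {}
    for topic in backend.fetch_topics():
        added = set(suggest(topic.text, topic.tags)) - topic.tags
        if added:
            backend.set_tags(topic.id, topic.tags | added)
            changes[topic.id] = added
    return changes


class InMemoryBackend:
    """Test double standing in for a real forum."""

    def __init__(self, topics):
        self.topics = {t.id: t for t in topics}

    def fetch_topics(self):
        return list(self.topics.values())

    def set_tags(self, topic_id, tags):
        self.topics[topic_id].tags = set(tags)


class DiscourseBackend:
    """Untested sketch; endpoints and payload shapes are assumptions."""

    def __init__(self, base_url, api_key, api_username):
        self.base_url = base_url.rstrip("/")
        self.headers = {"Api-Key": api_key, "Api-Username": api_username,
                        "Content-Type": "application/json"}

    def _request(self, method, path, body=None):
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(self.base_url + path, data=data,
                                     headers=self.headers, method=method)
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    def fetch_topics(self):
        # Assumption: /latest.json gives titles and tags only; full post text needs /t/{id}.json
        listing = self._request("GET", "/latest.json")
        for t in listing["topic_list"]["topics"]:
            tags = {tag if isinstance(tag, str) else tag["name"] for tag in t.get("tags", [])}
            yield Topic(t["id"], t["title"], tags)

    def set_tags(self, topic_id, tags):
        # Assumption: this endpoint accepts a "tags" list for the topic
        self._request("PUT", f"/t/-/{topic_id}.json", {"tags": sorted(tags)})
```

Because `run_autotagger` only sees the protocol, the same ML code can run against the in-memory double in tests and against a real forum in production.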

So, I’m interested in developing an initial version of the ML/NLP side of an auto-tagger and releasing it as an open-source library.

Are there any Discourse developers familiar with the API who would be interested in helping with the Discourse communication component of this project/library? That is the critical piece our team is missing, and we need someone to handle it before we can get started.

Are there any other academics/experts who would like to participate in the ML/NLP development?

swamidass · October 31, 2020, 5:09pm

Perhaps @samamorgan or @black have some suggestions or would participate?

https://github.com/samamorgan/discourse

sam · November 2, 2020, 4:36am

I would recommend leaning on webhooks here: you would consume the webhook from your Python app and react to it using our API.
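A webhook consumer along these lines could be a small HTTP server. This sketch uses only the standard library. It assumes the webhook is configured with a shared secret (the hypothetical `WEBHOOK_SECRET`), that Discourse signs each payload in an `X-Discourse-Event-Signature: sha256=<hex HMAC>` header, and that the event name arrives in `X-Discourse-Event`. The event names and the `{"topic": {...}}` payload shape are assumptions to confirm against a real delivery.

```python
import hashlib
import hmac
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

WEBHOOK_SECRET = b"change-me"  # hypothetical: the secret set on the Discourse webhook


def signature_ok(body: bytes, header, secret: bytes = WEBHOOK_SECRET) -> bool:
    """Check a 'sha256=<hex digest>' signature header in constant time."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), (header or "").encode())


def handle_event(event: str, payload: dict):
    """Return the id of a topic that should be (re)tagged, or None."""
    if event in ("topic_created", "topic_edited") and "topic" in payload:
        return payload["topic"]["id"]
    return None


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        if not signature_ok(body, self.headers.get("X-Discourse-Event-Signature")):
            self.send_response(403)  # reject unsigned or forged requests
            self.end_headers()
            return
        topic_id = handle_event(self.headers.get("X-Discourse-Event", ""), json.loads(body))
        if topic_id is not None:
            # Here the app would fetch the topic via the API, run the tagger, and write tags back.
            self.log_message("topic to tag: %s", topic_id)
        self.send_response(200)
        self.end_headers()


def serve(port=8000):
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. Verifying the signature matters because otherwise anyone who finds the endpoint could trigger tag writes.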

Curious to see how well it goes.

Another area that may be interesting is using word2vec or some sort of sentence-to-vector model to measure topic similarity in support … many things are asked many times in many different ways, and gluing that information together can be very beneficial.
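The similarity idea reduces to "turn each topic into a vector, then compare vectors with cosine similarity." The sketch below uses TF-IDF bag-of-words vectors instead of word2vec, purely so it runs with the standard library; that is a swapped-in and much weaker representation. The names `tfidf_vectors` and `most_similar` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts); a simple stand-in for learned sentence embeddings."""
    counts = [Counter(tokenize(d)) for d in docs]
    df = Counter(word for c in counts for word in c)  # number of docs containing each word
    n = len(docs)
    # Smoothed IDF so that words appearing everywhere still get a small positive weight
    return [{w: tf * (math.log((1 + n) / (1 + df[w])) + 1) for w, tf in c.items()}
            for c in counts]


def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(i, vectors, k=3):
    """(index, score) pairs for the k topics most similar to topic i, excluding itself."""
    scores = [(j, cosine(vectors[i], v)) for j, v in enumerate(vectors) if j != i]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]


docs = ["How do I reset my password?",
        "Password reset email never arrives",
        "Best sourdough bread recipe"]
vecs = tfidf_vectors(docs)
print(most_similar(0, vecs, k=1))
```

Swapping in averaged word2vec vectors or a sentence-embedding model would only change `tfidf_vectors`; the cosine and nearest-neighbor steps stay the same.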

swamidass · November 2, 2020, 10:37am

At the moment, I’m thinking that Neo4j might be the way to go…

I’ll also look into webhooks…

samamorgan · November 2, 2020, 6:09pm

@swamidass Sounds like an interesting project! I’d be happy to hop in and help if it’s open-source.

swamidass · November 3, 2020, 5:08pm

@samamorgan, can you build some interface code to import Discourse data into Neo4j (using the API)? It turns out that this can be done without any Python code, since Neo4j has an interface for it. Look at the Twitter and Stack Exchange examples:

This is relevant too:
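For comparison with the no-code route, a small Python import could flatten a topic listing into rows and send them to Neo4j as one parameterized Cypher `UNWIND` query. This sketch assumes the `topic_list.topics` structure of Discourse's `/latest.json` (tags may arrive as plain strings or as objects with a `name`, depending on version, so both are handled) and a `(:Topic)-[:TAGGED]->(:Tag)` graph model that is an illustrative choice, not an established schema. `load_into_neo4j` assumes the official `neo4j` Python driver (5.x) and is not called here.

```python
# Hypothetical graph model: one Topic node per topic, one Tag node per tag name.
IMPORT_QUERY = """
UNWIND $rows AS row
MERGE (t:Topic {id: row.id})
SET t.title = row.title
WITH t, row
UNWIND row.tags AS tagName
MERGE (g:Tag {name: tagName})
MERGE (t)-[:TAGGED]->(g)
"""


def topics_to_rows(latest_json: dict) -> list:
    """Flatten a /latest.json-style payload into rows for IMPORT_QUERY."""
    rows = []
    for t in latest_json.get("topic_list", {}).get("topics", []):
        tags = [tag if isinstance(tag, str) else tag["name"] for tag in t.get("tags", [])]
        rows.append({"id": t["id"], "title": t["title"], "tags": tags})
    return rows


def load_into_neo4j(uri, auth, rows):
    # Assumes the official neo4j driver, 5.x, where Driver.execute_query is available.
    from neo4j import GraphDatabase
    with GraphDatabase.driver(uri, auth=auth) as driver:
        driver.execute_query(IMPORT_QUERY, rows=rows)
```

Using `MERGE` keeps the import idempotent, so re-running it on a fresh listing updates titles and adds new tag edges without duplicating nodes.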

samamorgan · November 3, 2020, 6:29pm

If you have a project in mind, start a Git repository and outline the process. I’m happy to hop on and contribute as time allows if you link it here.

swamidass · November 4, 2020, 6:17pm

Thanks. Here is the Git repository.

https://github.com/swamidass/discourse-machine-learning

Related topics

Topic	Category / tags	Replies	Views	Activity
How to add machine-generated tags to a post?	Feature	2	837	June 28, 2020
Topic auto tagging	Feature	27	7923	September 28, 2021
Automated tagging when topic is created	Support, tags	2	60	March 5, 2025
Using AI To Tag And Categorize Forum Posts	Support, ai	5	144	May 19, 2025
Feature suggestion: "add tag" popup should suggest likely tags based on topic content	Feature	3	720	January 19, 2021