Word cloud plugin for Discourse?

Is there a word cloud plugin for Discourse?

Carl

There is not… is there a specific reason you’d like one? How would it be used?

It would be cool in two ways. One, a word cloud I could click on: clicking a word like “subscriber” would bring up all the topics that match it.

Two, you could display other types of searches like this, or top posters, or whatever you want.

It could probably be something that runs as a cron job once a day, or more often.
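
For the first idea, a clickable word could simply link to the forum’s existing full-page search. A minimal Python sketch, assuming the standard Discourse `/search?q=` URL and a hypothetical `search_url` helper (this is not part of any existing plugin):

```python
from urllib.parse import urlencode


def search_url(base_url: str, word: str) -> str:
    """Return a link to the forum's full-page search for `word`."""
    # Drop a trailing slash so we never produce '//search'.
    # urlencode percent-encodes the query and turns spaces into '+'.
    return f"{base_url.rstrip('/')}/search?{urlencode({'q': word})}"


if __name__ == "__main__":
    print(search_url("https://forum.example.com/", "subscriber"))
```

Each word in the cloud would then just be wrapped in a link to that URL, so no extra server-side query is needed for the click-through.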

I thought this was a fun idea :game_die: … so I created it*

It’s at a very early ‘just working’ stage and needs a lot of refinement and additional options and potentially some click functionality:

https://github.com/merefield/discourse-word-cloud

It adds a link to your Hamburger Menu. :tada:

:warning: Be aware that it currently builds the word stats from all Posts, regardless of type and location. This could effectively act as a very round-the-houses, mild privacy leak (it might need some additional safeguards to exclude words from posts in private areas). You have to be logged in to see it and access the data, though, the words are rendered as SVGs, and it only shows the top x hundred words, so this is unlikely to be much of a concern for most sites. I’ll work on making it more secure, but this way the query runs very fast.
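
A rough sketch of the kind of counting described here, written in Python rather than whatever Ruby/SQL the plugin actually uses. It assumes posts arrive as mappings with hypothetical `raw` and `private` keys; the `private` check illustrates one possible safeguard for the privacy concern, not something the plugin currently does:

```python
import re
from collections import Counter
from typing import Iterable, List, Mapping, Tuple

# Lowercase letters plus apostrophes, so "I've" is captured as one token.
WORD_RE = re.compile(r"[a-z']+")


def top_words(
    posts: Iterable[Mapping],
    limit: int = 300,
    include_private: bool = False,
) -> List[Tuple[str, int]]:
    """Count words across posts and return the `limit` most common.

    Each post is assumed to be a mapping with hypothetical 'raw' (text)
    and 'private' (bool) keys.
    """
    counts: Counter = Counter()
    for post in posts:
        # Possible safeguard: leave posts from private areas out of the stats.
        if post.get("private", False) and not include_private:
            continue
        for token in WORD_RE.findall(post["raw"].lower()):
            word = token.replace("'", "")  # "i've" -> "ive"
            if word:
                counts[word] += 1
    return counts.most_common(limit)
```

Capping the result with `limit` mirrors the “top x hundred words” idea: only the most frequent words ever leave the counting step.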

Enjoy. :man_dancing:

*It leverages some pretty nifty existing libraries which I’ve credited in the repo. Shout out to @DiscourseMetrics whose query I leveraged.

Very cool. I think you would also want to exclude certain words from the word cloud?

Sure, it needs a whole load of sensible exclusions and the regexes need work to get rid of markdown formatting etc. whilst not making it overly complicated. This is just a start. I’ve just added some colour.
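
As an illustration of the regex approach, here is a minimal Python sketch that strips common markdown before counting. The rules and their order are assumptions for illustration, not the plugin’s actual regexes:

```python
import re
from typing import List, Pattern, Tuple

# Each (pattern, replacement) pair is applied in order.
MARKDOWN_RULES: List[Tuple[Pattern[str], str]] = [
    (re.compile(r"```.*?```", re.DOTALL), " "),     # fenced code blocks
    (re.compile(r"`[^`]*`"), " "),                  # inline code
    (re.compile(r"!\[[^\]]*\]\([^)]*\)"), " "),     # images (alt text dropped)
    (re.compile(r"\[([^\]]*)\]\([^)]*\)"), r"\1"),  # links: keep the link text
    (re.compile(r"https?://\S+"), " "),             # bare URLs
    (re.compile(r"<[^>]+>"), " "),                  # HTML tags
    (re.compile(r"[*_#>~]+"), " "),                 # emphasis, headings, blockquotes
]


def strip_markdown(raw: str) -> str:
    """Remove common markdown/HTML formatting, leaving plain words."""
    text = raw
    for pattern, replacement in MARKDOWN_RULES:
        text = pattern.sub(replacement, text)
    return " ".join(text.split())  # collapse runs of whitespace
```

Ordering matters: images have to be removed before the general link rule, otherwise their alt text would survive as `!alt`. Underscores are treated as emphasis markers, so `snake_case` words get split, which is probably acceptable for a word cloud.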

Just to be clear though, it’s awesome lol

Added a localised list of ignore words:

https://github.com/merefield/discourse-word-cloud/commit/066529ed048b004d6f9c6859697cde6aed24a9fd

which should make results a little more interesting …

I’ve also added a lot of sanitising logic, so the result is much better.

Nice! :heart_eyes_cat: I like this effort. Nice job. If I could request features:

  • make the hamburger menu link optional (I like the idea of this being an easter egg)
  • create category setting, to only include selected categories
  • provide a category route so you can generate a word cloud of just one category and sub-categories, e.g. /wordcloud/category
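
The category requests could be approached by expanding the selected categories to include their sub-categories and then filtering posts. A Python sketch, assuming a hypothetical `parent_of` mapping (category id to parent id) and posts with a hypothetical `category_id` key:

```python
from typing import Dict, Iterable, List, Mapping, Optional, Set


def with_subcategories(
    selected: Iterable[int], parent_of: Dict[int, Optional[int]]
) -> Set[int]:
    """Expand selected category ids to include all of their descendants.

    `parent_of` maps each category id to its parent id (None at top level).
    """
    result = set(selected)
    changed = True
    # Keep sweeping until no new child is found; this handles any depth.
    while changed:
        changed = False
        for category, parent in parent_of.items():
            if parent is not None and parent in result and category not in result:
                result.add(category)
                changed = True
    return result


def filter_posts(posts: Iterable[Mapping], allowed: Set[int]) -> List[Mapping]:
    """Keep only posts whose 'category_id' is in `allowed`."""
    return [post for post in posts if post["category_id"] in allowed]
```

The same expansion would serve both requests: a site setting could supply the `selected` ids, or a route like `/wordcloud/category` could pass a single one.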

Here’s how it looks on my neighborhood forum.

Works well, just need to fine-tune it:

Great feedback, thanks, and some good ideas!

Yes, that sounds like a good approach. I’m 3 metres deep in client work atm, but will look at category selection for the next update.

NB: Word stats are now updated every hour (which is probably still excessive, but for the time being makes it easier to check out changes in Production as we go through a lot of initial code evolution).

NB#2: I’ve not yet considered languages other than English (it’s certainly not tested). The current word manipulation may not work well in some languages. Suggestions & PRs welcome.

Cool! Here’s an updated wordle just including the most relevant categories.

Mine is a small community and still fairly new. To be honest, though, the info presented in the wordle looks pretty but is not especially meaningful or useful. I guess it could be used as a visual in a retrospective topic about the community or something along those lines. Would be fun to see more examples of how people use this.

Some of the included words are common and meaningless, e.g. youd, off, got, add etc. I wonder if the “word cloud ignore portion” setting (which is 100 for me, the default) is doing its job? Or maybe there is another/better list of words to ignore?

Yeah, happy to consider a larger list (I’d found a 200-word list here, but deferred to Wikipedia as a more ‘authoritative source’).

OK, I’ve:

  • expanded the ignore list to 300 words, using a list I found here
  • enhanced the regexes to strip out quotes (so the word ‘quote’ didn’t get featured so much!)
  • removed the arbitrary cull of the top ten remaining words which was redundant after adding the ignore list.
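
A Python sketch of quote stripping and ignore-list filtering combined. It assumes Discourse’s raw `[quote="..."]...[/quote]` markup and uses a tiny, illustrative ignore list rather than the real 300-word one. Because apostrophes are removed before the ignore check, “I’ve” becomes `ive`, which is why a sketch like this needs `ive` rather than `i've` in its list:

```python
import re
from collections import Counter
from typing import Iterable, Set

# Raw Discourse quotes look like [quote="user, post:N, topic:M"] ... [/quote].
# Non-greedy matching handles consecutive quotes, but not nested ones.
QUOTE_RE = re.compile(r"\[quote[^\]]*\].*?\[/quote\]", re.DOTALL | re.IGNORECASE)
WORD_RE = re.compile(r"[a-z']+")

# Illustrative only: a few words from this thread plus some common stop words.
IGNORE_WORDS: Set[str] = {"ive", "its", "topic", "post", "the", "and", "a", "to", "of"}


def count_words(raws: Iterable[str], ignore: Set[str] = IGNORE_WORDS) -> Counter:
    """Count words in raw posts, skipping quoted text and ignored words."""
    counts: Counter = Counter()
    for raw in raws:
        text = QUOTE_RE.sub(" ", raw).lower()  # drop quoted material entirely
        for token in WORD_RE.findall(text):
            word = token.replace("'", "")  # normalise "i've" -> "ive"
            if word and word not in ignore:
                counts[word] += 1
    return counts
```

Stripping whole quote blocks, rather than just the `quote` tag, also stops quoted text from being counted twice.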

NB if there are still words you want to exclude, just add them to the beginning of:

like I’ve done here (e.g. ‘ive’, ‘its’, ‘topic’, ‘post’)

To see the impact of any changes more quickly, simply re-trigger the job from Sidekiq:

That’s it for a while, I suspect. I may create a dedicated topic.

OK, you might like this:

https://github.com/merefield/discourse-word-cloud/commit/84770618bec7e17457faff2b31a54aa894ee5743

Update: I’ve now simplified the ignore list arrangement, so there’s no longer a setting for the ‘portion’ of the ignore list employed; you simply add or delete words in the ignore list using the native localised setting:

https://github.com/merefield/discourse-word-cloud/commit/074e0902269e752c11c3c29018f8c68c813327d3

Do we need to uninstall the old version to get this?

You should only need to upgrade the plugin. Having issues?

I apologize, we figured it out. :sunglasses:

No problem at all :+1: