Is there a word cloud plugin for Discourse?
Carl
There is not… Is there a specific reason you’d like one? How would it be used?
It would be cool in two ways. One, a word cloud I could click on: clicking a word like “subscriber” would bring up all the topics that match it.
Two, you could display other types of searches like this, such as top posters, or whatever you want.
It could probably be something that runs as a cron job once a day or more often.
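One way the click-through could work, assuming each word simply links to Discourse's full-page search (`/search?q=...`) rather than to any custom endpoint. `search_url` and the example forum address are hypothetical, just to show the idea:

```python
from urllib.parse import urlencode


def search_url(base_url: str, word: str) -> str:
    """Build a link to the forum's full-page search for a clicked word.

    Hypothetical helper: assumes the standard /search?q=... page is the target.
    """
    # urlencode escapes spaces and special characters in the query string.
    return f"{base_url.rstrip('/')}/search?{urlencode({'q': word})}"


print(search_url("https://forum.example.com", "subscriber"))
# → https://forum.example.com/search?q=subscriber
```

Linking to the existing search page keeps the cloud itself dumb: it only needs the word list, and Discourse's own search handles permissions for the results.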
I thought this was a fun idea … so I created it*
It’s at a very early ‘just working’ stage and needs a lot of refinement and additional options and potentially some click functionality:
https://github.com/merefield/discourse-word-cloud
It adds a link on your Hamburger Menu.
Be aware that it currently builds the word stats from all Posts, regardless of type and location. This could effectively act as a very round-the-houses, mild privacy leak (it might need some additional safeguards to exclude words from posts in private areas). You have to be logged in to see it and access the data, though; the words are rendered as SVGs; and it only shows the top x hundred words, so it’s unlikely to be much of a concern to most sites. I’ll work on making it more secure, but this way the query runs very fast.
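To make the privacy trade-off concrete, here is a rough Python sketch of the counting step. The plugin itself builds its stats with a query, so this is only an illustration of the logic; `Post`, `is_private`, `word_stats` and `top_n` are hypothetical names, and the `include_private` switch stands in for the kind of safeguard mentioned above:

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical, minimal stand-in for a post; real Discourse posts carry far more.
    raw: str
    is_private: bool = False  # e.g. a PM or a restricted category (assumption)


# Letters, optionally with one apostrophe part ("it's", "don't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def word_stats(posts, top_n, include_private=False):
    """Count words across posts and return the top_n (word, count) pairs."""
    counts = Counter()
    for post in posts:
        if post.is_private and not include_private:
            continue  # safeguard: leave private content out of the stats
        counts.update(WORD_RE.findall(post.raw.lower()))
    return counts.most_common(top_n)


posts = [Post("Cloud cloud word"), Post("cloud secret", is_private=True)]
print(word_stats(posts, 10))
# → [('cloud', 2), ('word', 1)]
```

Skipping private posts costs one extra check per post; the trade-off in the plugin is that filtering by location makes the query more involved.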
Enjoy.
*It leverages some pretty nifty existing libraries which I’ve credited in the repo. Shout out to @DiscourseMetrics whose query I leveraged.
Very cool. I think you would also want to exclude certain words from the word cloud?
Sure, it needs a whole load of sensible exclusions, and the regexes need work to get rid of Markdown formatting etc. whilst not making them overly complicated. This is just a start. I’ve just added some colour.
Just to be clear, though, it’s awesome lol
Added a localised list of ignore words:
https://github.com/merefield/discourse-word-cloud/commit/066529ed048b004d6f9c6859697cde6aed24a9fd
which should make results a little more interesting …
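The effect of an ignore list is easy to see in a small sketch. `IGNORE_WORDS` below is a hypothetical excerpt of common English words, not the plugin's actual localised list, and `top_words` is an illustrative helper:

```python
from collections import Counter

# Hypothetical excerpt standing in for the plugin's localised ignore list.
IGNORE_WORDS = {"the", "and", "a", "to", "of", "is", "it", "for"}


def top_words(words, ignore=IGNORE_WORDS, top_n=10):
    """Count words case-insensitively, skipping anything on the ignore list."""
    counts = Counter(w.lower() for w in words if w.lower() not in ignore)
    return counts.most_common(top_n)


print(top_words("The cloud and the words of the cloud".split()))
# → [('cloud', 2), ('words', 1)]
```

Without the list, "the" would top this tiny example with three hits, which is exactly the kind of noise the ignore words remove.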
I’ve also added a lot of sanitising logic, so the result is much better.
Nice! I like this effort. Nice job. If I could request features:
/wordcloud/category
Here’s how it looks on my neighborhood forum.
Great feedback, thanks, and some good ideas!
Yes, that sounds like a good approach. I’m 3 metres deep in client work atm, but will look at Category selection for the next update.
Category selection is in:
https://github.com/merefield/discourse-word-cloud/commit/0777adc19516688ec651f3c1439b981dc8367ec0
If you select no Category (default) you get a scan of all forum Posts (PMs and all). If you add just one Category, word stats are restricted to that etc.
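A rough sketch of that selection rule, under the assumption (for this sketch only) that a PM can be modelled as a post with no category. `Post` and `select_posts` are hypothetical names, not the plugin's code:

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Post:
    raw: str
    category_id: Optional[int]  # None models a PM (assumption for this sketch)


def select_posts(posts: Iterable[Post], category_ids: Iterable[int]) -> List[Post]:
    """No categories selected: scan everything, PMs included.
    Otherwise keep only posts whose category is in the selection."""
    wanted = set(category_ids)
    if not wanted:
        return list(posts)
    return [p for p in posts if p.category_id in wanted]


posts = [Post("a", 1), Post("b", 2), Post("pm", None)]
print([p.raw for p in select_posts(posts, [2])])
# → ['b']
```

Note that selecting any Category automatically drops PMs, since they never match a category id; only the empty default includes them.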
So are humungous improvements to the regexes, which now clean up the ‘raws’ nicely and get rid of most, if not all, of the Markdown.
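For anyone wondering what "cleaning up the raws" involves, here is a hedged sketch of the general approach: an ordered list of regex substitutions. The rules below are hypothetical and simplified (the plugin's real regexes differ), but the ordering matters in the same way, e.g. links must be handled before bare URLs so the link text survives:

```python
import re

# Hypothetical cleanup rules, applied in order; the plugin's real regexes differ.
RULES = [
    (re.compile(r"\[quote[^\]]*\].*?\[/quote\]", re.S | re.I), " "),  # quoted replies
    (re.compile(r"```.*?```", re.S), " "),                            # fenced code
    (re.compile(r"`[^`]*`"), " "),                                    # inline code
    (re.compile(r"!\[[^\]]*\]\([^)]*\)"), " "),                       # images
    (re.compile(r"\[([^\]]*)\]\([^)]*\)"), r"\1"),                    # links: keep text
    (re.compile(r"https?://\S+"), " "),                               # bare URLs
    (re.compile(r"[^A-Za-z'\s]+"), " "),                              # markup, digits, punctuation
]


def clean_raw(raw: str) -> str:
    """Strip Markdown-ish formatting from a post's raw text, leaving lowercase words."""
    for pattern, replacement in RULES:
        raw = pattern.sub(replacement, raw)
    return " ".join(raw.lower().split())


print(clean_raw("Check **this** [guide](https://example.com) and `code`!"))
# → check this guide and
```

Dropping quoted replies avoids counting the same words twice when people quote each other.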
NB: Word stats are now updated every hour (which is probably still excessive, but for the time being it makes it easier to check out changes in Production as we go through a lot of initial code evolution).
NB#2: I’ve not yet considered languages other than English here (it’s certainly not tested). The current word manipulation may not work well in some languages. Suggestions & PRs welcome.
Cool! Here’s an updated wordle just including the most relevant categories.
Mine is a small community and still fairly new. To be honest, though, the info presented in the wordle looks pretty but is not especially meaningful or useful. I guess it could be used as a visual in a retrospective topic about the community or something along those lines. Would be fun to see more examples of how people use this.
Some of the included words are common and meaningless, e.g. youd, off, got, add etc. I wonder if the “word cloud ignore portion” setting (which is 100 for me, the default) is doing its job? Or maybe there is another/better list of words to ignore?
Yeah, happy to consider a larger list (I’d found a 200-word list here, but deferred to Wikipedia as a more ‘authoritative source’).
OK, I’ve:
NB: if there are still words you want to exclude, just add them to the beginning of:
like I’ve done here (e.g. ‘ive’, ‘its’, ‘topic’, ‘post’).
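Why the beginning? A plausible reading, and it is an assumption, not something confirmed in the code, is that the "ignore portion" setting applied only the first N entries of the list, so words appended at the end of a long list would never be used. `active_ignore_words` is a hypothetical helper modelling that:

```python
from typing import List, Optional, Set


def active_ignore_words(ignore_list: List[str], portion: Optional[int] = None) -> Set[str]:
    """Return the ignore words actually applied.

    Assumption: "portion" means "use only the first `portion` entries".
    portion=None applies the whole list (no portion setting at all).
    """
    words = ignore_list if portion is None else ignore_list[:portion]
    return {w.strip().lower() for w in words if w.strip()}


custom = ["ive", "its", "topic", "post", "the", "of", "and"]
print(sorted(active_ignore_words(custom, 5)))
# → ['ive', 'its', 'post', 'the', 'topic']
```

With a portion of 5, "and" falls outside the applied slice, while the custom words at the front are always included.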
To see the impact of any changes more quickly, simply re-trigger the job from Sidekiq:
That’s it for a while I suggest. I may create a dedicated Topic.
OK, you might like this:
https://github.com/merefield/discourse-word-cloud/commit/84770618bec7e17457faff2b31a54aa894ee5743
Update: I’ve now simplified the ignore list arrangement so there’s no longer a setting for ‘portion’ of ignore list employed, you simply have to delete or add words to the ignore list using the native localised setting:
https://github.com/merefield/discourse-word-cloud/commit/074e0902269e752c11c3c29018f8c68c813327d3
Do we need to uninstall the old version to get this?
You should only need to upgrade the plugin. Having issues?
I apologize, we figured it out.
No problem at all