How to clean up the community

… to make it a healthy digital space.

Every community starts at level zero and grows until the team of administrators feels comfortable … and begins to ask why some discussion topics, published years ago, no longer receive visits. Should we keep that topic? Archive it? Delete it?

If we archive it, search engine spiders can keep indexing it, so it could still bring new readers to the forum. But … is it really useful? Or would it be better to delete it?

On the other hand, if we delete it, we free up resources and make room for those who join the community to open new, fresh discussions about content that is no longer published (or archived).

In my case, my forum is approaching 500,000 pageviews over the last 30 days, and I want to optimize the content that I show the world.

How do I optimize content? How do I properly clean up old content?

Some tasks that are being carried out right now:

  • In the tutorial and knowledge categories, new questions posted as new topics are being moved to the support category; substantive posts are left in the appropriate section, with a topic timer set so that new replies are automatically removed.

  • I have changed the settings on some categories so that search engine bots cannot index their content, which is visible only to registered users.

What else could be done?

Any ideas are appreciated


Good question

It would be useful to understand what kind of community you have. My thoughts differ between support communities and communities of practice (CoPs), for example.


Thanks for your reply, Sarah. I know you are a consultant in digital communities, so your experience in the field would be a great help. It would be great to hear your opinion on both types of communities.

In my case, I have worked as an ERP consultant since 2008, specifically on the SAP system, so I have been providing information and support for many years to companies and end users who, like me, once started from scratch with the system. Over time we have developed something of an addiction to the consulting profession.

I usually manage support communities, and this project is the largest I have. I have used other systems, and I can attest that Discourse has exceeded all my expectations. That is why I want to enhance it and keep gathering ideas for pruning and cleaning the content, to offer my readers a quality space and fresh information.


Cool. In the case of support communities it’s reasonably cut and dried, IMO. Delete content that is no longer up to date or relevant.

I would assume that any deprecated advice has been replaced by an updated topic so people searching will still find a result.


Well, right now I’m fighting with these “bots” …

OMG, why so many pageviews from bots!

I’ve already read this: How to block all crawlers but Google's
But … I’m in trouble here … is it OK to block “all” crawlers (except Googlebot)?
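If I understood that topic correctly, the idea would be a robots.txt roughly like this (just a sketch, not tested on my site; robots.txt is only advisory, and well-behaved crawlers obey the most specific User-agent group that matches them, so Googlebot would follow its own group while everything else falls under `*`):

```txt
# Allow only Googlebot; block every other crawler.
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
```

Of course, abusive bots can simply ignore robots.txt, which is part of why I’m unsure this is enough.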


You can do what you like but I’d be careful here. There are other legit bots.

Why don’t you dig into the crawler report and see which ones are causing the most noise?
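If you also have access to the raw web server logs, a quick tally per user agent can complement that report. A minimal sketch in Python, assuming the combined log format (where the user agent is the last quoted field) and entirely hypothetical sample lines:

```python
import re
from collections import Counter

# Hypothetical sample lines; in practice, read these from your
# web server's access log (combined log format assumed).
LOG_LINES = [
    '1.2.3.4 - - [22/Jun/2021:10:00:00 +0000] "GET /t/1 HTTP/1.1" 200 512 "-" "SomeBot/1.0"',
    '1.2.3.4 - - [22/Jun/2021:10:00:01 +0000] "GET /t/2 HTTP/1.1" 200 512 "-" "SomeBot/1.0"',
    '5.6.7.8 - - [22/Jun/2021:10:00:02 +0000] "GET /t/3 HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
]

def count_user_agents(lines):
    """Count requests per user agent (the last quoted field on each line)."""
    counts = Counter()
    for line in lines:
        quoted = re.findall(r'"([^"]*)"', line)
        if quoted:
            counts[quoted[-1]] += 1
    return counts

# Noisiest crawlers first.
for agent, hits in count_user_agents(LOG_LINES).most_common():
    print(f"{hits:6d}  {agent}")
```

Sorting by `most_common()` puts the loudest bots at the top, which makes it easy to decide which ones are actually worth blocking.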


Yes, of course, I forgot to clarify that I am already doing this cleanup crawler by crawler: an almost weekly analysis to detect which one is hitting the site hardest, and I block them one by one.

What I’m afraid of is that adding so many bots to the blacklist will somehow affect the site’s performance.

So I thought that instead of blocking them one by one, the best approach might be to add the essential bots to the whitelist and block all the others. But … do “essential” crawlers even exist?

I searched the forum to see if there is a topic dedicated to essential crawlers, but I could not find one. If you know of a related topic, please let me know.


If your site is public and SEO has value to you, then any bot that adds your data to a useful index is “essential”. Look at your sources of traffic and compare that to the bots: is there any correlation?

A bot whitelist may be the better solution here, right?


Taking 100% as the total organic traffic over the last 30 days:

My last blocked crawler was the one responsible for the hits on June 22: more than 20,000 pageviews from that bot :angry: