Looking for a "site health"/link check feature

We just started using Discourse about 2 months ago, so we’re still learning. I’m looking for a way to test that all user-generated content still links to valid locations. I’m mostly interested in links, images, and downloads. Basically, give me a report of outbound 404s.

I’ve looked through meta, the plugins directory, and the API docs and I’m not finding exactly what I’m wanting. I’m not sure that a plugin exists or if I need to cobble something together using the API.

Does such a tool exist?

1 Like

For images we cover this out of the box by downloading hot-linked images locally, which ensures images will never break.

For links you will need a custom plugin.

4 Likes

Thanks for the confirmation. We’ll start on that.

Good to know about the images. Presumably that goes against our storage cap (hosted version).

1 Like

If you’re hosted (and not on Enterprise), then you likely can’t use a plugin. You might be better off cobbling something together with the API. You can start with a data explorer query that returns, say, the post ID and the URL. And then you could do something like have it check the URLs and maybe flag the post if the URL goes south.

1 Like

Hi Jay,

We’re hosted by CDCK and on an enterprise plan. But, I’ll look at your suggestion as well.

I try to keep link rot to a minimum in our developer forums without damaging the overall usefulness of posts. Sometimes the underlying technology is deprecated or removed, which can’t be helped.

Thanks

4 Likes

Hi Eric,

As a hosted customer you get a bit of special treatment here :hugs:

What you can do today is use a data explorer query (which is a plugin we support on both business and enterprise) to ask us about the most recent links in posts:

SELECT url, post_id, clicks
FROM topic_links
WHERE not internal
ORDER BY post_id DESC
LIMIT 100

This will list the last 100 external URLs linked in posts. You can download the result as a CSV, run a link validation tool on it, and flag problem posts.
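As a rough illustration of the "link validation tool" step, here is a minimal Python sketch that reads the exported CSV and reports URLs that look broken. It assumes the CSV has `url` and `post_id` columns, as in the query above. The names `fetch_status` and `find_broken_links` are made up for this example and are not part of Discourse or the data explorer plugin.

```python
import csv
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response was received."""
    # Assumption: a HEAD request is enough to tell whether the target exists.
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, ValueError, OSError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return None


def find_broken_links(csv_file, fetch=fetch_status):
    """Yield (post_id, url, status) for each CSV row whose URL looks broken."""
    for row in csv.DictReader(csv_file):
        status = fetch(row["url"])
        if status is None or status >= 400:
            yield row["post_id"], row["url"], status
```

To use it, open the downloaded file with `open(path, newline="")` and loop over `find_broken_links(handle)`. Some servers reject HEAD requests (for example with a 405), so anything reported here is worth a manual look before the post gets flagged.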

What’s more, we track clicks on links, so you could sort this by click count or exclude anything that was clicked zero times.
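If you would rather do that filtering on the downloaded CSV than in the query, a small Python sketch could look like the following. It assumes the click count is exported in a column called `clicks`; the column name is a parameter in case your export differs. The function name `prioritize_by_clicks` is invented for this example.

```python
import csv


def prioritize_by_clicks(csv_file, min_clicks=1, clicks_column="clicks"):
    """Return rows with at least min_clicks clicks, most-clicked first."""

    def clicks_of(row):
        # Treat a missing or empty value as zero clicks.
        return int(row.get(clicks_column) or 0)

    rows = [row for row in csv.DictReader(csv_file) if clicks_of(row) >= min_clicks]
    rows.sort(key=clicks_of, reverse=True)
    return rows
```

With the default `min_clicks=1` this drops links nobody has clicked, so the links people actually follow get validated first.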

Would that help you solve your problem?

6 Likes

Hi Sam,

I think that would work. We won’t have a huge volume of traffic, but I don’t want link rot to set in over time. Especially on links back into our corporate site(s).

1 Like