Shields.io unable to retrieve Discourse statistics API

Hi everyone,

We’ve had a Discourse shield on our repository for a while now, and it recently stopped working.

If you go to https://shields.io/category/chat and select any Discourse shield, you can enter your Discourse domain and it will show the shield with the correct statistics. You can try this with meta.discourse.org.

However, when we enter our Discourse host address (https://discuss.grakn.ai), the shield always returns “invalid”, for every statistic and for both http and https.


When a host is not found, Shields.io returns “inaccessible”. So we assume “invalid” means the host is reachable, but either access is being denied or the response is not what Shields.io expects.

Is it possible that a recent update/upgrade broke something in the Discourse statistics API that Shields.io uses?

Thank you so much!

It’s working for me on my site. Maybe you’re not setting the protocol correctly? Or the Grakn Discourse has some kind of modification that breaks that endpoint.



You might want to ask Shields.io about that problem. It works with all other sites I tested, so this isn’t our bug.


@marianord that’s exactly my question: where are the “protocols” you’re mentioning? How can they be configured? I’ve not changed any settings.

@gerhard given that Shields.io is working for other Discourse sites, it does not seem likely to be an issue on their side, unless they’re not reading the output from our site statistics properly. But how can we find out? What is the Discourse endpoint that is used to query the statistics? Perhaps we should start there?

I’m referring to http vs. https.


This happened because our Discourse installation blocked Shields.io’s user agent (Shields.io). The setting is named “whitelisted crawler user agents” and can be edited at
<discourse_server>/admin/site_settings/category/all_results?filter=crawler
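For anyone who wants to reproduce this from their own machine, here is a minimal Python sketch that requests the statistics with a chosen user agent and reports what comes back. Several things here are assumptions and not confirmed in this thread: that the shield reads `/site/statistics.json`, that the user agent string is exactly `Shields.io`, and the helper names `build_request` and `probe`, which are made up for illustration.

```python
import json
import urllib.error
import urllib.request

# Assumption: the statistics endpoint the shield queries.
STATS_PATH = "/site/statistics.json"


def build_request(host, user_agent):
    """Build a GET request for the statistics endpoint with a given user agent."""
    url = host.rstrip("/") + STATS_PATH
    headers = {"User-Agent": user_agent, "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers)


def probe(host, user_agent, timeout=10):
    """Send the request and summarise the outcome as a short string."""
    req = build_request(host, user_agent)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. a blocked crawler).
        return f"HTTP {err.code}"
    except urllib.error.URLError as err:
        # DNS failure, refused connection, TLS problem, and so on.
        return f"unreachable: {err.reason}"
    try:
        json.loads(body)
    except ValueError:
        return f"HTTP {status}, body is not JSON"
    return f"HTTP {status}, valid JSON"
```

Calling `probe("https://discuss.grakn.ai", "Shields.io")` and then the same host with a browser-like user agent such as `"Mozilla/5.0"` should show whether the server treats the two differently. If only the first one fails, the crawler settings are the likely cause.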


Interesting! Thank you @max_grakn! We did add Googlebot to the whitelist recently; I think that may be the cause.

@codinghorror are we meant to use the blacklist and whitelist at the same time? As in, if you add things to the whitelist, does that mean everything else is blacklisted (which would make the blacklist redundant)?

No, the crawler whitelist is very dangerous, and should only be used carefully per the help text:

User agents of web crawlers that should be allowed to access the site. WARNING! SETTING THIS WILL DISALLOW ALL CRAWLERS NOT LISTED HERE!
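To make the consequence of that warning concrete, here is a simplified Python model of the behaviour it describes. It is not Discourse’s actual implementation: the function name `is_crawler_allowed` is hypothetical, case-insensitive substring matching is an assumption, and so is the choice to consult the blacklist only when the whitelist is empty.

```python
def is_crawler_allowed(user_agent, whitelist, blacklist):
    """Simplified model: a non-empty whitelist disallows every unlisted crawler."""
    ua = user_agent.lower()
    if whitelist:
        # Once anything is whitelisted, only listed crawlers get through.
        return any(entry.lower() in ua for entry in whitelist)
    # Assumption: with an empty whitelist, only blacklisted crawlers are blocked.
    return not any(entry.lower() in ua for entry in blacklist)
```

Under this model, adding only Googlebot to an empty whitelist makes `is_crawler_allowed("Shields.io", ["Googlebot"], [])` false, which matches what happened above. Adding Shields.io to the same list lets it through again.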
