How to allow a specific user agent to access a private Discourse instance?


#1

I’m trying to allow Swiftype’s crawler (Swiftbot) to crawl our private Discourse instance, and they’re telling me I need to allow it authenticated access on the Discourse side. However, I can’t find any documentation on how to do that.

  • For context, Discourse is the forum for my main site. I’m trying to add a site-wide search so users can search Discourse content from anywhere on my site, which is what the Swiftype plugin does.

Is there any way to do this with Discourse running in Docker on DigitalOcean?


(Mittineague) #2

Wouldn’t this be done in your site’s robots.txt file?


#3

I wanted to try that first. Can we edit the file?


(Mittineague) #4

AFAIK there should be no need to, as long as your Security setting `allow_index_in_robots_txt` is set to `true`.

I’m wondering whether this is really a crawling issue, where Swiftbot just needs to be allowed in the robots.txt file, or a widget permissions issue.
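To see which case applies, one option is to fetch what Discourse actually serves at `/robots.txt` and check whether any rule names the crawler. Keep in mind that robots.txt is only advisory: it tells well-behaved crawlers what they may fetch, but it cannot let an anonymous crawler past a login requirement. The helper below is a hypothetical sketch, not part of Discourse, and the `FORUM_URL` in the usage line is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the robots.txt text on stdin has a
# "User-agent:" line naming the given agent (case-insensitive match).
robots_mentions_agent() {
  # $1 = user-agent token to look for, e.g. Swiftbot
  grep -qi "^user-agent:[[:space:]]*$1"
}

# Usage against a live forum (requires network; FORUM_URL is a placeholder):
# curl -s "$FORUM_URL/robots.txt" | robots_mentions_agent Swiftbot && echo "Swiftbot is listed"
```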


#5

In this case, our Discourse instance is private and not indexed by search engines at all, which is why we need to make an exception somewhere for this specific user agent, Swiftbot.


(Mittineague) #6

I think this would need to be a plugin.

It might be easier to put the Discourse Search feature into your non-Discourse pages.


(Jeff Atwood) #7

I don’t think we’ve ever encountered this situation before. You would need to authenticate the crawler as a regular user, probably through the Discourse API. Do we have any examples of this @techapj?


(Arpit Jalan) #8

Yes we do!


To crawl the topics via the Discourse API, just provide your API key and username, and the requests will be authenticated as a regular user. For example:

To do the same with a cURL request, pass `api_key` and `api_username` as parameters. For example:

```
# -G sends the -d pairs as a query string on a GET request
curl -G -d api_key="API_KEY" -d api_username="ADMIN_USERNAME" http://discourse.example.com/latest.json
```

Both of the above examples will fetch the latest topics (even on a private Discourse instance).

@Joey_Tuan, for more detailed API documentation, please see this topic:

https://meta.discourse.org/t/discourse-api-documentation/22706?u=techapj
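Building on the cURL example above, a crawler could walk the latest-topics list page by page. This is a minimal sketch under several assumptions: that `/latest.json` accepts a `page` query parameter starting at 0, that topics appear under `topic_list.topics` in the response, that `jq` is installed, and that `DISCOURSE_API_KEY` / `DISCOURSE_API_USERNAME` are hypothetical environment variables holding a dedicated low-privilege account’s credentials.

```shell
#!/usr/bin/env bash
# Minimal crawl sketch (assumed: `page` parameter, topic_list.topics field, jq).

# Build the URL for page N of the latest-topics list, dropping a trailing slash.
latest_page_url() {
  local base="${1%/}" page="$2"
  printf '%s/latest.json?page=%s\n' "$base" "$page"
}

# Fetch one page; curl -G appends the -d pairs to the URL's existing query string.
fetch_latest_page() {
  curl -sf -G \
    -d api_key="$DISCOURSE_API_KEY" \
    -d api_username="$DISCOURSE_API_USERNAME" \
    "$(latest_page_url "$1" "$2")"
}

# Print topic IDs page by page, stopping at the first page with no topics.
crawl_topic_ids() {
  local base="$1" page=0 body count
  while :; do
    body="$(fetch_latest_page "$base" "$page")" || return 1
    count="$(jq '.topic_list.topics | length' <<<"$body")"
    [ "$count" -gt 0 ] || break
    jq -r '.topic_list.topics[].id' <<<"$body"
    page=$((page + 1))
  done
}
```

To run it, export the two variables (for example with a hypothetical crawler account) and call `crawl_topic_ids http://discourse.example.com`. Each ID could then be fetched individually if the crawler needs full topic content.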

(Jeff Atwood) #9

Do NOT use an admin’s key, though. That would be incredibly dangerous! Just use the key of a trust level 0 user. If the key leaks, anyone can use it.
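Following that advice, a crawler script can read a low-privilege account’s credentials from the environment instead of hard-coding them, and refuse to run if any are missing. This sketch assumes hypothetical `DISCOURSE_URL`, `DISCOURSE_API_KEY`, and `DISCOURSE_API_USERNAME` variables, and a Discourse version that accepts `Api-Key` and `Api-Username` request headers, which newer releases expect in place of the query parameters shown earlier.

```shell
#!/usr/bin/env bash
# Sketch: keep the crawler's key out of scripts and out of request URLs.

# Fail early, naming the first required variable that is unset or empty.
require_crawler_env() {
  local name
  for name in DISCOURSE_URL DISCOURSE_API_KEY DISCOURSE_API_USERNAME; do
    if [ -z "${!name:-}" ]; then
      echo "missing $name" >&2
      return 1
    fi
  done
}

# Fetch the latest topics, sending credentials as headers rather than query params.
fetch_latest_with_headers() {
  require_crawler_env || return 1
  curl -sf \
    -H "Api-Key: $DISCOURSE_API_KEY" \
    -H "Api-Username: $DISCOURSE_API_USERNAME" \
    "${DISCOURSE_URL%/}/latest.json"
}
```

Sending the key in a header keeps it out of server access logs that record URLs, though it is still visible to anyone who can read the script’s environment.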