How to allow user-agent access to private discourse?

Joey_Tuan · November 13, 2015, 11:49pm

I’m trying to allow Swiftype’s crawler (Swiftbot) to crawl our private Discourse instance, and they’re telling me I need to allow authenticated access on Discourse itself. However, I can’t find any documentation for this.

For context, Discourse is the forum for my main site, and I’m trying to add a site-wide search so users can search Discourse content from anywhere on the site. This is what the Swiftype plugin does.

Is there any way to do this within Docker on DigitalOcean?

Mittineague · November 13, 2015, 11:56pm

Wouldn’t this be done in your site’s robots.txt file?

Joey_Tuan · November 13, 2015, 11:59pm

I wanted to try that first. Can we edit the file?

Mittineague · November 14, 2015, 12:09am

AFAIK there should be no need to, as long as your Security setting is
allow_index_in_robots_txt: true

I’m wondering whether this is a crawl issue, where they need to be allowed in the robots.txt file, or a widget permissions issue.

Joey_Tuan · November 14, 2015, 12:11am

In this case, our discourse is actually private and not indexed at all by search engines, which is why we need to make an exception somewhere for this specific user-agent Swiftbot.

Mittineague · November 14, 2015, 12:29am

I think this would need to be a plugin.

It might be easier to put the Discourse Search feature into your non-Discourse pages.

codinghorror · November 14, 2015, 12:40am

I don’t think we’ve ever encountered this situation before. You would need to authenticate the crawler as a regular user, probably through the Discourse API. Do we have any examples of this @techapj?

techAPJ · November 15, 2015, 6:48pm

Yes we do!

To crawl the topics via the Discourse API, just provide your API key and username, and the requests will be authenticated as a regular user. For example:

github.com: discourse/discourse_api/blob/main/examples/example.rb#L9

```ruby
# frozen_string_literal: true
$LOAD_PATH.unshift File.expand_path("../../lib", __FILE__)
require File.expand_path("../../lib/discourse_api", __FILE__)

config = DiscourseApi::ExampleHelper.load_yml

client = DiscourseApi::Client.new(config["host"] || "http://localhost:3000")
client.api_key = config["api_key"] || "YOUR_API_KEY"
client.api_username = config["api_username"] || "YOUR_USERNAME"

# get latest topics
puts client.latest_topics
```

To achieve the same via a cURL request, pass api_key and api_username as parameters. For example:

```
curl -X GET -d api_key="API_KEY" -d api_username="ADMIN_USERNAME" http://discourse.example.com/latest.json
```

Both of the above examples will fetch all the latest topics (even for a private Discourse instance).
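
For anyone who wants the same request without the discourse_api gem, here is a minimal sketch using only Ruby's standard library. It assumes the credentials are accepted as `api_key` and `api_username` query parameters on `/latest.json`, as in the cURL example; newer Discourse releases may expect them in request headers instead, so check your version. The helper names `latest_topics_uri` and `fetch_latest_topics` are made up for illustration, and the host and credentials are placeholders.

```ruby
require "net/http"
require "uri"
require "json"

# Build the /latest.json URL with the credentials as query parameters,
# mirroring the cURL example. All three arguments are placeholders
# that you supply for your own instance.
def latest_topics_uri(host, api_key, api_username)
  uri = URI.join(host, "/latest.json")
  uri.query = URI.encode_www_form(api_key: api_key, api_username: api_username)
  uri
end

# Fetch the latest topics and parse the JSON body into a Hash.
# Hypothetical helper, not part of any Discourse library.
def fetch_latest_topics(host, api_key, api_username)
  response = Net::HTTP.get_response(latest_topics_uri(host, api_key, api_username))
  response.value # raises an exception unless the response is 2xx
  JSON.parse(response.body)
end
```

Calling `fetch_latest_topics("http://discourse.example.com", ENV["DISCOURSE_API_KEY"], "crawler_user")` would then return the parsed response. Reading the key from an environment variable (the variable name here is an assumption) keeps it out of the script itself.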

@Joey_Tuan for more detailed API documentation, please see this topic:

https://meta.discourse.org/t/discourse-api-documentation/22706?u=techapj

codinghorror · November 15, 2015, 10:52pm

Do NOT use the key of an admin though. That would be incredibly dangerous! Just use the key of a trust level zero user. If the key gets out there, anyone can use it.
