Interact with discourse from Python?

Thank you so much! Yes I’ll do this! I’m specifically looking for pageviews (logged in users, anonymous users, crawlers) but I can’t find it in the API documentation. Any pointers?

Some of the admin-specific calls aren’t in the API docs

I would open the network tab, go to the admin page, view the report with the data you want to retrieve, and then check the network tab to see what the browser loaded.
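Once the network tab has shown you which URL the admin page loads, a minimal Python sketch of replaying that call with an API key might look like this. The `/admin/reports/<name>.json` path, the three report names, and the `start_date`/`end_date` parameters are assumptions for illustration; use whatever your own network tab actually shows. The `Api-Key` and `Api-Username` headers are the standard way to authenticate against the Discourse API.

```python
import json
import urllib.parse
import urllib.request

# Assumed report names; confirm them in your browser's network tab.
PAGEVIEW_REPORTS = (
    "page_view_logged_in_reqs",
    "page_view_anon_reqs",
    "page_view_crawler_reqs",
)


def build_report_request(base_url, report, api_key, api_username, start_date, end_date):
    """Build (without sending) a GET request for one admin report."""
    query = urllib.parse.urlencode({"start_date": start_date, "end_date": end_date})
    # Assumed URL shape: replace with the path you observed in the network tab.
    url = f"{base_url.rstrip('/')}/admin/reports/{report}.json?{query}"
    headers = {"Api-Key": api_key, "Api-Username": api_username}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_report(request):
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To use it, loop over `PAGEVIEW_REPORTS`, build a request for each with your forum URL and key, and pass it to `fetch_report`. Keeping the request building separate from the sending makes it easy to check the URL before hitting the server.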

Which is really a summary of “Reverse engineer the Discourse API”.

What I would do is use the Data Explorer plugin to get whatever you want, and then you can pull that down with the API. See “Run Data Explorer queries with the Discourse API”.
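As a rough sketch of the Data Explorer route in Python: the `/admin/plugins/explorer/queries/<id>/run` path, the JSON-encoded `params` field, and the `columns`/`rows` response shape are what I understand that guide to describe, but treat them as assumptions and check them against your own instance. The function names and the form-encoded body are my own choices for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_query_run_request(base_url, query_id, api_key, api_username, params=None):
    """Build (without sending) a POST that runs a saved Data Explorer query."""
    url = f"{base_url.rstrip('/')}/admin/plugins/explorer/queries/{query_id}/run"
    # Query parameters are sent as one JSON-encoded form field (assumed).
    body = urllib.parse.urlencode({"params": json.dumps(params or {})}).encode()
    headers = {
        "Api-Key": api_key,
        "Api-Username": api_username,
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def rows_as_dicts(result):
    """Turn a {"columns": [...], "rows": [[...], ...]} payload into a list of dicts."""
    columns = result["columns"]
    return [dict(zip(columns, row)) for row in result["rows"]]
```

Send the request with `urllib.request.urlopen`, decode the JSON, and pass it to `rows_as_dicts` to get one dict per row keyed by column name.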

Absolutely; if you want data differing from what’s already on offer in the admin panel, DE is the way to go.

It also guarantees that those queries won’t return different data after an update. BUT the underlying structures may change, so you may still need to maintain the query.

Tradeoffs either way.

Thank you both! I got away with the “reverse engineer” method + API key! Thank you so much!

A bit late to this conversation (well, the extension of it :p), but I also wanted to pull data from a Discourse forum and didn’t want the hassle of setting up an API key. If you (or anyone) want a simple wrapper to pull posts from any Discourse forum, you can check it out here

Released on PyPI, so it’s easy to install with pip/uv. It handles rate limiting for you and is typed with Pydantic (makes for a better DX imo). Usage:

from discourse_reader import DiscourseClient

client = DiscourseClient("https://meta.discourse.org")

# Browse categories
for cat in client.categories():
    print(f"{cat.name}: {cat.topic_count} topics")

# Get a topic with all its posts
topic = client.topics.get(12345)
print(topic.title)
print(topic.opening_post.cooked)       # the original post (HTML)
print(topic.accepted_answer)           # accepted answer or None
for reply in topic.posts.replies():
    print(reply.username, reply.cooked)

Not as extensive as pydiscourse, but that’s intentional since it works without an API key. It also definitely won’t offer better or faster data than the Data Explorer plugin, but I think it’s nice if you just wanna quickly pull a batch of threads or simple site statistics :slight_smile:

I get the impression that this approach might violate the terms of service for this forum and the default terms of service for Discourse forums.

You may not automate access to the forum, or monitor the forum, such as with a web crawler, browser plug-in or add-on, or other computer program that is not a web browser. You may crawl the forum to index it for a publicly available search engine, if you run one.

Hmmm. I don’t think I’m doing anything special beyond simply wrapping what would otherwise be a simple curl request to any of the publicly documented API endpoints. However, if the @Discourse team takes any offense to what I created please let me know.

Personally, I don’t think the package itself violates any ToS, since the responsibility for respecting a forum’s terms will always be with the dev using the tool. This package only hits public and documented API endpoints. If a developer has malicious intent to scrape or monitor a forum, that would honestly already be a trivial task.

On that note, pydiscourse offers the same functionality, the only difference being the need for an API key (I don’t know how easy this is to get as a regular user), after which it can similarly be used to violate the ToS of any forum. So if the default rule is to not automate access to the forum, wouldn’t pydiscourse and discourse2 also violate the ToS? discourse2 even advertises access to publicly accessible data in their list of features if no API key is provided:

Works in both server and browser* environments (*useful for querying public data without API keys and on relevant origin, e.g. latest topics, etc)

There are probably a lot more packages out there in other languages that already support this type of access.

Some more context: I built this so I can easily pull data from a forum that one of our customers host (but we don’t have direct DB access). It just makes my workflow cleaner and my hope is to assist others that are in the same situation.

The thing is that generating an API key first needs access to the Admin interface (Admin > Advanced > API keys), so giving one an API key would be something the Admins want to do; not any regular user can get one.

Yeah, if the only way to get an API key is from the admin interface, then this package could simplify violating a specific forum’s ToS.

Though I still want to discuss some of the other points I made, and hear other people’s thoughts on those, namely: Anyone could already trivially scrape/monitor with curl or requests. Shouldn’t the responsibility lie with that developer to not violate the ToS? Or should it lie with the tools they used?

For discourse2 and similar packages, they are more broadly purposed, but discourse2 does still advertise the ability to work on public endpoints if no API key is provided. Does that enable ToS violation to the same degree?

Also, since Discourse is GPLv2, does the creation of a tool like discourse-reader inherently violate any terms directly?

Curious to hear other people’s thoughts on these.

The official discourse_api ruby gem also supports accessing public data without an API key. So I think it’s fine for the tooling to exist. It’s up to users to ensure they’re complying with any forum-specific ToS.

(that’s my personal opinion - not an official legal statement from CDCK :sweat_smile:)

It’s also worth noting - unauthenticated ‘bot’ requests are subject to much stricter rate limits, and potentially other ‘bot protection’ security layers (e.g. Cloudflare). So if you can, it’s always best to use an API key.
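If you do run into those rate limits from Python, one common pattern is to back off and retry when the server answers 429. The sketch below is generic and not taken from any Discourse library: `call_with_backoff` and the `send` callable are hypothetical names, and it assumes the server may include a `Retry-After` header giving the wait in seconds, falling back to exponential backoff when it does not.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send() is a hypothetical callable returning (status_code, headers_dict, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        try:
            # Prefer the server's hint when it is a plain number of seconds.
            delay = float(headers.get("Retry-After"))
        except (TypeError, ValueError):
            # Header missing or not numeric: fall back to exponential backoff.
            delay = base_delay * (2 ** attempt)
        sleep(delay)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Passing `sleep` in as a parameter keeps the function easy to test without actually waiting. In real use, `send` would wrap whatever HTTP call you are making.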

Thanks for the response! For the time being, I updated the README in my package with a disclaimer to respect the ToS of whatever site a dev might want to use it against.

I was not aware of this default ToS rule at all when I made this, hopefully anyone who looks to use this package also learns about it in the future as well :slight_smile:

Yeah, this straight up echoes the arguments for VCRs… a while ago. Similarly, lockpicks. There exist legitimate and illegitimate uses of tools, and it’s on the operator to be responsible.

Again, IANAL and this is not an official statement, but I feel this accurately represents our perspective on this:

There’s a big difference between well-intended exploring with a tool (e.g.) and setting up automation.

We’re not going to get grumpy with people using meta with tools like this especially if they’re developing functionality or learning how to interact with the Discourse API. We’ll encourage it, as long as you’re not bulk-scraping data, incurring undue load, or degrading others’ experience.