Bot writer's tip: Processing every post

:information_source: Note: This guide assumes you are operating an authorized bot on a Discourse forum, potentially using the User API or an Admin API Key. If your bot is blocked by the admins, discuss the purpose of your bot with them and do not attempt to circumvent that block.

:information_source: Would your bot be better if it were run on the server? Consider creating a plugin instead: Beginner's Guide to Creating Discourse Plugins - Part 1

Introduction

This guide will present an algorithm for a bot user to inspect and process every post made on a Discourse forum that the bot’s user is allowed to access (aside from private messages).

You will need durable storage for a single integer, the highest successfully processed post ID. For example, you could write this to Redis or to a plaintext file. Using Redis will allow you to persist message bus subscriptions across process restarts on your end.
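As an illustration of the plaintext-file option, here is a minimal sketch of storing and restoring that single integer. The function names `restore_state` and `persist_state` and the choice of file path are assumptions made for this example, not something the guide prescribes. The write goes to a temporary file first and is then renamed over the target, so a crash mid-write should not leave a half-written number behind.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def restore_state(path: Path) -> Optional[int]:
    """Return the stored post ID, or None if nothing has been stored yet."""
    try:
        text = path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None
    return int(text) if text else None


def persist_state(path: Path, highest_seen_post_id: int) -> None:
    """Write the post ID to a temp file, then atomically replace the target."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(f"{highest_seen_post_id}\n")
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        os.unlink(tmp_name)
        raise
```

A caller would typically run `restore_state(Path("highest_post_id.txt"))` once at startup and `persist_state(...)` after each processed batch.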

It is highly recommended to create a brand-new user account for the bot, so that it can be added to groups and private messages as necessary. Avoid using the @system account.

The following set of algorithms is written in an imitation of the WHATWG specification style, and gradually builds up to the algorithm to continuously monitor for new posts.

Algorithm Specifications

Let the forum base URL be the URL to the site with no trailing slash - e.g. https://meta.discourse.org or https://www.contoso.com/forum for subfolder installations.

Fetch the next recent posts

To fetch the next recent posts given an integer highest seen post ID, and a flag triggered by message bus, run these steps:

  • Let maximum response post ID be the result of adding fifty (50) to the highest seen post ID.
  • Let request uri be the concatenation of the forum base URL, /posts.json, ?before=, and the maximum response post ID.
  • Let response be the result of :satellite: fetch JSON respecting rate limits with the request uri and the credentials.
  • If response is an HTTP error, abort these steps with an error.
  • Let posts be the JSON array at path latest_posts inside response.
  • Let the new posts seen flag be unset.
  • For each JSON object post in posts in reverse order, execute these steps:
    • Let post ID be the JSON number at path id inside post.
    • Set highest seen post ID to post ID.
    • Set the new posts seen flag.
    • :white_check_mark: Emit post. (:information_source: In other words: Send the post to whatever custom processing you want to perform.)
    • If emit returned a backpressure signal, break this loop.
  • :information_source: The above loop executes in reverse order so that your code sees the oldest posts first and the newest posts last.

  • If the new posts seen flag is set:
    • Execute the steps for :floppy_disk: persisting state to storage with highest seen post ID.
  • End these steps returning the highest seen post ID.
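The steps above can be sketched in Python roughly as follows. This is an illustration under assumptions: `fetch_json`, `emit`, and `persist_state` are hypothetical callables that you supply (`fetch_json` is assumed to already carry the credentials and to raise on an HTTP error, and `emit` is assumed to return `True` to signal backpressure). The triggered by message bus flag is left out because none of the listed steps read it.

```python
from typing import Any, Callable, Dict

PAGE_SIZE = 50  # the "fifty (50)" added to the highest seen post ID


def fetch_next_recent_posts(
    base_url: str,
    highest_seen_post_id: int,
    fetch_json: Callable[[str], Dict[str, Any]],
    emit: Callable[[Dict[str, Any]], bool],
    persist_state: Callable[[int], None],
) -> int:
    """Fetch one page of posts, emit them oldest first, return the new high-water mark."""
    maximum_response_post_id = highest_seen_post_id + PAGE_SIZE
    request_uri = f"{base_url}/posts.json?before={maximum_response_post_id}"
    response = fetch_json(request_uri)  # assumed to raise on an HTTP error
    posts = response["latest_posts"]

    new_posts_seen = False
    # The response lists newest posts first, so walk it in reverse.
    for post in reversed(posts):
        highest_seen_post_id = post["id"]
        new_posts_seen = True
        if emit(post):  # True means "backpressure": stop early
            break

    if new_posts_seen:
        persist_state(highest_seen_post_id)
    return highest_seen_post_id
```

Passing the collaborators in as arguments keeps the function easy to exercise with fakes before pointing it at a real forum.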

Probe a high existing post ID

To probe a high existing post ID, execute these steps:

  • Let latest probe request uri be the concatenation of the forum base URL and /posts.json.
  • Let latest probe response be the result of :satellite: fetch JSON respecting rate limits with the latest probe request uri and the credentials.
  • If latest probe response is an HTTP error, abort these steps with an error.
  • Let probe posts be the JSON array at path latest_posts inside latest probe response.
  • For each JSON object post in probe posts:
    • Let post id be the JSON number at path id inside post.
    • End these steps, returning post id.
  • Abort these steps with an error.
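A short Python sketch of the probe might look like this. `fetch_json` is again a hypothetical callable you provide (assumed to include credentials and to raise on an HTTP error), and the choice of `LookupError` for the "no posts" case is an assumption of this example.

```python
from typing import Any, Callable, Dict


def probe_high_existing_post_id(
    base_url: str,
    fetch_json: Callable[[str], Dict[str, Any]],
) -> int:
    """Return the ID of the first post listed by /posts.json."""
    response = fetch_json(f"{base_url}/posts.json")  # assumed to raise on HTTP error
    for post in response["latest_posts"]:
        # The first entry is enough; return immediately.
        return post["id"]
    # An empty list means there is nothing visible to anchor on.
    raise LookupError("no posts returned by /posts.json")
```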

Backfill from latest

To backfill from latest given an optional integer highest seen post ID, execute these steps:

  • Let minimum post ID be the highest seen post ID if present, and zero (0) otherwise.
  • Let high existing post ID be the result of probe a high existing post ID.
  • If high existing post ID is an error, abort these steps with an error.
  • Execute the steps to backfill given the minimum post ID and the high existing post ID.

Backfill

To backfill given two integers minimum post ID and high existing post ID:

  • Let current minimum post ID be minimum post ID.
  • Repeat these steps:
    • Execute the steps to fetch the next recent posts given the current minimum post ID and an unset triggered by message bus flag.
    • If the steps to fetch the next recent posts did not complete successfully:
      • Update the exponential backoff algorithm with a failure signal, and wait the specified amount of time.
      • Continue to the next loop iteration (without updating the current minimum post ID).
    • Let candidate maximum response post ID be the result of adding fifty (50) to the current minimum post ID.
    • If the candidate maximum response post ID is greater than or equal to the high existing post ID, :white_check_mark: end these steps.
    • Set the current minimum post ID to the candidate maximum response post ID.
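The two backfill algorithms can be sketched together as below. `fetch_step` and `probe` are hypothetical callables standing in for "fetch the next recent posts" and "probe a high existing post ID", both assumed to raise on failure. The backoff parameters (1 second base, doubling, capped at 300 seconds) and the reset of the failure count after a success are assumptions of this sketch; the guide only says to use an exponential backoff algorithm.

```python
import time
from typing import Callable, Optional

PAGE_SIZE = 50


def backfill(
    minimum_post_id: int,
    high_existing_post_id: int,
    fetch_step: Callable[[int], object],
    sleep: Callable[[float], None] = time.sleep,
    base_delay: float = 1.0,
    max_delay: float = 300.0,
) -> None:
    """Walk forward in pages of 50 until the high existing post ID is covered."""
    current_minimum_post_id = minimum_post_id
    failures = 0
    while True:
        try:
            fetch_step(current_minimum_post_id)
        except Exception:
            # Any failure: back off and retry the same page.
            failures += 1
            sleep(min(max_delay, base_delay * 2 ** (failures - 1)))
            continue
        failures = 0  # assumption: a success resets the backoff
        candidate = current_minimum_post_id + PAGE_SIZE
        if candidate >= high_existing_post_id:
            return
        current_minimum_post_id = candidate


def backfill_from_latest(
    highest_seen_post_id: Optional[int],
    probe: Callable[[], int],
    fetch_step: Callable[[int], object],
    **backfill_options,
) -> None:
    """Start from the stored ID (or 0) and backfill up to the probed ID."""
    minimum_post_id = highest_seen_post_id if highest_seen_post_id is not None else 0
    high_existing_post_id = probe()  # an exception here aborts the backfill
    backfill(minimum_post_id, high_existing_post_id, fetch_step, **backfill_options)
```

Injecting `sleep` makes it possible to test the retry path without actually waiting.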

Continuously monitor for new posts

To continuously monitor for new posts, execute these steps:

  • Let highest seen post ID be an unset optional integer.
  • Set highest seen post ID to the result of :arrow_forward: restoring state from storage.
  • If highest seen post ID is unset:
    • Let initial post ID be the result of probing a high existing post ID.
    • Execute the steps for :floppy_disk: persisting state to storage with initial post ID.
    • Set highest seen post ID to initial post ID.
  • Let notifications be the result of executing the steps to :satellite: subscribe to the message bus, with a channel of /latest.
  • Execute the following steps repeatedly:
    • Let new highest seen post ID be the result of fetching the next recent posts given the highest seen post ID, with the triggered by message bus flag set if a message bus update occurred.
    • If the steps to fetch the next recent posts did not complete successfully:
      • Update the exponential backoff algorithm with a failure signal, and wait the specified amount of time.
      • Continue to the next loop iteration.
    • If the new highest seen post ID is different from the highest seen post ID:
      • Update the exponential backoff algorithm with a success signal.
      • Set the highest seen post ID to the new highest seen post ID.
    • Wait for a message on notifications or for an implementation-defined timeout to occur. This timeout must be no shorter than 10 minutes and may reasonably range up to 24 hours or slightly higher.
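One possible shape for this loop in Python is sketched below. Everything passed in is a hypothetical callable you would supply. `notifications` is assumed to be a `queue.Queue` that your message bus subscription (for the /latest channel) fills from elsewhere, and `keep_running` is an extra hook added here only so the loop can be stopped. The backoff numbers are assumptions as well.

```python
import queue
import time
from typing import Callable, Optional

MIN_TIMEOUT_SECONDS = 10 * 60  # the guide's lower bound for the wait timeout


def monitor(
    restore_state: Callable[[], Optional[int]],
    persist_state: Callable[[int], None],
    probe: Callable[[], int],
    fetch_step: Callable[[int, bool], int],
    notifications: "queue.Queue",
    timeout_seconds: float = MIN_TIMEOUT_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
    keep_running: Callable[[], bool] = lambda: True,
) -> int:
    if timeout_seconds < MIN_TIMEOUT_SECONDS:
        raise ValueError("timeout must be no shorter than 10 minutes")

    highest_seen_post_id = restore_state()
    if highest_seen_post_id is None:
        # First run: start from the current head of the forum.
        highest_seen_post_id = probe()
        persist_state(highest_seen_post_id)

    failures = 0
    triggered_by_message_bus = False
    while keep_running():
        try:
            new_highest = fetch_step(highest_seen_post_id, triggered_by_message_bus)
        except Exception:
            failures += 1
            sleep(min(300.0, 2.0 ** (failures - 1)))  # assumed backoff schedule
            continue
        if new_highest != highest_seen_post_id:
            failures = 0  # success signal for the backoff
            highest_seen_post_id = new_highest
        try:
            # Block until the message bus reports activity or the timeout expires.
            notifications.get(timeout=timeout_seconds)
            triggered_by_message_bus = True
        except queue.Empty:
            triggered_by_message_bus = False
    return highest_seen_post_id
```

The timeout acts as a safety net: even if a message bus notification is missed, the loop still polls again eventually.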

Algorithms you need to provide:

  • :satellite: fetch JSON respecting rate limits, taking a request uri and optional credentials.
    • This must automatically back off and retry, using the exponential backoff algorithm and/or the server-provided Retry-After information, when it receives an HTTP 429 (Too Many Requests) response.
  • :arrow_forward: restoring state from storage
  • :floppy_disk: persisting state to storage, taking an integer
  • :satellite: subscribe to the message bus
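For the first of these, a minimal standard-library sketch might look like the following. The function name, the `max_attempts` limit, the fallback delay schedule, and the injectable `opener` and `sleep` parameters are all assumptions of this example. Credentials are expected to arrive as whatever request headers your bot's API key type requires, and only a numeric Retry-After value (in seconds) is honored here.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Any, Callable, Dict, Optional


def fetch_json_respecting_rate_limits(
    request_uri: str,
    headers: Optional[Dict[str, str]] = None,
    max_attempts: int = 6,
    opener: Callable[..., Any] = urllib.request.urlopen,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """GET a JSON document, retrying on HTTP 429 with Retry-After or backoff."""
    request = urllib.request.Request(request_uri, headers=headers or {})
    for attempt in range(max_attempts):
        try:
            with opener(request) as response:
                return json.load(response)
        except urllib.error.HTTPError as err:
            # Only 429 is retried; other errors (and the final attempt) propagate.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            retry_after = err.headers.get("Retry-After") if err.headers else None
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)  # server-provided wait, in seconds
            else:
                delay = min(300.0, 2.0 ** attempt)  # assumed fallback schedule
            sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")
```

The state functions can be as simple as reading and writing one integer in a file, and the message bus subscription depends on the client library you choose.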
