How Does Regenerating Summaries Work?

Can someone please explain the rules for regenerating summaries? There was some discussion that staff can regenerate summaries after one hour, but that’s not what I’m seeing. (I’m not sure what I’m seeing; it seems inconsistent.) If there is a new post, should the option to regenerate be offered? And to anyone, or just staff?

After experimenting, here’s what I see:

  • For topics with < 100 posts, the regenerate button is enabled immediately for staff only
  • For topics with > 100 posts, the regenerate button is not enabled, even after waiting an hour

Even if the regenerate button were enabled, it’s not a viable workflow for our staff to keep hitting the button, so I tentatively plan to implement a webhook that listens for new posts and regenerates the summary using https://forum.example.com/discourse-ai/summarization/t/12345. A quick calculation says that would run around $500/year for our forum. I realize Discourse is trying to protect against an unexpectedly large cost.
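A minimal sketch of the routing half of such a webhook listener, in Python. It only maps an incoming event to the summarization URL quoted above. The event name `post_created` and the payload shape `{"post": {"topic_id": ...}}` are assumptions about what the webhook sends, and the function name is hypothetical. The authenticated request that actually triggers regeneration is deliberately left out, since its parameters are not described in this thread.

```python
import json
from typing import Optional

# Base URL and endpoint path are taken from the post above; adjust for your forum.
FORUM_BASE_URL = "https://forum.example.com"


def summary_url_for_event(event_type: str, body: str) -> Optional[str]:
    """Return the summarization URL for a new-post event, or None to ignore it."""
    # Assumption: new posts arrive with the event type "post_created".
    if event_type != "post_created":
        return None
    payload = json.loads(body)
    # Assumption: the payload looks like {"post": {"topic_id": 12345, ...}}.
    topic_id = payload.get("post", {}).get("topic_id")
    if not isinstance(topic_id, int):
        return None
    return f"{FORUM_BASE_URL}/discourse-ai/summarization/t/{topic_id}"
```

A real listener would also verify the webhook signature and debounce bursts of posts in the same topic, so that one busy topic does not trigger a paid summary per reply.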

Hey @markschmucker,

We are currently working on a backfill strategy for Topic summaries in DiscourseAI. We plan to ship it next week, and I will update you here when it lands.

We have the feature now, @markschmucker. You can adjust it via the ai summary backfill maximum topics per hour site setting.

I’ve set that to 12 and on my Anthropic dashboard I see an event every 5 minutes, as expected. But I can’t see the payload from that dashboard. How can I see what summary Discourse is regenerating?

It is all stored in the ai_api_audit_logs table. If you have Data Explorer, you can use the following query:

SELECT
  *
FROM
  ai_api_audit_logs
ORDER BY
  id DESC
LIMIT
  100

After playing with that, it looks like backfilling generates summaries for some of the Latest topics (looks like around 100 topics or topics that were updated in the past few weeks?).

During or after the backfill cycle, if a topic with a summary gets a new post, its summary is not automatically updated. (If it has < 100 posts, there is a “Regenerate” button that staff can hit manually.)

When should the topic with a new post get updated?

It should be updated within 5 minutes of a reply being posted, at least for normal topics with under 50 replies.

Is your community more geared towards mega-topics?

cc @Roman_Rizzi

Now I see there was a failure in the SummariesBackfill job: I hit a daily rate limit at Anthropic. That might be why it appeared to stop after maybe the 100 Latest topics, and also why the updated topic did not get a new summary.

So if I did not hit a rate limit, will SummariesBackfill summarize all 60,000 of our topics? Even those that have been inactive for years?

Most topics have more than 100 replies. We have 8 topics with over 1000 replies.

Yes, it starts every 5 minutes to do a batch, prioritizing the most recently active topics and skipping those that already have an up-to-date summary.

If you configure your max limit per hour to be higher than how many topics with new activity you have per hour, on average, it will eventually backfill all your topics.
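The condition above can be put into rough numbers. This is a back-of-the-envelope sketch in Python, not how the job is implemented: it assumes summaries are spread evenly across the hour and that topics with new activity always consume part of the hourly limit first. The function names and the activity figure in the usage note are hypothetical.

```python
from typing import Optional


def minutes_between_summaries(max_topics_per_hour: int) -> float:
    """Average spacing of summary requests if they are spread evenly over an hour."""
    return 60 / max_topics_per_hour


def hours_to_backfill(
    backlog_topics: int,
    max_topics_per_hour: int,
    active_topics_per_hour: float,
) -> Optional[float]:
    """Estimate hours until the backlog is summarized, or None if it never is.

    Assumption: topics with new activity use up part of the hourly limit,
    and only the remainder goes toward the backlog.
    """
    spare_per_hour = max_topics_per_hour - active_topics_per_hour
    if spare_per_hour <= 0:
        return None  # the limit is fully consumed by new activity
    return backlog_topics / spare_per_hour
```

With the setting at 12, `minutes_between_summaries(12)` gives 5.0, matching the one event every 5 minutes seen on the dashboard. For a 60,000-topic backlog and a hypothetical 2 active topics per hour, `hours_to_backfill(60000, 12, 2)` gives 6000.0 hours, or 250 days.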

Feature request: something like ai summary backfill maximum age, so we don’t incur a lot of cost summarizing old topics with no activity in, say, the last six months. I estimate it would cost us $3,000 to summarize all 60,000 topics, most of which we don’t care about.
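To make the savings concrete, here is a small Python sketch that derives a per-topic cost from the estimate above and applies it to a smaller set of topics. The function names are hypothetical, and the count of recently active topics in the usage note is a made-up figure for illustration, not a number from our forum.

```python
def cost_per_topic(total_cost_usd: float, total_topics: int) -> float:
    """Average summarization cost per topic implied by a whole-forum estimate."""
    return total_cost_usd / total_topics


def backfill_cost(topic_count: int, per_topic_usd: float) -> float:
    """Estimated cost of summarizing only `topic_count` topics."""
    return topic_count * per_topic_usd


# $3,000 for 60,000 topics works out to about $0.05 per topic.
per_topic = cost_per_topic(3000, 60000)
```

If, hypothetically, only 5,000 topics had activity in the last six months, `backfill_cost(5000, per_topic)` would come to about $250 instead of $3,000. Actual cost per topic depends on topic length and the model, so treat the average as a rough guide.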

cc @Roman_Rizzi

Just confirming: are you using Haiku 3.5? It should produce good summaries.

I’m using Sonnet 3.5, for no good reason. Haiku 3.5 should cut the cost a lot. I think the feature request is still worth considering though.

Yeah, certainly.

I feel like we could create a fancy backfilling algorithm in automation, because there are a lot of knobs you may want to twiddle beyond age.

  • Only these categories
  • Stuff with more than X views
  • Stuff with more than N likes
  • Stuff with accepted answers
  • Stuff newer than X

Adding 10 site settings for this will overwhelm users.
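One way to picture the knobs listed above as a single configurable rule rather than many site settings is the Python sketch below. Everything in it is hypothetical: the `Topic` record, its field names, and the `BackfillFilter` class are illustrative and do not correspond to Discourse's schema or to any existing automation script.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Set


@dataclass
class Topic:
    # Hypothetical record; field names are illustrative only.
    id: int
    category_id: int
    views: int
    like_count: int
    has_accepted_answer: bool
    last_posted_at: datetime


@dataclass
class BackfillFilter:
    # Each knob is optional; None (or False) means "do not filter on this".
    category_ids: Optional[Set[int]] = None   # only these categories
    more_than_views: Optional[int] = None     # more than X views
    more_than_likes: Optional[int] = None     # more than N likes
    require_accepted_answer: bool = False     # accepted answers only
    max_age: Optional[timedelta] = None       # newer than X

    def matches(self, topic: Topic, now: datetime) -> bool:
        """Return True if the topic passes every knob that is set."""
        if self.category_ids is not None and topic.category_id not in self.category_ids:
            return False
        if self.more_than_views is not None and topic.views <= self.more_than_views:
            return False
        if self.more_than_likes is not None and topic.like_count <= self.more_than_likes:
            return False
        if self.require_accepted_answer and not topic.has_accepted_answer:
            return False
        if self.max_age is not None and now - topic.last_posted_at > self.max_age:
            return False
        return True
```

For example, `BackfillFilter(max_age=timedelta(days=180))` would cover the six-month cutoff requested earlier while leaving every other knob off.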

I agree with Mark. If we were to implement this for our forum, we wouldn’t want old topics without new activity getting resummarized. AI is too costly at the moment, considering summaries are just one small part of all the AI tools.
