API: Getting all posts in a topic

pfaffman · March 14, 2016, 2:30 PM

I’m working on a script that will evaluate participation in a discussion and produce a number, based on how many messages each student wrote plus likes, replies, and such, that will become the student’s grade for “participating” in the discussion.

/t/blah/TOPIC_ID.json returns only 20 posts. Is there a way to get all of them in one request, or will I need to make a request for each of them?

I looked a little at what gets passed to poll, but it wasn’t immediately apparent that I could somehow pass it something like a range or number of posts that I wanted.

pfaffman · March 14, 2016, 2:47 PM

Maybe the easiest way is to get the data I want from the data explorer plugin.

Now I’m thinking that it would be cool to write a plugin that showed people’s scores next to their profile pic in their posts in that topic.

blake · March 14, 2016, 3:32 PM

When you GET /t/blah/TOPIC_ID.json, the output will also contain a stream array that has all the post IDs for the topic.

You then can call: /t/blah/TOPIC_ID/posts.json?post_ids[] and pass in all the ids from the stream array.

pfaffman · March 14, 2016, 3:34 PM

Thanks! And the problem with data explorer is that there is no way (that I can see quickly) to pass in the topic_id that I want.

blake · March 14, 2016, 3:43 PM

Also, it’s probably a good idea to still break up fetching all the posts into multiple requests rather than one big one. So if you have a topic with 100 posts, you should break it up into 5 smaller requests fetching 20 posts at a time.
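
As a rough sketch of the batching described above, the helper below splits a list of post IDs into groups of at most 20. The name chunkIds and its default size are illustrative choices, not part of the Discourse API.

```typescript
// Split a list of post IDs into batches of at most `size` IDs.
// `chunkIds` is a hypothetical helper name; 20 mirrors the batch size suggested above.
function chunkIds(ids: number[], size = 20): number[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks: number[][] = [];
  for (let i = 0; i < ids.length; i += size) {
    chunks.push(ids.slice(i, i + size));
  }
  return chunks;
}

// A topic with 100 posts becomes 5 requests of 20 IDs each.
const exampleIds = Array.from({ length: 100 }, (_, i) => i + 1);
console.log(chunkIds(exampleIds).length); // 5
```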

jandres · September 19, 2018, 4:34 PM

Hi @blake @pfaffman

I don’t know why I’m getting a response from /t/blah/TOPIC_ID.json with only the last 10 posts.
I see the chunk_size parameter (in the response) set to 10, but I don’t know why it is like that or where I can change this value.

Thank you very much

pfaffman · September 19, 2018, 4:41 PM

I once wrote code that downloaded all of the posts in a topic.

I don’t know if the code still works, but it looks like how many posts you get is harder to predict than you might think. You pass the first post-id to control what posts you get.

See https://github.com/pfaffman/discourse-downloader/blob/master/discourse-downloader#L69

blake · September 19, 2018, 5:28 PM

The best way to download all the posts in a topic via the API is to mimic what Discourse is doing in the web browser, so please check out “How to reverse engineer the Discourse API” for details. Basically, go to the topic in the browser, open up your dev tools, and look at the XHR requests as you scroll through the topic.

Here are the steps to download all the posts in the topic via the API:

1. Hit /t/-/{id}.json. This will contain a ‘post_stream’ hash that contains a ‘posts’ array and a ‘stream’ array. The ‘posts’ array will give you the first 20 posts.
2. Now you need to loop through the ‘stream’ array, which gives you all of the post ids in the topic. Remove the first 20 post ids from the stream (otherwise you are re-downloading them for no reason).
3. In chunks of 20, pass in all the post_ids to /t/{id}/posts.json like this:
http://localhost:3000/t/8/posts.json?post_ids[]=46&post_ids[]=47&post_ids[]=48&post_ids[]=49&post_ids[]=50&post_ids[]=51&post_ids[]=52&post_ids[]=53&post_ids[]=54&post_ids[]=55&post_ids[]=56&post_ids[]=57&post_ids[]=58&post_ids[]=59&post_ids[]=60&post_ids[]=61&post_ids[]=62&post_ids[]=63&post_ids[]=64&post_ids[]=65
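
The steps above could be sketched roughly as follows. This only builds the request URLs from a ‘stream’ array; it does not perform any requests. The function name buildPostsUrls and its parameters are made up for illustration, and the defaults of 20 simply follow the numbers in the steps.

```typescript
// Build the /t/{id}/posts.json URLs for every post that was not already
// returned by the initial /t/-/{id}.json request.
// `buildPostsUrls` and its parameter names are hypothetical.
function buildPostsUrls(
  baseUrl: string,
  topicId: number,
  stream: number[],
  alreadyFetched = 20,
  chunkSize = 20,
): string[] {
  // Skip the IDs whose posts came back with the first response.
  const remaining = stream.slice(alreadyFetched);
  const urls: string[] = [];
  for (let i = 0; i < remaining.length; i += chunkSize) {
    const query = remaining
      .slice(i, i + chunkSize)
      .map((id) => `post_ids[]=${id}`)
      .join('&');
    urls.push(`${baseUrl}/t/${topicId}/posts.json?${query}`);
  }
  return urls;
}

// 45 IDs in the stream (26..70), the first 20 already downloaded:
// two URLs, one with IDs 46..65 and one with the remaining 5.
const exampleStream = Array.from({ length: 45 }, (_, i) => i + 26);
console.log(buildPostsUrls('http://localhost:3000', 8, exampleStream));
```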

jandres · September 19, 2018, 5:45 PM

Thanks @blake and @pfaffman for your quick response.

I agree with @blake’s steps for getting all posts in a topic.

Regarding step 1:
I just wanted to know if there is any parameter (maybe in the request headers) to set the chunk_size in the /t/blah/TOPIC_ID.json request, because if I make the request from Postman I get the first 20 posts as described previously, but if I make the request from my web app using Angular, I only get the first 10.

So I think there is something in the request that changes the response from the Discourse server.

I use this request because from it I can get the post stream and the first 20 posts in one request. So I use it as my first request, the base for getting all posts in a topic.

I know this question is not critical; I can work out a solution using multiple requests. I am just curious to know why.

blake · September 19, 2018, 6:52 PM

For some reason your angular app is triggering slow_chunk_size.

github.com/discourse/discourse, lib/topic_view.rb (main):

          if @preload
            @preload.delete blk
            @preload = nil if @preload.length == 0

from

github.com/discourse/discourse, app/controllers/application_controller.rb (main):

            response.headers["Discourse-No-Onebox"] = "1" if SiteSetting.login_required
          end

So that might be something to look into.

There is not a chunk_size parameter you can set, but if you pass in print=true like /t/-/{id}.json?print=true it will set the chunk size to 1000.
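
A minimal sketch of building such a URL is shown below. The helper name printUrl is invented for illustration, and the optional page parameter is an assumption based on a later report in this thread that &page=2 works alongside print=true. Note that the value is sent as the literal string true.

```typescript
// Build a topic JSON URL with print=true, optionally for a given page.
// `printUrl` is a hypothetical helper; `page` is assumed to behave as
// reported elsewhere in this thread and is not verified here.
function printUrl(baseUrl: string, topicId: number, page?: number): string {
  const query = page === undefined ? 'print=true' : `print=true&page=${page}`;
  return `${baseUrl}/t/-/${topicId}.json?${query}`;
}

console.log(printUrl('http://localhost:3000', 8));
console.log(printUrl('http://localhost:3000', 8, 2));
```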

jandres · September 19, 2018, 6:54 PM

Thank you so much

That is the trick. I am running my app (actually it is an Ionic v3 app) from Chrome dev tools on an Android device and I always get the first 10. When I switch to browser mode, I get the 20.

benraay · May 31, 2019, 3:20 PM

This saved my life!

lisbethw1130 · August 2, 2019, 3:59 AM

print parameter just saved me

bjornekstrom · July 20, 2020, 3:57 PM

The ?print=true command is great, indeed! It seems however that there is a rate limit for ?print=true commands of five calls per hour. Is there a way to make more API calls per hour?

fifi_joon · April 16, 2022, 12:36 PM

This solution doesn’t work on “mega topics”. Do you have any solutions for those?

pfaffman · April 16, 2022, 7:54 PM

If not, you’ll need to make multiple requests to retrieve the rest. How many are returned?

polv · April 21, 2022, 7:27 PM

While ?print=true works (as does &page=2), there seems to be a stricter rate limit than without print=true. I wonder how many requests I can make, and consider safe, to avoid hitting status 422.

I am trying to read about 9000 posts, so at 20 posts at a time, it would either be very slow or get rate limited…

pfaffman · April 21, 2022, 9:08 PM

I recommend that you write the code with the expectation that you will be rate limited.
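
One way to write code that expects rate limiting is to retry with a growing delay. The sketch below is an illustration under stated assumptions, not a description of how Discourse behaves: it treats 429 and 422 as rate-limit responses (422 is the status reported in this thread), honours a numeric Retry-After header if the server happens to send one, and otherwise backs off exponentially. All function names are hypothetical.

```typescript
// Assumption: these two statuses mean "slow down and try again".
function isRateLimited(status: number): boolean {
  return status === 429 || status === 422;
}

// Decide how long to wait before the next attempt (attempt starts at 0).
function retryDelayMs(attempt: number, retryAfter: string | null): number {
  if (retryAfter !== null && retryAfter.trim() !== '') {
    const seconds = Number(retryAfter);
    if (isFinite(seconds) && seconds >= 0) {
      return seconds * 1000; // the server told us how long to wait
    }
  }
  // Exponential backoff: 1s, 2s, 4s, ... capped at 60s.
  return Math.min(1000 * Math.pow(2, attempt), 60000);
}

// Fetch a URL, retrying while the response looks rate limited.
// Returns the last response once it succeeds or attempts run out.
async function fetchWithRetry(url: string, maxAttempts = 5): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url);
    if (!isRateLimited(res.status) || attempt + 1 >= maxAttempts) {
      return res;
    }
    const delay = retryDelayMs(attempt, res.headers.get('Retry-After'));
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```

The delay calculation is kept separate from the network call so it can be checked on its own; the cap and the number of attempts are arbitrary and would need tuning against the actual limits of the site being queried.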

polv · April 22, 2022, 1:02 AM

It is in my UserScript. Language highlighting for typescript doesn’t work here, and js works weirdly. It has to be javascript. Both ts and typescript work on StackOverflow, however.

interface IPost {
  id: number
  username: string
  post_number: number
  cooked: string
}

interface ITopicResponse {
  actions_summary: {}[]
  archetype: string
  fancy_title: string
  title: string
  post_stream: {
    posts: IPost[]
    stream: number[]
  }
  posts_count: number
  reply_count: number
}

export async function jsonFetch<T>(url: string): Promise<T | null> {
  const r = await fetch(url)
  if (r.ok) {
    const json = await r.json()
    if (!json.errors) {
      return json
    }
  }

  logger('error', r)
  return null
}

export async function fetchAll(urlBase: string) {
  const r0 = await jsonFetch<ITopicResponse>(urlBase + '.json?print=true')
  if (!r0) return []

  const posts: IPost[] = r0.post_stream.posts
  let page = 2
  while (posts.length < r0.posts_count) {
    const r = await jsonFetch<ITopicResponse>(
      urlBase + '.json?print=true&page=' + page++
    )
    if (!r || !r.post_stream.posts.length) {
      break
    }
    posts.push(...r.post_stream.posts)
    await new Promise((resolve) => setTimeout(resolve, 1000))
  }

  return posts
}

fetchAll('https://community.wanikani.com/t/16404').then(console.log)

In a ts-node script using Axios,

(node:1102374) UnhandledPromiseRejectionWarning: Error: Request failed with status code 422

If I wait a long time, like 10 minutes, it fails at page 2; but if I retry right away, I can’t load any URL at all.

And without print, it works fine.

Actually, I got it working by avoiding print=true.

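// Shape of the /posts.json response, inferred from how it is used below.
interface ITopicPostResponse {
  post_stream: {
    posts: IPost[];
  };
}

export async function fetchAll(urlBase: string) {
  const r0 = await jsonFetch<ITopicResponse>(urlBase + '.json');
  if (!r0) return [];

  // The first response already contains the first chunk of posts, so keep
  // them and only request the ids that have not been downloaded yet.
  const posts: IPost[] = [...r0.post_stream.posts];
  const seen = new Set(posts.map((p) => p.id));
  const stream = (r0.post_stream.stream || []).filter((id) => !seen.has(id));
  const chunks: number[][] = [];
  while (stream.length) {
    chunks.push(stream.splice(0, 300));
  }

  let isContinue = true;
  while (chunks.length && isContinue) {
    // Fetch up to 10 chunks in parallel, then pause before the next batch.
    const rs = await Promise.all(
      chunks
        .splice(0, 10)
        .map((ids) =>
          jsonFetch<ITopicPostResponse>(
            urlBase +
              '/posts.json?' +
              ids.map((id) => `post_ids[]=${id}`).join('&'),
          ),
        ),
    ).then((rs) =>
      rs.map((r) => {
        if (!r) {
          // A failed request (for example a rate limit) stops further batches.
          isContinue = false;
          return [];
        }
        return r.post_stream.posts;
      }),
    );

    rs.forEach((r) => {
      posts.push(...r);
    });

    if (chunks.length) {
      await new Promise((r) => setTimeout(r, 1000));
    }
  }

  if (!isContinue) {
    logger(
      'error',
      `Total posts: ${r0.posts_count} != real count: ${posts.length}, due to Rate Limit?`,
    );
  }

  return posts;
}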

Daniil_Bazhenov · January 11, 2023, 12:36 PM

I ran into a problem where 'print' => true does not work, but 'print' => 'true' does.
PHP Guzzle.
Maybe you should add a handler for print = 1 as well.
