Any way to get raw posts in bulk?


#1

Getting a topic view (/t/1234.json, with or without post_ids[]) nets you a post stream which includes the whole “page”, but doesn’t include raw content. Getting a single post view (/posts/123456.json) gives you a different model, which does have raw, but you can only get one post at a time this way. And /raw/1234/56 obviously has the same problem.
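
For reference, the three request shapes described above can be written out as URL builders. This is only a minimal Python sketch: the paths come from the post, but the base URL `https://forum.example.com` and the function names are hypothetical, and reading `/raw/1234/56` as topic id plus post number is an assumption.

```python
BASE_URL = "https://forum.example.com"  # hypothetical forum address


def topic_view_url(topic_id: int) -> str:
    """Topic view: a whole page of posts, but cooked only (no raw)."""
    return f"{BASE_URL}/t/{topic_id}.json"


def single_post_url(post_id: int) -> str:
    """Single post view: includes raw, but only one post per request."""
    return f"{BASE_URL}/posts/{post_id}.json"


def raw_post_url(topic_id: int, post_number: int) -> str:
    """Raw text of one post (assumed: topic id, then post number)."""
    return f"{BASE_URL}/raw/{topic_id}/{post_number}"
```

With only these routes, collecting the raw for an N-post topic takes roughly N requests, which is the problem being raised.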

So, is there a way I’m missing to get more than one raw post at a time? Cooked is fairly useless for parsing/presentation outside HTML, and banging a hundred or so requests to get a topic view would probably get me banned with prejudice on any forum (not to mention how slow it is).


(Jeff Atwood) #2

Maybe we should add a way to get raw posts in bulk. Not sure if this exists.


(Kane York) #3

I am fairly certain it does not. Would be nice to add.


(Apparently Archetype) #4

I’ll second the request for this.

It would help with various projects I’m considering doing for that forum (and possibly releasing here too, if they are useful enough and don’t cause excessive server load).

Project ideas that could benefit and that I’m likely to attempt:

  • a console-based browser for Discourse
  • a native Android viewer
  • browser user scripts to make the site more enjoyable to use

All these and more would be made much easier if I had access to the raw post data in /t/1234.json

Additionally, AFAICT the difference between the post view in /t/1234.json and /posts/123456.json is that the latter has the raw post and the former does not. All other fields appear to behave the same.

I haven’t dived into the code yet (I’m getting ready to; I’ve always meant to learn Ruby and this is a good reason to start), but that does sort of suggest to my naive mind that /t/1234.json is doing something special to remove the raw post from the model before serializing. So maybe this would have a relatively easy fix?


(Sam Saffron) #5

I am fine to add a query param here, /t/1234.json?include_raw=true

PR welcome
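
From the client side, the proposed parameter might be used roughly like this. It is a sketch under assumptions: the function names and base URL are hypothetical, and the response layout (`post_stream` containing a `posts` list, each post carrying `post_number` and, when requested, `raw`) is assumed rather than confirmed in this thread.

```python
from urllib.parse import urlencode


def topic_url_with_raw(base_url: str, topic_id: int) -> str:
    """Build the topic view URL with the proposed include_raw parameter."""
    query = urlencode({"include_raw": "true"})
    return f"{base_url}/t/{topic_id}.json?{query}"


def raw_by_post_number(topic_json: dict) -> dict:
    """Map post_number -> raw for every post that carries a raw field.

    Assumes the payload nests posts under post_stream.posts; posts
    without a raw field are skipped instead of raising.
    """
    posts = topic_json.get("post_stream", {}).get("posts", [])
    return {p["post_number"]: p["raw"] for p in posts if "raw" in p}
```

Fetching the URL with any HTTP client and passing the decoded JSON to `raw_by_post_number` would then give a whole page of raw posts in one request.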


(Apparently Archetype) #6

If you’re looking for one from me it might be a while; I’m literally starting from square one here.

I thank you for the offer, but I don’t think I’m ready to take on even something this minor yet. I’ve got to wrap my head around this project first.

I like the idea of it being triggered via query param. That allows the apps that need or want the raw to pull it without increasing network load on mobile devices that don’t necessarily care about the raw.


(Kane York) #7

https://github.com/discourse/discourse/pull/2938

Example usage:

https://meta.discourse.org/t/any-way-to-get-raw-posts-in-bulk/21615/6?include_raw=1

(Sam Saffron) #8

We need consistency in our API. Is =t used everywhere, or anywhere? Personally I’d prefer we just go for =1 always.


(Kane York) #9

Well, I just wrote it like that because it actually tests for the existence of the parameter; the contents don’t matter. Just like ?expand=1 on GitHub compare views.
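
To illustrate what a presence-only check means, here is a small Python sketch. It is not the code from the PR (that is Ruby); the function name `wants_raw` is hypothetical and it only mimics the idea that the value is ignored.

```python
from urllib.parse import parse_qs, urlsplit


def wants_raw(url: str) -> bool:
    """Return True if include_raw appears in the query string at all.

    keep_blank_values=True makes a bare "include_raw" or "include_raw="
    count as present; the value itself is never inspected.
    """
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return "include_raw" in params
```

Under a check like this, `?include_raw=1`, `?include_raw=t` and even `?include_raw=0` all behave the same, so the choice between =1 and =t is purely a matter of convention.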


(Apparently Archetype) #10

That seems reasonable to me, and matches what I’ve seen of other query parameters.


(Apparently Archetype) #11

Looks like this has been applied to meta.d now, but I’m noticing some odd behavior.

This link includes the raw: Any way to get raw posts in bulk?

This link does not include the raw: https://meta.discourse.org/t/21615/posts.json?post_ids[]=82123&post_ids[]=82154&post_ids[]=82509&post_ids[]=82657&post_ids[]=82658&post_ids[]=82659&post_ids[]=82686&include_raw=1

Can we make a similar alteration to whatever view runs /t/21615/posts.json to make it return raw when asked as well? Or is my syntax wrong on the second query?
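
For anyone checking their own syntax, the second query shape can be built like this. A minimal Python sketch: the function name is hypothetical, and whether the server honours include_raw on this route depends on the version deployed.

```python
from urllib.parse import urlencode


def topic_posts_url(base_url, topic_id, post_ids, include_raw=False):
    """Build /t/<topic_id>/posts.json with one post_ids[] pair per id."""
    pairs = [("post_ids[]", pid) for pid in post_ids]
    if include_raw:
        pairs.append(("include_raw", 1))
    # safe="[]" keeps the brackets readable instead of %5B%5D
    return f"{base_url}/t/{topic_id}/posts.json?{urlencode(pairs, safe='[]')}"
```

Calling `topic_posts_url("https://meta.discourse.org", 21615, [82123, 82154], include_raw=True)` yields the same form as the link above, with two ids instead of seven.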


(Kane York) #12

Hm, guessing it’s a different code path. Give me a bit.

Yup, figures that that was the only route using a different set of code.


(Apparently Archetype) #13

Excellent! Thanks for the quick turnaround on this!


(Daniela) #14