Making DOI Ready PDFs from Discourse

How do I download a topic as HTML or a PDF file? I’m a bit lost on how to do this. Any help?

1 Like

The system isn’t designed to be ready-to-print as you lose the context, but you could use the print shortcut Ctrl+P and save it from there?

4 Likes

I see! That works.

Though is there a way to improve on this? This seems to be an important thing to enable. Is there a way to enable download as a single webpage or webpage archive?

Why important? What’s your use case for this? I’m intrigued

1 Like

This has been discussed at length already. FWIW print to pdf solves it for me in most cases.

https://meta.discourse.org/t/print-long-topic-to-pdf-redux-again/44639?u=tobiaseigen

5 Likes

I’m trying to get a DOI for a forum thread, because this is a conversation of high value. I want it to be cited. For that reason, I need to get it distilled down into a semi-professional PDF to submit to Zenodo.

https://zenodo.org/

This is an increasingly important thing to support, as academics are allowed to cite anything with a DOI. In fact, having facilities to ease DOI generation workflows would be really useful.

4 Likes

Yes, that is an interesting use-case. I can’t quite get my head around how you present either all or just a selection of posts in a topic in this way as opposed to simply using the permalink for the topic.

As a matter of interest, does it have to be a PDF in this case? Is a topic not just as much of a ‘digital object’? (I’m not an academic, as you can tell).

I guess I can see how this MIGHT work as a plugin for communities around research topics. You can search this site for conversations around the topic of printing and exporting (such as the excellent link from @tobiaseigen above) and maybe there would be folks to collaborate with to develop the concept further?

1 Like

There’s already a solution

Print to PDF, which is well understood and supported on every operating system.

1 Like

I’d argue that you’d want to create the DOI for the thread itself, not a static and out-of-date copy. I thought that Zenodo would let you do that, but FWIW, you can just append /print to the end of a topic URL to get the print version.
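For anyone scripting this, here is a minimal sketch of the /print trick. It assumes the usual /t/<slug>/<topic_id> path layout with an optional trailing post number; the function name toPrintUrl is made up for illustration and is not part of Discourse.

```javascript
// Build the print-view URL for a Discourse topic link.
// Assumption: the path looks like /t/<slug>/<topic_id>[/<post_number>].
function toPrintUrl(topicUrl) {
  const url = new URL(topicUrl);
  const parts = url.pathname.split("/").filter(Boolean);
  if (parts[0] !== "t" || parts.length < 3) {
    throw new Error("Not a topic URL: " + topicUrl);
  }
  // Keep "t", the slug and the topic id; drop any trailing post number.
  url.pathname = "/" + parts.slice(0, 3).join("/") + "/print";
  // Tracking parameters such as ?u=username are not needed for printing.
  url.search = "";
  url.hash = "";
  return url.toString();
}

console.log(
  toPrintUrl("https://meta.discourse.org/t/print-long-topic-to-pdf-redux-again/44639?u=tobiaseigen")
);
```

Opening the resulting URL in a browser and using Ctrl+P from there is the same manual step described earlier in the thread.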

2 Likes

This is all helpful. I just blogged about the use case I am after here:

The challenges I’ve now seen from using this are severalfold.

It does have to be a static file that can be uploaded to Zenodo. The “promise” behind DOIs is that they will always be accessible in perpetuity, and immutable. Uploading to Zenodo guarantees this, but we have to upload. It need not be a PDF. A well packaged HTML file could work too, but PDFs are viewable on the website. The key thing is that it must be static.

That would be a great feature. Though it seems downstream from some other problems. Here is what I see are the big issues:

  1. Links are not handled well, nor are crosslinks.
  2. Most of the problems are hackishly solvable with custom css added to the template.
  3. Links should ideally be handled as footnotes, but this is not easy to do with standard CSS. It should be done on the backend.
  4. It is currently untested on things like “spoiler” tags or LaTeX.
  5. There are a ton of nuisance elements that need to be removed (such as cross link indicators after posts). If someone wants to help with this, I can produce a definitive list.
  6. Process is pretty manual. Ideally there is an Admin button one could click which would interface directly with the Zenodo API, including getting the versioning right, etc. This admittedly is a lot of work, so I imagine it will take a while.
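To give a feel for what item 6 might involve, here is a rough sketch of the first step only: creating an unpublished deposition through Zenodo's deposit REST API. The shape of the topic object (title, url, authors) and both function names are assumptions made up for this example; uploading the PDF, versioning and publishing are separate calls that are not shown.

```javascript
// Assumption: `topic` is a hypothetical object such as
// { title: "...", url: "https://...", authors: ["Name One", "Name Two"] }.
function buildZenodoMetadata(topic) {
  return {
    metadata: {
      title: topic.title,
      upload_type: "other",
      description: "Archived copy of the forum topic at " + topic.url,
      creators: topic.authors.map((name) => ({ name })),
    },
  };
}

// Creates a draft deposition and returns Zenodo's JSON response.
// Needs a personal access token with deposit rights; nothing is published here.
async function createDeposition(accessToken, topic) {
  const response = await fetch("https://zenodo.org/api/deposit/depositions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer " + accessToken,
    },
    body: JSON.stringify(buildZenodoMetadata(topic)),
  });
  if (!response.ok) {
    throw new Error("Zenodo returned HTTP " + response.status);
  }
  return response.json();
}
```

Keeping the metadata builder separate from the network call means the payload can be checked by an admin before anything is sent.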

I can provide the CSS I am using if someone wants to make it default. All it does is add some CSS in the print and page CSS sections to clean things up for print. That is probably the easiest thing to do right now, perhaps adding some configuration options to Discourse to tune it. Is there interest in that?

The HTML link-to-footnote conversion is a real problem that does need to be addressed. It is most likely solvable only on the backend, as CSS is not set up for this. That, it seems, is the most important hard problem to solve quickly.
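To make the idea concrete, here is a minimal sketch of the conversion on an HTML string. It is not Discourse code: the function name linksToFootnotes is made up, and it uses a regular expression that only handles simple double-quoted href attributes, so a real backend implementation would want a proper HTML parser.

```javascript
// Replace each <a href="...">text</a> with "text[n]" and collect the URLs.
// Repeated links to the same URL share one footnote number.
function linksToFootnotes(html) {
  const footnotes = [];
  const anchorPattern = /<a\s[^>]*?href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi;
  const body = html.replace(anchorPattern, (match, href, text) => {
    let index = footnotes.indexOf(href);
    if (index === -1) {
      footnotes.push(href);
      index = footnotes.length - 1;
    }
    return text + "<sup>[" + (index + 1) + "]</sup>";
  });
  return { body, footnotes };
}

const result = linksToFootnotes(
  '<p>Upload to <a href="https://zenodo.org/">Zenodo</a> to get a DOI.</p>'
);
console.log(result.body);
console.log(result.footnotes);
```

The footnotes array can then be printed as a numbered list at the end of the topic, which is the part plain CSS cannot do.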

Is this something anyone can help with?

3 Likes

A couple of other features that would be really powerful and useful, and easier than full Zenodo integration:

  1. Add a button to click for users to request a DOI of a topic.
  2. Once a DOI is assigned, have a way for admins to display the DOI alongside the title.
  3. Add markup code to identify and automatically format DOIs correctly like this: DOI.

2 Likes
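As a rough illustration of the third item, a DOI formatter could look something like the sketch below. The pattern and the function name linkifyDois are assumptions for this example, the DOI used is a placeholder, and the input is assumed to be plain text that does not already contain doi.org links.

```javascript
// Loose pattern for DOIs such as 10.1234/example.5678 (placeholder value).
const DOI_PATTERN = /\b10\.\d{4,9}\/[^\s"<>]+/g;

// Wrap every DOI found in plain text in a link to the doi.org resolver.
function linkifyDois(text) {
  return text.replace(DOI_PATTERN, (match) => {
    // Trailing sentence punctuation is almost never part of the DOI itself.
    const doi = match.replace(/[.,;:)]+$/, "");
    const trailing = match.slice(doi.length);
    return '<a href="https://doi.org/' + doi + '">doi:' + doi + "</a>" + trailing;
  });
}

console.log(linkifyDois("Archived as 10.1234/example.5678."));
```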

I know this might be work, but if it’s done, I think it could create some good press for Discourse. I think there is important intellectual work happening in some Discourse communities, and there is growing interest in alternate publishing platforms in academia.

1 Like

One more critical challenge is how to deal with author handles. Ideally, the cited people are included at the bottom of the PDF. Once again, this is only solvable on the backend, but seems eminently doable.
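A small sketch of how the list of people could be gathered, assuming each post is available as an object with username and raw (source text) fields. That input shape and the function name collectHandles are assumptions for illustration; mapping handles to real names for a citation would still need a separate step.

```javascript
// Collect every post author plus every @mention, without duplicates.
// Assumption: posts look like { username: "alice", raw: "text with @mentions" }.
function collectHandles(posts) {
  const handles = new Set();
  for (const post of posts) {
    handles.add(post.username);
    // Require a non-word character before "@" so e-mail addresses are skipped.
    for (const m of post.raw.matchAll(/(^|[^\w@])@([A-Za-z0-9_.-]+)/g)) {
      // Drop punctuation that ended the sentence rather than the handle.
      handles.add(m[2].replace(/[.-]+$/, ""));
    }
  }
  return [...handles].sort();
}

console.log(
  collectHandles([{ username: "alice", raw: "The link from @tobiaseigen above is excellent." }])
);
```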

This should be pretty doable in a plugin. Were I still an academic I would probably give it a shot.

2 Likes

The good old arc90 Readability bookmarklet reformatted links to footnotes in client-side JavaScript. It no longer works but there are copies on the web since it was open source:

1 Like

I’m displaying my ignorance here, how would I use this?

I think in principle you could try this out by pasting something like this in the JavaScript console (warning: don’t paste things in the console that you don’t understand):

var s = document.createElement('script');
s.type = 'text/javascript';
s.src = 'https://rawgit.com/cristiandouce/4493741/raw/d0fd4f71f3ecd272a23088901eab9a7170f78270/readability.js';
document.documentElement.appendChild(s);

It doesn’t seem to immediately work on Discourse content, so it’s going to need more work. But the addFootnotes function could be a useful starting point.

1 Like

You would instead use the Discourse API to register a decorateCooked callback and run that JS on each post as they get rendered – but this won’t work for the print view.

I’ve been struggling to find a way to download threads and I’m afraid this is not a solution. Replies to comments are no longer nested under the relevant comment, but are listed in chronological order along with all comments. As far as I can see, that makes the export impossible to use meaningfully.

If there is a solution I’d love to see it. I’m about to go through the very painful process of expanding replies and copying and pasting, a few at a time for 1.2K comments, plus their replies. If only I could avoid doing this!

1 Like

Oh, I see, you want the content reformatted in a specific way – that’ll take an external program or plugin.

Discourse isn’t a threaded discussion system, so displaying the replies to the topic in chronological order, as the user interface does, is the only built in method of displaying topic content.
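As a rough illustration of what such an external program could do, the sketch below regroups a chronological list of posts into a reply tree. It assumes each post object carries post_number, reply_to_post_number and username, as in the topic JSON; the function names nestReplies and renderOutline are made up for this example.

```javascript
// Turn a flat, chronological post list into a tree of replies.
// Posts whose parent is missing (or that reply to nothing) become top-level.
function nestReplies(posts) {
  const byNumber = new Map();
  for (const post of posts) {
    byNumber.set(post.post_number, { ...post, replies: [] });
  }
  const roots = [];
  for (const node of byNumber.values()) {
    const parent = byNumber.get(node.reply_to_post_number);
    if (parent) {
      parent.replies.push(node);
    } else {
      roots.push(node);
    }
  }
  return roots;
}

// Render the tree as indented lines, one line per post.
function renderOutline(nodes, depth = 0) {
  return nodes.flatMap((node) => [
    "  ".repeat(depth) + "#" + node.post_number + " " + node.username,
    ...renderOutline(node.replies, depth + 1),
  ]);
}

const examplePosts = [
  { post_number: 1, reply_to_post_number: null, username: "alice" },
  { post_number: 2, reply_to_post_number: null, username: "bob" },
  { post_number: 3, reply_to_post_number: 1, username: "carol" },
];
console.log(renderOutline(nestReplies(examplePosts)).join("\n"));
```

The same tree could feed an HTML or PDF template instead of a text outline, which would avoid expanding and copying replies by hand.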