Print long topic to PDF, redux, again

This has been discussed before, yet as I understand it, it is not considered important. The issue is, there’s no convenient method to save a long topic to a PDF file. Personally, I can’t understand why this wouldn’t be considered important.

I’m posting this here because I’m a member of the Keyboard Maestro forum (which obviously uses Discourse), and it was suggested there that someone (me) try to see if I could get some progress here on this forum.

So here I am, bringing up a subject that seems to be destined for the trash bin, or at least the dusty back pile of “yeah, maybe, someday, if someone yells loudly enough”.

(I’m being tongue in cheek here, a little - I understand priorities. But still, SOMEONE YELLING LOUDLY here.) Any hope?

12 Likes

Why does it need to be convenient? The request comes up so rarely. Printing is not even a 1%-of-the-time task these days.

That, and AFAIK most browsers have “print to PDF” from their print menu.

1 Like

I didn’t say I wanted to print it. I said I wanted a PDF of it. To put on my iPad. To read when I have time, potentially in an unconnected state.

5 Likes

The issue is long topics. Try to print a long topic to PDF, and you’ll get some topic text then a lot of blank pages. It seems the only feasible way to PDF long topics is to turn off JavaScript, which limits the web page to around 20 posts, print that to PDF, go to the next page, etc.
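As a rough illustration of the page-by-page workaround described above, the sketch below lists the URLs one would visit with JavaScript disabled. The `?page=N` query parameter, the default page size of 20, and the helper name `noJsPageUrls` are assumptions made for illustration; the post only says the no-JS view shows around 20 posts at a time.

```typescript
// Hypothetical helper: list the no-JS page URLs for a topic.
// Assumes roughly 20 posts per page and a `?page=N` query parameter;
// neither detail is confirmed in this thread.
function noJsPageUrls(
  topicUrl: string,
  postCount: number,
  postsPerPage: number = 20
): string[] {
  // Always yield at least one page, even for an empty or tiny topic.
  const pages = Math.max(1, Math.ceil(postCount / postsPerPage));
  const urls: string[] = [];
  for (let page = 1; page <= pages; page++) {
    urls.push(`${topicUrl}?page=${page}`);
  }
  return urls;
}
```

Each URL would then be printed to PDF separately, which is exactly the tedium this topic is complaining about.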

5 Likes

This sounds like it would be better served by a stored partial copy of the database.

How complicated would it be to add an Export button to the bottom of each topic, which automatically formats the topic for print? Something that removes the UI, and simply displays the post text (and images) one after another? An example would be what Gmail does removing all of the UI.

5 Likes

Or a “Printer-friendly version”?

2 Likes

There already is a print.css

It is somewhat aged and in need of some updating. But other than the new timeline causing another page it looks good to me.

Have you tried it?

print-to-pdf.pdf (213.7 KB)

4 Likes

But it won’t work for beyond 20 posts. To my knowledge there is still nothing that will do that for you.

2 Likes

I didn’t realize this would get so much attention so quickly. Have to go to dinner now, but please, continue on without me. :smile:

I am guessing that is due to infinite scrolling…posts beyond 20 (above or below) aren’t loaded.

Edit: Seeing as the UI is all JS, I feel that something will need to generate the full topic in “printer-friendly version”, and load another page. If Discourse could generate the PDF itself, all the better!

Good ideas here, but I have no resources to allocate to this task, and won’t for the foreseeable future. It is possible @erlend_sh could fund something like this from the community, if someone here is particularly interested in building it.

5 Likes

I occasionally wish for this myself as admin but am beginning to think that offering this to users would prevent the desired behavior - we want people to read and participate via discourse. It’s bad enough that we allow participation by email.

I’d love to see this implemented as a moderator wrench option to “export topic as PDF”. So in the odd instance when my CEO requests a PDF to read on the plane I can provide it for him but it’s not something that is readily available to all users.

(…maybe that option could be extended later to allow export as JSON that can then be imported to another discourse instance)

1 Like

I remember a topic on SitePoint regarding this feature, as SitePoint is also powered by Discourse.

HTTrack may provide something in the meantime while waiting for a PR; with the proper scan rules set plus JavaScript turned off.

2 Likes

I think the main use case of this feature, namely “going offline with Discourse”, is best solved by implementing ServiceWorkers and other new standard APIs for enabling offline functionality.

Another alternative path you could go down would be to commission someone to do a client-side browser plugin for this.

3 Likes

Just an FYI - If you add support for this, make sure that “code blocks” (or whatever you call them) don’t end up getting truncated. Since they’re usually displayed in a box with scroll bars, this is certainly a possibility.

Thanks.

PS: I actually love Discourse, as a user. I’m a typical user in that I bellyache for something I don’t have, but I’d like to be atypical and say “Thanks”. Good stuff.

6 Likes

This might be too much of a workaround, but if you have access to a WordPress site, you can use this plugin to save a Discourse topic as a WordPress post:

And then use a plugin like this to convert the post to a PDF:

This will work for reasonably long topics, but not infinitely long. The ‘Discourse Topic Archive’ plugin needs better error handling. The ‘pdf-print’ plugin can be quite slow for long topics.

Here’s a PDF of the ‘What is the most awesome plugin for Discourse, that does not yet exits?’ topic that was created by this method.

awesome-plugins.pdf (1.1 MB)

6 Likes

Thanks.

Actually, what ended up working pretty well was to turn off JavaScript, select and copy the text on a page, paste it into TextEdit (I’m on a Mac), go to the next page, and repeat until done.

Surprisingly, TextEdit handled the pasted text pretty darn well - can’t say the same for Word.

So it wasn’t too bad.

4 Likes

I don’t know how long this will last (due to future coding changes)…
… totally not recommended …
… and you follow these instructions at your own risk - ie. there is “no support” …
… and your browser may crash doing this …
… but for anybody desperate enough to need to do it.

There is a global variable that influences “cloaking”, which is what hides or removes content when it scrolls out of view, to save memory and to allow Discourse to behave as well as it does on low-powered devices whilst scrolling.

That variable is window.inTestEnv.

However it needs to be set before the main Discourse code loads.

You can use Chrome Dev Tools to do this:

  1. Go to the topic page you want to print in the browser
  2. Scroll to the top
  3. Press F12 (this opens Chrome Dev Tools)
  4. Press F5 (refreshes the page)
  5. Select the “Sources” tab
  6. Find the source code to the page you are currently on in the left hand nav, double click it to open the actual source code.
  7. Scroll down until you find the first <script> HTML tag in the source
  8. Add a break point in by clicking the line number of the first line of script.
  • You should see a blue arrow, like the one on line 43 pictured above (it might not be line 43).
  9. Press F5 (refreshes the page), this time stopping the JavaScript execution on the break point.
  10. Press Esc, this toggles the display of the “Console”, if you don’t see it - press it again and it should appear.
  11. In the Console type window.inTestEnv = true and press Enter
  12. Press F8 - this resumes the JavaScript execution and the page loading.
  13. Press F12 (this closes Chrome Dev Tools)
  14. Slowly and repeatedly press Page Dn until you reach the bottom of the topic.
  15. Press Ctrl+P (this opens the print dialog).
  16. Select “Save as PDF” Destination and press “Save”.
  17. Select the location to save the file, press “Save”.

Note that you may have to repeat the sequence 15 through 17 quickly / repeatedly until the file actually saves - as the print dialog closes / crashes when there is an issue. I found doing 15 through 17 faster and changing the destination away from a printer to PDF got me there in the end.

I test printed the “1000 replies” topic on “try” and it “works for me” to prove it works for long topics (outputs a 206 page A4 document).

Please don’t try this here on “meta”, only on your own personal instances.


By the way, my opinion is that time should be put into Service Worker / offline support - not printing.

7 Likes

Perhaps the word “printing” should be removed from the topic title.
AFAIK, no one really wants to print to paper.

However, there are at least two very good use cases to have an option to display the entire topic:

  1. Create a PDF from the full topic, that can be used to:
  • Read and annotate offline (like on a plane).
  • Provide as a reference for management or your team.
  2. Clip to an Evernote Note
  • Same uses as #1.
  • Have as a technical reference that can be tagged, linked to, and annotated within Evernote.
  • Be available for searching along with other Evernote Notes.
  3. Clip to other knowledge bases, like DevonThink.

I don’t see having this “full topic display” feature would have any significant effect on the normal usage of a Discourse forum.

Thanks for considering this feature request.

11 Likes

I would like to bring this issue back up.

Recently started a paid business hosting plan and it is a big deal for a lot of my users to be able to print a full topic.

There are a couple reasons for this. But it is mainly for posterity.

My organization is heavily focused on research and the various members want to print and save a thread in their personal archives or with their own research.

Secondly, my demographic is often older than on a lot of sites. Telling them to turn off JavaScript in their browser so they can print a thread would be such a foreign concept to many of them that I would sound like an idiot even suggesting it. Many of these people also print it so they can read it on paper, and not at their computer.

It is something that is very important to the users on my site that I need a solution for.

In my opinion, the best solution for me would be a button that says [Print View] on each and every thread, and opens up a new tab without the infinite scrolling and with the pagination turned on - similar to how it would look if I load the thread with JavaScript disabled.

7 Likes

Sure – open to that. We’d need some kind of minimum viable design to start, and decide if it is core or plugin.

6 Likes

I would also be super excited to see this feature. Sitting here in Tanzania with members of my community it is striking how helpful this would be.

Perhaps a good spot for it would be on the link popup, as a printer or (I think better) download glyph. When selected from the post menu it would just download the one post in a single PDF. From the topic menu it would download the whole topic in a single PDF.

2 Likes

I’m taking a shot at this, in a lightweight approach.

6 Likes

So I have a basic version of this in my fork:

  • It renders the no-js crawler version.

  • It renders a big html with 1000 posts per page

  • It’s rate limited to 10 prints per user per hour

  • You can use your browser, OS, or a virtual printer to convert to PDF (1000 test posts is 277 pages, 6.1MB)
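To make the rate limit in the list above concrete, here is a minimal sliding-window sketch. It is not the code from the fork: the class name `PrintRateLimiter`, the in-memory `Map`, and the explicit `now` argument are all assumptions chosen to keep the example self-contained.

```typescript
// Illustrative sliding-window limiter: at most `limit` prints per user
// within `windowMs` milliseconds. Not the actual Discourse implementation.
class PrintRateLimiter {
  private history: Map<string, number[]> = new Map();
  private limit: number;
  private windowMs: number;

  // Defaults mirror the "10 prints per user per hour" figure from the post.
  constructor(limit: number = 10, windowMs: number = 60 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the print if the user is under the limit
  // at time `now` (milliseconds); returns false otherwise.
  tryPrint(userId: string, now: number): boolean {
    const previous = this.history.get(userId) ?? [];
    // Keep only the timestamps that still fall inside the window.
    const recent = previous.filter((t) => now - t < this.windowMs);
    const allowed = recent.length < this.limit;
    if (allowed) {
      recent.push(now);
    }
    this.history.set(userId, recent);
    return allowed;
  }
}
```

A caller would check `limiter.tryPrint(userId, Date.now())` before rendering the large HTML page. A real deployment would presumably keep the counters in shared storage rather than in process memory.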

I will probably add some CSS now (emojis are huge), anything else I forgot?

15 Likes

Awesome that you are working on this and have come so far. Thanks so much! :sunflower:

If the output is a 6.1mb pdf, maybe a better approach would be to process it in the background and deliver it via PM when finished?

1 Like

I think he means using Print > Save as PDF in your browser, or a virtual printer that prints to PDF on your computer. It isn’t being converted to PDF on the server. If that is the case, moving it to a scheduled job won’t solve anything, but I could have misread that.

6 Likes

Exactly @cpradio. Discourse is just generating html.

4 Likes

It’s a start. I’ve done the must have parts, but there’s probably something I forgot.

14 Likes

So has this been merged yet?

A subtlety in dealing with whispers exists here, I think…

1 Like

I’m fixing my mistakes in the PR with help from @tgxworld and @zogstrip. Also we are discussing enabling by default with a small limit.

6 Likes

Anything up with this? I have an academic user who needs to export entire discussions for offline analysis.

1 Like

We weren’t able to come up with anything we liked here, so we kinda put it on the back burner.

My solution is going to be a client-side parser that’ll take a topic URL and spit out something vaguely resembling an Mbox file.

First version is a go on Meta :tada:

  1. Go to a long topic: The State of JavaScript on Android in 2015 is... poor

  2. Press CTRL+P

  3. :printer:

17 Likes

The quote styling is a bit funky :slight_smile:

5 Likes

Looks great!

I’ve noticed on Firefox that I sometimes get the browser-native print dialog popping up immediately, then your new popup window once I close the print dialog, and then the desired print dialog. I also noticed that the popup blocker kicked in. So is the keypress handler perhaps not preventing propagation of the event, and is it perhaps opening the popup after an async call, rather than as a direct result of the keypress?

Exactly. I couldn’t find a way to properly stop propagation on the print event. (it doesn’t look possible)

1 Like

I don’t know if it’s possible on the print event, but it’s possible on the key combo. (I didn’t know there was a print event!) Note that you can’t use keypress – that only works in Firefox, as the other browsers seem to handle the default print in keydown.

Here’s the super-simple approach:

https://jsfiddle.net/0kch2s63/2/

It works correctly in both Firefox and Chrome. Unfortunately in Edge you still get the printing popup – but the event is handled appropriately. Not sure about Safari – I don’t have a Mac on me at present. :slight_smile:
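For readers who cannot open the fiddle, the general idea can be sketched roughly as follows. This is not the fiddle’s code: the names `KeyLike`, `isPrintShortcut`, and `handlePrintKeydown` are made up for illustration, and the event shape is declared locally so the snippet type-checks without DOM typings.

```typescript
// Minimal shape of the keyboard event fields used here (an assumption,
// mirroring the standard KeyboardEvent properties of the same names).
interface KeyLike {
  key: string;
  ctrlKey: boolean;
  metaKey: boolean;
  preventDefault(): void;
}

// True for Ctrl+P, or Cmd+P on macOS.
function isPrintShortcut(e: KeyLike): boolean {
  return (e.ctrlKey || e.metaKey) && e.key.toLowerCase() === "p";
}

// Hypothetical handler: cancel the default action and open a custom
// print view instead. Returns whether the event was handled.
function handlePrintKeydown(e: KeyLike, openPrintView: () => void): boolean {
  if (!isPrintShortcut(e)) {
    return false;
  }
  e.preventDefault(); // asks the browser to skip its own print dialog
  openPrintView(); // called synchronously, directly within the key handler
  return true;
}

// In a browser this would be wired to keydown (not keypress), e.g.:
// document.addEventListener("keydown", (e) => handlePrintKeydown(e, openView));
```

Calling `openPrintView` synchronously inside the handler, rather than after an async call, is meant to address the popup-blocker concern raised earlier; whether every browser honours `preventDefault` for this shortcut is not guaranteed, as the Edge report above suggests.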

3 Likes

This was indeed happening in earlier versions. Since this line got added, it doesn’t happen anymore for me on the latest Firefox on Ubuntu Linux.

3 Likes

I tried an approach where we could re-use the CSS we have:

Not sure if it’s the best way, but this way we don’t bloat the crawler version for bots and we keep the styles DRY.


One of the pages of our longest topic (157 pages long) on meta with quotes and oneboxes:

7 Likes

@falco can you summarize the final result here?

3 Likes

As requested, there is no UI for printing.

So any user can start to print by pressing CTRL+P when on a long topic. (This shortcut is shown in our keyboard help.)

For example, going to The State of JavaScript on Android in 2015 is... poor and pressing CTRL+P will open a popup with a print dialog asking to print 64 pages.

13 Likes

Nice work! I love this!

I don’t suppose there is a way to exclude certain (or all) html customizations from the javascriptless or print view, is there? Right now it shows up as a bulleted table of contents because it’s not including the accompanying css, which we don’t want.

Example:

Sure you can! Or at least in theory. I haven’t had a chance to put it in practice yet though.

@media print {
  .menu-primary-container { display: none }
}

You can also hide such things from the crawler using

body.crawler {
  .menu-primary-container { display: none }
}

Edit: and by crawler, I mean the crawler view the print window seems to utilize. Google will still see your links as I don’t think it parses CSS to know what content it should index (I could be wrong on that though).

6 Likes

(This is off topic)
I think Google does do this. I remember some posts talking about how they detected color: #fffffe being used for keyword stuffing, as the text was too faint to see.

3 Likes

There is actually a slight inconsistency in the crawler.html.erb as it includes theme html, but doesn’t include theme css. (see here; compare with application.html.erb which includes everything in discourse_stylesheet.html.erb).

This means any custom html added by a theme will appear in the print (or other crawler-based views), but cannot be styled, or removed via styling.

I’ve made a pr to add theme styles to the crawler view.

cc @Falco

6 Likes

I wonder if what we want here is a specific CSS theme piece for the crawler view. I’m uneasy about just adding the blanket app CSS here, because people need to think about this if they want to customize CSS in the crawler view.

7 Likes

Yes, I was thinking the same thing.

Although, including the theme css is arguably better than current state, which just includes the un-styled theme html.

Not necessarily, because it could potentially cause visual breakage that nobody tests for.

1 Like

I do agree here.

This is certainly better!

I wonder, with our current structure, how hard is it to pull only the CSS that affects posts here @awesomerobot?

6 Likes

Also note that the HTML structure in crawler view is completely different to that of normal topic pages. I think that improving the no-js/crawler view to look a bit more like the normal rendering is a good idea, but it’s also kinda a prereq to doing that.

(They also don’t render small-actions at all, which is a much more important thing to fix.)

3 Likes

A theme with a dark background would probably fail, plus I’m not sure someone with a dark theme would want to always print it that way (likely harder to read on paper and eats through all your ink). So we’d want some logic/modification (maybe we strip out color and keep other styles?), which leads back to:

To me it seems like the easiest path here stylistically is to do something with our no-JS view which already seems light on printing and easy on reading.

8 Likes

Sorry, just to clarify, is your vote to add “crawler.scss” file as an optional theme asset? Or something else?

2 Likes

Ah I didn’t read back far enough, yeah that was a bit unclear — I was suggesting sticking with the crawler view for print instead of trying to apply theme styles automatically.

So, a crawler.scss file as an optional theme asset is a good idea for flexibility, but we could also improve the default crawler view and print stylesheet a bit.

5 Likes

Sorry to bother, but just to clarify, did you decide on a next step here?

I’ll withdraw my current PR, but I’m happy to help (if appropriate) with whatever the solution is to the issue with having custom html but no custom css in the crawler view.

6 Likes

Idea for archival purposes…

Could you please consider implementing /print also for

And then, when someone visits https://meta.discourse.org/print, show them links from there to https://meta.discourse.org/t/postgresql-12-update/151236/print?
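As a small sketch of that idea, the helper below derives print links from topic URLs by appending `/print`, following the URL pattern quoted above. The function names `printUrl` and `printIndex` are hypothetical, and the sketch assumes each input is a plain topic URL without a trailing post number.

```typescript
// Hypothetical helper: turn a topic URL into its print-view URL by
// appending "/print", tolerating a trailing slash on the input.
function printUrl(topicUrl: string): string {
  const trimmed = topicUrl.replace(/\/+$/, "");
  return `${trimmed}/print`;
}

// Hypothetical index: the list of print links an archival page could show.
function printIndex(topicUrls: string[]): string[] {
  return topicUrls.map(printUrl);
}
```

For example, `printUrl("https://meta.discourse.org/t/postgresql-12-update/151236")` yields the print link quoted in the post.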

That might help with archival? (related: A basic Discourse archival tool)

For example PostgreSQL 12 update looks great!