Oneboxing of PDFs and other attachments

tobiaseigen · January 18, 2017, 4:20pm

Continuing the discussion from Custom visualization for specific attachment types:

Putting this out there as a feature request. I’d love to see the ability to onebox PDFs and other attachments, along the lines of Google Docs oneboxing. Or perhaps even simply a file attachment appearance like you get with the file upload, ideally also showing the file type and size.

Right now, a PDF URL on its own line is displayed as a raw URL. Very 1990s. As my millennial colleague told me recently, “who needs to know what http is in this day and age?”

pfaffman · January 18, 2017, 4:40pm

Kids these days. I’m so sure.

codinghorror · January 18, 2017, 11:32pm

Sure, oneboxing of PDFs is a reasonable idea. @techapj, can you add it to your list? At minimum, try to get the title of the document and a text summary. I would not worry about a thumbnail, as that will be considerably harder; just use a generic (but pretty) PDF icon like we do for Google Docs.

techAPJ · February 6, 2017, 7:30pm

Okay, we now support PDF oneboxes using PDF metadata.

Oneboxing works best when the PDF's metadata is complete, i.e., when it contains “Title”, “Subject”, and “Author”.
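The idea of using whichever of “Title”, “Subject”, and “Author” is present can be sketched in Ruby as below. This is a hypothetical helper (`pdf_onebox_fields`) written for illustration, not the onebox gem's actual code; it assumes the metadata has already been read into a plain Hash, and falling back to the filename when there is no title is an assumption of this sketch.

```ruby
require "uri"

# Hypothetical helper, for illustration only: builds onebox title/description
# from whatever PDF metadata is available.
def pdf_onebox_fields(metadata, url)
  # Assumption: with no "Title", fall back to the filename from the URL path.
  filename = File.basename(URI(url).path)

  title   = metadata["Title"].to_s.strip
  subject = metadata["Subject"].to_s.strip
  author  = metadata["Author"].to_s.strip

  # Only include the parts that are actually present.
  parts = []
  parts << subject unless subject.empty?
  parts << "by #{author}" unless author.empty?

  {
    title: title.empty? ? filename : title,
    description: parts.join(" ")
  }
end
```

With only “Title” and “Author”, the description reduces to the author line; with no metadata at all, only the filename remains.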

Demo:

PDF with complete metadata: [screenshot]

PDF with only “Title” and “Author” as metadata: [screenshot]

PDF with no metadata: [screenshot]

tobiaseigen · February 7, 2017, 4:00pm

Thanks for this - super exciting to see PDF oneboxing.

However, I’m having a bit of trouble with it: my PDFs do not appear to be oneboxed, even here on meta. Here’s an example:

techAPJ · February 8, 2017, 7:34am

I had to revert part of this change, specifically the extraction of information from PDF metadata.

The PDF metadata was being read with the pdf-reader gem, which pulled in many dependencies of its own. I have now removed onebox's dependency on the pdf-reader gem.

Now the onebox simply shows the PDF's filename and file size. This significantly reduces the time needed to onebox: instead of fetching the whole file and loading it into memory, we now make a HEAD request to read the “Content-Length” header for the file size, and take the filename from the URL.
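A minimal sketch of the HEAD-request approach, using Ruby's standard `net/http` and `uri` libraries. The method names (`pdf_filename`, `pdf_content_length`, `human_size`) are hypothetical and this is not the onebox gem's implementation; it assumes the server reports `Content-Length` in response to HEAD and does not follow redirects.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: the display filename comes straight from the URL path,
# so nothing has to be downloaded.
def pdf_filename(url)
  File.basename(URI(url).path)
end

# Hypothetical helper: ask for headers only. Content-Length gives the size
# without transferring the file body. Returns nil if the header is missing.
# Note: redirects are not followed in this sketch.
def pdf_content_length(url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    response = http.head(uri.request_uri)
    response["Content-Length"]&.to_i
  end
end

# Format a byte count for display, e.g. 1536 -> "1.5 KB".
def human_size(bytes)
  units = %w[B KB MB GB]
  size = bytes.to_f
  unit = 0
  while size >= 1024 && unit < units.length - 1
    size /= 1024
    unit += 1
  end
  unit.zero? ? "#{bytes} B" : format("%.1f %s", size, units[unit])
end
```

Only `pdf_content_length` touches the network; the filename and size formatting are pure string work, which is why this is so much cheaper than loading the whole PDF into memory.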

Here is a demo of the new PDF onebox:

[screenshot]

Regarding your example, I looked into it locally. It failed because the PDF title could not be forced into UTF-8 encoding. The new onebox fixes this issue:

https://namati.org/wp-content/uploads/2017/01/4.Evidence_Land-Rights_-Myanmar-2017-Final.pdf
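For a metadata-based onebox (such as the reverted version, or a plugin), one way to avoid this kind of failure is to coerce the title into valid UTF-8 instead of forcing it. The sketch below is a hypothetical helper (`utf8_title`), not code from onebox or pdf-reader; it assumes a title is either UTF-16BE with a byte-order mark or otherwise intended as UTF-8. Full PDFDocEncoding support would need a complete mapping table and is deliberately left out.

```ruby
# Hypothetical helper: turn raw metadata bytes into a valid UTF-8 string
# instead of raising an encoding error.
def utf8_title(raw)
  bytes = raw.b # binary copy, so the caller's string is not mutated

  if bytes.start_with?("\xFE\xFF".b)
    # UTF-16BE with a byte-order mark: drop the BOM and transcode,
    # replacing any invalid or unmappable sequences.
    bytes.byteslice(2..).force_encoding("UTF-16BE")
         .encode("UTF-8", invalid: :replace, undef: :replace)
  else
    # Assumption: otherwise treat it as UTF-8 and replace invalid bytes with "?".
    bytes.force_encoding("UTF-8").scrub("?")
  end
end
```

Replacing bad bytes means a garbled title degrades to a slightly wrong title rather than a failed onebox.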

tobiaseigen · February 8, 2017, 2:08pm

Fabulous. Confirmed working - thanks!

rriemann · February 9, 2017, 5:10pm

What about PDFs that arrive as email attachments or have simply been uploaded as attachments? They are already on the server. Wouldn’t that make analyzing the metadata easier?

Falco · February 9, 2017, 5:14pm

You still need a library to read this metadata, and you still have to load the file into memory, which is more expensive.

rriemann · February 9, 2017, 5:28pm

I really thought that the metadata is stored at the very beginning of the file, so that only a stub would need to be loaded.

techAPJ · February 9, 2017, 5:34pm

That may be correct (I am not sure), but to read the PDF file in Ruby we would have to depend on the pdf-reader gem. That means an additional library and more memory.

schungx · September 17, 2017, 8:40am

Is this onebox information cached, or does it have to be reprocessed every time the URL is shown?

If it is cached, then I don’t see why reading the file once to extract the metadata would be a waste of resources.

fefrei · September 17, 2017, 8:47am

It’s cached: this information is baked into the HTML version of the post.

riking · September 18, 2017, 4:41pm

We also have to consider the resources of install time and disk space - adding a whole bundle of other gems isn’t really helpful on that front.

tophee · September 20, 2017, 7:59am

This is a pity. Having the metadata displayed would be extremely useful, especially in an academic context, where a lot of PDFs are shared. I understand that this may not be the right setting to enable by default, because it potentially uses a lot of resources, but is there a chance of bringing it back as a site setting? Or at least for locally uploaded PDFs, i.e., where the PDF doesn’t need to be downloaded?

sam · September 20, 2017, 3:39pm

I would prefer this to, at least initially, be a plugin.

I don’t want to worry about another gem dependency, I don’t want to worry about it potentially causing memory bloat on our job processor. Putting it in a plugin a 3rd party maintains would shield us from this and allow you to nut out all the intricacies and edge cases with bad metadata that is floating around there in random PDFs.

tobiaseigen · September 20, 2017, 3:46pm

FWIW and after rereading my OP above, I think my need is met by the current functionality. Providing extra info about the PDF contents is a “nice to have” not a requirement.

terraboss · September 20, 2017, 3:58pm

I don’t like this at all.

In my use case, PDF attachments are linked directly in the text. Oneboxes cost space, and the real-world benefit is extremely low. As I suggested about a year ago, I would prefer an HTML5-based PDF viewer and better support for searching inside these PDF documents with the Discourse search. It might also be nice to automatically insert a PDF icon right before the linked file name. That signals clearly enough that a PDF file is linked at this location.

tobiaseigen · September 20, 2017, 4:02pm

I’d be happy with this too, and suggested it in the OP.

But really my need here is already met, and I wouldn’t want to see the Discourse team devote much more time to making PDFs more presentable in discussions. Bikeshedding and all that. But I suppose this change could be pr-welcome.

pfaffman · September 20, 2017, 4:03pm

That sounds like an awesome idea for a plugin! I’d guess two or three days of work for a programmer familiar with Discourse (which means, not me!), given that some HTML5-based viewer already exists.

Related topics

Topic	Category	Replies	Views	Activity
OneBox common document formats	Feature	3	1247	September 18, 2017
PDF preview instead of download, on uploaded files	Support	4	1337	June 19, 2020
Format rendering of links to (local) pdf files?	Support	4	1122	July 14, 2020
Browser-based PDF Viewer with search and highlighting capabilities	Feature	9	2498	December 15, 2020
Inline PDF Previews	Theme component (official, desktop, pdf-previews)	134	12505	January 28, 2025