Putting this out there as a feature request. I’d love to see the ability to onebox PDFs and other attachments along the lines of Google Docs oneboxing. Or perhaps simply the same appearance you get when uploading a file as an attachment, ideally with the file type and size shown.
Right now, putting a PDF URL on its own line presents it as a raw URL. Very 1990s. As my millennial colleague told me recently, “who needs to know what http is in this day and age?”
Sure, oneboxing of PDFs is a reasonable idea. @techapj, can you add it to your list? At minimum, try to get the title of the document and a text summary. I would not worry about a thumbnail, as that will be considerably harder; just use a generic (but pretty) PDF icon like we do for Google Docs.
I had to revert this change, specifically getting information from “PDF metadata”.
The PDF metadata was being fetched using the pdf-reader gem, which introduced lots of its own dependencies. I have now removed the pdf-reader dependency from onebox.
Now the onebox simply shows the PDF filename and file size. This significantly reduces the time required to onebox: instead of fetching the whole file and loading it into memory, we now make a HEAD request to read the “Content-Length” header for the file size, and take the filename from the URL.
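For readers curious how little work this approach needs, here is a minimal Python sketch of the idea: a HEAD request for “Content-Length”, plus the last URL path segment as the filename. The real onebox is a Ruby gem; every function name below is hypothetical and only illustrates the technique. It also assumes the server answers HEAD requests and sends a Content-Length header, which not every server does.

```python
import posixpath
import urllib.request
from typing import Optional
from urllib.parse import unquote, urlparse


def pdf_filename(url: str) -> str:
    """Hypothetical helper: use the last URL path segment as the display name."""
    path = urlparse(url).path  # drops any ?query or #fragment
    return unquote(posixpath.basename(path))


def human_size(num_bytes: int) -> str:
    """Hypothetical helper: format a byte count as B / KB / MB / GB."""
    units = ["B", "KB", "MB", "GB"]
    size = float(num_bytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{int(size)} B" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GB"  # not reached; keeps the return type explicit


def fetch_content_length(url: str, timeout: float = 10.0) -> Optional[int]:
    """Send a HEAD request so only headers are transferred, never the PDF body."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        length = resp.headers.get("Content-Length")
    return int(length) if length is not None else None


def format_onebox(url: str, content_length: Optional[int]) -> str:
    """Build the onebox label; fall back to just the filename if size is unknown."""
    name = pdf_filename(url)
    if content_length is None:
        return name
    return f"{name} ({human_size(content_length)})"


def pdf_onebox_label(url: str) -> str:
    """Assumed end-to-end flow: one HEAD request, no PDF parsing at all."""
    return format_onebox(url, fetch_content_length(url))
```

For example, `format_onebox("https://example.com/a/paper.pdf", 2048)` returns `"paper.pdf (2.0 KB)"`. Because nothing is downloaded or parsed, malformed PDF metadata, such as non-UTF-8 titles, can never break the onebox.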
Here is a demo of the new PDF onebox:
I looked into this locally. It was caused by the PDF title not being convertible to UTF-8 encoding. The new onebox fixes this issue:
What about PDFs that are mail attachments or have simply been uploaded as attachments? They are already on the server. Wouldn’t that make analyzing the metadata easier?
That may be correct (I am not sure), but to read the PDF file in Ruby we would have to depend on the pdf-reader gem. That means an additional library and more memory.
This is a pity. Having the metadata displayed would be extremely useful, especially in an academic context, where a lot of PDFs are shared. I understand that this may not be the right setting to enable by default because it potentially uses a lot of resources, but is there a chance of bringing it back as a site setting? Or perhaps at least for locally uploaded PDFs, i.e. where the PDF doesn’t need to be downloaded?
I would prefer this to, at least initially, be a plugin.
I don’t want to worry about another gem dependency, and I don’t want to worry about it potentially causing memory bloat on our job processor. Putting it in a plugin that a 3rd party maintains would shield us from this and allow you to nut out all the intricacies and edge cases with the bad metadata floating around in random PDFs.
FWIW, after rereading my OP above, I think my need is met by the current functionality. Providing extra info about the PDF contents is a “nice to have”, not a requirement.
In my use case, PDF attachments are embedded directly in the text. Oneboxes cost space, and the real-world benefit is extremely low. As I suggested about a year ago, I would prefer an HTML5-based PDF viewer and better support for searching inside these PDF documents with Discourse search. Maybe it would be nice to automatically insert a PDF icon right before the linked file name. That signals clearly enough that a PDF file is placed at this location.
I’d be happy with this too, and suggested it in the OP.
But really, my need here is already met, and I wouldn’t want to see the Discourse team devoting much more time to making PDFs more presentable in discussions. Bike shedding and all that. But I suppose this change could be pr-welcome.
That sounds like an awesome idea for a plugin! I’d guess two or three days of work for a programmer familiar with Discourse (which means, not me!), given that HTML5-based viewers already exist.