Searchable File Attachments

VNVJeep · September 20, 2016, 3:16pm

We need to search not only the forums, but the attachments as well. We are currently enjoying a 2-week preview of the product and liking it immensely, but we just found out that search did not return any results from the contents of the sample PDFs we uploaded.

It would be incredibly nice if Discourse could index and search PDFs, as well as other standard office-type or text formats.

Could you please add this to the upcoming feature list? Much appreciated!!

codinghorror · September 20, 2016, 11:35pm

Whoa there, do you mean attachment content? As in what the files contain? I am not sure I view that as within the scope of Discourse.

Search should match filenames, if your filenames are unique enough, because the filenames are part of the post body. But the contents of the files are not considered part of the Discourse posts…
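
To illustrate the distinction, here is a minimal sketch in Python. It is not Discourse's actual search code: it assumes the post body stores the attachment as a link whose text is the filename, and the link format, the sample strings, and the `matches` helper are all made up for the example.

```python
import re

# Hypothetical post body: the upload appears as a link whose text is the filename.
post_body = "Here is the report: [q3-budget-forecast.pdf](/uploads/abc123.pdf)"

# The file's contents live outside the post, so a body-only search never sees them.
file_contents = "Projected revenue for the third quarter"


def matches(query: str, text: str) -> bool:
    """Return True if every word in the query appears as a word in the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return all(term in words for term in re.findall(r"[a-z0-9]+", query.lower()))


print(matches("forecast", post_body))  # True: the word is part of the filename
print(matches("revenue", post_body))   # False: the word only exists inside the file
```

A descriptive filename is therefore the only part of an upload that a body-only search can hit.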

sam · September 21, 2016, 5:45am

It is definitely something I would be supportive of in a plugin if someone feels like building it.

Would be nice to add this level of extensibility to search.

VNVJeep · September 21, 2016, 7:48pm

Correct… being able to search content within the attachments. To me, this is one of those features that is almost becoming a standard nowadays… and our users are beginning to expect this kind of functionality. We run multiple organizations’ websites, and have been providing this kind of functionality for many years using the Microsoft Indexing Service. We have other sites that have switched to the Sitefinity CMS product, and this functionality was a must-have there as well. Gmail lets you do it as well across the attachments you have saved in your account. It’s a tremendously valuable feature for those who provide and upload a lot of content within file attachments.

Anyways, please let me know if you reconsider, or if you do hear of a plugin that would be capable of doing something like this, I’m definitely interested!

codinghorror · September 21, 2016, 8:04pm

It’d probably be a feature we only offer to enterprise hosted instances.

Falco · September 21, 2016, 8:10pm

This gem seems like a good candidate, with good compatibility across document types:

https://github.com/Erol/yomu/blob/master/README.md

However, running Java, adding a potentially very big column with search data, and creating the necessary plugin hooks in the search infrastructure is fairly involved.

VNVJeep · September 21, 2016, 8:25pm

Yeah… we’re not that big. Even the Standard hosting model is overkill for a group our size.

Mittineague · September 21, 2016, 8:26pm

A slightly different spec, but it might be easier and less resource-intensive to have a page that queries uploads and lists the files, if that would be a sufficient and fair compromise.

VNVJeep · September 21, 2016, 8:28pm

Or perhaps, add another field in the upload file dialog box that asks for a description, and allows you to dump some content into there that would be searchable?

riking · September 22, 2016, 7:10am

With that kind of code, I’d be very concerned about bugs in the file format parsing. Parsers for Office files are prone to bugs, including RCE on occasion.

codinghorror · November 14, 2016, 10:24pm

Duplicate of

?

newmember · May 3, 2022, 4:38am

I think the idea would be to use something like Apache Tika (https://tika.apache.org/) and then make the extracted metadata searchable in Discourse.
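
Roughly, and only as a sketch in Python rather than an actual Discourse plugin: text is pulled out of each upload and indexed under the post it belongs to, so a search hit on the file's contents leads back to the post. The `extract` callable stands in for whatever does the real extraction (a call out to Tika, for instance); `Post`, `plain_text_extract`, `build_index`, and `search` are hypothetical names used only for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set


@dataclass
class Post:
    id: int
    body: str
    # filename -> raw file bytes (hypothetical in-memory stand-in for uploads)
    attachments: Dict[str, bytes] = field(default_factory=dict)


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def plain_text_extract(filename: str, data: bytes) -> str:
    """Placeholder extractor: treats every file as UTF-8 text.
    A real setup would hand PDFs and office files to a proper extractor."""
    return data.decode("utf-8", errors="ignore")


def build_index(posts: Iterable[Post],
                extract: Callable[[str, bytes], str]) -> Dict[str, Set[int]]:
    """Map each term to the ids of posts whose body or attachments contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for post in posts:
        texts = [post.body]
        texts += [extract(name, data) for name, data in post.attachments.items()]
        for text in texts:
            for term in tokenize(text):
                index[term].add(post.id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return ids of posts that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


posts = [
    Post(1, "Meeting notes attached", {"notes.txt": b"Budget approved for the new server"}),
    Post(2, "Anyone seen the budget?"),
]
index = build_index(posts, plain_text_extract)
print(sorted(search(index, "budget")))  # [1, 2]: post 1 matches via its attachment
print(sorted(search(index, "server")))  # [1]: the term only appears inside the file
```

Keeping extraction behind a single callable means the risky file parsing can live in a separate process or service, with the search side only ever seeing plain text.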

sam · May 3, 2022, 5:00am

Going to close this as a dupe of: Index File Contents for Search

Very supportive of someone experimenting in a plugin, but we have no concrete plans on our side to integrate with a Tika server, etc.

