Index File Contents for Search

Right now, file names are indexed for search, but file contents are not. It would be nice if the contents were also indexed, at least for the most common text-based file types, e.g. txt, pdf, doc, xls, csv.

Any plans for this?

No plans for this at the moment.

“Me, too”
We’d like to have attachments (in our case, PDFs) indexed for the search engine, too.

This is very much an enterprise-customer type of feature. We don’t have concrete plans or a timeline here, and I am uncertain what would happen to Postgres with huge PDF documents.

Certainly something we have thought about over the years and may get to over the next few years.
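
To make the size concern concrete, here is a rough sketch of one way a plugin could bound what goes into the search index: only accept a small allow-list of plain-text extensions and cap the extracted text. This is not how Discourse does anything today. The function name, the allow-list, and the 100,000-character cap are all hypothetical, and PDF/doc/xls extraction is deliberately left out because it would need a real parser.

```python
import os

# Hypothetical cap on how much text per file is handed to the search index.
MAX_INDEXED_CHARS = 100_000

# Hypothetical allow-list: only formats that are already plain text.
INDEXABLE_EXTENSIONS = {".txt", ".csv"}


def extract_indexable_text(filename: str, data: bytes,
                           max_chars: int = MAX_INDEXED_CHARS) -> str:
    """Return a size-capped, whitespace-normalized string for indexing.

    Returns an empty string for file types that are not on the allow-list.
    """
    ext = os.path.splitext(filename)[1].lower()
    if ext not in INDEXABLE_EXTENSIONS:
        return ""

    # UTF-8 uses at most 4 bytes per character, so reading more than
    # max_chars * 4 bytes can never be needed to fill the cap.
    head = data[: max_chars * 4]

    # errors="replace" keeps going on invalid bytes (including a multi-byte
    # character cut in half by the slice above).
    text = head.decode("utf-8", errors="replace")

    # Collapse runs of whitespace so the cap is spent on actual content.
    text = " ".join(text.split())
    return text[:max_chars]
```

With a cap like this, a very large upload contributes at most `max_chars` characters to the index, at the cost of not matching terms that only appear later in the file.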

Curious whether Discourse has added the capability to index and search PDFs yet?

Not yet, though it is very feasible to build in a plugin.

When developing such a plugin, where would you start? Being totally new to the Discourse code, I’d probably try to hook into UploadCreator, but that might be very wrong.

Developing a Discourse plugin that integrates with Paperless would be a good start.

Such a plugin would be involved to say the least, as stated before.

A plugin like this would require that the Discourse API allows for external handling of documents. Is that currently available?

This plugin would require integration with the search capabilities offered by the Discourse API. While this is not trivial, it has been done by several existing plugins, notably discourse/discourse-algolia.

Other Areas to Consider

  • Backups

This would be something I would personally be interested in cutting my teeth on. I have started by looking at the Paperless API along with reverse engineering the discourse/discourse-algolia project… but there are others that integrate with search.
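
For anyone else poking at the Paperless side, here is a minimal sketch of building a full-text search request. It assumes the REST API shape documented for paperless-ngx (a `query` parameter on `/api/documents/` and `Authorization: Token …` authentication); I have not checked this against other Paperless versions, so treat the endpoint and header as assumptions. The function name and the example host are made up, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_paperless_search_request(base_url: str, token: str, query: str) -> Request:
    """Build (but do not send) a full-text search request.

    Assumes a paperless-ngx style API: GET /api/documents/?query=...
    with token authentication. Verify against your own instance.
    """
    # Normalize the base URL so we never end up with a double slash.
    url = base_url.rstrip("/") + "/api/documents/?" + urlencode({"query": query})
    headers = {
        "Authorization": f"Token {token}",  # assumed auth scheme
        "Accept": "application/json",
    }
    return Request(url, headers=headers)


if __name__ == "__main__":
    # paperless.example.com and the token are placeholders.
    req = build_paperless_search_request(
        "https://paperless.example.com", "YOUR_API_TOKEN", "annual report"
    )
    print(req.full_url)
```

A plugin would send something like this when a user searches, then map the returned document IDs back to the Discourse uploads or posts they came from. That mapping is the part that still needs designing.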

Any thoughts on the choice of Paperless? I like how active the project is and how many issues they have closed compared with the number still open (currently 0).
