Index file contents for search

ahuling · August 7, 2015, 6:37 PM

Right now, file names are indexed for search but the contents are not. It would be nice if the contents of files were also indexed, at least for the most common text-based file types, e.g. txt, pdf, doc, xls, csv, etc.

Any plans for this?

codinghorror · August 7, 2015, 6:47 PM

No plans for this at the moment.

DDo · June 11, 2020, 7:06 AM

“Me, too”
We’d like to have attachments (in our case, PDFs) indexed for the search engine, too.

sam · June 11, 2020, 7:40 AM

This is very much an enterprise customer type feature. We don’t have concrete plans or a timeline here, and I am uncertain what would happen to Postgres with huge PDF documents.

Certainly something we have thought about over the years and may get to over the next few years.

Craig_Robben · March 17, 2022, 8:58 PM

Has Discourse added PDF indexing and search yet?

sam · March 17, 2022, 11:16 PM

Not yet, but it is very possible to build this in a plugin.

avandorp · June 29, 2022, 7:51 AM

Where would you start if you were to develop such a plugin? I am not familiar with the Discourse code at all, so I would probably try to hook into UploadCreator, but that might be totally wrong.

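mjbergman92 · October 16, 2024, 5:45 PM

Developing a Discourse plugin that integrates with Paperless would be a good starting point.

As mentioned earlier, such a plugin would be a lot of work, to say the least.

Would such a plugin need the Discourse API to allow external processing of documents? Is that currently available?

The plugin would also need to integrate with the search functionality that the Discourse API provides. This is not trivial, but several existing plugins have done it, such as the discourse/discourse-algolia plugin.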

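Other considerations

Backups

This is an area where I would personally like to gain experience. I started by looking into the Paperless API and reverse engineering the discourse/discourse-algolia project, though there are other projects that integrate with search as well.

Any thoughts on the choice of Paperless? I like how active the project is, the number of closed issues, and the number of open issues (currently 0).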

dennisjbr · September 23, 2025, 5:45 AM

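For forums that handle a lot of documents as PDFs, scans, or images, this would be a huge improvement. If this is limited to search, then ideally you would only need to extract or create the text and store it in a Postgres column. That way you can leverage Postgres full-text search as is.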

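For example, on Linux you can use pdftotext to extract the text from a PDF and store it in the DB. Another (more expensive) idea is to use AI vision to describe or extract the contents of a PDF or image and store that in the DB.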
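
To make the pdftotext idea concrete, here is a rough Python sketch, not a Discourse implementation. It assumes the Poppler `pdftotext` binary is installed and PostgreSQL 12+ (for generated columns). The `upload_texts` table, its column names, the `MAX_CHARS` cap, and the helper function names are all hypothetical and do not come from Discourse's schema or code. The SQL strings are meant to be run with any Postgres driver that uses `%s` placeholders.

```python
import subprocess

# Assumption: cap the stored text so huge PDFs do not bloat rows or run into
# PostgreSQL's tsvector size limit. The exact value is arbitrary.
MAX_CHARS = 500_000

# Hypothetical table; Discourse's real schema is not described in this thread.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS upload_texts (
    upload_id bigint PRIMARY KEY,
    body      text NOT NULL,
    body_tsv  tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED
);
CREATE INDEX IF NOT EXISTS upload_texts_tsv_idx
    ON upload_texts USING gin (body_tsv);
"""

# Store (or refresh) the extracted text for one upload.
UPSERT_SQL = """
INSERT INTO upload_texts (upload_id, body) VALUES (%s, %s)
ON CONFLICT (upload_id) DO UPDATE SET body = EXCLUDED.body;
"""

# Rank matching uploads with Postgres full-text search.
SEARCH_SQL = """
SELECT upload_id, ts_rank(body_tsv, query) AS rank
FROM upload_texts, plainto_tsquery('english', %s) AS query
WHERE body_tsv @@ query
ORDER BY rank DESC
LIMIT 20;
"""


def pdftotext_command(pdf_path):
    """Build the pdftotext call; a trailing '-' sends the text to stdout."""
    return ["pdftotext", "-enc", "UTF-8", str(pdf_path), "-"]


def clean_text(raw, max_chars=MAX_CHARS):
    """Drop NUL bytes (not allowed in Postgres text), collapse whitespace
    (including the form feeds between pages), and truncate."""
    text = " ".join(raw.replace("\x00", "").split())
    return text[:max_chars]


def extract_pdf_text(pdf_path, timeout=60):
    """Run pdftotext on one file and return cleaned text for the DB column."""
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        check=True,      # raise if pdftotext exits with an error
        timeout=timeout, # do not let one broken PDF block the worker
    )
    return clean_text(result.stdout.decode("utf-8", errors="replace"))
```

With a driver cursor `cur`, usage would look roughly like `cur.execute(UPSERT_SQL, (upload_id, extract_pdf_text(path)))` and then `cur.execute(SEARCH_SQL, ("search terms",))`. Scanned PDFs with no text layer would come back empty from pdftotext, so they would still need OCR or the AI vision route.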

I'd like to hear your thoughts.
