If you would like to chat about your experience, feel free to schedule a chat over a call. We appreciate any feedback!
Features
Fast search
Ability to search topics, posts, chat messages and users
Post and Topic results include PMs
Chat message results include private channels and DMs
UI-based filters for things like tags, categories, users, inboxes, channels, etc.
Keyword, Semantic, Hybrid and HyDE search modes
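Hybrid mode has to merge a keyword ranking and a semantic (vector) ranking into one list. One common technique for this is Reciprocal Rank Fusion (RRF). The sketch below shows RRF as an illustration only. Typesense applies its own rank-fusion formula for hybrid search, so this is not necessarily what the experiment does. The document IDs are made up.

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1. k=60 is the constant commonly used for RRF.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

keyword = ["t1", "t2", "t3"]   # hypothetical keyword-mode ranking
semantic = ["t3", "t1", "t4"]  # hypothetical semantic-mode ranking
print(reciprocal_rank_fusion([keyword, semantic]))  # ['t1', 't3', 't2', 't4']
```

Documents that rank well in both lists ("t1", "t3") rise to the top, and documents found by only one mode still appear. That is the appeal of defaulting to a hybrid mode.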
FAQ
Search stops working after a while on the page
Indeed it does; please refresh.
It doesn’t support our search grammar, like @user or #category
Indeed it doesn’t, but it’s something that can easily be added if we decide to ship this.
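The sketch below shows what adding part of that grammar could look like. It pulls `@user` and `#category` tokens out of the raw query and turns them into structured filters, so the remaining text can go to the search engine as usual. The filter keys are hypothetical. Real Discourse search also resolves `#tag` and many `key:value` forms, and this sketch handles neither.

```python
import re

# "@name" or "#slug" at the start of the query or after whitespace, so e-mail
# addresses like "a@b.com" are left alone. Names are limited to word
# characters and hyphens in this sketch.
TOKEN_RE = re.compile(r"(?<!\S)([@#])([\w-]+)")

def parse_query(query):
    """Split a query into free text plus hypothetical user/category filters."""
    filters = {"users": [], "categories": []}

    def take(match):
        prefix, name = match.group(1), match.group(2)
        filters["users" if prefix == "@" else "categories"].append(name)
        return ""  # drop the token from the free-text part

    text = TOKEN_RE.sub(take, query)
    # Collapse the whitespace left behind by removed tokens.
    return " ".join(text.split()), filters

print(parse_query("restore backup @sam #support cli"))
# ('restore backup cli', {'users': ['sam'], 'categories': ['support']})
```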
Topic and Post search targets being distinct is a weird choice
I can see why, especially if you are used to how Discourse search has worked for the last decade. If we decide to ship this, we could build a mode that does both at the same time, or simply run both and show both in the UI. Within the constraints of this experiment, this was the easiest way to address both use cases:
I know that this topic exists and just want to find it (Topic search)
I want to research any occurrence of this query (Post search)
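A mode that "does both" could collapse post-level hits into topic-level results. Each topic then keeps its best-matching post and a count of matches. The sketch below does this client-side over hypothetical hit dictionaries. Typesense also offers a `group_by` search parameter that could do similar grouping server-side. This is an illustration, not the experiment's code.

```python
def group_posts_by_topic(post_hits):
    """Collapse post hits into one result per topic.

    post_hits: iterable of dicts with hypothetical keys "topic_id",
    "post_id" and "score" (higher is better). Each topic keeps its
    best-scoring post and the number of posts that matched.
    """
    topics = {}
    for hit in post_hits:
        entry = topics.get(hit["topic_id"])
        if entry is None:
            topics[hit["topic_id"]] = {
                "topic_id": hit["topic_id"],
                "best_post_id": hit["post_id"],
                "score": hit["score"],
                "matches": 1,
            }
            continue
        entry["matches"] += 1
        if hit["score"] > entry["score"]:
            entry["best_post_id"] = hit["post_id"]
            entry["score"] = hit["score"]
    # Topic ranking follows the score of each topic's best post.
    return sorted(topics.values(), key=lambda e: -e["score"])
```

The topic view ("I know this topic exists") and the post view ("show me every occurrence") can then come from a single query.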
Results quality isn’t quite there
We have barely scratched the surface of what’s possible here; at the moment, we only prioritize categories and assign weights to title and body. This would need further tweaks to match the refinement of the existing search, but it also opens up possibilities to go further. Unfortunately, a lot is controlled via the JS API, and the library we are using hamstrung us quite a bit here.
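Title and body weighting can be expressed as search parameters instead of frontend logic. Typesense's `query_by` and `query_by_weights` search parameters are meant for this. The sketch below builds such a parameter set. The field names and default weights are hypothetical, since the experiment's schema and actual weights are not shown. The category prioritization mentioned above is not modeled here.

```python
def build_search_params(query, title_weight=2, body_weight=1, per_page=20):
    """Build Typesense-style search parameters that favor title matches.

    "title" and "body" are hypothetical field names; the weights are
    illustrative and are listed in the same order as the query_by fields.
    """
    return {
        "q": query,
        "query_by": "title,body",
        "query_by_weights": f"{title_weight},{body_weight}",
        "per_page": per_page,
    }

print(build_search_params("restoring backups"))
```

Keeping these knobs server-side would let them be tuned without depending on the JS library's API.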
Semantic / HyDE / Hybrid are slow
We added a larger debounce on those modes to work around some annoyances in the JS library we are using. If we decide to ship this, that JS library is the first on the chopping block. As for the overall speed of those modes, they depend on two requests, the first of which computes embeddings on ancient hardware on AWS, and that doesn’t help. We could also inject embeddings at the middleware proxy to cut down the latency. Again, experiment time constraints.
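A larger debounce trades responsiveness for fewer requests. This matters more when each search needs an embedding request first. The sketch below simulates a trailing-edge debounce over keystroke timestamps. The per-mode delays are hypothetical, because the experiment's real values are not stated.

```python
# Hypothetical per-mode debounce delays in milliseconds.
DEBOUNCE_MS = {"keyword": 100, "semantic": 400, "hybrid": 400, "hyde": 400}

def debounced_fire_times(keystroke_times_ms, delay_ms):
    """Return the times at which a trailing-edge debounced search would fire.

    A request fires delay_ms after a keystroke only if no further keystroke
    arrives before that timer expires.
    """
    times = sorted(keystroke_times_ms)
    fires = []
    for current, nxt in zip(times, times[1:] + [None]):
        if nxt is None or nxt - current >= delay_ms:
            fires.append(current + delay_ms)
    return fires

typing = [0, 150, 300, 1000]  # hypothetical keystroke timestamps
print(debounced_fire_times(typing, DEBOUNCE_MS["keyword"]))   # [100, 250, 400, 1100]
print(debounced_fire_times(typing, DEBOUNCE_MS["semantic"]))  # [700, 1400]
```

With the longer delay the same typing produces two requests instead of four, but the first results arrive later. This is part of why these modes feel slow.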
Technical details
This experiment is using Typesense, an open-source alternative to Algolia. It’s running on an EC2 instance in the same place as everything else in the Meta hosting.
The front-end doesn’t request from Typesense directly; instead, all calls are proxied via the Discourse app, using a Rack middleware.
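The proxy pattern intercepts requests under a path prefix and forwards them to the search server, while all other requests go to the app as normal. The search server's credentials stay on the server and never reach the browser. The experiment uses a Ruby Rack middleware. The sketch below is a Python WSGI analog of the same idea, not the actual code. The mount path, the `forward` callable and the permission hook are hypothetical.

```python
PREFIX = "/instant-search/"  # hypothetical mount path for proxied calls

class SearchProxyMiddleware:
    """WSGI analog of a Rack middleware that proxies search requests.

    Requests under PREFIX go to `forward(path, query_string)`, a callable
    that talks to the search server and returns (status, headers, body).
    All other requests pass through to the wrapped app untouched.
    """

    def __init__(self, app, forward):
        self.app = app
        self.forward = forward

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if not path.startswith(PREFIX):
            return self.app(environ, start_response)
        # Hypothetical hook: permission scoping (e.g. only PMs the current
        # user may see) or embedding injection would happen before forwarding.
        status, headers, body = self.forward(
            path[len(PREFIX):], environ.get("QUERY_STRING", "")
        )
        start_response(status, headers)
        return [body]
```

Routing through the app also gives a single place to enforce visibility of PMs and private chat channels.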
The search bar, results, and refinements use InstantSearchJS wrapped in EmberJS. Unfortunately, this library caused a lot of trouble, and we won’t be using it if we ship this.
The server is using 7.35 GB of RAM to index all of Meta. Keep in mind that most of that is due to embeddings; without them, it would be under 2 GB.
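From the figures above, embeddings account for at least 7.35 − 2 = 5.35 GB. A rough lower bound on their raw cost is vectors × dimensions × bytes per value. The sketch below computes that bound. The vector count and dimensionality in the example are hypothetical and are not Meta's real numbers. Approximate-nearest-neighbour index structures add overhead on top of the raw vectors.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_value=4):
    """Lower bound on memory for dense embeddings (float32 by default).

    Ignores index overhead such as nearest-neighbour graph links.
    """
    return num_vectors * dims * bytes_per_value

# Hypothetical example: one million 768-dimensional float32 vectors.
print(raw_vector_bytes(1_000_000, 768))  # 3072000000 bytes, about 3.07 GB
```

Halving `bytes_per_value` (for example with quantized vectors) halves this bound, which is one lever if memory becomes a concern.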
I think if we were to continue with this plugin, it might make sense to default to hybrid, because people aren’t really familiar with switching search types like this.
Gave it a quick try and I find it promising. Over the past few days, I searched for the experimental admin bar topic and the topic about restoring backups from the command line. In the first case, it took a while to find it in the search results; in the second, I ended up searching my bookmarks. The new search brings up both and is a lot faster than the old one, so it’s definitely an improvement for me.
Interesting, can’t wait to explore it further! Looks promising so far!
Could we see something like this eventually replace the full search function entirely? Or will this tool serve as an addition to the current search tool in the top toolbar?
This experiment was made to research the feasibility of rebuilding our search from the ground up and the trade-offs of this new approach. While it’s too early to tell, if the new search experience is received well enough, it could be considered for integration into Discourse’s many search features, be it full-page search, inline header search, similar-topic search for Related Topics, user mentions, hashtag autocompletion, etc.
That’s easily doable with this technology, both on mobile and on desktop, much like Google search UI.
Modes only exist for the experiment so people can easily compare and test them. If this were to ship, modes would most likely become an admin option instead of something user-facing.
I also like the /filter experiment and the way it lays out all my filter options. So it seems there are a couple of directions being explored for finding content. Will these come into play together?
What would really simplify things for me is a common language in the end, so that as a user I would have a straightforward understanding of:
what’s a Search? What’s a Filter?
how do I recognize each on the interface, where should I expect to find each?
when am I presented a simple/common set of filters? How do I access/expand the full set?
If shipped, the backend for instant search would power /filter. It’s basically the same thing we are already doing in the Topics search in this experiment.
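Powering /filter this way mostly comes down to translating the UI filter selections (tags, categories, users, and so on) into the search engine's filter expression. The sketch below builds a Typesense-style `filter_by` string. The field names are hypothetical. The syntax (`field:=[a,b]` for "any of", `&&` to combine, backticks around string values) follows Typesense's filter documentation as I understand it, so verify it before relying on it.

```python
def _quote(value):
    # Numbers go in as-is; strings are backtick-quoted so commas or spaces
    # inside a value are not parsed as filter syntax.
    return str(value) if isinstance(value, int) else f"`{value}`"

def build_filter_by(tags=(), category_ids=(), usernames=()):
    """Translate UI filter selections into a Typesense-style filter_by string.

    tags, category_id and username are hypothetical schema field names.
    Each non-empty selection becomes an "any of" clause; clauses are ANDed.
    """
    clauses = []
    for field, values in (("tags", tags),
                          ("category_id", category_ids),
                          ("username", usernames)):
        if values:
            clauses.append(f"{field}:=[{','.join(_quote(v) for v in values)}]")
    return " && ".join(clauses)

print(build_filter_by(tags=["ux", "search"], category_ids=[5]))
# tags:=[`ux`,`search`] && category_id:=[5]
```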
If you don’t mind me asking, for context and so that I know exactly what I’m looking for when I’m testing:
What is wrong with the current search today that you’re looking to change or improve on?
What are we evaluating here besides speed?
If you decide to rebuild search from scratch, is there a chance Discourse search could cover other sources of content, e.g., docs not hosted inside Discourse?
I have a long list as well, but I wanted to make sure I came in with questions and understanding before I laid on the horn.
I’m really glad to see improvements being considered. FWIW, while it still has a way to go, we are constantly told how much our users love our search (aka Discourse search) compared to other search experiences they have, even with its quirks.
As a forcing function, my team is using Discourse exclusively to communicate (at least amongst our own team). This was the first request by one of my team members today:
This is something we considered heavily when experimenting with an off-the-shelf solution like Typesense. It would make it a lot easier to extend our search to external “documents”, by either allowing customers to inject documents into our database or allowing the front-end to consume from other instances that follow some guidelines to be compatible.
Good thing we got those two right off the bat with the experiment! Thanks for the feedback.