Add Algolia search to your Discourse


(Josh Dzielak) #1

:mag_right: discourse-algolia :blue_heart:

I’ve recently created a plugin that indexes topics, posts, users and tags with Algolia and combines them into a multi-category autocomplete search. Here’s a GIF of how it works, and you can try it live right now on the Algolia Community Discourse.

You can find the GitHub repository, complete with installation instructions, at algolia/discourse-algolia.

Plugin configuration just requires populating a few fields. Indexing tasks are put in the jobs queue after objects are saved. Note: you will need to create an Algolia account, which is free up to 10,000 records. (Disclaimer: I work at Algolia.)

:raising_hand_woman: FAQ

Q: Does this replace the default Discourse search?
A: Only if you want it to, and right now only for the autocomplete in the header. The full search page is still reachable by hitting the enter key without a search result selected, or by using the “advanced search” link in the autocomplete footer. I say “only if you want it to” because you can enable indexing to Algolia but not affect the UI or existing Discourse search in any way - that’s why there are two checkboxes in the plugin settings. If you’re just doing indexing, you can search the data in your Algolia dashboard to see how it’s working.

Q: Do I have to pay to use the plugin?
A: It depends on how much data you have. If you have fewer than 1k posts, you should fit into Algolia’s free Community Plan, which gives you 10k records. A good rule of thumb is that you’ll need 10 Algolia records for each post, as posts are split up into paragraph-size chunks for optimum relevance and speed. Still, YMMV. If you’d like to use the plugin but have concerns about the cost, just send me an email and I’ll see what we can do. If you are an open source project or a non-profit, you may qualify for higher limits; just fill out this form and mention Discourse.
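
To make the rule of thumb concrete, here is a minimal sketch of the arithmetic. The blank-line splitting rule and all function names are assumptions made for illustration; the plugin’s actual chunking logic may differ.

```python
import re

FREE_PLAN_RECORDS = 10_000  # free-plan record limit quoted in this FAQ


def split_into_chunks(post_text):
    """Split a post into paragraph-size chunks.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    paragraphs = re.split(r"\n\s*\n", post_text)
    return [p.strip() for p in paragraphs if p.strip()]


def estimate_records(posts):
    """Estimate the record count as one record per paragraph chunk."""
    return sum(len(split_into_chunks(post)) for post in posts)


def fits_free_plan(post_count, records_per_post=10):
    """Apply the rule of thumb: roughly 10 records per post."""
    return post_count * records_per_post <= FREE_PLAN_RECORDS
```

With these assumptions, `fits_free_plan(1000)` is true and `fits_free_plan(1001)` is false, which matches the 1k-post figure above. Running `estimate_records` over a sample of your own posts would give a better ratio than the default of 10.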

:hammer_and_wrench: Seeking Beta Testers!

I’m looking for up to 5 Discourse owners who would be interested in trying this out and helping me make it more robust and full-featured. Ideally you have a good amount of data we can test with and use search a lot. If you’re interested, please send me a PM or email me. In return, any Algolia quota you need during the beta will be free and we’ll give you 50% off any future usage if you decide to become a paying customer.

:memo: Issues & Feature Requests

Please file them on the GitHub repository or reply with your question here.

Thanks!
Josh


#2

Looks excellent, I’ll be hitting you up with a PM shortly. I’m in. :slight_smile:


(Bas van Leeuwen) #3

Ah man, you delivered this just too late :sob:
We recently invested in getting a modified version of this plugin working.

It looks great btw! :slight_smile:


(Bas van Leeuwen) #4

Have you tackled the “problem” of users being able to query results they otherwise wouldn’t be able to see? E.g. staff-only posts etc.
That’s something that we still have to solve, so we might borrow some “inspiration” from your code :smiley:

Did you use secured API keys?


(Josh Dzielak) #5

Thanks @bas for the question!

Right now, we follow an approach similar to the discourse-slack-official plugin: the plugin configuration specifies a particular user, and only things that user can see are indexed.

This makes it easy to avoid indexing Staff category posts, for example, by creating a dummy user who is not a mod or admin. Right now it is not possible to have two different users able to see different data, but this is something I’m thinking about for the future.


(Josh Dzielak) #6

One thing I’ll add there around privacy/security - no private posts or private messages are indexed.
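
As a rough illustration of the “index only what one configured user can see” approach, here is a minimal sketch. The `Post` and `User` classes and the `readable_categories` field are hypothetical stand-ins, not Discourse’s or the plugin’s real data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Post:
    id: int
    category: str
    is_private_message: bool = False


@dataclass
class User:
    name: str
    # Hypothetical: names of the categories this user is allowed to read.
    readable_categories: set = field(default_factory=set)


def can_see(user, post):
    """Return True if the configured indexing user may read the post."""
    if post.is_private_message:
        return False  # private messages are never indexed
    return post.category in user.readable_categories


def posts_to_index(indexing_user, posts):
    """Keep only the posts visible to the single configured user."""
    return [post for post in posts if can_see(indexing_user, post)]
```

A dummy non-staff user whose `readable_categories` omits the Staff category would keep Staff posts out of the index. Because there is only one indexing user, every searcher sees the same data, which is the limitation described above.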


(Bas van Leeuwen) #7

Security levels are a hard requirement for us unfortunately, but it’s something we’ll have to figure out on our own :slight_smile:


(Sam Saffron) #8

I just reviewed the code and it looks quite nice and tidy.

One concern I have is:

Why is there an explicit json dependency here? I worry that it will conflict with Discourse over time if we ever choose to be explicit about our json version and not use the one bundled with Rails.

I am much more comfortable adding the json dependency into our project Gemfile.

Regarding httpclient, is there a reason the algolia gem needs to depend on that and not simply use the built-in stuff? I also worry about potential conflicts, but at least for now we use excon in core so the odds of conflict are low.


(Josh Dzielak) #9

Thanks for the review @sam.

The algoliasearch gem has these dependencies in its gemspec, but they don’t get loaded by the plugin unless I explicitly declare them as dependencies; otherwise the result is Gem::MissingSpecError. There might be a better way to do this, I am sure, and I confess to not being fully familiar with how the plugin installs and loads the gems. Any thoughts there?

For the json gem, it would be much better to get that from Rails, as the current way it’s being done does generate warnings about already-defined constants.

For httpclient I think this was just a design decision made by the original creators of the gem, not sure if it is possible to fall back to another http lib without modifying it heavily.


(Sam Saffron) #10

I see Josh, so Algolia the company is not in control of algoliasearch the gem?


(Josh Dzielak) #11

Sorry, could have been clearer there. Algolia created and controls the gem. When I referred to the “original creators” these are in fact my colleagues :slight_smile:


(Sam Saffron) #12

Aha, so we could potentially improve the algolia gem to make HTTP calls via Net::HTTP directly and strip the explicit json dependency?

That makes the Discourse plugin less “fragile” longer term (and reduces overall dependencies of people integrating)


(Josh Dzielak) #13

I’m sure it’s possible but I’d have to check with the team and see the work involved. The API client is used in many production implementations and we would have to be cautious about changing the plumbing there. Still, I do understand the desire for reducing new dependencies for those who integrate the plugin.

In the short term, it should be possible to avoid needing to depend on a version of the json gem. I will need to check with a better Rubyist than I as to how we go about that.

I filed an issue on the GitHub repo to track it.


(Magnetidog) #14

Hello, thanks for this plugin!

We are trying to deploy this, but we are getting this error when it gets to indexing the posts with Algolia. Any idea what could cause the issue?

Pushing posts to Algolia

Failed to report error: MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. 
Commands that may modify the data set are disabled. 
Please check Redis logs for details about the error. 
2 Failed to process job: ERR Error running script (call to f_b06356ba4628144e123b652c99605b873107c9be): @user_script:14: @user_script: 14: -MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. 

Commands that may modify the data set are disabled. Please check Redis logs for details about the error.    

["/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis/client.rb:121:in `call'",
 "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis.rb:2399:in `block in _eval'",
 "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis.rb:58:in `block in synchronize'",
 "/usr/local/lib/ruby/2.4.0/monitor.rb:214:in `mon_synchronize'",
 "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis.rb:58:in `synchronize'",
 "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis.rb:2398:in `_eval'",
 "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis.rb:2450:in `evalsha'",
 "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/message_bus-2.1.1/lib/message_bus/backends/redis.rb:380:in `cached_eval'",
 "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/message_bus-2.1.1/lib/message_bus/backends/redis.rb:140:in `publish'",
 "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/message_bus-2.1.1/lib/message_bus.rb:248:in `publish'",
 "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/message_bus-2.1.1/lib/message_bus.rb:485:in `block in new_subscriber_thread'",
 "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/message_bus-2.1.1/lib/message_bus/timer_thread.rb:102:in `do_work'",
 "/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/message_bus-2.1.1/lib/message_bus/timer_thread.rb:30:in `block in initialize'"]

Also, is there a way to restart the importer but skip the users step, as those are already imported into Algolia?

Thanks!


(Sam Saffron) #15

The errors seem totally unrelated. Try rebuilding your container; you are likely hitting a Docker bug we noticed over the past few days.


(Magnetidog) #16

Hello Sam - thank you, that fixed the issue!


(Magnetidog) #17

Hi,

We tried reindexing all posts after we worked in a beta directory with our forums.

Anyhow, something seems really wrong: we did not change anything in the rake option to import posts, but the post count is nearly 3 times the number of posts we have on our forum.

Is there any reason for this? Trying to look into the Algolia post index I cannot find duplicates for our posts, but 400k posts turned into 1.2M posts. This is a big problem and difference in terms of costs.
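
Since the FAQ at the top of this topic says posts are split into paragraph-size chunks, a record count of about three per post (1.2M / 400k) could come from chunking rather than from duplication. A minimal sketch for telling the two apart on an exported sample of records follows; the `post_id` and `content` field names are assumptions, not the plugin’s confirmed schema.

```python
from collections import Counter


def summarize_records(records):
    """Summarize exported index records (each a dict).

    Assumption: every record has 'post_id' and 'content' keys.
    """
    per_post = Counter(r["post_id"] for r in records)
    seen = Counter((r["post_id"], r["content"]) for r in records)
    # A (post_id, content) pair seen more than once is an exact duplicate.
    duplicates = sum(n - 1 for n in seen.values() if n > 1)
    return {
        "records": len(records),
        "posts": len(per_post),
        "avg_records_per_post": len(records) / len(per_post) if per_post else 0.0,
        "exact_duplicates": duplicates,
    }
```

If `exact_duplicates` stays at zero while `avg_records_per_post` is around 3, the extra records are more likely chunks of the same post than repeated indexing.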


(Magnetidog) #18

We have submitted a ticket to Algolia. We would be happy to debug the issue, which I believe is due to some problem with Discourse 2.0 (we did not have the problem when we first tried the plugin on 1.9, but I might be wrong), if it is possible to discount today’s operations on the discourse_posts index from our account.

If we are given a way to test this without accruing extra operations we can do different tests until we find a solution to the problem.


(Jesse Perry) #19

For what it’s worth, I’m also running the Algolia plugin and I also updated to Discourse 2.0 and have not seen any jump in the number of records on Algolia.


(Magnetidog) #20

Hi Jesse,

This happened when I ran the “reindex all posts” option after the upgrade. That empties the entire index, then runs the importer. Just upgrading to 2.0 causes no trouble (I have done it three times already); it is the reindexing option that seems to lead to duplicate post indexing. After I ran the importer, the post counter was nearly 1.2M, versus 400k posts in Discourse. Running the importer on Discourse 1.9 imported the correct number of posts.

I am afraid of running the tool again until I get a reply from them. My guess is that the issue occurs when the number of posts is large and maybe the RAM is not enough - even though we have 12GB available and 4GB of swap enabled.

To make things worse, you have no overall count of posts until the end of the process.