I work on the JRuby project, and have recently (a couple of times in the past year) attempted to get Discourse running on it. Running on JRuby would mean you could run a single Discourse process for a whole site, and probably use less memory and CPU at the same time. I think it’s worth getting it to run.
As with most existing Ruby apps, there are a few missing C extensions.
The good news is that most of these extensions appear to have alternatives for JRuby, or they’re trivial enough that it should be easy to just make a JRuby version.
I wanted to start a discussion here so we can talk about some of the exts and possible replacements.
Yes, this is totally safe; puma is supported. If we really want to conserve memory here, though, it may be interesting to run sidekiq inside puma.
Oh … fast_blank should not be a dependency for onebox. I just removed it; the next release will not have it.
I think it probably makes sense just to not depend on fast_blank for JRuby and set the fast_blank dependency in Discourse to MRI only. `String#match?` does matching without needing to set globals, so it should in theory be fast enough for JRuby.
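To make the `String#match?` point concrete, here is a minimal pure-Ruby sketch of a blank check. The module name `BlankCheck` is hypothetical, and the regex is just one reasonable definition of "blank"; it is not necessarily what Discourse or fast_blank uses.

```ruby
# Hypothetical pure-Ruby stand-in for a C-backed blank check.
module BlankCheck
  # Matches strings that are empty or contain only whitespace.
  BLANK_RE = /\A[[:space:]]*\z/

  # Regexp#match? returns true/false without allocating a MatchData
  # object or setting $~ and related globals, so it stays cheap.
  def self.blank?(str)
    str.empty? || BLANK_RE.match?(str)
  end
end
```

Something like this could be used on JRuby while MRI keeps the C extension.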
Happy to make this MRI dependency only for now.
Oh, we should bench this; if perf is the same, we can move to xorcist.
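A rough sketch of what such a bench could look like, using only the standard `Benchmark` module. The helper names `pure_xor` and `bench_xor` are made up for illustration, and `Xorcist.xor(a, b)` is assumed from the xorcist README; the xorcist branch is skipped when the gem is not installed.

```ruby
require "benchmark"

# Pure-Ruby baseline: XOR two equal-length strings byte by byte.
def pure_xor(a, b)
  raise ArgumentError, "length mismatch" unless a.bytesize == b.bytesize
  a.bytes.zip(b.bytes).map { |x, y| x ^ y }.pack("C*")
end

# Load xorcist only if it is available (assumption: it may not be installed).
begin
  require "xorcist"
  HAVE_XORCIST = true
rescue LoadError
  HAVE_XORCIST = false
end

# Returns a hash of implementation name => elapsed seconds.
def bench_xor(iterations: 10_000, size: 64)
  a = "a" * size
  b = "b" * size
  results = {}
  results[:pure_ruby] = Benchmark.realtime { iterations.times { pure_xor(a, b) } }
  if HAVE_XORCIST
    # Assumed API, taken from the xorcist README.
    results[:xorcist] = Benchmark.realtime { iterations.times { Xorcist.xor(a, b) } }
  end
  results
end
```

Calling `bench_xor` on both MRI and JRuby (after some warm-up iterations on JRuby) would give a first-order comparison, nothing more rigorous than that.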
therubyrhino should just work. If not, there’s a pretty solid attempt to use the engine provided with Java 8, which is kind of a (direct) successor of Rhino: `gem 'dienashorner', platform: :jruby`. It really depends on what the JS engine is used for; if it is mostly for compiling with the asset pipeline, then both are expected to work fine.
I have also looked into xorcist and managed to run into one issue: it does not fail properly on frozen strings. It’s a pretty rare edge case that is easy to work around, and a PR has been submitted.
Aside: can y’all share with those of us on the sidelines a little bit about why? Is the notion that JRuby would replace MRI as the default implementation for Discourse in the future if you get this working?
That certainly isn’t my goal! JRuby would always be an alternative, and in some cases it may be a better choice than MRI for larger deployments. A single JRuby instance can handle an entire server load, maxing out all cores. If you are getting to the point of running 3 or 4 or 5 MRI instances, there may be a good case to try JRuby.
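For anyone wondering what "maxing out all cores" from one process looks like, here is a small sketch with plain Ruby threads. On JRuby these threads can run Ruby code in parallel on separate cores; on MRI the global VM lock serializes them, which is why multiple MRI processes are needed. The method names are hypothetical and the prime counting is only a stand-in for CPU-bound work.

```ruby
# Naive CPU-bound task: count primes below the given limit.
def count_primes(limit)
  (2...limit).count do |n|
    (2..Integer.sqrt(n)).none? { |d| (n % d).zero? }
  end
end

# Run one task per thread and collect the results in order.
# Thread#value joins the thread and returns the block's result.
def parallel_count(limits)
  limits.map { |limit| Thread.new { count_primes(limit) } }.map(&:value)
end
```

Running `parallel_count` with a few large limits and watching CPU usage on both implementations should show the difference, though real web workloads also involve I/O, so the effect will vary.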
We also usually perform better, but that may take some tweaking early on.
I realize looking at the Rinku README that it’s a drop-in replacement for Rails autolinking, which after 3.1 was pulled out as the rails_autolink gem. I am looking into doing a port of Rinku, but simply using rails_autolink works around this one right now.
Rinku is a drop-in replacement for Rails 3.1 `auto_link`
----------------------------------------------------
Auto-linking functionality has been removed from Rails 3.1,
and is instead offered as a standalone gem, `rails_autolink`. You can
choose to use Rinku instead.
On Discourse, our main use for V8 is the markdown cooking.
Since we have a live preview and extensible markdown pipeline, we guarantee the same behavior on the browser and on server side by running the exact same code.
Even better, it accepts CharSequence, which means we should just be able to pass it a Ruby string (or one of our representations of it) and it will function mostly without pre-transcoding everything into Java characters.
I’ll see if I can get some basic API equivalent wrapper around it.
So far…it works. But it does eventually create Java strings, so we may (or may not) want to adapt or fork this library to work with Ruby strings more directly.
Regarding Nokogumbo, I’d just like to say here that it seems bad that Discourse is using both Nokogiri and Nokogumbo together.
Nokogumbo follows the HTML 5 parsing specification; Nokogiri is built on libxml2’s HTML 4 parser. They differ in behavior in ways that can introduce subtle bugs when handling tricky corner cases worked out during the development of the HTML 5 parser specification.
I recommend Nokogumbo over Nokogiri, because Nokogumbo matches what browsers do, and, more philosophically, because the HTML 5 parser is fully specified, as opposed to HTML 4 which left room for undefined behavior.
(It’s like the difference between kramdown and CommonMark.)