Why is Discourse so slow on Android?

(Jeff Atwood) #1

When I was on vacation last week, I took my cellular Nexus 7 with me so I could keep up with Discourse posts, here and on our partner sites.


  • the Nexus 7 was never a super fast device even at launch
  • it was released in the ancient days of 2012
  • Discourse is a very intensive JavaScript application

So I wasn’t exactly expecting barn-burning speeds.

I like using http://bbs.boingboing.net as a benchmark since it’s our busiest site and I can go there any time of the day and many of the topics will consistently have new posts, and a lot of them are larger than the default post chunk size (20).

What I noticed was that Discourse performance was much worse than I remembered before on the Nexus 7. On a topic that has 20+ posts I saw this consistently:

  • time to the “loading…” indicator: 2 seconds
  • time to the topic loading a full 20 post chunk: 5 seconds

The Nexus 7 is substantially faster on small topics, where I’d see:

  • time to the “loading…” indicator: 2 seconds
  • time to the topic loading 1 or 2 posts: 1 second

The majority of the performance problem comes in loading the posts, and seems to scale more or less linearly with the number of posts loaded in the topic, up to the maximum chunk size of 20.

But since many topics on BBS do have 20+ posts, it was excruciating to browse BBS and keep up with all the new posts on the Nexus 7 when I was paying a seven second penalty on every click. Brutal. Just brutal.

OK, so maybe the N7 is just old. Coincidentally, the new 2013 Nexus 7 was announced while I was on vacation. I immediately ordered 3 for the team and myself. It arrived literally the day I got back from vacation, this past Monday. It’s easily twice as fast as the old model by every benchmark there is – and it’s plenty snappy in real world use!

So I excitedly unboxed the 2013 Nexus 7, updated it to latest over WiFi, and visited a large BBS topic, expecting a big free Moore’s Law improvement:

  • time to the “loading…” indicator: 2 seconds
  • time to the topic loading a full 20 post chunk: 5 seconds

Err… WTF? :confused: How can a 2x faster device have the same exact performance on Discourse as the old model? That’s deeply concerning! We are building Discourse for the next decade of the Internet and we are assuming that newer devices will be faster. But this is exactly the same speed as the old device! How is that even possible, when every benchmark I read shows that the 2013 Nexus 7 is twice as fast as the 2012 model?

We did not see this with the iPad 3 and iPad 4; performance on Discourse scaled up with the device perfectly. I also noticed that the Surface RT, which is the same hardware as the Nexus 7, was not producing Discourse topic load times anything close to 7 seconds. It wasn’t fast, exactly – like I said, the Tegra 3 hardware that the N7 and Surface RT are based on is not exactly speedy – but it was much faster than the N7 in Discourse. Same hardware, different OS, different browser. I also tested with Firefox for Android and got the same exact numbers, even though (as far as I can tell) it is a completely different HTML and JS rendering engine.

This points toward Android as the problem.

Anecdotally, here are some rough load times observed on large, full-chunk (20 post) BBS topics on different devices:

  • Nexus 7 (both) – 2 sec load, 5 sec posts
  • iPhone 5 – 1 sec load, 2/3 sec posts
  • iPad 4 – 1 sec load, 2 sec posts
  • Surface Pro – instant load, 1 sec posts

Video demo of iPhone 5 vs Nexus 5 on the same Discourse pages:

The goal is for all devices to get to the Surface Pro (Intel i5) speeds over time, and I am confident that will happen in the next few years. However, if new Android devices are released and there’s some crazy bottleneck we don’t understand that prevents a JavaScript app like Discourse from scaling on new Android devices, that’s… a serious concern.

So @eviltrout spent the last few days looking in depth at this. The good news is he was able to cut N7 load times from 7 sec to 3-4 sec. But we have yet to find a “smoking gun” on Android that shows why, apparently, JavaScript performance is scaling so poorly to new devices.

I’ll let him answer with details and pointers to the specific GitHub checkins.

We’ve heard that other devs have noticed poor JavaScript perf on Android devices – does anyone have any specifics?

(Michael Scott Shappe) #2

I find myself wondering if Google has done all it should to optimize their Javascript engine for ARM processors, expecting perhaps that most web-sites with Javascript-intensive web UIs would stress using apps instead on mobile devices.

This is pure speculation, but I know ARM is a sufficiently different architecture from x86 and x86-64 that there’s no guarantee that performance optimizations made for one platform would work on another.

(Jeff Atwood) #3

Perhaps, but that would imply iOS perf should be bad, when we see excellent Mobile Safari perf on iPhone / iPad. And as previously mentioned, nearly perfect Discourse performance scaling from the twice-as-fast iPad 4 over the iPad 3.

I feel like the N7 and N10 and Chrome for Android have been out long enough that early ARM performance hiccups should be resolved by now!

(Michael Scott Shappe) #4

Does Mobile Safari use Google’s engine for Javascript? If not, Apple may have spent more time on it than Google has.

There’s Chrome for iOS? Does it suffer the same degradation? (Actually…I’ll check that myself in a second…)

(Brentley Jones) #5

All browsers on iOS use Safari for rendering, so it won’t matter.

Edit: I stand corrected, below.

(Jeff Atwood) #6

Chrome for iOS uses the non-native JavaScript engine, see Daring Fireball: Why the Nitro JavaScript Engine Isn’t Available to Apps Outside Mobile Safari in iOS 4.3

The Nitro JavaScript engine is only available within Mobile Safari. Outside Mobile Safari — whether in App Store apps using the UIWebView control, or in true web apps that have been saved to the home screen — apps get iOS’s older JavaScript engine.

Some of you may or may not remember when Mobile Safari was revamped not that long ago, and with it came a rebuild of the underlying code around the Nitro JavaScript engine that drastically increased performance and provided the user with a browsing experience that contained fewer lags and pauses while visiting their favorite websites. In a rather bitter sweet twist, Apple unfortunately doesn’t allow apps built around the UIWebView to take advantage of this newer engine and same goes for all of the default third-party apps.

(Michael Scott Shappe) #7

Nevertheless, Chrome for iOS on an iPad 3 loads meta.discourse.org pretty quickly. bbs.boingboing.net is a little chunkier to load but seemed the same in both Chrome and Safari (iOS7 Beta 4).

It was definitely slower to load on my Shiny New Nexus 7, though.

(Robin Ward) #8

Ember Patches

A few of the improvements I’ve made were technical changes to Ember.js. I’ll be creating detailed pull requests for each one to see if the core team finds them sane and is willing to integrate them. The good news is we’ve been running most of them for a few days with no major reported errors. Discourse is quite a large Ember app, so if we’re working smoothly there’s a good chance the patches are safe. Here’s a short description of each:

  1. Disabling the W3C Range API — This is used by the metamorph part of Ember to replace content when it changes. The metamorph code will use the new API if it exists in your browser, and if not it has a polyfill. In my benchmarks on Desktop + Android, the polyfilled version was much faster than the native browser version, which is surprising. So I’ve updated the code to only use the polyfill.

  2. Reducing view observers — Ember uses observers on an element property of views. Creating these observers adds up if you have many views and they are only really used for clearing out memory references to DOM nodes when elements are removed. I have a patch that removes the observers, and clears out the memory references in a different way.

  3. Slow constructors — This is a controversial and surprising one. Basically, creating objects in Ember can be slow due to a call to finishChains after m.proto is deleted. I have a little hack here that checks if chains is defined before the proto is deleted. In my benchmarks it adds up to quite a few ms.
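
To illustrate the idea behind patch 3 only: the sketch below is not Ember's code and does not reproduce its internals. The names `finishChainsSketch`, `createAlways`, `createGuarded` and `countCalls` are hypothetical stand-ins. It assumes most objects have no chains defined, and shows how a cheap check before the call avoids paying for it on every construction.

```javascript
// Hypothetical stand-ins for illustration; these are NOT Ember internals.
let finishCalls = 0;

// Pretend finalizer: walks every chain node attached to an object's metadata.
function finishChainsSketch(meta) {
  finishCalls += 1;
  for (const node of meta.chains || []) {
    node.finished = true;
  }
}

// Unconditional version: every construction pays for the call.
function createAlways(meta) {
  finishChainsSketch(meta);
  return { meta };
}

// Guarded version: skip the call when no chains are defined.
function createGuarded(meta) {
  if (meta.chains !== undefined) {
    finishChainsSketch(meta);
  }
  return { meta };
}

// Count finalizer calls for a mix of objects, most of which have no chains.
function countCalls(create, total, withChains) {
  finishCalls = 0;
  for (let i = 0; i < total; i += 1) {
    const meta = i < withChains ? { chains: [{ finished: false }] } : {};
    create(meta);
  }
  return finishCalls;
}
```

Calling `countCalls(createGuarded, 1000, 10)` against `countCalls(createAlways, 1000, 10)` shows the difference in how often the finalizer runs. Whether the saving matters in a real app depends on how expensive the real call is, which this sketch does not model.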

Discourse Changes

One performance optimization you can make in an Ember app is the use of the group helper. The idea is to reduce bindings by re-rendering the view they belong to. We used this in a few places on Discourse but not everywhere.

In the latest Discourse code, we removed the group helper, but use the same internal concepts it uses for view performance. We have a new class, Discourse.GroupView that groups everything inside it. In many cases replacing a regular Ember.View with Discourse.GroupView just improves performance, but it’s not something you necessarily want to apply globally.

I’ve also introduced a new handlebars helper, {{#groupedEach}} which applies grouping to the each structure as well as all the views inside it. It is similar to the GroupedEach helper in Ember, but I’ve found a few odd bugs and weirdnesses there so I created Discourse.GroupedEach to try and tweak it to our needs. If it works this will likely be submitted as another PR to ember as well :smile:
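
As a rough illustration of the grouping trade-off described above, here is a minimal sketch in plain JavaScript. It is not Ember or Discourse code: `Model`, `bindPerField` and `bindGrouped` are hypothetical names. It assumes a view with several bound fields and compares one observer per field against a single observer that re-renders the whole view.

```javascript
// Hypothetical mini "model" with change notification; not Ember's object model.
class Model {
  constructor(data) {
    this.data = { ...data };
    this.observers = [];
  }
  observe(fn) {
    this.observers.push(fn);
  }
  set(key, value) {
    this.data[key] = value;
    for (const fn of this.observers) fn(key);
  }
}

// Ungrouped: one observer per bound field, each updating its own fragment.
function bindPerField(model, fields) {
  const fragments = {};
  for (const field of fields) {
    fragments[field] = String(model.data[field]);
    model.observe((changed) => {
      if (changed === field) fragments[field] = String(model.data[field]);
    });
  }
  return () => fields.map((field) => fragments[field]).join(" ");
}

// Grouped: a single observer that re-renders the whole view on any change.
function bindGrouped(model, fields) {
  const render = () => fields.map((field) => String(model.data[field])).join(" ");
  let html = render();
  model.observe(() => {
    html = render();
  });
  return () => html;
}
```

Both functions return a getter for the current output. The grouped version registers one observer no matter how many fields the view has, at the cost of redoing the whole render on every change, which is one reason grouping may not be something to apply globally.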

(Jeff Atwood) #9

And the good news is that most, if not all of these changes, improve performance on all platforms, not just Android/Chrome, yes?

(Robin Ward) #10

Yes, I sadly have not identified a fix that seems Android specific. All of the above improve the performance on all platforms tested so far!

(Michael Scott Shappe) #11

Hm. Maybe something different about the way Chrome has to do memory management on Android? I would still expect that to be less of a bottleneck on the N7-2013 (twice as much RAM as well as the faster CPU and GPU), though.

I think that’s easily the most puzzling part of this – not that it’s slower, but that it’s not at all faster on a device that is otherwise much snappier than its predecessor.

(Jeff Atwood) #12

Yes, that is why I am so concerned, exactly.

(Michael Scott Shappe) #13

Would it make any sense for Chrome for Android to be throttling or yielding itself? If so, maybe it’s doing so based on “real” time rather than anything relative to processor speed…

OK, I’m just speculating wildly now, I’ll admit…

(Adam Baxter) #18

Were you running Android 4.3 with these tests? It’ll be interesting to see how much faster that makes things, although the userbase is going to be very small for quite some time.

(Jeff Atwood) #19

The 2013 Nexus 7 is by definition a 4.3 device, but there is no significant perf difference between Android 4.2 and 4.3 in any of the benchmarks I have seen.

(Ben T) #20

Well, the stock browser got discontinued and is being phased out in favor of Chrome alone, so OS updates would only change the underlying system performance. But one of the cool things with Chrome is that you can inspect elements over the debugger and access the profiling tools to check rendering times and other details. I see render times of ~4000ms to 8000ms max on my older phone, with the first calls to Ember’s rendering taking the longest (like the first post being rendered). I’ll try to get some more details soon.

(Lee_Ars) #21

Nothing really salient to add, but I did just load up both my forum and Meta on my Nexus 7 (one of the press demo versions they handed out at the “Breakfast with Sundar” event), and yes, both forums were dog-slow to load. Clicking around through topics was a very laggy affair.

chrome://version responds with the following on the tablet:

Application: Chrome 28.0.1500.94 1500094
OS: Android 4.3.0; Nexus 7 Build/JSS15J
Blink: 537.36 (@154319)
JavaScript: V8

(Ben T) #22

The longest render times are usually the rendering of posts, at around 50-60ms for average-length posts and 100-200ms for more format-heavy posts. The main problem points seem to be how long the scripts take to execute, coupled with rendering the HTML and then the styles for the group of posts. I’m sure that running against a non-compiled build would make it more obvious what takes the longest.
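
As a back-of-the-envelope check on those per-post figures, the sketch below scales them to a full chunk of 20 posts. It assumes render costs simply add up and that every post in the chunk falls in the same class, which is a simplification. The function name `chunkSeconds` is made up for this example.

```javascript
// Per-post render times reported above, in milliseconds.
const AVERAGE_POST_MS = [50, 60];
const HEAVY_POST_MS = [100, 200];
const CHUNK_SIZE = 20; // default post chunk size mentioned in the first post

// Scale a [low, high] per-post range to a whole chunk, in seconds.
function chunkSeconds([low, high], posts = CHUNK_SIZE) {
  return [(low * posts) / 1000, (high * posts) / 1000];
}

const averageChunk = chunkSeconds(AVERAGE_POST_MS); // roughly 1 to 1.2 seconds
const heavyChunk = chunkSeconds(HEAVY_POST_MS); // roughly 2 to 4 seconds
```

Under these assumptions, post rendering alone would account for somewhere between about 1 and 4 seconds of a full-chunk load on this phone, depending on how heavy the posts are.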

This is my phone, for comparison. I’ve enabled a couple of the extra development flags relating to overflow scrolling and the new HTTP cache, along with it being the beta build. I see load times of about 3 seconds to meta, and about 4 seconds to this post.

Application:    Chrome Beta 29.0.1547.40 1547040
OS:             Android 4.2.2; Galaxy Nexus Build/JDQ39E
Blink:          537.36 (@155197)
JavaScript:     V8

(Michael Scott Shappe) #23

In the hopes it will be useful to someone (and also because I was curious how it worked!), I just ran profiles of three Discourse pages – A discussion on BoingBoing, the main page for Meta and this discussion.

Where would be a good place to upload/share these for someone to take a look?

Edit to clarify: These were run in Chrome for Android using remote debugging to capture Javascript profiling data.

A cursory glance at all three, plus running a profile of this discussion in a laptop Chrome window, shows that ‘get offsetWidth’ comes up consistently near the top of the hit list, as does ‘e’ and ‘groupCollapsed’.

Loading this discussion on Chrome for Android, for example, ‘get offsetWidth’ took 337ms Self and Total while ‘e’ took 239ms ‘Self’ and 698ms ‘Total’ and ‘groupCollapsed’ took 224ms Self and Total.
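
One possible reason a getter like offsetWidth can rank so high in a profile: reading it makes the browser perform a synchronous layout whenever earlier changes have left the layout dirty. Whether that is what is happening in these profiles is only a guess. The sketch below uses a fake element rather than the real DOM (`makeFakeElement`, `interleaved`, `batched` and `countLayouts` are all hypothetical names) to show how interleaving reads and writes triggers many more layouts than batching the reads first.

```javascript
// Fake layout bookkeeping; a stand-in for what a browser does, not a real DOM.
let layoutCount = 0;
let layoutDirty = false;

function makeFakeElement(width) {
  let w = width;
  return {
    // Reading the width forces a "layout" if anything changed since the last one.
    get offsetWidth() {
      if (layoutDirty) {
        layoutCount += 1;
        layoutDirty = false;
      }
      return w;
    },
    // Writing a style marks the layout as dirty.
    setWidth(px) {
      w = px;
      layoutDirty = true;
    },
  };
}

// Read, write, read, write: every read after a write forces a new layout.
function interleaved(elements) {
  for (const el of elements) {
    el.setWidth(el.offsetWidth + 10);
  }
}

// Read everything first, then write everything: at most one forced layout.
function batched(elements) {
  const widths = elements.map((el) => el.offsetWidth);
  elements.forEach((el, i) => el.setWidth(widths[i] + 10));
}

// Run a strategy on n fresh elements, starting from a dirty layout.
function countLayouts(strategy, n) {
  layoutCount = 0;
  layoutDirty = true;
  const elements = Array.from({ length: n }, () => makeFakeElement(100));
  strategy(elements);
  return layoutCount;
}
```

With 20 elements, one per post in a chunk, `countLayouts(interleaved, 20)` and `countLayouts(batched, 20)` show how far apart the two patterns are in this toy model.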

(Adam Baxter) #24

Are these screenshots or HTML pages?