Optimizing FCP/LCP by caching raw-view module lookups

I’m trying to improve First Contentful Paint (FCP) and Largest Contentful Paint (LCP) with these two PRs:

I’m really interested in the actual impact of these changes – so please have a try and give some feedback.

And of course any help for testing, refactoring and test coverage is more than welcome.

11 Likes

The first one looks like an easy optimisation that I think makes sense, @david and @eviltrout spent time optimising this area so I will be very curious to see what they think about it.

The second feels a bit more fragile long term, totally get the desire to optimise but it worries me a bit cause it will be an area we need to maintain.

5 Likes

Hi @rrit - thanks for the PRs. The first one sounds like a good improvement. Were you able to measure the performance impact? How much time does it save?

As @sam said, the maintainability of the second one is a little concerning. It looks like it’s a copy/paste from the Ember source code? Did you change something in order to improve performance?

4 Likes

lookupView-patch

Implemented now via Map instead of Array.

Time spent on application boot in lookupView - for development instance:

Time saved: ~115 ms

This decreases the time spent inside appendOutletView from 1.083 ms to 946 ms - for development instance.


patch on raw handlebar helpers

Yes, it’s actually a copy-paste with one change: use an inexpensive check for isPath.

      // replaces @ember/-internals/utils isPath
      // @see: https://github.com/emberjs/ember.js/blob/3537670c14883346e11e841fcb71333384fcbc87/packages/%40ember/-internals/metal/lib/path_cache.ts#L5-L7
      // @see: https://github.com/emberjs/ember.js/blob/255a0dd3c7de1187f4a2f61a97cf78bfff8f66a8/packages/%40ember/-internals/glimmer/lib/utils/bindings.ts#L70
      let isPath = context.indexOf('.') > -1;

E.g. renderTopicListItem ultimately triggers many calls of _getPath (another 50-100 ms to save):
Firefox Profiler (callstack filtered to _getPath inside renderTopicListItem)

Maybe the expensive calls to _getPath is something to be optimized in Ember.js and not in Discourse.


And check out Firefox Profiler to get some inside on JavaScript execution:

4 Likes

Thanks for the patches. The second one does seem a little brittle.

Are your benchmarks running in development mode or production mode? Ember has quite a different profile in both.

2 Likes

@david found an excellent way to fix this issue - see his comment on github.

The time for calls to renderTopicListItem on webpage ‘latest’ drops from 348 ms to 201 ms in an Ember ‘production’-build.

The earlier benchmarks were still running in development mode.


How can I run benchmarks in Ember.js production mode?

# Start ember in production mode
d/ember-cli server --environment="production"
2 Likes

Unfortunately I haven’t been able to replicate this large speed increase. On Firefox and Chrome (macOS) I’m not seeing any measurable improvement. Chrome spends around 23ms in renderTopicListItem. Firefox 30ms. On an older android device (Pixel 3), I’m seeing around 108ms. The numbers don’t seem to change before/after the change.

By the way, I measured these numbers by using the performance API. I added performance.mark("rtli-start") at the start of renderTopicListItem, and then performance.measure("rtli", "rtli-start") at the end.

Then I reload the browser with dev tools closed and browser plugins disabled (dev tools and browser plugins can significantly affect rendering performance). Then after loading is complete, open dev tools and run this to sum up the measurements:

performance.getEntriesByName("rtli").reduce((v, m) => v + m.duration, 0);

We’ll definitely get this change merged - it’s clearly a better implementation. But I’m not sure whether it will give us a visible difference in render performance :thinking:

7 Likes

I can still reproduce the performance benefits using performance API in Firefox private-mode (Linux).

Testing http://localhost:4200/latest
Time spent in renderTopicListItem is down to ~190 ms from ~290 ms.

My Discourse testing-instance has many topics with many answers and many different authors - data pulled from a productive-instance. This results in lots of elements to be rendered.
Maybe this is the difference in our benchmarks?


Pre-rendering of content below-the-fold

Discourse pre-renders 30 topics on page ‘latest’. Then the content gets displayed for the first time (FCP). Above-the-fold there are only ~12 topics visible.

Same for a topic page: 20 posts pre-rendered, but max of 6 one-line-posts are visible above-the-fold.

This might be another point of optimization for FCP.

1 Like

Would you mind sharing the Firefox and OS version? The 290ms number is almost 3x slower than a 2018 android device, which is a little surprising.

That may explain some of the difference, yeah. In my case, I ran them using live data from Meta:

bin/ember-cli --environment production --proxy https://meta.discourse.org

Yeah, this is a possible improvement. However, we’ll need to be very careful that the layout and/or scrolling doesn’t jump around (e.g. if the user is refreshing the page when they’re already scrolled halfway down). The definition of ‘below the fold’ also varies based on the device/browser/theme.

3 Likes

Proxy to meta.discourse.org

Unfortunately running ember with a proxy fails for me:

d/ember-cli --environment production --proxy https://meta.discourse.org

http://localhost:4200/

Discourse Build Error

Error [ERR_TLS_CERT_ALTNAME_INVALID]: Hostname/IP does not match certificate's altnames:
Host: localhost. is not in the cert's altnames: DNS:*.cdck-prod-meta.discourse.cloud

http://127.0.0.1:4200/

Discourse Build Error

Error [ERR_TLS_CERT_ALTNAME_INVALID]: Hostname/IP does not match certificate's altnames:
Host: meta.discourse.org. is not in the cert's altnames: DNS:*.cdck-prod-meta.discourse.cloud

System used for benchmarks

Extracted from Firefox about:support

Name Firefox
Version 95.0.2
Build-ID 20211219102529
Distributions-ID canonical-002
User-Agent Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0
Betriebssystem Linux 5.10.0-0.bpo.9-amd64 #1 SMP Debian 5.10.70-1~bpo10+1 (2021-10-10)
Betriebssystem-Theme Adwaita-dark / Adwaita
Anwendungsprogrammdatei /snap/firefox/777/usr/lib/firefox/firefox
Name Firefox Developer Edition
Version 96.0b10
Build-ID 20211228195952
Update-Verzeichnis /opt/firefox-dev-autoinstall
Update-Kanal aurora
User-Agent Mozilla/5.0 (X11; Linux x86_64; rv:96.0) Gecko/20100101 Firefox/96.0
Betriebssystem Linux 5.10.0-0.bpo.9-amd64 #1 SMP Debian 5.10.70-1~bpo10+1 (2021-10-10)
Betriebssystem-Theme Adwaita-dark / Adwaita
Anwendungsprogrammdatei /opt/firefox-dev-autoinstall/firefox-bin

Extracted from Chromium chrome://system/

CHROME VERSION 90.0.4430.212 built on Debian 10.9, running on Debian 10.11
OS VERSION Linux: 5.10.0-0.bpo.9-amd64

OS version:

# cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
2 Likes

The PR for the refactoring is now merged:

Thanks for raising this one @rrit - it’s a nice improvement!

5 Likes

This topic was automatically closed after 9 hours. New replies are no longer allowed.