Code-level performance testing

I’ve been looking at the performance of the ActivityPub plugin recently and considering the best ways to reliably test, and prove, performance for a project like this. Here are a few initial pieces of context:

  • The plugin is an open source project with multiple parties involved (i.e. Discourse, who owns the plugin, and Pavilion, who is currently building it).

  • Different parties may have different internal performance testing tools / systems.

  • Multi-party open source projects benefit from commonly available methods of testing, and proving, that something works, or in this case performs, reliably.

  • Discourse (laudably) cares about performance.

  • Currently, the only commonly available method of testing / proving server-side performance in Discourse (that I’m aware of) is track_sql_queries, which is typically (but not exclusively) used in request tests (see the sketch after this list).

  • While the query count is one indicator of performance, it’s not the only one (some queries are much more expensive than others).

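For readers unfamiliar with the helper, here’s roughly what that looks like in a request spec. This is a minimal sketch: track_sql_queries is Discourse’s spec helper that returns the SQL statements executed inside the block, but the route, fabricator name, and query budget below are illustrative assumptions, not taken from the plugin.

```ruby
describe "AP actor endpoint" do
  # Hypothetical fabricator name, for illustration only.
  let!(:actor) { Fabricate(:discourse_activity_pub_actor_person) }

  it "stays within a fixed query budget" do
    # track_sql_queries captures the SQL statements executed inside the block.
    queries = track_sql_queries do
      get "/ap/actor/#{actor.id}.json" # illustrative route
    end

    # Guards against query-count regressions (e.g. a new N+1), though it
    # says nothing about how expensive each individual query is.
    expect(queries.size).to be <= 10
  end
end
```
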
For a recent example of the use of track_sql_queries, see:

If you’re familiar with this area, you probably know something like this:

> The rule of thumb is that unit tests need speed and performance tests need time.

(quote from this decent explainer)

On the face of it, that can make performance testing (beyond query counting) somewhat hard to integrate into an RSpec (or similar) suite. That said, some people try.

I’m curious what other practical methods, suggestions, or ideas folks may have for adding more commonly available performance testing, and performance proofs, to the Discourse ecosystem, or whether there are methods or approaches I haven’t mentioned here. I’d emphasise the words “practical” and “commonly available”.

One thought that occurs to me is that it might be possible to use MiniProfiler in a spec, i.e. something like Rack::MiniProfiler.profile_singleton_method. But I have neither tried that nor know whether it would be a good idea.
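
For what it’s worth, here is a speculative sketch of what that might look like. It is untested (as the post says), the target method is hypothetical, and it assumes rack-mini-profiler’s behaviour that the method wrappers only record timings when a profiling context is active:

```ruby
require "rack-mini-profiler"

it "records step timings for the delivery path" do
  # Wrap the class method so each call becomes a timed step in the
  # active profiling context. The target class/method is hypothetical.
  Rack::MiniProfiler.profile_singleton_method(
    DiscourseActivityPub::DeliveryHandler, :perform
  )

  # Open a profiling context; without one, the wrapper records nothing.
  Rack::MiniProfiler.create_current

  DiscourseActivityPub::DeliveryHandler.perform # exercise the code under test

  # Read the recorded child timers off the page struct.
  root = Rack::MiniProfiler.current.page_struct[:root]
  timings = root[:children].map { |t| [t[:name], t[:duration_milliseconds]] }
  expect(timings).not_to be_empty # assert on durations here as needed
ensure
  Rack::MiniProfiler.unprofile_singleton_method(
    DiscourseActivityPub::DeliveryHandler, :perform
  )
end
```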


My general recommendation is to avoid performance testing in specs.

We have some examples where we try to monitor for N+1s in a spec, but they all tend to be pretty fragile.
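
To make the fragility concrete, here is a hedged sketch of the usual shape of such a guard (the topic/post fabricators and route are illustrative, not a specific spec from core): fabricate a baseline, fabricate more records, and assert the query count does not grow with the data.

```ruby
it "does not issue one query per post" do
  topic = Fabricate(:topic)

  Fabricate.times(2, :post, topic: topic)
  baseline = track_sql_queries { get "/t/#{topic.id}.json" }.size

  Fabricate.times(5, :post, topic: topic)
  grown = track_sql_queries { get "/t/#{topic.id}.json" }.size

  # An N+1 would make the count scale with the number of posts. This is
  # exactly the assertion that tends to be fragile: caching, plugins, and
  # unrelated changes can all shift the counts.
  expect(grown).to eq(baseline)
end
```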

It’s a very, very tough problem with no obvious solution; all solutions come with compromises, so we generally avoid this and just monitor production for this kind of stuff.
