Traffic Dashboard Stats

I just deployed some very basic web traffic stats to the dashboard.

A page view is defined as an HTTP request that we serve with a 200 (OK) status, provided that either:

  1. It is decorated with the Discourse-Track-View header, which is injected when you navigate around the site, or
  2. It has a content type of “text/html”, is not an XHR request, and is a GET request.
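The rule above can be sketched as a predicate. This is a minimal illustration, not Discourse's actual server code; the `Request` fields, `has_header`, and `is_page_view` are hypothetical names, and it assumes that either condition on its own is enough.

```python
from dataclasses import dataclass, field
from typing import Dict

TRACK_VIEW_HEADER = "Discourse-Track-View"


@dataclass
class Request:
    # Hypothetical, simplified view of a served request (field names are assumptions).
    method: str
    status: int
    content_type: str
    is_xhr: bool
    headers: Dict[str, str] = field(default_factory=dict)


def has_header(req: Request, name: str) -> bool:
    # HTTP header names are case-insensitive.
    return any(key.lower() == name.lower() for key in req.headers)


def is_page_view(req: Request) -> bool:
    # Only requests served with 200 (OK) can count.
    if req.status != 200:
        return False
    # 1. The client marked the request as a view while navigating.
    if has_header(req, TRACK_VIEW_HEADER):
        return True
    # 2. A full (non-XHR) GET that returned HTML.
    return req.method == "GET" and not req.is_xhr and req.content_type.startswith("text/html")
```

Checking for a `text/html` prefix rather than equality lets a header like `text/html; charset=utf-8` through.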

This gives you a quick way to tell what kind of traffic you are seeing, with a lot of the noise removed.

If you want a more detailed, raw view, click “Show Detailed Traffic Report” at the bottom of the dashboard. It breaks down the traffic by HTTP response code.

The status codes are self-explanatory. “Background” refers to requests made to the “message bus”, which is responsible for live updates, and to “topic timings”, which tracks read times.

Suggestions? Questions?

It’d be nice to be able to distinguish between requests from a browser and requests made via an API key. (Distinguishing between individual keys might also help with watching for abuse.)

I think some stats would be of interest to many admins, but not so much to moderators. For example, as a mod, do I really care when a member was last emailed?

Given how obsessed many people are with SEO, I’m certain the crawler stats would be of interest to many.

I think other stats could be useful to admins too, for example:

  • new topics / posts per category
  • watching vs. muted per category

It’s probably best to lay it out like the Admin Settings page since, I agree, it could get very crowded if everything had its own “tab”.

Or maybe handle it like “Exports” and generate an on-demand “health” report?

I have been thinking about this and totally get @codinghorror’s concern regarding information porn. It’s a tough problem to tackle when there are so many different kinds of customers.

The immediate problem we need to tackle is this:

This is a very complex problem because Discourse has a rather “different” idea of what a page view is.

If a user visits a topic and then loads more posts, is that 1 page view or 2?
If the answer is 1, what about crawlers that make 2 distinct non-JSON requests?
Why not just count every web request as a page view?

The questions just pile up, and answering them is complex.

Since I deployed this system about 5 hours ago, we have served 113 thousand background requests and a total of 176 thousand web requests. That is not even counting static assets.

Even if we only count GET requests to topics, we have already gone through 5,500 in the last 5 hours, so we would likely hit 750k requests a month on topics alone, which would blow through the business plan just for meta.
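A rough check of that extrapolation, assuming the 5-hour rate holds around the clock for a 30-day month:

```latex
$$
\frac{5500\ \text{req}}{5\ \text{h}} = 1100\ \text{req/h},\qquad
1100 \times 24 = 26{,}400\ \text{req/day},\qquad
26{,}400 \times 30 = 792{,}000\ \text{req/month}
$$
```

That lands in the same ballpark as the 750k estimate.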

So, besides probably needing to adjust our limits, we first need a sane, easy-to-explain stat.

### What do I think a page view should be?

  1. A non-AJAX HTTP GET request that is served successfully
  2. An AJAX HTTP GET request that is “decorated” by the Ember router on route transition:
    a. When you move from the topic list to a topic, we count a page view.
    b. When you move from the topic page to the user page, we count a page view.
    c. When you switch filters on the user page, we do not count a new page view.
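The client-side decision in 2a to 2c can be sketched as a comparison of router states. This is a hypothetical model, not Ember's API: `RouteState`, `should_track_view`, and the route names are made up, and it assumes a filter change touches only query parameters (in a real app a filter may be its own route, which is what makes this tricky).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RouteState:
    # Hypothetical snapshot of the client-side router state.
    name: str                                             # e.g. "topic_list", "topic", "user"
    params: Dict[str, str] = field(default_factory=dict)  # path params, e.g. a topic id
    query: Dict[str, str] = field(default_factory=dict)   # query params, e.g. a filter


def should_track_view(previous: Optional[RouteState], current: RouteState) -> bool:
    """Return True if the request triggered by this transition should be counted."""
    if previous is None:
        # The initial load is a full HTML GET and is already counted server-side.
        return False
    # A new route, or the same route for a different resource, counts (cases a and b).
    # If only the query params changed, e.g. switching a filter, it does not (case c).
    return previous.name != current.name or previous.params != current.params
```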

Once we have this out of the way, we can simply define a table for “page views” and count anonymous, crawler, logged-in, and total page views.

This information should always be available on all instances and will let people do some basic capacity planning: four simple, easy-to-explain numbers that carry through to our buy page.
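A minimal sketch of such a table: one row per day with the four counters. The class and function names are hypothetical, and the crawler check is a toy user-agent substring match assumed purely for illustration.

```python
from collections import Counter
from datetime import date
from typing import Dict

# Toy crawler detection by user-agent substring (an assumption; real lists are much longer).
CRAWLER_HINTS = ("bot", "crawler", "spider")


def visitor_kind(user_agent: str, logged_in: bool) -> str:
    ua = user_agent.lower()
    if any(hint in ua for hint in CRAWLER_HINTS):
        return "crawler"
    return "logged_in" if logged_in else "anon"


class DailyPageViews:
    """One row per day holding anon / logged_in / crawler / total counts."""

    def __init__(self) -> None:
        self.rows: Dict[date, Counter] = {}

    def record(self, day: date, user_agent: str, logged_in: bool) -> None:
        row = self.rows.setdefault(day, Counter())
        row[visitor_kind(user_agent, logged_in)] += 1
        row["total"] += 1
```

Storing `total` alongside the three categories is redundant but keeps each daily row directly readable as the four numbers.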

### Longer term we need performance counters

People are having a really tough time figuring out whether they are under-provisioned. I think there are a bunch of stats we can add to a “performance” table to help answer that:

  • How many GET requests took longer than 200 ms?
  • How many GET requests took longer than 1 second?
  • How many GET requests took longer than 5 seconds?
  • How many GET requests total?
  • How many server errors?
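These five metrics fit the increment-only model. A minimal sketch, with a hypothetical `record_request` helper; note the latency buckets are cumulative, so a 6-second request counts toward all three.

```python
from collections import Counter

# Cumulative latency thresholds from the list above, in milliseconds.
THRESHOLDS_MS = (200, 1000, 5000)


def record_request(counters: Counter, method: str, status: int, duration_ms: float) -> None:
    """Increment-only bookkeeping: no per-request details are kept."""
    if status >= 500:
        counters["server_errors"] += 1
    if method != "GET":
        return
    counters["get_total"] += 1
    for threshold in THRESHOLDS_MS:
        if duration_ms > threshold:
            counters[f"get_over_{threshold}ms"] += 1
```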

Given these 5 simple metrics, people can easily establish a performance baseline and track updates that improve performance. Additionally, if and when they migrate to us, they can quickly tell how much “better” things have become. Sure, it’s nowhere near as comprehensive as New Relic or other solutions, but it can be enough to get a good gauge of general performance.

However, even simple performance counters will take quite a while to figure out. It may be more interesting to gather the “median” and “99th percentile”, but we would need a new system for that.
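Part of why percentiles need a new system: a counter that only increments cannot yield a median or 99th percentile. You need the individual timings (or a histogram or sketch of them). A minimal nearest-rank sketch over raw samples, assuming they are all kept in memory, which the lightweight counters deliberately avoid:

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("percentile of an empty sample set")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing to avoid float error for integer p.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]
```

Nearest-rank is only one definition; many monitoring tools interpolate between samples instead.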

So, the performance piece is going to have to wait for 1.3.

### General server usage

I still think there is room for simple counters of:

  • 2XX requests (successful web requests)
  • 3XX requests (redirects)
  • 4XX requests (client errors)
  • 5XX requests (server errors)
  • Background requests (split out from all above)

I find the general usage data very interesting, but it’s not everyone’s cup of tea, so I think it can be an opt-in feature hidden behind a site setting that we enable when diagnosing capacity issues. The amount of traffic we are handling is quite staggering.

So, to recap: for 1.2 we will only have 4 counters:

  • page views (anon)
  • page views (logged in)
  • page views (crawler)
  • page views (total)

A performance dashboard should be queued for 1.3 and beyond. Aggressive monitoring of all traffic will be opt-in (it is already there, so it is easy to add for 1.2).

Complex analysis will need to come from a plugin or Google Analytics, so I don’t think we need to split API requests out from the rest of the pile right now: the more we segment the data, the harder the numbers become to consume and understand.

Thoughts?

cc @eviltrout (in particular about the page view definition, which we need to wire into the route transition using a magic AJAX HTTP header or something similar)

Here’s a custom header on the XHR:

https://github.com/discourse/discourse/commit/a852f6c56fdb991e9c4a491053bc596ec229c293

I revised the OP to describe the updated system. It feels like a great balance between power and usability. Advanced stats are hidden and opt-in.

We still need to resolve an issue where, on the user page, a new page view is logged each time you change a filter; @eviltrout will help me with that.

We will also revise our quotas once we stabilize all the counting and have a few days of data across our sites.

Any chance we can get more detailed stats for crawls and errors, e.g. the actual requests that were made?

Server errors (50x) already show up in sitename/logs.

Client errors (40x) are often just noise, such as 404s. Eventually we may aggregate the most egregious 404s, but not in the immediate future. Crawl stats are even more complex to analyze.

Keep in mind that the logging is VERY lightweight: all we are allowed to do is increment a counter, and we can’t really work backwards from that.

If you really want to dig into things, you always have the NGINX and Rails logs.

Good points. Thanks for the info.

I like the feature nonetheless.

We might have to delay this one a bit; it’s quite tricky to intercept right now. I did see that it will be simpler in an upcoming Ember release (1.11), so maybe we can revisit it then.

(Also, intuitively I suspect users changing filters makes up a small percentage of logged views)

Yeah we can ignore this for now.

I know this has been changed to “API requests” in 1.3b, which I’m currently running.

But it would be really nice to have a link from that text (i.e. “API requests”) to a description that explains exactly what it means in language a non-software engineer can understand.

By the way, one other question: what is the logic behind the date sequencing in the bar chart on the details page? I find it very confusing.

Yes, IMHO there should be something in place to either correct wrongly formatted input or at least specify the correct format to use, i.e. yyyy-mm-dd.
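A minimal sketch of the second option, rejecting bad input with a message that names the expected format. The function name is hypothetical, and it assumes the report receives dates as plain strings.

```python
from datetime import date, datetime


def parse_report_date(text: str) -> date:
    """Parse a yyyy-mm-dd date, failing with a message that names the expected format."""
    try:
        return datetime.strptime(text.strip(), "%Y-%m-%d").date()
    except ValueError:
        raise ValueError(
            f"expected a date in yyyy-mm-dd format (e.g. 2015-01-31), got {text!r}"
        ) from None
```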

I remember reading about the outcry from WPEngine’s customers a while back, because their Google Analytics metrics were very different from WPEngine’s “visits” counter.

WPEngine responded with this:
http://wpengine.com/2014/02/03/refining-pricing-gap-solution/

Here’s a description of how they count visits:
http://wpengine.com/support/count-visits/

In short, they price their tiers based on “visits per month” rather than pageviews, counting visits as distinct IPs per day that hit actual content (not static assets).
This is a really easy metric to measure, and I believe it would be a fair one for your customers.

Perhaps this is the solution you’re looking for rather than using pageviews?

This counter is based on Ember page transitions, so every time the browser transitions to a different “route”, the next request is counted.

Wow, this is super complicated. Check it out; from their page:

  1. When a human being first arrives on the site and loads the page, staying there for 31 seconds, that’s a visit.
  2. If that same human then clicks a link and sees another page, that’s not a new visit; that’s part of the same visit.
  3. If that same human doesn’t have cookies or Javascript enabled, still all that should count as one visit.
  4. If that same human loads the site with a different browsers, that’s still not a new visit; that’s part of the same visit.
  5. If that same human bookmarks the site, then 11 days later comes back to the site, that is a new visit.
  6. When a robot loads the site (like a Google or Bing search bot), that’s a visit, but if one robot scans 100 pages quickly, that’s one visit. (You might disagree that a robot is a “visit,” but consider that from a hosting perspective, we still had to process and serve all those pages just like it was a human being, so from a cost or scaling perspective, bots count the same as humans.)
  7. If a robot scans 20,000 pages over the course of a month, that’s not just one visit. It shouldn’t be 20,000 visits, but neither should it be 1. Something in the range of 100-1,000 visits is acceptable. This is because the same robot can come from different IP addresses.
  8. There are additional cases too where the “right thing to do” is less clear. For example, take the case of a “quick bounce.” Suppose a human clicks a link to the site, then before the site has a chance to load the human clicks “back.” Does that count as a visit? Our servers still had to render and attempt to return the page, so in that sense “yes.” But a human didn’t see the site and Google Analytics isn’t going to see that hit, so in that sense “no.” Because we need the notion of a “visit” to correspond to “the amount of computing resources required to serve traffic,” we round off in favor of saying “yes.”
Definitely good food for thought. It concludes with:

We take the number of unique IP addresses seen in a 24-hour period as the number of “visits” to the site during that period. The number of “visits” in a given month is the sum of those daily visits during that month
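The rule in that quote is simple to compute. A minimal sketch, assuming a hit log of (timestamp, IP, path) tuples, calendar days standing in for the 24-hour periods, and a hypothetical list of static-asset extensions to discard:

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, Set, Tuple

# Hypothetical list of static-asset extensions to ignore.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".svg", ".ico", ".woff", ".woff2")


def monthly_visits(hits: Iterable[Tuple[datetime, str, str]], year: int, month: int) -> int:
    """Sum, over each calendar day of the month, the number of distinct IPs that hit content."""
    ips_per_day: Dict[date, Set[str]] = defaultdict(set)
    for timestamp, ip, path in hits:
        if (timestamp.year, timestamp.month) != (year, month):
            continue
        if path.lower().endswith(STATIC_SUFFIXES):
            continue  # static assets don't count
        ips_per_day[timestamp.date()].add(ip)
    return sum(len(ips) for ips in ips_per_day.values())
```

The same IP on two different days counts twice, which is what makes the monthly figure a sum of daily uniques rather than a monthly unique count.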

Yup… but their solution was dead simple: count unique IP visits per day and discard static asset hits.

EDIT: I didn’t notice @codinghorror’s edit there… :wink:

Warning! I got this reply from Jason Cohen (he’s a friend of the company):

Note on that, however: We do get blow-back from people who disagree with our definition of “visit,” to the point where we’re actively looking at changing our pricing to not depend on “visits.” So, beware!

So that simple definition of a “visit” as the “number of unique IP addresses seen in a 24-hour period” may not suffice.

Maybe download bandwidth would be a good metric?

I still think page views as currently defined are a good metric (though we need a new term so we don’t confuse people with Google Analytics).

We do a +1 every time you hit a URL and get HTML back and a +1 each time you transition routes.

The one pain point we have with it is that bots can skew numbers a lot.
