I just deployed some very basic web traffic stats:
A page view is defined as an HTTP request that we serve with a 200 (OK), provided either:
It is decorated with the Discourse-Track-View header, which is injected when we move around the site, or
It has a content type of “text/html”, is not an XHR request, and is a GET request.
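A rough sketch of that rule in Python, for illustration only. The `Request` class and its field names are made up here and are not the actual Discourse code, and it reads the two conditions as alternatives (a decorated Ajax navigation, or a plain HTML page load), which is an assumption:

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    # Hypothetical request record; field names are illustrative only.
    method: str
    status: int
    content_type: str = ""
    is_xhr: bool = False
    headers: dict = field(default_factory=dict)


def is_page_view(req: Request) -> bool:
    """Return True if the request counts as a page view under the rule above."""
    if req.status != 200:
        return False
    # HTTP header names are case-insensitive, so compare in lower case.
    header_names = {name.lower() for name in req.headers}
    # Case 1: navigation inside the app, flagged by the client.
    if "discourse-track-view" in header_names:
        return True
    # Case 2: a plain full-page load.
    return (
        req.method == "GET"
        and not req.is_xhr
        and req.content_type.startswith("text/html")
    )
```

With this, a JSON poll or a 404 never counts, while a first page load and a flagged route transition both do.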
This gives you a quick way to tell what kind of traffic you are seeing, with a bunch of noise removed.
If you feel like “expanding” and getting a raw view, click on “Show Detailed Traffic Report” at the bottom of the dashboard. It breaks down the traffic by HTTP response code:
Status codes are self-explanatory. “Background” refers to requests made to the “message bus”, which is responsible for live updates, and to “topic timings”, which tracks read times.
It’d be nice to be able to distinguish between server requests from a browser vs. requests via an API key. (It might help for watching for abuse to be able to distinguish between keys too.)
I have been thinking about this and totally get @codinghorror’s concern regarding information porn. It’s a tough problem to tackle with many different customers.
This is a very complex problem because Discourse has a rather “different” view of what a page view means.
If a user visits a topic and then loads more posts, is that 1 page view or 2?
If the answer to the above is 1, what about crawlers that make 2 distinct non-JSON requests?
Why not just count every web request as a page view?
The questions just pile up, and answering them is complex.
I deployed this system about 5 hours ago. Since then we have had 113 thousand background requests and a total of 176 thousand web requests served. This is not even counting static assets.
Even if we only count GET requests to topics, we already ate through 5500 in the last 5 hours, so it’s likely we would hit 750k requests a month on topics alone, which would blow the business plan just for meta.
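For anyone who wants to check the back-of-envelope math, here is a small sketch. It assumes traffic stays flat around the clock and a 30-day month, both simplifications, and `project_monthly` is just an illustrative helper name:

```python
def project_monthly(count: int, hours_observed: float, days_in_month: int = 30) -> float:
    """Extrapolate a count seen over a few hours to a whole month (flat-rate assumption)."""
    per_hour = count / hours_observed
    return per_hour * 24 * days_in_month


# 5500 topic GETs in 5 hours -> 1100/hour -> 26400/day
projected = project_monthly(5500, 5)
print(round(projected))  # 792000
```

That lands in the same ballpark as the rough 750k figure; real traffic dips overnight, so a flat extrapolation like this is more likely to overshoot than undershoot.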
So, besides probably needing to adjust our limits, we first need a sane and easy to explain stat.
### What do I think a page view should be?
1. A non ajax HTTP GET request that is served successfully
2. An ajax HTTP GET request that is “decorated” by the Ember router on route transition.
   a. When you move from topic list to topic, we count a page view
   b. When you move from the topic page to the user page we count a view
   c. When you switch filters on the user page we do not count a new page view
Once we have this out of the way we can simply define a table for “page views” and count anon/crawler/logged in and total page views.
This information should always be there on all instances and will easily allow people to do some basic capacity planning: four simple-to-explain numbers that carry through to our buy page.
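To make the four numbers concrete, here is a minimal sketch of such a counter table. The crawler detection by user-agent substring and all names (`CRAWLER_HINTS`, `classify`, `PageViewCounters`) are assumptions for illustration, not how Discourse actually detects crawlers:

```python
from collections import Counter

# Hypothetical, deliberately naive crawler hints; real detection is more involved.
CRAWLER_HINTS = ("bot", "crawler", "spider")


def classify(user_agent: str, logged_in: bool) -> str:
    """Place a page view into exactly one of: crawler, logged_in, anon."""
    ua = (user_agent or "").lower()
    if any(hint in ua for hint in CRAWLER_HINTS):
        return "crawler"
    return "logged_in" if logged_in else "anon"


class PageViewCounters:
    def __init__(self) -> None:
        self.counts = Counter()

    def record(self, user_agent: str, logged_in: bool) -> None:
        # The only work done per page view is a single increment.
        self.counts[classify(user_agent, logged_in)] += 1

    def summary(self) -> dict:
        result = {kind: self.counts[kind] for kind in ("anon", "logged_in", "crawler")}
        result["total"] = sum(result.values())
        return result
```

The total is derived from the other three rather than stored, so the four numbers can never disagree with each other.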
### Longer term we need performance counters
People are having a really tough time figuring out if they are under-provisioned. I think that there are a bunch of stats we can add to help answer that in a “performance” table
How many GET requests took longer than 200ms?
How many GET requests took longer than 1 second?
How many GET requests took longer than 5 seconds?
How many GET requests total?
How many server errors?
Given these 5 simple metrics, people can easily compare against a baseline of performance and track updates that improve performance. Additionally, when / if they migrate to us, they can quickly tell how much “better” stuff has become. Sure, it’s nowhere near as comprehensive as New Relic or other solutions, but it can be enough to get a good gauge on general performance.
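A sketch of what those five counters could look like, keeping to increment-only bookkeeping. The class and attribute names are hypothetical, and counting server errors across all HTTP methods (not just GET) is an assumption:

```python
class PerformanceCounters:
    # Thresholds from the list above, in milliseconds.
    THRESHOLDS_MS = (200, 1000, 5000)

    def __init__(self) -> None:
        self.total_get = 0
        self.slower_than = {t: 0 for t in self.THRESHOLDS_MS}
        self.server_errors = 0

    def record(self, method: str, status: int, duration_ms: float) -> None:
        # Assumption: a 5xx on any method counts as a server error.
        if 500 <= status < 600:
            self.server_errors += 1
        if method != "GET":
            return
        self.total_get += 1
        # The buckets are cumulative: a 6 second request is also slower than 200ms.
        for threshold in self.THRESHOLDS_MS:
            if duration_ms > threshold:
                self.slower_than[threshold] += 1
```

Because the buckets are cumulative, dividing any of them by `total_get` gives the share of slow requests directly.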
However, simple performance counters will take quite a while to figure out. It may be more interesting to gather “median” and “99th percentile”, but we would need a new system for that.
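To illustrate why percentiles are a different beast from counters: an exact percentile needs the individual timings kept around (or some approximation structure), not just a number to increment. A minimal sketch using the nearest-rank method, which is just one common definition picked here for simplicity:

```python
def percentile(samples: list, p: int) -> float:
    """Nearest-rank percentile of samples, for an integer p between 1 and 100."""
    if not samples:
        raise ValueError("no samples")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples)
    # rank = ceil(p * n / 100), computed with integers to avoid float rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


durations_ms = list(range(1, 101))  # made-up timings: 1ms .. 100ms
median = percentile(durations_ms, 50)
p99 = percentile(durations_ms, 99)
```

Every call sorts the full sample list, which is exactly the kind of per-request storage and work a plain counter avoids.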
So, the performance piece is going to have to wait for 1.3.
I still think there is room for a simple counter of
2XX requests (successful web requests)
3XX requests (redirects)
4XX requests (client errors)
5XX requests (server errors)
Background requests (split out from all above)
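A small sketch of that bucketing. The background path prefixes below are assumptions used for illustration, as are the function names:

```python
from collections import Counter

# Assumed path prefixes for message bus polling and topic timings.
BACKGROUND_PREFIXES = ("/message-bus/", "/topics/timings")


def bucket(path: str, status: int) -> str:
    """Map a request to one counter name, splitting background requests out first."""
    if path.startswith(BACKGROUND_PREFIXES):
        return "background"
    if 200 <= status < 600:
        return f"{status // 100}xx"
    return "other"


def tally(requests) -> Counter:
    """requests: iterable of (path, status) pairs."""
    return Counter(bucket(path, status) for path, status in requests)
```

Background requests are checked before the status code, so a successful message bus poll is counted as background and not as a 2xx.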
I find the general usage very interesting, but it’s not everyone’s cup of tea. I think this particular data can be an opt-in thing hidden behind a site setting. When diagnosing capacity issues we can enable it. The amount of traffic we are handling is quite staggering.
So to recap… for 1.2 we will only have 4 counters:
page views anon
page views logged in
page views crawler
page views total
Some performance dashboard should be queued for 1.3 and beyond. Aggressive monitoring of all traffic will be opt-in (it is already there, so easy to add for 1.2).
Complex analysis is going to need to come in from a plugin or Google Analytics, so I don’t really think we need to split up API requests from the rest of the pile right now, because the more segmentation of data we do, the harder the numbers become to consume and understand.
Thoughts?
cc @eviltrout (in particular about the page view definition - which we need to wire in to the transition spot using a magic ajax http header or something like that)
Server errors (50x) already show up in sitename/logs
40x errors are often just noise, like 404s and the like. Eventually we may aggregate the most egregious 404s, but not in the immediate future. Crawl stats are even more complex to analyze.
Keep in mind the logging is VERY lightweight. All we are allowed to do is increment a counter; we can’t really work backwards from that.
If you really want to dig into stuff, you always have the NGINX logs and Rails logs.
We might have to delay this one a bit; it’s quite tricky to intercept this right now. I did see that in a future Ember release (1.11) it will be simpler, so maybe we can revisit then.
(Also, intuitively I suspect users changing filters makes up a small percentage of logged views)
I know that this has been changed to “API requests” in 1.3b, which I’m currently running.
But - it would be really nice to have a link from that text (i.e. API requests) to some sort of description that explains what exactly this means in language that a non-software engineer might be able to understand.
By the way - one other question. What is the logic behind the date sequencing in the bar chart on the details page? I find it very confusing - see below:
I remember reading about the shouting WPEngine’s customers did a while back because their Google Analytics metrics were way different than WPEngine’s “visits” counter.
In short, they are pricing their tiers based on “visits per month”, counting them with separate IP hits per day (to actual content, not static assets), not pageviews.
This is a really easy metric to measure and I believe it would be a fair metric for your customers.
Perhaps this is the solution you’re looking for rather than using pageviews?
Wow this is super complicated. Check it out. From their page:
When a human being first arrives on the site and loads the page, staying there for 31 seconds, that’s a visit.
If that same human then clicks a link and sees another page, that’s not a new visit; that’s part of the same visit.
If that same human doesn’t have cookies or Javascript enabled, still all that should count as one visit.
If that same human loads the site with a different browser, that’s still not a new visit; that’s part of the same visit.
If that same human bookmarks the site, then 11 days later comes back to the site, that is a new visit.
When a robot loads the site (like a Google or Bing search bot), that’s a visit, but if one robot scans 100 pages quickly, that’s one visit. (You might disagree that a robot is a “visit,” but consider that from a hosting perspective, we still had to process and serve all those pages just like it was a human being, so from a cost or scaling perspective, bots count the same as humans.)
If a robot scans 20,000 pages over the course of a month, that’s not just one visit. It shouldn’t be 20,000 visits, but neither should it be 1. Something in the range of 100-1,000 visits is acceptable. This is because the same robot can come from different IP addresses.
There are additional cases too where the “right thing to do” is less clear. For example, take the case of a “quick bounce.” Suppose a human clicks a link to the site, then before the site has a chance to load the human clicks “back.” Does that count as a visit? Our servers still had to render and attempt to return the page, so in that sense “yes.” But a human didn’t see the site and Google Analytics isn’t going to see that hit, so in that sense “no.” Because we need the notion of a “visit” to correspond to “the amount of computing resources required to serve traffic,” we round off in favor of saying “yes.”
Definitely good food for thought. It concludes with...
We take the number of unique IP addresses seen in a 24-hour period as the number of “visits” to the site during that period. The number of “visits” in a given month is the sum of those daily visits during that month
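That concluding definition is simple enough to sketch in a few lines. This is only an illustration of the quoted rule, not WPEngine's implementation; the function name and the input shape (timestamp, IP pairs) are assumptions, and "24-hour period" is read here as a calendar day:

```python
from collections import defaultdict
from datetime import datetime


def monthly_visits(hits) -> int:
    """hits: iterable of (datetime, ip) pairs for content requests within one month."""
    ips_by_day = defaultdict(set)
    for timestamp, ip in hits:
        ips_by_day[timestamp.date()].add(ip)
    # Visits for the month = sum of the daily unique-IP counts.
    return sum(len(ips) for ips in ips_by_day.values())


hits = [
    (datetime(2015, 1, 1, 9, 0), "203.0.113.5"),
    (datetime(2015, 1, 1, 17, 30), "203.0.113.5"),  # same IP, same day: no new visit
    (datetime(2015, 1, 1, 18, 0), "203.0.113.9"),
    (datetime(2015, 1, 2, 8, 0), "203.0.113.5"),    # same IP, next day: new visit
]
visits = monthly_visits(hits)
```

Note that the same IP returning on a later day counts again, which is how a single robot can add up to many visits over a month.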
Warning! I got this reply from Jason Cohen (he’s a friend of the company)
Note on that, however: We do get blow-back from people who disagree with our definition of “visit,” to the point where we’re actively looking at changing our pricing to not depend on “visits.” So, beware!
So that simple definition of “visit” as
number of unique IP addresses seen in a 24-hour period as the number of “visits” to the site during that period