High number of 404's / 4xx (client error)

(Dean Taylor) #1

Noting a high number of 404’s since updating to a version with traffic stats.

Is there any way to view a list of 4xx’s to diagnose any potential problem?

(Sam Saffron) #2

You can look at your nginx logs / Rails logs to see the exact errors.

(Dean Taylor) #3

cat access.log.1 | cut -d '"' -f3 | cut -d ' ' -f2 | sort | uniq -c | sort -r
      8 404
   5358 200
      4 403
      2 302
      2 204
      1 502
    148 499
      1 201
    116 304
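
For what it's worth, the pipeline above sorts the counts as text rather than as numbers, which is why the rows come out in an odd order. A minimal sketch of a numerically sorted variant, assuming the default nginx `combined` log format (the function name `count_status` is made up for illustration):

```shell
# Count responses per HTTP status code, most frequent first.
# Assumes the nginx "combined" format, where the status code is the
# first token after the quoted request line.
count_status() {
  awk -F'"' '
    { split($3, f, " "); n[f[1]]++ }           # f[1] is the status code
    END { for (s in n) print n[s], s }         # emit "count status"
  ' "$1" | sort -k1,1nr -k2,2n                 # numeric sort, highest count first
}
```

Called as `count_status access.log.1`, it prints one `count status` pair per line. Request lines that themselves contain a double quote would throw off the field split, so treat the numbers as approximate.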

Can’t see the correlation in the nginx logs.

@sam Do you have a command I can run on the rails logs to get the numbers out?

(Dean Taylor) #4

I tried this:

grep -o -E "Completed 4[0-9]{2} " production.log|wc -l

Only a low 4. Quickly looking through the log, that looks right as a count from the grep.
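
To see which 4xx codes make up that count, a per-status breakdown might help. A minimal sketch, assuming the same `Completed <status> ...` log lines the grep above relies on; `rails_4xx_breakdown` is a hypothetical helper name:

```shell
# Break down "Completed 4xx" lines in a Rails log by status code.
rails_4xx_breakdown() {
  grep -oE 'Completed 4[0-9]{2} ' "$1" |
    awk '{ n[$2]++ } END { for (s in n) print n[s], s }' |   # $2 is the status code
    sort -k1,1nr -k2,2n                                      # highest count first
}
```

Usage would be `rails_4xx_breakdown production.log`, which prints one `count status` pair per line, or nothing if the log has no 4xx completions.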


(Apparently Archetype) #5

Hmm… 499 is a nonstandard HTTP status code used by nginx to mean “Client has closed the connection”.

At a SWAG, that would be users navigating your site faster than all page elements load, so their clients hang up connections for incompletely loaded elements they no longer want, and nginx logs each of those as a 499.

(Dean Taylor) #6

Sadly 148 is nowhere near 1417.

Here are some updated numbers:

cat access.log.1 | cut -d '"' -f3 | cut -d ' ' -f2 | sort | uniq -c | sort -r
      9 403
    777 304
    510 499
     28 404
     26 302
      2 204
  16626 200
      1 502
      1 400
      1 201

(Kane York) #7

Message bus connections are terminated when you navigate topics. Perhaps that’s the source of most of the 499s?
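
One way to check that guess against the nginx log is to group the requests for a given status by the first segment of their path. A rough sketch, assuming the `combined` log format and that message bus polls show up under a `/message-bus` path (an assumption, not something confirmed in this thread); `paths_for_status` is a hypothetical helper:

```shell
# For one status code, count requests grouped by first path segment.
# Usage: paths_for_status LOGFILE STATUS
paths_for_status() {
  awk -F'"' -v want="$2" '
    { split($3, f, " ") }                      # f[1] is the status code
    f[1] == want {
      split($2, r, " ")                        # r[2] is the request path
      split(r[2], seg, "/")                    # seg[2] is the first path segment
      n["/" seg[2]]++
    }
    END { for (p in n) print n[p], p }
  ' "$1" | sort -k1,1nr -k2,2                  # highest count first
}
```

Running `paths_for_status access.log.1 499` would show whether one path prefix dominates the 499s.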

(Jeff Atwood) #8

@sam we should stop tracking 499 if that is the case.

(Kane York) #9

But no, that’s the nginx logs. The web server never actually generates a 499.

(Sam Saffron) #10

I will, that should just go in the background bucket anyway

EDIT: Changed it so all message bus client errors go to the background bucket.

(Dean Taylor) #11

New day, new stats.

Since the original report there have been some code changes as noted by @sam.

Here is an updated stats report based on a fresh day.

####Access Logs
Note that the new command line explicitly filters on the date to make sure other dates are not included.

That wasn’t a problem last time, as I manually checked the logs, but this time the log contained a mix of two days.

cat access.log.1 | grep '13/Feb/2015' | cut -d '"' -f3 | cut -d ' ' -f2 | sort | uniq -c | sort -r
   8433 200
      6 403
     38 404
     21 302
    212 304
    201 499
      1 400
      1 301
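
Note that `grep '13/Feb/2015'` matches the date anywhere on the line, including inside a request path or referrer. A sketch that matches only the timestamp field, assuming the `combined` format where the timestamp is the fourth whitespace-separated field; `count_status_on_date` is a hypothetical name:

```shell
# Count status codes for requests logged on one day only.
# Usage: count_status_on_date LOGFILE DD/Mon/YYYY
count_status_on_date() {
  awk -v day="$2" '
    index($4, "[" day ":") == 1 {              # timestamp field starts with "[DD/Mon/YYYY:"
      split($0, q, "\"")                       # q[3] follows the quoted request line
      split(q[3], f, " ")                      # f[1] is the status code
      n[f[1]]++
    }
    END { for (s in n) print n[s], s }
  ' "$1" | sort -k1,1nr -k2,2n                 # highest count first
}
```

For example, `count_status_on_date access.log.1 13/Feb/2015`. The date is matched in whatever time zone nginx writes to the log, so it may not line up with the day boundaries the dashboard uses.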

####Dashboard Screen Grab
Note that I’m not including yesterday’s numbers, as the code change was made on that day and the numbers are obviously going to be wrong.

This assumes that the timestamps in the access logs match the times used by Discourse and that no time zone offsets are involved.

I’m thinking this might be a likely cause of discrepancies if not carefully considered.

####Discourse Version

Discourse 1.2.0.beta6 - https://github.com/discourse/discourse version 5f8e604abc4a99df267b2d4e6544678040ab1ea6

(system) #12

(Sam Saffron) #13

Where are we now on this? Any point keeping it open?