Monitoring http returns

Jeff Abrahamson
Monitoring http returns
April 11, 2018 06:30AM
I want to monitor nginx better: http returns (e.g., how many 500's, how
many 404's, how many 200's, etc.), as well as request rates, response
times, etc.  All the solutions I've found start with "set up something
to watch and parse your logs, then ..."

Here's one of the better examples of that:

https://www.scalyr.com/community/guides/how-to-monitor-nginx-the-essential-guide

Perhaps I'm wrong to find this curious.  It seems somewhat heavy and
inefficient to put this functionality into log watching, which means
running another service and being sensitive to any future change in log
format.

Is this, indeed, the recommended solution?

And, for my better understanding, can anyone explain why this makes more
sense than native nginx support of sending UDP packets to a monitor
collector (in our case, telegraf)?
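(The closest existing facility I know of is nginx's syslog target for
access_log, which already sends each log line as a UDP datagram; the
address, tag, and log format below are illustrative, not a recommendation:)

```nginx
# Requires nginx >= 1.7.1.  Each access-log line is sent as a UDP syslog
# message; telegraf (or any syslog listener) can receive and parse these.
log_format stats '$status $request_time $body_bytes_sent';
access_log  syslog:server=127.0.0.1:6514,tag=nginx,severity=info stats;
```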

--

Jeff Abrahamson
+33 6 24 40 01 57
+44 7920 594 255

http://p27.eu/jeff/

_______________________________________________
nginx mailing list
nginx@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx
Frank Liu
Re: Monitoring http returns
April 11, 2018 07:00AM
This module can get you started:
https://github.com/gfrankliu/nginx-http-reqstat
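A minimal configuration sketch, along the lines of the module's README
(directive names and values here are from memory and illustrative; check
the README for your version):

```nginx
http {
    # Shared-memory zone for per-server counters (name and size illustrative)
    req_status_zone server_stats "$server_name" 10M;
    req_status server_stats;

    server {
        listen 80;
        location /req-status {
            # Expose request/byte/status-class counters as plain text
            req_status_show;
        }
    }
}
```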

Peter Booth
Re: Monitoring http returns
April 11, 2018 07:20AM
Jeff,

There are some very good reasons for doing things in what sounds like a heavy, inefficient manner.

The first point is that there are some big differences between application code/business logic and monitoring code:

Business logic, i.e. what your nginx instance is doing, is what makes you money, and maximizing its uptime is critical.
Monitoring code typically has a different release cycle; often it is deployed in a tactical, reactive fashion.
By decoupling the monitoring from the application logic, you protect against the risk that your monitoring code
breaks your application, which would be a Bad Thing. The converse point is that your monitoring software is
most valuable precisely when your application is failing or overloaded. That's why it's a good thing if your
monitoring code doesn't depend on the health of your plant's infrastructure.

One example of a product that is in some ways comparable to nginx but did things the other way was the
early versions of IBM's WebSphere application server. Version 2 persisted all configuration settings as EJBs.
That meant there was no way to view a WebSphere instance's configuration when the app server
wasn't running. The product's designers were so hungry to drink their EJB Kool-Aid that they didn't stop to ask,
"Is this smart?" This is why, back in 1998, one could watch an IBM professional services consultant spend weeks
installing a WebSphere instance, or download and install WebLogic Server in 15 minutes oneself.

Tailing a log file doesn't sound sexy, but it's also pretty hard to mess up. I monitored a high-traffic email site with a
very short Ruby script that would tail an nginx log, pushing messages ten at a time as UDP datagrams to an InfluxDB instance.
The script would do its thing for 15 minutes and then die; cron ensured a new instance started every 15 minutes. It was
more efficient than a shell script because it didn't start new processes in a pipeline.
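In rough outline, the script did something like this (a Python sketch of the
idea, not the actual Ruby; the log-format regex, measurement name, and
InfluxDB UDP port are illustrative, not details from that setup):

```python
import re
import socket
import time

# Matches the common/combined access-log prefix: client, user, timestamp,
# request line, status code, and body bytes.
LOG_RE = re.compile(
    r'(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def to_line_protocol(line, measurement="nginx_access"):
    """Turn one access-log line into an InfluxDB line-protocol point,
    tagged by status code and method; returns None for unparsable lines."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    nbytes = 0 if m.group("bytes") == "-" else int(m.group("bytes"))
    return (f'{measurement},status={m.group("status")},'
            f'method={m.group("method")} bytes={nbytes}i requests=1i')

def tail_and_ship(path, host="127.0.0.1", port=8089, batch_size=10):
    """Follow the log file (like `tail -f`) and push points ten at a time
    as a single UDP datagram to an InfluxDB UDP listener."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    batch = []
    with open(path) as f:
        f.seek(0, 2)                 # start at the current end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)      # no new data yet; poll again
                continue
            point = to_line_protocol(line)
            if point is not None:
                batch.append(point)
            if len(batch) >= batch_size:
                sock.sendto("\n".join(batch).encode(), (host, port))
                batch = []
```

Run it under cron with a 15-minute lifetime and you get the same
crash-only, restart-cheap behavior described above.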

I like the Scalyr guide, but I disagree with their advice on active monitoring: I think it's smarter to use real user
requests to test whether servers are up. I have seen many high-profile sites that end up serving more synthetic requests
than real, customer-initiated requests.


Jeff Abrahamson
Re: Monitoring http returns
April 11, 2018 08:10AM
On Wed, Apr 11, 2018 at 01:17:14AM -0400, Peter Booth wrote:
> There are some very good reasons for doing things in what sounds
> like a heavy inefficient manner.

I suspected as much; thanks for the explanations.


> The first point is that there are some big differences between
> application code /business logic and monitoring code:
>
> [...]

Good summary; I agree with you.


> tailing a log file doesnt sound sexy, but its also pretty hard to
> mess it up. I monitored a high traffic email site with a very short
> Ruby script that would tail an nginx log, pushing messages ten at a
> time as UDP datagrams to an influxdb. The script would do its thing
> for 15 mins then die. cron ensured a new instance started every 15
> minutes. It was more efficient than a shell script because it didn't
> start new processes in a pipeline.

It's hard to mess up as long as you're not interested in
exactly-once. ;-)

The tail solution has the particularity that (1) it could miss events
if the short gap between process death and process start sees more
events than tail catches at startup, or if the log file rotates a few
seconds into that 15-minute period, and (2) it could duplicate events
if there are very few events in that period. Now, with telegraf/influx,
duplicates aren't a concern, because influx keys on time, and our site
is probably not getting so much traffic that a tail restart is a big
deal, although log rotation could lead to gaps we don't like.

Of course, this is why Logwatch was written...


> I like the scalar guide but I disagree with their advice on active
> monitoring I think its smarter to use real user requests to test if
> servers are up. i have seen many high profile sites that end up
> serving more synthetic requests than real customer initiated
> requests.

I'm not sure I understood what you meant by "active monitoring". I've
taken it to mean "sending HTTP queries to see whether they are handled
properly".

In that context: I think both submitting queries (from outside one's
own network) and passively watching stats on the service itself are
essential. Passively watching stats gives me information on internal
state, useful in itself but also when debugging problems. Active
monitoring from a different network can alert me to problems that may
not be specific to any one service, maybe even are at the network
level.

Of course, yes, active monitoring shouldn't be trying to DoS my
service. ;-)

Jeff Abrahamson
https://www.p27.eu/jeff/


Peter Booth
Re: Monitoring http returns
April 12, 2018 03:10AM
So, under the covers, things are rarely as pretty as one hopes. In the example I mentioned earlier, the InfluxDB instance was actually a pool of different pre-1.0 instances, each of which had different bugs or fixes. The log script actually pushed 15 minutes 30 seconds' worth of data each run, to overlap intentionally.

The most surprising observation was that substantially more than 50% of the web traffic was from bots, scrapers, test tools, and other nonhuman user agents (over 300 different signatures). If you accept as a given that sometimes there will be an overload situation in which users abandon carts, you then have to ask, "How much cash are we leaving on the table because of these nonhuman requests (which included more than a dozen different flavors of active testing)?"

There's a human-psychology element to this issue. People don't find it easy to think probabilistically, and accepting the inevitability of overload requires a certain amount of bravery that not all techies can muster. It's easier to act like a Dilbert character and say, "Anything less than 100% uptime is unacceptable."

Regarding active testing: if we have a shopper who is connecting via FiOS from their home in Minnesota and experiencing acceptable performance, what more do we learn from a Gomez, Pingdom, or Keynote request that originated from a data center in Minnesota? At least one of those three was colocated on the same VLANs as a large CDN vendor. The "good news" that the test tools reported was invariably more positive than the real customer experience, hence the big surge in interest in RUM (real user monitoring).

The challenge on a large web site is the vast number of parties who have a vested interest in the site being up, each of whom figures "one request a minute is no big deal." But the aggregate picture was ugly. Bad site structure will also cause Google, Bing, and other search engines to scrape in a pathological manner.


Sent from my iPhone

Peter Booth
Re: Monitoring http returns
April 12, 2018 03:10AM
Just to be clear, I'm not contrasting active synthetic testing with monitoring resource consumption. I think the highest-value variable is $, or whichever variables correlate most strongly with profit. The real customer experience is probably #2, after sales. Monitoring things like active connections, cache-hit ratios, etc. is important for understanding "what is normal?": it's easy for our mental model of how a site works to differ markedly from reality.

Sent from my iPhone

oscaretu
Re: Monitoring http returns
April 13, 2018 06:20PM
Perhaps this can be useful for you:
https://github.com/Lax/nginx-http-accounting-module

Kind regards,
Oscar

--
Oscar Fernandez Sierra
oscaretu@gmail.com
Liu Lantao
Re: Monitoring http returns
April 26, 2018 02:30PM
Author here :) Thanks, Oscar.

With the `accounting` module, metrics such as status codes, request rates,
and response times are logged. You can have it write to a local file or
(by default) forward them via syslog to a remote host/application.

Another way is to use the ELK stack, documented here (Chinese original, via
Google Translate):
https://translate.google.com/translate?u=http%3A%2F%2Fchenlinux.com%2F2014%2F02%2F19%2Fngx-accounting-to-logstash%2F
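Roughly, the configuration looks like this (an illustrative sketch; see the
README for the exact directives and defaults):

```nginx
http {
    # Enable accounting; counters are flushed periodically (interval in seconds)
    http_accounting on;
    http_accounting_interval 60;

    server {
        listen 80;
        http_accounting_id "accounting_default";

        location /api/ {
            # Per-location accounting id, so API traffic is reported separately
            http_accounting_id "accounting_api";
        }
    }
}
```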


--
Liu Lantao
EMAIL: liulantao ( at ) gmail ( dot ) com
WEBSITE: http://blog.liulantao.com