Identifying "Writing" connections in status stub

Posted by Vlad K. 
Vlad K.
Identifying "Writing" connections in status stub
July 26, 2017 12:20PM
Hello list,

I'm graphing information from the nginx status page, and have noticed
something odd. The "Writing" connections are flat over time, not
correlated to the Active/Reading/Waiting connections and are steadily
increasing over time. Example for the past week:

https://pasteboard.co/GCHKB3B.png

The drops are where I restarted (not reloaded) the service; the count
starts growing again after a short while. This server runs FreeBSD, but
I've noticed the same thing on Debian.

Is there a way to find out which connections are these, which remote IPs
they are so I can track them with netstat or sockstat? This looks to me
like connection FD or something has been leaking. If this is a bug, I'm
not sure what to report.

I've also noticed, from time to time, connections lingering for a long
time in the "CLOSED" state (as reported by netstat). Googling for that
suggests a bug in the application, where it doesn't release the FD
after the remote end has closed.


Thanks.


--
Vlad K.
_______________________________________________
nginx mailing list
nginx@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx
Peter Booth
Re: Identifying "Writing" connections in status stub
July 26, 2017 01:40PM
Vlad,

I'd suggest beginning by checking whether or not this is real. If you create a cron job that runs netstat -ant every hour, then summarize the connections and either view them manually or write them into InfluxDB and graph them with Grafana, you will see whether the number of TCP connections really is growing and, if so, which connections are growing.

That would seem like a useful first step.
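
A minimal sketch of the hourly summary described above, assuming Python is available on the host. It counts TCP connections per state from `netstat -an` style text. The function name `count_tcp_states` and the sample lines are illustrative only, and the layout (protocol in the first column, state in the last) is an assumption that matches typical FreeBSD and Linux output.

```python
from collections import Counter

def count_tcp_states(netstat_text):
    """Count TCP connections per state in `netstat -an` style output.

    Assumes each TCP line starts with "tcp" and ends with the state name.
    """
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and non-TCP lines; a TCP line has at least 6 columns.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[-1]] += 1
    return states

# Hypothetical sample, not taken from the thread.
sample = """\
Proto Recv-Q Send-Q Local Address      Foreign Address      (state)
tcp4       0      0 10.0.0.4.80        203.0.113.7.51234    ESTABLISHED
tcp4       0      0 10.0.0.4.80        203.0.113.9.40022    ESTABLISHED
tcp4       0      0 10.0.0.4.80        198.51.100.3.60111   CLOSED
tcp4       0      0 *.80               *.*                  LISTEN
"""
summary = count_tcp_states(sample)
print(dict(summary))
```

From cron, the same function could be fed `sys.stdin.read()` with `netstat -an` piped in, and the counts appended to a file together with a timestamp.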

Peter

On 2017-07-26 13:36, Peter Booth wrote:
> Vlad,
>
> I'd suggest beginning by seeing whether or not this is real. If you
> create a cron job that invokes netstat -ant every hour, then summarize
> the connections and either view them manually or write them into an
> influxdb and graph with grafana you will see whether or not the #tcp
> connections really is growing and, if so, which connections are
> growing.

Thanks for the suggestion, but with its slow progression and low
signal-to-noise ratio in comparison with the daily and weekly
connection cycle, I'm not sure it would be practically possible to
measure it like that. But your suggestion gave me another idea, to
record IPs of established tcp conns every 5 minutes and then see which
ones remain constant.

But are you suggesting that nginx status is reporting wrong/fake
numbers? What do you mean by "real"?

And what about connections that are staying in "CLOSED" state until I
restart or reload nginx?




--
Vlad K.
Maxim Dounin
Re: Identifying "Writing" connections in status stub
July 30, 2017 01:50AM
Hello!

On Wed, Jul 26, 2017 at 06:05:28PM +0200, Vlad K. wrote:

> And what about connections that are staying in "CLOSED" state until I
> restart or reload nginx?

Connections in "CLOSED" state are likely leaked sockets. On a
reload nginx should write appropriate alerts to the error log, saying
something like "open socket #<fd> left in connection <n>". These
alerts are logged with the corresponding connection numbers, and details
of a particular connection can be traced through the debug log.
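
As a rough sketch of how one might collect those alerts after a reload, the snippet below scans error-log lines for the message text quoted above. Only the "open socket #<fd> left in connection <n>" wording comes from this message; the helper name `find_leaked_sockets` and the sample log lines (timestamps, pids, the `*N` prefix) are hypothetical.

```python
import re

# Matches the alert text quoted above; the rest of the log-line layout
# (timestamp, level, pid) is not relied upon.
LEAK_RE = re.compile(r"open socket #(\d+) left in connection (\d+)")

def find_leaked_sockets(log_lines):
    """Return (fd, connection_slot) pairs for every leak alert found."""
    leaks = []
    for line in log_lines:
        match = LEAK_RE.search(line)
        if match:
            leaks.append((int(match.group(1)), int(match.group(2))))
    return leaks

# Hypothetical error-log excerpt for illustration only.
sample_log = [
    "2017/07/30 02:00:01 [alert] 1234#0: *5678 open socket #17 left in connection 42",
    "2017/07/30 02:00:01 [notice] 1234#0: exiting",
    "2017/07/30 02:00:01 [alert] 1234#0: *5701 open socket #23 left in connection 57",
]
leaks = find_leaked_sockets(sample_log)
print(leaks)
```

In practice the input would be the lines of the real error log, for example `open(path)` on whatever file the `error_log` directive points to.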

It might not be trivial to debug such socket leaks though, and
before doing anything else it is in general a good idea to:

- make sure you are using latest nginx version, and

- the problem is not in a 3rd party module (that is, you can
reproduce it without 3rd party modules).

--
Maxim Dounin
http://nginx.org/
On 2017-07-30 01:47, Maxim Dounin wrote:
>
> It might not be trivial to debug such socket leaks though, and
> before doing anything else it is in general a good idea to:
>
> - make sure you are using latest nginx version, and
>
> - the problem is not in a 3rd party module (that is, you can
> reproduce it without 3rd party modules).

It's latest stable, 1.12.1 on FreeBSD.

Unfortunately I can't remove 3rd party modules as this is production. I
have no idea how to try to replicate that in testing.

But thanks for your reply.



--
Vlad K.
Peter Booth
Re: Identifying "Writing" connections in status stub
July 30, 2017 11:20AM
Vlad,

You might not need to replicate it; you have it happening in production in front of you.
Some questions:

1. When is the last time that your production nginx was restarted?
2. Do you have regular restarts?
3. Is there an obstacle to restarting at some point?
4. Is this a single instance or do you have multiple nginx hosts?
5. What 3rd party modules are you using?
6. Is the website in question an enterprise app or something that is internet visible?

Maxim’s hypothesis of leaking sockets from third party plugin is the simplest, most likely explanation for what you report.

I start from a position of trusting nothing. If you can capture the output of lsof -i :80, netstat -ant | grep tcp, or a
similar ss command, you can know for certain whether your visualization is “telling the truth”.
Certainly the line labeled “Writing” looks unusual. Do you know of any site events that might have caused the minimum on
23 July, the spike on the 24th, and the step up on 25 July?

Peter




Peter Booth
Re: Identifying "Writing" connections in status stub
July 30, 2017 11:30AM
I just reread the thread and realized that you answered q2, and that makes the graph even more
surprising. You say that it's on FreeBSD. Does this mean that you don’t have /proc available to you?
Is there a procstat or other way to see the equivalent of /proc/<pid>/fd, a list of all open file descriptors for a specific pid?



On 2017-07-30 11:26, Peter Booth wrote:
> I just reread the thread and realized that you answered q2, and that
> makes the graph even more
> surprising. You say that it's on FreeBSD. Does this mean that you
> don’t have /proc available to you?
> Is there a procstat or other way to see the equivalent of
> /proc/<pid>/fd, a list of all open file descriptors for a specific
> pid?

procfs is technically available but I'm not enabling it. We do have
sockstat and fstat though that basically can show the same. I have
checked and I don't see any corresponding open connections, that's why I
was wondering about how to list what nginx thinks are connections being
Written to, so I could find what those connections did last in the
access log.

I ran a periodic netstat, sorted the IPs (not removing src ports), and
ran a diff from previous run, keeping only those that aren't +/-. Over
some period of time, the only connections that stayed open for long were
actively using the roundcube webmail which (especially with our
keepalive and http2 enabled) runs a request every minute. But the number
of those did not correspond to the number reported as Writing. Plus
these connections have daily highs and downs. "Writing" in the nginx
status does not correlate with the rest.
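
A minimal sketch of the snapshot comparison described above, written in Python rather than with sort and diff. It assumes each snapshot has already been reduced to a set of remote "ip.port" strings for ESTABLISHED connections. The function name `persistent_connections` and the addresses are made up for illustration.

```python
def persistent_connections(snapshots):
    """Return the remote endpoints present in every snapshot.

    Each snapshot is a set of "ip.port" strings taken from the foreign
    address column of netstat for ESTABLISHED connections.
    """
    if not snapshots:
        return set()
    common = set(snapshots[0])
    for snap in snapshots[1:]:
        common &= set(snap)  # keep only endpoints seen again
    return common

# Hypothetical snapshots taken a few minutes apart.
snapshots = [
    {"203.0.113.7.51234", "203.0.113.9.40022", "198.51.100.3.60111"},
    {"203.0.113.7.51234", "198.51.100.3.60111", "192.0.2.44.33010"},
    {"203.0.113.7.51234", "192.0.2.44.33010"},
]
stuck = persistent_connections(snapshots)
print(sorted(stuck))
```

Keeping the source port in the key, as in the netstat runs above, distinguishes one long-lived connection from a client that simply reconnects often.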


> 1. When is the last time that your production nginx was restarted?

In my first post in this thread is a link to the graph (by Munin) where
it shows a dip in Writing, that's where I restarted. Reloading doesn't
change the number of Writing reported by nginx.


> 2. Do you have regular restarts?

No, just regular reloads.


> 3. Is there an obstacle to restarting at some point?

If you're asking me if I can restart nginx to check something, I can do
that.


> 4. Is this a single instance or do you have multiple nginx hosts?

Single instance, one master and one worker process, in a jail, and
there's no other jail running nginx on the server.


> 5. What 3rd party models are you using?

These are the options/modules ENABLED for the port:

DSO=on: Enable dynamic modules support
FILE_AIO=on: Enable file aio
HTTP=on: Enable HTTP module
HTTPV2=on: Enable HTTP/2 protocol support (SSL req.)
HTTP_ADDITION=on: Enable http_addition module
HTTP_AUTH_REQ=on: Enable http_auth_request module
HTTP_CACHE=on: Enable http_cache module
HTTP_DAV=on: Enable http_webdav module
HTTP_FLV=on: Enable http_flv module
HTTP_GUNZIP_FILTER=on: Enable http_gunzip_filter module
HTTP_GZIP_STATIC=on: Enable http_gzip_static module
HTTP_MP4=on: Enable http_mp4 module
HTTP_RANDOM_INDEX=on: Enable http_random_index module
HTTP_REALIP=on: Enable http_realip module
HTTP_REWRITE=on: Enable http_rewrite module
HTTP_SECURE_LINK=on: Enable http_secure_link module
HTTP_SLICE=on: Enable http_slice module
HTTP_SSL=on: Enable http_ssl module
HTTP_STATUS=on: Enable http_stub_status module
HTTP_SUB=on: Enable http_sub module
IPV6=on: Enable IPv6 support
MAIL=on: Enable IMAP4/POP3/SMTP proxy module
MAIL_IMAP=on: Enable IMAP4 proxy module
MAIL_POP3=on: Enable POP3 proxy module
MAIL_SMTP=on: Enable SMTP proxy module
MAIL_SSL=on: Enable mail_ssl module
STREAM=on: Enable stream module
STREAM_SSL=on: Enable stream_ssl module (SSL req.)
STREAM_SSL_PREREAD=on: Enable stream_ssl_preread module (SSL req.)
THREADS=on: Enable threads support
WWW=on: Enable html sample files

Which is all default for the port, except I also enabled MAIL_* ones as
I'll be needing some mail proxying. But at the moment I have no mail {}
blocks defined. Looking at these I guess I could trim down defaults. Who
needs FLV nowadays :)

I'm not sure which are or aren't 3rd party, but if the descriptions are
fully correct, then it looks like I'm not using any "3rd party" ones,
because the port has options that explicitly state when a module is "3rd
party" (the part I'm not sure about is whether all of them are labeled as such).

However, it's also compiled with DSO=on, and with the above options, it
produces these:

/usr/local/libexec/nginx/ngx_mail_module.so
/usr/local/libexec/nginx/ngx_stream_module.so

None of which I've loaded at the moment.


> 6. Is the website in question an enterprise app or something that is
> internet visible?

The nginx jail is serving numerous PHP sites and a Python web app, each
in their own jails. Using fastcgi for php-fpm and uwsgi for python, all
over tcp connections between jails.



--
Vlad K.
Peter Booth
Re: Identifying "Writing" connections in status stub
July 30, 2017 01:40PM
See below


> On Jul 30, 2017, at 6:12 AM, Vlad K. <nginx-ml@acheronmedia.hr> wrote:
>
> On 2017-07-30 11:26, Peter Booth wrote:
>> I just reread the thread and realized that you answered q2, and that
>> makes the graph even more
>> surprising. You say that it's on FreeBSD. Does this mean that you
>> don’t have /proc available to you?
>> Is there a procstat or other way to see the equivalent of
>> /proc/<pid>/fd, a list of all open file descriptors for a specific
>> pid?
>
> procfs is technically available but I'm not enabling it. We do have sockstat and fstat though that basically can show the same. I have checked and I don't see any corresponding open connections, that's why I was wondering about how to list what nginx thinks are connections being Written to, so I could find what those connections did last in the access log.

It appears that you have a lot of data that could help in this analysis.
How frequently is the status page being queried? Does every status datapoint get recorded
or is munin showing some rolled up rrd data?



>
> I ran a periodic netstat, sorted the IPs (not removing src ports), and ran a diff from previous run, keeping only those that aren't +/-. Over some period of time, the only connections that stayed open for long were actively using the roundcube webmail which (especially with our keepalive and http2 enabled) runs a request every minute. But the number of those did not correspond to the number reported as Writing. Plus these connections have daily highs and downs. "Writing" in the nginx status does not correlate with the rest.

If you open the status page in a browser do the numbers report match what you see with netstat?


>
>
>> 1. When is the last time that your production nginx was restarted?
>
> In my first post in this thread is a link to the graph (by Munin) where it shows a dip in Writing, that's where I restarted. Reloading doesn't change the number of Writing reported by nginx.

That's what I thought. I think that the graph looks weird. Over the time interval [18 July, 23 July] the time series
labelled “writing” connections increases almost monotonically from approx 5 to 15. Then nginx is restarted
and the graph jumps back up to about 12/13 and continues rising again. Do you have a hypothesis that explains
why the graph could jump back to 12/13, rather than spend a few days increasing linearly in the way it did from
the 18th to the 23rd?

How long was nginx down for? If you graph only the “writing” variable for just 23rd July does the length of
time that the # of writing connections is thought to be 0 make sense?

I wonder whether what you are seeing could be a side-effect of the server being in a FreeBSD jail?
I haven’t used FreeBSD. My understanding is that FreeBSD jails are more than just chroots, similar to Solaris Zones
or OpenVZ on Linux. Does each jail have separate, independent sysctl settings/ulimits?
Do any of the other nginx sites in other jails exhibit the same behavior?
In FreeBSD jails is there an equivalent of Dom0 in a Xen hypervisor? A parent or root OS?
If so, do you see all connections on all jails when you log into it? I'm wondering if you are hitting some ulimit or
resource shortage on the host as a whole.

On 2017-07-30 13:30, Peter Booth wrote:
>
> It appears that you have a lot of data that could help in this
> analysis.
> How frequently is the status page being queried? Does every status
> datapoint get recorded
> or is munin showing some rolled up rrd data?

The nginx status page is queried every 5 minutes (default Munin polling
time), and it stores raw metrics into an RRD database. But Munin is not
important in this issue. I get the same values if I query the status page
directly.


> If you open the status page in a browser do the numbers report match
> what you see with netstat?

Waiting does:

# netstat -n | grep -E "tcp4|tcp6" | grep ESTABLISHED | wc -l \
&& echo "----------------------------" \
&& fetch -qo - http://10.0.0.4/nginx_status

82
----------------------------
Active connections: 89
server accepts handled requests
669843 669843 3158515
Reading: 0 Writing: 22 Waiting: 82

And I ran it a few times with several minutes in between, the above is
just an example from the last run. This is inside the nginx jail, so
grepping tcp4|tcp6 shows only connections to the nginx server.

Now, the part I don't quite understand is whether Active = Reading +
Writing + Waiting. The above certainly doesn't seem to suggest so.
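
To make the mismatch concrete, here is a small sketch that parses the status output shown above and compares Active with the sum of the three states. The helper name `parse_stub_status` is made up for this example, and reading the surplus as connections counted in Writing that no longer exist is only a hypothesis, not something the numbers prove.

```python
import re

def parse_stub_status(text):
    """Extract the connection counters from stub_status output into a dict."""
    active = int(re.search(r"Active connections:\s*(\d+)", text).group(1))
    reading, writing, waiting = (
        int(n) for n in re.search(
            r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text
        ).groups()
    )
    return {"active": active, "reading": reading,
            "writing": writing, "waiting": waiting}

# Output copied from the run shown above.
status_text = """\
Active connections: 89
server accepts handled requests
669843 669843 3158515
Reading: 0 Writing: 22 Waiting: 82
"""
status = parse_stub_status(status_text)
# How far the three states overshoot the Active counter.
excess = (status["reading"] + status["writing"] + status["waiting"]
          - status["active"])
print(status, "excess:", excess)
```

For this sample the three states add up to 104 against 89 active connections, a surplus of 15.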



> Do you have a hypothesis that explains
> why the graph could jump back to 12/13, rather than spend a few days
> increasing linearly in the way it did from
> the 18th to the 23rd?

Bots crawling the sites, pacing themselves over a longer time frame so
there's no correlation to daily sinusoid caused by live visitors. We do
have a lot of resources on all those sites to crawl through. They're all
real estate agency sites, and there are tens of thousands of pages with
hundreds of thousands of images. And looking at the logs, quite a number
of requests from bots (that are decent enough to say they're bots).

We've deviated a bit into assuming this is a bug or some unexpected
behavior (my fault for suggesting it in the beginning). That's why all I
wanted to do was to check which IPs are those that nginx considers
"Writing" to. The only reason this caught my attention was the apparently
"flat" appearance of Writing, but now thinking about bots, this could be
quite normal.


> How long was nginx down for? If you graph only the “writing”
> variable for just 23rd July does the length of
> time that the # of writing connections is thought to be 0 make sense?

It was only restarted. It appears the "offending" connections started
showing up less than an hour later.


>
> I wonder whether what you are seeing could be a side-effect of the
> server being in a FreeBSD jail?

I doubt it. I used to see this when the server was on Debian Jessie, but
it was much less noticeable. Then again, back then we had much less
traffic and much less content.



> Do any of the other nginx sites in other jails exhibit the same
> behavior?

There is only one instance of nginx running on the server. Individual
sites are only running php-fpm or uwsgi in their jails.


> In FreeBSD jails is there an equivalent of Dom0 in a Xen hypervisor? A
> parent or root OS?

FreeBSD jails are OS-level virtualization. It's basically similar to
containers on Linux but with more isolation (it's not just namespacing).


> If so, do you see all connections on all jails when you log into it? I'm
> wondering if you are hitting some ulimit or
> resource shortage on the host as a whole?

I don't think it's that, as limits are far above the current demands for
traffic, and there's nothing logged about potential resource exhaustion.



Thanks for helping me figure this out.


--
Vlad K.
Peter Booth
Re: Identifying "Writing" connections in status stub
July 30, 2017 08:10PM
During a busier part of the day, what are your minimum, median, 99th percentile, and max requests per sec?

> On Jul 30, 2017, at 9:31 AM, Vlad K. <nginx-ml@acheronmedia.hr> wrote:
>
>
>> If you open the status page in a browser do the numbers report match
>> what you see with netstat?
>
> Waiting does:
>
> # netstat -n | grep -E "tcp4|tcp6" | grep ESTABLISHED | wc -l \
> && echo "----------------------------" \
> && fetch -qo - http://10.0.0.4/nginx_status
>
> 82
> ----------------------------
> Active connections: 89
> server accepts handled requests
> 669843 669843 3158515
> Reading: 0 Writing: 22 Waiting: 82
>
> And I ran it a few times with several minutes in between, the above is just an example from the last run. This is inside the nginx jail, so grepping tcp4|tcp6 shows only connections to the nginx server.
>
> Now, the part I don't quite understand is whether Active = Reading + Writing + Waiting. The above certainly doesn't seem to suggest so.


So when you look at two different documentation pages explaining the status page,
they both show that Active = Reading + Writing + Waiting:

https://www.cyberciti.biz/faq/nginx-enable-and-see-current-status-page/
https://www.keycdn.com/support/nginx-status/


I think that suggests that, in your environment, Writing = “really writing” + “leaked sockets that nginx thinks are writing”.

I’m pretty confident that this is a bug, because of the shape of the graph. There's no obvious healthy explanation for the
number of writing connections to increase over days and return to its current value after a restart.
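
One way to put a number on "the shape of the graph" is to look at the daily minimum of Writing: if stale connections are never released, that floor should not go down between restarts. The sketch below assumes exactly that. The function names and the sample values are hypothetical and are not the data behind the graph, and a rising floor could also come from genuine traffic growth.

```python
from collections import defaultdict

def daily_minimums(samples):
    """Map each day label to the smallest Writing value seen that day."""
    lows = defaultdict(lambda: float("inf"))
    for day, writing in samples:
        lows[day] = min(lows[day], writing)
    return dict(sorted(lows.items()))

def floor_is_rising(lows):
    """True if the daily minimum never decreases from one day to the next."""
    values = list(lows.values())
    return all(a <= b for a, b in zip(values, values[1:]))

# Hypothetical (day, Writing) samples for illustration only.
samples = [
    ("2017-07-18", 5), ("2017-07-18", 8),
    ("2017-07-19", 7), ("2017-07-19", 11),
    ("2017-07-20", 9), ("2017-07-20", 12),
]
lows = daily_minimums(samples)
print(lows, floor_is_rising(lows))
```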


>
>
>
>> Do you have a hypothesis that explains
>> why the graph could jump back to 12/13, rather than spend a few days
>> increasing linearly in the way it did from
>> the 18th to the 23rd?
>
> Bots crawling the sites, pacing themselves over a longer time frame so there's no correlation to daily sinusoid caused by live visitors. We do have a lot of resources on all those sites to crawl through. They're all real estate agency sites, and there are tens of thousands of pages with hundreds of thousands of images. And looking at the logs, quite a number of requests from bots (that are decent enough to say they're bots).


The last public site I worked on had approx 40% of requests from bots or spiders (including our own active testing), and only half of the 400 user agents that weren’t interactive browsers actually identified themselves. Many pretended to be browsers, and might have been scripted browsers, but were easy to identify because of the pattern of the URLs they requested.


On 2017-07-30 20:03, Peter Booth wrote:
> During a busier part of the day, what is your minimum, median,99%,
> max requests per sec?


Image's worth a thousand words :)

https://pasteboard.co/GDs6JSz.png

The peaks between days are API clients syncing data; they usually do it
at night (the clients are all in the same timezone).


> I’m pretty confident that this is a bug, because of the shape of the
> graph. There's no obvious healthy explanation for the
> number of writing connections to increase over days and return to its
> current value after a restart.


I restarted nginx yesterday, and since then Writing has stayed
down at 1-3 for the same level of traffic as usual. I suppose that's not
bots then...



--
Vlad K.