One upstream connection blocks another upstream connection

Posted by Anonymous User, March 13, 2018 01:40PM
_______________________________________________
nginx mailing list
nginx@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx
Hello!

On Tue, Mar 13, 2018 at 12:39:05PM +0000, lje@napatech.com wrote:

> I'm investigating a problem where I get delays in requests that are
> proxied to an upstream service.
> Looking at the debug log, it seems that one request is
> blocking the worker so it can't complete other requests.
>
> If you look at the log below, request *352 is handled by worker
> 18728. This worker then starts to process request *360, but for
> some reason it is blocked from 13:53:53 until 13:54:03. How can that
> happen?
>
> The upstream service for request *360 doesn't send any data in
> the time interval 13:53:53-13:54:03.
> But the upstream service for request *352 responds nearly
> immediately.
> (I have examined the communication between nginx and the
> upstream in wireshark)
>
> Another observation: in the time interval 13:53:53-13:54:03 the
> worker process seems to be in state D (uninterruptible sleep).
>
> So my question is: what can block worker 18728 so that it doesn't
> complete request *352?
>
> OS: Red Hat 7.4
> nginx: 1.12.2
>
> Hopefully I have provided enough details.
>
> 2018/03/09 13:53:40 [debug] 18726#0: *249 input buf #95517
> 2018/03/09 13:53:32 [debug] 18728#0: *189 pipe temp offset: 588029952
> 2018/03/09 13:53:40 [debug] 18726#0: *249 input buf #95518
> 2018/03/09 13:53:32 [debug] 18728#0: *189 readv: eof:0, avail:1
> 2018/03/09 13:53:40 [debug] 18726#0: *249 pipe offset: 368738304
> 2018/03/09 13:53:32 [debug] 18728#0: *189 readv: 2, last:4096
> 2018/03/09 13:53:40 [debug] 18726#0: *249 pipe buf ls:1 0000556CF522BAF0, pos 0000556CF522BAF0, size: 4096
> 2018/03/09 13:53:40 [debug] 18726#0: *249 pipe buf ls:1 0000556CF5233250, pos 0000556CF5233250, size: 4096
> 2018/03/09 13:53:32 [debug] 18728#0: *189 pipe recv chain: 8192
> 2018/03/09 13:53:40 [debug] 18726#0: *249 pipe buf ls:1 0000556CF522EB20, pos 0000556CF522EB20, size: 4096
> 2018/03/09 13:53:32 [debug] 18728#0: *189 input buf #144060
> 2018/03/09 13:53:40 [debug] 18726#0: *249 size: 8192
> 2018/03/09 13:53:32 [debug] 18728#0: *189 input buf #144061
> 2018/03/09 13:53:40 [debug] 18726#0: *249 writev: 22, 8192, 368738304
> 2018/03/09 13:53:32 [debug] 18728#0: *189 pipe offset: 588029952
> 2018/03/09 13:53:32 [debug] 18728#0: *189 pipe buf ls:1 0000556CF5220480, pos 0000556CF5220480, size: 4096
> 2018/03/09 13:53:32 [debug] 18728#0: *189 pipe buf ls:1 0000556CF5233250, pos 0000556CF5233250, size: 4096
> 2018/03/09 13:53:40 [debug] 18726#0: *249 pipe temp offset: 368746496
> 2018/03/09 13:53:32 [debug] 18728#0: *189 pipe buf ls:1 0000556CF522CAF0, pos 0000556CF522CAF0, size: 4096
> 2018/03/09 13:53:53 [debug] 18729#0: *352 writev: 418 of 418
> 2018/03/09 13:53:40 [debug] 18726#0: *249 readv: eof:0, avail:1
> 2018/03/09 13:53:32 [debug] 18728#0: *189 size: 8192
> 2018/03/09 13:53:53 [debug] 18729#0: *352 chain writer out: 0000000000000000
> 2018/03/09 13:53:40 [debug] 18726#0: *249 readv: 2, last:4096
> 2018/03/09 13:53:53 [debug] 18729#0: *352 event timer del: 14: 1520603638051
> 2018/03/09 13:53:32 [debug] 18728#0: *189 writev: 18, 8192, 588029952
> 2018/03/09 13:53:53 [debug] 18729#0: *352 event timer add: 14: 86400000:1520690033051

The mix of different times in the logs suggests that workers are
blocked for a long time doing something (and hence the cached time in
some worker processes is not updated for a long time).

Reasons can be different, and more information/logs are needed to
say anything for sure. In this particular case my best guess is
that your backend server is much faster than the disk you use for
proxy_temp_path, and so nginx loops buffering a response to disk
for a long time. For example, the response in *189 already
buffered about 600M, and there is no indication in the log lines
quoted that it stopped reading from the upstream somewhere. At
the same time the process thinks current time is 13:53:32, which
is 21 seconds behind 13:53:53 as logged by pid 18729 at the same
time.
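As an aside, this kind of per-worker time skew can be spotted mechanically by comparing the newest timestamp each PID logged against the newest timestamp in the whole log. A minimal sketch (the timestamp layout assumed here is the standard nginx error-log format; the 5-second threshold is an arbitrary choice, not from the thread):

```python
# Sketch: flag worker PIDs whose cached time lags behind the rest
# of an nginx debug log. Assumed line format:
#   2018/03/09 13:53:32 [debug] 18728#0: *189 ...
import re
from datetime import datetime

LINE = re.compile(r'^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[\w+\] (\d+)#')

def lagging_pids(lines, threshold=5):
    """Return {pid: lag_seconds} for PIDs whose newest timestamp trails
    the newest timestamp anywhere in the log by more than `threshold`."""
    latest = {}
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), '%Y/%m/%d %H:%M:%S')
        pid = int(m.group(2))
        if pid not in latest or ts > latest[pid]:
            latest[pid] = ts
    if not latest:
        return {}
    newest = max(latest.values())
    return {pid: (newest - t).total_seconds()
            for pid, t in latest.items()
            if (newest - t).total_seconds() > threshold}
```

Run against the log excerpt above, this would report pid 18728 as lagging 21 seconds behind pid 18729, matching the diagnosis.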

An obvious workaround would be to disable or limit disk buffering,
e.g. with "proxy_max_temp_file_size 0;". Additionally, using larger
memory buffers (proxy_buffer_size, proxy_buffers) might help to avoid
such monopolization of a worker process.

See also https://trac.nginx.org/nginx/ticket/1431 for a detailed
explanation of a similar problem as observed with websocket
proxying.

[...]

--
Maxim Dounin
http://mdounin.ru/
> The mix of different times in the logs suggests that workers are
> blocked for a long time doing something (and hence the cached time
> in some worker processes is not updated for a long time).
>
> Reasons can be different, and more information/logs are needed to
> say anything for sure. In this particular case my best guess is
> that your backend server is much faster than the disk you use for
> proxy_temp_path, and so nginx loops buffering a response to disk
> for a long time. For example, the response in *189 already
> buffered about 600M, and there is no indication in the log lines
> quoted that it stopped reading from the upstream somewhere.
>
> At the same time the process thinks current time is 13:53:32, which
> is 21 seconds behind 13:53:53 as logged by pid 18729 at the same
> time.
>
> An obvious workaround would be to disable or limit disk buffering,
> e.g. with "proxy_max_temp_file_size 0;". Additionally, using larger
> memory buffers (proxy_buffer_size, proxy_buffers) might help to
> avoid such monopolization of a worker process.
>
> See also https://trac.nginx.org/nginx/ticket/1431 for a detailed
> explanation of a similar problem as observed with websocket
> proxying.
>
> Maxim Dounin

This seemed to be the problem. The upstream was delivering data faster
than the temp file could be written. I tried increasing the buffers,
but that didn't help. I disabled the temp file completely and the
problem disappeared. I no longer see this monopolization of the worker
process.


Thanks for the help Maxim.


Best regards

Lars