Welcome! Log In Create A New Profile

Advanced

Unexpected connection timeouts to a backend server

Posted by Jean-Baptiste Quenot 
Jean-Baptiste Quenot
Unexpected connection timeouts to a backend server
September 04, 2009 11:20AM
Dear all,

I'm using haproxy 1.3.15.9 on Ubuntu.

I have a setup with two web servers. One is hosting haproxy and
nginx, it's called s1, and the other one only hosts nginx, and is
called s2. I notice several thousands times a day that the connection
timeout is reached when haproxy on s1 tries to reach nginx on s2.
This is completely unexpected because there are no network problems
between the two. I checked with "ab -n 1000 -c 100" there are no
failed requests. I checked with tcpdump, there are no RST flags
involved. The two servers are located at the same hosting provider,
so I don't believe in network problems anyway. There are no issue at
all on s1 between haproxy and nginx on the same host, only between s1
and s2.

I'm also sure the maxconn setting for this backend is not reached, it
is set to a high value 2000, stats report that max sessions ever seen
is about 200, and the timeouts also happen when there is no other
connection on haproxy.

Other settings involved, stated in the defaults section:

contimeout 5000
retries 3
redispatch

Attached is an example of offending log entries. I notice 1 or 2
retries, never more, but it affects response time because the request
either takes 5 seconds (contimeout) or 10 seconds (2 * contimeout) to
complete.

Note that there are actually two backends configured with the same
server host and port, but with different httpchk settings:

backend nginx_app
option httpchk OPTIONS /check HTTP/1.1\r\nHost:\ site.com

server s1 s1.site.com:83 check inter 10s
server s2 s2.site.com:83 check inter 10s

backend nginx
option httpchk HEAD /site.css HTTP/1.1\r\nHost:\ data.nomao.com

server s1 s1.site.com:83 check inter 10s
server s2 s2.site.com:83 check inter 10s

Last but not least, I have an ufw firewall on s1 and s2 that protect
port 83 from the outside. Only s1 is allowed to connect to port 83 on
s2.

Tell me if you need more information, I'll be glad to provide some.
This is a critical issue for me, as users may go away from my site
because of the response time :S

Thanks in advance,
--
Jean-Baptiste Quenot
http://jbq.caraldi.com/
Sep 4 10:49:24 s1 haproxy[19977]: 127.0.0.1:35243 [04/Sep/2009:10:49:19.370] frontend nginx/s2 0/5000/0/2/5003 200 21668 - - CD-- 25/15/1/1/1 0/0 {data.site.com} "GET /js/map.js?232be2f HTTP/1.1"
Sep 4 10:49:25 s1 haproxy[19977]: 127.0.0.1:35371 [04/Sep/2009:10:49:20.069] frontend nginx/s2 0/5005/4/0/5009 200 2128 - - ---- 21/12/0/0/1 0/0 {data.site.com} "GET /img/loader/loader_small_1.gif?232be2f HTTP/1.1"
Sep 4 10:49:30 s1 haproxy[19977]: 127.0.0.1:36535 [04/Sep/2009:10:49:25.036] frontend nginx_app/s2 0/5000/0/56/5056 200 4796 - - ---- 0/0/0/0/1 0/0 {img.site.com} "GET /map/spots?size=256&box=2.197265625,48.893615361480194,2.2412109375,48.922499263758254&zoom=13&tagstring=faire%20du%20shopping%20supermarche%20franprix&ids=1315616,1315575 HTTP/1.1"
Sep 4 10:49:38 s1 haproxy[19977]: 127.0.0.1:38338 [04/Sep/2009:10:49:33.009] frontend nginx/eg2 0/5005/0/0/5005 200 638 - - ---- 4/2/0/0/1 0/0 {data.site.com} "GET /img/nav/fbconnect.png?232be2f HTTP/1.1"
Jean-Baptiste Quenot
Re: Unexpected connection timeouts to a backend server
September 04, 2009 12:50PM
2009/9/4 Jean-Baptiste Quenot <[email protected]>:
>
> I'm using haproxy 1.3.15.9 on Ubuntu.

FYI I just upgraded to 1.3.20, and the problem persists.
--
Jean-Baptiste Quenot
http://jbq.caraldi.com/
Willy Tarreau
Re: Unexpected connection timeouts to a backend server
September 05, 2009 07:30PM
Hi Jean-Baptiste,

On Fri, Sep 04, 2009 at 12:44:26PM +0200, Jean-Baptiste Quenot wrote:
> 2009/9/4 Jean-Baptiste Quenot <[email protected]>:
> >
> > I'm using haproxy 1.3.15.9 on Ubuntu.
>
> FYI I just upgraded to 1.3.20, and the problem persists.

Good, this is an important piece of information, because one of
the problem I was suspecting was a clock jumping back-and-forth,
which is the reason why I implemented the monotonic clock in
1.3.16.

Tcpdump will not show you any RST precisely because a timeout
is detected on inactivity. Also, the absence of RST indicates
that there is no remains of an old session on the remote host,
because if the remote host has an old session with your source
port, it will respond either by an RST or by an ACK, depending
on the SYN sequence number. And if it replies with an ACK, then
your system will send it an RST. This case is common when
running firewalls which randomize sequence numbers (eg: PIX,
FWSM), as they completely break the TCP rules between the client
and the server.

Among the things I could think of, one of them would be the
local system sometimes running out of source ports. You could
check this as well as the fact that the system allows you to
reuse connections still in TIME_WAIT state :

/proc/sys/net/ipv4/ip_local_port_range
/proc/sys/net/ipv4/tcp_tw_reuse

Also, please check with tcpdump if you see duplicate SYNs. This
is important, because a duplicate SYN will indicate that a port
could be allocated but the connection could not be establish for
whatever reason. However, the lack of duplicate SYNs would rather
indicate a source port shortage.

You can easily capture SYN/FIN/RST packets with tcpdump that way :

tcpdump -Svvni eth0 'tcp[13]&7!=0'

Also note the '-S' above which keeps sequence numbers intact, this
will be helpful to troubleshoot issues if some retransmits are
found.

Regards,
Willy
Jean-Baptiste Quenot
Re: Unexpected connection timeouts to a backend server
September 05, 2009 11:20PM
2009/9/5 Willy Tarreau <[email protected]>:
>
> Among the things I could think of, one of them would be the
> local system sometimes running out of source ports. You could
> check this as well as the fact that the system allows you to
> reuse connections still in TIME_WAIT state :
>
>   /proc/sys/net/ipv4/ip_local_port_range

Gives a fairly large range:

32768 61000

>   /proc/sys/net/ipv4/tcp_tw_reuse

Gives:

0

"netstat -ntp | wc -l" gives:

4397

"netstat -ntp | grep -v TIME_WAIT | wc -l" gives:

445

I don't think there are too many active connections, as the port range
allows for 28k, and I have 4k. Reusing TIME_WAIT connections may not
be necessary.

> Also, please check with tcpdump if you see duplicate SYNs. This
> is important, because a duplicate SYN will indicate that a port
> could be allocated but the connection could not be establish for
> whatever reason. However, the lack of duplicate SYNs would rather
> indicate a source port shortage.

This is what I did to check duplicate SYN:

In window 1, to detect haproxy retries:

tail -f haproxy.log | egrep ' [0-9]+/[0-9]+/[0-9]+/[0-9]+/[1-9]
[0-9]+/[0-9]+ '

In window 2:

tcpdump -w ~/tcp.log -Svvn 'port 83 and host s2.site.com'

I stop tcpdump with Ctrl-C as soon I notice in window 1 that a new
line has been reported:

Sep 5 22:15:55 s1 haproxy[3517]: 127.0.0.1:45818
[05/Sep/2009:22:15:50.708] apache/s2 0/0/5003/0/5003 200 1449 - - ----
0/0/0/0/1 0/0 {data.site.com} "GET /img/pinsets/1000/10200-s.png
HTTP/1.1"

I loaded the pcap file in wireshark and noticed two things:

1) the source port is not the same as logged in the haproxy logs, it
may be useful to have both the source port connecting to haproxy, and
the source port connecting to the backend.

2) Using this filter: "tcp.dstport==83 and tcp.flags.syn==1" I iterate
over all lines and notice that the source port is never the same, so
it may imply that there is no duplicate SYN.

Best regards,
--
Jean-Baptiste Quenot
http://jbq.caraldi.com/
Willy Tarreau
Re: Unexpected connection timeouts to a backend server
September 08, 2009 07:10AM
On Sat, Sep 05, 2009 at 11:14:56PM +0200, Jean-Baptiste Quenot wrote:
(...)
> I loaded the pcap file in wireshark and noticed two things:
>
> 1) the source port is not the same as logged in the haproxy logs, it
> may be useful to have both the source port connecting to haproxy, and
> the source port connecting to the backend.

Yes, I've already been missing that too. But before changing the log
format again, I'd like to make it more flexible and let people add this
information if they need it.

> 2) Using this filter: "tcp.dstport==83 and tcp.flags.syn==1" I iterate
> over all lines and notice that the source port is never the same, so
> it may imply that there is no duplicate SYN.

exactly.

Regards,
Willy
Sorry, only registered users may post in this forum.

Click here to login