Welcome! Log In Create A New Profile

Advanced

haproxy returns http 502

Posted by Yehuda Sadeh 
Yehuda Sadeh
haproxy returns http 502
May 18, 2012 12:30AM
Hi,

We have a setup where there's haproxy (1.4.15) in front of a few apache
servers. We have a test suite that occasionally fails with 502 when going
through haproxy (this doesn't happen when going directly to one of the
apache servers). I was able to have a tcp capture while that was happening,
and I do see some strange behavior. From what I can see haproxy sends a
request to the apache server, then hangs up the tcp connection (while
response is arriving), and reopens a second tcp connection on which it
sends nothing. After 20 seconds apache closes down the second connection
and haproxy sends 502 to the client.

The haproxy log shows the following:

May 17 14:03:20 localhost haproxy[3695]:
10.214.131.22:38909[17/May/2012:14:02:59.347] api-http-ipv4
radosgw-http/plana93-1
0/0/2/-1/21020 502 204 - - SH-- 0/0/0/0/1 0/0 "GET
/jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1"

Which means, if I decipher it correctly, that the server hung the
connection after more than 20 seconds. However, looking at the server logs
I couldn't find evidence for that happening. I do see the server responding
immediately.
I was taking a tcpdump while that was happening, and I do see some strange
behavior. The haproxy log shows the following (captured on the haproxy
machine):

10.214.131.22 client
10.214.131.25 haproxy
10.214.133.23 apache


1. client <-> haproxy

Client sends request to haproxy

18073 2.787839 10.214.131.22 -> 10.214.131.25 TCP 38909 > http [SYN]
Seq=0 Win=14600 Len=0 MSS=1460 TSV=25089868 TSER=0 WS=7
18074 2.787867 10.214.131.25 -> 10.214.131.22 TCP http > 38909 [SYN, ACK]
Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=25177841 TSER=25089868 WS=7
18075 2.787948 10.214.131.22 -> 10.214.131.25 TCP 38909 > http [ACK]
Seq=1 Ack=1 Win=14720 Len=0 TSV=25089868 TSER=25177841
18076 2.788028 10.214.131.22 -> 10.214.131.25 HTTP GET
/jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1


2. haproxy <-> apache (connection 1)

haproxy sends request to apache

18077 2.788185 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [SYN]
Seq=0 Win=14600 Len=0 MSS=1460 TSV=25177841 TSER=0 WS=7
18078 2.789498 10.214.133.23 -> 10.214.131.25 TCP http > 39579 [SYN, ACK]
Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=248773204 TSER=25177841 WS=7
18079 2.789530 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [ACK]
Seq=1 Ack=1 Win=14720 Len=0 TSV=25177841 TSER=248773204
18080 2.789576 10.214.131.25 -> 10.214.133.23 HTTP GET
/jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1
18081 2.789605 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [FIN, ACK]
Seq=333 Ack=1 Win=14720 Len=0 TSV=25177841 TSER=248773204

3. haproxy <-> apache (connection 2)

haproxy creates a second tcp connection to the same apache server

18082 2.789650 10.214.131.25 -> 10.214.133.23 TCP 39580 > http [SYN]
Seq=0 Win=14600 Len=0 MSS=1460 TSV=25177841 TSER=0 WS=7
18083 2.790896 10.214.133.23 -> 10.214.131.25 TCP http > 39580 [SYN, ACK]
Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=248773204 TSER=25177841 WS=7
18084 2.790926 10.214.131.25 -> 10.214.133.23 TCP 39580 > http [ACK]
Seq=1 Ack=1 Win=14720 Len=0 TSV=25177841 TSER=248773204

4. haproxy <-> apache (connection 1)

apache returns http header + part of the data. HTTP header is valid.

18085 2.790938 10.214.133.23 -> 10.214.131.25 TCP http > 39579 [ACK]
Seq=1 Ack=333 Win=15616 Len=0 TSV=248773204 TSER=25177841
18086 2.797494 10.214.133.23 -> 10.214.131.25 HTTP HTTP/1.1 200 OK
(application/octet-stream)

haproxy sends reset on the tcp connection
18087 2.797525 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [RST]
Seq=334 Win=0 Len=0

apache sends more data
18088 2.797545 10.214.133.23 -> 10.214.131.25 HTTP Continuation or
non-HTTP traffic

haproxy sends another reset
18089 2.797558 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [RST]
Seq=334 Win=0 Len=0

5. haproxy -> client

ack for the previous packet

18090 2.825478 10.214.131.25 -> 10.214.131.22 TCP http > 38909 [ACK]
Seq=1 Ack=282 Win=15616 Len=0 TSV=25177845 TSER=25089868

....

6. haproxy <-> apache (connection 2)

haproxy resending SYN, ACK (after a second): could it be that it lost the
first ACK response?

18208 3.788175 10.214.133.23 -> 10.214.131.25 TCP http > 39580 [SYN, ACK]
Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=248773304 TSER=25177841 WS=7
18209 3.788197 10.214.131.25 -> 10.214.133.23 TCP [TCP Dup ACK 18084#1]
39580 > http [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSV=25177941 TSER=248773304
SLE=0 SRE=1

....
<~20 seconds later>

7. haproxy <-> apache (connection 2)

apache hangs up the connection as haproxy didn't send anything on it

18557 23.808745 10.214.133.23 -> 10.214.131.25 TCP http > 39580 [FIN, ACK]
Seq=1 Ack=1 Win=14592 Len=0 TSV=248775306 TSER=25177941
18558 23.808876 10.214.131.25 -> 10.214.133.23 TCP 39580 > http [RST, ACK]
Seq=1 Ack=2 Win=14720 Len=0 TSV=25179943 TSER=248775306

haproxy <-> client

close connection


18559 23.808917 10.214.131.25 -> 10.214.131.22 HTTP HTTP/1.0 502 Bad
Gateway (text/html)
18560 23.809737 10.214.131.22 -> 10.214.131.25 TCP 38909 > http [FIN, ACK]
Seq=282 Ack=206 Win=15744 Len=0 TSV=25091970 TSER=25179943



HTTP request header

GET /jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1
Host: plana15.front.sepia.ceph.com
Accept-Encoding: identity
Date: Thu, 17 May 2012 18:02:59 GMT
Content-Length: 0
Authorization: AWS MDM21HVPZOL66IC8OIOF:2pAaDJ8w5YLi5ZAwL7f7ZGxMNPo=
User-Agent: Boto/2.3.0 (linux2)
X-Forwarded-For: 10.214.131.22
Connection: close

HTTP response header (apache -> haproxy on connection 1):

HTTP/1.1 200 OK
Date: Thu, 17 May 2012 18:02:59 GMT
Server: Apache/2.2.20 (Ubuntu)
Accept-Ranges: bytes
Last-Modified: Thu, 17 May 2012 18:02:59 GMT
ETag: "3580afcd8aaf507ab9860473ab48186b"
Content-Length: 4194304
Connection: close
Content-Type: application/octet-stream


We also see the same symptoms on a different environment that runs 1.5dev7,
though I didn't take any network capture so it might be a different issue.

Thanks,
Yehuda

And here is the haproxy config:

global
log 127.0.0.1 local0
log 127.0.0.1 local1 notice
maxconn 4096
daemon

defaults
log global
mode http
option httplog
option dontlognull
retries 3
option redispatch
maxconn 2000
contimeout 5000

frontend api-http-ipv4
mode http
bind 10.214.131.25:80
default_backend radosgw-http
option forwardfor
option httpclose
clitimeout 50000
reqidel ^X-Forwarded-For:.*

backend radosgw-http
balance leastconn
srvtimeout 50000
http-check expect ! rstatus ^5
option httpchk HEAD /
timeout check 6
timeout connect 2
timeout http-request 3
server plana88-1 10.214.133.28:80 check inter 2000 rise 2 fall 5
server plana88-2 10.214.133.28:81 check inter 2000 rise 2 fall 5
server plana93-1 10.214.133.23:80 check inter 2000 rise 2 fall 5
server plana93-2 10.214.133.23:81 check inter 2000 rise 2 fall 5

listen stats 0.0.0.0:9000
mode http
stats enable
stats uri /
stats refresh 15sec
stats show-desc clitimeout 50000
srvtimeout 50000
contimeout 5000
clitimeout 50000
Cyril Bonté
Re: haproxy returns http 502
May 18, 2012 12:50AM
Hi,

Le 18/05/2012 00:28, Yehuda Sadeh a écrit :
> Hi,
>
> We have a setup where there's haproxy (1.4.15) in front of a few
> apache servers. We have a test suite that occasionally fails with 502
> when going through haproxy (this doesn't happen when going directly to
> one of the apache servers). I was able to have a tcp capture while that
> was happening, and I do see some strange behavior. From what I can see
> haproxy sends a request to the apache server, then hangs up the tcp
> connection (while response is arriving), and reopens a second tcp
> connection on which it sends nothing. After 20 seconds apache closes
> down the second connection and haproxy sends 502 to the client.
>
> The haproxy log shows the following:
>
> May 17 14:03:20 localhost haproxy[3695]: 10.214.131.22:38909
> http://10.214.131.22:38909 [17/May/2012:14:02:59.347] api-http-ipv4
> radosgw-http/plana93-1 0/0/2/-1/21020 502 204 - - SH-- 0/0/0/0/1 0/0
> "GET /jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1"
>
> Which means, if I decipher it correctly, that the server hung the
> connection after more than 20 seconds. However, looking at the server
> logs I couldn't find evidence for that happening. I do see the server
> responding immediately.
> I was taking a tcpdump while that was happening, and I do see some
> strange behavior. The haproxy log shows the following (captured on the
> haproxy machine):
>
> (...)
>
> We also see the same symptoms on a different environment that runs
> 1.5dev7, though I didn't take any network capture so it might be a
> different issue.

Before any other analysis, can you retry with better timeout values ?
From the configuration you provided :

> timeout check 6
> timeout connect 2
> timeout http-request 3

Those 3 timeouts are in *milliseconds*, I think you wanted to define
them in seconds.

--
Cyril Bonté
Willy Tarreau
Re: haproxy returns http 502
May 18, 2012 01:00AM
Hi,

first, let me congratulate you for such a well documented analysis.

On Thu, May 17, 2012 at 03:28:40PM -0700, Yehuda Sadeh wrote:
> Hi,
>
> We have a setup where there's haproxy (1.4.15) in front of a few apache
> servers. We have a test suite that occasionally fails with 502 when going
> through haproxy (this doesn't happen when going directly to one of the
> apache servers). I was able to have a tcp capture while that was happening,
> and I do see some strange behavior. From what I can see haproxy sends a
> request to the apache server, then hangs up the tcp connection (while
> response is arriving), and reopens a second tcp connection on which it
> sends nothing. After 20 seconds apache closes down the second connection
> and haproxy sends 502 to the client.
>
> The haproxy log shows the following:
>
> May 17 14:03:20 localhost haproxy[3695]:
> 10.214.131.22:38909[17/May/2012:14:02:59.347] api-http-ipv4
> radosgw-http/plana93-1
> 0/0/2/-1/21020 502 204 - - SH-- 0/0/0/0/1 0/0 "GET
> /jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1"
>
> Which means, if I decipher it correctly, that the server hung the
> connection after more than 20 seconds.

.... and a retry! I think this is the key of the mystery.

> However, looking at the server logs
> I couldn't find evidence for that happening. I do see the server responding
> immediately.
> I was taking a tcpdump while that was happening, and I do see some strange
> behavior. The haproxy log shows the following (captured on the haproxy
> machine):
>
> 10.214.131.22 client
> 10.214.131.25 haproxy
> 10.214.133.23 apache
>
>
> 1. client <-> haproxy
>
> Client sends request to haproxy
>
> 18073 2.787839 10.214.131.22 -> 10.214.131.25 TCP 38909 > http [SYN]
> Seq=0 Win=14600 Len=0 MSS=1460 TSV=25089868 TSER=0 WS=7
> 18074 2.787867 10.214.131.25 -> 10.214.131.22 TCP http > 38909 [SYN, ACK]
> Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=25177841 TSER=25089868 WS=7
> 18075 2.787948 10.214.131.22 -> 10.214.131.25 TCP 38909 > http [ACK]
> Seq=1 Ack=1 Win=14720 Len=0 TSV=25089868 TSER=25177841
> 18076 2.788028 10.214.131.22 -> 10.214.131.25 HTTP GET
> /jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1
>
>
> 2. haproxy <-> apache (connection 1)
>
> haproxy sends request to apache
>
> 18077 2.788185 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [SYN]
> Seq=0 Win=14600 Len=0 MSS=1460 TSV=25177841 TSER=0 WS=7
> 18078 2.789498 10.214.133.23 -> 10.214.131.25 TCP http > 39579 [SYN, ACK]
> Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=248773204 TSER=25177841 WS=7
> 18079 2.789530 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [ACK]
> Seq=1 Ack=1 Win=14720 Len=0 TSV=25177841 TSER=248773204
> 18080 2.789576 10.214.131.25 -> 10.214.133.23 HTTP GET
> /jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1
> 18081 2.789605 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [FIN, ACK]
> Seq=333 Ack=1 Win=14720 Len=0 TSV=25177841 TSER=248773204

This one is really strange. The only reason why I could imagine the FIN here
would be if the client had used one *and* there was "option abortonclose",
but none of these hypothesis were true.

Ah I just saw Cyril's discovery about your erroneous timeouts, he's plain
right !

> 3. haproxy <-> apache (connection 2)
>
> haproxy creates a second tcp connection to the same apache server
>
> 18082 2.789650 10.214.131.25 -> 10.214.133.23 TCP 39580 > http [SYN]
> Seq=0 Win=14600 Len=0 MSS=1460 TSV=25177841 TSER=0 WS=7
> 18083 2.790896 10.214.133.23 -> 10.214.131.25 TCP http > 39580 [SYN, ACK]
> Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=248773204 TSER=25177841 WS=7
> 18084 2.790926 10.214.131.25 -> 10.214.133.23 TCP 39580 > http [ACK]
> Seq=1 Ack=1 Win=14720 Len=0 TSV=25177841 TSER=248773204

Here we have a bug in my opinion, despite the timeout issue. The new
connection should not have been attempted since the request had already
been sent. And if the connection remains open, it's precisely because
there is no more request to send !

I'll have to dig through the code to find how it is possible to get
into this situation.

Please in parallel recheck with more correct timeouts (at least 4s
each).

Thanks,
Willy
Yehuda Sadeh
Re: haproxy returns http 502
May 18, 2012 01:30AM
On Thu, May 17, 2012 at 3:47 PM, Cyril Bonté <[email protected]> wrote:

> Hi,
>
> Le 18/05/2012 00:28, Yehuda Sadeh a écrit :
>
>> Hi,
>>
>> We have a setup where there's haproxy (1.4.15) in front of a few
>> apache servers. We have a test suite that occasionally fails with 502
>> when going through haproxy (this doesn't happen when going directly to
>> one of the apache servers). I was able to have a tcp capture while that
>> was happening, and I do see some strange behavior. From what I can see
>> haproxy sends a request to the apache server, then hangs up the tcp
>> connection (while response is arriving), and reopens a second tcp
>> connection on which it sends nothing. After 20 seconds apache closes
>> down the second connection and haproxy sends 502 to the client.
>>
>> The haproxy log shows the following:
>>
>> May 17 14:03:20 localhost haproxy[3695]: 10.214.131.22:38909
>> http://10.214.131.22:38909 [17/May/2012:14:02:59.347] api-http-ipv4
>>
>> radosgw-http/plana93-1 0/0/2/-1/21020 502 204 - - SH-- 0/0/0/0/1 0/0
>> "GET /jenkins-**d15dxs9og2dxhfhlywf8i-182/**testobj HTTP/1.1"
>>
>> Which means, if I decipher it correctly, that the server hung the
>> connection after more than 20 seconds. However, looking at the server
>> logs I couldn't find evidence for that happening. I do see the server
>> responding immediately.
>> I was taking a tcpdump while that was happening, and I do see some
>> strange behavior. The haproxy log shows the following (captured on the
>> haproxy machine):
>>
> >
>
>> (...)
>>
>>
>> We also see the same symptoms on a different environment that runs
>> 1.5dev7, though I didn't take any network capture so it might be a
>> different issue.
>>
>
> Before any other analysis, can you retry with better timeout values ?
> From the configuration you provided :
>
>
> timeout check 6
>> timeout connect 2
>> timeout http-request 3
>>
>
> Those 3 timeouts are in *milliseconds*, I think you wanted to define them
> in seconds.
>
>
Yeah, definitely!


Thanks,
Yehuda
Yehuda Sadeh
Re: haproxy returns http 502
May 18, 2012 01:40AM
On Thu, May 17, 2012 at 3:52 PM, Willy Tarreau <[email protected]> wrote:
>
> Hi,
>
> first, let me congratulate you for such a well documented analysis.
>
> On Thu, May 17, 2012 at 03:28:40PM -0700, Yehuda Sadeh wrote:
> > Hi,
> >
> >    We have a setup where there's haproxy (1.4.15) in front of a few apache
> > servers. We have a test suite that occasionally fails with 502 when going
> > through haproxy (this doesn't happen when going directly to one of the
> > apache servers). I was able to have a tcp capture while that was happening,
> > and I do see some strange behavior. From what I can see haproxy sends a
> > request to the apache server, then hangs up the tcp connection (while
> > response is arriving), and reopens a second tcp connection on which it
> > sends nothing. After 20 seconds apache closes down the second connection
> > and haproxy sends 502 to the client.
> >
> > The haproxy log shows the following:
> >
> > May 17 14:03:20 localhost haproxy[3695]:
> > 10.214.131.22:38909[17/May/2012:14:02:59.347] api-http-ipv4
> > radosgw-http/plana93-1
> > 0/0/2/-1/21020 502 204 - - SH-- 0/0/0/0/1 0/0 "GET
> > /jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1"
> >
> > Which means, if I decipher it correctly, that the server hung the
> > connection after more than 20 seconds.
>
> ... and a retry! I think this is the key of the mystery.
>
> > However, looking at the server logs
> > I couldn't find evidence for that happening. I do see the server responding
> > immediately.
> > I was taking a tcpdump while that was happening, and I do see some strange
> > behavior. The haproxy log shows the following (captured on the haproxy
> > machine):
> >
> > 10.214.131.22 client
> > 10.214.131.25 haproxy
> > 10.214.133.23 apache
> >
> >
> > 1. client <-> haproxy
> >
> >  Client sends request to haproxy
> >
> > 18073   2.787839 10.214.131.22 -> 10.214.131.25 TCP 38909 > http [SYN]
> > Seq=0 Win=14600 Len=0 MSS=1460 TSV=25089868 TSER=0 WS=7
> > 18074   2.787867 10.214.131.25 -> 10.214.131.22 TCP http > 38909 [SYN, ACK]
> > Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=25177841 TSER=25089868 WS=7
> > 18075   2.787948 10.214.131.22 -> 10.214.131.25 TCP 38909 > http [ACK]
> > Seq=1 Ack=1 Win=14720 Len=0 TSV=25089868 TSER=25177841
> > 18076   2.788028 10.214.131.22 -> 10.214.131.25 HTTP GET
> > /jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1
> >
> >
> > 2. haproxy <-> apache (connection 1)
> >
> > haproxy sends request to apache
> >
> > 18077   2.788185 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [SYN]
> > Seq=0 Win=14600 Len=0 MSS=1460 TSV=25177841 TSER=0 WS=7
> > 18078   2.789498 10.214.133.23 -> 10.214.131.25 TCP http > 39579 [SYN, ACK]
> > Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=248773204 TSER=25177841 WS=7
> > 18079   2.789530 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [ACK]
> > Seq=1 Ack=1 Win=14720 Len=0 TSV=25177841 TSER=248773204
> > 18080   2.789576 10.214.131.25 -> 10.214.133.23 HTTP GET
> > /jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1
> > 18081   2.789605 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [FIN, ACK]
> > Seq=333 Ack=1 Win=14720 Len=0 TSV=25177841 TSER=248773204
>
> This one is really strange. The only reason why I could imagine the FIN here
> would be if the client had used one *and* there was "option abortonclose",
> but none of these hypothesis were true.
>
> Ah I just saw Cyril's discovery about your erroneous timeouts, he's plain
> right !
>
> > 3. haproxy <-> apache (connection 2)
> >
> > haproxy creates a second tcp connection to the same apache server
> >
> > 18082   2.789650 10.214.131.25 -> 10.214.133.23 TCP 39580 > http [SYN]
> > Seq=0 Win=14600 Len=0 MSS=1460 TSV=25177841 TSER=0 WS=7
> > 18083   2.790896 10.214.133.23 -> 10.214.131.25 TCP http > 39580 [SYN, ACK]
> > Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=248773204 TSER=25177841 WS=7
> > 18084   2.790926 10.214.131.25 -> 10.214.133.23 TCP 39580 > http [ACK]
> > Seq=1 Ack=1 Win=14720 Len=0 TSV=25177841 TSER=248773204
>
> Here we have a bug in my opinion, despite the timeout issue. The new
> connection should not have been attempted since the request had already
> been sent. And if the connection remains open, it's precisely because
> there is no more request to send !
>
>
> I'll have to dig through the code to find how it is possible to get
> into this situation.
>
> Please in parallel recheck with more correct timeouts (at least 4s
> each).
>

Sure, checking now. It used to take a while to reproduce, so it might
take a bit.

Thanks,
Yehuda
Yehuda Sadeh
Re: haproxy returns http 502
May 18, 2012 06:50PM
On Thu, May 17, 2012 at 4:29 PM, Yehuda Sadeh <[email protected]> wrote:
> On Thu, May 17, 2012 at 3:52 PM, Willy Tarreau <[email protected]> wrote:
>>
>> Hi,
>>
>> first, let me congratulate you for such a well documented analysis.
>>
>> On Thu, May 17, 2012 at 03:28:40PM -0700, Yehuda Sadeh wrote:
>> > Hi,
>> >
>> >    We have a setup where there's haproxy (1.4.15) in front of a few apache
>> > servers. We have a test suite that occasionally fails with 502 when going
>> > through haproxy (this doesn't happen when going directly to one of the
>> > apache servers). I was able to have a tcp capture while that was happening,
>> > and I do see some strange behavior. From what I can see haproxy sends a
>> > request to the apache server, then hangs up the tcp connection (while
>> > response is arriving), and reopens a second tcp connection on which it
>> > sends nothing. After 20 seconds apache closes down the second connection
>> > and haproxy sends 502 to the client.
>> >
>> > The haproxy log shows the following:
>> >
>> > May 17 14:03:20 localhost haproxy[3695]:
>> > 10.214.131.22:38909[17/May/2012:14:02:59.347] api-http-ipv4
>> > radosgw-http/plana93-1
>> > 0/0/2/-1/21020 502 204 - - SH-- 0/0/0/0/1 0/0 "GET
>> > /jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1"
>> >
>> > Which means, if I decipher it correctly, that the server hung the
>> > connection after more than 20 seconds.
>>
>> ... and a retry! I think this is the key of the mystery.
>>
>> > However, looking at the server logs
>> > I couldn't find evidence for that happening. I do see the server responding
>> > immediately.
>> > I was taking a tcpdump while that was happening, and I do see some strange
>> > behavior. The haproxy log shows the following (captured on the haproxy
>> > machine):
>> >
>> > 10.214.131.22 client
>> > 10.214.131.25 haproxy
>> > 10.214.133.23 apache
>> >
>> >
>> > 1. client <-> haproxy
>> >
>> >  Client sends request to haproxy
>> >
>> > 18073   2.787839 10.214.131.22 -> 10.214.131.25 TCP 38909 > http [SYN]
>> > Seq=0 Win=14600 Len=0 MSS=1460 TSV=25089868 TSER=0 WS=7
>> > 18074   2.787867 10.214.131.25 -> 10.214.131.22 TCP http > 38909 [SYN, ACK]
>> > Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=25177841 TSER=25089868 WS=7
>> > 18075   2.787948 10.214.131.22 -> 10.214.131.25 TCP 38909 > http [ACK]
>> > Seq=1 Ack=1 Win=14720 Len=0 TSV=25089868 TSER=25177841
>> > 18076   2.788028 10.214.131.22 -> 10.214.131.25 HTTP GET
>> > /jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1
>> >
>> >
>> > 2. haproxy <-> apache (connection 1)
>> >
>> > haproxy sends request to apache
>> >
>> > 18077   2.788185 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [SYN]
>> > Seq=0 Win=14600 Len=0 MSS=1460 TSV=25177841 TSER=0 WS=7
>> > 18078   2.789498 10.214.133.23 -> 10.214.131.25 TCP http > 39579 [SYN, ACK]
>> > Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=248773204 TSER=25177841 WS=7
>> > 18079   2.789530 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [ACK]
>> > Seq=1 Ack=1 Win=14720 Len=0 TSV=25177841 TSER=248773204
>> > 18080   2.789576 10.214.131.25 -> 10.214.133.23 HTTP GET
>> > /jenkins-d15dxs9og2dxhfhlywf8i-182/testobj HTTP/1.1
>> > 18081   2.789605 10.214.131.25 -> 10.214.133.23 TCP 39579 > http [FIN, ACK]
>> > Seq=333 Ack=1 Win=14720 Len=0 TSV=25177841 TSER=248773204
>>
>> This one is really strange. The only reason why I could imagine the FIN here
>> would be if the client had used one *and* there was "option abortonclose",
>> but none of these hypothesis were true.
>>
>> Ah I just saw Cyril's discovery about your erroneous timeouts, he's plain
>> right !
>>
>> > 3. haproxy <-> apache (connection 2)
>> >
>> > haproxy creates a second tcp connection to the same apache server
>> >
>> > 18082   2.789650 10.214.131.25 -> 10.214.133.23 TCP 39580 > http [SYN]
>> > Seq=0 Win=14600 Len=0 MSS=1460 TSV=25177841 TSER=0 WS=7
>> > 18083   2.790896 10.214.133.23 -> 10.214.131.25 TCP http > 39580 [SYN, ACK]
>> > Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 TSV=248773204 TSER=25177841 WS=7
>> > 18084   2.790926 10.214.131.25 -> 10.214.133.23 TCP 39580 > http [ACK]
>> > Seq=1 Ack=1 Win=14720 Len=0 TSV=25177841 TSER=248773204
>>
>> Here we have a bug in my opinion, despite the timeout issue. The new
>> connection should not have been attempted since the request had already
>> been sent. And if the connection remains open, it's precisely because
>> there is no more request to send !
>>
>>
>> I'll have to dig through the code to find how it is possible to get
>> into this situation.
>>
>> Please in parallel recheck with more correct timeouts (at least 4s
>> each).
>>
>
> Sure, checking now. It used to take a while to reproduce, so it might
> take a bit.
>
Yeah, it definitely solved the issue.

Thanks!
Yehuda
Sorry, only registered users may post in this forum.

Click here to login