tcp resets on reload haproxy

Posted by Mostowiec Dominik 
Mostowiec Dominik
tcp resets on reload haproxy
March 20, 2012 03:40PM
Hi,
When I stress test haproxy and reload it with the -sf option, I get:
"The server is now under siege...[error] socket: unable to connect sock.c:222:
Connection reset by peer
[error] socket: unable to connect sock.c:222: Connection >
[error] socket: unable to connect sock.c:222: Connection >
...
"
It sends many TCP RSTs for a while.

My sysctl options:
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_tw_buckets = 631056
net.ipv4.tcp_max_orphans = 631056

Is this a known problem?
Can I do something to fix this?

--
Regards,
Dominik
Willy Tarreau
Re: tcp resets on reload haproxy
March 21, 2012 07:50AM
Hi,

On Tue, Mar 20, 2012 at 03:30:18PM +0100, Mostowiec Dominik wrote:
> Hi,
> When I stress test haproxy and reload it with the -sf option, I get:
> "The server is now under siege...[error] socket: unable to connect sock.c:222:
> Connection reset by peer
> [error] socket: unable to connect sock.c:222: Connection >
> [error] socket: unable to connect sock.c:222: Connection >
> ...
> "
> It sends many TCP RSTs for a while.
>
> My sysctl options:
> net.ipv4.tcp_tw_reuse = 1
> net.ipv4.tcp_tw_recycle = 1
> net.ipv4.tcp_max_tw_buckets = 631056
> net.ipv4.tcp_max_orphans = 631056
>
> Is this a known problem?

Yes, this is a known limitation. When you use -sf, the new process and
the old one synchronize so that the old one releases the listening FD
and the new one starts listening. The window is very short, but closing
the old FD means that pending connection requests are dropped at the
same time, causing the RSTs you observe. The higher the RTT between
your clients and your box, the more likely you are to see them. It's
hard to observe them on a local network under normal loads.

> Can I do something to fix this?

Krisztian Ivancso is working on FD passing between the old and the new
process, which should catch most of these issues. The difficulty remains
in identifying which FD can be reused and possibly adjusted when a number
of options have been set (eg: MSS, interface binding, ...).
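
To illustrate the idea only (this is not haproxy code): on Unix a listening
socket can be handed over through a Unix-domain socket, so the listener is
never closed and its accept queue survives the handover. Below is a minimal
Python sketch, assuming Python 3.9+ for socket.send_fds()/recv_fds(). Both
"processes" live in one interpreter just to keep it short, and the function
name is made up for the example.

```python
import socket

def pass_listener_demo():
    # "Old process": a listener on an ephemeral loopback port.
    old = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    old.bind(("127.0.0.1", 0))
    old.listen(16)
    port = old.getsockname()[1]

    # A client connects before the handover and waits in the accept queue.
    client = socket.create_connection(("127.0.0.1", port), timeout=2)

    # Unix socket pair standing in for the channel between old and new process.
    chan_old, chan_new = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    socket.send_fds(chan_old, [b"listener"], [old.fileno()])
    _msg, fds, _flags, _addr = socket.recv_fds(chan_new, 1024, 1)

    # "New process": wrap the received FD; it refers to the same listening socket.
    new = socket.socket(fileno=fds[0])
    old.close()  # the old side drops its copy; the listener itself stays open

    new.settimeout(2)
    conn, _peer = new.accept()  # the connection queued earlier is still there
    client.sendall(b"ping")
    data = conn.recv(4)

    for s in (conn, client, new, chan_old, chan_new):
        s.close()
    return data
```

The sketch sets no per-listener options (MSS, interface binding, ...), which
is exactly the part described above as difficult.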

In the meantime there is a kernel patch available on the site to enable
SO_REUSEPORT, which allows both processes to bind the port at the same
time. It totally clears the uncertainty window since the new process binds
and only then asks the other one to release the ports. But still there are
a few RST left due to the half-open connections that cannot be transferred.
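
The bind-first, release-later order can be sketched as follows. This is only
an illustration in Python, not haproxy code, and it assumes a platform where
Python exposes socket.SO_REUSEPORT (for example Linux 3.9 or later). The
helper names are made up, and both listeners live in one process for brevity.

```python
import socket

def bind_reuseport(port, host="127.0.0.1"):
    """Return a listening TCP socket bound with SO_REUSEPORT set."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((host, port))
    s.listen(16)
    return s

def reload_demo():
    old = bind_reuseport(0)        # "old process"; the kernel picks a free port
    port = old.getsockname()[1]
    new = bind_reuseport(port)     # "new process" binds while the old one still listens
    old.close()                    # only now does the old one release the port

    # A client arriving after the handover lands on the new listener.
    new.settimeout(2)
    client = socket.create_connection(("127.0.0.1", port), timeout=2)
    conn, _peer = new.accept()
    accepted = conn.getpeername() == client.getsockname()
    for s in (conn, client, new):
        s.close()
    return accepted
```

Connections still sitting in the old listener's queue when it is closed are
not moved to the new one, which matches the few remaining RSTs mentioned above.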

Regards,
Willy
Mariusz Gronczewski
Re: tcp resets on reload haproxy
March 22, 2012 01:10AM
2012/3/21 Willy Tarreau <[email protected]>:
>> Can I do something to fix this?
>
> Krisztian Ivancso is working on FD passing between the old and the new
> process, which should catch most of these issues. The difficulty remains
> in identifying which FD can be reused and possibly adjusted when a number
> of options have been set (eg: MSS, interface binding, ...).
>
> In the meantime there is a kernel patch available on the site to enable
> SO_REUSEPORT, which allows both processes to bind the port at the same
> time. It totally clears the uncertainty window since the new process binds
> and only then asks the other one to release the ports. But still there are
> a few RST left due to the half-open connections that cannot be transferred.
>
> Regards,
> Willy
>
>
There is a "simple and ugly" hack for that: you can block sending RST
packets with iptables when restarting haproxy, so that clients who sent
a SYN while the haproxy port was down will just retransmit.
Another way would be to start a new haproxy copy on another port, do an
iptables REDIRECT to it, then reload the config on the "main instance" and
remove the REDIRECT, which would be even uglier as you'd need 2 different
configs.


--
Mariusz Gronczewski
Willy Tarreau
Re: tcp resets on reload haproxy
March 24, 2012 09:20PM
On Thu, Mar 22, 2012 at 01:01:50AM +0100, Mariusz Gronczewski wrote:
> 2012/3/21 Willy Tarreau <[email protected]>:
> >> Can I do something to fix this?
> >
> > Krisztian Ivancso is working on FD passing between the old and the new
> > process, which should catch most of these issues. The difficulty remains
> > in identifying which FD can be reused and possibly adjusted when a number
> > of options have been set (eg: MSS, interface binding, ...).
> >
> > In the meantime there is a kernel patch available on the site to enable
> > SO_REUSEPORT, which allows both processes to bind the port at the same
> > time. It totally clears the uncertainty window since the new process binds
> > and only then asks the other one to release the ports. But still there are
> > a few RST left due to the half-open connections that cannot be transferred.
> >
> > Regards,
> > Willy
> >
> >
> There is a "simple and ugly" hack for that: you can block sending RST
> packets with iptables when restarting haproxy, so that clients who sent
> a SYN while the haproxy port was down will just retransmit.

No, don't do that, it will not work and will be even worse. The RST is
sent in response to the ACK, so with your hack, the client will infinitely
retransmit the ACK.

> Another way would be to start a new haproxy copy on another port, do an
> iptables REDIRECT to it, then reload the config on the "main instance" and
> remove the REDIRECT, which would be even uglier as you'd need 2 different
> configs.

There are people who proceed differently :
- iptables -I INPUT -p tcp --dport $PORT --syn -j DROP
- sleep 1
- service haproxy restart
- iptables -D INPUT -p tcp --dport $PORT --syn -j DROP

This has the effect of dropping the SYN before a restart, so that clients
will resend this SYN until it reaches the new process.
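
The four steps above could be wrapped in a small script so that the DROP rule
is always removed again, even when the restart fails. This is a Python sketch
under assumptions: the function names and the injectable run/sleep parameters
are made up for the example, the commands are the ones listed above, and
running it for real needs root.

```python
import subprocess
import time

def syn_drop_rule(action, port):
    """Build the iptables argv that inserts (-I) or deletes (-D) the SYN drop rule."""
    return ["iptables", action, "INPUT", "-p", "tcp",
            "--dport", str(port), "--syn", "-j", "DROP"]

def restart_behind_syn_drop(port, run=subprocess.check_call,
                            sleep=time.sleep, pause=1.0):
    """Drop incoming SYNs on `port`, restart haproxy, then remove the rule."""
    run(syn_drop_rule("-I", port))
    try:
        sleep(pause)                           # same "sleep 1" as in the list above
        run(["service", "haproxy", "restart"])
    finally:
        run(syn_drop_rule("-D", port))         # never leave the DROP rule behind
```

Passing a different `run` (for example a list's append method) gives a dry run
that only records the commands.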

Regards,
Willy
Mostowiec Dominik
RE: tcp resets on reload haproxy
March 30, 2012 04:00PM
Hi,
Thanks for the response.

I have another problem:
11:20:58.713922 IP siege_host.46589 > loadbalancer.8123: Flags [S], seq 1849604553, win 14600, options [mss 1460,nop,wscale 4], length 0
11:20:58.713951 IP loadbalancer.8123 > siege_host.46589: Flags [S.], seq 121266129, ack 1849604554, win 14600, options [mss 1460,nop,wscale 6], length 0
11:20:58.714687 IP siege_host.46589 > loadbalancer.8123: Flags [.], ack 1, win 913, length 0
11:20:58.714894 IP siege_host.46589 > loadbalancer.8123: Flags [P.], seq 1:151, ack 1, win 913, length 150
11:21:00.717226 IP siege_host.46589 > loadbalancer.8123: Flags [F.], seq 151, ack 1, win 913, length 0
11:21:00.717254 IP loadbalancer.8123 > siege_host.46589: Flags [.], ack 1, win 229, length 0
11:21:01.723109 IP siege_host.46589 > loadbalancer.8123: Flags [P.], seq 1:151, ack 1, win 913, length 150
11:21:01.723135 IP loadbalancer.8123 > siege_host.46589: Flags [.], ack 152, win 245, length 0
11:21:01.724902 IP loadbalancer.8123 > siege_host.46589: Flags [.], seq 1:1461, ack 152, win 245, length 1460
11:21:01.724929 IP loadbalancer.8123 > siege_host.46589: Flags [.], seq 1461:2921, ack 152, win 245, length 1460
11:21:01.724936 IP loadbalancer.8123 > siege_host.46589: Flags [.], seq 2921:4381, ack 152, win 245, length 1460
11:21:01.724942 IP loadbalancer.8123 > siege_host.46589: Flags [.], seq 4381:5841, ack 152, win 245, length 1460
11:21:01.724948 IP loadbalancer.8123 > siege_host.46589: Flags [.], seq 5841:7301, ack 152, win 245, length 1460
11:21:01.724952 IP loadbalancer.8123 > siege_host.46589: Flags [P.], seq 7301:7650, ack 152, win 245, length 349
11:21:01.724981 IP loadbalancer.8123 > siege_host.46589: Flags [.], seq 7650:9110, ack 152, win 245, length 1460
11:21:01.725003 IP loadbalancer.8123 > siege_host.46589: Flags [.], seq 9110:10570, ack 152, win 245, length 1460
11:21:01.725012 IP loadbalancer.8123 > siege_host.46589: Flags [.], seq 10570:12030, ack 152, win 245, length 1460
11:21:01.725020 IP loadbalancer.8123 > siege_host.46589: Flags [P.], seq 12030:13490, ack 152, win 245, length 1460
11:21:01.725278 IP siege_host.46589 > loadbalancer.8123: Flags [R], seq 1849604705, win 0, length 0
11:21:01.725295 IP siege_host.46589 > loadbalancer.8123: Flags [R], seq 1849604705, win 0, length 0
11:21:01.725302 IP siege_host.46589 > loadbalancer.8123: Flags [R], seq 1849604705, win 0, length 0

Note the FIN at 11:21:00.717226 (siege_host.46589 > loadbalancer.8123: Flags [F.]):
siege on siege_host has timeout=2s.

It seems that haproxy does not process the first packet for 2s.

By default I have haproxy options like:
global
    maxconn 163937
    user haproxy
    group haproxy
    daemon
    nbproc 16

defaults
    log global
    mode http
    option httplog
    option dontlognull
    option forwardfor
    retries 1
    contimeout 1s
    clitimeout 33s
    srvtimeout 33s
    grace 7s

listen options:
    mode http
    option splice-response
    option http-server-close

default backend options:
    mode http
    balance roundrobin
    option redispatch
    option httpchk GET /lbHealthCheck HTTP/1.0\r\nConnection:\ close
    default-server inter 5s rise 1 fall 1 port 80
    timeout check 5s

Haproxy is started with "-n 163937 -N 163937" options.

I attached stats for test when nbproc is set to '1'.

Is something wrong with my configuration?

--
Dominik


Attachments:
frontend-backend.JPG (62.9 KB)
Willy Tarreau
Re: tcp resets on reload haproxy
March 31, 2012 07:00PM
Hi Dominik,

On Fri, Mar 30, 2012 at 03:52:20PM +0200, Mostowiec Dominik wrote:
> Hi,
> Thanks for the response.
>
> I have another problem:
>
> 11:20:58.713922 IP siege_host.46589 > loadbalancer.8123: Flags [S], seq 1849604553, win 14600, options [mss 1460,nop,wscale 4], length 0
> 11:20:58.713951 IP loadbalancer.8123 > siege_host.46589: Flags [S.], seq 121266129, ack 1849604554, win 14600, options [mss 1460,nop,wscale 6], length 0
> 11:20:58.714687 IP siege_host.46589 > loadbalancer.8123: Flags [.], ack 1, win 913, length 0
> 11:20:58.714894 IP siege_host.46589 > loadbalancer.8123: Flags [P.], seq 1:151, ack 1, win 913, length 150
> 11:21:00.717226 IP siege_host.46589 > loadbalancer.8123: Flags [F.], seq 151, ack 1, win 913, length 0
> 11:21:00.717254 IP loadbalancer.8123 > siege_host.46589: Flags [.], ack 1, win 229, length 0

Did you notice that your request packet (the 4th) was lost on the network ?

That's one reason why we always want to set timeouts above 3 sec (generally
4 or 5), so that it covers one TCP retransmit. I guess you captured on the
siege_host (you did not have -vv nor -S so some info is missing) ? Also,
you should be careful with the system config on siege_host, as it does not
have SACK enabled, which makes things worse when your network is lossy.

This packet loss issue is the reason for the pause you observe since the
request never reaches haproxy. If you increase your siege timeout above 3s
you'll see that many requests take 3s to be processed due to the retransmit
and that other ones still fail. You really need to find what is causing
these losses and to fix that, it's impossible to run a benchmark on a lossy
network! Check your switches and your NICs. Ensure you're not running with
an old bnx2 NIC with an old firmware.
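
For longer captures, this kind of retransmit can be spotted mechanically. The
Python sketch below is an illustration only: the regular expression is an
assumption that fits the one-line tcpdump format shown in this thread, and the
function name is made up. It reports every data segment whose seq range shows
up a second time, together with the delay.

```python
import re
from datetime import datetime

# Matches only lines carrying a "seq a:b" range, i.e. segments with payload.
LINE = re.compile(
    r"^(?P<ts>\d\d:\d\d:\d\d\.\d{6}) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"Flags \[(?P<flags>[^\]]*)\], seq (?P<seq>\d+:\d+)"
)

def find_retransmits(lines):
    """Return (src, seq_range, delay_seconds) for every data segment seen again."""
    first_seen = {}
    found = []
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # SYN, FIN, RST and pure ACK lines have no "seq a:b" range
        ts = datetime.strptime(m["ts"], "%H:%M:%S.%f")
        key = (m["src"], m["dst"], m["seq"])
        if key in first_seen:
            delay = (ts - first_seen[key]).total_seconds()
            found.append((m["src"], m["seq"], delay))
        else:
            first_seen[key] = ts
    return found

# Three lines taken from the trace quoted above.
trace = [
    "11:20:58.714894 IP siege_host.46589 > loadbalancer.8123: Flags [P.], seq 1:151, ack 1, win 913, length 150",
    "11:21:00.717226 IP siege_host.46589 > loadbalancer.8123: Flags [F.], seq 151, ack 1, win 913, length 0",
    "11:21:01.723109 IP siege_host.46589 > loadbalancer.8123: Flags [P.], seq 1:151, ack 1, win 913, length 150",
]
retransmits = find_retransmits(trace)
print(retransmits)
```

On these lines it flags segment 1:151 from siege_host.46589, resent about 3 s
after its first appearance. Timestamps carry no date, so a capture that
crosses midnight would need extra handling.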

BTW I have a few comments about your config :

> global
> maxconn 163937

What's the reason for this magic number ?

> user haproxy
> group haproxy
> daemon
> nbproc 16

Wow 16 procs ! I don't know what you intend to do, but it will generally
not bring anything and might even reduce the performance.

> defaults
> log global
> mode http
> option httplog
> option dontlognull
> option forwardfor
> retries 1
> contimeout 1s

< 3s timeout, see above

> clitimeout 33s
> srvtimeout 33s
> grace 7s

grace serves no purpose these days, especially if all instances
share the same setting (the goal was to make some instances stop
before other ones to fail external health checks).

I see that you have no default maxconn, so your frontends will still
be limited by the default maxconn (2000).

(...)
> Haproxy is started with "-n 163937 -N 163937" options.

OK so -N sets it. Still strange value anyway.

> I attached stats for test when nbproc is set to '1'.

Hmmm the load was very low :

691 MB/20k conn = 34kB per connection
At peak you reached 34kB*850 sess/s = 29 MB/s ~= 250 Mbps
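
The same estimate, spelled out as a small Python sketch. Assumptions: 1 MB is
taken as 1000 kB, only payload bytes are counted, and the function names are
made up for the example.

```python
def per_connection_kb(total_mb, connections):
    """Average payload per connection in kB."""
    return total_mb * 1000 / connections

def peak_rate(kb_per_conn, sessions_per_s):
    """Return (MB/s, Mbit/s) for a given session rate."""
    mb_per_s = kb_per_conn * sessions_per_s / 1000
    return mb_per_s, mb_per_s * 8

kb = per_connection_kb(691, 20_000)    # roughly 34.5 kB per connection
mb_s, mbit_s = peak_rate(kb, 850)      # roughly 29 MB/s, roughly 235 Mbit/s
print(f"{kb:.1f} kB/conn, {mb_s:.1f} MB/s, {mbit_s:.0f} Mbit/s")
```

The payload-only result of roughly 235 Mbit/s is in line with the rounded
~250 Mbps figure above.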

It's very concerning that you're experiencing network losses at this
rate. Just a hint, it's more likely that the losses are located on
the siege host or between it and the network than on the haproxy
host, because when you run haproxy on a lossy machine you generally
observe failed health checks, which you didn't have here during the
test.

> Is something wrong with my configuration?

Not particularly, leaving aside the strange numbers.

Regards,
Willy
Mostowiec Dominik
ODP: tcp resets on reload haproxy
April 01, 2012 01:50PM
Hi,

>> maxconn 163937
> What's the reason for this magic number ?
It's random :-)

> Did you notice that your request packet (the 4th) was lost on the network ?
> I guess you captured on the siege_host

I captured this on the loadbalancer host :-( It's not network loss.

> you did not have -vv nor -S so some info are missing
I recorded this to a file, with -vv:

11:20:58.713922 IP (tos 0x0, ttl 64, id 7370, offset 0, flags [DF], proto TCP (6), length 48)
siege_host.46589 > loadbalancer.8123: Flags [S], cksum 0xe536 (correct), seq 1849604553, win 14600, options [mss 1460,nop,wscale 4], length 0
11:20:58.713951 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 48)
loadbalancer.8123 > siege_host.46589: Flags [S.], cksum 0x683e (incorrect -> 0x7e18), seq 121266129, ack 1849604554, win 14600, options [mss 1460,nop,wscale 6], length 0
11:20:58.714687 IP (tos 0x0, ttl 64, id 7371, offset 0, flags [DF], proto TCP (6), length 40)
siege_host.46589 > loadbalancer.8123: Flags [.], cksum 0xdf59 (correct), seq 1, ack 1, win 913, length 0
11:20:58.714894 IP (tos 0x0, ttl 64, id 7372, offset 0, flags [DF], proto TCP (6), length 190)
siege_host.46589 > loadbalancer.8123: Flags [P.], cksum 0x11eb (correct), seq 1:151, ack 1, win 913, length 150
11:21:00.717226 IP (tos 0x0, ttl 64, id 7373, offset 0, flags [DF], proto TCP (6), length 40)
siege_host.46589 > loadbalancer.8123: Flags [F.], cksum 0xdec2 (correct), seq 151, ack 1, win 913, length 0
11:21:00.717254 IP (tos 0x0, ttl 64, id 17608, offset 0, flags [DF], proto TCP (6), length 40)
loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6836 (incorrect -> 0xe205), seq 1, ack 1, win 229, length 0
11:21:01.723109 IP (tos 0x0, ttl 64, id 7374, offset 0, flags [DF], proto TCP (6), length 190)
siege_host.46589 > loadbalancer.8123: Flags [P.], cksum 0x11eb (correct), seq 1:151, ack 1, win 913, length 150
11:21:01.723135 IP (tos 0x0, ttl 64, id 17609, offset 0, flags [DF], proto TCP (6), length 40)
loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6836 (incorrect -> 0xe15e), seq 1, ack 152, win 245, length 0
11:21:01.724902 IP (tos 0x0, ttl 64, id 17610, offset 0, flags [DF], proto TCP (6), length 1500)
loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6dea (incorrect -> 0xbbac), seq 1:1461, ack 152, win 245, length 1460
11:21:01.724929 IP (tos 0x0, ttl 64, id 17611, offset 0, flags [DF], proto TCP (6), length 1500)
loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6dea (incorrect -> 0x2b9c), seq 1461:2921, ack 152, win 245, length 1460
11:21:01.724936 IP (tos 0x0, ttl 64, id 17612, offset 0, flags [DF], proto TCP (6), length 1500)
loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6dea (incorrect -> 0x2b79), seq 2921:4381, ack 152, win 245, length 1460
11:21:01.724942 IP (tos 0x0, ttl 64, id 17613, offset 0, flags [DF], proto TCP (6), length 1500)
loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6dea (incorrect -> 0x1e97), seq 4381:5841, ack 152, win 245, length 1460
11:21:01.724948 IP (tos 0x0, ttl 64, id 17614, offset 0, flags [DF], proto TCP (6), length 1500)
loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6dea (incorrect -> 0xd630), seq 5841:7301, ack 152, win 245, length 1460
11:21:01.724952 IP (tos 0x0, ttl 64, id 17615, offset 0, flags [DF], proto TCP (6), length 389)
loadbalancer.8123 > siege_host.46589: Flags [P.], cksum 0x6993 (incorrect -> 0xec72), seq 7301:7650, ack 152, win 245, length 349
11:21:01.724981 IP (tos 0x0, ttl 64, id 17616, offset 0, flags [DF], proto TCP (6), length 1500)
loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6dea (incorrect -> 0x3378), seq 7650:9110, ack 152, win 245, length 1460
11:21:01.725003 IP (tos 0x0, ttl 64, id 17617, offset 0, flags [DF], proto TCP (6), length 1500)
loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6dea (incorrect -> 0x9de5), seq 9110:10570, ack 152, win 245, length 1460
11:21:01.725012 IP (tos 0x0, ttl 64, id 17618, offset 0, flags [DF], proto TCP (6), length 1500)
loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6dea (incorrect -> 0x6aad), seq 10570:12030, ack 152, win 245, length 1460
11:21:01.725020 IP (tos 0x0, ttl 64, id 17619, offset 0, flags [DF], proto TCP (6), length 1500)
loadbalancer.8123 > siege_host.46589: Flags [P.], cksum 0x6dea (incorrect -> 0x534b), seq 12030:13490, ack 152, win 245, length 1460
11:21:01.725278 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
siege_host.46589 > loadbalancer.8123: Flags [R], cksum 0x496c (correct), seq 1849604705, win 0, length 0
11:21:01.725295 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
siege_host.46589 > loadbalancer.8123: Flags [R], cksum 0x496c (correct), seq 1849604705, win 0, length 0
11:21:01.725302 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
siege_host.46589 > loadbalancer.8123: Flags [R], cksum 0x496c (correct), seq 1849604705, win 0, length 0

Requests are retransmitted:

---
010.177.050.158.46589-010.177.032.028.08123: GET / HTTP/1.1
Host: clouddev.onet:8123
Accept: */*
Accept-Encoding: gzip
User-Agent: JoeDog/1.00 [en] (X11; I; Siege 2.70)
Connection: close


010.177.050.158.46589-010.177.032.028.08123: GET / HTTP/1.1
Host: clouddev.onet:8123
Accept: */*
Accept-Encoding: gzip
User-Agent: JoeDog/1.00 [en] (X11; I; Siege 2.70)
Connection: close

.....
---


> Wow 16 procs ! I don't know what you intend to do, but it will generally
> not bring anything and might even reduce the performance.

I have a 2x6-core server (24 cores with HT).

-
Regards
Dominik

Willy Tarreau
Re: ODP: tcp resets on reload haproxy
April 01, 2012 08:30PM
Hi Dominik,

On Sun, Apr 01, 2012 at 01:43:31PM +0200, Mostowiec Dominik wrote:
> Hi,
>
> >> maxconn 163937
> > What's the reason for this magic number ?
> It's random :-)

OK

> > Did you notice that your request packet (the 4th) was lost on the network ?
> > I guess you captured on the siege_host
>
> I captured this on the loadbalancer host :-( It's not network loss.

So please check network stats using "netstat -s", you're having something
causing incoming packets to be dropped, and that really does not make sense
at all.
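
On Linux the counters that "netstat -s" reports come from /proc/net/snmp and
/proc/net/netstat, so one way to see which of them move during a test run is
to snapshot those files before and after and compare. This is a Python sketch
only, and the function names are made up for the example.

```python
def parse_snmp(text):
    """Parse /proc/net/snmp-style text into {"Tcp": {"InSegs": 123, ...}, ...}."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    stats = {}
    # The file comes in pairs: a line of counter names, then a line of values.
    for names, values in zip(lines[0::2], lines[1::2]):
        proto = names[0].rstrip(":")
        stats[proto] = {n: int(v) for n, v in zip(names[1:], values[1:])}
    return stats

def changed_counters(before, after):
    """Return {(proto, counter): delta} for every counter whose value changed."""
    deltas = {}
    for proto, counters in after.items():
        for name, value in counters.items():
            delta = value - before.get(proto, {}).get(name, 0)
            if delta:
                deltas[(proto, name)] = delta
    return deltas
```

Typical use would be to read /proc/net/snmp into parse_snmp() once before the
siege run and once after, then look at changed_counters() for error, drop, or
retransmit counters that grew.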

> > you did not have -vv nor -S so some info are missing
> I recorded this to a file, with -vv:

thank you, it's better now.

>
> 11:20:58.713922 IP (tos 0x0, ttl 64, id 7370, offset 0, flags [DF], proto TCP (6), length 48)
> siege_host.46589 > loadbalancer.8123: Flags [S], cksum 0xe536 (correct), seq 1849604553, win 14600, options [mss 1460,nop,wscale 4], length 0
> 11:20:58.713951 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 48)
> loadbalancer.8123 > siege_host.46589: Flags [S.], cksum 0x683e (incorrect -> 0x7e18), seq 121266129, ack 1849604554, win 14600, options [mss 1460,nop,wscale 6], length 0
> 11:20:58.714687 IP (tos 0x0, ttl 64, id 7371, offset 0, flags [DF], proto TCP (6), length 40)
> siege_host.46589 > loadbalancer.8123: Flags [.], cksum 0xdf59 (correct), seq 1, ack 1, win 913, length 0
> 11:20:58.714894 IP (tos 0x0, ttl 64, id 7372, offset 0, flags [DF], proto TCP (6), length 190)
> siege_host.46589 > loadbalancer.8123: Flags [P.], cksum 0x11eb (correct), seq 1:151, ack 1, win 913, length 150

Checksum is correct, TTL is not null, everything is fine, still it's being
dropped.

> 11:21:00.717226 IP (tos 0x0, ttl 64, id 7373, offset 0, flags [DF], proto TCP (6), length 40)
> siege_host.46589 > loadbalancer.8123: Flags [F.], cksum 0xdec2 (correct), seq 151, ack 1, win 913, length 0
> 11:21:00.717254 IP (tos 0x0, ttl 64, id 17608, offset 0, flags [DF], proto TCP (6), length 40)
> loadbalancer.8123 > siege_host.46589: Flags [.], cksum 0x6836 (incorrect -> 0xe205), seq 1, ack 1, win 229, length 0
> 11:21:01.723109 IP (tos 0x0, ttl 64, id 7374, offset 0, flags [DF], proto TCP (6), length 190)
> siege_host.46589 > loadbalancer.8123: Flags [P.], cksum 0x11eb (correct), seq 1:151, ack 1, win 913, length 150

And the retransmitted one is exactly the same even with the same checksum but
it's accepted this time.

Among the things that come to mind :
- do you have netfilter loaded ? It must pass this without any issue but
we have to find what causes a packet to be lost.
- check your network sysctls (sysctl -a |fgrep net.), maybe some buffer is
too small ?
- check kernel logs (dmesg) to see if you notice anything suspicious.

The packet *should* be acked and *should* be delivered to haproxy. There is
no reason it is dropped like this, because the TCP stack did not even notice
it (otherwise it would have been ACKed).

Last, what's your kernel version ? It would surprise me a lot that we'd be
facing so big a bug in the network stack, but we have to consider every
possibility.

(...)
> Requests are retransmitted:

Yes that's what is observed in your trace, the request is what is in the
PUSH packet which is not ACKed. The fact that it is not ACKed indicates
that the packet was not seen by the TCP stack, which is abnormal since it
reached tcpdump at least. Too small network buffers could explain this
but at such low numbers I'm really doubting.

(...)
> > Wow 16 procs ! I don't know what you intend to do, but it will generally
> > not bring anything and might even reduce the performance.
>
> I have a 2x6-core server (24 cores with HT).

That doesn't change anything. Workloads consisting in fast connection
setup/teardown do not scale well on multiple cores because there is a
substantial amount of locking in the TCP stack to select a source port,
update counters, etc... And what we're doing precisely is to make this
part work a lot (under load only, here the load was low to moderate).

Multiple cores can help when doing complex processing (ssl, compression)
but not for short sessions.

Willy
Mostowiec Dominik
RE: tcp resets on reload haproxy
April 04, 2012 09:10AM
Hi,
> Krisztian Ivancso is working on FD passing
How long will it take?

--
Regards
Dominik

Willy Tarreau
Re: tcp resets on reload haproxy
April 04, 2012 09:40AM
On Wed, Apr 04, 2012 at 09:08:25AM +0200, Mostowiec Dominik wrote:
> Hi,
> > Krisztian Ivancso is working on FD passing
> How long will it take?

No idea, it's experimental right now so it may take some time before
it's finished and mergeable. We might also encounter complex issues
inherent to the existing architecture, I don't know.

Willy
Mostowiec Dominik
ODP: ODP: tcp resets on reload haproxy
April 06, 2012 01:50PM
Hi,
I think we have partly solved the problem.
It was probably caused by this magic number, not because it is magic but because it is too big.
> maxconn 163937
Before the change we got 0.5k PV/s (40 concurrent test workers, 3 backend servers, max 0.9k PV/s per server for the same test, tested with a 32 kB file).
When we changed maxconn to 30k in the global and defaults sections and removed "-n 163937 -N 163937" from the start parameters, we got ~2k PV/s.
As a test we changed it back to 163937 and performance dropped to the previous 0.5k PV/s.

I wrote "partly solved" because in another test, with 3 backend servers each capable of max 8k PV/s for a 12.5 kB file, we got max 6k PV/s.
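
For comparison, these request rates can be converted to payload bandwidth. A
small Python sketch, assuming that PV/s means requests per second and that
1 kB is 1000 bytes; headers and TCP/IP overhead are ignored, and the function
name is made up.

```python
def payload_mbit_per_s(pages_per_s, file_kb):
    """Payload bandwidth in Mbit/s for a request rate and a file size in kB."""
    return pages_per_s * file_kb * 8 / 1000

slow_32k = payload_mbit_per_s(500, 32)      # maxconn 163937, 32 kB file
fast_32k = payload_mbit_per_s(2000, 32)     # maxconn 30k, 32 kB file
fast_12k = payload_mbit_per_s(6000, 12.5)   # second test, 12.5 kB file
print(slow_32k, fast_32k, fast_12k)
```

That gives 128, 512 and 600 Mbit/s of payload for the three cases.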

--
Regards
Dominik
