[PATCH] [MEDIUM] Improve "no free ports" error case

Krishna Kumar (Engineering)
[PATCH] [MEDIUM] Improve "no free ports" error case
March 09, 2017 07:40AM
Hi Willy,

We use HAProxy as a Forward Proxy (I know this is not the intended
application for HAProxy) to access the outside world from within the DC,
and this requires setting a source port range so that return traffic
reaches the correct box from which a connection was established. On our
production boxes, we see around 500 "no free ports" errors per day, but
this can increase to about 120K errors during big sale events. The reason
is that connect() gets an EADDRNOTAVAIL error, since an earlier closed
socket may be in LAST_ACK state, as it may take some time for the remote
server to send the final ACK.
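For illustration, the relevant part of our configuration looks roughly
like this (the address and port range below are placeholders, not our
real values):

    backend outbound
        # pin the source address and a per-process port range so that
        # return traffic reaches the box that opened the connection
        source 192.0.2.10:50000-50399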

The attached patch reduces the number of errors by attempting more ports,
if they are available.

Please review, and let me know if this sounds reasonable to implement.

Thanks,
- Krishna
Willy Tarreau
Re: [PATCH] [MEDIUM] Improve "no free ports" error case
March 09, 2017 08:00AM
Hi Krishna,

On Thu, Mar 09, 2017 at 12:03:19PM +0530, Krishna Kumar (Engineering) wrote:
> Hi Willy,
>
> We use HAProxy as a Forward Proxy (I know this is not the intended
> application for HAProxy) to access the outside world from within the DC,
> and this requires setting a source port range so that return traffic
> reaches the correct box from which a connection was established. On our
> production boxes, we see around 500 "no free ports" errors per day, but
> this can increase to about 120K errors during big sale events. The reason
> is that connect() gets an EADDRNOTAVAIL error, since an earlier closed
> socket may be in LAST_ACK state, as it may take some time for the remote
> server to send the final ACK.
>
> The attached patch reduces the number of errors by attempting more ports,
> if they are available.
>
> Please review, and let me know if this sounds reasonable to implement.

Well, while the patch looks clean, I'm really not convinced it's the correct
approach. Normally you should simply be using the "retries" parameter to
increase the number of connect retries. There's nothing wrong with setting
it to a really high value if needed. Doesn't it work in your case?

Also a few other points:
  - when the remote server sends the FIN with the last segment, your
    connection ends up in CLOSE_WAIT state. Haproxy then closes as
    well, sending a FIN, and your socket ends up in LAST_ACK waiting
    for the server to respond. You may instead ask haproxy to close
    with an RST by setting "option nolinger" in the backend. The port
    will then always be free locally. The side effect is that if the
    RST is lost, the SYN of a new outgoing connection may get an ACK
    instead of a SYN-ACK as a reply, and will respond to it with an
    RST and try again. This will result in all connections working,
    some just taking slightly longer (typically 1 second).

  - 500 outgoing ports is a very low value. You should keep in mind
    that nowadays most servers use 60-second FIN_WAIT/TIME_WAIT
    delays (the remote server remains in FIN_WAIT1 while waiting for
    your ACK, then enters TIME_WAIT when receiving your FIN). So with
    only 500 ports, you can *safely* support only 500/60 ≈ 8 connections
    per second. Fortunately, in practice it doesn't work like this,
    since most of the time connections are correctly closed. But if
    you start to get into big trouble, you need to understand that you
    can very quickly reach some limits. And 500 outgoing ports means
    you don't expect to support more than 500 concurrent connections per
    proxy, which seems quite low.

Thus normally what you're experiencing should only be dealt with
using configuration:
  - increase the retries setting
  - possibly enable option nolinger (backend only, never on a frontend)
  - try to increase the available source port ranges.
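For example, something along these lines (an illustrative sketch, not a
drop-in config; the address, range and values are placeholders):

    backend outbound
        # retry failed connection attempts many times; a high value
        # is fine if port exhaustion is transient
        retries 50
        # close server-side connections with an RST so the local port
        # is freed immediately (see the caveats above)
        option nolinger
        # widen the per-process source port range
        source 192.0.2.10:50000-50999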

Regards,
Willy
Krishna Kumar (Engineering)
Re: [PATCH] [MEDIUM] Improve "no free ports" error case
March 09, 2017 08:30AM
Hi Willy,

Thanks for your comments.

1. About 'retries', I am not sure if it works for connect() failing
synchronously on the local system (as opposed to getting a timeout/refused
via callback). The documentation on retries says:

    "<value> is the number of times a connection attempt should be
    retried on a server when a connection either is refused or times
    out. The default value is 3."

The two conditions above don't fall in our use case. The way I understood
it, retries happen during the callback handler. Also I am not sure if
there is any way to circumvent the "1 second" gap for a retry.

2. For nolinger, the documentation recommends against it, and I also
wonder whether any data loss can happen if the socket does not linger for
some time beyond the FIN packet that the remote server sent for its
close(), delayed data packets, etc.

3. Ports: Actually each HAProxy process is limited to 400 ports for a
single backend, and there are many haproxy processes on this and other
servers. The ports are split per process and per system. E.g. system1
has 'n' processes, each with a port range separate from the others, and
system2 has 'n' processes with a completely different port range. For
infra reasons, we are restricting the total port range. The unique ports
for the different haproxy processes running on the same system are meant
to avoid two processes attempting to use the same port (the first port#
in the range) and failing in connect() when connecting to the same remote
server. Hope I explained that clearly.

Thanks,
- Krishna
Willy Tarreau
Re: [PATCH] [MEDIUM] Improve "no free ports" error case
March 09, 2017 09:10AM
On Thu, Mar 09, 2017 at 12:50:16PM +0530, Krishna Kumar (Engineering) wrote:
> 1. About 'retries', I am not sure if it works for connect() failing
> synchronously on the local system (as opposed to getting a
> timeout/refused via callback).

Yes it normally does. I've been using it for the same purpose in certain
situations (eg: binding to a source port range while some daemons are
later bound into that range).

> The documentation on retries says:
>
>     "<value> is the number of times a connection attempt should be
>     retried on a server when a connection either is refused or times
>     out. The default value is 3."
>
> The two conditions above don't fall in our use case.

It's still a refused connection :-)

> The way I understood it, retries happen during the callback handler.
> Also I am not sure if there is any way to circumvent the "1 second"
> gap for a retry.

Hmmm, I have to check. In fact when the LB algorithm is not deterministic,
we immediately retry on another server. If we're supposed to end up only
on the same server, we indeed apply the delay. But if it's a synchronous
error, I don't know. And I think it's one case (especially -EADDRNOTAVAIL)
where we should immediately retry.

> 2. For nolinger, the documentation recommends against it,

It's indeed strongly recommended against, mainly because we've started
to see it in configs copy-pasted from blogs without understanding the
impacts.

> and I also wonder whether any data loss can happen if the socket does
> not linger for some time beyond the FIN packet that the remote server
> sent for its close(), delayed data packets, etc.

The data loss happens only with outgoing data, so for HTTP it's the data
sent to the client which is at risk. Data coming from the server are
properly consumed. In fact, when you configure "http-server-close",
nolinger is automatically enabled in your backend so that haproxy
can close the server connection without accumulating time-waits.
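For instance, a minimal fragment of that kind (illustrative only):

    defaults
        mode http
        # close the server-side connection after each response; per the
        # above, this implies a lingerless close on the server side
        option http-server-close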

> 3. Ports: Actually each HAProxy process is limited to 400 ports for a
> single backend, and there are many haproxy processes on this and other
> servers. The ports are split per process and per system. E.g. system1
> has 'n' processes, each with a port range separate from the others, and
> system2 has 'n' processes with a completely different port range. For
> infra reasons, we are restricting the total port range. The unique ports
> for the different haproxy processes running on the same system are meant
> to avoid two processes attempting to use the same port (the first port#
> in the range) and failing in connect() when connecting to the same remote
> server. Hope I explained that clearly.

Yep, I clearly see the use case. That's one of the rare cases where it's
interesting to use SNAT between your haproxy nodes and the internet. This
way you'll use a unified port pool for all your nodes and will not have
to reserve port ranges per system and per process. Each process will then
share the system's local source ports, and each system will have a different
address. Then the SNAT will convert these IP1..N:port1..N to the public IP
address and an available port. This will offer you more flexibility to add
or remove nodes/processes etc. Maybe your total traffic cannot pass through
a single SNAT box, though, in which case I understand that you don't have
much choice. However, you could then at least not force each process' port
range and instead fix the system's local port range, so that you know that
all processes of a single machine share the same port range. That's already
better because you won't be forced to assign ports from unfinished
connections.
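As an illustration, the kind of SNAT rule meant here, e.g. with iptables
on a Linux gateway (the subnet, interface and public address below are
placeholders):

    # rewrite all traffic leaving the haproxy nodes (10.0.0.0/24 here)
    # to a single public address, drawing ports from one shared pool
    iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o eth0 \
        -j SNAT --to-source 203.0.113.1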

Willy
Krishna Kumar (Engineering)
Re: [PATCH] [MEDIUM] Improve "no free ports" error case
March 09, 2017 10:00AM
Hi Willy,

Excellent, I will try this idea, it should definitely help!
Thanks for the explanations.

Regards,
- Krishna
Krishna Kumar (Engineering)
Re: [PATCH] [MEDIUM] Improve "no free ports" error case
March 16, 2017 07:00AM
Hi Willy,

I am facing one problem with using the system port range.

Distro: Ubuntu 16.04.1, kernel: 4.4.0-53-generic

When I set the range to 50000-50999, the kernel allocates ports in the
range 50000 to 50499; the remaining 500 ports never seem to get allocated,
despite running a few thousand connections in parallel. A simple test
program that I wrote, which does a bind to the IP and then connects, uses
all 1000 ports. Quickly checking the tcp code, I noticed that the kernel
tries to allocate an odd port for bind, leaving the even ports for
connect. Any idea why I don't get the full port range in bind? I am using
something like the following when specifying the server:

    server abcd google.com:80 source e1.e2.e3.e4

and with the following sysctl:

    sysctl -w net.ipv4.ip_local_port_range="50000 50999"
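For reference, the test program is roughly the following (a minimal
sketch: the source and destination addresses are placeholders, and error
handling is abbreviated):

    /* Open many sockets, each bound to a fixed source IP with port 0
     * (kernel-chosen), then connect and print the assigned source port.
     * Sockets are kept open so their ports stay in use. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        for (int i = 0; i < 1000; i++) {
            int fd = socket(AF_INET, SOCK_STREAM, 0);
            if (fd < 0) {
                perror("socket");
                return 1;
            }

            struct sockaddr_in src;
            memset(&src, 0, sizeof(src));
            src.sin_family = AF_INET;
            src.sin_port = 0;  /* source port is assigned at bind() time */
            inet_pton(AF_INET, "192.0.2.1", &src.sin_addr);
            if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0) {
                perror("bind");
                return 1;
            }

            struct sockaddr_in dst;
            memset(&dst, 0, sizeof(dst));
            dst.sin_family = AF_INET;
            dst.sin_port = htons(80);
            inet_pton(AF_INET, "198.51.100.1", &dst.sin_addr);
            if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0)
                perror("connect");

            struct sockaddr_in local;
            socklen_t len = sizeof(local);
            getsockname(fd, (struct sockaddr *)&local, &len);
            printf("source port: %u\n", ntohs(local.sin_port));
            /* fd deliberately left open */
        }
        return 0;
    }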

I hope it is OK to add an unrelated feature question to this thread:

Is it possible to tell haproxy to use one backend for a request (GET),
and if the response was 404 (Not Found), to use another backend? The
resource may be present in the 2nd backend; is there any way to try that
upon getting a 404 from the first?

Thanks,
- Krishna