[RFC PATCH] MEDIUM: persistent connections for SSL checks

Steven Davidovitz
[RFC PATCH] MEDIUM: persistent connections for SSL checks
March 02, 2017 01:10AM
Having hundreds of HTTP SSL health checks leads to CPU saturation.
This patch allows HTTP health checks without any http-expect directives
to keep the connection open for subsequent health checks. This patch
also does not affect any TCP check code.
---

Notes:
We have a situation where we need to do basic SSL+HTTP health checks on a large number of nodes.
Without persistent connections, the CPU usage is much too high.

This patch is my attempt at resolving this (tested on our side). Since this
is my first time mucking around in the HAProxy code, I'm sure some discussion will be necessary.

I also wasn't sure whether this should be a configuration option, or whether it could be expanded to include
HTTP checks with http-expect directives. Since an explicit "Connection: close" is added for those, I excluded them
to be on the safe side.

Thanks!
Steven Davidovitz

include/types/checks.h | 1 +
src/checks.c | 96 +++++++++++++++++++++++++++++---------------------
2 files changed, 57 insertions(+), 40 deletions(-)

diff --git a/include/types/checks.h b/include/types/checks.h
index 283ff3db..0e86a741 100644
--- a/include/types/checks.h
+++ b/include/types/checks.h
@@ -166,6 +166,7 @@ struct check {
short status, code; /* check result, check code */
char desc[HCHK_DESC_LEN]; /* health check description */
int use_ssl; /* use SSL for health checks */
+ int use_ssl_persistent; /* use persistent connections for SSL health checks */
int send_proxy; /* send a PROXY protocol header with checks */
struct list *tcpcheck_rules; /* tcp-check send / expect rules */
struct tcpcheck_rule *current_step; /* current step when using tcpcheck */
diff --git a/src/checks.c b/src/checks.c
index 49bd886b..c03b7abd 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -778,6 +778,7 @@ static void event_srv_chk_w(struct connection *conn)
t->expire = tick_add_ifset(now_ms, s->proxy->timeout.check);
task_queue(t);
}
+ __conn_sock_want_recv(conn);
goto out_nowake;

out_wakeup:
@@ -1349,14 +1350,16 @@ static void event_srv_chk_r(struct connection *conn)
*check->bi->data = '\0';
check->bi->i = 0;

- /* Close the connection... We absolutely want to perform a hard close
- * and reset the connection if some data are pending, otherwise we end
- * up with many TIME_WAITs and eat all the source port range quickly.
- * To avoid sending RSTs all the time, we first try to drain pending
- * data.
- */
- __conn_data_stop_both(conn);
- conn_data_shutw_hard(conn);
+ if (conn->flags & CO_FL_ERROR || !check->use_ssl_persistent) {
+ /* Close the connection... We absolutely want to perform a hard close
+ * and reset the connection if some data are pending, otherwise we end
+ * up with many TIME_WAITs and eat all the source port range quickly.
+ * To avoid sending RSTs all the time, we first try to drain pending
+ * data.
+ */
+ __conn_data_stop_both(conn);
+ conn_data_shutw_hard(conn);
+ }

/* OK, let's not stay here forever */
if (check->result == CHK_RES_FAILED)
@@ -1398,13 +1401,14 @@ static int wake_srv_chk(struct connection *conn)
task_wakeup(check->task, TASK_WOKEN_IO);
}

- if (check->result != CHK_RES_UNKNOWN) {
+ if (check->result != CHK_RES_UNKNOWN && ((conn->flags & CO_FL_ERROR) || !check->use_ssl_persistent)) {
/* We're here because nobody wants to handle the error, so we
* sure want to abort the hard way.
*/
conn_sock_drain(conn);
conn_force_close(conn);
}
+
return 0;
}

@@ -1465,7 +1469,6 @@ static int connect_conn_chk(struct task *t)
struct check *check = t->context;
struct server *s = check->server;
struct connection *conn = check->conn;
- struct protocol *proto;
int ret;
int quickack;

@@ -1505,33 +1508,34 @@ static int connect_conn_chk(struct task *t)
bo_putblk(check->bo, check->send_string, check->send_string_len);
}

- /* prepare a new connection */
- conn_init(conn);
+ if (!conn_xprt_ready(conn)) {
+ /* prepare a new connection */
+ conn_init(conn);

- if (is_addr(&check->addr)) {
- /* we'll connect to the check addr specified on the server */
- conn->addr.to = check->addr;
- }
- else {
- /* we'll connect to the addr on the server */
- conn->addr.to = s->addr;
- }
+ if (is_addr(&check->addr)) {
+ /* we'll connect to the check addr specified on the server */
+ conn->addr.to = check->addr;
+ }
+ else {
+ /* we'll connect to the addr on the server */
+ conn->addr.to = s->addr;
+ }

- if ((conn->addr.to.ss_family == AF_INET) || (conn->addr.to.ss_family == AF_INET6)) {
- int i = 0;
+ if ((conn->addr.to.ss_family == AF_INET) || (conn->addr.to.ss_family == AF_INET6)) {
+ int i = 0;

- i = srv_check_healthcheck_port(check);
- if (i == 0) {
- conn->owner = check;
- return SF_ERR_CHK_PORT;
+ i = srv_check_healthcheck_port(check);
+ if (i == 0) {
+ conn->owner = check;
+ return SF_ERR_CHK_PORT;
+ }
+
+ set_host_port(&conn->addr.to, i);
}

- set_host_port(&conn->addr.to, i);
+ conn_prepare(conn, protocol_by_family(conn->addr.to.ss_family), check->xprt);
}

- proto = protocol_by_family(conn->addr.to.ss_family);
-
- conn_prepare(conn, proto, check->xprt);
conn_attach(conn, check, &check_conn_cb);
conn->target = &s->obj_type;

@@ -1555,15 +1559,21 @@ static int connect_conn_chk(struct task *t)
quickack = 0;
}

- ret = SF_ERR_INTERNAL;
- if (proto->connect)
- ret = proto->connect(conn, check->type, quickack ? 2 : 0);
- conn->flags |= CO_FL_WAKE_DATA;
- if (s->check.send_proxy) {
- conn->send_proxy_ofs = 1;
- conn->flags |= CO_FL_SEND_PROXY;
+ ret = SF_ERR_NONE;
+
+ if (!conn_ctrl_ready(conn) || !conn_xprt_ready(conn)) {
+ ret = conn->ctrl->connect(conn, check->type, quickack ? 2 : 0);
+
+ /* we need to be notified about connection establishment */
+ conn->flags |= CO_FL_WAKE_DATA;
+
+ if (s->check.send_proxy) {
+ conn->send_proxy_ofs = 1;
+ conn->flags |= CO_FL_SEND_PROXY;
+ }
}

+
return ret;
}

@@ -2100,8 +2110,13 @@ static struct task *process_chk_conn(struct task *t)
t->expire = tick_first(t->expire, t_con);
}

- if (check->type)
- conn_data_want_recv(conn); /* prepare for reading a possible reply */
+ if (check->type) {
+ if (conn->flags & CO_FL_CONNECTED)
+ conn_data_want_send(conn);
+ else
+ conn_data_want_recv(conn); /* prepare for reading a possible reply */
+ }
+

goto reschedule;

@@ -2161,7 +2176,7 @@ static struct task *process_chk_conn(struct task *t)
}

/* check complete or aborted */
- if (conn->xprt) {
+ if (conn->xprt && ((conn->flags & CO_FL_ERROR) || !check->use_ssl_persistent)) {
/* The check was aborted and the connection was not yet closed.
* This can happen upon timeout, or when an external event such
* as a failed response coupled with "observe layer7" caused the
@@ -3425,6 +3440,7 @@ int srv_check_healthcheck_port(struct check *chk)
*/
if (!chk->port && !is_addr(&chk->addr)) {
chk->use_ssl |= (srv->use_ssl || (srv->proxy->options & PR_O_TCPCHK_SSL));
+ chk->use_ssl_persistent |= (chk->use_ssl && !(srv->proxy->options2 & PR_O2_EXP_TYPE));
chk->send_proxy |= (srv->pp_opts);
}

--
2.11.1
Hi Steven,

On Wed, Mar 01, 2017 at 04:03:17PM -0800, Steven Davidovitz wrote:
> Having hundreds of HTTP SSL health checks leads to CPU saturation.
> This patch allows HTTP health checks without any http-expect directives
> to keep the connection open for subsequent health checks. This patch
> also does not affect any TCP check code.

I think something like this could possibly work, but at least the
persistent setting should definitely be an option. Indeed, for many
people, checking the connection is as important as, if not more important
than, testing that the service behind it works. I can give you an example:
if you check another haproxy, the old process will never quit upon a
reload or soft-stop, so your health checks will keep checking the old
process and will not detect a crash of the new one, which is the one
actually listening for connections.

We could imagine having a more general option (per server, per backend?)
to indicate that HTTP checks want to be performed on persistent connections,
not just the SSL ones. In fact we could specify how many consecutive
checks are allowed over a persistent connection before renewing the
connection. That would cover your use case, allow users to set a
reasonable counter to ensure that after a few checks, the listener
is properly tested, and may be useful to some users even with pure
HTTP (e.g. fewer logs on intermediate firewalls).

Also it is not normal at all that SSL checks lead to CPU saturation.
Normally, health checks are expected to store the last SSL_CTX in the
server struct for later reuse, leading to a resumed TLS connection.
There is one case where this doesn't work, which is when SNI is being
used on the server lines. Is this your case? If so, a better solution
would be to have at least two (possibly a bit more) SSL_CTX per server,
as was mentioned in commit 119a408 ("BUG/MEDIUM: ssl: for a handshake
when server-side SNI changes"). This would improve both the checks
performance and the production traffic performance by avoiding renegs
on SNI change between two consecutive connections.
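
For reference, the reuse mechanism described above boils down to the standard
OpenSSL client-side resumption pattern sketched below. This is a generic,
minimal illustration (the names saved_sess and check_connect are made up; it
is not HAProxy's actual code): the session saved from the previous check is
offered again on the next connection, turning it into an abbreviated
handshake.

#include <openssl/ssl.h>

static SSL_SESSION *saved_sess;              /* e.g. kept per server between checks */

SSL *check_connect(SSL_CTX *ctx, int fd)
{
	SSL *ssl = SSL_new(ctx);

	SSL_set_fd(ssl, fd);
	if (saved_sess)
		SSL_set_session(ssl, saved_sess);    /* offer the previous session */

	if (SSL_connect(ssl) <= 0) {                 /* full or abbreviated handshake */
		SSL_free(ssl);
		return NULL;
	}

	/* remember the (possibly new) session for the next health check;
	 * SSL_session_reused(ssl) tells whether a full handshake was avoided.
	 */
	if (saved_sess)
		SSL_SESSION_free(saved_sess);
	saved_sess = SSL_get1_session(ssl);

	return ssl;
}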

BTW, very good job for a first submission!

Cheers,
Willy
Steven Davidovitz
Re: [RFC PATCH] MEDIUM: persistent connections for SSL checks
March 07, 2017 03:40AM
Thanks for the response!

On Mon, Mar 6, 2017 at 1:34 AM, Willy Tarreau <[email protected]> wrote:

>
> [snip]
>
> Also it is not normal at all that SSL checks lead to CPU saturation.
> Normally, health checks are expected to store the last SSL_CTX in the
> server struct for later reuse, leading to a TLS resume connection.
> There is one case where this doesn't work which is when SNI is being
> used on the server lines. Is this your case ? If so a better solution
> would be to have at least two (possibly a bit more) SSL_CTX per server
> as was mentioned in commit 119a408 ("BUG/MEDIUM: ssl: for a handshake
> when server-side SNI changes"). This would both improve the checks
> performance and the production traffic performance by avoiding renegs
> on SNI change between two consecutive connections.
>

Interestingly, as far as I can tell, we are running into the problem
described in this forum post:
http://discourse.haproxy.org/t/backend-encryption-and-reusing-ssl-sessions/503/4
Switching the conn_data_shutw_hard call to conn_data_shutw in checks.c
decreased CPU usage completely. Forcing SSLv3 as in this email (
https://www.mail-archive.com/[email protected]/msg09105.html) also
worked.

I haven't had time to dig further, and it may certainly be client
misconfiguration, but could you shed any light on why that might be a
problem?
On Mon, Mar 06, 2017 at 06:34:09PM -0800, Steven Davidovitz wrote:
> Interestingly, as far as I can tell, we are running into the problem
> described in this forum post:
> http://discourse.haproxy.org/t/backend-encryption-and-reusing-ssl-sessions/503/4
> Switching the conn_data_shutw_hard call to conn_data_shutw in checks.c
> decreased CPU usage completely. Forcing SSLv3 as in this email (
> https://www.mail-archive.com/[email protected]/msg09105.html) also
> worked.

This is very interesting! The problem with not using conn_data_shutw_hard()
is that conn_data_shutw() will only *try* to notify the other side about
an imminent close but at the same time we're going to close with SO_NOLINGER
resulting in the close notification being lost in the middle. And not using
SO_NOLINGER is not an option as we can end up with tons of TIME_WAIT sockets
on haproxy which clearly is not acceptable. But more importantly it means
that we probably have a similar problem with production traffic if you don't
use persistent connections to the server.

But now I'm thinking about something: I'm wondering whether it is in fact
the lack of a call to SSL_shutdown() which causes the connection not to be
kept in haproxy's SSL context. In 1.8-dev this function changed a bit so
that we first call SSL_set_quiet_shutdown() and then SSL_shutdown().
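
To make the trade-off concrete, here is a rough sketch of the two close
strategies in plain OpenSSL/socket terms (made-up function names; this is
not HAProxy's conn_data_shutw*() implementation, just the underlying
mechanism being discussed):

#include <openssl/ssl.h>
#include <sys/socket.h>
#include <unistd.h>

/* "hard" close: no close_notify, and an RST instead of TIME_WAIT */
void close_hard(SSL *ssl, int fd)
{
	struct linger nolinger = { .l_onoff = 1, .l_linger = 0 };

	SSL_set_quiet_shutdown(ssl, 1);   /* SSL_shutdown() won't send close_notify */
	SSL_shutdown(ssl);
	setsockopt(fd, SOL_SOCKET, SO_LINGER, &nolinger, sizeof(nolinger));
	close(fd);                        /* resets the connection, no TIME_WAIT */
}

/* "clean" close: send close_notify so the peer may keep the session cached */
void close_clean(SSL *ssl, int fd)
{
	SSL_shutdown(ssl);                /* emits the close_notify alert */
	close(fd);
}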

> I haven't had time to dig further, and it may certainly be client
> misconfiguration, but could you shed any light on why that might be a
> problem?

It might be useful to use another haproxy after the checks instead of
your server to see if you observe the same effect. And if you can run
a test with 1.8-dev it would be great. I'd rather backport just the
change to ssl_sock_shutw() to 1.7 if it fixes the problem :-)

BTW, given that only the checks are causing the trouble it's easy to
start an independent process for this. Just change all bind addresses
and let the backends run their checks to see the effect.

Willy
Steven Davidovitz
Re: [RFC PATCH] MEDIUM: persistent connections for SSL checks
March 07, 2017 09:30PM
I tried on 1.8dev and was still able to reproduce the problem.
It appears Java is strictly interpreting the TLS spec with regards to
"Closure Alerts".
The section specifies that:

Note that as of TLS 1.1, failure to properly close a connection no longer
requires that a session not be resumed. This is a change from TLS 1.0 to
conform with widespread implementation practice.
(https://tools.ietf.org/html/rfc5246#section-7.2.1)

Java does not allow session resumption in that case. I haven't been able to
reproduce the issue
by running requests through the actual proxy (i.e. not check-related), but
it certainly might be possible
to trigger this case. I have a simple test case here:
https://github.com/steved/haproxy-java-ssl-check

I thought earlier that changing shutw_hard to shutw works, but now it
appears I need this patch:

diff --git a/src/checks.c b/src/checks.c
index 49bd886b..75aa222b 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -2168,6 +2168,7 @@ static struct task *process_chk_conn(struct task *t)
* server state to be suddenly changed.
*/
conn_sock_drain(conn);
+ conn_data_shutw(conn);
conn_force_close(conn);
}
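
For anyone who wants to observe this behaviour outside of haproxy, a
standalone client along the following lines (plain OpenSSL, 1.1.0+ assumed;
the address, port and helper are placeholders) shows whether a server still
resumes a session after an abrupt close without a close_notify:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <openssl/ssl.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

static int tcp_connect(const char *ip, int port)
{
	struct sockaddr_in sa = { .sin_family = AF_INET, .sin_port = htons(port) };
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	inet_pton(AF_INET, ip, &sa.sin_addr);
	connect(fd, (struct sockaddr *)&sa, sizeof(sa));
	return fd;
}

int main(void)
{
	SSL_CTX *ctx = SSL_CTX_new(TLS_client_method());

	/* 1st connection: handshake, save the session, close WITHOUT close_notify */
	int fd = tcp_connect("127.0.0.1", 8443);
	SSL *ssl = SSL_new(ctx);
	SSL_set_fd(ssl, fd);
	SSL_connect(ssl);
	SSL_SESSION *sess = SSL_get1_session(ssl);
	close(fd);                            /* no SSL_shutdown(), no close_notify */
	SSL_free(ssl);

	/* 2nd connection: offer the saved session and check whether it was resumed */
	fd = tcp_connect("127.0.0.1", 8443);
	ssl = SSL_new(ctx);
	SSL_set_fd(ssl, fd);
	SSL_set_session(ssl, sess);
	SSL_connect(ssl);
	printf("resumed: %d\n", SSL_session_reused(ssl));
	SSL_shutdown(ssl);                    /* clean close this time */
	close(fd);
	return 0;
}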

On Mon, Mar 6, 2017 at 10:48 PM, Willy Tarreau <[email protected]> wrote:

> On Mon, Mar 06, 2017 at 06:34:09PM -0800, Steven Davidovitz wrote:
> > Interestingly, as far as I can tell, we are running into the problem
> > described in this forum post:
> > http://discourse.haproxy.org/t/backend-encryption-and-reusin
> g-ssl-sessions/503/4
> > Switching the conn_data_shutw_hard call to conn_data_shutw in checks.c
> > decreased CPU usage completely. Forcing SSLv3 as in this email (
> > https://www.mail-archive.com/[email protected]/msg09105.html) also
> > worked.
>
> This is very interesting! The problem with not using conn_data_shutw_hard()
> is that conn_data_shutw() will only *try* to notify the other side about
> an imminent close but at the same time we're going to close with
> SO_NOLINGER
> resulting in the close notification being lost in the middle. And not using
> SO_NOLINGER is not an option as we can end up with tons of TIME_WAIT
> sockets
> on haproxy which clearly is not acceptable. But more importantly it means
> that we probably have a similar problem with production traffic if you
> don't
> use persistent connections to the server.
>
> But now I'm thinking about something, I'm wondering if in fact it would
> not be the lack of call to SSL_shutdown() which causes the connection
> not to be kept in haproxy's SSL context. In 1.8-dev this function changed
> a bit so that we first call SSL_set_quiet_shutdown() then SSL_shutdown().
>
> > I haven't had time to dig further, and it may certainly be client
> > misconfiguration, but could you shed any light on why that might be a
> > problem?
>
> It might be useful to use another haproxy after the checks instead of
> your server to see if you observe the same effect. And if you can run
> a test with 1.8-dev it would be great. I'd rather backport just the
> change to ssl_sock_shutw() to 1.7 if it fixes the problem :-)
>
> BTW, given that only the checks are causing the trouble it's easy to
> start an independent process for this. Just change all bind addresses
> and let the backends run their checks to see the effect.
>
> Willy
>
On Tue, Mar 07, 2017 at 12:27:38PM -0800, Steven Davidovitz wrote:
> I tried on 1.8dev and was still able to reproduce the problem.
> It appears Java is strictly interpreting the TLS spec with regards to
> "Closure Alerts".
> The section specifies that:
>
> Note that as of TLS 1.1, failure to properly close a connection no longer
> requires that a
> session not be resumed. This is a change from TLS 1.0 to conform
> with widespread implementation practice.
> (https://tools.ietf.org/html/rfc5246#section-7.2.1)
>
> Java does not allow session resumption in that case.

And the fact that it's the removal of a requirement doesn't mean one
has to abide by this new rule, so Java is right to proceed like this
even though it clearly is suboptimal.

> I haven't been able to reproduce the issue
> by running requests through the actual proxy (i.e. not check-related), but
> it certainly might be possible
> to trigger this case. I have a simple test case here:
> https://github.com/steved/haproxy-java-ssl-check

I think that production traffic takes more time to be processed and
leaves more time for the TLS response to be sent before the close is
performed.

> I thought earlier that changing shutw_hard to shutw works,

It's a matter of time race as usual :-/

> but now it appears I need this patch:
>
> diff --git a/src/checks.c b/src/checks.c
> index 49bd886b..75aa222b 100644
> --- a/src/checks.c
> +++ b/src/checks.c
> @@ -2168,6 +2168,7 @@ static struct task *process_chk_conn(struct task *t)
> * server state to be suddenly changed.
> */
> conn_sock_drain(conn);
> + conn_data_shutw(conn);
> conn_force_close(conn);
> }

I suspect that you'll accumulate TIME_WAIT sockets here on haproxy but
I could be wrong. I'd like you to check on an idle system. If there's
no such thing and if you find that this reliably solves your problem
I'm fine with taking this one, though I think that it only increases
the likelihood of letting the connection close cleanly. But if you switch
from 5% of clean closes to 95%, that's already 20 times fewer key
computations :-)

Thanks,
Willy
Steven Davidovitz
Re: [RFC PATCH] MEDIUM: persistent connections for SSL checks
March 08, 2017 10:30PM
Looks like just attempting to cleanly shut down worked fine on an idle
system (~1000 servers, not all Java).
I didn't notice much of an increase in new sessions after the initial
connections. I also didn't notice any TIME_WAIT sockets lingering.

On Tue, Mar 7, 2017 at 10:24 PM, Willy Tarreau <[email protected]> wrote:

> On Tue, Mar 07, 2017 at 12:27:38PM -0800, Steven Davidovitz wrote:
> > I tried on 1.8dev and was still able to reproduce the problem.
> > It appears Java is strictly interpreting the TLS spec with regards to
> > "Closure Alerts".
> > The section specifies that:
> >
> > Note that as of TLS 1.1, failure to properly close a connection no
> longer
> > requires that a
> > session not be resumed. This is a change from TLS 1.0 to conform
> > with widespread implementation practice.
> > (https://tools.ietf.org/html/rfc5246#section-7.2.1)
> >
> > Java does not allow session resumption in that case.
>
> And the fact that it's the removal of a requirement doesn't mean one
> has to abide by this new rule, so Java is right to proceed like this
> even though it clearly is suboptimal.
>
> > I haven't been able to reproduce the issue
> > by running requests through the actual proxy (i.e. not check-related),
> but
> > it certainly might be possible
> > to trigger this case. I have a simple test case here:
> > https://github.com/steved/haproxy-java-ssl-check
>
> I think that production traffic takes more time to be processed and
> leaves more time for the TLS response to be sent before the close is
> performed.
>
> > I thought earlier that changing shutw_hard to shutw works,
>
> It's a matter of time race as usual :-/
>
> > but now it appears I need this patch:
> >
> > diff --git a/src/checks.c b/src/checks.c
> > index 49bd886b..75aa222b 100644
> > --- a/src/checks.c
> > +++ b/src/checks.c
> > @@ -2168,6 +2168,7 @@ static struct task *process_chk_conn(struct task
> *t)
> > * server state to be suddenly changed.
> > */
> > conn_sock_drain(conn);
> > + conn_data_shutw(conn);
> > conn_force_close(conn);
> > }
>
> I suspect that you'll accumulate TIME_WAIT sockets here on haproxy but
> I could be wrong. I'd like you to check on an idle system. If there's
> no such thing and if you find that this reliably solves your problem
> I'm fine with taking this one, though I think that it only increases
> the likelihood of letting the connection close cleanly. But if you switch
> from 5% of clean closes to 95%, that's already 20 times fewer key
> computations :-)
>
> Thanks,
> Willy
>
Attachments:
0001-BUG-MINOR-attempt-clean-shutw-for-check.patch (1.5 KB)
Steven Davidovitz
Re: [RFC PATCH] MEDIUM: persistent connections for SSL checks
March 13, 2017 06:00PM
Hi,

Just wanted to follow up. I've been running this patch for a couple days on
an idle system and haven't noticed any problems.
Could this be merged? Is there anything else I can test?

On Wed, Mar 8, 2017 at 1:22 PM, Steven Davidovitz <[email protected]>
wrote:

> Looks like just attempting to cleanly shut down worked fine on an idle
> system (~1000 servers, not all Java).
> I didn't notice much of an increase in new sessions after the initial
> connections. I also didn't notice any TIME_WAIT sockets lingering.
>
> On Tue, Mar 7, 2017 at 10:24 PM, Willy Tarreau <[email protected]> wrote:
>
>> On Tue, Mar 07, 2017 at 12:27:38PM -0800, Steven Davidovitz wrote:
>> > I tried on 1.8dev and was still able to reproduce the problem.
>> > It appears Java is strictly interpreting the TLS spec with regards to
>> > "Closure Alerts".
>> > The section specifies that:
>> >
>> > Note that as of TLS 1.1, failure to properly close a connection no
>> longer
>> > requires that a
>> > session not be resumed. This is a change from TLS 1.0 to conform
>> > with widespread implementation practice.
>> > (https://tools.ietf.org/html/rfc5246#section-7.2.1)
>> >
>> > Java does not allow session resumption in that case.
>>
>> And the fact that it's the removal of a requirement doesn't mean one
>> has to abide by this new rule, so Java is right to proceed like this
>> even though it clearly is suboptimal.
>>
>> > I haven't been able to reproduce the issue
>> > by running requests through the actual proxy (i.e. not check-related),
>> but
>> > it certainly might be possible
>> > to trigger this case. I have a simple test case here:
>> > https://github.com/steved/haproxy-java-ssl-check
>>
>> I think that production traffic takes more time to be processed and
>> leaves more time for the TLS response to be sent before the close is
>> performed.
>>
>> > I thought earlier that changing shutw_hard to shutw works,
>>
>> It's a matter of time race as usual :-/
>>
>> > but now it appears I need this patch:
>> >
>> > diff --git a/src/checks.c b/src/checks.c
>> > index 49bd886b..75aa222b 100644
>> > --- a/src/checks.c
>> > +++ b/src/checks.c
>> > @@ -2168,6 +2168,7 @@ static struct task *process_chk_conn(struct task
>> *t)
>> > * server state to be suddenly changed.
>> > */
>> > conn_sock_drain(conn);
>> > + conn_data_shutw(conn);
>> > conn_force_close(conn);
>> > }
>>
>> I suspect that you'll accumulate TIME_WAIT sockets here on haproxy but
>> I could be wrong. I'd like you to check on an idle system. If there's
>> no such thing and if you find that this reliably solves your problem
>> I'm fine with taking this one, though I think that it only increases
>> the likelihood of letting the connection close cleanly. But if you switch
>> from 5% of clean closes to 95%, that's already 20 times fewer key
>> computations :-)
>>
>> Thanks,
>> Willy
>>
>
>
Hi Steven,

On Mon, Mar 13, 2017 at 09:54:49AM -0700, Steven Davidovitz wrote:
> Hi,
>
> Just wanted to follow up. I've been running this patch for a couple days on
> an idle system and haven't noticed any problems.
> Could this be merged? Is there anything else I can test?

I'm personally fine with it but I'd rather have Emeric approve it, as
he knows better than me the possible impacts of shutting down cleanly
or not on SSL.

Emeric, I've re-attached the patch. Using conn_data_shutw() instead of
conn_data_shutw_hard() causes the "clean" flag to be set when calling
ssl_sock_shutw(), so that SSL_set_quiet_shutdown() is not called and we
perform a clean shutdown. The purpose is to give the server a chance
to store the context and avoid renegotiating.
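
(For readers not following that code path: the difference boils down to a
branch like the simplified sketch below; this is not the actual
ssl_sock_shutw() from src/ssl_sock.c, just the idea.)

#include <openssl/ssl.h>

void xprt_shutw(SSL *ssl, int clean)
{
	if (!clean)
		/* unclean: don't send a close_notify, just mark the SSL object shut down */
		SSL_set_quiet_shutdown(ssl, 1);
	SSL_shutdown(ssl);
}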

This will likely have to be backported to 1.7, 1.6 and 1.5.

Willy
From a544fcb1b8b33d2f7b4b66280484f6e835a5df69 Mon Sep 17 00:00:00 2001
From: Steven Davidovitz <[email protected]>
Date: Wed, 8 Mar 2017 11:06:20 -0800
Subject: [PATCH] BUG/MINOR: attempt clean shutw for check

Strict interpretation of TLS can cause SSL sessions
to be thrown away when the socket is shut down without
sending a "close notify".
---
src/checks.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/checks.c b/src/checks.c
index 49bd886b..b3ad96ec 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -1349,14 +1349,15 @@ static void event_srv_chk_r(struct connection *conn)
*check->bi->data = '\0';
check->bi->i = 0;

- /* Close the connection... We absolutely want to perform a hard close
- * and reset the connection if some data are pending, otherwise we end
- * up with many TIME_WAITs and eat all the source port range quickly.
- * To avoid sending RSTs all the time, we first try to drain pending
- * data.
+ /* Close the connection... We still attempt to nicely close if,
+ * for instance, SSL needs to send a "close notify." Later, we perform
+ * a hard close and reset the connection if some data are pending,
+ * otherwise we end up with many TIME_WAITs and eat all the source port
+ * range quickly. To avoid sending RSTs all the time, we first try to
+ * drain pending data.
*/
__conn_data_stop_both(conn);
- conn_data_shutw_hard(conn);
+ conn_data_shutw(conn);

/* OK, let's not stay here forever */
if (check->result == CHK_RES_FAILED)
--
2.11.1
On Mon, Mar 13, 2017 at 06:10:23PM +0100, Willy Tarreau wrote:
> > Just wanted to follow up. I've been running this patch for a couple days on
> > an idle system and haven't noticed any problems.
> > Could this be merged? Is there anything else I can test?
>
> I'm personally fine with it but I'd rather have Emeric approve it, as
> he knows better than me the possible impacts of shutting down cleanly
> or not on SSL.
>
> Emeric, I've re-attached the patch. Using conn_data_shutw() instead of
> conn_data_shutw_hard() causes the "clean" flag to be set when calling
> ssl_sock_shutw() and SSL_set_quiet_shutdown() not to be called so that
> we perform a clean shutdown. The purpose is to give a chance to the
> server to store the context and avoid renegotiating.

Now applied with Emeric's blessing.

Thanks,
Willy
Steven Davidovitz
Re: [RFC PATCH] MEDIUM: persistent connections for SSL checks
March 15, 2017 06:00PM
Thank you! I may one day follow up on persistent connections, but this does
the trick for now.

On Wed, Mar 15, 2017 at 3:44 AM, Willy Tarreau <[email protected]> wrote:

> On Mon, Mar 13, 2017 at 06:10:23PM +0100, Willy Tarreau wrote:
> > > Just wanted to follow up. I've been running this patch for a couple
> days on
> > > an idle system and haven't noticed any problems.
> > > Could this be merged? Is there anything else I can test?
> >
> > I'm personally fine with it but I'd rather have Emeric approve it, as
> > he knows better than me the possible impacts of shutting down cleanly
> > or not on SSL.
> >
> > Emeric, I've re-attached the patch. Using conn_data_shutw() instead of
> > conn_data_shutw_hard() causes the "clean" flag to be set when calling
> > ssl_sock_shutw() and SSL_set_quiet_shutdown() not to be called so that
> > we perform a clean shutdown. The purpose is to give a chance to the
> > server to store the context and avoid renegotiating.
>
> Now applied with Emeric's blessing.
>
> Thanks,
> Willy
>
On Wed, Mar 15, 2017 at 09:53:11AM -0700, Steven Davidovitz wrote:
> Thank you! I may one day follow up on persistent connections, but this does
> the trick for now.

No problem. I think persistent connections must be dealt with more globally
and not just for SSL, though.

Cheers,
Willy