Welcome! Log In Create A New Profile

Advanced

Problems with haproxy 1.7.3 on FreeBSD 11.0-p8

Posted by Matthias Fechner 
Willy Tarreau
Re: Problems with haproxy 1.7.3 on FreeBSD 11.0-p8
March 17, 2017 10:10AM
Hi Dmitry,

On Wed, Mar 15, 2017 at 12:45:54AM +0300, Dmitry Sivachenko wrote:
> I committed your patch to FreeBSD ports.

I was just reported an undesired side effect of this patch with smtp
in clear without proxy-proto :-(

The problem is that we're using the CONNECTED flag to indicate whether
we've just detected the transition from pending handshake to connected
or not. And the data layer also needs to see the transition so we can't
set it too early.

Thus in order to fix Matthias' issue I'm tempted by extending the
CONN_STATE flag to cover all handshakes, but I'm scared to break
complex health checks by calling them multiple times when multiple
layers are stacked (eg: ssl + send_proxy).

The root cause of the problem is the fact that we don't cover the
handshakes completion when deciding to wake the data layer up, and
that these ones are supposed to (sometimes) complete a connection
establishment (eg: Matthias' case where the successful connect()
didn't leave room for another event to report this).

I *think* we can use as a universal event the absence of any
handshake, any pending L4/L6 pending connection and the absence
of the CO_FL_CONNECTED flag, indicating we're just finishing a
connection validation. I need to verify if it would work both
for checks and regular connections, and on both sides, in a wide
variety of combinations. I also don't feel much confident by the
fact that the stream interface layer validates the connection
establishment without checking for pending handshakes, so such
a change requires extreme care :-/

I'm now working on this (I didn't expect to do this today) but I
already feel like I'm pulling one thread and that the whole ball
is coming with it. Let's hope that it will end up with all these
design bugs being smashed :-/

In your case, if you get bad reports it might be better to go back
to the previous package with the temporary revert. But I hope to
have good news today so that you don't have to emit two package
updates.

Regards,
Willy
Dmitry Sivachenko
Re: Problems with haproxy 1.7.3 on FreeBSD 11.0-p8
March 17, 2017 10:20AM
> On 17 Mar 2017, at 12:04, Willy Tarreau <[email protected]> wrote:
>
> Hi Dmitry,
>
> On Wed, Mar 15, 2017 at 12:45:54AM +0300, Dmitry Sivachenko wrote:
>> I committed your patch to FreeBSD ports.
>
> I was just reported an undesired side effect of this patch with smtp
> in clear without proxy-proto :-(
>
> [...]



Okay, thanks for information. I have no other complains so far, so I wait a bit for an update.
Matthias Fechner
Re: Problems with haproxy 1.7.3 on FreeBSD 11.0-p8
March 17, 2017 06:00PM
Dear Willy and Dmitry,

Am 14.03.17 um 22:17 schrieb Willy Tarreau:
> Or you may prefer to wait for 1.7.4. It's not planned yet given that
> there are other fixes in the wild waiting for some feedback though.
>
> Thanks guys for the detailed feedback, it's now time to turn the page
> and switch to less difficult ones!

I rolled now the 1.7.3 (version with the new patch) to the test
environment and it seems to work reliably.

But the other message you wrote seems to point to another problem.

If you need any help for testing, please let me know.

Gruß,
Matthias

--
"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs, and the universe trying to
produce bigger and better idiots. So far, the universe is winning." --
Rich Cook
Aleksandar Lazic
Re: Problems with haproxy 1.7.3 on FreeBSD 11.0-p8
March 17, 2017 06:00PM
Willy.

Am 14-03-2017 22:17, schrieb Willy Tarreau:
> Matthias,
>
> I could finally track the problem down to a 5-year old bug in the
> connection handler. It already used to affect Unix sockets but it
> requires so rare a set of options and even then its occurrence rate
> is so low that probably nobody noticed it yet.
>
> I'm attaching the patch to be applied on top of 1.7.3 which fixes it,
> it will be merged into next version.
>
> Dmitry, you may prefer to take this one than to revert the previous
> one from your ports, especially considering that a few connect()
> immediately succeed over the loopback on FreeBSD and that it was
> absolutely needed to trigger the bug (as well as the previously fixed
> one, which had less impact).
>
> Or you may prefer to wait for 1.7.4. It's not planned yet given that
> there are other fixes in the wild waiting for some feedback though.
>
> Thanks guys for the detailed feedback, it's now time to turn the page
> and switch to less difficult ones!

I love your commit massages ;-).

They are very detailed and sometimes bigger the the code change.

Cheers
Aleks

> Willy
Willy Tarreau
Re: Problems with haproxy 1.7.3 on FreeBSD 11.0-p8
March 17, 2017 07:30PM
Hi Aleks,

On Fri, Mar 17, 2017 at 05:57:02PM +0100, Aleksandar Lazic wrote:
> I love your commit massages ;-).
>
> They are very detailed and sometimes bigger the the code change.

That's expected, especially on a bug. The code is the result of a
long analysis. If this analysis is lost, next time we have to dig
through the same issue or have to decide wether to roll it back,
we don't know. The commit message is here to keep a trace of the
analysis work. That's why I *always* complain when people don't
provide detailed commit messages to justify their changes.

Cheers,
Willy
Pavlos Parissis
Re: Problems with haproxy 1.7.3 on FreeBSD 11.0-p8
March 17, 2017 08:50PM
On 17/03/2017 05:57 μμ, Aleksandar Lazic wrote:
> Willy.
>
> Am 14-03-2017 22:17, schrieb Willy Tarreau:
>> Matthias,
>>
>> I could finally track the problem down to a 5-year old bug in the
>> connection handler. It already used to affect Unix sockets but it
>> requires so rare a set of options and even then its occurrence rate
>> is so low that probably nobody noticed it yet.
>>
>> I'm attaching the patch to be applied on top of 1.7.3 which fixes it,
>> it will be merged into next version.
>>
>> Dmitry, you may prefer to take this one than to revert the previous
>> one from your ports, especially considering that a few connect()
>> immediately succeed over the loopback on FreeBSD and that it was
>> absolutely needed to trigger the bug (as well as the previously fixed
>> one, which had less impact).
>>
>> Or you may prefer to wait for 1.7.4. It's not planned yet given that
>> there are other fixes in the wild waiting for some feedback though.
>>
>> Thanks guys for the detailed feedback, it's now time to turn the page
>> and switch to less difficult ones!
>
> I love your commit massages ;-).
>

+1

> They are very detailed and sometimes bigger the the code change.
>

Indeed. All commits, in any project, should be like this.

I personally follow the following rules:

1. short commit should be written, so that can be placed in the following sentence:
If you apply this commit it will <short commit message>

2. Long commit message should clearly describe the problem, why it is a problem
and how the commit fixes the problem.

Having said, I have colleagues that not only follow the above rules but complain
to me about my very lengthy commit messages. I usually ignore them and redirect
their *rant* to /dev/null.

Cheers,
Pavlos
Willy Tarreau
Re: Problems with haproxy 1.7.3 on FreeBSD 11.0-p8
March 18, 2017 01:20PM
On Fri, Mar 17, 2017 at 12:10:43PM +0300, Dmitry Sivachenko wrote:
>
> > On 17 Mar 2017, at 12:04, Willy Tarreau <[email protected]> wrote:
> >
> > Hi Dmitry,
> >
> > On Wed, Mar 15, 2017 at 12:45:54AM +0300, Dmitry Sivachenko wrote:
> >> I committed your patch to FreeBSD ports.
> >
> > I was just reported an undesired side effect of this patch with smtp
> > in clear without proxy-proto :-(
> >
> > [...]
>
>
>
> Okay, thanks for information. I have no other complains so far, so I wait a bit for an update.

OK here's a temporary patch. It includes a revert of the previous one and
adds a condition for the wake-up. At least it passes all my tests, including
those involving synchronous connection reports.

I'm not merging it yet as I'm wondering whether a reliable definitive
solution should be done once for all (and backported) by addressing the
root cause instead of constantly working around its consequences.

Willy
From 42121fa95cf77be0384ab6ce597a88c2fe6e2cd6 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <[email protected]>
Date: Sat, 18 Mar 2017 12:04:38 +0100
Subject: WIP/BUG/MAJOR: connection: also call ->wake() after end of handshakes
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4

We have an issue with the relation between CO_FL_CONNECTED and the
various handshake flags which was uncovered by recent fix 7bf3fa3
("BUG/MAJOR: connection: update CO_FL_CONNECTED before calling the
data layer").

The flag was normally set *after* calling ->wake() to indicate that
both the control layer and the transport layer are ready, and that
the connection is ready to exchange data. It is important that the
->wake() call sees the transition with both CO_FL_{L4,L6}_CONN cleared
and CO_FL_CONNECTED not yet set, to detect the status change.

But we have an issue with handshakes like the proxy protocol. These
ones have to be completed before the data layer may be used, and
after completion polling for data may be enabled. The problem comes
from the relation between CO_FL_CONNECT and the handshakes. In fact,
the SSL handshake is used to complete the transport layer setup, so
there's a direct 1-to-1 relation between L6_CONN and the SSL handshake.
It will be broken if we later decide to implement upgradable SSL by
the way. Some handshakes like accepting the proxy protocol or sending
it area easily hidden behind the SSL handshake (which has to be
performed on top of it), so they're done with CO_FL_CONNECTED unset.
But if we only have the proxy protocol and no SSL, we end up in a
situation where CO_FL_CONNECTED might already be set and where the
removal of the handshake flag doesn't constitute a sufficient
condition to notify the data layer about the availability of the
connection, which fails each time the connection succeeds
synchronously without any data to send.

It could be argued that the handshake flags have to be made part of
the condition to et CO_FL_CONNECTED but that would currently break a
part of the health checks. Also a handshake could appear at any
moment even after a connection is established so we'd lose the
ability to detect a second end of handshake.

For now the situation is not clean :
- session_accept() only sets CO_FL_CONNECTED if there's no pending
handshake ;

- conn_fd_handler() will set it once L4 and L6 are complete, which
will do what session_accept() above refrained from doing even if
an accept_proxy handshake is still pending ;

- ssl_sock_info_cbk() and ssl_sock_handshake() consider that a
handshake performed with CO_FL_CONNECTED set is a renegociation ;

- all ssl_fc_* sample fetch functions wait for CO_FL_CONNECTED before
accepting to fetch information

Only health checks validate the end of handshakes and the connected flag
separately.

This patch aims at solving the side effects in a backportable way
before this is reworked in depth :
- we need to call ->wake() to report connection success, measure
connection time, and notify that the data layer is ready ; this
has to be done either if we switch from pending {L4,L6}_CONN to
nothing, or if we notice some handshakes were pending and are
now done.

- we must only set CO_FL_CONNECTED after calling ->wake() because
this one requires all flags down to detect the transition ; this
rolls back previous change ;

- we must ensure that we'll see CO_FL_{CONNECTED,L4,L6} cleared at
least once in each connection's lifetime to ensure no wake up
event may be lost.

- we document that CO_FL_CONNECTED exactly means "L4 connection
setup confirmed at least once, L6 connection setup confirmed
at least once or not necessary, all this regardless of any
possibly remaining handshakes or future L6 negociations".

For later, si_conn_wake_cb() has to be fixed not to consider the
connection's flags but the stream-interface's and channel status
instead, making it agnostic to the change on the CO_FL_CONNECTED flag,
and other data layers (checks) will likely have to do something
similar.
---
include/types/connection.h | 2 +-
src/connection.c | 25 ++++++++++++++++++-------
2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/types/connection.h b/include/types/connection.h
index c644dd5..6ec0e4f 100644
--- a/include/types/connection.h
+++ b/include/types/connection.h
@@ -98,7 +98,7 @@ enum {

/* flags used to report connection status and errors */
CO_FL_ERROR = 0x00100000, /* a fatal error was reported */
- CO_FL_CONNECTED = 0x00200000, /* the connection is now established */
+ CO_FL_CONNECTED = 0x00200000, /* L4+L6 now ready ; extra handshakes may or may not exist */
CO_FL_WAIT_L4_CONN = 0x00400000, /* waiting for L4 to be connected */
CO_FL_WAIT_L6_CONN = 0x00800000, /* waiting for L6 to be connected (eg: SSL) */

diff --git a/src/connection.c b/src/connection.c
index bc34ae7..ab5fcd4 100644
--- a/src/connection.c
+++ b/src/connection.c
@@ -132,21 +132,32 @@ void conn_fd_handler(int fd)
}

leave:
- /* Verify if the connection just established. The CO_FL_CONNECTED flag
- * being included in CO_FL_CONN_STATE, its change will be noticed by
- * the next block and be used to wake up the data layer.
+ /* The wake callback is normally used to notify about connection establishment
+ * and critical errors. The called functions need to detect the change so the
+ * CO_FL_CONNECTED flag must not have been updated yet. However we need to
+ * consider the following situations to wake up the data layer :
+ * - change among the CO_FL_CONN_STATE flags :
+ * {DATA,SOCK}_{RD,WR}_SH, ERROR, CONNECTED, WAIT_L4_CONN, WAIT_L6_CONN
+ * - absence of any of {L4,L6}_CONN and CONNECTED, indicating the
+ * end of handshake and transition to CONNECTED
+ * - end of handshakes
*/
- if (unlikely(!(conn->flags & (CO_FL_WAIT_L4_CONN | CO_FL_WAIT_L6_CONN | CO_FL_CONNECTED))))
- conn->flags |= CO_FL_CONNECTED;

/* The wake callback may be used to process a critical error and abort the
* connection. If so, we don't want to go further as the connection will
* have been released and the FD destroyed.
*/
if ((conn->flags & CO_FL_WAKE_DATA) &&
- ((conn->flags ^ flags) & CO_FL_CONN_STATE) &&
- conn->data->wake(conn) < 0)
+ (((conn->flags ^ flags) & CO_FL_CONN_STATE) ||
+ ((flags & CO_FL_HANDSHAKE) && !(conn->flags & CO_FL_HANDSHAKE))) &&
+ conn->data->wake(conn) < 0) {
return;
+ }
+
+ /* Now set the CO_FL_CONNECTED flag if the connection was just established. */
+
+ if (unlikely(!(conn->flags & (CO_FL_WAIT_L4_CONN | CO_FL_WAIT_L6_CONN | CO_FL_CONNECTED))))
+ conn->flags |= CO_FL_CONNECTED;

/* remove the events before leaving */
fdtab[fd].ev &= FD_POLL_STICKY;
--
2.8.0.rc2.1.gbe9624a
Willy Tarreau
Re: Problems with haproxy 1.7.3 on FreeBSD 11.0-p8
March 19, 2017 12:50PM
Hi,

On Sat, Mar 18, 2017 at 01:12:09PM +0100, Willy Tarreau wrote:
> OK here's a temporary patch. It includes a revert of the previous one and
> adds a condition for the wake-up. At least it passes all my tests, including
> those involving synchronous connection reports.
>
> I'm not merging it yet as I'm wondering whether a reliable definitive
> solution should be done once for all (and backported) by addressing the
> root cause instead of constantly working around its consequences.

And here come two patches as a replacement for this temporary one. They
are safer and have been done after throrough code review. I spotted a
small tens of dirty corner cases having accumulated over the years due
to the unclear meaning of the CO_FL_CONNECTED flag. They'll have to be
addressed, but the current patches protect against these corner cases.
They survived all tests involving delayed connections and checks with
and without all handshake combinations, with tcp (immediate and delayed
requests and responses) and http (immediate, delayed requests and responses
and pipelining).

I'm resending the first one you already got Dmitry to make things easier
to follow for everyone. These three are to be applied on top of 1.7.3. I
still have a few other issues to deal with regarding 1.7 before doing a
new release (hopefully by the beginning of this week).

Thanks,
Willy
From afbf56b951967e8fa4d509e423fdcb11c27d40e2 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <[email protected]>
Date: Tue, 14 Mar 2017 20:19:29 +0100
Subject: BUG/MAJOR: connection: update CO_FL_CONNECTED before calling the data
layer
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4

Matthias Fechner reported a regression in 1.7.3 brought by the backport
of commit 819efbf ("BUG/MEDIUM: tcp: don't poll for write when connect()
succeeds"), causing some connections to fail to establish once in a while.
While this commit itself was a fix for a bad sequencing of connection
events, it in fact unveiled a much deeper bug going back to the connection
rework era in v1.5-dev12 : 8f8c92f ("MAJOR: connection: add a new
CO_FL_CONNECTED flag").

It's worth noting that in a lab reproducing a similar environment as
Matthias' about only 1 every 19000 connections exhibit this behaviour,
making the issue not so easy to observe. A trick to make the problem
more observable consists in disabling non-blocking mode on the socket
before calling connect() and re-enabling it later, so that connect()
always succeeds. Then it becomes 100% reproducible.

The problem is that this CO_FL_CONNECTED flag is tested after deciding to
call the data layer (typically the stream interface but might be a health
check as well), and that the decision to call the data layer relies on a
change of one of the flags covered by the CO_FL_CONN_STATE set, which is
made of CO_FL_CONNECTED among others.

Before the fix above, this bug couldn't appear with TCP but it could
appear with Unix sockets. Indeed, connect() was always considered
blocking so the CO_FL_WAIT_L4_CONN connection flag was always set, and
polling for write events was always enabled. This used to guarantee that
the conn_fd_handler() could detect a change among the CO_FL_CONN_STATE
flags.

Now with the fix above, if a connect() immediately succeeds for non-ssl
connection with send-proxy enabled, and no data in the buffer (thus TCP
mode only), the CO_FL_WAIT_L4_CONN flag is not set, the lack of data in
the buffer doesn't enable polling flags for the data layer, the
CO_FL_CONNECTED flag is not set due to send-proxy still being pending,
and once send-proxy is done, its completion doesn't cause the data layer
to be woken up due to the fact that CO_FL_CONNECT is still not present
and that the CO_FL_SEND_PROXY flag is not watched in CO_FL_CONN_STATE.

Then no progress is made when data are received from the client (and
attempted to be forwarded), because a CF_WRITE_NULL (or CF_WRITE_PARTIAL)
flag is needed for the stream-interface state to turn from SI_ST_CON to
SI_ST_EST, allowing ->chk_snd() to be called when new data arrive. And
the only way to set this flag is to call the data layer of course.

After the connect timeout, the connection gets killed and if in the mean
time some data have accumulated in the buffer, the retry will succeed.

This patch fixes this situation by simply placing the update of
CO_FL_CONNECTED where it should have been, before the check for a flag
change needed to wake up the data layer and not after.

This fix must be backported to 1.7, 1.6 and 1.5. Versions not having
the patch above are still affected for unix sockets.

Special thanks to Matthias Fechner who provided a very detailed bug
report with a bisection designating the faulty patch, and to Olivier
Houchard for providing full access to a pretty similar environment where
the issue could first be reproduced.
(cherry picked from commit 7bf3fa3c23f6a1b7ed1212783507ac50f7e27544)
---
src/connection.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/connection.c b/src/connection.c
index 26fc5f6..1e4c9aa 100644
--- a/src/connection.c
+++ b/src/connection.c
@@ -131,6 +131,13 @@ void conn_fd_handler(int fd)
}

leave:
+ /* Verify if the connection just established. The CO_FL_CONNECTED flag
+ * being included in CO_FL_CONN_STATE, its change will be noticed by
+ * the next block and be used to wake up the data layer.
+ */
+ if (unlikely(!(conn->flags & (CO_FL_WAIT_L4_CONN | CO_FL_WAIT_L6_CONN | CO_FL_CONNECTED))))
+ conn->flags |= CO_FL_CONNECTED;
+
/* The wake callback may be used to process a critical error and abort the
* connection. If so, we don't want to go further as the connection will
* have been released and the FD destroyed.
@@ -140,10 +147,6 @@ void conn_fd_handler(int fd)
conn->data->wake(conn) < 0)
return;

- /* Last check, verify if the connection just established */
- if (unlikely(!(conn->flags & (CO_FL_WAIT_L4_CONN | CO_FL_WAIT_L6_CONN | CO_FL_CONNECTED))))
- conn->flags |= CO_FL_CONNECTED;
-
/* remove the events before leaving */
fdtab[fd].ev &= FD_POLL_STICKY;

--
2.8.0.rc2.1.gbe9624a

From 8ca791a60b4ebd8f5b5042c6e0bf09fe9d5e3308 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <[email protected]>
Date: Sat, 18 Mar 2017 17:11:37 +0100
Subject: BUG/MAJOR: stream-int: do not depend on connection flags to detect
connection
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4

Recent fix 7bf3fa3 ("BUG/MAJOR: connection: update CO_FL_CONNECTED before
calling the data layer") marked an end to a fragile situation where the
absence of CO_FL_{CONNECTED,L4,L6}* flags is used to mark the completion
of a connection setup. The problem is that by setting the CO_FL_CONNECTED
flag earlier, we can indeed call the ->wake() function from conn_fd_handler
but the stream-interface's wake function needs to see CO_FL_CONNECTED unset
to detect that a connection has just been established, so if there's no
pending data in the buffer, the connection times out. The other ->wake()
functions (health checks and idle connections) don't do this though.

So instead of trying to detect a subtle change in connection flags,
let's simply rely on the stream-interface's state and validate that the
connection is properly established and that handshakes are completed
before reporting the WRITE_NULL indicating that a pending connection was
just completed.

This patch passed all tests of handshake and non-handshake combinations,
with synchronous and asynchronous connect() and should be safe for backport
to 1.7, 1.6 and 1.5 when the fix above is already present.

(cherry picked from commit 52821e27376f89b41167565b01d975f47266284c)
---
src/stream_interface.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/stream_interface.c b/src/stream_interface.c
index 758aec7..651340b 100644
--- a/src/stream_interface.c
+++ b/src/stream_interface.c
@@ -563,7 +563,8 @@ static int si_conn_wake_cb(struct connection *conn)
if (conn->flags & CO_FL_ERROR)
si->flags |= SI_FL_ERR;

- if (unlikely(!(conn->flags & (CO_FL_WAIT_L4_CONN | CO_FL_WAIT_L6_CONN | CO_FL_CONNECTED)))) {
+ if ((si->state < SI_ST_EST) &&
+ (conn->flags & (CO_FL_CONNECTED | CO_FL_HANDSHAKE)) == CO_FL_CONNECTED) {
si->exp = TICK_ETERNITY;
oc->flags |= CF_WRITE_NULL;
}
--
2.8.0.rc2.1.gbe9624a

From 9a4e166382fb90a166ae91ad3071842cdef9cc27 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <[email protected]>
Date: Sun, 19 Mar 2017 07:54:28 +0100
Subject: BUG/MEDIUM: connection: ensure to always report the end of handshakes
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4

Despite the previous commit working fine on all tests, it's still not
sufficient to completely address the problem. If the connection handler
is called with an event validating an L4 connection but some handshakes
remain (eg: accept-proxy), it will still wake the function up, which
will not report the activity, and will not detect a change once the
handshake it complete so it will not notify the ->wake() handler.

In fact the only reason why the ->wake() handler is still called here
is because after dropping the last handshake, we try to call ->recv()
and ->send() in turn and change the flags in order to detect a data
activity. But if for any reason the data layer is not interested in
reading nor writing, it will not get these events.

A cleaner way to address this is to call the ->wake() handler only
on definitive status changes (shut, error), on real data activity,
and on a complete connection setup, measured as CONNECTED with no
more handshake pending.

It could be argued that the handshake flags have to be made part of
the condition to set CO_FL_CONNECTED but that would currently break
a part of the health checks. Also a handshake could appear at any
moment even after a connection is established so we'd lose the
ability to detect a second end of handshake.

For now the situation around CO_FL_CONNECTED is not clean :
- session_accept() only sets CO_FL_CONNECTED if there's no pending
handshake ;

- conn_fd_handler() will set it once L4 and L6 are complete, which
will do what session_accept() above refrained from doing even if
an accept_proxy handshake is still pending ;

- ssl_sock_infocbk() and ssl_sock_handshake() consider that a
handshake performed with CO_FL_CONNECTED set is a renegociation ;
=> they should instead filter on CO_FL_WAIT_L6_CONN

- all ssl_fc_* sample fetch functions wait for CO_FL_CONNECTED before
accepting to fetch information
=> they should also get rid of any pending handshake

- smp_fetch_fc_rcvd_proxy() uses !CO_FL_CONNECTED instead of
CO_FL_ACCEPT_PROXY

- health checks (standard and tcp-checks) don't check for HANDSHAKE
and may report a successful check based on CO_FL_CONNECTED while
not yet done (eg: send buffer full on send_proxy).

This patch aims at solving some of these side effects in a backportable
way before this is reworked in depth :
- we need to call ->wake() to report connection success, measure
connection time, notify that the data layer is ready and update
the data layer after activity ; this has to be done either if
we switch from pending {L4,L6}_CONN to nothing with no handshakes
left, or if we notice some handshakes were pending and are now
done.

- we document that CO_FL_CONNECTED exactly means "L4 connection
setup confirmed at least once, L6 connection setup confirmed
at least once or not necessary, all this regardless of any
possibly remaining handshakes or future L6 negociations".

This patch also renames CO_FL_CONN_STATUS to the more explicit
CO_FL_NOTIFY_DATA, and works around the previous flags trick consiting
in setting an impossible combination of flags to notify the data layer,
by simply clearing the current flags.

This fix should be backported to 1.7, 1.6 and 1.5.

(cherry picked from commit 3c0cc49d30968cf839a1d3a747de6adda18d26db)
---
include/types/connection.h | 10 +++++-----
src/connection.c | 41 ++++++++++++++++++++++++++---------------
2 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/include/types/connection.h b/include/types/connection.h
index b65ad27..02eac93 100644
--- a/include/types/connection.h
+++ b/include/types/connection.h
@@ -95,15 +95,15 @@ enum {
CO_FL_SOCK_RD_SH = 0x00040000, /* SOCK layer was notified about shutr/read0 */
CO_FL_SOCK_WR_SH = 0x00080000, /* SOCK layer asked for shutw */

- /* flags used to report connection status and errors */
+ /* flags used to report connection errors or other closing conditions */
CO_FL_ERROR = 0x00100000, /* a fatal error was reported */
- CO_FL_CONNECTED = 0x00200000, /* the connection is now established */
+ CO_FL_NOTIFY_DATA = 0x001F0000, /* any shut/error flags above needs to be reported */
+
+ /* flags used to report connection status updates */
+ CO_FL_CONNECTED = 0x00200000, /* L4+L6 now ready ; extra handshakes may or may not exist */
CO_FL_WAIT_L4_CONN = 0x00400000, /* waiting for L4 to be connected */
CO_FL_WAIT_L6_CONN = 0x00800000, /* waiting for L6 to be connected (eg: SSL) */

- /* synthesis of the flags above */
- CO_FL_CONN_STATE = 0x00FF0000, /* all shut/connected flags */
-
/*** All the flags below are used for connection handshakes. Any new
* handshake should be added after this point, and CO_FL_HANDSHAKE
* should be updated.
diff --git a/src/connection.c b/src/connection.c
index 1e4c9aa..3629094 100644
--- a/src/connection.c
+++ b/src/connection.c
@@ -99,19 +99,21 @@ void conn_fd_handler(int fd)
*/
if (conn->xprt && fd_recv_ready(fd) &&
((conn->flags & (CO_FL_DATA_RD_ENA|CO_FL_WAIT_ROOM|CO_FL_ERROR|CO_FL_HANDSHAKE)) == CO_FL_DATA_RD_ENA)) {
- /* force detection of a flag change : it's impossible to have both
- * CONNECTED and WAIT_CONN so we're certain to trigger a change.
+ /* force reporting of activity by clearing the previous flags :
+ * we'll have at least ERROR or CONNECTED at the end of an I/O,
+ * both of which will be detected below.
*/
- flags = CO_FL_WAIT_L4_CONN | CO_FL_CONNECTED;
+ flags = 0;
conn->data->recv(conn);
}

if (conn->xprt && fd_send_ready(fd) &&
((conn->flags & (CO_FL_DATA_WR_ENA|CO_FL_WAIT_DATA|CO_FL_ERROR|CO_FL_HANDSHAKE)) == CO_FL_DATA_WR_ENA)) {
- /* force detection of a flag change : it's impossible to have both
- * CONNECTED and WAIT_CONN so we're certain to trigger a change.
+ /* force reporting of activity by clearing the previous flags :
+ * we'll have at least ERROR or CONNECTED at the end of an I/O,
+ * both of which will be detected below.
*/
- flags = CO_FL_WAIT_L4_CONN | CO_FL_CONNECTED;
+ flags = 0;
conn->data->send(conn);
}

@@ -129,21 +131,30 @@ void conn_fd_handler(int fd)
if (!tcp_connect_probe(conn))
goto leave;
}
-
leave:
- /* Verify if the connection just established. The CO_FL_CONNECTED flag
- * being included in CO_FL_CONN_STATE, its change will be noticed by
- * the next block and be used to wake up the data layer.
- */
+ /* Verify if the connection just established. */
if (unlikely(!(conn->flags & (CO_FL_WAIT_L4_CONN | CO_FL_WAIT_L6_CONN | CO_FL_CONNECTED))))
conn->flags |= CO_FL_CONNECTED;

- /* The wake callback may be used to process a critical error and abort the
- * connection. If so, we don't want to go further as the connection will
- * have been released and the FD destroyed.
+ /* The wake callback is normally used to notify the data layer about
+ * data layer activity (successful send/recv), connection establishment,
+ * shutdown and fatal errors. We need to consider the following
+ * situations to wake up the data layer :
+ * - change among the CO_FL_NOTIFY_DATA flags :
+ * {DATA,SOCK}_{RD,WR}_SH, ERROR,
+ * - absence of any of {L4,L6}_CONN and CONNECTED, indicating the
+ * end of handshake and transition to CONNECTED
+ * - raise of CONNECTED with HANDSHAKE down
+ * - end of HANDSHAKE with CONNECTED set
+ * - regular data layer activity
+ *
+ * Note that the wake callback is allowed to release the connection and
+ * the fd (and return < 0 in this case).
*/
if ((conn->flags & CO_FL_WAKE_DATA) &&
- ((conn->flags ^ flags) & CO_FL_CONN_STATE) &&
+ (((conn->flags ^ flags) & CO_FL_NOTIFY_DATA) ||
+ ((flags & (CO_FL_CONNECTED|CO_FL_HANDSHAKE)) != CO_FL_CONNECTED &&
+ (conn->flags & (CO_FL_CONNECTED|CO_FL_HANDSHAKE)) == CO_FL_CONNECTED)) &&
conn->data->wake(conn) < 0)
return;

--
2.8.0.rc2.1.gbe9624a
Dmitry Sivachenko
Re: Problems with haproxy 1.7.3 on FreeBSD 11.0-p8
March 19, 2017 05:50PM
> On 19 Mar 2017, at 14:40, Willy Tarreau <[email protected]> wrote:
>
> Hi,
>
> On Sat, Mar 18, 2017 at 01:12:09PM +0100, Willy Tarreau wrote:
>> OK here's a temporary patch. It includes a revert of the previous one and
>> adds a condition for the wake-up. At least it passes all my tests, including
>> those involving synchronous connection reports.
>>
>> I'm not merging it yet as I'm wondering whether a reliable definitive
>> solution should be done once for all (and backported) by addressing the
>> root cause instead of constantly working around its consequences.
>
> And here come two patches as a replacement for this temporary one. They
> are safer and have been done after throrough code review. I spotted a
> small tens of dirty corner cases having accumulated over the years due
> to the unclear meaning of the CO_FL_CONNECTED flag. They'll have to be
> addressed, but the current patches protect against these corner cases.
> They survived all tests involving delayed connections and checks with
> and without all handshake combinations, with tcp (immediate and delayed
> requests and responses) and http (immediate, delayed requests and responses
> and pipelining).
>
> I'm resending the first one you already got Dmitry to make things easier
> to follow for everyone. These three are to be applied on top of 1.7.3. I
> still have a few other issues to deal with regarding 1.7 before doing a
> new release (hopefully by the beginning of this week).



Thank a lot!

I just incorporated the latest fixes to FreeBSD ports tree.
Matthias Fechner
Re: Problems with haproxy 1.7.3 on FreeBSD 11.0-p8
March 20, 2017 10:10AM
Hi Willy, Hi Dmitry,

Am 19.03.2017 um 12:40 schrieb Willy Tarreau:
> And here come two patches as a replacement for this temporary one. They
> are safer and have been done after throrough code review. I spotted a
> small tens of dirty corner cases having accumulated over the years due
> to the unclear meaning of the CO_FL_CONNECTED flag. They'll have to be
> addressed, but the current patches protect against these corner cases.
> They survived all tests involving delayed connections and checks with
> and without all handshake combinations, with tcp (immediate and delayed
> requests and responses) and http (immediate, delayed requests and responses
> and pipelining).
>
> I'm resending the first one you already got Dmitry to make things easier
> to follow for everyone. These three are to be applied on top of 1.7.3. I
> still have a few other issues to deal with regarding 1.7 before doing a
> new release (hopefully by the beginning of this week).
>

thank you both.
I have the patch running in PRD and face no problems so far.
I will deeply test them the next days and in case I find something I
will report it immediatly.


Gruß
Matthias

--

"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs, and the universe trying to
produce bigger and better idiots. So far, the universe is winning." --
Rich Cook
Willy Tarreau
Re: Problems with haproxy 1.7.3 on FreeBSD 11.0-p8
March 20, 2017 10:20AM
On Mon, Mar 20, 2017 at 10:07:41AM +0100, Matthias Fechner wrote:
> Hi Willy, Hi Dmitry,
>
> Am 19.03.2017 um 12:40 schrieb Willy Tarreau:
> > And here come two patches as a replacement for this temporary one. They
> > are safer and have been done after throrough code review. I spotted a
> > small tens of dirty corner cases having accumulated over the years due
> > to the unclear meaning of the CO_FL_CONNECTED flag. They'll have to be
> > addressed, but the current patches protect against these corner cases.
> > They survived all tests involving delayed connections and checks with
> > and without all handshake combinations, with tcp (immediate and delayed
> > requests and responses) and http (immediate, delayed requests and responses
> > and pipelining).
> >
> > I'm resending the first one you already got Dmitry to make things easier
> > to follow for everyone. These three are to be applied on top of 1.7.3. I
> > still have a few other issues to deal with regarding 1.7 before doing a
> > new release (hopefully by the beginning of this week).
> >
>
> thank you both.
> I have the patch running in PRD and face no problems so far.
> I will deeply test them the next days and in case I find something I
> will report it immediatly.

Much appreciated, thank you Matthias!

Willy
Sorry, only registered users may post in this forum.

Click here to login