Welcome! Log In Create A New Profile

Advanced

HAProxy 1.8.X crashing

Posted by UPPALAPATI, PRAVEEN 
UPPALAPATI, PRAVEEN
HAProxy 1.8.X crashing
April 11, 2018 04:20PM
Hi Haproxy-Team,

I tried compiling different minor versions of 1.8.x releases and all the minor versions are crashing whe trying to use option http-proxy:

Configuration that is causing issue:

listen http_proxy-8888
bind *:9876
mode http
option httplog
http-request set-uri http://%[url_param(idnsredirHost)]%[capture.req.uri<http://%25[url_param(idnsredirHost)]%25[capture.req.uri>]
option http_proxy

If I don't use option http_proxy things work normally. Following is from the core dump:

: #0 0x0000000000454839 in cs_destroy (cs=0x207edd0) at include/proto/connection.h:704
704 cs->conn->mux->detach(cs);
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x 86_64 libselinux-2.5-11.el7.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x0000000000454839 in cs_destroy (cs=0x207edd0) at include/proto/connection.h:704
#1 si_release_endpoint (si=0x2083540) at include/proto/stream_interface.h:162
#2 stream_free (s=0x20832e0) at src/stream.c:398
#3 process_stream (t=<optimized out>) at src/stream.c:2513
#4 0x00000000004bc38e in process_runnable_tasks () at src/task.c:229
#5 0x0000000000408d9c in run_poll_loop () at src/haproxy.c:2399
#6 run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2461
#7 main (argc=<optimized out>, argv=0x7ffe6e2cf2d8) at src/haproxy.c:3065
(gdb) quit



HaProxy Install Details:

HA-Proxy version 1.8.4-1deb90d 2018/02/08
Copyright 2000-2018 Willy Tarreau <[email protected]>

Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label
OPTIONS = USE_LIBCRYPT=1 USE_ZLIB=1 USE_THREAD=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.32 2012-11-30
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace

Please let me know what's the root cause this option works fine with 1.7.x version.

Thanks,
Praveen.
Olivier Houchard
Re: HAProxy 1.8.X crashing
April 11, 2018 05:40PM
Hi Praveen,

On Wed, Apr 11, 2018 at 02:16:28PM +0000, UPPALAPATI, PRAVEEN wrote:
> Hi Haproxy-Team,
>
> I tried compiling different minor versions of 1.8.x releases and all the minor versions are crashing whe trying to use option http-proxy:
>
> Configuration that is causing issue:
>
> listen http_proxy-8888
> bind *:9876
> mode http
> option httplog
> http-request set-uri http://%[url_param(idnsredirHost)]%[capture.req.uri<http://%25[url_param(idnsredirHost)]%25[capture.req.uri>]
> option http_proxy
>
> If I don't use option http_proxy things work normally. Following is from the core dump:
>
> : #0 0x0000000000454839 in cs_destroy (cs=0x207edd0) at include/proto/connection.h:704
> 704 cs->conn->mux->detach(cs);
> Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-8.el7.x86_64 libcom_err-1.42.9-10.el7.x 86_64 libselinux-2.5-11.el7.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64
> (gdb) bt
> #0 0x0000000000454839 in cs_destroy (cs=0x207edd0) at include/proto/connection.h:704
> #1 si_release_endpoint (si=0x2083540) at include/proto/stream_interface.h:162
> #2 stream_free (s=0x20832e0) at src/stream.c:398
> #3 process_stream (t=<optimized out>) at src/stream.c:2513
> #4 0x00000000004bc38e in process_runnable_tasks () at src/task.c:229
> #5 0x0000000000408d9c in run_poll_loop () at src/haproxy.c:2399
> #6 run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2461
> #7 main (argc=<optimized out>, argv=0x7ffe6e2cf2d8) at src/haproxy.c:3065
> (gdb) quit
>
>
[...]
> Please let me know what's the root cause this option works fine with 1.7.x version.
>

It's related to changes we made in the architecture in 1.8.
The attached patch should fix it. It was made for master, but should apply to
1.8 as well.

Thanks for reporting !

Olivier
From 7c9f06727cf60acf873353ac71283ff9c562aeee Mon Sep 17 00:00:00 2001
From: Olivier Houchard <[email protected]>
Date: Wed, 11 Apr 2018 17:23:17 +0200
Subject: [PATCH] BUG/MINOR: connection: Setup a mux when in proxy mode.

We were allocating a new connection when in proxy mode, but did not provide
it a mux, thus crashing later.

This should be backported to 1.8.
---
src/proto_http.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/src/proto_http.c b/src/proto_http.c
index 80e001d69..817692c48 100644
--- a/src/proto_http.c
+++ b/src/proto_http.c
@@ -62,6 +62,7 @@
#include <proto/h1.h>
#include <proto/log.h>
#include <proto/hdr_idx.h>
+#include <proto/mux_pt.h>
#include <proto/pattern.h>
#include <proto/proto_tcp.h>
#include <proto/proto_http.h>
@@ -3718,6 +3719,8 @@ int http_process_request(struct stream *s, struct channel *req, int an_bit)

return 0;
}
+ /* XXX: We probably need a better mux */
+ conn_install_mux(conn, &mux_pt_ops, objt_cs(s->si[1].end));

path = http_get_path(txn);
url2sa(req->buf->p + msg->sl.rq.u,
--
2.14.3
Willy Tarreau
Re: HAProxy 1.8.X crashing
April 12, 2018 09:00AM
Hi Olivier,

On Wed, Apr 11, 2018 at 05:29:15PM +0200, Olivier Houchard wrote:
> From 7c9f06727cf60acf873353ac71283ff9c562aeee Mon Sep 17 00:00:00 2001
> From: Olivier Houchard <[email protected]>
> Date: Wed, 11 Apr 2018 17:23:17 +0200
> Subject: [PATCH] BUG/MINOR: connection: Setup a mux when in proxy mode.
>
> We were allocating a new connection when in proxy mode, but did not provide
> it a mux, thus crashing later.
>
> This should be backported to 1.8.
> ---
> src/proto_http.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/src/proto_http.c b/src/proto_http.c
> index 80e001d69..817692c48 100644
> --- a/src/proto_http.c
> +++ b/src/proto_http.c
> @@ -62,6 +62,7 @@
> #include <proto/h1.h>
> #include <proto/log.h>
> #include <proto/hdr_idx.h>
> +#include <proto/mux_pt.h>
> #include <proto/pattern.h>
> #include <proto/proto_tcp.h>
> #include <proto/proto_http.h>
> @@ -3718,6 +3719,8 @@ int http_process_request(struct stream *s, struct channel *req, int an_bit)
>
> return 0;
> }
> + /* XXX: We probably need a better mux */
> + conn_install_mux(conn, &mux_pt_ops, objt_cs(s->si[1].end));
>
> path = http_get_path(txn);
> url2sa(req->buf->p + msg->sl.rq.u,

While I can understand how missing this can have caused trouble, I'm not
sure it's the best solution, and I'm a bit worried about having various
code places choose a mux and install it. I suspect we're "simply" missing
a test in the backend code to cover the case where the connection already
exists but a mux was not yet installed (a situation that didn't exist
prior to muxes which is why we didn't notice this). I think that this
happens in connect_server() around line 1177 (return SF_ERR_INTERNAL)
when conn_xprt_ready() indicates the transport was not yet initialized.
It's likely that the error unrolling fails to consider the case where
the mux wasn't initialized, leading to the crash Praveen experienced.

If this is right, it would mean two fixes :
- one in the error unrolling path to ensure we don't destroy a non
allocated mux ;
- one above the return SF_ERR_INTERNAL above in connect_server() to
handle the case where the connection already exists but not the mux.

What do you think ?

Willy
Olivier Houchard
Re: HAProxy 1.8.X crashing
April 12, 2018 01:20PM
Hi Willy,

On Thu, Apr 12, 2018 at 08:53:51AM +0200, Willy Tarreau wrote:
> Hi Olivier,
>
> On Wed, Apr 11, 2018 at 05:29:15PM +0200, Olivier Houchard wrote:
> > From 7c9f06727cf60acf873353ac71283ff9c562aeee Mon Sep 17 00:00:00 2001
> > From: Olivier Houchard <[email protected]>
> > Date: Wed, 11 Apr 2018 17:23:17 +0200
> > Subject: [PATCH] BUG/MINOR: connection: Setup a mux when in proxy mode.
> >
> > We were allocating a new connection when in proxy mode, but did not provide
> > it a mux, thus crashing later.
> >
> > This should be backported to 1.8.
> > ---
> > src/proto_http.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/src/proto_http.c b/src/proto_http.c
> > index 80e001d69..817692c48 100644
> > --- a/src/proto_http.c
> > +++ b/src/proto_http.c
> > @@ -62,6 +62,7 @@
> > #include <proto/h1.h>
> > #include <proto/log.h>
> > #include <proto/hdr_idx.h>
> > +#include <proto/mux_pt.h>
> > #include <proto/pattern.h>
> > #include <proto/proto_tcp.h>
> > #include <proto/proto_http.h>
> > @@ -3718,6 +3719,8 @@ int http_process_request(struct stream *s, struct channel *req, int an_bit)
> >
> > return 0;
> > }
> > + /* XXX: We probably need a better mux */
> > + conn_install_mux(conn, &mux_pt_ops, objt_cs(s->si[1].end));
> >
> > path = http_get_path(txn);
> > url2sa(req->buf->p + msg->sl.rq.u,
>
> While I can understand how missing this can have caused trouble, I'm not
> sure it's the best solution, and I'm a bit worried about having various
> code places choose a mux and install it. I suspect we're "simply" missing
> a test in the backend code to cover the case where the connection already
> exists but a mux was not yet installed (a situation that didn't exist
> prior to muxes which is why we didn't notice this). I think that this
> happens in connect_server() around line 1177 (return SF_ERR_INTERNAL)
> when conn_xprt_ready() indicates the transport was not yet initialized.
> It's likely that the error unrolling fails to consider the case where
> the mux wasn't initialized, leading to the crash Praveen experienced.
>

I'm not sure about this. My point was to try to install the mux when we can
choose one, the day we have more relevant muxes, if it ever happens.
This certainly fixes Praveen's crash.
I don't even know how we could reach line 1177.

> If this is right, it would mean two fixes :
> - one in the error unrolling path to ensure we don't destroy a non
> allocated mux ;

And what do we do then ? Without the mux we can't destroy the cs nor the
connection.

> - one above the return SF_ERR_INTERNAL above in connect_server() to
> handle the case where the connection already exists but not the mux.

I'm not sure how this can happen, and what we should do if that happens.

Regards,

Olivier
Willy Tarreau
Re: HAProxy 1.8.X crashing
April 12, 2018 11:40PM
On Thu, Apr 12, 2018 at 01:14:58PM +0200, Olivier Houchard wrote:
> > > @@ -3718,6 +3719,8 @@ int http_process_request(struct stream *s, struct channel *req, int an_bit)
> > >
> > > return 0;
> > > }
> > > + /* XXX: We probably need a better mux */
> > > + conn_install_mux(conn, &mux_pt_ops, objt_cs(s->si[1].end));
> > >
> > > path = http_get_path(txn);
> > > url2sa(req->buf->p + msg->sl.rq.u,
> >
> > While I can understand how missing this can have caused trouble, I'm not
> > sure it's the best solution, and I'm a bit worried about having various
> > code places choose a mux and install it. I suspect we're "simply" missing
> > a test in the backend code to cover the case where the connection already
> > exists but a mux was not yet installed (a situation that didn't exist
> > prior to muxes which is why we didn't notice this). I think that this
> > happens in connect_server() around line 1177 (return SF_ERR_INTERNAL)
> > when conn_xprt_ready() indicates the transport was not yet initialized.
> > It's likely that the error unrolling fails to consider the case where
> > the mux wasn't initialized, leading to the crash Praveen experienced.
> >
>
> I'm not sure about this. My point was to try to install the mux when we can
> choose one, the day we have more relevant muxes, if it ever happens.
> This certainly fixes Praveen's crash.

I totally understand, I just claim that I think we're likely buying a
carpet to put it on the dust instead of removing the dust. It's more of
a conceptual issue : you're forced to select an arbitrary mux at a moment
where you have no idea which one will be needed for this outgoing connection.
In 1.9 it could be H2, later it could be FCGI, etc. And generally when we
have to fill arbitrary information, we pay it the high price later because
it creates unnatural bonds between some unrelated elements.

The single fact that we're forced to pick an arbitrary mux to avoid a crash
somewhere else tells me we have an issue. If one specific code path doesn't
support being called with a muxless connection, this same code path might
also be triggered when the malloc() to create the mux fails and we want to
gracefully abort and try to close this muxless connection after sending a
500 Server Error status.

Also, the only reason why we create the connection this early is that we
need a place where to put the IP addresses, and given that connections
are small (only very slightly larger than the addresses they convey), it
totally makes sense to pre-populate the address there. If we start to add
more complex muxes, we'll even possibly end up allocating more resources
(think h2+hpack contexts), trying to connect and to initiate a handshake
to a target before the rest is finished or whatever funny stuff.

Now don't get me wrong, it would very well be that for now, given the
current state of the code, we're missing many checks and this solution
will be an efficient short-term fix. But for the reasons above it doesn't
sound logical to me. Maybe we're simply missing an "if (conn->mux)" or
"if (cs->mux)" somewhere on the free path before calling something for
the mux because it was assumed that it did obvously exist.

> I don't even know how we could reach line 1177.

I'm uncertain to be honest, I just think that there is a small possibility
if the connection exists and not the transport, but I could be wrong.
>
> > If this is right, it would mean two fixes :
> > - one in the error unrolling path to ensure we don't destroy a non
> > allocated mux ;
>
> And what do we do then ? Without the mux we can't destroy the cs nor the
> connection.

That may be the problem. But if a mux fails to initialize, then we can't
destroy them either ? Is cs_destroy() the problem here maybe ? If so, I
suspect that this approach would be more robust and would better match
the validity tests on conn->mux that are performed at many places in the
connection code :

if (cs->conn->mux) {
cs->conn->mux->detach(cs);
} else {
conn_stop_tracking(cs->conn);
conn_xprt_close(cs->conn);
conn_free(cs->conn);
}
cs_free(cs);

> > - one above the return SF_ERR_INTERNAL above in connect_server() to
> > handle the case where the connection already exists but not the mux.
>
> I'm not sure how this can happen, and what we should do if that happens.

At least in the current situation we pre-create a connection just to start
to fill some information (source/destination address), without knowing the
mux yet. It's an initialization process, it may look beautiful or ugly
depending on the perspective, but in any case we must ensure that the
code overall remains reliable in this situation. And to me it looks less
ugly than picking a possibly wrong mux just to have a valid pointer
avoiding a segfault.

But if it turns out that stabilizing the destroy path (or error path) is
not sufficient and that it breaks somewhere else, we can well adopt this
solution for the short term and see how to address it for the long term;
we already know that the outgoing connection code is severely limited to
the point of currently preventing us from using H2 on the backend, and
that's exactly why we're currently working on it.

Cheers,
Willy
Olivier Houchard
Re: HAProxy 1.8.X crashing
April 13, 2018 04:00PM
Hi,

On Thu, Apr 12, 2018 at 11:33:43PM +0200, Willy Tarreau wrote:
> That may be the problem. But if a mux fails to initialize, then we can't
> destroy them either ? Is cs_destroy() the problem here maybe ? If so, I
> suspect that this approach would be more robust and would better match
> the validity tests on conn->mux that are performed at many places in the
> connection code :
>
> if (cs->conn->mux) {
> cs->conn->mux->detach(cs);
> } else {
> conn_stop_tracking(cs->conn);
> conn_xprt_close(cs->conn);
> conn_free(cs->conn);
> }
> cs_free(cs);
>
> > > - one above the return SF_ERR_INTERNAL above in connect_server() to
> > > handle the case where the connection already exists but not the mux.
> >
> > I'm not sure how this can happen, and what we should do if that happens.
>
> At least in the current situation we pre-create a connection just to start
> to fill some information (source/destination address), without knowing the
> mux yet. It's an initialization process, it may look beautiful or ugly
> depending on the perspective, but in any case we must ensure that the
> code overall remains reliable in this situation. And to me it looks less
> ugly than picking a possibly wrong mux just to have a valid pointer
> avoiding a segfault.
>
> But if it turns out that stabilizing the destroy path (or error path) is
> not sufficient and that it breaks somewhere else, we can well adopt this
> solution for the short term and see how to address it for the long term;
> we already know that the outgoing connection code is severely limited to
> the point of currently preventing us from using H2 on the backend, and
> that's exactly why we're currently working on it.

Ok, here is a patch that does exactly what you suggest. I'm not entirely
happy with it, but it'll do the job, as a stopgap. I want this crash
fixed :)

Olivier
From 84f50217a88b2c8bb5152045e912a0e39f12d204 Mon Sep 17 00:00:00 2001
From: Olivier Houchard <[email protected]>
Date: Fri, 13 Apr 2018 15:50:27 +0200
Subject: [PATCH] BUG/MINOR: connection: Make sure we have a mux before calling
detach().

In some cases, we call cs_destroy() very early, so early the connection
doesn't yet have a mux, so we can't call mux->detach(). In this case,
just destroy the associated connection.

This should be backported to 1.8.
---
include/proto/connection.h | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/proto/connection.h b/include/proto/connection.h
index bc8d88484..8566736fd 100644
--- a/include/proto/connection.h
+++ b/include/proto/connection.h
@@ -699,7 +699,20 @@ static inline void conn_free(struct connection *conn)
/* Release a conn_stream, and kill the connection if it was the last one */
static inline void cs_destroy(struct conn_stream *cs)
{
- cs->conn->mux->detach(cs);
+ if (cs->conn->mux)
+ cs->conn->mux->detach(cs);
+ else {
+ /* It's too early to have a mux, let's just destroy
+ * the connection
+ */
+ struct connection *conn = cs->conn;
+
+ conn_stop_tracking(conn);
+ conn_full_close(conn);
+ if (conn->destroy_cb)
+ conn->destroy_cb(conn);
+ conn_free(conn);
+ }
cs_free(cs);
}

--
2.14.3
Willy Tarreau
Re: HAProxy 1.8.X crashing
April 13, 2018 04:10PM
On Fri, Apr 13, 2018 at 03:56:31PM +0200, Olivier Houchard wrote:
> Ok, here is a patch that does exactly what you suggest. I'm not entirely
> happy with it, but it'll do the job, as a stopgap. I want this crash
> fixed :)

Great, now merged, thank you! If you don't mind, I changed the tag to
BUG/MEDIUM since the bug is causing a crash. And if you mind it's too
late :-)

Willy
UPPALAPATI, PRAVEEN
RE: HAProxy 1.8.X crashing
April 13, 2018 04:10PM
Hi Oliver,

The crash got fixed with the patch you provided before.

Do you thing the latest patch will be the right solution?

Thanks,
Praveen.

-----Original Message-----
From: Olivier Houchard [mailto:[email protected]]
Sent: Friday, April 13, 2018 8:57 AM
To: Willy Tarreau <[email protected]>
Cc: UPPALAPATI, PRAVEEN <[email protected]>; haproxy@formilux.org
Subject: Re: HAProxy 1.8.X crashing

Hi,

On Thu, Apr 12, 2018 at 11:33:43PM +0200, Willy Tarreau wrote:
> That may be the problem. But if a mux fails to initialize, then we can't
> destroy them either ? Is cs_destroy() the problem here maybe ? If so, I
> suspect that this approach would be more robust and would better match
> the validity tests on conn->mux that are performed at many places in the
> connection code :
>
> if (cs->conn->mux) {
> cs->conn->mux->detach(cs);
> } else {
> conn_stop_tracking(cs->conn);
> conn_xprt_close(cs->conn);
> conn_free(cs->conn);
> }
> cs_free(cs);
>
> > > - one above the return SF_ERR_INTERNAL above in connect_server() to
> > > handle the case where the connection already exists but not the mux.
> >
> > I'm not sure how this can happen, and what we should do if that happens..
>
> At least in the current situation we pre-create a connection just to start
> to fill some information (source/destination address), without knowing the
> mux yet. It's an initialization process, it may look beautiful or ugly
> depending on the perspective, but in any case we must ensure that the
> code overall remains reliable in this situation. And to me it looks less
> ugly than picking a possibly wrong mux just to have a valid pointer
> avoiding a segfault.
>
> But if it turns out that stabilizing the destroy path (or error path) is
> not sufficient and that it breaks somewhere else, we can well adopt this
> solution for the short term and see how to address it for the long term;
> we already know that the outgoing connection code is severely limited to
> the point of currently preventing us from using H2 on the backend, and
> that's exactly why we're currently working on it.

Ok, here is a patch that does exactly what you suggest. I'm not entirely
happy with it, but it'll do the job, as a stopgap. I want this crash
fixed :)

Olivier
Olivier Houchard
Re: HAProxy 1.8.X crashing
April 13, 2018 04:20PM
Hi Praveen,

On Fri, Apr 13, 2018 at 02:03:47PM +0000, UPPALAPATI, PRAVEEN wrote:
> Hi Oliver,
>
> The crash got fixed with the patch you provided before.
>
> Do you thing the latest patch will be the right solution?
>
> Thanks,
> Praveen.
>

It should be fine.
Regards,

Olivier
Willy Tarreau
Re: HAProxy 1.8.X crashing
April 13, 2018 04:20PM
Hi Praveen,

On Fri, Apr 13, 2018 at 02:03:47PM +0000, UPPALAPATI, PRAVEEN wrote:
> Hi Oliver,
>
> The crash got fixed with the patch you provided before.

Thanks for this useful feedback!

> Do you thing the latest patch will be the right solution?

It should, and if it doesn't, it means we have other bugs at other places
that could still strike on rare situations (eg: memory shortage), so that's
why I really want to ensure that we make the code more robust against this
case. The muxes are a very new thing introduced into haproxy, I'm not that
much surprized that we got a few such inconsistencies, but in general since
they affect everything, any such bug there has a very high risk of being
triggered, which is why I'm confident enough for the long term : if you
don't get it anymore, it's really likely that it's fixed.

Thanks,
Willy
Sorry, only registered users may post in this forum.

Click here to login