Hang in haproxy 1.8.13

Posted by David King 
David King
Hang in haproxy 1.8.13
September 11, 2018 10:10AM
Hi all

I was hoping for some help working out whether this is a bug or a
misconfiguration.

I have a config which works fine in 1.7.9 and 1.7.11, but when running it on
1.8.13 it seems to fail to respond to the client.

I've simplified and anonymised the config, but basically it's being
used to do A/B failover at haproxy, so it uses nested frontends and backends
with unix sockets to achieve this.

The config is

------
global
    pidfile /var/run/haproxy.pid
    stats socket /var/run/haproxy.sock mode 600 level admin
    stats timeout 2m
    maxconn 4000
    unix-bind user root mode 666
    daemon

defaults
    timeout connect 15s
    timeout server 1m
    timeout http-request 15s
    timeout check 15s

frontend frontend1
    bind 127.0.0.1:8080
    mode http
    timeout client 1m
    default_backend api

frontend api-A
    bind /var/run/haproxy/api-A.sock
    mode http
    default_backend api-A

frontend api-B
    bind /var/run/haproxy/api-B.sock
    mode http
    default_backend api-B

backend api
    mode http
    balance roundrobin
    server api-A /var/run/haproxy/api-A.sock disabled
    server api-B /var/run/haproxy/api-B.sock

backend api-A
    mode http
    balance roundrobin
    option forwardfor
    server server-1-A <redacted>
    server server-2-B <redacted>

backend api-B
    mode http
    balance roundrobin
    option forwardfor
    server server-1-B <redacted>
    server server-2-B <redacted>
---------

When running with 1.7.11, it's all nice and quick:

curl http://10.2.74.41:8443 -vvv
* Rebuilt URL to: http://10.2.74.41:8443/
* Trying 10.2.74.41...
* TCP_NODELAY set
* Connected to 10.2.74.41 (10.2.74.41) port 8443 (#0)
> GET / HTTP/1.1
> Host: 10.2.74.41:8443
> User-Agent: curl/7.61.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Content-Type: application/problem+json; charset=utf-8
< Date: Tue, 11 Sep 2018 07:50:54 GMT
< Content-Length: 27
<
* Connection #0 to host 10.2.74.41 left intact

00000004:frontend1.accept(0005)=0008 from [10.2.74.41:44161]
00000004:frontend1.clireq[0008:ffffffff]: GET / HTTP/1.1
00000004:frontend1.clihdr[0008:ffffffff]: Host: 10.2.74.41:8443
00000004:frontend1.clihdr[0008:ffffffff]: User-Agent: curl/7.61.0
00000004:frontend1.clihdr[0008:ffffffff]: Accept: */*
00000005:api-B.accept(0007)=000a from [unix:1]
00000005:api-B.clireq[000a:ffffffff]: GET / HTTP/1.1
00000005:api-B.clihdr[000a:ffffffff]: Host: 10.2.74.41:8443
00000005:api-B.clihdr[000a:ffffffff]: User-Agent: curl/7.61.0
00000005:api-B.clihdr[000a:ffffffff]: Accept: */*
00000005:api-B.srvrep[000a:000b]: HTTP/1.1 404 Not Found
00000005:api-B.srvhdr[000a:000b]: Content-Type: application/problem+json; charset=utf-8
00000005:api-B.srvhdr[000a:000b]: Date: Tue, 11 Sep 2018 07:51:22 GMT
00000005:api-B.srvhdr[000a:000b]: Content-Length: 27
00000004:api.srvrep[0008:0009]: HTTP/1.1 404 Not Found
00000004:api.srvhdr[0008:0009]: Content-Type: application/problem+json; charset=utf-8
00000004:api.srvhdr[0008:0009]: Date: Tue, 11 Sep 2018 07:51:22 GMT
00000004:api.srvhdr[0008:0009]: Content-Length: 27
00000007:frontend1.clicls[0008:0009]
00000007:frontend1.closed[0008:0009]
00000006:api-B.clicls[000a:000b]
00000006:api-B.closed[000a:000b]


However, with 1.8.13 the client doesn't get the response and ends up with
a 504. The haproxy debug log shows the response is received by
haproxy, but never passed to the client.

curl http://10.2.74.41:8443 -vvv
* Rebuilt URL to: http://10.2.74.41:8443/
* Trying 10.2.74.41...
* TCP_NODELAY set
* Connected to 10.2.74.41 (10.2.74.41) port 8443 (#0)
> GET / HTTP/1.1
> Host: 10.2.74.41:8443
> User-Agent: curl/7.61.0
> Accept: */*
>
*** HANGS HERE TILL 504 ****

from haproxy
00000000:frontend1.accept(0006)=000b from [10.2.74.41:57043] ALPN=<none>
00000000:frontend1.clireq[000b:ffffffff]: GET / HTTP/1.1
00000000:frontend1.clihdr[000b:ffffffff]: Host: 10.2.74.41:8443
00000000:frontend1.clihdr[000b:ffffffff]: User-Agent: curl/7.61.0
00000000:frontend1.clihdr[000b:ffffffff]: Accept: */*
00000001:api-A.accept(0007)=000d from [unix:1] ALPN=<none>
00000001:api-A.clireq[000d:ffffffff]: GET / HTTP/1.1
00000001:api-A.clihdr[000d:ffffffff]: Host: 10.2.74.41:8443
00000001:api-A.clihdr[000d:ffffffff]: User-Agent: curl/7.61.0
00000001:api-A.clihdr[000d:ffffffff]: Accept: */*
00000001:api-A.srvrep[000d:000e]: HTTP/1.1 404 Not Found
00000001:api-A.srvhdr[000d:000e]: Content-Type: application/problem+json; charset=utf-8
00000001:api-A.srvhdr[000d:000e]: Date: Tue, 11 Sep 2018 07:48:39 GMT
00000001:api-A.srvhdr[000d:000e]: Content-Length: 27
** 5 seconds
00000002:api-A.clicls[adfd:ffffffff]
00000002:api-A.closed[adfd:ffffffff]
** 15 seconds
00000000:api.srvcls[000b:adfd]
00000000:api.clicls[adfd:adfd]
00000000:api.closed[adfd:adfd]



Any idea why this happens? Sorry for the long post!

Dave
David King
Re: Hang in haproxy 1.8.13
September 11, 2018 12:00PM
Apologies, I forgot to mention this is running on FreeBSD 11.1.

I've just run the same tests on CentOS and there is no issue.

Lukas Tribus
Re: Hang in haproxy 1.8.13
September 11, 2018 12:40PM
On Tue, 11 Sep 2018 at 11:55, David King <[email protected]> wrote:
>
> Apologies, i forgot to mention this is running on FreeBSD 11.1
>
> I've just run the same tests on Centos and there is no issue

Could you retry with the current development tree (1.9) from git?
There are a number of fixes waiting to be backported to 1.8 and also a
number of already backported fixes (but post 1.8.13).


Lukas
Olivier Houchard
Re: Hang in haproxy 1.8.13
September 11, 2018 01:40PM
Hi,

On Tue, Sep 11, 2018 at 12:36:08PM +0200, Lukas Tribus wrote:
> On Tue, 11 Sep 2018 at 11:55, David King <[email protected]> wrote:
> >
> > Apologies, i forgot to mention this is running on FreeBSD 11.1
> >
> > I've just run the same tests on Centos and there is no issue
>
> Could you retry with the current development tree (1.9) from git?
> There are a number of fixes waiting to be backported to 1.8 and also a
> number of already backported fixes (but post 1.8.13).
>
>

I just tested, and it seems to happen with the latest 1.8, but not with 1.9.
Not using kqueue (by using -dk) seems to work around the issue.
It's quite interesting that it doesn't happen on CentOS; it means it is
probably kqueue-specific (or that it works by accident with epoll/poll/select).
I'm investigating.
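
For anyone who cannot change the command line, the same workaround can probably be expressed in the configuration instead. A minimal sketch, assuming the global section from the original post: `nokqueue` is the documented global keyword equivalent to `-dk`, and the other lines are simply copied from David's config.

```
global
    pidfile /var/run/haproxy.pid
    maxconn 4000
    # Disable the kqueue poller (same effect as starting haproxy with -dk);
    # haproxy then falls back to another available poller such as poll.
    nokqueue
    daemon
```

This only sidesteps the problem while it is being investigated; it should not be needed once a fixed build is in use.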

Regards,

Olivier
David King
Re: Hang in haproxy 1.8.13
September 11, 2018 02:10PM
Hi,

> I just tested, and it seems to happen with the latest 1.8, but not with 1.9.
> Not using kqueue (by using -dk) seems to work around the issue.
> It's quite interesting it doesn't happen on Centos, it means it is probably
> kqueue-specific (or that it works by accident with epoll/poll/select).
> I'm investigating.
>
> Regards,
>
> Olivier


So I've been running some tests with builds from source. It seems to be
1.8.13-specific, as 1.8.12 works fine, and as Olivier said, 1.9 is OK as well.

Thanks

Dave

Olivier Houchard
Re: Hang in haproxy 1.8.13
September 11, 2018 03:00PM
Hi,

On Tue, Sep 11, 2018 at 12:58:40PM +0100, David King wrote:
> Hi,
>
> > I just tested, and it seems to happen with the latest 1.8, but not with 1.9.
> > Not using kqueue (by using -dk) seems to work around the issue.
> > It's quite interesting it doesn't happen on Centos, it means it is probably
> > kqueue-specific (or that it works by accident with epoll/poll/select).
> > I'm investigating.
> >
> > Regards,
> >
> > Olivier
>
>
> so i've been running some tests from builds from source, it seems to be
> 1.8.13 specific, as 1.8.12 works fine, and as Oliver said 1.9 is OK as well
>
> Thanks
>
> Dave

Ok I think I figured it out.
The bug was present in master too, but was masked for some reason.

The patches attached should fix this. The first one is for master, and the
second one for 1.8, as the master patch didn't apply cleanly on 1.8.

Regards,

Olivier
From d950da31340528c37173fc74d1c0f635c977cd03 Mon Sep 17 00:00:00 2001
From: Olivier Houchard <[email protected]>
Date: Tue, 11 Sep 2018 14:44:51 +0200
Subject: [PATCH] BUG/MAJOR: kqueue: Don't reset the changes number by
accident.

In _update_fd(), if the fd wasn't polled, and we don't want it to be polled,
we just returned 0, however, we should return changes instead, or all previous
changes will be lost.

This should be backported to 1.8.
---
src/ev_kqueue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index 087a07e7..e2f04f70 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -44,7 +44,7 @@ static int _update_fd(int fd, int start)
 	if (!(fdtab[fd].thread_mask & tid_bit) || !(en & FD_EV_POLLED_RW)) {
 		if (!(polled_mask[fd] & tid_bit)) {
 			/* fd was not watched, it's still not */
-			return 0;
+			return changes;
 		}
 		/* fd totally removed from poll list */
 		EV_SET(&kev[changes++], fd, EVFILT_READ, EV_DELETE, 0, 0, NULL);
--
2.14.3

From e987a5bda41a1b560a352b4ec2d54a8ebcd5965a Mon Sep 17 00:00:00 2001
From: Olivier Houchard <[email protected]>
Date: Tue, 11 Sep 2018 14:44:51 +0200
Subject: [PATCH] BUG/MAJOR: kqueue: Don't reset the changes number by
accident.

In _update_fd(), if the fd wasn't polled, and we don't want it to be polled,
we just returned 0, however, we should return changes instead, or all previous
changes will be lost.

This should be backported to 1.8.
---
src/ev_kqueue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/ev_kqueue.c b/src/ev_kqueue.c
index 1f4762e6..c88cf6f3 100644
--- a/src/ev_kqueue.c
+++ b/src/ev_kqueue.c
@@ -44,7 +44,7 @@ static int _update_fd(int fd, int start)
 	if (!(fdtab[fd].thread_mask & tid_bit) || !(en & FD_EV_POLLED_RW)) {
 		if (!(fdtab[fd].polled_mask & tid_bit)) {
 			/* fd was not watched, it's still not */
-			return 0;
+			return changes;
 		}
 		/* fd totally removed from poll list */
 		EV_SET(&kev[changes++], fd, EVFILT_READ, EV_DELETE, 0, 0, NULL);
--
2.14.3
Willy Tarreau
Re: Hang in haproxy 1.8.13
September 11, 2018 03:00PM
Hi Olivier,

On Tue, Sep 11, 2018 at 02:51:57PM +0200, Olivier Houchard wrote:
> Ok I think I figured it out.
> The bug was present in master too, but was masked for some reason.
>
> The patches attached should fix this. The first one is for master, and the
> second one for 1.8, as the master patch didn't apply cleanly on 1.8.

Now applied to master, thank you!
Willy
David King
Re: Hang in haproxy 1.8.13
September 11, 2018 03:20PM
Fantastic, thanks everyone

I guess I would need to wait for 1.8.14 to see it in stable?

Willy Tarreau
Re: Hang in haproxy 1.8.13
September 11, 2018 03:30PM
On Tue, Sep 11, 2018 at 02:10:15PM +0100, David King wrote:
> Fantastic, thanks everyone
>
> i guess i would need to wait to 1.8.14 for seeing it on stable?

In fact we'll backport it ASAP, possibly along with a few other pending
patches. We take care of maintaining the ordering between the patches when
doing the backports, which is why sometimes we seem to "sleep" over a few
of them. However you can safely apply Olivier's patch directly on your
1.8 tree.

Willy
David King
Re: Hang in haproxy 1.8.13
September 11, 2018 03:30PM
Awesome, thanks!
