mworker: seamless reloads broken since 1.8.1

Posted by Pierre Cheynier 
Pierre Cheynier
mworker: seamless reloads broken since 1.8.1
January 05, 2018 12:00PM
Hi list,

We recently tried to upgrade from 1.8.0 to 1.8.1, then 1.8.2 and 1.8.3,
on a preprod environment and noticed that reloads are no longer seamless
since 1.8.1 (we easily get TCP RSTs while reloading).

Having had a quick look at the haproxy-1.8 git remote for the changes
affecting haproxy.c, c2b28144 can be ruled out, so 3 commits remain:

* 3ce53f66 MINOR: threads: Fix pthread_setaffinity_np on FreeBSD. (5 weeks ago)
* f926969a BUG/MINOR: mworker: detach from tty when in daemon mode (5 weeks ago)
* 4e612023 BUG/MINOR: mworker: fix validity check for the pipe FDs (5 weeks ago)

In case it matters: we use threads and did the usual worker setup (which
again works very well in 1.8.0).
Here is a config extract:

$ cat /etc/haproxy/haproxy.cfg
(...)
user haproxy
group haproxy
nbproc 1
daemon
stats socket /var/lib/haproxy/stats level admin mode 644 expose-fd listeners
stats timeout 2m
nbthread 11
(...)

$ cat /etc/sysconfig/haproxy
(...)
CONFIG="/etc/haproxy/haproxy.cfg"
PIDFILE="/run/haproxy.pid"
OPTIONS="-x /var/lib/haproxy/stats"
(...)

$ cat /usr/lib/systemd/system/haproxy.service
[Unit]
Description=HAProxy Load Balancer
After=syslog.target network.target

[Service]
EnvironmentFile=/etc/sysconfig/haproxy
ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
ExecStart=/usr/sbin/haproxy -W -f $CONFIG -p $PIDFILE $OPTIONS
ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
ExecReload=/bin/kill -USR2 $MAINPID
Type=forking
KillMode=mixed
Restart=always

Does the observed behavior sound consistent with the changes that
occurred between 1.8.0 and 1.8.1? Before trying to bisect, compile,
test, etc., I'd like to get your feedback.

Thanks in advance,

Pierre
Lukas Tribus
Re: mworker: seamless reloads broken since 1.8.1
January 05, 2018 01:10PM
Hello Pierre,


On Fri, Jan 5, 2018 at 11:48 AM, Pierre Cheynier <[email protected]> wrote:
> Hi list,
>
> We've recently tried to upgrade from 1.8.0 to 1.8.1, then 1.8.2, 1.8.3
> on a preprod environment and noticed that the reload is not so seamless
> since 1.8.1 (easily getting TCP RSTs while reloading).
>
> Having a short look on the haproxy-1.8 git remote on the changes
> affecting haproxy.c, c2b28144 can be eliminated, so 3 commits remains:
>
> * 3ce53f66 MINOR: threads: Fix pthread_setaffinity_np on FreeBSD. (5
> weeks ago)
> * f926969a BUG/MINOR: mworker: detach from tty when in daemon mode (5
> weeks ago)
> * 4e612023 BUG/MINOR: mworker: fix validity check for the pipe FDs (5
> weeks ago)
>
> In case it matters: we use threads and did the usual worker setup (which
> again works very well in 1.8.0).

Ok, so the change in behavior is between 1.8.0 and 1.8.1.



> $ cat /usr/lib/systemd/system/haproxy.service
> [Unit]
> Description=HAProxy Load Balancer
> After=syslog.target network.target
>
> [Service]
> EnvironmentFile=/etc/sysconfig/haproxy
> ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
> ExecStart=/usr/sbin/haproxy -W -f $CONFIG -p $PIDFILE $OPTIONS
> ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
> ExecReload=/bin/kill -USR2 $MAINPID
> Type=forking
> KillMode=mixed
> Restart=always

Your systemd configuration is not up to date.

Please:
- make sure haproxy is compiled with USE_SYSTEMD=1
- update the unit file: start haproxy with -Ws instead of -W (ExecStart)
- update the unit file: use Type=notify instead of Type=forking

We always ship an up-to-date unit file in
contrib/systemd/haproxy.service.in (just make sure you maintain the
$OPTIONS variable, otherwise you are missing the -x call for the
seamless reload).
Run "systemctl daemon-reload" after updating the unit file and
completely stop the old service (don't reload after updating the unit
file), to make sure you have a "clean" situation.
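The unit file changes listed above can be sketched as a small shell filter. This is only an illustration: the function name update_unit and the override path /etc/systemd/system/haproxy.service are assumptions, and the unit shipped in contrib/ remains the reference.

```shell
#!/bin/sh
# Hypothetical helper (not part of HAProxy): reads a unit file on stdin
# and applies the two changes suggested above:
#   - start haproxy with -Ws instead of -W (ExecStart line only)
#   - use Type=notify instead of Type=forking
# All other lines pass through unchanged.
update_unit() {
    sed -e '/^ExecStart=/s/ -W / -Ws /' \
        -e 's/^Type=forking$/Type=notify/'
}

# Possible usage (the destination path is an assumption):
#   update_unit < /usr/lib/systemd/system/haproxy.service \
#       > /etc/systemd/system/haproxy.service
#   systemctl daemon-reload
#   systemctl stop haproxy     # full stop, not a reload
#   systemctl start haproxy
```

The filter only edits text; the haproxy binary still has to be built with USE_SYSTEMD=1 for -Ws and Type=notify to work together.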

I don't see how this systemd thing would affect the actual seamless
reload (systemd shouldn't be a requirement), but let's fix it
nonetheless before continuing the troubleshooting. Maybe the
regression only affects non-systemd mode.



Regards,
Lukas
William Lallemand
Re: mworker: seamless reloads broken since 1.8.1
January 05, 2018 02:20PM
Hi,

> > $ cat /usr/lib/systemd/system/haproxy.service
> > [Unit]
> > Description=HAProxy Load Balancer
> > After=syslog.target network.target
> >
> > [Service]
> > EnvironmentFile=/etc/sysconfig/haproxy
> > ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
> > ExecStart=/usr/sbin/haproxy -W -f $CONFIG -p $PIDFILE $OPTIONS
> > ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
> > ExecReload=/bin/kill -USR2 $MAINPID
> > Type=forking
> > KillMode=mixed
> > Restart=always
>
> Your systemd configuration is not uptodate.
>
> Please:
> - make sure haproxy is compiled with USE_SYSTEMD=1
> - update the unit file: start haproxy with -Ws instead of -W (ExecStart)
> - update the unit file: use Type=notify instead of Type=forking

In fact that should work with this configuration too.

> We always ship an uptodate unit file in
> contrib/systemd/haproxy.service.in (just make sure you maintain the
> $OPTIONS variable, otherwise you are missing the -x call for the
> seamless reload).

You don't need the -x with -W or -Ws, it's added automatically by the master
during a reload.

> Run "systemctl daemon-reload" after updating the unit file and
> completely stop the old service (don't reload after updating the unit
> file), to make sure you have a "clean" situation.
>
> I don't see how this systemd thing would affect the actual seamless
> reload (systemd shouldn't be a requirement), but lets fix it
> nonetheless before continuing the troubleshooting. Maybe the
> regression only affects non-systemd mode.

Shouldn't be a problem, but it's better to use -Ws with systemd.

During a reload, if the -x fails, you should see errors like these:

[WARNING] 004/135908 (12013) : Failed to connect to the old process socket '/tmp/sock4'
[ALERT] 004/135908 (12013) : Failed to get the sockets from the old process!

Are you seeing anything like this?
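One way to answer that question is to count the two messages quoted above in the service log. The sketch below assumes the logs are reachable through journalctl and that the unit is named haproxy; the function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads log lines on stdin and prints how many of
# them contain one of the two socket-transfer failure messages quoted
# above. Prints 0 when none are found.
count_fd_transfer_errors() {
    grep -c -e 'Failed to connect to the old process socket' \
            -e 'Failed to get the sockets from the old process' || true
}

# Possible usage (unit name "haproxy" is an assumption):
#   journalctl -u haproxy --since "10 minutes ago" | count_fd_transfer_errors
```

A non-zero count right after a reload suggests the new process could not retrieve the listening sockets from the old one.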

--
William Lallemand
Pierre Cheynier
Re: mworker: seamless reloads broken since 1.8.1
January 05, 2018 03:40PM
> Hi,
>
>>> $ cat /usr/lib/systemd/system/haproxy.service
>>> [Unit]
>>> Description=HAProxy Load Balancer
>>> After=syslog.target network.target
>>>
>>> [Service]
>>> EnvironmentFile=/etc/sysconfig/haproxy
>>> ExecStartPre=/usr/sbin/haproxy -f $CONFIG -c -q
>>> ExecStart=/usr/sbin/haproxy -W -f $CONFIG -p $PIDFILE $OPTIONS
>>> ExecReload=/usr/sbin/haproxy -f $CONFIG -c -q
>>> ExecReload=/bin/kill -USR2 $MAINPID
>>> Type=forking
>>> KillMode=mixed
>>> Restart=always
>> Your systemd configuration is not uptodate.
>>
>> Please:
>> - make sure haproxy is compiled with USE_SYSTEMD=1
>> - update the unit file: start haproxy with -Ws instead of -W (ExecStart)
>> - update the unit file: use Type=notify instead of Type=forking
> In fact that should work with this configuration too.
OK, I have to admit that we started experimenting on 1.8-dev2, and at that
time I had to do that to make it work.
And true, we build the RPM and so didn't notice there were some updates
to the systemd unit file provided in contrib/ after the 1.8.0 release.
Currently recompiling, bumping the release on CI / dev environment etc...
>
>> We always ship an uptodate unit file in
>> contrib/systemd/haproxy.service.in (just make sure you maintain the
>> $OPTIONS variable, otherwise you are missing the -x call for the
>> seamless reload).
> You don't need the -x with -W or -Ws, it's added automaticaly by the master
> during a reload.
Interesting. Is this new? Because I noticed it was not the case at some
point.
>> Run "systemctl daemon-reload" after updating the unit file and
>> completely stop the old service (don't reload after updating the unit
>> file), to make sure you have a "clean" situation.
>>
>> I don't see how this systemd thing would affect the actual seamless
>> reload (systemd shouldn't be a requirement), but lets fix it
>> nonetheless before continuing the troubleshooting. Maybe the
>> regression only affects non-systemd mode.
> Shouldn't be a problem, but it's better to use -Ws with systemd.
>
> During a reload, if the -x fail, you should have this kind of errors:
>
> [WARNING] 004/135908 (12013) : Failed to connect to the old process socket '/tmp/sock4'
> [ALERT] 004/135908 (12013) : Failed to get the sockets from the old process!
>
> Are you seeing anything like this?
Yes, in versions > 1.8.0. If I roll back to 1.8.0, it's fine in this respect.

I'll give updates after applying Lukas's recommendations.

Pierre
Pierre Cheynier
Re: mworker: seamless reloads broken since 1.8.1
January 05, 2018 04:00PM
>> Hi,
>>
>>> Your systemd configuration is not uptodate.
>>>
>>> Please:
>>> - make sure haproxy is compiled with USE_SYSTEMD=1
>>> - update the unit file: start haproxy with -Ws instead of -W (ExecStart)
>>> - update the unit file: use Type=notify instead of Type=forking
>> In fact that should work with this configuration too.
> OK, I have to admit that we started experiments on 1.8-dev2, at that
> time I had to do that to make it work.
> And true, we build the RPM and so didn't notice there was some updates
> after the 1.8.0 release for the systemd unit file provided in contrib/.
> Currently recompiling, bumping the release on CI / dev environment etc...
>>
>>> We always ship an uptodate unit file in
>>> contrib/systemd/haproxy.service.in (just make sure you maintain the
>>> $OPTIONS variable, otherwise you are missing the -x call for the
>>> seamless reload).
>> You don't need the -x with -W or -Ws, it's added automaticaly by the master
>> during a reload.
> Interesting. Is this new ? Because I noticed it was not the case at some
> point.
>>> Run "systemctl daemon-reload" after updating the unit file and
>>> completely stop the old service (don't reload after updating the unit
>>> file), to make sure you have a "clean" situation.
>>>
>>> I don't see how this systemd thing would affect the actual seamless
>>> reload (systemd shouldn't be a requirement), but lets fix it
>>> nonetheless before continuing the troubleshooting. Maybe the
>>> regression only affects non-systemd mode.
>> Shouldn't be a problem, but it's better to use -Ws with systemd.
>>
>> During a reload, if the -x fail, you should have this kind of errors:
>>
>> [WARNING] 004/135908 (12013) : Failed to connect to the old process socket '/tmp/sock4'
>> [ALERT] 004/135908 (12013) : Failed to get the sockets from the old process!
>>
>> Are you seeing anything like this?
> Yes, in > 1.8.0. If I rollback to 1.8.0 it's fine on this aspect.
>
> I'll give updates after applying Lukas recommendations.
>
> Pierre
>
OK, so now that I've applied all of Lukas's recommendations (I kept the added -x):

* I don't see any ALERT logs anymore, only the WARNINGs

Jan 05 14:47:12 hostname systemd[1]: Reloaded HAProxy Load Balancer.
Jan 05 14:47:12 hostname haproxy[59888]: [WARNING] 004/144712 (59888) : Former worker 61331 exited with code 0
Jan 05 14:47:25 hostname haproxy[59888]: [WARNING] 004/144712 (59888) : Reexecuting Master process
Jan 05 14:47:26 hostname systemd[1]: Reloaded HAProxy Load Balancer.
Jan 05 14:47:26 hostname haproxy[59888]: [WARNING] 004/144726 (59888) : Former worker 61355 exited with code 0

* I still observe the same issue (here running ab during a rolling
upgrade of my test app, which triggers N reloads of HAProxy as the
app instances are created/destroyed).

$ ab -n100000  http://test-app.tld/
(..)
Benchmarking test-app.tld (be patient)
apr_socket_recv: Connection reset by peer (104)
Total of 3031 requests completed
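The test above relies on a rolling upgrade to trigger the reloads. A rough substitute, for anyone trying to reproduce without such a deployment, is to issue reloads in a loop while ab runs in another terminal. The function name, the RELOAD_CMD override and the unit name haproxy are assumptions made for this sketch.

```shell
#!/bin/sh
# Hypothetical reproduction helper: issue COUNT reloads, INTERVAL
# seconds apart. RELOAD_CMD (an assumption, not an HAProxy or systemd
# variable) overrides the reload command, which allows a dry run.
reload_loop() {
    count=$1
    interval=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        # Stop at the first failed reload so the failure is visible.
        ${RELOAD_CMD:-systemctl reload haproxy} || return 1
        sleep "$interval"
        i=$((i + 1))
    done
}

# Possible usage, with the benchmark from the message above running
# in a second terminal:
#   terminal 1:  ab -n 100000 http://test-app.tld/
#   terminal 2:  reload_loop 20 1
```

If ab aborts with "Connection reset by peer" while the loop runs, the reloads are not seamless.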

Pierre
William Lallemand
Re: mworker: seamless reloads broken since 1.8.1
January 05, 2018 04:50PM
On Fri, Jan 05, 2018 at 03:52:22PM +0100, Pierre Cheynier wrote:
> OK so now that I've applied all of Lukas recos (I kept the -x added ) :
>
> * I don't see any ALERT log anymore.. Only the WARNs
>

I'm still seeing a few of them in journalctl. Maybe you don't see those emitted
by the workers; there is still room for improvement there. I'm taking notes.

> Jan 05 14:47:12 hostname systemd[1]: Reloaded HAProxy Load Balancer.
> Jan 05 14:47:12 hostname haproxy[59888]: [WARNING] 004/144712 (59888) :
> Former worker 61331 exited with code 0
> Jan 05 14:47:25 hostname haproxy[59888]: [WARNING] 004/144712 (59888) :
> Reexecuting Master process
> Jan 05 14:47:26 hostname systemd[1]: Reloaded HAProxy Load Balancer.
> Jan 05 14:47:26 hostname haproxy[59888]: [WARNING] 004/144726 (59888) :
> Former worker 61355 exited with code 0
>
> * I still observe the same issue (here doing an ab during a
> rolling/upgrade of my test app => consequently triggering N reloads on
> HAProxy as long as the app instances are created/destroyed).
>
> $ ab -n100000  http://test-app.tld/
> (..)
> Benchmarking test-app.tld (be patient)
> apr_socket_recv: Connection reset by peer (104)
> Total of 3031 requests completed
>

I'm able to reproduce; it looks like it happens only with the nbthread parameter.
I'll try to find the problem in the code.
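To check whether nbthread is the trigger on a given setup, one option is to test a copy of the configuration with the directive commented out. This is a sketch only: the function name and the /tmp path are hypothetical, and it assumes nbthread sits on a line of its own.

```shell
#!/bin/sh
# Hypothetical helper: reads an haproxy configuration on stdin and
# comments out any nbthread line, keeping its indentation. Other
# directives (nbproc, daemon, ...) are left untouched.
disable_nbthread() {
    sed -e 's/^\([[:space:]]*\)\(nbthread[[:space:]]\)/\1# \2/'
}

# Possible usage (the temporary path is an assumption):
#   disable_nbthread < /etc/haproxy/haproxy.cfg > /tmp/haproxy-nothreads.cfg
#   haproxy -c -f /tmp/haproxy-nothreads.cfg
```

Working on a copy keeps the original file intact, so threads can be re-enabled once a fixed version is available.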

--
William Lallemand
Pierre Cheynier
Re: mworker: seamless reloads broken since 1.8.1
January 05, 2018 05:20PM
On 05/01/2018 16:44, William Lallemand wrote:
> I'm able to reproduce, looks like it happens with the nbthread parameter only,
Exactly, I observe the same.
At least I have a workaround for now to perform the upgrade.
> I'll try to find the problem in the code.
>
Thanks !

Pierre
Lukas Tribus
Re: mworker: seamless reloads broken since 1.8.1
January 08, 2018 10:30AM
Hello,


On Fri, Jan 5, 2018 at 4:44 PM, William Lallemand
<[email protected]> wrote:
> I'm able to reproduce, looks like it happens with the nbthread parameter only,
> I'll try to find the problem in the code.

FYI, there is a report on discourse mentioning this problem, and the
poster appears to be able to reproduce the problem without the nbthread
parameter as well:

https://discourse.haproxy.org/t/seamless-reloads-dont-work-with-systemd/1954


Lukas
Pierre Cheynier
Re: mworker: seamless reloads broken since 1.8.1
January 08, 2018 02:40PM
Hi,

On 08/01/2018 10:24, Lukas Tribus wrote:
>
> FYI there is a report on discourse mentioning this problem, and the
> poster appears to be able to reproduce the problem without nbthread
> paramter as well:
>
> https://discourse.haproxy.org/t/seamless-reloads-dont-work-with-systemd/1954
>
>
> Lukas
I retried this morning, I confirm that on 1.8.3, using

$ haproxy -vv
HA-Proxy version 1.8.3-205f675 2017/12/30
Copyright 2000-2017 Willy Tarreau <[email protected]>

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
-fwrapv -Wno-unused-label -DTCP_USER_TIMEOUT=18
  OPTIONS = USE_LINUX_TPROXY=1 USE_GETADDRINFO=1 USE_ZLIB=1
USE_REGPARM=1 USE_OPENSSL=1 USE_SYSTEMD=1 USE_PCRE=1 USE_PCRE_JIT=1
USE_TFO=1 USE_NS=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

I get RSTs (reloads are not seamless) when I add nbthread X to the
global section, after a "systemctl restart haproxy".
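The build options shown above include USE_SYSTEMD=1, which was one of the prerequisites mentioned earlier in the thread. A quick way to check this on another host is to search the "haproxy -vv" output for that option; the function name below is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of "haproxy -vv" on stdin and
# succeeds (exit status 0) only if the build options list USE_SYSTEMD=1.
# The OPTIONS line may be wrapped, so the whole output is searched.
has_systemd_support() {
    grep -q 'USE_SYSTEMD=1'
}

# Possible usage:
#   haproxy -vv | has_systemd_support && echo "built with USE_SYSTEMD=1"
```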

Pierre
Pierre Cheynier
Re: mworker: seamless reloads broken since 1.8.1
January 17, 2018 05:10PM
Hi,

On 08/01/2018 14:32, Pierre Cheynier wrote:
> I retried this morning, I confirm that on 1.8.3, using
(...)
> I get RSTs (not seamless reloads) when I introduce the global/nbthread
> X, after a systemctl haproxy restart.

Any news on that?

I saw one mworker commit ("execvp failure depending on argv[0]") but I
guess it's completely independent.

Thanks,

Pierre
Willy Tarreau
Re: mworker: seamless reloads broken since 1.8.1
January 23, 2018 06:50PM
Hi Pierre,

On Wed, Jan 17, 2018 at 05:03:18PM +0100, Pierre Cheynier wrote:
> Hi,
>
> On 08/01/2018 14:32, Pierre Cheynier wrote:
> > I retried this morning, I confirm that on 1.8.3, using
> (...)
> > I get RSTs (not seamless reloads) when I introduce the global/nbthread
> > X, after a systemctl haproxy restart.
>
> Any news on that ?
>
> I saw one mworker commit ("execvp failure depending on argv[0]") but I
> guess it's completely independent.

In another thread with Marc Fournier, we've identified a real issue with
the way threads start the listeners and close the mworker pipe. It causes
all sorts of random behaviours, like closing just-created listeners. That
could very well match what you're seeing.

I'm switching to this now after having dealt with the polling fixes,
I'll try to have something testable this evening or tomorrow.

Cheers,
Willy
Willy Tarreau
Re: mworker: seamless reloads broken since 1.8.1
January 23, 2018 07:40PM
On Tue, Jan 23, 2018 at 06:43:51PM +0100, Willy Tarreau wrote:
> I'm switching to this now after having dealt with the polling fixes,
> I'll try to have something testable this evening or tomorrow.

Pierre, please try the latest 1.8 branch, or the next nightly
snapshot tomorrow morning. It addresses the aforementioned issue, and
I hope it's the same one you're facing.

Cheers,
Willy
Pierre Cheynier
Re: mworker: seamless reloads broken since 1.8.1
January 24, 2018 03:20PM
On 23/01/2018 19:29, Willy Tarreau wrote:
> Pierre, please give a try to the latest 1.8 branch or the next nightly
> snapshot tomorrow morning. It addresses the aforementionned issue, and
> I hope it's the same you're facing.
>
> Cheers,
> Willy
Willy, I confirm that it works well again running the following version:

$ haproxy -v
HA-Proxy version 1.8.3-945f4cf 2018/01/23

Added nbthread again, reloads are transparent.

Thanks,

Pierre
Willy Tarreau
Re: mworker: seamless reloads broken since 1.8.1
January 24, 2018 07:40PM
Hi Pierre,

On Wed, Jan 24, 2018 at 03:07:54PM +0100, Pierre Cheynier wrote:
> Willy, I confirm that it works well again running the following version:
>
> $ haproxy -v
> HA-Proxy version 1.8.3-945f4cf 2018/01/23
>
> Added nbthread again, reloads are transparents.

Excellent, many thanks for confirming!

Willy
William Dauchy
Re: mworker: seamless reloads broken since 1.8.1
February 20, 2018 06:30PM
Hello,

I am reviving this old thread since we are hitting the issue again with:
# haproxy -vv
HA-Proxy version 1.8.4-1deb90d 2018/02/08

I am trying to see whether I can reproduce it easily.

Best,
--
William