Welcome! Log In Create A New Profile

Advanced

Segfault on haproxy 1.7.10 with state file and slowstart

Posted by Raghu Udiyar 
Raghu Udiyar
Segfault on haproxy 1.7.10 with state file and slowstart
January 11, 2018 10:00AM
Hello,

Haproxy 1.7.10 segfaults when the srv_admin_state is set to
SRV_ADMF_CMAINT (0x04)
for a backend server, and that backend has the `slowstart` option set.

The following configuration reproduces it :

-----------------------------
# haproxy.cfg (replace <path-to-state-folder> below)

global
maxconn 30000
user haproxy
group haproxy
server-state-file /<path-to-state-folder>/servers.state

log-tag haproxy
nbproc 1
cpu-map 1 2
stats socket /run/haproxy.sock level admin
stats socket /run/haproxy_op.sock mode 666 level operator

defaults
mode http
option forwardfor

option dontlognull
option httplog
log 127.0.0.1 local1 debug

timeout connect 5s
timeout client 50s
timeout server 50s
timeout http-request 8s

load-server-state-from-file global
listen admin
bind *:9002
stats enable
stats auth haproxyadmin:xxxxxxx

frontend testserver
bind *:9000
option tcp-smart-accept
option splice-request
option splice-response
default_backend testservers

backend testservers
balance roundrobin
option tcp-smart-connect
option splice-request
option splice-response
timeout server 2s
timeout queue 2s
default-server maxconn 10 *slowstart 10s* weight 1
server testserver15 10.0.19.10:9003 check
server testserver16 10.0.19.12:9003 check

server testserver17 169.254.0.9:9003 disabled check
server testserver20 169.254.0.9:9003 disabled check


# servers.state file

1
# be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state
srv_uweight srv_iweight srv_time_since_last_change srv_check_status
srv_check_result srv_check_health srv_check_state srv_agent_state
bk_f_forced_id srv_f_forced_id
4 testservers 1 testserver15 10.0.19.10 2 0 1 1 924 6 3 4 6 0 0 0
4 testservers 2 testserver16 10.0.19.12 2 0 1 1 924 6 3 4 6 0 0 0
4 testservers 3 testserver17 169.254.0.9 0 5 1 1 924 1 0 0 14 0 0 0
4 testservers 4 testserver20 10.0.19.17 0 *4* 1 1 454 6 3 4 6 0 0 0

--------------------

The state *4* above for testserver20 causes the segfault, and only occurs
when slowstart is set.

The configuration check can reproduce it ie: haproxy -c -f haproxy.cfg

The backtrace :

(gdb) bt
#0 task_schedule (when=-508447097, task=0x0) at include/proto/task.h:244
#1 srv_clr_admin_flag (mode=SRV_ADMF_FMAINT, s=0x1fb0fd0) at
src/server.c:626
#2 srv_adm_set_ready (s=0x1fb0fd0) at include/proto/server.h:231
#3 srv_update_state (params=0x7ffe4f15e7d0, version=1, srv=0x1fb0fd0) at
src/server.c:2289
#4 apply_server_state () at src/server.c:2664
#5 0x000000000044b60f in init (argc=<optimized out>, [email protected]=4,
argv=<optimized out>,
[email protected]=0x7ffe4f160d38) at src/haproxy.c:975
#6 0x00000000004491be in main (argc=4, argv=0x7ffe4f160d38) at
src/haproxy.c:1795


The way we use the state file is to have servers with `disabled` option in
the configuration; and during scaling update the backend address and mark
as active using the socket. The 169.254.0.9 address is a dummy address for
the disabled servers.

Can someone take a look? I couldn't find any related bugs fixed in 1.8.

Thanks
-- Raghu
Willy Tarreau
Re: Segfault on haproxy 1.7.10 with state file and slowstart
January 12, 2018 11:50AM
Hello Raghu,

On Thu, Jan 11, 2018 at 02:20:34PM +0530, Raghu Udiyar wrote:
> Hello,
>
> Haproxy 1.7.10 segfaults when the srv_admin_state is set to
> SRV_ADMF_CMAINT (0x04)
> for a backend server, and that backend has the `slowstart` option set.
>
> The following configuration reproduces it :
(...)

Thanks for all the details, they made it easy to reproduce.

From what I'm seeing, it's a fundamental design issue in the state
file handling in 1.7. It starts checks before they have been
initialized, and try to wake up a NULL task. In 1.8 due to the more
dynamic config, the initialization sequence has changed and checks
are initialized before parsing the state file, but I don't feel at
ease with doing in 1.7 since I don't know if some config elements
may remain non-updated.

So instead I've just protected against using the task wakeups during
the state file parsing, and they will be initialized later with the
appropriate parameters.

Could you please check the attached patch on top of 1.7.10 ?

Thanks,
Willy
diff --git a/src/server.c b/src/server.c
index 66e8f8a..e847723 100644
--- a/src/server.c
+++ b/src/server.c
@@ -318,8 +318,10 @@ void srv_set_running(struct server *s, const char *reason)
s->last_change = now.tv_sec;

s->state = SRV_ST_STARTING;
- if (s->slowstart > 0)
- task_schedule(s->warmup, tick_add(now_ms, MS_TO_TICKS(MAX(1000, s->slowstart / 20))));
+ if (s->slowstart > 0) {
+ if (s->warmup)
+ task_schedule(s->warmup, tick_add(now_ms, MS_TO_TICKS(MAX(1000, s->slowstart / 20))));
+ }
else
s->state = SRV_ST_RUNNING;

@@ -622,8 +624,10 @@ void srv_clr_admin_flag(struct server *s, enum srv_admin mode)
s->state = SRV_ST_STOPPING;
else {
s->state = SRV_ST_STARTING;
- if (s->slowstart > 0)
- task_schedule(s->warmup, tick_add(now_ms, MS_TO_TICKS(MAX(1000, s->slowstart / 20))));
+ if (s->slowstart > 0) {
+ if (s->warmup)
+ task_schedule(s->warmup, tick_add(now_ms, MS_TO_TICKS(MAX(1000, s->slowstart / 20))));
+ }
else
s->state = SRV_ST_RUNNING;
}
Raghu Udiyar
Re: Segfault on haproxy 1.7.10 with state file and slowstart
January 14, 2018 01:40PM
Hi Willy

Yes, this patch works, no more segfaults.

However, unrelated to this issue, while testing I noticed that the address
from the state file is not applied if the configuration does not use a
hostname.

In the example I provided, the configuration originally has the IP address
set to 169.254.0.9, which is then updated through the socket to say 10.0.19.17.
Expectation was that upon haproxy restart the address in the state fill
will override the address in the configuration, but this does not happen.

I see the relevant code in srv_init_addr :

if (srv->hostname)
return_code |= srv_iterate_initaddr(srv);

As per the server-template example in the dynamic scaling blog post, this
should have been supported. Am I missing some configuration?

Thanks!
-- Raghu


-- Raghu

On Fri, Jan 12, 2018 at 4:14 PM, Willy Tarreau <[email protected]> wrote:

> Hello Raghu,
>
> On Thu, Jan 11, 2018 at 02:20:34PM +0530, Raghu Udiyar wrote:
> > Hello,
> >
> > Haproxy 1.7.10 segfaults when the srv_admin_state is set to
> > SRV_ADMF_CMAINT (0x04)
> > for a backend server, and that backend has the `slowstart` option set.
> >
> > The following configuration reproduces it :
> (...)
>
> Thanks for all the details, they made it easy to reproduce.
>
> From what I'm seeing, it's a fundamental design issue in the state
> file handling in 1.7. It starts checks before they have been
> initialized, and try to wake up a NULL task. In 1.8 due to the more
> dynamic config, the initialization sequence has changed and checks
> are initialized before parsing the state file, but I don't feel at
> ease with doing in 1.7 since I don't know if some config elements
> may remain non-updated.
>
> So instead I've just protected against using the task wakeups during
> the state file parsing, and they will be initialized later with the
> appropriate parameters.
>
> Could you please check the attached patch on top of 1.7.10 ?
>
> Thanks,
> Willy
>
Willy Tarreau
Re: Segfault on haproxy 1.7.10 with state file and slowstart
January 14, 2018 08:50PM
Hi Raghu,

On Sun, Jan 14, 2018 at 05:59:16PM +0530, Raghu Udiyar wrote:
> Hi Willy
>
> Yes, this patch works, no more segfaults.
>
> However, unrelated to this issue, while testing I noticed that the address
> from the state file is not applied if the configuration does not use a
> hostname.
>
> In the example I provided, the configuration originally has the IP address
> set to 169.254.0.9, which is then updated through the socket to say 10.0.19.17.

I noticed it as well but I thought I was the one doing something wrong.
CCing Baptiste in case he remembers certain cases that have to be
addressed a specific way by the state file. I remember this stuff was
tested quite a bit so I suspect that it could depend on certain
conditions.

> Expectation was that upon haproxy restart the address in the state fill
> will override the address in the configuration, but this does not happen.
>
> I see the relevant code in srv_init_addr :
>
> if (srv->hostname)
> return_code |= srv_iterate_initaddr(srv);
>
> As per the server-template example in the dynamic scaling blog post, this
> should have been supported. Am I missing some configuration?

I can't really tell, let's wait for Baptiste's opinion on this.

Thanks,
Willy
Raghu Udiyar
Re: Segfault on haproxy 1.7.10 with state file and slowstart
February 24, 2018 03:30PM
Hi Baptiste

Were you able to look into this? A workaround is to use a dummy hostname,
and init-addr like so :

server testserver13 dummyhost:9003 check init-addr last,10.0.19.10
server testserver14 dummyhost:9003 check init-addr last,10.0.19.12

# disabled placeholder
server testserver15 dummyhost:9003 disabled check init-addr
last,169.254.0.9


This works as expected, but seems like a hack to me. And the blog post
should be updated to indicate this behaviour.


-- Raghu

On Mon, Jan 15, 2018 at 1:10 AM, Willy Tarreau <[email protected]> wrote:

> Hi Raghu,
>
> On Sun, Jan 14, 2018 at 05:59:16PM +0530, Raghu Udiyar wrote:
> > Hi Willy
> >
> > Yes, this patch works, no more segfaults.
> >
> > However, unrelated to this issue, while testing I noticed that the
> address
> > from the state file is not applied if the configuration does not use a
> > hostname.
> >
> > In the example I provided, the configuration originally has the IP
> address
> > set to 169.254.0.9, which is then updated through the socket to say
> 10.0.19.17.
>
> I noticed it as well but I thought I was the one doing something wrong.
> CCing Baptiste in case he remembers certain cases that have to be
> addressed a specific way by the state file. I remember this stuff was
> tested quite a bit so I suspect that it could depend on certain
> conditions.
>
> > Expectation was that upon haproxy restart the address in the state fill
> > will override the address in the configuration, but this does not happen.
> >
> > I see the relevant code in srv_init_addr :
> >
> > if (srv->hostname)
> > return_code |= srv_iterate_initaddr(srv);
> >
> > As per the server-template example in the dynamic scaling blog post, this
> > should have been supported. Am I missing some configuration?
>
> I can't really tell, let's wait for Baptiste's opinion on this.
>
> Thanks,
> Willy
>
Hi Willy, Baptiste

It looks like this was the intended behaviour as per the documentation. But
it looks to be simple to enable the init-addr behaviour for IP addresses as
well. Can you please review the attached patch, this is against devel.
Attachments:
open | download - 0001-MEDIUM-init-allow-init-addr-to-work-with-server-ip-a.patch (2.6 KB)
Hi Raghu,

On Wed, Mar 28, 2018 at 08:30:39PM +0000, Raghu Udiyar wrote:
> Hi Willy, Baptiste
>
> It looks like this was the intended behaviour as per the documentation. But
> it looks to be simple to enable the init-addr behaviour for IP addresses as
> well. Can you please review the attached patch, this is against devel.

Baptiste, I missed this one. Could you please take a look ?

Thanks,
Willy
Sorry, only registered users may post in this forum.

Click here to login