Hello,
Haproxy 1.7.10 segfaults when srv_admin_state is set to SRV_ADMF_CMAINT (0x04)
for a backend server and `slowstart` is set on that server (here via
`default-server`). The following configuration reproduces it:
-----------------------------
# haproxy.cfg (replace <path-to-state-folder> below)
global
    maxconn 30000
    user haproxy
    group haproxy
    server-state-file /<path-to-state-folder>/servers.state
    log-tag haproxy
    nbproc 1
    cpu-map 1 2
    stats socket /run/haproxy.sock level admin
    stats socket /run/haproxy_op.sock mode 666 level operator

defaults
    mode http
    option forwardfor
    option dontlognull
    option httplog
    log 127.0.0.1 local1 debug
    timeout connect 5s
    timeout client 50s
    timeout server 50s
    timeout http-request 8s
    load-server-state-from-file global

listen admin
    bind *:9002
    stats enable
    stats auth haproxyadmin:xxxxxxx

frontend testserver
    bind *:9000
    option tcp-smart-accept
    option splice-request
    option splice-response
    default_backend testservers

backend testservers
    balance roundrobin
    option tcp-smart-connect
    option splice-request
    option splice-response
    timeout server 2s
    timeout queue 2s
    default-server maxconn 10 slowstart 10s weight 1
    server testserver15 10.0.19.10:9003 check
    server testserver16 10.0.19.12:9003 check
    server testserver17 169.254.0.9:9003 disabled check
    server testserver20 169.254.0.9:9003 disabled check

# servers.state file
1
# be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state srv_uweight srv_iweight srv_time_since_last_change srv_check_status srv_check_result srv_check_health srv_check_state srv_agent_state bk_f_forced_id srv_f_forced_id
4 testservers 1 testserver15 10.0.19.10 2 0 1 1 924 6 3 4 6 0 0 0
4 testservers 2 testserver16 10.0.19.12 2 0 1 1 924 6 3 4 6 0 0 0
4 testservers 3 testserver17 169.254.0.9 0 5 1 1 924 1 0 0 14 0 0 0
4 testservers 4 testserver20 10.0.19.17 0 4 1 1 454 6 3 4 6 0 0 0
--------------------
The srv_admin_state value of 4 above for testserver20 is what triggers the
segfault, and it only happens when `slowstart` is set. A plain configuration
check is enough to reproduce it, i.e. `haproxy -c -f haproxy.cfg`.
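For what it's worth, here is how I read that admin state value. The 0x04 value
for SRV_ADMF_CMAINT is the one quoted at the top of this mail; the FMAINT and
IMAINT values are copied from my 1.7 copy of include/types/server.h, so please
double-check them. A state of 4 means the server is in maintenance only
because the configuration says `disabled`, not because it was forced down over
the CLI, and my understanding is that this is what makes srv_update_state()
call srv_adm_set_ready() while loading the file:
-----------------------------
/* Illustrative decode of srv_admin_state = 4 from the state file.
 * Flag values copied from my 1.7 include/types/server.h -- please
 * double-check them against your tree. */
#include <stdio.h>

#define SRV_ADMF_FMAINT 0x01  /* forced into maintenance via the CLI */
#define SRV_ADMF_IMAINT 0x02  /* maintenance inherited from a tracked server */
#define SRV_ADMF_CMAINT 0x04  /* "disabled" in the configuration */

int main(void)
{
    unsigned int srv_admin_state = 0x04;  /* testserver20's saved state */

    printf("config maintenance (CMAINT): %s\n",
           (srv_admin_state & SRV_ADMF_CMAINT) ? "yes" : "no");
    printf("forced maintenance (FMAINT): %s\n",
           (srv_admin_state & SRV_ADMF_FMAINT) ? "yes" : "no");
    /* CMAINT set but FMAINT clear: the server had been brought back up
     * over the socket before the state was saved, so loading the state
     * tries to clear the maintenance flag again -- which is where the
     * backtrace below starts. */
    return 0;
}
-----------------------------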
The backtrace:
(gdb) bt
#0  task_schedule (when=-508447097, task=0x0) at include/proto/task.h:244
#1  srv_clr_admin_flag (mode=SRV_ADMF_FMAINT, s=0x1fb0fd0) at src/server.c:626
#2  srv_adm_set_ready (s=0x1fb0fd0) at include/proto/server.h:231
#3  srv_update_state (params=0x7ffe4f15e7d0, version=1, srv=0x1fb0fd0) at src/server.c:2289
#4  apply_server_state () at src/server.c:2664
#5  0x000000000044b60f in init (argc=<optimized out>, argc@entry=4, argv=<optimized out>, argv@entry=0x7ffe4f160d38) at src/haproxy.c:975
#6  0x00000000004491be in main (argc=4, argv=0x7ffe4f160d38) at src/haproxy.c:1795
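My reading of the backtrace (with the caveat that I have not dug through all
of 1.7.10): the warmup task that drives the `slowstart` ramp-up only seems to
be created once the health checks are started, which is after
apply_server_state() has run during init, so it is still NULL when loading the
state file clears the maintenance flag, and task_schedule() is then called on
a NULL pointer (frame #0). The snippet below is only a stand-alone model of
that ordering with a hypothetical NULL guard; the names mirror the frames
above but it is not haproxy code:
-----------------------------
/* Minimal model of the suspected ordering issue. Types and functions
 * below are stand-ins, not haproxy's API. */
#include <stdio.h>
#include <stddef.h>

struct task { int expire; };

struct server {
    unsigned int slowstart;   /* slowstart delay in ms, 0 if unset */
    struct task *warmup;      /* NULL until checks are started */
};

/* stand-in for task_schedule(); the real one dereferences t */
static void task_schedule(struct task *t, int when)
{
    t->expire = when;         /* crashes if t == NULL, as in frame #0 */
}

/* stand-in for the end of srv_clr_admin_flag(): only schedule the
 * warmup task if it actually exists yet */
static void clear_maint_and_start_slowstart(struct server *s, int now_ms)
{
    if (s->slowstart && s->warmup)          /* hypothetical NULL guard */
        task_schedule(s->warmup, now_ms + (int)s->slowstart);
}

int main(void)
{
    /* server as parsed from the config above: slowstart is set, but the
     * warmup task is not created yet when the state file is applied */
    struct server srv = { .slowstart = 10000, .warmup = NULL };

    clear_maint_and_start_slowstart(&srv, 0);
    puts("guard prevented the NULL task_schedule()");
    return 0;
}
-----------------------------
A guard like that, or creating the warmup task before the state file is
applied, is only my guess at where a fix could go.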
The way we use the state file is to declare servers with the `disabled` option
in the configuration; during scaling we update the server's address and mark
it as ready over the stats socket. The 169.254.0.9 address is a dummy address
for the disabled servers.
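For concreteness, the scaling step boils down to something like the two
runtime API commands in the sketch below (`set server .../... addr` and
`set server .../... state ready`), sent over the admin socket from the
configuration above. The C client is only illustrative; in practice this is
scripted, and the socket path and server name are just the ones from this
report:
-----------------------------
/* Illustrative client for the admin socket: push a new address to a
 * disabled server and bring it up. Adjust socket path, backend and
 * server names to your setup. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

static int send_cmd(const char *sock_path, const char *cmd)
{
    struct sockaddr_un addr;
    char buf[512];
    ssize_t n;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, sock_path, sizeof(addr.sun_path) - 1);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        write(fd, cmd, strlen(cmd)) < 0) {
        close(fd);
        return -1;
    }

    /* print whatever haproxy answers (confirmation or error message) */
    while ((n = read(fd, buf, sizeof(buf) - 1)) > 0) {
        buf[n] = '\0';
        fputs(buf, stdout);
    }
    close(fd);
    return 0;
}

int main(void)
{
    const char *sock = "/run/haproxy.sock";  /* admin-level socket above */

    send_cmd(sock, "set server testservers/testserver20 addr 10.0.19.17\n");
    send_cmd(sock, "set server testservers/testserver20 state ready\n");
    return 0;
}
-----------------------------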
Can someone take a look? I couldn't find any related bugs fixed in 1.8.
Thanks
-- Raghu