haproxy=1.8.5 stuck in thread syncing

Posted by Максим Куприянов 
March 28, 2018 09:50AM
Hi!

Yesterday one of our haproxies (1.8.5) with nbthread=8 set in its config
got stuck at 800% CPU usage. Some responses were served successfully, but
many of them simply timed out. perf top showed this:
59.19% [.] thread_enter_sync
32.68% [.] fwrr_get_next_server

We took a core dump; here is the full backtrace:
Core was generated by `/usr/sbin/haproxy'.
#0 0x000055a9807b984b in fwrr_get_next_server ([email protected]=0x55a982e91c40,
[email protected]=0x55a982eb7940) at src/lb_fwrr.c:473
473 HA_SPIN_LOCK(LBPRM_LOCK, &p->lbprm.lock);
(gdb) bt
#0 0x000055a9807b984b in fwrr_get_next_server ([email protected]=0x55a982e91c40,
[email protected]=0x55a982eb7940) at src/lb_fwrr.c:473
#1 0x000055a98078ab5d in assign_server (s=0x7f4bbc0906a0) at
src/backend.c:604
#2 0x000055a98078be6d in assign_server_and_queue ([email protected]=0x7f4bbc0906a0)
at src/backend.c:872
#3 0x000055a98078d333 in srv_redispatch_connect ([email protected]=0x7f4bbc0906a0)
at src/backend.c:1284
#4 0x000055a9807369d8 in sess_prepare_conn_req (s=0x7f4bbc0906a0) at
src/stream.c:1094
#5 process_stream (t=<optimized out>) at src/stream.c:2219
#6 0x000055a9807b1e68 in process_runnable_tasks () at src/task.c:311
#7 0x000055a980766ff4 in run_poll_loop () at src/haproxy.c:2399
#8 run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2461
#9 0x00007f4bcd00d184 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007f4bcc28b03d in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) thread apply all bt

Thread 8 (Thread 0x7f4bcd888980 (LWP 133188)):
#0 fwrr_update_position (grp=0x55a982e92858, grp=0x55a982e92858,
s=0x55a982e9a2f0) at src/lb_fwrr.c:454
#1 fwrr_get_next_server ([email protected]=0x55a982e91c40,
[email protected]=0x55a982e9a2f0) at src/lb_fwrr.c:521
#2 0x000055a98078ab5d in assign_server (s=0x7f4bc05ed8c0) at
src/backend.c:604
#3 0x000055a98078be6d in assign_server_and_queue ([email protected]=0x7f4bc05ed8c0)
at src/backend.c:872
#4 0x000055a98078d333 in srv_redispatch_connect ([email protected]=0x7f4bc05ed8c0)
at src/backend.c:1284
#5 0x000055a9807369d8 in sess_prepare_conn_req (s=0x7f4bc05ed8c0) at
src/stream.c:1094
#6 process_stream (t=<optimized out>) at src/stream.c:2219
#7 0x000055a9807b1e68 in process_runnable_tasks () at src/task.c:311
#8 0x000055a980766ff4 in run_poll_loop () at src/haproxy.c:2399
#9 run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2461
#10 0x000055a9806d1a57 in main (argc=<optimized out>, argv=<optimized out>)
at src/haproxy.c:3050

Thread 7 (Thread 0x7f4bb3fff700 (LWP 133195)):
#0 0x000055a9807bbad6 in thread_sync_barrier (barrier=0x55a980a2f4e0
<barrier>) at src/hathreads.c:110
#1 thread_enter_sync () at src/hathreads.c:123
#2 0x000055a98076705e in sync_poll_loop () at src/haproxy.c:2376
#3 run_poll_loop () at src/haproxy.c:2431
#4 run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2461
#5 0x00007f4bcd00d184 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#6 0x00007f4bcc28b03d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 6 (Thread 0x7f4bc89d8700 (LWP 133194)):
#0 0x000055a9807bbad6 in thread_sync_barrier (barrier=0x55a980a2f4e0
<barrier>) at src/hathreads.c:110
#1 thread_enter_sync () at src/hathreads.c:123
#2 0x000055a98076705e in sync_poll_loop () at src/haproxy.c:2376
#3 run_poll_loop () at src/haproxy.c:2431
#4 run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2461
#5 0x00007f4bcd00d184 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#6 0x00007f4bcc28b03d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 5 (Thread 0x7f4bc91d9700 (LWP 133193)):
#0 0x000055a9807b984b in fwrr_get_next_server ([email protected]=0x55a982e91c40,
[email protected]=0x55a982eb28a0) at src/lb_fwrr.c:473
#1 0x000055a98078ab5d in assign_server (s=0x7f4bc0aa4690) at
src/backend.c:604
#2 0x000055a98078be6d in assign_server_and_queue ([email protected]=0x7f4bc0aa4690)
at src/backend.c:872
#3 0x000055a98078d333 in srv_redispatch_connect ([email protected]=0x7f4bc0aa4690)
at src/backend.c:1284
#4 0x000055a9807369d8 in sess_prepare_conn_req (s=0x7f4bc0aa4690) at
src/stream.c:1094
#5 process_stream (t=<optimized out>) at src/stream.c:2219
#6 0x000055a9807b1e68 in process_runnable_tasks () at src/task.c:311
#7 0x000055a980766ff4 in run_poll_loop () at src/haproxy.c:2399
#8 run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2461
#9 0x00007f4bcd00d184 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007f4bcc28b03d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 4 (Thread 0x7f4bc99da700 (LWP 133192)):
#0 0x000055a9807bbad6 in thread_sync_barrier (barrier=0x55a980a2f4e0
<barrier>) at src/hathreads.c:110
#1 thread_enter_sync () at src/hathreads.c:123
#2 0x000055a98076705e in sync_poll_loop () at src/haproxy.c:2376
#3 run_poll_loop () at src/haproxy.c:2431
#4 run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2461
#5 0x00007f4bcd00d184 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#6 0x00007f4bcc28b03d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 3 (Thread 0x7f4bca1db700 (LWP 133191)):
#0 0x000055a9807bbad6 in thread_sync_barrier (barrier=0x55a980a2f4e0
<barrier>) at src/hathreads.c:110
#1 thread_enter_sync () at src/hathreads.c:123
#2 0x000055a98076705e in sync_poll_loop () at src/haproxy.c:2376
#3 run_poll_loop () at src/haproxy.c:2431
#4 run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2461
#5 0x00007f4bcd00d184 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#6 0x00007f4bcc28b03d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 2 (Thread 0x7f4bca9dc700 (LWP 133190)):
#0 thread_sync_barrier (barrier=0x55a980a2f4e0 <barrier>) at
src/hathreads.c:110
#1 thread_enter_sync () at src/hathreads.c:123
#2 0x000055a98076705e in sync_poll_loop () at src/haproxy.c:2376
#3 run_poll_loop () at src/haproxy.c:2431
#4 run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2461
#5 0x00007f4bcd00d184 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#6 0x00007f4bcc28b03d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Thread 1 (Thread 0x7f4bcb1dd700 (LWP 133189)):
#0 0x000055a9807b984b in fwrr_get_next_server ([email protected]=0x55a982e91c40,
[email protected]=0x55a982eb7940) at src/lb_fwrr.c:473
#1 0x000055a98078ab5d in assign_server (s=0x7f4bbc0906a0) at
src/backend.c:604
#2 0x000055a98078be6d in assign_server_and_queue ([email protected]=0x7f4bbc0906a0)
at src/backend.c:872
#3 0x000055a98078d333 in srv_redispatch_connect ([email protected]=0x7f4bbc0906a0)
at src/backend.c:1284
#4 0x000055a9807369d8 in sess_prepare_conn_req (s=0x7f4bbc0906a0) at
src/stream.c:1094
#5 process_stream (t=<optimized out>) at src/stream.c:2219
#6 0x000055a9807b1e68 in process_runnable_tasks () at src/task.c:311
#7 0x000055a980766ff4 in run_poll_loop () at src/haproxy.c:2399
#8 run_thread_poll_loop (data=<optimized out>) at src/haproxy.c:2461
#9 0x00007f4bcd00d184 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007f4bcc28b03d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Version info:
# haproxy -vv
HA-Proxy version 1.8.5-1 2018/03/26
Copyright 2000-2018 Willy Tarreau <[email protected]>

Build options :
TARGET = linux2628
CPU = generic
CC = gcc
CFLAGS = -g -O2 -fPIE -fstack-protector --param=ssp-buffer-size=4
-Wformat -Werror=format-security -D_FORTIFY_SOURCE=2
OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_THREAD=1
USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1 USE_PCRE_JIT=1 USE_TFO=1 USE_NS=1

Default settings :
maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
Running on OpenSSL version : OpenSSL 1.0.1f 6 Jan 2014
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.1
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT
IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.31 2012-07-06
Running on PCRE version : 8.31 2012-07-06
PCRE library supports JIT : no (libpcre build without JIT?)
Built with zlib version : 1.2.8
Running on zlib version : 1.2.8
Compression algorithms supported : identity("identity"),
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.

Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace

Please take a look.

--
Best regards,
Maksim Kupriianov
Christopher Faulet
Re: haproxy=1.8.5 stuck in thread syncing
March 28, 2018 10:30AM
On 28/03/2018 at 09:36, Максим Куприянов wrote:

Hi,

Could you share your configuration, please? It would help diagnose the
problem. Also, what are the values of the srv_queue and backend_queue
fields in your logs?

Thanks,
--
Christopher Faulet
Максим Куприянов
Re: haproxy=1.8.5 stuck in thread syncing
March 28, 2018 02:20PM
Hi!

I'm sorry, but the configuration is too large to share (over 100 different
proxy sections). That is also why I can't determine exactly which section
is failing. Is there a way to get this data from the core file?

2018-03-28 11:18 GMT+03:00 Christopher Faulet <[email protected]>:

Christopher Faulet
Re: haproxy=1.8.5 stuck in thread syncing
March 29, 2018 11:30AM
On 28/03/2018 at 14:16, Максим Куприянов wrote:

Hi,

OK, I partly reproduced your problem using a backend with a hundred
servers and maxconn set to 2 on each one. In this case, I observe the same
CPU consumption. I get no timeouts (this probably depends on your
settings), but performance is quite low.

I think you're hitting a limitation of the current design. We have no
mechanism to migrate entities between threads. So, to force threads to
wake up, we use the sync point, which was not designed to be called very
often. In your case, it eats all the CPU.
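To illustrate why a sync point becomes expensive when it is hit very often, here is a minimal C11 sketch of a generic spinning barrier. It is only loosely inspired by the thread_enter_sync()/thread_sync_barrier() frames in the backtrace above; the struct, the function names and NB_THREADS are hypothetical, and this is not HAProxy's actual code. Every thread that reaches the barrier busy-waits until all threads have arrived, so if each wakeup request goes through such a barrier, all the other threads burn CPU waiting.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical sketch of a "sync point", NOT HAProxy's code. */
#define NB_THREADS 4

struct sync_point {
    atomic_uint arrived;    /* threads that reached the barrier this round */
    atomic_uint generation; /* incremented each time the barrier opens */
    atomic_ulong spins;     /* total busy-wait iterations, to show the cost */
};

static void sync_point_enter(struct sync_point *sp, unsigned nb_threads)
{
    unsigned gen = atomic_load(&sp->generation);

    if (atomic_fetch_add(&sp->arrived, 1) + 1 == nb_threads) {
        /* Last thread in: reset the counter first, then release everyone. */
        atomic_store(&sp->arrived, 0);
        atomic_fetch_add(&sp->generation, 1);
        return;
    }
    /* Every other thread spins, doing no useful work, until the last one arrives. */
    while (atomic_load(&sp->generation) == gen)
        atomic_fetch_add(&sp->spins, 1);
}

struct worker_arg {
    struct sync_point *sp;
    unsigned rounds;
};

static void *worker(void *p)
{
    struct worker_arg *a = p;

    for (unsigned i = 0; i < a->rounds; i++)
        sync_point_enter(a->sp, NB_THREADS);
    return NULL;
}

/* Runs NB_THREADS threads that each hit the sync point `rounds` times.
 * Returns the number of completed barrier rounds; optionally reports
 * how many busy-wait iterations were spent in total. */
static unsigned run_sync_demo(unsigned rounds, unsigned long *spins_out)
{
    struct sync_point sp;
    struct worker_arg arg = { &sp, rounds };
    pthread_t tids[NB_THREADS];

    atomic_init(&sp.arrived, 0);
    atomic_init(&sp.generation, 0);
    atomic_init(&sp.spins, 0);

    for (int i = 0; i < NB_THREADS; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, &arg);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NB_THREADS; i++)
        pthread_join(tids[i], NULL);

    if (spins_out != NULL)
        *spins_out = atomic_load(&sp.spins);
    return atomic_load(&sp.generation);
}
```

Calling run_sync_demo(100, &spins) completes exactly 100 barrier rounds; the spins counter varies from run to run but gives a rough idea of how much time the threads spent busy-waiting instead of processing traffic.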

I've attached 3 patches. They add a mechanism to wake up threads
selectively, without any lock or loop. They must be applied to HAProxy 1.8
(they will not work on upstream). That way you can check whether they fix
your problem. This will help confirm that it is a design limitation and not a bug.
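One way to picture waking threads selectively "without any lock or loop" is a single atomic bitmask with one bit per thread. The C11 sketch below is a hypothetical illustration, not the code from the attached patches: wake_thread() and consume_wakeup() are made-up names, and a real implementation would also have to interrupt the target thread's poller (for example through a per-thread pipe), which is omitted here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sketch, NOT the code from the attached patches.
 * One bit per thread: bit N set means "thread N has work waiting". */
static atomic_ulong pending_wakeups = 0;

/* Called by any thread to flag one target thread. This is a single atomic
 * OR: no lock and no loop over the other threads. Returns true if the bit
 * was newly set (the caller would then interrupt the target's poller,
 * which this sketch leaves out). */
static bool wake_thread(unsigned tid)
{
    assert(tid < 8 * sizeof(unsigned long));
    unsigned long bit = 1UL << tid;
    return (atomic_fetch_or(&pending_wakeups, bit) & bit) == 0;
}

/* Called by thread `tid` from its own poll loop: atomically tests and
 * clears only its own bit, leaving the other threads' bits untouched. */
static bool consume_wakeup(unsigned tid)
{
    assert(tid < 8 * sizeof(unsigned long));
    unsigned long bit = 1UL << tid;
    return (atomic_fetch_and(&pending_wakeups, ~bit) & bit) != 0;
}
```

With nbthread=8, all the bits fit easily in one unsigned long. Only the flagged thread notices the wakeup; the other threads never have to stop and rendezvous.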

This is just an experiment. I hope it works well, but I haven't done much
testing. If it works, I'll discuss with Willy whether it makes sense to
wake up threads this way. In any case, it will probably not be backported
to HAProxy 1.8.

--
Christopher Faulet
Максим Куприянов
Re: haproxy=1.8.5 stuck in thread syncing
April 11, 2018 07:10PM
Hi!

Thank you very much for the patches. It looks like they helped.

2018-03-29 14:25 GMT+05:00 Christopher Faulet <[email protected]>:
