[PATCH] REGTEST/MINOR: loadtest: add a test for connection counters

Posted by PiBa-NL 
Hi List, Willy,

I've created a regtest that checks that the connection counters are kept
properly while concurrent connections are being handled.

I think it could be committed as attached. It takes a few seconds to
run. It currently fails on 1.9-dev2 (it also fails on 1.8.13 with kqueue
on FreeBSD; adding 'nokqueue' on 1.8.13 makes it succeed though..).

I think it might be a good and reproducible test to run.

Or does it need more tweaking? Thoughts appreciated :).

Regards,

PiBa-NL (Pieter)

From 4b1af997e796e1bb2098c5f66ac24690841c72e8 Mon Sep 17 00:00:00 2001
From: PiBa-NL <[email protected]>
Date: Sat, 15 Sep 2018 01:51:54 +0200
Subject: [PATCH] REGTEST/MINOR: loadtest: add a test for connection counters
after running 3000 requests in a loop

---
reg-tests/loadtest/b00000-loadtest.vtc | 94 ++++++++++++++++++++++++++
1 file changed, 94 insertions(+)
create mode 100644 reg-tests/loadtest/b00000-loadtest.vtc

diff --git a/reg-tests/loadtest/b00000-loadtest.vtc b/reg-tests/loadtest/b00000-loadtest.vtc
new file mode 100644
index 00000000..f66df5ee
--- /dev/null
+++ b/reg-tests/loadtest/b00000-loadtest.vtc
@@ -0,0 +1,94 @@
+# Checks that request and connection counters are properly kept
+
+varnishtest "Check connection counters after 3000 looped requests"
+feature ignore_unknown_macro
+
+server s1 {
+ rxreq
+ expect req.http.TESTsize == 1000
+ txresp
+} -repeat 3 -start
+
+syslog Slg_1 -level notice {
+ recv
+} -repeat 15 -start
+
+haproxy h1 -W -D -conf {
+ global
+ nbthread 3
+ log ${Slg_1_addr}:${Slg_1_port} local0
+ maxconn 500000
+ #nokqueue
+
+ defaults
+ mode http
+ option dontlog-normal
+ log global
+ option httplog
+ timeout connect 3s
+ timeout client 4s
+ timeout server 15s
+
+ frontend fe1
+ maxconn 20001
+ bind "fd@${fe_1}"
+ acl donelooping hdr(TEST) -m len 1000
+ http-request set-header TEST "%[hdr(TEST)]x"
+ use_backend b2 if donelooping
+ default_backend b1
+
+ backend b1
+ fullconn 20000
+ server srv1 ${h1_fe_1_addr}:${h1_fe_1_port} maxconn 20000
+
+ frontend fe2
+ bind "fd@${fe_2}"
+ default_backend b2
+
+ backend b2
+ # haproxy 1.8 does not have the ,length converter.
+ acl OK hdr(TEST) -m len 1000
+ http-request deny deny_status 200 if OK
+ http-request deny deny_status 400
+
+ # haproxy 1.9 does have a ,length converter.
+ #http-request set-header TESTsize "%[hdr(TEST),length]"
+ #http-request del-header TEST
+ #server srv2 ${s1_addr}:${s1_port}
+
+} -start
+
+barrier b1 cond 3
+
+client c1 -connect ${h1_fe_1_sock} {
+ timeout 17
+ barrier b1 sync
+ txreq -url "/"
+ rxresp
+ expect resp.status == 200
+} -start
+client c2 -connect ${h1_fe_1_sock} {
+ timeout 17
+ barrier b1 sync
+ txreq -url "/"
+ rxresp
+ expect resp.status == 200
+} -start
+client c3 -connect ${h1_fe_1_sock} {
+ timeout 17
+ barrier b1 sync
+ txreq -url "/"
+ rxresp
+ expect resp.status == 200
+} -start
+
+client c1 -wait
+client c2 -wait
+client c3 -wait
+
+haproxy h1 -cli {
+ send "show info"
+ expect ~ "CumConns: 3001"
+ send "show info"
+ expect ~ "CumReq: 3002"
+}
--
2.18.0.windows.1
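As a back-of-the-envelope check (not part of the patch), the loop the config above creates can be sketched in Python: each request re-enters fe1, which appends one 'x' to the TEST header and proxies back to itself until the header is 1000 characters long, at which point b2 answers with a 200 deny.

```python
# Sketch of the request loop created by the .vtc config above; the 1000
# threshold comes from the "-m len 1000" ACL, 3 from the three clients.
def hops_until_done(target_len=1000):
    """Count fe1 passes until the TEST header reaches target_len bytes."""
    header = ""
    hops = 0
    while len(header) < target_len:
        header += "x"  # http-request set-header TEST "%[hdr(TEST)]x"
        hops += 1
    return hops

clients = 3
total_requests = clients * hops_until_done()
print(total_requests)  # 3000
```

This is why the CLI check expects "CumConns: 3001": the 3000 looped front-end connections plus the "show info" CLI connection itself.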
Hi Pieter,

On Sat, Sep 15, 2018 at 02:03:45AM +0200, PiBa-NL wrote:
> Hi List, Willy,
>
> I've created a regtest that checks that the connection counters are kept
> properly while concurrent connections are being handled.
>
> I think it could be committed as attached. It takes a few seconds to run. It
> currently fails on 1.9-dev2 (it also fails on 1.8.13 with kqueue on FreeBSD;
> adding 'nokqueue' on 1.8.13 makes it succeed though..).
>
> I think it might be a good and reproducible test to run.
>
> Or does it need more tweaking? Thoughts appreciated :).

I took some time this morning to give it a test. For now it fails here,
after dumping 2200 lines of not really usable output that I didn't
investigate. From what I'm seeing it seems to moderately stress the
local machine so it has many reasons for failing (lack of source
ports, improperly tuned conntrack, ulimit, etc), and it takes far too
long a time to be usable as a default test, or this one alone will be
enough to discourage anyone from regularly running "make reg-tests".

I think we should create a distinct category for such tests, because
I see some value in it when it's made to reproduce a very specific
class of issues which is very unlikely to trigger unless someone is
working on it. In this case it is not a problem that it dumps a lot
of output, as it will be useful for the person knowing what to look
for there. Probably that such tests should be run by hand and dump
their log into a related file. Imagine for example that we would
have this :

$ make reg-tests/heavy/conn-counter-3000-req.log

It would run the test on reg-tests/heavy/conn-counter-3000-req.vtc and
would produce the log into reg-tests/heavy/conn-counter-3000-req.log.
We could use a similar thing to test for select/poll/epoll/kqueue, to
test for timeouts, race conditions (eg show sess in threads). This is
very likely something to brainstorm about. You might have other ideas
related to certain issues you faced in the past. Fred is unavailable
this week but I'd be very interested in his opinion on such things.
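Such a per-test target could be backed by a simple pattern rule. The sketch below is only an illustration of the idea, not an existing rule; the HAPROXY_PROGRAM variable and the -l option are assumptions about how varnishtest would be driven:

```make
# Hypothetical: "make reg-tests/heavy/conn-counter-3000-req.log" runs the
# matching .vtc by hand and keeps its full output in the .log file.
reg-tests/heavy/%.log: reg-tests/heavy/%.vtc
	HAPROXY_PROGRAM=$(CURDIR)/haproxy varnishtest -l $< > $@ 2>&1
```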

Thus for now I'm not applying your patch, but I'm interested in seeing
what can be done with it.

Thanks,
Willy
Hi Willy,

Op 19-9-2018 om 7:36 schreef Willy Tarreau:
> Hi Pieter,
>
> I took some time this morning to give it a test. For now it fails here,
> after dumping 2200 lines of not really usable output that I didn't
> investigate. From what I'm seeing it seems to moderately stress the
> local machine so it has many reasons for failing (lack of source
> ports, improperly tuned conntrack, ulimit, etc), and it takes far too
> long a time to be usable as a default test, or this one alone will be
> enough to discourage anyone from regularly running "make reg-tests".
Test takes like 5 seconds to run here; that is admittedly a bit long if
you get a hundred more similar tests and want to continue tweaking
developments while running tests in between. Still, in my opinion it
wouldn't hurt to run such a series of longer tests before creating a
patch and submitting it for inclusion in the official git repository, or
before a release. My attempt was to test a bit differently than just
looking for regressions of known fixed bugs, putting a little load on
haproxy so that threads and simultaneous actions 'might' get into
conflicts/locks/etc. which might, by chance, show up; that is why I
chose to go a little higher on the number of round-trips, with an ever
slightly increasing payload.

For me the test produces about 345 lines of output, as attached, which
seems not too bad (if the test succeeds).. Besides the 2 instances of CLI
output for stats, it seems not that much different from other tests..
And with 1.8.13 on FreeBSD (without kqueue) it succeeds:  #    top TEST
../test/b00000-loadtest.vtc passed (4.800

Taking into account conntrack and ulimit, would that mean we can never
'reg-test' whether haproxy can really handle like 10000 connections
without issue? Or should the environment be configured by the test? That
seems very tricky at least and would probably be undesirable.. (I just
today figured I could run reg-tests also on my production box, to compare
whether a new build showed issues there that my test-box might not; I
wouldn't want system settings to be changed by a reg-test run..)
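One middle ground might be for a heavy test's runner to detect, rather than change, the environment, and skip when limits are too low. A minimal sketch; the 11000-FD threshold for roughly 10000 connections is an assumption for illustration:

```python
import resource  # stdlib, POSIX only

def can_run_heavy_test(required_fds=11000):
    """Return True when the soft FD limit is high enough for the test."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft >= required_fds

# a runner would then report "skipped" instead of "failed":
status = "run" if can_run_heavy_test() else "skip"
print(status)
```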

> I think we should create a distinct category for such tests
Agreed, which is why I used the currently non-existing '/loadtest/'
folder. If '/heavy/' is better, that's of course alright with me too.
> , because
> I see some value in it when it's made to reproduce a very specific
> class of issues which is very unlikely to trigger unless someone is
> working on it. In this case it is not a problem that it dumps a lot
> of output, as it will be useful for the person knowing what to look
> for there. Probably that such tests should be run by hand and dump
> their log into a related file. Imagine for example that we would
> have this :
>
> $ make reg-tests/heavy/conn-counter-3000-req.log
I'm not exactly sure.. ("make: don't know how to make reg-tests. Stop").
I would still like to have a way to run all 'applicable' tests with one
command, even if it takes an hour or so to verify haproxy is working
'perfectly'. But abns@ tests, for example, can't work on FreeBSD; they
should not 'fail', but perhaps they could get skipped automatically?
Anyhow, that's a question for my other mail-topic (
https://www.mail-archive.com/haproxy@formilux.org/msg31195.html )
> It would run the test on reg-tests/heavy/conn-counter-3000-req.vtc and
> would produce the log into reg-tests/heavy/conn-counter-3000-req.log.
> We could use a similar thing to test for select/poll/epoll/kqueue, to
> test for timeouts, race conditions (eg show sess in threads). This is
> very likely something to brainstorm about. You might have other ideas
> related to certain issues you faced in the past. Fred is unavailable
> this week but I'd be very interested in his opinion on such things.
>
> Thus for now I'm not applying your patch, but I'm interested in seeing
> what can be done with it.
Okay, no problem :) , I'll keep running this particular test myself for
the moment; it 'should' be able to pass normally..  (on my environment
anyhow..)
> Thanks,
> Willy

Thanks for your comments, and thoughts.

I'm interested in Fred's and anyone else's opinion ;) , and maybe this
particular test-case could be replaced by something simpler/faster with
more or less the same likelihood of catching yet-unknown issues..?
Looking forward to reactions :) .

Thanks and regards,

PiBa-NL (Pieter)

**** top 0.0 extmacro def pwd=/usr/ports/net/haproxy-devel
**** top 0.0 extmacro def localhost=127.0.0.1
**** top 0.0 extmacro def bad_backend=127.0.0.1 58530
**** top 0.0 extmacro def bad_ip=192.0.2.255
**** top 0.0 macro def testdir=/usr/ports/net/haproxy-devel/./test
**** top 0.0 macro def tmpdir=/tmp/vtc.35996.290f74a9
* top 0.0 TEST ./test/b00000-loadtest.vtc starting
** top 0.0 === varnishtest "Seamless reload issue with abns sockets"
* top 0.0 TEST Seamless reload issue with abns sockets
** top 0.0 === feature ignore_unknown_macro
** top 0.0 === server s1 {
** s1 0.0 Starting server
**** s1 0.0 macro def s1_addr=127.0.0.1
**** s1 0.0 macro def s1_port=58531
**** s1 0.0 macro def s1_sock=127.0.0.1 58531
* s1 0.0 Listen on 127.0.0.1 58531
** top 0.0 === syslog Slg_1 -level notice {
** Slg_1 0.0 Starting syslog server
**** Slg_1 0.0 macro def Slg_1_addr=127.0.0.1
**** Slg_1 0.0 macro def Slg_1_port=13952
**** Slg_1 0.0 macro def Slg_1_sock=127.0.0.1 13952
* Slg_1 0.0 Bound on 127.0.0.1 13952
** s1 0.0 Started on 127.0.0.1 58531
*** s1 0.0 Iteration 0
** top 0.0 === haproxy h1 -W -D -conf {
** Slg_1 0.0 Started on 127.0.0.1 13952 (level: 5)
*** Slg_1 0.0 Iteration 0
** Slg_1 0.0 === recv
**** h1 0.0 macro def h1_cli_sock=::1 58532
**** h1 0.0 macro def h1_cli_addr=::1
**** h1 0.0 macro def h1_cli_port=58532
**** h1 0.0 setenv(cli, 6)
**** h1 0.0 macro def h1_fe_1_sock=::1 58533
**** h1 0.0 macro def h1_fe_1_addr=::1
**** h1 0.0 macro def h1_fe_1_port=58533
**** h1 0.0 setenv(fe_1, 7)
**** h1 0.0 macro def h1_fe_2_sock=::1 58534
**** h1 0.0 macro def h1_fe_2_addr=::1
**** h1 0.0 macro def h1_fe_2_port=58534
**** h1 0.0 setenv(fe_2, 8)
**** h1 0.0 conf| global
**** h1 0.0 conf|\tstats socket /tmp/vtc.35996.290f74a9/h1/stats.sock level admin mode 600
**** h1 0.0 conf| stats socket "fd@${cli}" level admin
**** h1 0.0 conf|
**** h1 0.0 conf| global
**** h1 0.0 conf| nbthread 3
**** h1 0.0 conf| log 127.0.0.1:13952 local0
**** h1 0.0 conf| maxconn 500000
**** h1 0.0 conf| nokqueue
**** h1 0.0 conf|
**** h1 0.0 conf| defaults
**** h1 0.0 conf| mode http
**** h1 0.0 conf| option dontlog-normal
**** h1 0.0 conf| log global
**** h1 0.0 conf| option httplog
**** h1 0.0 conf| timeout connect 3s
**** h1 0.0 conf| timeout client 4s
**** h1 0.0 conf| timeout server 15s
**** h1 0.0 conf|
**** h1 0.0 conf| frontend fe1
**** h1 0.0 conf| maxconn 20001
**** h1 0.0 conf| bind "fd@${fe_1}"
**** h1 0.0 conf| acl donelooping hdr(TEST) -m len 1000
**** h1 0.0 conf| http-request set-header TEST "%[hdr(TEST)]x"
**** h1 0.0 conf| use_backend b2 if donelooping
**** h1 0.0 conf| default_backend b1
**** h1 0.0 conf|
**** h1 0.0 conf| backend b1
**** h1 0.0 conf| fullconn 20000
**** h1 0.0 conf| server srv1 ::1:58533 maxconn 20000
**** h1 0.0 conf|
**** h1 0.0 conf| frontend fe2
**** h1 0.0 conf| bind "fd@${fe_2}"
**** h1 0.0 conf| default_backend b2
**** h1 0.0 conf|
**** h1 0.0 conf| backend b2
**** h1 0.0 conf| # haproxy 1.8 does not have the ,length converter.
**** h1 0.0 conf| acl OK hdr(TEST) -m len 1000
**** h1 0.0 conf| http-request deny deny_status 200 if OK
**** h1 0.0 conf| http-request deny deny_status 400
**** h1 0.0 conf|
**** h1 0.0 conf| # haproxy 1.9 does have a ,length converter.
**** h1 0.0 conf| #http-request set-header TESTsize "%[hdr(TEST),length]"
**** h1 0.0 conf| #http-request del-header TEST
**** h1 0.0 conf| #server srv2 127.0.0.1:58531
**** h1 0.0 conf|
** h1 0.0 haproxy_start
**** h1 0.0 opt_worker 1 opt_daemon 1 opt_check_mode 0
**** h1 0.0 argv|exec haproxy -D -W -f /tmp/vtc.35996.290f74a9/h1/cfg -p /tmp/vtc.35996.290f74a9/h1/pid
**** h1 0.0 XXX 10 @586
*** h1 0.0 PID: 36000
**** h1 0.0 macro def h1_pid=36000
**** h1 0.0 macro def h1_name=/tmp/vtc.35996.290f74a9/h1
*** h1 0.0 wait-pid-file
**** Slg_1 0.0 syslog|<133>Sep 19 20:50:29 haproxy[36000]: Proxy fe1 started.
*** Slg_1 0.0 shutting fd 4
*** Slg_1 0.0 Iteration 1
** Slg_1 0.0 === recv
**** Slg_1 0.0 syslog|<133>Sep 19 20:50:29 haproxy[36000]: Proxy b1 started.
*** Slg_1 0.0 shutting fd 4
*** Slg_1 0.0 Iteration 2
** Slg_1 0.0 === recv
**** Slg_1 0.0 syslog|<133>Sep 19 20:50:29 haproxy[36000]: Proxy fe2 started.
*** Slg_1 0.0 shutting fd 4
*** Slg_1 0.0 Iteration 3
** Slg_1 0.0 === recv
**** Slg_1 0.0 syslog|<133>Sep 19 20:50:29 haproxy[36000]: Proxy b2 started.
*** Slg_1 0.0 shutting fd 4
*** Slg_1 0.0 Iteration 4
** Slg_1 0.0 === recv
*** h1 0.0 debug|[WARNING] 261/205029 (36000) : [haproxy.main()] FD limit (234909) too low for maxconn=500000/maxsock=1000042. Please raise 'ulimit-n' to 1000042 or more to avoid any trouble.
** h1 0.0 haproxy PID 36001 successfully started
** top 0.0 === barrier b1 cond 3
** top 0.0 === client c1 -connect ${h1_fe_1_sock} {
** c1 0.0 Starting client
** top 0.0 === client c2 -connect ${h1_fe_1_sock} {
** c2 0.0 Starting client
*** c1 0.0 Connect to ::1 58533
** top 0.0 === client c3 -connect ${h1_fe_1_sock} {
** c3 0.0 Starting client
*** c1 0.0 connected fd 9 from ::1 58535 to ::1 58533
** c1 0.0 === timeout 17
** c1 0.0 === barrier b1 sync
**** c1 0.0 Barrier(b1) wait 1 of 3
*** c2 0.0 Connect to ::1 58533
** top 0.0 === client c1 -wait
** c1 0.0 Waiting for client
*** c2 0.0 connected fd 12 from ::1 58536 to ::1 58533
** c2 0.0 === timeout 17
** c2 0.0 === barrier b1 sync
**** c2 0.0 Barrier(b1) wait 2 of 3
*** c3 0.0 Connect to ::1 58533
*** c3 0.0 connected fd 13 from ::1 58537 to ::1 58533
** c3 0.0 === timeout 17
** c3 0.0 === barrier b1 sync
**** c3 0.0 Barrier(b1) wake 3
** c3 0.0 === txreq -url "/"
** c2 0.0 === txreq -url "/"
** c1 0.0 === txreq -url "/"
**** c3 0.0 txreq|GET / HTTP/1.1\r
**** c3 0.0 txreq|Host: 127.0.0.1\r
**** c3 0.0 txreq|\r
**** c2 0.0 txreq|GET / HTTP/1.1\r
**** c2 0.0 txreq|Host: 127.0.0.1\r
**** c2 0.0 txreq|\r
**** c1 0.0 txreq|GET / HTTP/1.1\r
**** c1 0.0 txreq|Host: 127.0.0.1\r
**** c1 0.0 txreq|\r
** c2 0.0 === rxresp
** c1 0.0 === rxresp
** c3 0.0 === rxresp
**** h1 0.1 STDOUT poll 0x11
**** Slg_1 1.8 syslog|<134>Sep 19 20:50:31 haproxy[36002]: ::1:61511 [19/Sep/2018:20:50:31.249] fe1 b2/<NOSRV> 0/-1/-1/-1/1 200 146 - - PR-- 2978/2978/0/0/3 0/0 "GET / HTTP/1.1"
**** Slg_1 1.8 syslog|<134>Sep 19 20:50:31 haproxy[36002]: ::1:61534 [19/Sep/2018:20:50:31.270] fe1 b2/<NOSRV> 0/-1/-1/-1/1 200 146 - - PR-- 2989/2989/1/0/3 0/0 "GET / HTTP/1.1"
**** Slg_1 1.8 syslog|<134>Sep 19 20:50:31 haproxy[36002]: ::1:61535 [19/Sep/2018:20:50:31.271] fe1 b2/<NOSRV> 0/-1/-1/-1/0 200 146 - - PR-- 2988/2988/0/0/3 0/0 "GET / HTTP/1.1"
**** c2 2.6 rxhdr|HTTP/1.0 200 OK\r
**** c2 2.6 rxhdr|Cache-Control: no-cache\r
**** c2 2.6 rxhdr|Content-Type: text/html\r
**** c2 2.6 rxhdr|\r
**** c2 2.6 rxhdrlen = 69
**** c2 2.6 http[ 0] |HTTP/1.0
**** c2 2.6 http[ 1] |200
**** c2 2.6 http[ 2] |OK
**** c2 2.6 http[ 3] |Cache-Control: no-cache
**** c2 2.6 http[ 4] |Content-Type: text/html
**** c2 2.6 rxeof|<html><body><h1>200 OK</h1>
**** c2 2.6 rxeof|Service ready.
**** c2 2.6 rxeof|</body></html>
**** c2 2.6 bodylen = 58
** c2 2.6 === expect resp.status == 200
**** c2 2.6 EXPECT resp.status (200) == "200" match
*** c2 2.6 closing fd 12
** c2 2.6 Ending
**** c1 2.6 rxhdr|HTTP/1.0 200 OK\r
**** c1 2.6 rxhdr|Cache-Control: no-cache\r
**** c1 2.6 rxhdr|Content-Type: text/html\r
**** c1 2.6 rxhdr|\r
**** c1 2.6 rxhdrlen = 69
**** c1 2.6 http[ 0] |HTTP/1.0
**** c1 2.6 http[ 1] |200
**** c1 2.6 http[ 2] |OK
**** c1 2.6 http[ 3] |Cache-Control: no-cache
**** c1 2.6 http[ 4] |Content-Type: text/html
**** c1 2.7 rxeof|<html><body><h1>200 OK</h1>
**** c1 2.7 rxeof|Service ready.
**** c1 2.7 rxeof|</body></html>
**** c1 2.7 bodylen = 58
** c1 2.7 === expect resp.status == 200
**** c1 2.7 EXPECT resp.status (200) == "200" match
*** c1 2.7 closing fd 9
** c1 2.7 Ending
**** c3 2.7 rxhdr|HTTP/1.0 200 OK\r
**** c3 2.7 rxhdr|Cache-Control: no-cache\r
**** c3 2.7 rxhdr|Content-Type: text/html\r
**** c3 2.7 rxhdr|\r
**** c3 2.7 rxhdrlen = 69
**** c3 2.7 http[ 0] |HTTP/1.0
**** c3 2.7 http[ 1] |200
**** c3 2.7 http[ 2] |OK
**** c3 2.7 http[ 3] |Cache-Control: no-cache
**** c3 2.7 http[ 4] |Content-Type: text/html
**** c3 2.7 rxeof|<html><body><h1>200 OK</h1>
**** c3 2.7 rxeof|Service ready.
**** c3 2.7 rxeof|</body></html>
**** c3 2.7 bodylen = 58
** c3 2.7 === expect resp.status == 200
**** c3 2.7 EXPECT resp.status (200) == "200" match
*** c3 2.7 closing fd 13
** c3 2.7 Ending
** top 2.7 === client c2 -wait
** c2 2.7 Waiting for client
** top 2.7 === client c3 -wait
** c3 2.7 Waiting for client
** top 2.7 === haproxy h1 -cli {
** h1 2.7 CLI starting
** h1 2.7 CLI waiting
*** h1 2.7 CLI connected fd 9 from ::1 61536 to ::1 58532
** h1 2.7 === send "show info"
**** h1 2.7 CLI send|show info
** h1 2.7 === expect ~ "CumConns: 3001"
**** h1 2.7 CLI connection normally closed
*** h1 2.7 CLI closing fd 9
**** h1 2.7 CLI recv|Name: HAProxy
**** h1 2.7 CLI recv|Version: 1.8.13
**** h1 2.7 CLI recv|Release_date: 2018/07/30
**** h1 2.7 CLI recv|Nbthread: 3
**** h1 2.7 CLI recv|Nbproc: 1
**** h1 2.7 CLI recv|Process_num: 1
**** h1 2.7 CLI recv|Pid: 36002
**** h1 2.7 CLI recv|Uptime: 0d 0h00m03s
**** h1 2.7 CLI recv|Uptime_sec: 3
**** h1 2.7 CLI recv|Memmax_MB: 0
**** h1 2.7 CLI recv|PoolAlloc_MB: 11
**** h1 2.7 CLI recv|PoolUsed_MB: 0
**** h1 2.7 CLI recv|PoolFailed: 0
**** h1 2.7 CLI recv|Ulimit-n: 1000042
**** h1 2.7 CLI recv|Maxsock: 1000042
**** h1 2.7 CLI recv|Maxconn: 500000
**** h1 2.7 CLI recv|Hard_maxconn: 500000
**** h1 2.7 CLI recv|CurrConns: 0
**** h1 2.7 CLI recv|CumConns: 3001
**** h1 2.7 CLI recv|CumReq: 3001
**** h1 2.7 CLI recv|MaxSslConns: 0
**** h1 2.7 CLI recv|CurrSslConns: 0
**** h1 2.7 CLI recv|CumSslConns: 0
**** h1 2.7 CLI recv|Maxpipes: 0
**** h1 2.7 CLI recv|PipesUsed: 0
**** h1 2.7 CLI recv|PipesFree: 0
**** h1 2.7 CLI recv|ConnRate: 358
**** h1 2.7 CLI recv|ConnRateLimit: 0
**** h1 2.7 CLI recv|MaxConnRate: 1837
**** h1 2.7 CLI recv|SessRate: 358
**** h1 2.7 CLI recv|SessRateLimit: 0
**** h1 2.7 CLI recv|MaxSessRate: 1837
**** h1 2.7 CLI recv|SslRate: 0
**** h1 2.7 CLI recv|SslRateLimit: 0
**** h1 2.7 CLI recv|MaxSslRate: 0
**** h1 2.7 CLI recv|SslFrontendKeyRate: 0
**** h1 2.7 CLI recv|SslFrontendMaxKeyRate: 0
**** h1 2.7 CLI recv|SslFrontendSessionReuse_pct: 0
**** h1 2.7 CLI recv|SslBackendKeyRate: 0
**** h1 2.7 CLI recv|SslBackendMaxKeyRate: 0
**** h1 2.7 CLI recv|SslCacheLookups: 0
**** h1 2.7 CLI recv|SslCacheMisses: 0
**** h1 2.7 CLI recv|CompressBpsIn: 0
**** h1 2.7 CLI recv|CompressBpsOut: 0
**** h1 2.7 CLI recv|CompressBpsRateLim: 0
**** h1 2.7 CLI recv|ZlibMemUsage: 0
**** h1 2.7 CLI recv|MaxZlibMemUsage: 0
**** h1 2.7 CLI recv|Tasks: 7
**** h1 2.7 CLI recv|Run_queue: 1
**** h1 2.7 CLI recv|Idle_pct: 76
**** h1 2.7 CLI recv|node: freebsd11
**** h1 2.7 CLI recv|
**** h1 2.7 CLI expect match ~ "CumConns: 3001"
** h1 2.7 === send "show info"
*** h1 2.7 CLI connected fd 9 from ::1 61537 to ::1 58532
**** h1 2.7 CLI send|show info
** h1 2.7 === expect ~ "CumReq: 3002"
**** h1 2.7 CLI connection normally closed
*** h1 2.7 CLI closing fd 9
**** h1 2.7 CLI recv|Name: HAProxy
**** h1 2.7 CLI recv|Version: 1.8.13
**** h1 2.7 CLI recv|Release_date: 2018/07/30
**** h1 2.7 CLI recv|Nbthread: 3
**** h1 2.7 CLI recv|Nbproc: 1
**** h1 2.7 CLI recv|Process_num: 1
**** h1 2.7 CLI recv|Pid: 36002
**** h1 2.7 CLI recv|Uptime: 0d 0h00m03s
**** h1 2.7 CLI recv|Uptime_sec: 3
**** h1 2.7 CLI recv|Memmax_MB: 0
**** h1 2.7 CLI recv|PoolAlloc_MB: 11
**** h1 2.7 CLI recv|PoolUsed_MB: 0
**** h1 2.7 CLI recv|PoolFailed: 0
**** h1 2.7 CLI recv|Ulimit-n: 1000042
**** h1 2.7 CLI recv|Maxsock: 1000042
**** h1 2.7 CLI recv|Maxconn: 500000
**** h1 2.7 CLI recv|Hard_maxconn: 500000
**** h1 2.7 CLI recv|CurrConns: 0
**** h1 2.7 CLI recv|CumConns: 3002
**** h1 2.7 CLI recv|CumReq: 3002
**** h1 2.7 CLI recv|MaxSslConns: 0
**** h1 2.7 CLI recv|CurrSslConns: 0
**** h1 2.7 CLI recv|CumSslConns: 0
**** h1 2.7 CLI recv|Maxpipes: 0
**** h1 2.7 CLI recv|PipesUsed: 0
**** h1 2.7 CLI recv|PipesFree: 0
**** h1 2.7 CLI recv|ConnRate: 358
**** h1 2.7 CLI recv|ConnRateLimit: 0
**** h1 2.7 CLI recv|MaxConnRate: 1837
**** h1 2.7 CLI recv|SessRate: 358
**** h1 2.7 CLI recv|SessRateLimit: 0
**** h1 2.7 CLI recv|MaxSessRate: 1837
**** h1 2.7 CLI recv|SslRate: 0
**** h1 2.7 CLI recv|SslRateLimit: 0
**** h1 2.7 CLI recv|MaxSslRate: 0
**** h1 2.7 CLI recv|SslFrontendKeyRate: 0
**** h1 2.7 CLI recv|SslFrontendMaxKeyRate: 0
**** h1 2.7 CLI recv|SslFrontendSessionReuse_pct: 0
**** h1 2.7 CLI recv|SslBackendKeyRate: 0
**** h1 2.7 CLI recv|SslBackendMaxKeyRate: 0
**** h1 2.7 CLI recv|SslCacheLookups: 0
**** h1 2.7 CLI recv|SslCacheMisses: 0
**** h1 2.7 CLI recv|CompressBpsIn: 0
**** h1 2.7 CLI recv|CompressBpsOut: 0
**** h1 2.7 CLI recv|CompressBpsRateLim: 0
**** h1 2.7 CLI recv|ZlibMemUsage: 0
**** h1 2.7 CLI recv|MaxZlibMemUsage: 0
**** h1 2.7 CLI recv|Tasks: 7
**** h1 2.7 CLI recv|Run_queue: 1
**** h1 2.7 CLI recv|Idle_pct: 75
**** h1 2.7 CLI recv|node: freebsd11
**** h1 2.7 CLI recv|
**** h1 2.7 CLI expect match ~ "CumReq: 3002"
** h1 2.7 CLI ending
* top 2.7 RESETTING after ./test/b00000-loadtest.vtc
** h1 2.7 Reset and free h1 haproxy 36001
** h1 2.7 Wait
** h1 2.7 Stop HAproxy pid=36001
**** h1 2.7 Kill(2)=0: No error: 0
** s1 4.8 Waiting for server (3/-1)
** Slg_1 4.8 Waiting for syslog server (4)
* top 4.8 TEST ./test/b00000-loadtest.vtc completed
# top TEST ./test/b00000-loadtest.vtc passed (4.800)
Hi Pieter,

On Thu, Sep 20, 2018 at 12:48:35AM +0200, PiBa-NL wrote:
> Test takes like 5 seconds to run here; that is admittedly a bit long if you
> get a hundred more similar tests and want to continue tweaking developments
> while running tests in between. Still, in my opinion it wouldn't hurt to run
> such a series of longer tests before creating a patch and submitting it for
> inclusion in the official git repository, or before a release.

Definitely, however those run before a release should be almost 100%
system-agnostic. Having to prepare the system and tune it properly for
the test not to fail is going to cause a headache every time there is
a failure, because it will mean fixing the root cause and re-running the
whole suite, which is precisely what will make this whole suite stop
being used at all. This is why I'm really picky on the reliability
and the speed of these tests. It should be stupid-proof, with me being
the stupid (and believe me when it comes to doing repetitive stuff, I'm
among the stupidest persons you have ever met).

> My attempt was to test a bit differently than just looking for regressions
> of known fixed bugs, putting a little load on haproxy so that threads and
> simultaneous actions 'might' get into conflicts/locks/etc. which might, by
> chance, show up; that is why I chose to go a little higher on the number of
> round-trips, with an ever slightly increasing payload.

I really understand the point and I think it is valid to a certain extent.
But that's really not a thing to run by default. And I want to encourage us
(including me) to run reg tests from time to time. If you know that some of
them will take too long, you'll quickly end up avoiding all the ones you can
easily avoid using a single command (eg: playing with the LEVEL variable, or
not running it at all).

> For me the test produces about 345 lines of output, as attached, which seems
> not too bad (if the test succeeds).

It's already far too much for a user. I should only know if it works
or not, otherwise it hides the output of all other ones (which is what
happened). We must not have to parse the output to know if we didn't
break anything, we just have to check that it looks normal. Here's what
make reg-tests gives me on 1.8 :

[email protected]:haproxy$ time sh make-reg-tests-1.8
# top TEST reg-tests/lua/b00002.vtc passed (0.159)
# top TEST reg-tests/lua/b00001.vtc passed (0.122)
# top TEST reg-tests/lua/b00000.vtc passed (0.110)
# top TEST reg-tests/lua/b00003.vtc passed (0.137)
# top TEST reg-tests/connection/b00000.vtc passed (0.172)
# top TEST reg-tests/server/b00000.vtc passed (0.110)
# top TEST reg-tests/spoe/b00000.vtc passed (0.008)
# top TEST reg-tests/ssl/b00000.vtc passed (0.139)
# top TEST reg-tests/stick-table/b00001.vtc passed (0.110)
# top TEST reg-tests/stick-table/b00000.vtc passed (0.110)
# top TEST reg-tests/log/b00000.vtc passed (0.125)
# top TEST reg-tests/seamless-reload/b00000.vtc passed (0.123)

real 0m1.713s
user 0m0.316s
sys 0m0.068s

As you can see there's no output to parse, it *visually* looks correct.

> Besides the 2 instances of CLI output
> for stats, it seems not that much different from other tests..
> And with 1.8.13 on FreeBSD (without kqueue) it succeeds:  #    top TEST
> ./test/b00000-loadtest.vtc passed (4.800

OK then you get a valid output there. It's here that it's ugly. But we
spend enough time analysing bugs, I really refuse to spend extra time
fixing bugs in tests supposed to help detect bugs, otherwise it becomes
recursive...

> Taking into account conntrack and ulimit, would that mean we can never
> 'reg-test' whether haproxy can really handle like 10000 connections without
> issue?

10k conns definitely is way beyond what you can expect from a non-root
user on a regular shell. I run most of my configs with "maxconn 400"
because that's less than 1024 FDs once you add the extra FDs for
listeners and checks. Anything beyond that will depend on the users'
setup and becomes tricky. And in this case it's more a matter of
stress-testing the system, and we can have stress-test procedures or
tools (just like we all do on our respective setups with different
tools). It's just that one has to know in advance that some preparation
is needed (typically killing a possible browser, unloading some modules,
checking that there's still memory left, maybe adding some addresses to
the loopback, etc). So it's a different approach and it definitely is
out of the scope of automating the detection of potential regressions
during development.
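The arithmetic behind a "safe" maxconn for an untuned shell can be sketched as follows; the two-FDs-per-connection rule of thumb and the listener/check counts are illustrative assumptions, not haproxy's exact accounting:

```python
# Rough FD budget for a test run under the default 1024 soft limit:
# each proxied connection needs roughly one client-side and one
# server-side FD, plus a few FDs for listeners and health checks.
maxconn = 400
listener_fds = 10
check_fds = 50
total_fds = 2 * maxconn + listener_fds + check_fds
print(total_fds)  # 860, comfortably under a 1024 'ulimit -n'
```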

> Or should the environment be configured by the test? That seems
> very tricky at least and would probably be undesirable..

No definitely it would be even worse. For sure those used to run
"curl | sudo bash" will have no problem letting it configure their
system, but I'm not among such people and I appreciate a lot that
my machine works every morning when I want to work :-)

> > $ make reg-tests/heavy/conn-counter-3000-req.log
> I'm not exactly sure.. ("make: don't know how to make reg-tests. Stop"). I
> would still like to have a way to run all 'applicable' tests with one command,
> even if it takes an hour or so to verify haproxy is working 'perfectly'.

One thing is important to keep in mind regarding automated testing : the
tests are *only* useful if they take less cumulated time to detect a bug
than the time it would have taken to detect it oneself. I mean (I'm
exaggerating a bit but you'll get it), if the tool takes 1 minute per build,
100 builds a day, thus roughly 25000 minutes per year, that's roughly 52
work days at 8h/day. I definitely claim that a human will not waste 52 full
days a year to detect a bug, not even to fix it. So there is a reasonable
tradeoff to set. That's also why I'm saying that I'm *not* interested in
tests for already fixed bugs, they only waste test time. Their only purpose
is for backports, because like any reproducer, it helps the stable team to
verify that 1) the bug is indeed present in the stable version and needs
a fix, and 2) that the backport was properly done. But once done, this
test becomes useless and for now I don't have a good solution to propose
to keep them without having to re-run them.

I suspect that we could use sequence numbers for such tests, or maybe
just dates, and have a file somewhere that we update from time to time,
which contains the earliest version that we're going to test in that
category (i.e. "run tests for all bugs fixed since 20170101"). It would
only require a single commit in each maintenance release to bump that
file and say "OK, no need to test these ones anymore, they are fixed".
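That bookkeeping could be as simple as comparing a date prefix in each test's file name against the cutoff stored in that file. A sketch with hypothetical file names (the YYYYMMDD prefix convention is an assumption for illustration):

```python
# Hypothetical naming scheme: tests for fixed bugs carry a YYYYMMDD prefix,
# and a single cutoff (bumped at each maintenance release) decides which
# ones still need to run.
cutoff = 20170101
tests = [
    "20160412-old-fixed-bug.vtc",   # fixed before the cutoff: skipped
    "20180915-conn-counters.vtc",   # newer: still run
]
to_run = [t for t in tests if int(t.split("-", 1)[0]) >= cutoff]
print(to_run)  # ['20180915-conn-counters.vtc']
```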

My interest is in large coverage, functional coverage. We can have
config files making use of 10-15% of the known features at once, and
which will fail if any of such features get broken, but will never ever
fail otherwise. This is useful testing. But it's not as easy to implement
as it seems, because once you factor in the maintenance aspect, you'll
realise that sometimes you have to update the test file to adjust something
related to a minor behaviour change and that it doesn't backport as easily.
But that's where most of the value lies in my opinion.

> But
> like abns@ tests, for example, can't work on FreeBSD; they should not 'fail',
> but perhaps they could get skipped automatically?

Very likely. In fact given that we want *functional* coverage, this means
that either the test is specific to abns and should be skipped on FreeBSD,
or it's generic and makes use of abns because it was convenient there, and
it has to be modified to be portable.

> Anyhow, that's a question for my other
> mail-topic ( https://www.mail-archive.com/haproxy@formilux.org/msg31195.html
> )

Thanks for the link, I think I missed this one.

> > Thus for now I'm not applying your patch, but I'm interested in seeing
> > what can be done with it.
> Okay, no problem :) , I'll keep running this particular test myself for the
> moment; it 'should' be able to pass normally..  (on my environment anyhow..)
(...)
> I'm interested in Fred's and anyone else's opinion ;) , and maybe this
> particular test-case could be replaced by something simpler/faster with
> more or less the same likelihood of catching yet-unknown issues..? Looking
> forward to reactions :) .

Yep. Despite what some people might think, I'm really interested in tests.
I used to write test programs to detect lots of issues on PCs 30 years ago
when I was still a kid, I even managed to detect some fake chips and caches
by then. That might also be why I insist on efficient testing and not just
testing which makes people feel good. I'd really prefer to have only 20
quick tests covering more than 50% of the tricky parts we regularly break
and which people never have any excuse for not running, than 500 tests that
are a pain to run or debug or that constantly report "OK" because they're
related to bugs that were fixed 5 years ago and that are impossible to meet
again unless someone does it on purpose.

However I know that for this to work, we need to create momentum around
tests, process and methodology. If I start by asking that we work on such
efficient tests, we won't ever see anything, because each attempt will suffer
from the same failures we already see and will be demotivating. By starting
the way we do right now, we can experiment, test, collect feedback and ideas,
encourage people to use the tool to help developers reproduce a bug in their
environment, etc. Once enough people have an experience and a valuable
opinion on what can be done, it will be easier to go further and improve
the situation. At the moment I can say I'm really pleased to see that this
is progressing faster than I would have imagined ;-)

Cheers,
Willy
Hi Willy,

Op 20-9-2018 om 13:56 schreef Willy Tarreau:
>> For me the test produces about 345 lines of output, as attached, which seems
>> not too bad (if the test succeeds).
> It's already far too much for a user.

Well, those 345 lines are produced when it succeeds while in 'verbose'
mode; in 'normal' mode it only produces 1 line of output when successful.
Pretty much all tests produce 100+ lines of 'logging' if they fail for
some reason. From what I've seen, varnishtest either produces a bulk of
logging on a failure, or it only logs the failures. There isn't much in
between.

As for all the rest of the email, thanks for your elaborate response :).

Regards,

PiBa-NL (Pieter)